Logic Testing: Paint brush or spray gun?
[Editor’s note: This post is the second in a regular series of featured contributions from Stephen Pateras of LogicVision.]
One of the most common questions I get about logic BIST is “how can it possibly guarantee the coverage of specific faults if it’s random pattern based?” Well the answer, of course, is that it can’t… But that, it turns out, is irrelevant. Because in the real world, you can only afford a certain amount of test time. And so the relevant question becomes: “What percentage of real defects can you cover in a fixed amount of time?” This is where the high-throughput aspect of logic BIST comes into play. And this is where I like to use my paint brush versus spray gun analogy.
Say you need to paint a handful of small items, for example a few cabinet knobs. The most cost-effective way to do this would be to use a small brush and hand-paint each knob one by one. Now, what if, instead of just a few knobs, you were a cabinet knob maker who produced thousands of knobs every day? You’d, of course, quickly run out of steam using a brush to paint them individually. So, as a solution, you’d probably lay hundreds of them on the floor and use a spray gun to paint them. Sure, you’d waste some paint and, yes, you might miss a spot or two on a couple of knobs, but the time you’d save would more than make the spray gun approach worth it.
Are you wondering how this analogy applies to logic testing? Well, the knobs are defects, the paint brush is ATPG, and the spray gun is random pattern logic BIST. The point I’m making is this: When a certain design size is reached, the ATPG approach of targeting individual faults becomes impractical. Given a fixed amount of test time, the more cost-effective approach is to simply apply as many different patterns as possible in order to cover as many defects as possible. On-chip random pattern generation provides, by far, the most cost-efficient way of achieving this. Random patterns will do because there are so many different faults that almost any pattern will contribute some coverage.
It turns out that these arguments can be backed up with real data. First of all, fault simulation clearly shows that a large random pattern test set will achieve very high coverage of most common fault types, including stuck-at, transition and bridging faults. You may need to add some test points to get to the coverage you want, but this generally results in minimal overhead and no performance impact. It also turns out that very high N-detect levels are generally achieved with random patterns. N-detect is a measure of how many different patterns detect each modeled fault. A lot of published material has shown (both analytically and empirically) that high N-detect levels translate directly to high defect coverage.
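To make the N-detect idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy circuit y = (a AND b) OR (c AND d), the net names, and the helper names `simulate`, `detect_counts` and `random_patterns`. It is not how a production fault simulator works; it only shows what is being counted.

```python
import random

# Hypothetical toy circuit: y = (a AND b) OR (c AND d)
NETS = ["a", "b", "c", "d", "n1", "n2", "y"]
# Single stuck-at fault list: every net stuck at 0 and stuck at 1
FAULTS = [(net, stuck) for net in NETS for stuck in (0, 1)]


def simulate(pattern, fault=None):
    """Return output y for a 4-bit input pattern, optionally with one stuck-at fault."""
    values = {}

    def drive(net, value):
        # A faulty net ignores its driver and holds the stuck value.
        if fault is not None and fault[0] == net:
            value = fault[1]
        values[net] = value

    for net, bit in zip("abcd", pattern):
        drive(net, bit)
    drive("n1", values["a"] & values["b"])
    drive("n2", values["c"] & values["d"])
    drive("y", values["n1"] | values["n2"])
    return values["y"]


def detect_counts(patterns):
    """Map each fault to the number of distinct patterns that detect it."""
    counts = {fault: 0 for fault in FAULTS}
    for pattern in set(patterns):  # a repeated pattern only counts once
        good = simulate(pattern)
        for fault in FAULTS:
            if simulate(pattern, fault) != good:
                counts[fault] += 1
    return counts


def random_patterns(count, seed=1):
    """Generate `count` random 4-bit patterns (software stand-in for an on-chip PRPG)."""
    rng = random.Random(seed)
    return [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(count)]


if __name__ == "__main__":
    counts = detect_counts(random_patterns(12))
    detected = sum(1 for n in counts.values() if n > 0)
    print(f"fault coverage: {detected}/{len(FAULTS)}")
    print(f"minimum N-detect over all faults: {min(counts.values())}")
```

Fault coverage asks only whether each count is above zero. N-detect asks how far above zero it is, which is why applying many different patterns tends to raise it.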
So it comes down to a version of that old saying: “So many faults and so little time”… with these constraints an on-chip approach to test provides the most effective solution.
[Steve Pateras is VP of marketing at LogicVision. Steve performed his graduate work in test and has spent his entire career involved in either using, defining, or marketing DFT and BIST products and technologies]


In fact you blast it with your “spray gun” and then go back in and add a few dabs with the paint brush to “top off” the coverage…er paint, right?
No way to stretch the analogy to reference burst mode advantages? Work on that one
-MK
I like the dabs of paint analogy extension!…good one. What Matthias is referring to is that in cases where the random patterns don’t quite get you the fault coverage you want, you can always generate ATPG patterns for those remaining faults. This is called “topping off” the logic BIST test.
As for extending the analogy to BurstMode timing, the only thing I can think of is using Benjamin Moore paint versus a much cheaper brand….
Do you need to fault simulate the random patterns to know what coverage you’ve got, and to know what you’ve missed and need ATPG for?
How practical is it to fault simulate millions of patterns on huge chips? How is it done these days? I’ve been away from DFT for a while.
I know. Burst mode is like “Color Matching” so the paint is the EXACT color you need instead of slightly off (Color=Frequency… 250 MHz core tested @ 235 MHz due to Vdd power droop)…
The analogy is complete…
Yes the random patterns are fully fault simulated. Standard parallel fault simulation techniques are used. Tens of thousands of patterns are needed, not millions. Fault simulation times are typically very reasonable.
Thanks for the reply, Steve.
Hi Steve,
I’d like you to clarify a few things:
1) You say that LBIST has “no performance impact”. My experience is that designs with complex logic cones may require thousands of additional observation points and hundreds of control points. As I recall, the control points are additional gates added to improve the randomness of the downstream logic; they tend to be in critical paths and hence impact performance. Perhaps there is some new innovation that does not result in additional delay … it’s been a few years. Also, LBIST has more area impact.
2) I believe it when you say that LBIST is the most “time” efficient way to get to a reasonably high fault coverage. This is mainly due to the fact that you can have many more scan chains of shorter length as compared to ATPG, hence each vector takes less time to load. However, each ATPG vector is more efficient than a random vector since it is targeted and compacted to remove redundancy. The “hard to get” faults are tested by standard ATPG whereas control and observe points are needed to reach them with LBIST. On the down side, the storage required for each ATPG vector can be an issue if your tester or test house has a limit.
3) An LBIST plus top-off approach is probably the best way to achieve high fault coverage with a small amount of vector storage and short test time.
I think which approach is best depends on exactly what your goals are and what your limitations are in test vector size and test time. But it’s been a while, so I’d like you to clarify if I’m way off base.
harry
Hi Harry:
Thanks for some very good questions. Let me address each one:
1) You are right that a control point (as opposed to an observation point) adds a gate in a functional path, and yes, control points tend to be in the longer paths. The key, however, is that these points are added right after synthesis and before physical optimization. The added test point related gates are taken into account during the optimization process and as a result do not affect the final timing outcome. The fact that the test points tend to be in the longer paths is actually a plus, as the extra logic gives the optimization process more room to maneuver. Bottom line is that we continue to not see any timing impacts due to test points in customer designs. By the way, many people don’t realize this, but test points are also very helpful for increasing ATPG efficiency and thus reducing pattern counts.
2) Yes, LBIST has much higher throughput than even ATPG compression, not only due to the larger number of scan chains but also due to typically higher scan rates, since with LBIST you are not limited by tester speeds or performance constraints due to fixturing. However, your argument that ATPG is more efficient is precisely what my whole post was arguing against… Sure, if all you care about are stuck-at faults, then ATPG will get you to 100% much faster than random patterns will, and the throughput advantage of LBIST may not be sufficient. However, once you start adding coverage of other important fault types such as at-speed transition, bridging, etc., the ATPG pattern count quickly catches up to what you can achieve with random patterns, and this is when the LBIST throughput efficiency results in significant time savings (not to mention all of the other pattern storage and bring-up advantages).
3) This is the point Matthias was making and is certainly an option to consider. However, in many cases the LBIST coverage numbers are high enough that the added expense and complication of adding ATPG patterns is not economical.
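The throughput argument in point 2 can be put into rough numbers with a small Python sketch. It uses the usual first-order estimate that each pattern costs one chain-length of shift cycles plus the capture cycles. The function name `scan_test_time` and every design and tester figure below are made-up illustrative assumptions, not measurements from any real design or tool.

```python
import math


def scan_test_time(flops, chains, patterns, shift_mhz, capture_cycles=1):
    """First-order scan test time in seconds.

    Assumes balanced chains, and that unloading one pattern overlaps with
    loading the next, so only the last unload costs extra shift cycles.
    """
    chain_length = math.ceil(flops / chains)
    cycles = patterns * (chain_length + capture_cycles) + chain_length
    return cycles / (shift_mhz * 1e6)


if __name__ == "__main__":
    FLOPS = 1_000_000  # hypothetical design size

    # Hypothetical tester-driven scan: fewer chains, slower shift clock,
    # fewer (targeted) patterns.
    tester = scan_test_time(FLOPS, chains=500, patterns=5_000, shift_mhz=25)

    # Hypothetical on-chip LBIST: many short chains, fast shift clock,
    # ten times as many (random) patterns.
    lbist = scan_test_time(FLOPS, chains=2_000, patterns=50_000, shift_mhz=200)

    print(f"tester-driven scan: {tester * 1e3:.1f} ms")
    print(f"on-chip LBIST:      {lbist * 1e3:.1f} ms")
```

The estimate only shows how chain count and shift rate trade off against pattern count. Which side wins for a real chip depends entirely on the actual values.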
Hi Steve,
Regarding your answer to point 1, I’m not sure if I understood correctly and if that’s the reality. Please correct me if I’m missing something here.
Once the logic optimization is done and we then insert gates in functional paths (let’s say datapaths as an example), it’s much more difficult to close timing in physical optimization. It actually leads to more buffering etc. While I agree that theoretically synthesis tools should be able to restructure paths and close timing, it doesn’t work that way in timing-critical blocks/modules. Once the datapath architecture is lost by adding a gate, most synthesis tools will not be able to change to another architecture, and timing closure is a nightmare. Depending on design criticality and how many TPs are added, you can only recover a little slack.
I think it’s more of a tradeoff the designer has to make: how much coverage he loses by reducing or preventing test points in datapaths.
Hi Kiran:
Thanks for your comment. The key to test points not having any timing impact is when they are introduced into the netlist, and the details differ depending on which design tools you are using. If you are using a combination of DC and ICC, then the test points can be added after DC, as ICC supports iterative optimization which will effectively deal with the introduction of the test points. If you are using Talus (which I assume is the case you are referring to), then you would need to perform an incremental Fix Time pass after adding the test points before performing a Fix Cell pass. This would avoid the issue you are describing. Performing an additional Fix Time pass is actually quite common when coming from DC, for example. Hope this addresses your point.
I have a question – not so much in the implementation but the verification of LBIST (maybe a bit off-topic, but…):
What is industry common practice for simulating LBIST at the gate-level? I could imagine this being an almost intractable activity – something that people just don’t do.
JMF
John:
I assume you are asking about LBIST IP verification as opposed to fault simulation as John B asked about last week. LBIST verification can in fact be performed quite rapidly. Remember that the LBIST controller performs a very repetitive set of tasks:
- Load the scan chains with pseudo-random bits generated on the fly by a PRPG
- Perform some form of launch and capture of the scanned in data on the functional logic
- Unload the captured results in the scan chains into a MISR that compresses this data on the fly.
So LBIST verification can be reduced to simply verifying that the above three tasks work correctly. In practice, the above tasks are simulated for a handful of patterns just to cover corner cases. The key is that there is no need to simulate all of the remaining thousands of patterns. Note, however, that all patterns are fault simulated. But the fault simulation only simulates the parallel application of the random bits to the functional logic, not the scanning in and out of the patterns through the scan chains, and so can be performed in a reasonable amount of time (hours).
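The three controller tasks can be sketched in a few lines of Python. This is a deliberately tiny model under stated assumptions and not any vendor’s implementation: a 4-bit LFSR with feedback polynomial x^4 + x + 1 acts as the PRPG, the same shift-and-feedback structure with the response XORed in acts as the MISR, `capture` is a made-up stand-in for the logic under test, and the captured word is folded into the MISR in one step, whereas real hardware compacts it while shifting out of many chains.

```python
WIDTH = 4
TAPS = (0, 1)  # feedback taps for x^4 + x + 1 (assumed for this sketch)


def lfsr_next(state):
    """Advance the 4-bit Fibonacci LFSR by one clock."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return (state >> 1) | (feedback << (WIDTH - 1))


def misr_update(signature, response):
    """Shift the signature register once and XOR in a parallel response word."""
    return lfsr_next(signature) ^ response


def signature_of(responses):
    """Compact a sequence of 4-bit response words into one signature."""
    signature = 0
    for response in responses:
        signature = misr_update(signature, response)
    return signature


def capture(word, fault=False):
    """Hypothetical logic under test: maps a 4-bit scan load to a 4-bit response."""
    a, b, c, d = [(word >> i) & 1 for i in range(WIDTH)]
    n = a & b
    if fault:
        n = 0  # model a stuck-at-0 on the internal net n
    return (n | c) | ((a ^ d) << 1) | ((b & d) << 2) | ((c ^ n) << 3)


def run_lbist(num_patterns, seed=0b1001, fault=False):
    """Run the load / capture / compact loop and return the final signature."""
    state = seed  # must be non-zero, or the LFSR stays stuck at zero
    responses = []
    for _ in range(num_patterns):
        word = 0
        for i in range(WIDTH):  # task 1: load the chain from the PRPG
            word |= (state & 1) << i
            state = lfsr_next(state)
        responses.append(capture(word, fault))  # task 2: launch and capture
    return signature_of(responses)  # task 3: unload into the MISR


if __name__ == "__main__":
    print(f"fault-free signature: {run_lbist(10):04b}")
    print(f"faulty signature:     {run_lbist(10, fault=True):04b}")
```

Because the loop is identical for every pattern, simulating a handful of patterns exercises all of the controller behavior. In this model a mismatch against the fault-free signature flags a failure, although a short MISR like this one can alias and hide some errors.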
Thanks for your quick reply on that – it sounds very similar to most people’s approach to scan sims – make sure the load/unload works and sim broadside capture patterns, and maybe a couple that do load, capture, unload.
JMF
Hi Steve,
Thanks for the reply. What you said is true for both synthesis tools, but with one limitation: even if you run incremental optimization in DC or Fix Time in Talus, the tool will not be able to change the datapath architecture. It will work for other paths, such as control paths, and should be able to recover the lost timing there. The only workaround I see is to reduce the TPI in datapath-centric logic and take the fault coverage hit until the industry finds a way to overcome this limitation.
Thanks,
Kiran.
Hi Kiran:
I agree that datapaths can be problematic. However, this rarely results in any significant fault coverage loss, for the following reasons:
- Test points can be added at the inputs/outputs of the datapath modules.
- Most datapath modules are random pattern testable and hence don’t need test points to begin with. Testability problems are typically more in the control logic, which doesn’t have the fixed architecture issue.
- If the fault coverage within a datapath is not sufficient despite the above, then some ATPG top-off patterns can always be used, as mentioned earlier by both Matthias and Harry.