Memory Repair Basics
OK, well, that 10% story is a huge myth, but having extra brain cells to work with is a decent analogy for how we attempt to improve yields during the test of embedded memories.
Memory BIST implementation has been standard practice in the industry for many years. But for the past few years, those who keep track of the trends have pointed out that an increasing percentage of silicon area in our chip-level products is taken up by embedded memory. So BIST is good, right? Yes, of course, but multiply the increasing area by the fact that memory structures are inherently denser than standard logic, and you have a potential yield problem on your hands. Any defect that might ‘slip between the cracks’ in a standard logic area most certainly won’t in a memory array. For this reason, memory diagnosis and repair algorithms were developed, a.k.a. Built-In Self-Repair, or BISR.
Memory repair is not really ‘repair’. It’s redundancy and re-direction. Electronic sleight-of-hand, if you will. For each array being ‘repaired’, a certain number of extra rows and/or columns are set aside so that if bad bits are found, there are spares that can be used in their place. Every time a bad bit is accessed during normal operation, the addressing logic actually accesses one of the spares. For efficiency reasons, this is done on a row or column basis: if one bad bit is found, the whole row or column is replaced.
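To make the re-direction idea concrete, here is a minimal software sketch of row replacement. It is only a toy model: the class and method names (RepairableMemory, repair_row, row_map) are made up for illustration, and real repair logic is implemented in the memory's address decode hardware, not in software like this.

```python
class RepairableMemory:
    """Toy model of a memory array with spare rows (all names are illustrative)."""

    def __init__(self, rows, cols, spare_rows):
        self.main = [[0] * cols for _ in range(rows)]
        self.spares = [[0] * cols for _ in range(spare_rows)]
        # Maps a bad row address to the index of the spare row that replaces it.
        self.row_map = {}

    def repair_row(self, bad_row):
        """Assign the next unused spare row to bad_row."""
        if bad_row in self.row_map:
            return  # already repaired
        if len(self.row_map) >= len(self.spares):
            raise RuntimeError("no spare rows left")
        self.row_map[bad_row] = len(self.row_map)

    def _select(self, row):
        # The 'sleight-of-hand': accesses to a repaired row go to its spare.
        if row in self.row_map:
            return self.spares[self.row_map[row]]
        return self.main[row]

    def write(self, row, col, value):
        self._select(row)[col] = value

    def read(self, row, col):
        return self._select(row)[col]


mem = RepairableMemory(rows=4, cols=4, spare_rows=1)
mem.repair_row(2)      # row 2 contains a bad bit, so the whole row is replaced
mem.write(2, 1, 1)     # lands in spare row 0, not in the defective row
```

Note that the user of the memory still addresses row 2; only the internal mapping knows that a spare is answering.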
Now obviously, you determine where the bad bits are by testing the memory during the manufacturing test process. Once the bad cells are found, on-chip algorithms are enabled to determine the best use of the extra rows and/or columns to replace them. (These algorithms are called BIRA, for Built-In Redundancy Analysis, and are very different from those implemented in software on memory ATE.)
If all the bad cells in a faulty memory cannot be covered by a combination of spare rows and columns, the memory is considered ‘unrepairable’. If the memory is ‘repairable’, then the extra rows/columns will be used in an optimal way each time the device is powered up. How the extras are used must be coded and stored somehow.
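The repairable/unrepairable decision can be illustrated with a small software sketch. This is an exhaustive search written for clarity, not an on-chip BIRA algorithm (which, as noted above, looks quite different); the function name find_repair and the (row, col) fault format are assumptions made for this example.

```python
def find_repair(faults, spare_rows, spare_cols):
    """Return (rows, cols) to replace, or None if the memory is unrepairable.

    faults is an iterable of (row, col) addresses of bad cells.
    """
    faults = sorted(set(faults))

    def solve(remaining, rows, cols):
        if not remaining:
            return rows, cols  # every bad cell is covered
        r, c = remaining[0]
        # Option 1: spend a spare row on the first uncovered fault.
        if len(rows) < spare_rows:
            left = [f for f in remaining if f[0] != r]
            result = solve(left, rows + [r], cols)
            if result is not None:
                return result
        # Option 2: spend a spare column on it instead.
        if len(cols) < spare_cols:
            left = [f for f in remaining if f[1] != c]
            result = solve(left, rows, cols + [c])
            if result is not None:
                return result
        return None  # neither option works: unrepairable from this state

    return solve(faults, [], [])


# Two faults share row 1 and two share column 3: one spare of each is enough.
print(find_repair([(1, 0), (1, 3), (2, 3)], spare_rows=1, spare_cols=1))
# Three faults on a diagonal cannot be covered by one row plus one column.
print(find_repair([(0, 0), (1, 1), (2, 2)], spare_rows=1, spare_cols=1))
```

The search tries both choices for each uncovered fault, so its cost grows quickly with the number of spares; that is acceptable for a sketch but is one reason hardware implementations use simpler heuristics.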
The most common way for this to be done for SoCs (memory vendors have different, more optimized methods, I assume) is to use an electrical fuse array (non-volatile storage). When the repair signature for the combined set of memories has been determined, it is shifted (most likely in compressed form) into the fuse array, and the signature is then burned into the array (a device pin capable of delivering high current to the fuses does the ‘burning’).
Once this signature is permanently in the fuse array, then whenever the device is powered up, the fuse values are decompressed and shifted into the repair circuitry of each memory to be repaired. This whole process of testing, analysis and repair is called ‘hard repair’. In addition to hard repair (or in place of it), an extra test/repair operation may be done after power up, called ‘soft repair’. The repair signature from soft repair is not saved, so it must be repeated each time the device is powered up. Since this is time consuming and is done only at power up (before the chip has heated up), there is concern that relying on soft repair alone is not very reliable.
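As a rough illustration of the hard repair round trip (compress the signature, store it, then decompress and distribute it at power up), here is a sketch. The post does not say which compression scheme is used, so run-length encoding is purely an assumption here, as are the function names and the two 8-bit repair registers in the example.

```python
def compress(bits):
    """Run-length encode a list of 0/1 bits as [(bit, run_length), ...]."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


def decompress(runs):
    """Expand [(bit, run_length), ...] back into the flat bit list."""
    bits = []
    for b, n in runs:
        bits.extend([b] * n)
    return bits


def load_repair_registers(bits, register_widths):
    """Split the decompressed chain into one repair register per memory."""
    if len(bits) != sum(register_widths):
        raise ValueError("signature length does not match repair chain length")
    registers, pos = [], 0
    for width in register_widths:
        registers.append(bits[pos:pos + width])
        pos += width
    return registers


# At manufacturing test: a 16-bit signature covering two memories, mostly zeros.
signature = [0] * 6 + [1, 0, 1] + [0] * 7
fuse_contents = compress(signature)  # what would be burned into the fuses

# At every power up: decompress and shift into each memory's repair register.
registers = load_repair_registers(decompress(fuse_contents), [8, 8])
```

Repair signatures tend to be mostly zeros when few memories need repair, which is why some form of compression pays off; the exact encoding is tool- and vendor-specific.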
That’s the nutshell. Have any questions about memory test and/or repair? Comment below, or post your questions to DFT Forum. You can also read up on it yourself in this book, or google memory BIST (the first link in this search is a very comprehensive presentation by Cheng-Wen Wu, the same person who wrote the memory test chapter in the book).
Cheers,


Nice overview! Very useful for those of us not dealing with these issues every day.
Nice post! I’ve shared it with my readers:
http://www.coolverification.com/2009/01/dftdigest-memory-repair-basics.html
JL
Thanks for the comments guys! Now I know to do more posts like this.
JMF
John,
Your summary glosses over some rather important usage issues with repairable memories. For decades embedded static rams have always been available as soon as you had power and a clock. Now you have to “boot” your sram with a reset signal and a clock. These are not “instant” on memories. We have used them for the last several years and sometimes it takes hundreds or thousands of clocks before the sram is ready. This can be a problem if you use repair on a sram that firmware needs to boot. We hold off releasing the cpu until after the repair process is finished.
John Eaton
John,
It’s definitely easy to gloss over something in a blog post, but as usual, you are spot on – that is definitely a problem, and it’s kind of a pain to design around. And you don’t need a reset and *a* clock, but a reset and *2* clocks: TCK and your BIST controller clock. If you don’t want to boot up with a TCK, you have to do extra clock dividing to get it (and design a clock switch). Add that to the delayed reset you need to release the cpu after you’re done, and you have some extra work to do to get it right. Fun, eh?
Thanks again for reading, and thanks for the insightful comment!
JMF
hello,
Can someone explain in detail what self-repair is?
Thanks it is a Nice overview.
[...] provided a good overview of embedded memory repair techniques in his Memory Repair Basics post late last year. Although there’s been some limited adoption of memory repair at 90nm, at 65 [...]
[...] you recall from John’s post on Memory Repair Basics, the self-repair process involves determining the repair info for each repairable memory on the die [...]