[Yaffs] ECC algorithm is Hamming code now,
will any new algorithm enter YAFFS?
Charles Manning
manningc2 at actrix.gen.nz
Sun Aug 7 20:13:48 BST 2005
On Sunday 07 August 2005 20:22, Thomas Gleixner wrote:
> On Sun, 2005-08-07 at 16:07 +1200, Charles Manning wrote:
> > > M-Systems uses a BCH code to support MLC inside its DOC products. This
> > > algorithm can detect and correct 4-bit errors. Will YAFFS employ
> > > any other new ECC algorithm?
> >
> > Are those 4 bits per page, or what? With most ECC structures used
> > with NAND, the ECC corrects one bad bit per 256 bytes. Correcting more
> > requires larger ECC areas and more ECC computation (hardware or
> > software).
> >
> > Since ECC is part of mtd (or whatever NAND layer you are using), this is
> > really independent of YAFFS.
>
> The NAND/MTD layer supports a couple of different ECC solutions. The DoC
> devices use a Reed-Solomon code, which is supported by an en/decoder
> library.
> Reed-Solomon codes can correct and detect more errors than the
> SmartMedia Hamming code, which has been the standard ECC since NAND came up.
> OTOH such codes need hardware support because the calculation in software
> would be too time consuming. DoCs have a built-in hardware RS encoder.
>
> > It is also important to consider most likely failure modes. I am not
> > familiar with MLC failure modes, but single-bit errors (as corrected by
> > ECC) are typically very rare with NAND (as used by YAFFS). Double bit
> > errors are even rarer. I have done tests a few times where over
> > 100Gbytes of data was written to a file system without a single bit of
> > corruption. Since 100Gbytes translates into many lifetimes of most
> > mobile/embedded products, I am pretty confident that for most usages bit
> > errors are not a significant problem when used with single-bit ECC.
>
> 100GiB of data relative to which partition size?
The partition size was about 450MB. The largest test was 300GB of actual NAND
writing.
>
> Let's assume a 16 MiB partition, where you write 100GiB of data. Let's
> further assume that we have a real sum of 256GiB of data written to
> FLASH due to garbage collection, wear levelling...
>
> 256 GiB / 16 MiB = 16384
>
> That means we erased / programmed each block of the FLASH 16k times.
> This is nowhere near the 100k erase cycles.
Yes, *most* systems never get anywhere near the 100k lifetime - a fact that
should be kept in mind when worrying about lifetime and wear levelling
issues. For most mobile/embedded systems you can do a lifetime calculation
something like:
10 Mbytes of data per day, 365 days per year, 10 years product life =
36500 Mbytes written.
Say a 16MB flash partition: 36500/16 = approx 2281 cycles average.
Multiply by, say, 10 for garbage collection, skew etc. = approx 22810 cycles.
16MB is probably unrealistically small for most devices that would see
anywhere near this sort of traffic.
It is really up to the system integrator to work the numbers for a particular
system.
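A minimal sketch of that arithmetic in C, for anyone who wants to plug in
their own numbers (the data rate, lifetime, partition size, overhead factor
and 100k rated endurance below are just the assumptions from the example
above, not measured figures):

#include <stdio.h>

int main(void)
{
    double mb_per_day      = 10.0;     /* application data written per day (MB) */
    double years           = 10.0;     /* product lifetime */
    double partition_mb    = 16.0;     /* NAND partition size (MB) */
    double overhead_factor = 10.0;     /* garbage collection, skew, etc. */
    double rated_cycles    = 100000.0; /* typical quoted NAND endurance */

    double total_mb   = mb_per_day * 365.0 * years;
    double avg_cycles = (total_mb / partition_mb) * overhead_factor;

    printf("Total data written over product life: %.0f MB\n", total_mb);
    printf("Average erase/program cycles per block: %.0f\n", avg_cycles);
    printf("Fraction of rated endurance used: %.1f%%\n",
           100.0 * avg_cycles / rated_cycles);
    return 0;
}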
>
> We conducted long-term tests, where we definitely encountered multi-bit
> errors in different contexts. The effects start to show up when you
> reach the 60k erase/program range. There are also known effects with
> occasional bitflips on read, but those seem to be single-bit only.
Such test results are going to be dependent on many factors:
1) Were you doing partial page programming? This hurts the flash more.
2) What flash? The newer stuff seems far more reliable than the older stuff.
The 100k lifetimes are based on using 1-bit ECC, and assume a few lost blocks.
Multi-bit errors are most likely to occur when you are doing partial page
programming: something that YAFFS2 does not do.
YAFFS currently retires blocks if they show any ECC errors - read or write,
single or multi-bit. This might seem a bit conservative, but it is probably
safer, based on the assumption that a block of NAND will encounter 1-bit
errors before it encounters multi-bit errors.
#disclaimer: I have not tried out MLC and the above might not hold for that.
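For illustration only, here is a minimal C sketch of that retirement policy
as described above (the names - note_ecc_result, ecc_result, block_info - are
hypothetical and are not the actual YAFFS structures or functions):

#include <stdio.h>

/* Results an ECC check/correct step could report; names are hypothetical,
 * not taken from the YAFFS or MTD sources. */
enum ecc_result {
    ECC_OK,       /* no error detected */
    ECC_FIXED,    /* single-bit error detected and corrected */
    ECC_UNFIXED   /* multi-bit error, data could not be corrected */
};

struct block_info {
    int needs_retiring;   /* set once the block should no longer be used */
};

/* Conservative policy: any ECC event, corrected or not, seen on read or on
 * write verification, marks the block for retirement, on the assumption
 * that single-bit errors show up before multi-bit ones. */
static void note_ecc_result(struct block_info *blk, enum ecc_result r)
{
    if (r != ECC_OK)
        blk->needs_retiring = 1;
}

int main(void)
{
    struct block_info blk = { 0 };

    note_ecc_result(&blk, ECC_FIXED);   /* even a corrected error retires it */
    printf("needs retiring: %d\n", blk.needs_retiring);
    return 0;
}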
-- Charles