[Yaffs] Re: power fail testing

Charles Manning manningc2@actrix.gen.nz
Sun, 22 May 2005 10:14:00 +1200


Sergei

I know I have been silent on this a while, but that is mainly because I
have been thinking...

For now I will put aside the rename problem, which has a fix that I need
to complete, and will instead focus on the actual power fail issues.

> As I reported sometimes power cycling produces files where some/few
> zeros are flipped to ones. I have the following hypothesis, please let
> me know if it makes sense or I am missing something in my understanding
> of yaffs_guts.
>
> 1. The error does not happen under stable power condition so it is
> likely that power fail causes partial programming, e.g. some ones do
> not program to zero.
> 2. Initial scan will not check crc on data and happily count a page as
> a valid chunk of the file.

YAFFS currently assumes that a power failure will not destroy a write.
For the most part that should be an OK assumption, since once a flash
programming cycle has been set up it should execute in 200us. There
should be enough residual power in the system to complete that.

Two things that can be done to improve the situation at the low level:

1) Ensure that the whole page write is being done as a single write at
the mtd level (ie. writing the data and oob as separate operations is
not good).
2) Add a power check step just before the write in the mtd (assuming you
have a power fail warning flag)

ie

    nand_write(..)
    {
        /* set up write */
        while (!power_good) { /* spin */ }
        /* complete write */
    }
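
To make the two low-level points a bit more concrete, here is a minimal
sketch in C against a simulated one-page NAND. Everything named here
(struct sim_nand, nand_write_page, sim_power_good, the 512+16 byte
geometry) is made up for illustration and is not the mtd API; a real
driver would poll whatever power fail warning flag the board provides.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NAND_DATA_BYTES 512 /* assumed data area size */
#define NAND_OOB_BYTES  16  /* assumed spare (oob) area size */

/* Hypothetical hook: returns true when the supply is healthy. */
typedef bool (*power_good_fn)(void *ctx);

/* Simulated device: a single page plus a count of program cycles. */
struct sim_nand {
    unsigned char page[NAND_DATA_BYTES + NAND_OOB_BYTES];
    int program_cycles;
};

/* Simulated warning flag: reports bad power for the first *ctx polls. */
bool sim_power_good(void *ctx)
{
    int *bad_polls = ctx;

    if (*bad_polls > 0) {
        (*bad_polls)--;
        return false;
    }
    return true;
}

/* Write data and oob as ONE program cycle, gated on the power flag. */
int nand_write_page(struct sim_nand *nand,
                    const unsigned char *data,
                    const unsigned char *oob,
                    power_good_fn power_good, void *ctx)
{
    unsigned char buf[NAND_DATA_BYTES + NAND_OOB_BYTES];

    /* Set up write: stage data and oob together in one buffer. */
    memcpy(buf, data, NAND_DATA_BYTES);
    memcpy(buf + NAND_DATA_BYTES, oob, NAND_OOB_BYTES);

    /* Do not start programming while power is flagged as failing. */
    while (!power_good(ctx)) {
        /* spin */
    }

    /* Complete write: a single program cycle covers data and oob. */
    memcpy(nand->page, buf, sizeof buf);
    nand->program_cycles++;
    return 0;
}
```

The point of staging both areas first is that the power check then sits
immediately before the only step that actually touches the flash.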

There is also something that can be done in YAFFS: Better handling of
power fail by checking the condition of the last write before power
failed. If it was bad we can just discard it.
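
One possible shape for that check, as a rough sketch only and not
existing YAFFS code: during the scan, find the most recently written
chunk and drop it if the ECC check on it could not be satisfied. The
struct, the seq write-order counter and the ecc status field are all
assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical result of the ECC check done when a chunk is read. */
enum ecc_status { ECC_OK, ECC_FIXED, ECC_UNFIXABLE };

/* Hypothetical per-chunk record built up by the scan. */
struct scanned_chunk {
    unsigned seq;        /* assumed write-order counter, larger = newer */
    enum ecc_status ecc; /* outcome of the ECC check on this chunk */
    int valid;           /* 1 if the scan keeps the chunk, 0 if dropped */
};

/*
 * Find the newest chunk and discard it if ECC could not repair it.
 * Returns the index of the discarded chunk, or -1 if nothing was dropped.
 */
int discard_bad_last_write(struct scanned_chunk *chunks, size_t n)
{
    size_t last = 0;
    size_t i;

    if (n == 0)
        return -1;

    /* The last write before power failed is the one with the highest seq. */
    for (i = 1; i < n; i++) {
        if (chunks[i].seq > chunks[last].seq)
            last = i;
    }

    if (chunks[last].ecc == ECC_UNFIXABLE) {
        chunks[last].valid = 0;
        return (int)last;
    }
    return -1;
}
```

Only the newest chunk is treated this way, on the reasoning that older
chunks were written under good power.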

> 3. Garbage collector may later copy the bad page without checking crc
> (!) to a new block and assign it a (new) good crc.

There is no crc, but there is ecc. YAFFS should be applying ECC during gc
too as it is part of the standard read. This will fix single bit errors,
but not partially written pages.
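
To illustrate why, here is a toy single-error-correcting code in C. It
is a simplified stand-in and not the ECC that YAFFS uses; the 32 byte
block size and all the names are assumptions for the example. It stores
the XOR of the positions of all set bits plus an overall parity bit, so
one flipped bit can be located and repaired, while two flipped bits are
only detected.

```c
#include <stddef.h>
#include <stdint.h>

#define ECC_DATA_BYTES 32 /* 256 bits, so a bit index fits in 8 bits */

struct toy_ecc {
    uint8_t pos_xor; /* XOR of the indices of all bits that are 1 */
    uint8_t parity;  /* number of 1 bits, modulo 2 */
};

/* Compute the toy ECC over one block of data. */
struct toy_ecc toy_ecc_calc(const uint8_t *data)
{
    struct toy_ecc e = { 0, 0 };
    unsigned bit;

    for (bit = 0; bit < ECC_DATA_BYTES * 8; bit++) {
        if (data[bit / 8] & (1u << (bit % 8))) {
            e.pos_xor ^= (uint8_t)bit;
            e.parity ^= 1;
        }
    }
    return e;
}

/*
 * Check data against the stored ECC.
 * Returns 0 if clean, 1 if one bit was repaired in place,
 * -1 if the damage is detected but cannot be repaired.
 */
int toy_ecc_correct(uint8_t *data, struct toy_ecc stored)
{
    struct toy_ecc now = toy_ecc_calc(data);
    uint8_t diff = now.pos_xor ^ stored.pos_xor;

    /* Same parity: either nothing changed or an even number of bits did. */
    if (now.parity == stored.parity)
        return diff == 0 ? 0 : -1;

    /* Parity differs: assume exactly one flip, located at index diff. */
    data[diff / 8] ^= (uint8_t)(1u << (diff % 8));
    return 1;
}
```

A partially programmed page can differ from the intended data in many
bits at once. With a code like this one, an even number of flips is
reported as unfixable, and an odd number greater than one can lead it to
"repair" the wrong bit.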

-- Charles