Re: [Yaffs] bad block management

Author: Chris Gofforth
Date:  
To: bpqw
CC: yaffs@lists.aleph1.co.uk
Subject: Re: [Yaffs] bad block management
On this topic of bit flips on reads,

The logic:

      if (!bi->gc_prioritise) {
            bi->gc_prioritise = 1;
            dev->has_pending_prioritised_gc = 1;

is going to tell the garbage collection (GC) routine(s) to collect this block.

1.    That process will result in the movement/refreshing of that
block's data, correct?

2.    If this is correct, when is GC performed? Is it on any write
operation, or does a separate thread have to be provided to call the GC
routines?




If the act of doing GC on that block will perform the refresh operation,
then the logic:

            bi->chunk_error_strikes++;

            if (bi->chunk_error_strikes > 3) {
                  bi->needs_retiring = 1; /* Too many strikes, so retire */
                  yaffs_trace(YAFFS_TRACE_ALWAYS,
                        "yaffs: Block struck out");

is not valid here, as the operation was to refresh the block, not to say that
it is bad.


The check of

tags->ecc_result > YAFFS_ECC_RESULT_NO_ERROR

has to be changed to say:

if (tags->ecc_result == EUCLEAN)
    - indicate to GC this block
else if (tags->ecc_result > YAFFS_ECC_RESULT_NO_ERROR)
    - this exceeded the threshold and the data read is bad.


The problem is that, should another read from that location occur BEFORE the
GC of the block happens, you may get a failure. That is why the block needs
to get moved ASAP. (See question 2.)
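
To make the proposed split concrete, here is a minimal sketch in C. It is not
Yaffs code: the types read_status, block_state and dev_state, the function
handle_read_status and the MAX_STRIKES constant are all hypothetical stand-ins
that only mirror the field names quoted above. The assumption is that the
driver can tell "corrected but please refresh" apart from "uncorrectable".

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real Yaffs structures. */
enum read_status {
    READ_OK,             /* no ECC event worth acting on */
    READ_NEEDS_REFRESH,  /* corrected, but over the refresh threshold */
    READ_UNCORRECTABLE   /* ECC could not repair the data */
};

struct block_state {
    bool gc_prioritise;
    bool needs_retiring;
    int  chunk_error_strikes;
};

struct dev_state {
    bool has_pending_prioritised_gc;
};

#define MAX_STRIKES 3 /* same limit as the quoted three-strikes check */

/* Ask GC to move this block's data soon; safe to call repeatedly. */
static void queue_refresh(struct dev_state *dev, struct block_state *bi)
{
    if (!bi->gc_prioritise) {
        bi->gc_prioritise = true;
        dev->has_pending_prioritised_gc = true;
    }
}

void handle_read_status(struct dev_state *dev, struct block_state *bi,
                        enum read_status st)
{
    switch (st) {
    case READ_OK:
        break;
    case READ_NEEDS_REFRESH:
        /* Refresh only: no strike is recorded against the block. */
        queue_refresh(dev, bi);
        break;
    case READ_UNCORRECTABLE:
        /* Real failure: still move the data, and count a strike. */
        queue_refresh(dev, bi);
        bi->chunk_error_strikes++;
        if (bi->chunk_error_strikes > MAX_STRIKES)
            bi->needs_retiring = true;
        break;
    }
}
```

Note one deliberate difference from the quoted logic: this sketch counts a
strike on every uncorrectable read, whereas the quoted code only counts one
when the block was not already prioritised for GC. Either choice may be
reasonable; the point is only that a refresh request alone never leads to
retirement.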

Can anyone answer how GC works and when?


Chris Gofforth / Pr Software Engineer

MS 131-102, Cedar Rapids, IA, USA

Phone: 319-295-0373 Fax: 319-295-8100



www.rockwellcollins.com



On Wed, Aug 6, 2014 at 2:26 AM, bpqw <> wrote:

> Hi Charles,
> We recommend that if the bitflips exceed the threshold, we just refresh the
> block rather than retire it.
> So we question whether it is reasonable to judge a block as bad just
> because its bitflips exceeded mtd->bitflip_threshold more than three times.
>
> Br
> White Ding
> ____________________________
> EBU APAC Application Engineering
> Tel:86-21-38997078
> Mobile: 86-13761729112
> Address: No 601 Fasai Rd, Waigaoqiao Free Trade Zone Pudong, Shanghai,
> China
>
> -----Original Message-----
> From: Charles Manning [mailto:cdhmanning@gmail.com]
> Sent: Wednesday, August 06, 2014 8:21 AM
> To:
> Cc: bpqw
> Subject: Re: [Yaffs] bad block management
>
> On Friday 25 July 2014 16:50:25 bpqw wrote:
> > Hi
> >
> > I have reviewed the yaffs2 source code and have a doubt. See the following.
> >
> >
> >
> > In Yaffs2 the read interface is yaffs_rd_chunk_tags_nand:
> >
> > int yaffs_rd_chunk_tags_nand(struct yaffs_dev *dev, int nand_chunk,
> >                        u8 *buffer, struct yaffs_ext_tags *tags)
> > {
> >       .........
> >       result = dev->tagger.read_chunk_tags_fn(dev, flash_chunk,
> >                                               buffer, tags);
> >       if (tags && tags->ecc_result > YAFFS_ECC_RESULT_NO_ERROR) {
> >             struct yaffs_block_info *bi;
> >
> >             bi = yaffs_get_block_info(dev,
> >                                 nand_chunk /
> >                                 dev->param.chunks_per_block);
> >             yaffs_handle_chunk_error(dev, bi);
> >       }
> >       return result;
> > }
> >
> >
> >
> > The yaffs_rd_chunk_tags_nand will call the mtd interface mtd_read_oob
> >
> >
> >
> > int mtd_read_oob(struct mtd_info *mtd, loff_t from, struct mtd_oob_ops *ops)
> > {
> >       int ret_code;
> >
> >       ops->retlen = ops->oobretlen = 0;
> >       if (!mtd->_read_oob)
> >             return -EOPNOTSUPP;
> >       /*
> >        * In cases where ops->datbuf != NULL, mtd->_read_oob() has semantics
> >        * similar to mtd->_read(), returning a non-negative integer
> >        * representing max bitflips. In other cases, mtd->_read_oob() may
> >        * return -EUCLEAN. In all cases, perform similar logic to mtd_read().
> >        */
> >       ret_code = mtd->_read_oob(mtd, from, ops);
> >       if (unlikely(ret_code < 0))
> >             return ret_code;
> >       if (mtd->ecc_strength == 0)
> >             return 0;   /* device lacks ecc */
> >       return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
> > }

> >
> >
> >
> > So if the number of bitflips reaches mtd->bitflip_threshold, mtd_read_oob
> > will return -EUCLEAN and tags->ecc_result will be greater than
> > YAFFS_ECC_RESULT_NO_ERROR.
> >
> > Then yaffs_handle_chunk_error is called:
> >
> > void yaffs_handle_chunk_error(struct yaffs_dev *dev,
> >                         struct yaffs_block_info *bi)
> > {
> >       if (!bi->gc_prioritise) {
> >             bi->gc_prioritise = 1;
> >             dev->has_pending_prioritised_gc = 1;
> >             bi->chunk_error_strikes++;
> >
> >             if (bi->chunk_error_strikes > 3) {
> >                   bi->needs_retiring = 1; /* Too many strikes, so retire */
> >                   yaffs_trace(YAFFS_TRACE_ALWAYS,
> >                         "yaffs: Block struck out");
> >             }
> >       }
> > }
> >
> >
> >
> > From the code we can see that if the number of bitflips exceeds
> > mtd->bitflip_threshold, the block is marked for GC; if that happens more
> > than three times, the block is marked as a bad block.
> >
> > We define a bad block as one on which an erase or program operation
> > has failed.
> >
> > So is it reasonable to judge a block as bad just because its bitflips
> > exceeded mtd->bitflip_threshold more than three times?
> >
> > What's your opinion about my doubts?
>
> Hello White Ding
>
> I apologise for taking a while to get back to looking at this.
>
> First let me explain the history behind what is there.
>
> In the beginning, there was SLC and Yaffs only supported three levels:
> * Good: No ECC errors.
> * Single bit ECC error: data is recoverable, but we are worried about a
> future failure.
> * Multi-bit ECC error: bad.
>
> In the beginning, the concern was that blocks with a single bit error
> were on their way to going bad, so we had better retire them soon.
>
> Then bits got a bit worse, so we modified the policy slightly. A block
> with a single bit error got rewritten, but if too many errors were observed
> then we retired the block.
>
> Then with MLC and multi-bit ECC errors we moved up to a new step. Single
> bit errors became common. Yaffs kept the same basic policy, but the drivers
> (at mtd level) started telling "lies".
>
> For example, in a multi-bit ECC system that fixes 4 bits, we might see:
> 0-2 bit errors are reported as zero errors.
> 3-4 bit errors are reported as -EUCLEAN.
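
As an illustration of that 4-bit example, the mapping from a raw bitflip count
to what the reader of the driver sees could be sketched as follows. The
function reported_read_status is a hypothetical helper, not an MTD function;
the parameter names simply echo mtd->ecc_strength and mtd->bitflip_threshold
from the code quoted earlier. Returning -EBADMSG for an uncorrectable page is
the usual MTD convention, but it is an addition here and is not shown in the
quoted code.

```c
#include <errno.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; fallback for platforms that lack it */
#endif

/*
 * Hypothetical helper (not part of MTD): translate the worst bitflip
 * count seen on a page into the status the caller would receive.
 */
int reported_read_status(int max_bitflips, int ecc_strength,
                         int bitflip_threshold)
{
    if (max_bitflips > ecc_strength)
        return -EBADMSG;  /* more flips than the ECC can correct */
    if (max_bitflips >= bitflip_threshold)
        return -EUCLEAN;  /* corrected, but the caller should refresh */
    return 0;             /* corrected and reported as "zero errors" */
}
```

With ecc_strength = 4 and bitflip_threshold = 3 this reproduces the two bands
listed above: counts 0-2 come back as 0 and counts 3-4 come back as -EUCLEAN.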
>
> This is essentially the logic you are talking about here, but I need to
> dig into the mtd terminology a bit better to understand this fully.
>
> Some flash parts (e.g. Micron MT29F8Gxxx parts) with built-in ECC do not
> report the number of bit errors, but just a "please refresh" indicator.
>
> I think we are now getting to a point where increasing numbers of bit
> errors are expected and should not be treated as a failure.
>
> Thus we probably need a new level that does a refresh, but does not apply
> the three strikes failure policy.
>
> For example, for something that supports 6-bit correction we might want
> something like this:
> 0-2: These are expected, do nothing.
> 3-4: Refresh. Do not retire.
> 5-6: It looks like the block is failing. Suck the data off and retire if
> this happens too often.
> 7+: Data is corrupted.
>
> If there are enough bits to make bands like this then it makes sense.
> However parts that hide the bad bits behind an ONFI-like interface do not
> really give us the data we need to make fine grained decisions.
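
The banded policy described above could be sketched in C roughly as follows.
This is only an illustration under stated assumptions: flip_action,
flip_bands and classify_bitflips are hypothetical names that do not exist in
Yaffs, and the sketch assumes the driver actually reports a bitflip count,
which (as noted above) some parts do not.

```c
/* Hypothetical names; none of these exist in Yaffs. */
enum flip_action {
    FLIP_IGNORE,             /* expected noise, do nothing */
    FLIP_REFRESH,            /* rewrite the data, do not retire */
    FLIP_REFRESH_AND_STRIKE, /* move the data and count towards retirement */
    FLIP_DATA_LOST           /* beyond ECC strength, data is corrupted */
};

struct flip_bands {
    int ignore_max;  /* highest count that is simply ignored */
    int refresh_max; /* highest count that triggers a plain refresh */
    int correct_max; /* highest count the ECC can still correct */
};

enum flip_action classify_bitflips(const struct flip_bands *b, int bitflips)
{
    if (bitflips <= b->ignore_max)
        return FLIP_IGNORE;
    if (bitflips <= b->refresh_max)
        return FLIP_REFRESH;
    if (bitflips <= b->correct_max)
        return FLIP_REFRESH_AND_STRIKE;
    return FLIP_DATA_LOST;
}
```

For the 6-bit example, the bands would be { 2, 4, 6 }, so a count of 3 or 4
asks for a refresh only, while 5 or 6 also counts towards retirement. Keeping
the limits in a struct rather than hard-coding them is just one possible
design; it lets different ECC strengths share the same classification code.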
>
> I hope that helps.
>
> -- Charles
>
>