Re: [Yaffs] Disadvantage of using yaffs checkpointing?

Author: peterlingoal
Date:
To: Charles Manning
CC: yaffs
Subject: Re: [Yaffs] Disadvantage of using yaffs checkpointing?

Hi Charles:

Following is a proposed patch that checks the block state after discovering
a checkpoint block, and continues the search if the block state is DEAD:
From b08b8c5fc21c2820f66454968be3a5115477fc96 Mon Sep 17 00:00:00 2001
From: Peter Lin <peter.lin@gdc-tech.com>
Date: Mon, 21 May 2012 20:25:55 +0800
Subject: [PATCH] yaffs: ignore checkpt if it is in bad block

---
yaffs_checkptrw.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/yaffs_checkptrw.c b/yaffs_checkptrw.c
index 997a618..0d63e74 100644
--- a/yaffs_checkptrw.c
+++ b/yaffs_checkptrw.c
@@ -13,6 +13,8 @@

#include "yaffs_checkptrw.h"
#include "yaffs_getblockinfo.h"
+#include "yaffs_nand.h"
+#include "yaffs_guts.h"

static int yaffs2_checkpt_space_ok(struct yaffs_dev *dev)
{
@@ -117,6 +119,15 @@ static void yaffs2_checkpt_find_block(struct yaffs_dev *dev)
tags.ecc_result);

if (tags.seq_number == YAFFS_SEQUENCE_CHECKPOINT_DATA) {
+ enum yaffs_block_state state = 0;
+ u32 seq_number = 0;
+ yaffs_query_init_block_state(dev, i, &state, &seq_number);
+ if( YAFFS_BLOCK_STATE_DEAD == state )
+ {
+ yaffs_trace(YAFFS_TRACE_CHECKPOINT,
+ "ignore bad checkpt block %d", i);
+ continue;
+ }
/* Right kind of block */
dev->checkpt_next_block = tags.obj_id;
dev->checkpt_cur_block = i;
--
1.7.6.1
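
For readers following the thread, the skip-dead-blocks idea in the patch can be sketched in isolation. This is only a minimal simulation under stated assumptions, not yaffs code: `sim_block`, `SIM_SEQ_CHECKPOINT`, `SIM_BLOCK_DEAD` and `sim_find_checkpoint_block` are hypothetical names, and the array stands in for what the driver would read from the NAND tags and the bad-block query.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the driver's types; not the yaffs definitions. */
enum sim_block_state { SIM_BLOCK_OK, SIM_BLOCK_DEAD };

#define SIM_SEQ_CHECKPOINT 0x21u  /* placeholder marker value for this sketch */

struct sim_block {
    unsigned seq_number;          /* sequence number read from the tags */
    enum sim_block_state state;   /* what the bad-block query reports */
};

/*
 * Return the index of the first block at or after 'start' that carries the
 * checkpoint marker and is not dead, or -1 if there is none.
 */
static int sim_find_checkpoint_block(const struct sim_block *blocks,
                                     size_t n_blocks, size_t start)
{
    size_t i;

    for (i = start; i < n_blocks; i++) {
        if (blocks[i].seq_number != SIM_SEQ_CHECKPOINT)
            continue;             /* not a checkpoint block */
        if (blocks[i].state == SIM_BLOCK_DEAD)
            continue;             /* stale checkpoint sitting in a bad block */
        return (int)i;
    }
    return -1;
}
```

With a stale checkpoint in a dead block followed by a good one later on, the search returns the later block, which is the behaviour the patch is aiming for.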

On Fri, May 18, 2012 at 1:02 PM, peterlingoal <peterlingoal@gmail.com> wrote:

> I did a quick test using HEAD yaffs to search for a checkpoint block on
> my NAND, and it returned the same one in the bad block area. Even if it is
> rejected in the later checking, checkpoint will not work properly every
> time if this bad block is at the start.
>
> Shall the checkpoint be ignored and the search continue if it is in a bad
> block area?
>
>
> On Fri, May 18, 2012 at 9:39 AM, peterlingoal <peterlingoal@gmail.com> wrote:
>
>> Yes, we are using a pretty old version (from Sep 2010), and now we
>> are trying to upgrade to the latest.
>> Could you please help point out which checksum would prevent an old
>> checkpoint being used? Right now I cannot simply try a new version, as
>> the version mismatch would always force a re-scan.
>>
>> BTW, HowYaffsWorks is a great document, however there's no download link
>> in yaffsDotnet. I didn't find this doc until I googled for the file
>> directly. Could this be fixed so newbies like me could read the document
>> first before asking questions?
>>
>> Thanks,
>> Peter
>>
>>
>> On Fri, May 18, 2012 at 5:26 AM, Charles Manning <manningc2@actrix.gen.nz> wrote:
>>
>>> On Thursday 17 May 2012 22:29:42 peterlingoal wrote:
>>> > After spending some time looking around in my corrupted NAND, I think I
>>> > am clear about what's going on there:
>>> >
>>> > There's some *outdated* checkpoint block in the bad blocks portion, and
>>> > the real good one is located at a later block. During mounting, yaffs
>>> > first found the *outdated* checkpoint block and loaded from there. That's
>>> > why loading from checkpoint will always result in a corrupted FS, even
>>> > after re-scanning all the blocks with no-checkpoint-read.
>>> >
>>> > Now the question part:
>>> >
>>> > 1. Why in the first place is there some checkpoint block 'left over' in
>>> > the bad blocks? Shall they be erased?
>>>
>>> It is generally a bad idea to erase bad blocks.
>>>
>>> > 2. While looking for a checkpoint block, shall the block status be
>>> > checked? Or is there any better way to handle this situation? I simply
>>> > used mtd->block_isbad and continued searching, and it seemed to work.
>>>
>>> That should be happening. I'll fix it if that is broken.
>>>
>>> Now my question :-):
>>> Are you using an old version of yaffs or the latest? There are various
>>> checksums on the checkpoint data which should fail if old data is found.
>>>
>>> >
>>> > regards,
>>> > Peter
>>> >
>>> > On Mon, May 7, 2012 at 3:08 AM, Charles Manning <manningc2@actrix.gen.nz> wrote:
>>> > > On Friday 04 May 2012 00:30:55 peterlingoal wrote:
>>> > > > Hi Charles,
>>> > > >
>>> > > > Thanks for the reply.
>>> > > >
>>> > > > I am quite confused about the bad block management methodology; it
>>> > > > seems both MTD and yaffs2 have some kind of bad block control. The
>>> > > > problem in my case is that, after some period of usage, the yaffs2
>>> > > > file system on some NANDs begins to fail. Remounting while ignoring
>>> > > > the checkpoint could recover the file system, but only once. The
>>> > > > file system is broken again after a reboot and mount (with
>>> > > > checkpoint).
>>> > > >
>>> > > > I tried to read the yaffs2 code that does the scanning when the
>>> > > > checkpoint is ignored, and got confused. It seems the yaffs2 driver
>>> > > > is querying the status of each block (in function
>>> > > > yaffs2_scan_backwards). My question is:
>>> > >
>>> > > I suggest you read the HowYaffsWorks doc. You can find that on
>>> > > yaffs.net or find the openoffice doc on the yaffs git.
>>> > >
>>> > > > 1. what does function yaffs2_scan_backwards do?

>>> > >
>>> > > This function scans the nand partition if there is no checkpoint. It
>>> > > reads the tags and builds up the file system state.
>>> > >
>>> > > > 2. MTD keeps a BBT (in NAND in my case). How does the yaffs2 module
>>> > > > obtain the BBT information? Why is a backward rescan needed in my
>>> > > > case in order to recover a file system?
>>> > >
>>> > > Yaffs calls the MTD function to determine if a block is good or bad.
>>> > > Yaffs does not know or care if mtd used a bad block table or not.
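
The separation described here (the file system asks "is this block bad?" and never looks at how the answer is stored) can be sketched as a callback interface. This is only an illustrative sketch; `sim_flash_ops`, `sim_bbt`, `sim_bbt_block_is_bad` and `sim_count_good_blocks` are hypothetical names and not the MTD or yaffs API.

```c
#include <stddef.h>

/* Hypothetical flash-layer interface: the caller only sees a yes/no answer. */
struct sim_flash_ops {
    int (*block_is_bad)(void *ctx, unsigned block);  /* nonzero means bad */
    void *ctx;                                       /* backend private data */
};

/* One possible backend: a bad block table cached in RAM. */
struct sim_bbt {
    const unsigned char *bad;   /* bad[i] != 0 means block i is bad */
    size_t n_blocks;
};

static int sim_bbt_block_is_bad(void *ctx, unsigned block)
{
    const struct sim_bbt *bbt = (const struct sim_bbt *)ctx;

    /* Treat out-of-range blocks as bad so callers never use them. */
    if (block >= bbt->n_blocks)
        return 1;
    return bbt->bad[block] != 0;
}

/* File-system side: count usable blocks without knowing how badness is stored. */
static size_t sim_count_good_blocks(const struct sim_flash_ops *ops,
                                    size_t n_blocks)
{
    size_t good = 0;
    size_t i;

    for (i = 0; i < n_blocks; i++)
        if (!ops->block_is_bad(ops->ctx, (unsigned)i))
            good++;
    return good;
}
```

Swapping the table-backed callback for one that reads the out-of-band marker on every call would not change the caller, which is the point being made about the BBT.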
>>> > >
>>> > > > 3. After recovering the system, it seems the bad block information
>>> > > > is not saved. So a re-scan is still needed after a reboot. This is
>>> > > > my guess, please correct me if I am wrong.
>>> > > >
>>> > > > Also I am using a quite old version of yaffs2 (back in 2010). What's
>>> > > > the most recommended stable version of yaffs2,
>>> > >
>>> > > I suggest using a more recent version. I would recommend using the
>>> > > current HEAD.
>>> > >
>>> > > > and the kernel MTD driver
>>> > > > version?
>>> > >
>>> > > Sorry, I don't keep current with all mtd changes and can't advise on
>>> > > that off the top of my head.
>>> > >
>>> > > > To cut some boot up time I am saving BBT on NAND and reuse it
>>> > > > after reboot, will this make any negative impact?
>>> > >
>>> > > I don't see that this will cause any problems. yaffs does not care
>>> > > how or if you store bbt info.
>>> > >
>>> > > > I am interested in block
>>> > > > summaries, but I would like to stick to checkpoint at the moment.
>>> > >
>>> > > If you use the new code you will get summaries as part of the
>>> > > improvement.
>>> > >
>>> > > > I am new to kernel level debugging, so I am quite lost here. Any
>>> > > > help is appreciated. Thanks!
>>> > >
>>> > > We've all been there.
>>> > >
>>> > > > regards,
>>> > > > Peter
>>> > > >
>>> > > > On Mon, Apr 30, 2012 at 7:41 AM, Charles Manning <manningc2@actrix.gen.nz> wrote:
>>> > > > > On Saturday 28 April 2012 05:26:23 Peter Lin wrote:
>>> > > > > > I have several NANDs on which the yaffs2 module considers itself
>>> > > > > > successfully recovered from checkpointing and skips scanning,
>>> > > > > > but the filesystem is not usable. Mounting with the option
>>> > > > > > no-checkpoint-read could recover the filesystem.
>>> > > > > >
>>> > > > > > I understand that bad block management shall be provided by the
>>> > > > > > MTD layer, and that rescanning fixing the problem proves MTD is
>>> > > > > > doing its job. But I do have some questions:
>>> > > > > >
>>> > > > > > 1. Why in the first place did the checkpoint restore succeed but
>>> > > > > > leave a corrupted filesystem?
>>> > > > >
>>> > > > > It is impossible to say with so little info.
>>> > > > >
>>> > > > > > 2. What would happen if a used block becomes a bad
>>> > > > > > block?
>>> > > > >
>>> > > > > That block will not be scanned. But blocks don't just "go bad". We
>>> > > > > have to mark them as bad. That normally means we have time to
>>> > > > > extract the useful data first.
>>> > > > >
>>> > > > > > Will the whole filesystem go crazy?
>>> > > > >
>>> > > > > No. Yaffs uses a log structure with tags. That means there is no
>>> > > > > "master table" or such which holds all the information.
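
To make the "log structure with tags, no master table" point concrete, here is a small sketch of how the current copy of a chunk can be recovered purely from per-chunk tags. It is a deliberate simplification under stated assumptions: `sim_tag` and `sim_find_latest` are hypothetical, every chunk carries its own sequence number here, and ordering of chunks within a block is ignored.

```c
#include <stddef.h>

/* Hypothetical, simplified tag record; real yaffs tags hold more fields. */
struct sim_tag {
    unsigned obj_id;      /* which object the chunk belongs to */
    unsigned chunk_id;    /* position of the chunk inside the object */
    unsigned seq_number;  /* higher means written later */
    int block_is_bad;     /* nonzero if the containing block is marked bad */
};

/*
 * Find the newest usable copy of (obj_id, chunk_id) by looking at every tag.
 * Returns the index into 'tags', or -1 if no usable copy exists.
 */
static int sim_find_latest(const struct sim_tag *tags, size_t n,
                           unsigned obj_id, unsigned chunk_id)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (tags[i].block_is_bad)
            continue;   /* chunks in bad blocks are not scanned */
        if (tags[i].obj_id != obj_id || tags[i].chunk_id != chunk_id)
            continue;   /* tag describes some other chunk */
        if (best < 0 || tags[i].seq_number > tags[best].seq_number)
            best = (int)i;
    }
    return best;
}
```

Because each tag describes itself, losing one block removes only the chunks in that block; older copies elsewhere in the log can still be found by the same scan.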
>>> > > > >
>>> > > > > > Any way to recover from it?
>>> > > > > >
>>> > > > > > 3. Is there any way to check for or indicate an inconsistency in
>>> > > > > > the filesystem, so the mounting script could try the option
>>> > > > > > no-checkpoint-read?
>>> > > > >
>>> > > > > There is no such provision at present. Since there is no scanning
>>> > > > > if the checkpoint works, it is really hard to see how you would
>>> > > > > decide that the checkpoint was bad.
>>> > > > >
>>> > > > > If you are having problems with checkpoint, then consider just
>>> > > > > turning it off.
>>> > > > > Since block summaries were introduced, the boot speed-up benefits
>>> > > > > of checkpointing are not as dramatic as they were.
>>> > > > >
>>> > > > > > Thanks for your work and help. Please let me know if there's any
>>> > > > > > mistake in my understanding.
>>> > > > > >
>>> > > > > > regards,
>>> > > > > > Peter
>>> > > > > >
>>> > > > > > Does the official kernel have this function enabled, or is there
>>> > > > > > any option that controls it?
>>> > > > > >
>>> > > > > > On 2010-03-04 20:55, Charles Manning wrote:
>>> > > > > > > On Friday 05 March 2010 07:14:59 Shivdas Gujare wrote:
>>> > > > > > > > Hi Charles,
>>> > > > > > > >
>>> > > > > > > > Thanks lot for your help.
>>> > > > > > > >
>>> > > > > > > > On Wed, Mar 3, 2010 at 12:34 PM, Charles Manning wrote:
>>> > > > > > > > > On Wednesday 03 March 2010 23:33:31 Sven Van Asbroeck wrote:
>>> > > > > > > > >> Hello Shivdas,
>>> > > > > > > > >>
>>> > > > > > > > >> > So, what does actually "check pointing" saves while
>>> > > > > > > > >> > unmount?
>>> > > > > > > > >>
>>> > > > > > > > >> It's my understanding that the check point consists of the
>>> > > > > > > > >> RAM data structure which is assembled when a yaffs
>>> > > > > > > > >> partition is scanned. It consists of meta-information
>>> > > > > > > > >> associated with each chunk and block. If you'd like to
>>> > > > > > > > >> know more, I recommend reading the 'How Yaffs works'
>>> > > > > > > > >> document, which is available in CVS.
>>> > > > > > > > >
>>> > > > > > > > > A full scan builds up a set of data structures that define
>>> > > > > > > > > the file system state. A checkpoint captures a reduced
>>> > > > > > > > > version of that, enough to reconstitute the main part of
>>> > > > > > > > > the state, and the rest can be built up on a lazy basis.
>>> > > > > > > > >
>>> > > > > > > > >> > and Is it
>>> > > > > > > > >> > safe to use check-pointing always in final product?
>>> > > > > > > > >>
>>> > > > > > > > >> According to Charles, checkpointing is designed to be used
>>> > > > > > > > >> in the way you describe. To my knowledge, no open
>>> > > > > > > > >> checkpointing issues exist, but you should search the
>>> > > > > > > > >> archives. If you are concerned about the checkpoint
>>> > > > > > > > >> diverging from the meta-information on flash, you could
>>> > > > > > > > >> a) disable checkpointing altogether, or b) submit a patch
>>> > > > > > > > >> implementing a checkpoint counter ;-)
>>> > > > > > > > >
>>> > > > > > > > > You can also choose to mount ignoring checkpointing with
>>> > > > > > > > >
>>> > > > > > > > > mount -t yaffs2 -o"no-checkpoint-read" ..
>>> > > > > > > >
>>> > > > > > > > This is not an option for me, since in the final product the
>>> > > > > > > > end user should not be able to change system data (i.e. mount
>>> > > > > > > > flags). Or I can't change it unless the rootfs is flashed on
>>> > > > > > > > the device, since yaffs2/nand partitions are mounted from the
>>> > > > > > > > rcS script.
>>> > > > > > >
>>> > > > > > > You don't need to do this. Just leave checkpointing on.
>>> > > > > > >
>>> > > > > > > -- Charles
>>> > > > > >
>>> > > > > > -Peter
>>> > > > > > _______________________________________________
>>> > > > > > yaffs mailing list
>>> > > > > > yaffs@lists.aleph1.co.uk
>>> > > > > > http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs
>>> > >
>>>
>>>
>>>
>>
>
