From manningc2@actrix.gen.nz Tue Jan 26 20:28:22 2010
Received: from smtp.firstline.co.nz ([203.167.210.162] helo=firstline.co.nz)
	by stoneboat.aleph1.co.uk with smtp (Exim 4.69)
	(envelope-from <manningc2@actrix.gen.nz>) id 1NZs1C-0002AQ-Vm
	for yaffs@lists.aleph1.co.uk; Tue, 26 Jan 2010 20:28:22 +0000
Received: (qmail 30859 invoked by uid 453); 26 Jan 2010 20:27:55 -0000
X-Virus-Checked: Checked by ClamAV on firstline.co.nz
Received: from Unknown (HELO linux-dual-head.local) (10.14.210.25)
	by firstline.co.nz (qpsmtpd/0.40) with ESMTP;
	Wed, 27 Jan 2010 09:27:55 +1300
From: Charles Manning <manningc2@actrix.gen.nz>
To: yaffs@lists.aleph1.co.uk
Date: Wed, 27 Jan 2010 09:27:54 +1300
User-Agent: KMail/1.9.10
References: <1264451986.13568.100.camel@thunk>
	<201001261246.24193.manningc2@actrix.gen.nz>
	<1264472569.13568.122.camel@thunk>
In-Reply-To: <1264472569.13568.122.camel@thunk>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201001270927.54140.manningc2@actrix.gen.nz>
X-SA-Exim-Connect-IP: 203.167.210.162
X-SA-Exim-Mail-From: manningc2@actrix.gen.nz
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on
	stoneboat.aleph1.co.uk
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=4.5 tests=AWL,BAYES_00,SPF_NEUTRAL
	autolearn=no version=3.2.5
X-SA-Exim-Version: 4.2.1 (built Wed, 25 Jun 2008 17:14:11 +0000)
X-SA-Exim-Scanned: Yes (on stoneboat.aleph1.co.uk)
Subject: Re: [Yaffs] Weirndess testing YAFFS2 with large files - md5sums
	don't match when copied.
X-BeenThere: yaffs@lists.aleph1.co.uk
X-Mailman-Version: 2.1.11
Precedence: list
List-Id: Discussion of YAFFS NAND flash filesystem <yaffs.lists.aleph1.co.uk>
List-Unsubscribe: <http://lists.aleph1.co.uk/cgi-bin/mailman/options/yaffs>,
	<mailto:yaffs-request@lists.aleph1.co.uk?subject=unsubscribe>
List-Archive: <http://lists.aleph1.co.uk/lurker/list/yaffs.html>
List-Post: <mailto:yaffs@lists.aleph1.co.uk>
List-Help: <mailto:yaffs-request@lists.aleph1.co.uk?subject=help>
List-Subscribe: <http://lists.aleph1.co.uk/cgi-bin/mailman/listinfo/yaffs>,
	<mailto:yaffs-request@lists.aleph1.co.uk?subject=subscribe>
X-List-Received-Date: Tue, 26 Jan 2010 20:28:22 -0000

On Tuesday 26 January 2010 19:24:19 Peter Barada wrote:
> On Tue, 2010-01-26 at 12:46 +1300, Charles Manning wrote:
> > Hello Peter
> >
> > On Tuesday 26 January 2010 09:39:46 Peter Barada wrote:
> > > I've run into a problem using the latest YAFFS code on linux-2.6.28-rc8
> > > using today's YAFFS CVS code.
> >
> > Does the test work with older yaffs?
>
> No, it did not - however it exhibited some random behavior.  Previous
> version was pulled from 20090909 and my thought was with the current
> changes to handle large file yaffs_Tnode handling it would help -
> version strings are:

What changes were those? There have been no changes wrt file size in the tnode 
trees for ages. As far as yaffs is concerned a 30MB file is tiny. If you were 
dealing with file sizes around the 2^31 integer roll-over or something then I 
could understand bugs creeping in.

>
> peter@blitz:~/work/logic/eps_svn/software/products/linux/LTIB/trunk/ltib-20
>091102-som/rpm/BUILD/linux-2.6.28-rc8$ grep '\$Id:' fs/yaffs2/*.[hc]
> fs/yaffs2/yaffs_checkptrw.c:	"$Id: yaffs_checkptrw.c,v 1.20 2009-09-09
> 03:03:01 charles Exp $";
> fs/yaffs2/yaffs_ecc.c:	"$Id: yaffs_ecc.c,v 1.11 2009-03-06 17:20:50
> wookey Exp $";
> fs/yaffs2/yaffs_fs.c:    "$Id: yaffs_fs.c,v 1.82 2009-09-18 00:39:21
> charles Exp $";
> fs/yaffs2/yaffs_guts.c:    "$Id: yaffs_guts.c,v 1.89 2009-09-09 00:56:53
> charles Exp $";
> fs/yaffs2/yaffs_mtdif1.c:const char *yaffs_mtdif1_c_version = "$Id:
> yaffs_mtdif1.c,v 1.11 2009-09-09 03:03:01 charles Exp $";
> fs/yaffs2/yaffs_mtdif2.c:	"$Id: yaffs_mtdif2.c,v 1.23 2009-03-06
> 17:20:53 wookey Exp $";
> fs/yaffs2/yaffs_mtdif.c:	"$Id: yaffs_mtdif.c,v 1.22 2009-03-06 17:20:51
> wookey Exp $";
> fs/yaffs2/yaffs_nand.c:	"$Id: yaffs_nand.c,v 1.11 2009-09-09 03:03:01
> charles Exp $";
>
> I'll go back and re-test with that version to generate the output.  The
> original test did the dd command with "dd if=/dev/urandom of=somefile.$i
> count=0 bs=0 skip=30M" to seek out 30MB after the open and then close
> the file - initially I thought the test was off as that dd command
> wouldn't generate any actual data on an EXT3 device.
>
> The MTD layer has performed flawlessly with the previous version(s),
Previous versions of what? yaffs? linux?
> so 
> I'm not thinking the MTD ECC handling itself is in error - I can add
> code to dump it if finds an ECC error on read; I noticed that the
> current code doesn't verify the data written if
> "CONFIG_YAFFS_ALWAYS_CHECK_CHUNK_ERASED" is not set - do you have a
> development patch that enables the readback to verify the chunk is
> written correctly so I can test that my MTD layer is still operating
> correctly?
No, I don't have a patch like that, but it would be helpful if yaffs did 
verification to help check the mtd layer more effectively.

I'll look at adding that.


<snip>

> > > With this code, I'm seeing 30MB files that are created have mismatching
> > > checksums while running the attached test script.  The output from the
> > > test looks like:
> > >
> > > OMAP-35x# . /media/mmcblk0p1/x
> > > Create 30M file and get
> > > md5sum
> > > 30720+0 records
> > > in
> > > 30720+0 records
> > > out
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.1
> > > **>> Block 710 needs
> > > retiring
> > > **>> yaffs write required 2
> > > attempts
> > > **>> Block 710
> > > retired
> > > Block 710 is in state 9 after gc, should be
> > > erased
> > > Calculate md5sums for copied
> > > files
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.2
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 8c8d5a7974d0b9da747bc59edd1991f6
> > > somefile.4
> > > execute sync and recalculate
> > > md5sums
> > > save exit: isCheckpointed
> > > 1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.2
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.4
> > > Delete one of the
> > > files
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 3cb7668eb7760202d96970a6a9a3361f
> > > somefile.4
> > > recopy the deleted
> > > file
> > > f6dba6d5af7a7a89481da1849035a417
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.4
> > > f6dba6d5af7a7a89481da1849035a417
> > > somefile.7
> > > Creating test folder and some junk files in that
> > > folder
> > > 1+0 records
> > > in
> > > 1+0 records
> > > out
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.1
> > > md5sums of all files in test
> > > folder
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.1
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.2
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.3
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.4
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.5
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.6
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.7
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.8
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.9
> > > execute sync and recalculate
> > > md5sums
> > > save exit: isCheckpointed
> > > 1
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.1
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.2
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.3
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.4
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.5
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.6
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.7
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.8
> > > ae1028b8d6aef86d020c9edfae29ca3d
> > > junk.9
> > > Remove some files and recreate
> > > them
> > > Calculate md5sums for 30M files
> > > again
> > > f6dba6d5af7a7a89481da1849035a417
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.4
> > > 1fee3f481bfa5cf3403efe9e481a0374
> > > somefile.7
> > > execute sync and recalculate
> > > md5sums
> > > save exit: isCheckpointed
> > > 1
> > > a7b6ccfa31115aa75a0fdca07073293d
> > > somefile.1
> > > 5b04790304a4221f1016a8c310da4746
> > > somefile.3
> > > 1abb3d578e2d129341df26916090b869
> > > somefile.4
> > > f6dba6d5af7a7a89481da1849035a417
> > > somefile.7
> > > OMAP-35x#
> > >
> > > In the output, note that the md5sum of "somefile.*" should all match.
> > >
> > > Anyone seen anything like this before?  Test attached.
> >
> > I just ran the test on both 2.6.24-xxx and 2.6.31-xxx using nandsim on a
> > PC and had no problems. Here's one run:
> >
> > root@linux-dual-head:/mnt# ~charles/Dropbox/yaffs-30M-test
> > Create 30M file and get md5sum
> > 30720+0 records in
> > 30720+0 records out
> > 31457280 bytes (31 MB) copied, 7.72198 s, 4.1 MB/s
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > Calculate md5sums for copied files
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.2
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > execute sync and recalculate md5sums
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.2
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > Delete one of the files
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > recopy the deleted file
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > dc7fd1b9553217a9a1becbb101271eab  somefile.7
> > Creating test folder and some junk files in that folder
> > 1+0 records in
> > 1+0 records out
> > 1024 bytes (1.0 kB) copied, 0.000442124 s, 2.3 MB/s
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
> > md5sums of all files in test folder
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.2
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.3
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.4
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.5
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.6
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.7
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.8
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.9
> > execute sync and recalculate md5sums
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.1
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.2
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.3
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.4
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.5
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.6
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.7
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.8
> > 0bc0c6e9588ee2bf6c89463208c5a0e9  junk.9
> > Remove some files and recreate them
> > Calculate md5sums for 30M files again
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > dc7fd1b9553217a9a1becbb101271eab  somefile.7
> > execute sync and recalculate md5sums
> > dc7fd1b9553217a9a1becbb101271eab  somefile.1
> > dc7fd1b9553217a9a1becbb101271eab  somefile.3
> > dc7fd1b9553217a9a1becbb101271eab  somefile.4
> > dc7fd1b9553217a9a1becbb101271eab  somefile.7
> >
> >
> > Perhaps the retirement of the blocks indicates that some data was being
> > corrupted.
>
> Could be - I'll re-nuke the flash (since those blocks on this particular
> board should not be bad) and try again.  I'm wondering if I'm caught in
> limbo with the particular version of the kernel I have that on the
> OMAP35x exhibits some caching behavior that isn't caught in the changes
> you've made.  Unfortunately this is a production release and if you have
> a suggestion on how to go backwards (i.e. undo some of the caching
> changes that I'm caught in the middle of), I'd be appreciative - I'm
> looking for stability, not necessarily efficiency compared to previous
> kernel versions.

Caches are an easy way to get data inconsistency.

Which cache are you talking about here? yaffs should not be changing to 
support changes in mtd-level or OMAP-specific caching.

There are two caches that yaffs **should** be aware of and should play nice 
with:
* Its own cache. Try disabling that to see if that makes any difference. You 
can do that by mounting with -o "no-cache".
* The page cache. There have been some changes in this area recently. fsx 
(which really pounds on the page cache interface) runs but you might have 
uncovered a hole that fsx does not.

The page cache can be thrown out by 
# echo 3 > /proc/sys/vm/drop_caches
which will force yaffs to read all the data back again.

Thus if you do
sync
md5sum foo
echo 3 > /proc/sys/vm/drop_caches
md5sum foo
and the two md5sums differ, then it indicates that the data in the cache was 
inconsistent with the data on flash.
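
As a rough sketch, that check could be wrapped up in a small shell function 
like the one below. The function name and the DROP_CACHES override are made up 
for illustration (the override only exists so the function can be dry-run 
without root); writing to the real drop_caches file needs root.

```shell
# Path written to in order to drop the page cache. DROP_CACHES is an assumed
# override hook for dry runs; by default it is the real kernel interface.
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

# Compare a file's md5sum before and after dropping the page cache.
# Returns 0 if both sums match, 1 on mismatch, 2 if a step failed.
check_cached_vs_flash() {
    file=$1
    sync
    before=$(md5sum < "$file") || return 2
    # Throw out the page cache so the next read has to go back to flash.
    echo 3 > "$DROP_CACHES" || return 2
    after=$(md5sum < "$file") || return 2
    if [ "$before" = "$after" ]; then
        echo "consistent: $file"
    else
        # Strip the trailing "  -" that md5sum prints when reading stdin.
        echo "MISMATCH: $file (cached ${before%% *}, reread ${after%% *})"
        return 1
    fi
}
```

Running check_cached_vs_flash somefile.4 after each step of the test would 
show the point at which the cached data and the data on flash diverge.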

>
> At some point it would be nice if there were tags on the YAFFS CVS tree
> so I can snap to a known version and apply it to a kernel and walk
> forward or backwards in time to capture logical changes to the YAFFS
> source and test with each.
Tagging each checkin would pollute the tags space pretty quickly.
cvs does not provide a checkin Id like svn or git, but you can use -D to fetch 
as of a specific date:

cvs update -D "2009-10-31" 
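
If you want to walk through the tree by date, something like the following 
might do as a starting point. It is only a sketch: snapshot_dates is a made-up 
helper name, it assumes GNU date (for the -d option), and the dates and step 
in the usage comment are just examples.

```shell
# Print one YYYY-MM-DD date per line from $1 to $2 inclusive, stepping $3 days.
# Assumes GNU date, which accepts -d with a date string or "@seconds".
snapshot_dates() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    step=$(( $3 * 86400 ))
    t=$start
    while [ "$t" -le "$end" ]; do
        date -u -d "@$t" +%Y-%m-%d
        t=$(( t + step ))
    done
}

# Example use (not run here): fetch each snapshot, then rebuild and retest.
#   for d in $(snapshot_dates 2009-09-09 2010-01-26 14); do
#       cvs update -D "$d"
#   done
```

Once the test changes from passing to failing between two dates, the step can 
be narrowed to find the checkin responsible.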


-- Charles



