Charles Manning wrote:

> On Friday 01 July 2005 10:41, Charles Manning wrote:
>
>> On Thursday 30 June 2005 17:43, Beat Morf wrote:
>>
>>> Hi
>>>
>>> I am using the YAFFS direct implementation and am currently testing
>>> some basic requirements.
>>>
>>> When I write hundreds of files into the root of my YAFFS-FS (let's say
>>> 600) and dump them afterwards ('yaffs_opendir()', 'yaffs_readdir()'),
>>> I saw that the 'yaffs_readdir(...)' function needs much more time for
>>> the last files than for the first ones.
>>>
>>> Each call to the 'yaffs_readdir(...)' function will start at the first
>>> object within the 'yaffs_DIR'. To know which objects were already
>>> returned, a separate list of already shown objects is kept (the
>>> objects are identifiable by their ObjectID). That means long times for
>>> a lot of files!
>>>
>>> What exactly is the reason for such a method if I don't need the
>>> entries returned in any special order?
>>>
>>> Currently I am using the following 'yaffs_xxxxdir(...)' functions with
>>> good performance ('dsc->list' is used to point to the next
>>> yaffsfs_ObjectListEntry):
>>
>> Thanx Beat
>>
>> I have had this comment once before from someone using huge directories.
>> I believe they got around the problem by caching names in their
>> application.
>>
>> There are some interesting issues associated with directory reading. For
>> example, what happens if another thread is writing new files to the
>> directory or deleting files from the directory?
>>
>> The easiest way to do a directory traversal is to keep a pointer to the
>> next sibling in the list as we search, but that can go wrong if that
>> object gets deleted before you read it:
>>
>> eg.
>>
>> 1. Directory has files xx, yy, zz.
>> 2. Read dir, get xx. Pointer points to yy.
>> 3. Delete yy.
>> 4. Now the pointer points to a non-existent object!
>>
>> Another way that does not work is to try to remember the number where
>> you are:
>> eg.
>> 1. Directory has files xx, yy, zz.
>> 2. Read dir, get xx. Remember we're pointing to item 1 (starting at zero).
>> 3. Delete xx.
>> 4. Now item 1 is zz and we skipped past yy.
>>
>> The code, as is, covers curveballs like this, but it isn't very
>> efficient. This is something that I should look at improving.
>>
>> I have not looked at your code suggestion in detail yet, but I shall.
>>
>> -- Charles
>
> Having looked at Beat's code I think it would fail under one of the
> delete cases I mention.
>
> I spent some time this weekend trying to implement some code using a
> balanced AVL tree instead of a linked list. This would possibly improve
> the speed, but things would still degrade with large directories.
>
> I'm now thinking of a scheme that is like Beat's suggestion, but is
> delete safe.
>
> Essentially, this will use a pointer into the directory (same as Beat's),
> but it would also add a check to any file deletion from a directory to
> see whether the file is currently being pointed to, and if so move the
> pointer on to the next item in the directory.
>
> Ideas anybody?
>
> -- Charles
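For illustration, a minimal sketch of the delete-safe cursor scheme Charles describes might look like the following. The structure and function names are made up for this sketch and are not the real yaffs types; the idea is only that each open handle carries a cursor into the directory's sibling list, and the unlink path moves any cursor off an entry before that entry is removed.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified structures -- not the real yaffs_Object/yaffs_DIR. */
struct dir_entry {
    char name[32];
    struct dir_entry *next;        /* next sibling in the directory */
};

struct directory {
    struct dir_entry *children;    /* head of the sibling list */
};

struct dir_handle {
    struct directory *dir;
    struct dir_entry *cursor;      /* next entry this handle will return */
    struct dir_handle *next_open;  /* link in the list of open handles */
};

static struct dir_handle *open_handles; /* all currently open directory handles */

/* readdir: hand out the entry under the cursor and advance the cursor. */
static struct dir_entry *dir_read(struct dir_handle *h)
{
    struct dir_entry *e = h->cursor;
    if (e)
        h->cursor = e->next;
    return e;
}

/*
 * The delete-side check: before unlinking an entry, move every open
 * cursor that points at it on to the next sibling, so no handle is left
 * pointing at a deleted object and no remaining entry gets skipped.
 */
static void dir_remove_entry(struct directory *d, struct dir_entry *victim)
{
    struct dir_handle *h;
    struct dir_entry **pp;

    for (h = open_handles; h; h = h->next_open)
        if (h->dir == d && h->cursor == victim)
            h->cursor = victim->next;

    for (pp = &d->children; *pp; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            break;
        }
    }
}

/* Walk through the xx/yy/zz scenario from the messages above. */
int main(void)
{
    struct dir_entry zz = { "zz", NULL };
    struct dir_entry yy = { "yy", &zz };
    struct dir_entry xx = { "xx", &yy };
    struct directory d = { &xx };
    struct dir_handle h = { &d, &xx, NULL };
    struct dir_entry *e;

    open_handles = &h;

    printf("%s\n", dir_read(&h)->name);   /* "xx"; cursor now points at yy */
    dir_remove_entry(&d, &yy);            /* cursor is moved on to zz */
    while ((e = dir_read(&h)) != NULL)
        printf("%s\n", e->name);          /* "zz" -- nothing skipped, nothing dangling */
    return 0;
}

In the real code the open-handle list and the cursors would presumably need updating under the usual yaffs locking, and a rewind or seek would just reset the cursor into the sibling list, but the extra bookkeeping stays small.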
I looked at the sysfs code, where they have basically the same situation: the directories are implemented as a linked list of entries.

The solution they have adopted is to associate with each *opened* directory a small bit of data (a "cursor"), which is essentially a pointer to the current entry in the linked list, so a readdir of a whole directory is linear. (In yaffs, each invocation of the readdir method, which retrieves one entry at a time, has to look up the list from the beginning, so reading a whole directory has a quadratic cost.) I think this solution is simple and clean; one of these days I will try a proof of concept of it.

Note that in reality the sysfs code is more complicated than I describe above: they do their own dirent allocation (I'm not sure why they need this), and it needs a little more code each time an entry is added to or removed from the list, and when seeking a directory.

Luc
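To make the cost difference concrete, here is a rough model of the two readdir strategies side by side. The types below are again hypothetical, not the actual yaffs-direct or sysfs code: one function rescans the sibling list and skips ObjectIDs that were already returned, the other just keeps a per-open-directory cursor.

#include <stddef.h>

/* Hypothetical structures for illustration only. */
struct entry {
    int object_id;
    struct entry *next;
};

struct dir_state {
    struct entry *children;   /* head of the directory's sibling list */
    int returned[1024];       /* ObjectIDs already handed out (bounded for simplicity) */
    int n_returned;
    struct entry *cursor;     /* sysfs-style cursor for this opened directory */
};

/*
 * Strategy 1 (the behaviour described above): restart from the head on
 * every call and skip entries that were already returned.  Each call is
 * O(n), so reading a whole directory of n entries costs O(n^2).
 */
static struct entry *readdir_rescan(struct dir_state *d)
{
    struct entry *e;
    int i, seen;

    for (e = d->children; e; e = e->next) {
        seen = 0;
        for (i = 0; i < d->n_returned; i++)
            if (d->returned[i] == e->object_id)
                seen = 1;
        if (!seen) {
            d->returned[d->n_returned++] = e->object_id;
            return e;
        }
    }
    return NULL;
}

/*
 * Strategy 2 (the sysfs-style cursor): each call is O(1), so a full
 * traversal is O(n) -- provided deletions advance the cursor as in the
 * earlier sketch.
 */
static struct entry *readdir_cursor(struct dir_state *d)
{
    struct entry *e = d->cursor;
    if (e)
        d->cursor = e->next;
    return e;
}

With 600 files the rescanning version does on the order of 600 * 600 entry comparisons for a full dump, which matches the slowdown Beat saw towards the end of the directory; the cursor version does 600 constant-time steps, at the price of the small amount of extra code on add/remove and on directory seek mentioned above.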