Closed
Bug 1104337
Opened 10 years ago
Closed 7 years ago
[DeviceStorage] need a faster way to enumerate files
Categories: Core :: DOM: Device Interfaces, defect
Status: RESOLVED INCOMPLETE
People: Reporter: djf, Unassigned
In bug 1046995, I'm trying to figure out ways to improve the startup behavior and scanning performance of our FirefoxOS media apps. These apps use the MediaDB library, and improvements to that library require improvements to the speed at which we can enumerate the files from DeviceStorage.
The DeviceStorage.enumerate() API is cursor-based, so in theory the implementation could start feeding files to the JS client as fast as it finds them. But in practice the implementation seems to enumerate all the files and then feed them, one at a time, to the JS client.
In my tests on a Flame, when there are 1000 images on an sdcard, device storage takes about 800ms to enumerate the first file, and then takes about 2ms for each remaining file. If gecko has found all 1000 files after that first 800ms, I'd like an API where gecko just gives me the list of all 1000 in one big batch right away rather than stringing me out for another 2 seconds.
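As a rough illustration of the arithmetic behind those numbers, here is a tiny model of the observed behavior. The function name and constants are hypothetical and come only from the single Flame measurement above; this is not a description of how gecko actually spends the time.

```javascript
// Figures taken from the Flame measurement described above (assumed typical).
const FIRST_FILE_MS = 800; // latency before the first file is delivered
const PER_FILE_MS = 2;     // cost of each additional cursor round trip

// Hypothetical helper: estimated wall-clock time to walk the whole cursor.
// Assumes fileCount >= 1.
function cursorEnumerationMs(fileCount) {
  return FIRST_FILE_MS + (fileCount - 1) * PER_FILE_MS;
}

console.log(cursorEnumerationMs(1000)); // 2798
```

If all the results were handed over in one batch once they are known, the total would be close to the initial 800ms instead.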
I'm pretty agnostic about how this is done. One way would be to add enumerateInBatches() which would work just like enumerate except that the cursor.result could be an array of files instead of a single file. That would give the implementation some flexibility to return one directory at a time, perhaps.
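To make the enumerateInBatches() idea concrete, here is a minimal sketch of what client code might look like. enumerateInBatches() does not exist in DeviceStorage; the fake below only mimics the general DOMCursor shape (result, done, continue(), onsuccess), and it fires its callbacks synchronously so the sketch can run outside a device.

```javascript
// Hypothetical stand-in for storage.enumerateInBatches(); not a real API.
function fakeEnumerateInBatches(allFiles, batchSize) {
  let offset = 0;
  const cursor = {
    result: null,
    done: false,
    onsuccess: null,
    continue() {
      if (offset >= allFiles.length) {
        cursor.result = null;
        cursor.done = true;
      } else {
        // cursor.result is an array of files rather than a single file.
        cursor.result = allFiles.slice(offset, offset + batchSize);
        offset += batchSize;
      }
      if (cursor.onsuccess) cursor.onsuccess.call(cursor);
    },
  };
  return cursor;
}

// Client side: one callback per batch instead of one per file.
function collectAll(cursor) {
  const seen = [];
  let callbacks = 0;
  cursor.onsuccess = function () {
    callbacks++;
    if (this.done) return;
    seen.push(...this.result);
    this.continue();
  };
  cursor.continue(); // the fake needs an explicit first kick
  return { seen, callbacks };
}

const demo = collectAll(fakeEnumerateInBatches(["a", "b", "c", "d", "e"], 2));
console.log(demo.seen.length, demo.callbacks); // 5 4
```

With five files and a batch size of two, the client gets three batch callbacks plus one final "done" callback, instead of one callback per file.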
Another way would be to just add an enumerateAll() method that did not use a cursor at all and just returned a DOMRequest for all the files. I know nothing about the implementation, but I'd guess that this might be the simplest.
My main use case for this is doing complete enumeration of the entire storage area. So I'd be willing to accept a solution that does not include directory-based or time-based (the "since" option) enumeration. (Though it would be great to keep those features if possible).
I also don't need file objects as the result of this scan. It would be okay with me if the scan returned filenames, sizes and modification dates as plain strings and numbers in a data structure that did not involve any blobs or files. The MIME type of each file would be nice, but not required, since I can derive it from the file extension.
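A sketch of the kind of plain record described above, together with deriving the MIME type from the extension. The field names (name, size, lastModified) and the extension table are assumptions made for illustration; they are not taken from DeviceStorage.

```javascript
// Hypothetical plain-data record: no Blob or File involved.
const exampleRecord = {
  name: "DCIM/100MZLLA/IMG_0001.jpg",
  size: 123456,                // bytes
  lastModified: 1416960000000, // milliseconds since the epoch
};

// Illustrative subset only; a real table would be much longer.
const EXTENSION_TO_MIME = new Map([
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
  ["png", "image/png"],
  ["mp3", "audio/mpeg"],
  ["mp4", "video/mp4"],
]);

// Returns "" when the name has no extension or the extension is unknown.
function mimeFromName(name) {
  const base = name.slice(name.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return ""; // no extension, or a dotfile such as ".nomedia"
  const ext = base.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_MIME.get(ext) || "";
}

console.log(mimeFromName(exampleRecord.name)); // image/jpeg
```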
Finally, it would be great if this enumeration would do something smart on multi-core CPUs or would at least let me run it in a worker to take advantage of multiple cores. (I'd like, for example, to be able to efficiently enumerate the filesystem at the same time as I enumerate an indexedDB database.)
Comment 1•10 years ago (Reporter)
Dave: I've got no idea whether I'm asking for something easy or hard here. Is this something you can do? Can you do it in the 2.2 timeframe?
Flags: needinfo?(dhylands)
Comment 2•10 years ago
I think that it should be possible. I'd like to see it chunk things up to N entries at a time. Then we can tune N to optimize the performance.
Comment 3•10 years ago
If overhauling the API to do chunking, the File System API's DirectoryReader.readEntries mechanism already has semantics like this:
http://dev.w3.org/2009/dap/file-system/file-dir-sys.html#the-directoryreader-interface
I don't know if that's an API we love or hate, but might as well be consistent with something! ;)
Comment 4•10 years ago
From a pure performance standpoint, returning just names would be the best.
Also device storage determines the mime types based on the extension.
I think I would just fix the existing enumeration method to work in batches.
If I was going to add a new function, it would be to add a function which just returned filenames. I think that the batching should just be an internal thing and transparent to the caller.
Comment 5•10 years ago (Reporter)
(In reply to Dave Hylands [:dhylands] (on PTO Thur & Fri) from comment #4)
> From a pure performance standpoint, returning just names would be the best.
Because, I suppose, you have to call fstat() to get the size and date of the files. Currently in MediaDB I've got some code that checks the size and date of a file to see if it has changed since last scanned. I'll need that as part of the full scan. But if there was a super-fast API that just returned a list of filenames, I'd use that at startup to find files that had been deleted.
>
> Also device storage determines the mime types based on the extension.
>
Right, so I don't need type from device storage, because I can derive it myself using the same logic.
> I think I would just fix the existing enumeration method to work in batches.
Are you talking about breaking backward compatibility to start returning an array instead of a single file, or would it only do batches if the caller passed some new option in, or something? Either way, this would make me happy.
> If I was going to add a new function, it would be to add a function which
> just returned filenames. I think that the batching should just be an
> internal thing and transparent to the caller.
A function that returns filenames in addition to the batching would also make me happy. But batching in the existing enumerate() method is probably more important.
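For reference, the two checks described in this comment (a names-only pass to find deleted files, and a size-and-date comparison to find changed ones) could be sketched roughly as follows. The record shape and function names are hypothetical and are not taken from MediaDB.

```javascript
// Names-only pass: which known files no longer exist on storage?
function findDeleted(knownNames, scannedNames) {
  const present = new Set(scannedNames);
  return knownNames.filter((name) => !present.has(name));
}

// Full pass: compare { name, size, date } records from the database
// against the records returned by a scan.
function diffScan(known, scanned) {
  const knownByName = new Map(known.map((rec) => [rec.name, rec]));
  const scannedNames = new Set();
  const created = [];
  const changed = [];
  for (const file of scanned) {
    scannedNames.add(file.name);
    const old = knownByName.get(file.name);
    if (!old) {
      created.push(file.name);
    } else if (old.size !== file.size || old.date !== file.date) {
      changed.push(file.name);
    }
  }
  const deleted = known
    .filter((rec) => !scannedNames.has(rec.name))
    .map((rec) => rec.name);
  return { created, changed, deleted };
}
```

The names-only pass needs no size or date at all, which is why a filenames-only API would be enough for the startup check.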
Comment 6•10 years ago
(In reply to David Flanagan [:djf] from comment #5)
> (In reply to Dave Hylands [:dhylands] (on PTO Thur & Fri) from comment #4)
> > I think I would just fix the existing enumeration method to work in batches.
>
> Are you talking about breaking backward compatibility to start returning an
> array instead of a single file, or would it only do batches if the caller
> passed some new option in, or something? Either way, this would make me
> happy.
Well, internally, when you do an enumerate, the child process sends the request to the parent process, which determines the entire result and sends it back to the child, which then uses a cursor to enumerate through this result. This wastes lots of memory, and introduces a large latency to get the first file when there are many files in the storage area.
So internally, I'd rather see it start to collect something and send the result back in batches rather than all at once. There should probably be some flow control mechanism to prevent runaway memory usage.
> > If I was going to add a new function, it would be to add a function which
> > just returned filenames. I think that the batching should just be an
> > internal thing and transparent to the caller.
>
> A function that returns filenames in addition to the batching would also
> make me happy. But batching in the existing enumerate() method is probably
> more important.
I think we should fix device storage internally first, and then decide if we want to add a new API which returns a batch of files.
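One way to picture the internal batching with flow control described in this comment is a pull-based producer: the next batch is only collected when the consumer asks for it, so at most one batch is buffered at a time. This is a sketch using a JavaScript generator, not how gecko's IPC works; all names are hypothetical, and batchSize is assumed to be a positive integer.

```javascript
// Hypothetical producer: walks entries lazily and yields arrays of at most
// batchSize entries. It pauses at each yield until the consumer pulls again.
function* produceBatches(entries, batchSize, stats) {
  let batch = [];
  for (const entry of entries) {
    batch.push(entry);
    stats.peakBuffered = Math.max(stats.peakBuffered, batch.length);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // final partial batch
}

// Stand-in for a directory walk that discovers files one at a time.
function* fakeWalk(count) {
  for (let i = 0; i < count; i++) {
    yield { name: `IMG_${i}.jpg` };
  }
}

const stats = { peakBuffered: 0 };
let delivered = 0;
let batches = 0;
for (const batch of produceBatches(fakeWalk(1000), 64, stats)) {
  delivered += batch.length;
  batches++;
}
console.log(delivered, batches, stats.peakBuffered); // 1000 16 64
```

The first batch is available after only 64 entries have been found, and the batch size is the knob that could be tuned as suggested in comment 2.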
Comment 7•7 years ago
Cleaning up Device Interfaces component, and mass-marking old FxOS bugs as incomplete.
If any of these bugs are still valid, please let me know.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE