Bug 1627071 Comment 0

See the attached zip file for a log of IO on Nightly, captured by Procmon. You can visualize it with a tool I built [here](https://procmon-analyze.github.io/). Note that there is a great deal of overlapped IO, and that startup in general seems to be heavily saturated with IO requests. Accordingly, it's likely that we can improve startup times by eliminating background thread IO. This bug tracks that effort.

Load the attached file into the tool at https://procmon-analyze.github.io/ to see a visualization of startup IO on reference hardware. tl;dr: there is almost no time during startup when we are not doing reads, and during much of it we are issuing multiple simultaneous reads, with windows of time where seven different reads are in flight at once. This likely causes unnecessary seeking, which hurts IO throughput (see [here](https://stackoverflow.com/questions/9191/how-to-obtain-good-concurrent-read-performance-from-disk), though take it with a grain of salt, as it was posted 5 years ago).
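
To make the seek concern concrete, here is a minimal standalone sketch (not Mozilla code; the thread count of seven comes from the log above, everything else is arbitrary) that times one sequential pass over a file against seven streams reading interleaved stripes of the same file. Note that the OS page cache will make any second run meaningless unless you evict the file between runs or use a file larger than RAM.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <thread>
#include <vector>

constexpr std::size_t kChunk = 64 * 1024;

// One stream, front to back: the layout we want startup reads to have.
double ReadSequential(const char* aPath) {
  auto start = std::chrono::steady_clock::now();
  std::ifstream in(aPath, std::ios::binary);
  std::vector<char> buf(kChunk);
  while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
    // Discard the data; we only care about the IO pattern.
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

// N streams reading interleaved stripes of the same file: roughly what
// seven concurrent startup reads look like to the disk.
double ReadInterleaved(const char* aPath, int aThreads) {
  auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  for (int t = 0; t < aThreads; ++t) {
    workers.emplace_back([aPath, t, aThreads] {
      std::ifstream in(aPath, std::ios::binary);
      std::vector<char> buf(kChunk);
      for (std::size_t i = t;; i += aThreads) {
        in.seekg(static_cast<std::streamoff>(i * kChunk));
        if (!in.read(buf.data(), buf.size()) && in.gcount() == 0) {
          break;  // Past the end of the file.
        }
        in.clear();  // A short read at EOF sets failbit; keep going.
      }
    });
  }
  for (auto& w : workers) w.join();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

int main(int argc, char** argv) {
  if (argc < 2) {
    std::cerr << "usage: readbench <file>\n";
    return 1;
  }
  std::cout << "sequential:  " << ReadSequential(argv[1]) << "s\n";
  std::cout << "interleaved: " << ReadInterleaved(argv[1], 7) << "s\n";
  return 0;
}
```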

Overall, during startup we read about 200MB off disk. Half of that is libxul, and half is everything else. Of the "everything else", a good bit consists of reads that we're actually getting from the cache. Thus, if our startup reads were optimally laid out, we should expect startup to take less than twice the time it takes to read libxul alone, and yet it takes about four times as long (on average, on my 2017 reference hardware).

So what is the plan? Conceptually, we want to extend and strengthen the basic idea of the existing URLPreloader: read things in advance, in an organized fashion (a toy sketch of this pattern follows the list below). Specifically, we want to expand upon it in the following ways:

1. We need to expand its coverage. We want to ensure that as many files as possible are read sequentially by themselves (not with random access) and sequentially with respect to other files (not concurrently). Notable offenders today are DLLs and SQLite databases, though the latter are quite a bit more complicated to fix. (A sketch of the read-ahead trick for DLLs follows this list.)

2. We need to add safeguards to ensure that reads added in the future don't go untracked. (One possible shape for such a safeguard is sketched after this list.)

3. We need to identify as many places as possible where files can be merged into a single file that can be read all at once, sequentially, off disk. (A toy pack-file format after this list illustrates the idea.)

4. We need to aggressively compress what we can, so that we read fewer bytes overall. The CPU trade-off should be minimal with a compression algorithm like LZ4 or zlib. (See the LZ4 sketch after this list.)

5. While we're here analyzing our disk IO, we need to simply purge as much of it as possible from the startup path, regardless of whether it is on the main thread or not.
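
For reference, here is a toy sketch of the preloader pattern the plan builds on. All names here are invented, and the real URLPreloader differs in many details; the point is just the shape: one background pass reads a known list of files front to back, and later consumers are served from memory instead of issuing their own competing reads.

```cpp
#include <fstream>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

class ToyPreloader {
 public:
  // Kick off one sequential read pass over the files we know startup needs.
  void BeginPreload(std::vector<std::string> aPaths) {
    mThread = std::thread([this, paths = std::move(aPaths)] {
      for (const auto& path : paths) {
        std::ifstream in(path, std::ios::binary);
        if (!in) continue;
        std::ostringstream contents;
        contents << in.rdbuf();  // One front-to-back read per file.
        std::lock_guard<std::mutex> lock(mMutex);
        mCache[path] = contents.str();
      }
    });
  }

  // Consumers call this instead of opening the file themselves; a hit
  // means no new IO request competes with other startup reads.
  bool GetFile(const std::string& aPath, std::string& aOut) {
    std::lock_guard<std::mutex> lock(mMutex);
    auto it = mCache.find(aPath);
    if (it == mCache.end()) return false;
    aOut = it->second;
    return true;
  }

  ~ToyPreloader() {
    if (mThread.joinable()) mThread.join();
  }

 private:
  std::thread mThread;
  std::mutex mMutex;
  std::map<std::string, std::string> mCache;
};
```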
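
For the DLL case in point 1, one known trick is to read the image sequentially ourselves before the loader touches it, so the loader's scattered page-ins hit a warm OS cache instead of seeking all over the disk. A minimal Win32 sketch of the technique (illustrative only, not how Gecko's own helpers are structured):

```cpp
#include <windows.h>

// Read the file front to back purely to populate the OS file cache.
void WarmDllInCache(const wchar_t* aPath) {
  HANDLE file = CreateFileW(aPath, GENERIC_READ, FILE_SHARE_READ, nullptr,
                            OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
  if (file == INVALID_HANDLE_VALUE) return;
  char buf[64 * 1024];
  DWORD read = 0;
  while (ReadFile(file, buf, sizeof(buf), &read, nullptr) && read > 0) {
    // Discard the data; the point is purely to warm the cache.
  }
  CloseHandle(file);
}

// Usage: warm the image before the loader's random-access page-ins.
//   WarmDllInCache(L"C:\\path\\to\\xul.dll");  // hypothetical path
//   HMODULE mod = LoadLibraryW(L"C:\\path\\to\\xul.dll");
```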
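
For point 2, here is one possible shape for a safeguard. This is hypothetical, not an existing Gecko API (though Gecko's IOInterposer already observes IO in a related way): route file opens through a single choke point that, in debug builds, asserts when an unregistered read happens during startup.

```cpp
#include <cassert>
#include <fstream>
#include <set>
#include <string>

namespace startup_io {

inline std::set<std::string>& Allowlist() {
  static std::set<std::string> sAllowed;
  return sAllowed;
}

inline bool gInStartup = true;  // Flipped once startup is finished.

// Callers declare reads they intend to perform during startup.
inline void RegisterExpectedRead(const std::string& aPath) {
  Allowlist().insert(aPath);
}

// The one choke point for opening files; an untracked startup read trips
// the assertion, so new IO can't sneak into the startup path unnoticed.
inline std::ifstream OpenTracked(const std::string& aPath) {
  assert(!gInStartup || Allowlist().count(aPath) > 0);
  return std::ifstream(aPath, std::ios::binary);
}

}  // namespace startup_io
```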
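
For point 3, a toy pack-file format (invented here purely for illustration) shows the shape of the merge idea: a table of contents followed by the file bodies back to back, so the whole bundle is consumed in one front-to-back read with no seeking.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Writer: a table of contents (name and size per file), then the raw
// bodies back to back, in the same order.
void WritePack(const std::string& aOut,
               const std::vector<std::pair<std::string, std::string>>& aFiles) {
  std::ofstream out(aOut, std::ios::binary);
  uint64_t count = aFiles.size();
  out.write(reinterpret_cast<const char*>(&count), sizeof count);
  for (const auto& entry : aFiles) {
    uint64_t nameLen = entry.first.size();
    uint64_t bodyLen = entry.second.size();
    out.write(reinterpret_cast<const char*>(&nameLen), sizeof nameLen);
    out.write(entry.first.data(), static_cast<std::streamsize>(nameLen));
    out.write(reinterpret_cast<const char*>(&bodyLen), sizeof bodyLen);
  }
  for (const auto& entry : aFiles) {
    out.write(entry.second.data(),
              static_cast<std::streamsize>(entry.second.size()));
  }
}

// Reader: one pass, front to back; every byte is consumed in order.
std::map<std::string, std::string> ReadPack(const std::string& aIn) {
  std::ifstream in(aIn, std::ios::binary);
  uint64_t count = 0;
  in.read(reinterpret_cast<char*>(&count), sizeof count);
  std::vector<std::pair<std::string, uint64_t>> toc(count);
  for (auto& [name, size] : toc) {
    uint64_t nameLen = 0;
    in.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen);
    name.resize(nameLen);
    in.read(name.data(), static_cast<std::streamsize>(nameLen));
    in.read(reinterpret_cast<char*>(&size), sizeof size);
  }
  std::map<std::string, std::string> files;
  for (auto& [name, size] : toc) {
    std::string body(size, '\0');
    in.read(body.data(), static_cast<std::streamsize>(size));
    files[std::move(name)] = std::move(body);
  }
  return files;
}
```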
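
For point 4, a short sketch using liblz4's C API (error handling trimmed; where the original size gets stored is up to the container, e.g. the pack file's table of contents):

```cpp
#include <lz4.h>

#include <string>

std::string Compress(const std::string& aInput) {
  // LZ4_compressBound gives the worst-case compressed size.
  std::string out(LZ4_compressBound(static_cast<int>(aInput.size())), '\0');
  int written = LZ4_compress_default(aInput.data(), out.data(),
                                     static_cast<int>(aInput.size()),
                                     static_cast<int>(out.size()));
  out.resize(written > 0 ? written : 0);
  return out;
}

// The decompressor needs the original size, so it must be stored alongside
// the compressed bytes.
std::string Decompress(const std::string& aCompressed, size_t aOriginalSize) {
  std::string out(aOriginalSize, '\0');
  int read = LZ4_decompress_safe(aCompressed.data(), out.data(),
                                 static_cast<int>(aCompressed.size()),
                                 static_cast<int>(out.size()));
  if (read < 0) out.clear();  // Corrupt input.
  return out;
}
```

LZ4 decompression is typically far faster than reading the equivalent bytes off disk, which is why the CPU trade-off should be minimal on a read-bound startup path.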
