Fix XULStore main thread file I/O performance
Categories
(Core :: SQLite and Embedded Database Bindings, enhancement, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox94 | --- | fixed |
People
(Reporter: myk, Assigned: vporof)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fxperf:p2])
Attachments
(1 file, 2 obsolete files)
Over in bug 1460811, comment 17, Florian asked if the new XULStore implementation is expected to do more main thread file I/O than the old implementation. It isn't, and we should investigate its measured usage to ensure it isn't doing anything unnecessary.
Here's an example of what Florian is seeing in recent startup profiles:
Of its markers, the largest by an order of magnitude is the 3.7ms it took to fsync xulstore/data.mdb. I don't see an fsync when I profile startup with an existing Firefox profile, however:
I might expect an fsync of that file on first-run, when Firefox creates a profile for the first time; and also on "first run after upgrading to the new XULStore," when Firefox migrates data from the old store to the new one. I wouldn't expect one otherwise, though.
And perhaps we can avoid it even on first run by creating/migrating the datastore on the new implementation's background thread. Either way, we should also investigate the other markers, even though they're all much cheaper than the fsync, to ensure we aren't doing any other potentially unnecessary file I/O.
Florian: is your startup profile possibly a first-run (or "first run after upgrade") profile?
Comment 1•6 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #0)
Thanks for filing!
I don't see an fsync when I profile startup with an existing Firefox profile, however:
That's still a lot of main thread I/O on your profile. These operations seem cheap in your profile, but I assume you ran this on a machine with an SSD.
Florian: is your startup profile possibly a first-run (or "first run after upgrade") profile?
It's a startup profile captured automatically by the startup main thread I/O test I'm writing in bug 1540135. So this is a slightly artificial situation as mochitests have some custom defaults to avoid network requests. It's closer to a first-run startup than to a normal warm startup though.
I've observed that startup in our test infrastructure is typically CPU bound, with I/O being very cheap. I would expect actual users (especially users with mechanical hard drives) to see much longer times for these I/O operations.
Comment 2•6 years ago
|
||
Here is what it looks like on a normal cold startup on the reference hardware: https://perfht.ml/2LgAOGD
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 3•6 years ago
|
||
I have an idea for how to address this and will investigate…
Comment 4•6 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #3)
I have an idea for how to address this and will investigate…
Hi Myk. Can you tell us a bit more about your plans? :-)
Reporter | ||
Comment 5•6 years ago
|
||
For XULStore in general, we should be able to preload its database on its background thread at the same time that URLPreloader pre-reads other URLs on its background thread.
For the specific XULStore values that earlyBlankFirstPaint()
and nsXULWindow more generally requires to show the first browser (or blank) window—width, height, screenX, screenY, and sizemode—we may be able to speed up retrieval even more by storing them in bespoke storage that is optimized for retrieval of just those values.
For example, a binary file with the shortest sequence of bytes sufficient to represent those values in predetermined order (thus not requiring delimiters) would presumably be 9-10 bytes in size: two each for width, height, screenX, and screenY; plus one byte for sizemode; and perhaps another byte that identifies the format version (in case we want to change it in the future).
But I'll start by preloading XULStore in general.
Comment 6•6 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #5)
For the specific XULStore values that
earlyBlankFirstPaint()
and nsXULWindow more generally requires to show the first browser (or blank) window—width, height, screenX, screenY, and sizemode—we may be able to speed up retrieval even more by storing them in bespoke storage that is optimized for retrieval of just those values.For example, a binary file with the shortest sequence of bytes sufficient to represent those values in predetermined order (thus not requiring delimiters) would presumably be 9-10 bytes in size: two each for width, height, screenX, and screenY; plus one byte for sizemode; and perhaps another byte that identifies the format version (in case we want to change it in the future).
Would this file be read on the main thread? If so, just storing these values in prefs (which are already available in memory at the time we display the early blank window) would be faster than needing to touch another file on disk.
Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?
Comment 7•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #6)
Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?
Florian, are there any plans to add:
- automated tests to capture this type of I/O change
- a policy around when it's appropriate to backout/disable patches due to an I/O change
I understand that (1) is very difficult to do. But it seems harder to do (2) without it, and without (2) it's not clear to me how to make a decision in any particular case.
Comment 8•6 years ago
|
||
(In reply to Brian Grinstead [:bgrins] from comment #7)
(In reply to Florian Quèze [:florian] from comment #6)
Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?
Florian, are there any plans to add:
- automated tests to capture this type of I/O change
- a policy around when it's appropriate to backout/disable patches due to an I/O change
I understand that (1) is very difficult to do. But it seems harder to do (2) without it, and without (2) it's not clear to me how to make a decision in any particular case.
-
Yes, I've been working on this on and off for the last 3 months, and it landed yesterday: see bug 1540135.
Actually, this bug has been found because I had to add a whitelist entry for it between my try runs. -
Sounds straightforward given the test is a mochitest: a new test failure means sheriffs will backout the patch that introduced new I/O.
The next question is: when is it fine to whitelist new I/O? The answer will be: if you can convince the front-end performance team who's looking after this test (mostly myself and mconley I guess) that this new main thread I/O is unavoidable.
Comment 9•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #8)
Yes, I've been working on this on and off for the last 3 months, and it landed yesterday: see bug 1540135.
Actually, this bug has been found because I had to add a whitelist entry for it between my try runs.Sounds straightforward given the test is a mochitest: a new test failure means sheriffs will backout the patch that introduced new I/O.
Thanks for the response - a new mochitest sounds great.
Reporter | ||
Comment 10•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #6)
Would this file be read on the main thread? If so, just storing these values in prefs (which are already available in memory at the time we display the early blank window) would be faster than needing to touch another file on disk.
That's currently true, and storing these values in prefs is the path of least resistance. But it isn't clear that it has to be true, so it's worth pushing on this assumption.
Do we really need to parse all prefs before we display the early blank window? Do we even need to parse any prefs? And if so, what's the subset that we actually need, and how do we isolate those, so we can load them first, and then display the window, before we load the rest?
Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?
Sorry, I should have mentioned this earlier: in bug 1547877 I re-added the old implementation of XULStore and configured the new implementation to be enabled only on Nightly (because of this issue and others). So the new implementation will not ride the trains to the 68 release.
(Nevertheless, this issue is on my list of the top three issues to resolve before we do allow the new implementation to ride the trains. So it remains a top priority for me.)
Comment 11•6 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #10)
Do we really need to parse all prefs before we display the early blank window? Do we even need to parse any prefs? And if so, what's the subset that we actually need, and how do we isolate those, so we can load them first, and then display the window, before we load the rest?
We probably don't need to parse prefs, no. My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.
I would expect the cases where prefs are expensive to parse to be due to large values having been stored in prefs by legacy add-ons.
Sorry, I should have mentioned this earlier: in bug 1547877 I re-added the old implementation of XULStore and configured the new implementation to be enabled only on Nightly (because of this issue and others). So the new implementation will not ride the trains to the 68 release.
Thanks! I think that means my I/O test will fail when merged to beta, and more conditions need to be added on some whitelist entries.
Reporter | ||
Comment 12•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #11)
My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.
I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)
I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.
(Nevertheless, it's worth considering whether we can improve prefs efficiency as well to reduce its startup cost.)
Thanks! I think that means my I/O test will fail when merged to beta, and more conditions need to be added on some whitelist entries.
Sorry for the bustage! Let me know how I can help with the cleanup. (The relevant configuration option is MOZ_NEW_XULSTORE
, which is defined in toolkit/moz.configure and reflected into AppConstants.MOZ_NEW_XULSTORE
.)
Comment 13•6 years ago
|
||
(In reply to Myk Melez [:myk] [@mykmelez] from comment #12)
(In reply to Florian Quèze [:florian] from comment #11)
My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.
I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)
And we should consider what we can do about this more generally, esp. when it comes to state storage that isn't actually a user "preference", like "timestamp when we last started as stored by telemetry". But that's not really the case for this bug - this is actually user-configured stuff.
I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.
This makes sense for parsing, but that's not the only (nor biggest) cost here. If all the prefs consumers used this logic, things would probably get (a lot) worse because we'd end up with many many tiny separate files on separate blocks/sectors on disk, so on HDDs there'll be more seeking than just having 1 file, seeking which in practice will be more expensive in wall-clock time. Costs of file reads on most systems likely only start being vaguely related to their size once you need more than a few blocks/sectors of disk space (ie when there's a chance of fragmentation). As Florian already said, the CPU cost of parsing prefs is likely negligible on a lot of users' systems compared to the disk IO cost -- the opposite of how things pan out on talos (but even there, I'm fairly sure pref parsing doesn't show up in profiles as significant). Your example of 70 user_pref lines likely still fits in very few disk sectors (maybe even in only 1, depending on the size of the prefs and the type of disk etc.).
Reporter | ||
Comment 14•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #13)
(In reply to Myk Melez [:myk] [@mykmelez] from comment #12)
(In reply to Florian Quèze [:florian] from comment #11)
My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.
I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)
And we should consider what we can do about this more generally, esp. when it comes to state storage that isn't actually a user "preference", like "timestamp when we last started as stored by telemetry". But that's not really the case for this bug - this is actually user-configured stuff.
Agreed, it'd be useful to think about the issues more generally, in addition to whatever we do here for this issue specifically.
I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.
This makes sense for parsing, but that's not the only (nor biggest) cost here. If all the prefs consumers used this logic, things would probably get (a lot) worse because we'd end up with many many tiny separate files on separate blocks/sectors on disk, so on HDDs there'll be more seeking than just having 1 file, seeking which in practice will be more expensive in wall-clock time.
Agreed, the solution I've proposed here is not intended to be generalized to all prefs consumers. I'm only proposing it for this particular one because it blocks early blank first paint (both the current implementation, which happens to run after prefs are read/parsed, and an ideal implementation that wouldn't depend on prefs).
Costs of file reads on most systems likely only start being vaguely related to their size once you need more than a few blocks/sectors of disk space (ie when there's a chance of fragmentation). As Florian already said, the CPU cost of parsing prefs is likely negligible on a lot of users' systems compared to the disk IO cost -- the opposite of how things pan out on talos (but even there, I'm fairly sure pref parsing doesn't show up in profiles as significant).
Understood. In the case of the XULStore values that block early blank first paint, which we can store in 9-10 bytes, we should be able to improve both CPU time and IO time (and thus wall-clock time) to that first paint by reading and parsing the values from a separate file.
Your example of 70 user_pref lines likely still fits in very few disk sectors (maybe even in only 1, depending on the size of the prefs and the type of disk etc.).
On my Windows system, prefs.js after first run is 6,645 bytes. On macOS and Linux, it's 6,019 and 5,840, respectively. All of which take at least 12 sectors on an HDD with 512-byte sectors or two sectors on an HDD with 4k sectors.
Prefs files in the wild, which contain accumulated "preferences" (both actual user settings and data stored by Firefox components), will be larger, although it isn't clear by how much on average. The prefs.js file in my default profile is 45,764 bytes (although this is likely to be an outlier).
Comment 15•5 years ago
|
||
This is still causing slowness visible in Nightly startup profiles, eg. https://perfht.ml/2GInN3A
Comment 16•5 years ago
|
||
Hey myk,
Can I presume correctly that you're unlikely to work on this anytime soon? If so (or if I don't hear back by the end of the week), I'll unassign you.
Comment 17•5 years ago
|
||
I was told in the #browser-arch room that Victor has this bug in his todos for Q3.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 18•4 years ago
•
|
||
After bug 1654192, bug 1594995, bug 1597898 land, the only IO that happens during startup is because of an unavoidable temporary migration from lmdb to safe mode. After some period of time, we can remove this migration step and thus entirely remove rkv from startup.
Assignee | ||
Comment 19•4 years ago
|
||
Assignee | ||
Comment 20•4 years ago
•
|
||
We should land this in some future next version, once there's been plenty of time for migrations to happen. Considering that consumers that did need migrations were only restricted to nightly and beta, and that stored data isn't critical to migrate in the first place, it will then be safe to remove the migration code and remove all traces of LMDB from startup.
Comment 21•4 years ago
|
||
There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:vporof, could you have a look please?
For more information, please visit auto_nag documentation.
Comment 22•3 years ago
|
||
This patch does not apply cleanly on central now as xperf_whitelist.json changed its name to xperf_allowlist.json (while the content seems to be unchanged).
Comment 24•3 years ago
|
||
Comment 25•3 years ago
|
||
bugherder |
Comment 26•3 years ago
|
||
Comment 27•3 years ago
|
||
- Prefilter profile content so that each entry can be deleted recursively without checking for root or lockfiles.
- Store undeleted roots instead of undeleted files so that a deeply nested undeleted file don't result in factorial growth of files to delete.
- Make directory recursion generic so file deletion and undeleted file logging can traverse paths identically.
Depends on D140426
Comment 28•3 years ago
|
||
Comment on attachment 9266630 [details]
Bug 1546838 - Post: Remove special casing no longer necessary as Bug 1716291 was a duplicate of Bug 1546838 r=jstutte,nalexander
Revision D140426 was moved to bug 1715742. Setting attachment 9266630 [details] to obsolete.
Comment 29•3 years ago
|
||
Comment on attachment 9267501 [details]
Bug 1546838 - Post: Refactor RemoveProfileFiles
; fix risk of factorial growth of undeleted files to remove. r=jstutte,nalexander
Revision D140909 was moved to bug 1715742. Setting attachment 9267501 [details] to obsolete.
Updated•3 months ago
|
Description
•