Firefox GUI lags for a few seconds serializing principals in session restore code
Categories
(Firefox :: Session Restore, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox112 | --- | fixed |
People
(Reporter: germano.massullo, Assigned: farre)
References
Details
(Keywords: perf:responsiveness, Whiteboard: Confirmed on Firefox 107.0)
Attachments
(5 files, 1 obsolete file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0
Steps to reproduce:
Firefox GUI sometimes lags for a few seconds. It happens on both webpages and Firefox GUI.
CPU Ryzen 5 3600, 128 GB RAM, Radeon RX480
Fedora 36 KDE, xorg session, Plasma 5.25.5, kernel 5.19.14-200.fc36.x86_64, videodriver amdgpu, Firefox 105.0.2
I attached the output file of https://profiler.firefox.com/
At the moment I don't have the opportunity to test a new Firefox profile
| Reporter | ||
Comment 1•3 years ago
|
||
Comment 2•3 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
| Reporter | ||
Comment 3•3 years ago
|
||
If you are trying to import the file into profiler.firefox.com and you get an error, is because I had to break it into multiple lines to be able to inspect it and remove sensitive data.
To achieve that I had to use
sed 's/},/}\n/g' input.json > output.json
I filled https://bugzilla.mozilla.org/show_bug.cgi?id=1795303 to complain about Firefox creating such huge single line JSON file
| Reporter | ||
Comment 4•3 years ago
|
||
I tried to create a new Firefox profile, but I was not able to reproduce the problem
| Reporter | ||
Comment 5•3 years ago
|
||
I would like to debug this bug as soon as possible because I don't know how long I will be using the machine were the problem is reproducible
Comment 6•3 years ago
|
||
Can you try Wayland session and/or upstream binaries?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries
May be as well caused by Fedora packages.
Thanks.
Updated•3 years ago
|
| Reporter | ||
Comment 7•3 years ago
|
||
Also upstream binary causes the same problem.
Comment 8•3 years ago
|
||
Here's a shareable version of the profile above: https://share.firefox.dev/3DzkFUq
Seems like the jank comes from SessionRestore, serializing a principal to JSON in nsIScriptSecurityManager.principalToJSON, and same in serializeCSP. My guess is that that code is either not intended to take too long, or if it is it shouldn't be running in the main thread...
It might be useful to have a problematic session store file attached to the bug. Reporter, could you attach your sessionstore.jsonlz4 file from your profile if you're comfortable with it? You can see your profile directory in about:profiles, there's a sessionstore.jsonlz4 file that is the relevant one here I believe. That's full of personal info, but when creating the attachment you can tick Make attachment and comment private checkbox to make sure as little people as possible have access to it.
Comment 9•3 years ago
|
||
The product::component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.
| Reporter | ||
Comment 10•3 years ago
•
|
||
Reporter, could you attach your sessionstore.jsonlz4 file from your profile if you're comfortable with it? [...]
Hello, I prefer to not disclose such file.
Is there anything else I can do to help you debugging?
| Reporter | ||
Comment 11•3 years ago
|
||
is there any patch that I could test?
Comment 13•3 years ago
|
||
I don't know, I'm not familiar with the session restore code. Farre / Barret, do you know?
| Assignee | ||
Comment 14•3 years ago
|
||
Yeah, without a test case we're just guessing here. Did this use to happen in a content process before SHIP, and now that we do it in the parent process we jank the UI?
Does that sound plausible Peter?
I guess doing it in the background is the way to go in that case.
Comment 15•3 years ago
|
||
I looked at the code and nothing jumped out at me. I'd have to guess the object being serialized is large enough that it takes long enough to jank the thread.
Comment 16•3 years ago
|
||
I do currently have some lagging in a session of 1,700 tabs, mainly when typing in edit fields (like bug comments) and in the address bar. (up until a couple weeks ago I didn't have lagging). So perhaps my sessionstore has the same issue. I am willing to share privately in email. Just let me know who to send it to. Perhaps Germano would be willing to do the same?
Comment 17•3 years ago
|
||
I'd prefer for Barret / Farre to poke because they're more familiar with this code, but if they're not available feel free to mail it to either my personal or my moco email (emilio [at] mozilla.com). If you could take a profile of it to confirm it's the same issue that'd also be super-helpful.
| Reporter | ||
Comment 18•3 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #16)
[...]I am willing to share privately in email. Just let me know who to send it to. Perhaps Germano would be willing to do the same?
I am sorry, I will not share my Firefox tabs list
| Assignee | ||
Updated•3 years ago
|
Comment 19•3 years ago
|
||
(In reply to Andreas Farre [:farre] from comment #14)
Does that sound plausible Peter?
It's plausible, sure. The BasePrincipal::ToJSON doesn't look super-efficient too.
Comment 20•3 years ago
|
||
(In reply to Peter Van der Beken [:peterv] from comment #19)
The
BasePrincipal::ToJSONdoesn't look super-efficient too.
Bug 1508939 is where we switched the serialization code to the current mechanism (afaict from writing directly to a stream to creating Json::Value objects containing string maps, which then get serialized to a string recursively).
| Reporter | ||
Comment 21•3 years ago
|
||
Another thing I noticed but I haven't mentioned before. You experience a short lag every time you save a file on your disk
Comment 22•3 years ago
|
||
The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Impact on browser: Causes noticeable jank
+5
Configuration: Rare
×0.3
| Reporter | ||
Comment 23•3 years ago
|
||
(In reply to Frank Doty [:fdoty] from comment #22)
The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.
Impact on browser: Causes noticeable jank
+5
Configuration: Rare
×0.3
Well I don't agree at all. This bug can causes lags up to ~15 seconds
Comment 24•3 years ago
|
||
The severity field is not set for this bug.
:dao, could you have a look please?
For more information, please visit auto_nag documentation.
| Reporter | ||
Updated•3 years ago
|
| Reporter | ||
Updated•3 years ago
|
Comment 25•3 years ago
|
||
IMO the main thing to look at here is the JSON serialization library (jsoncpp). We're spending a ton of time in there, recursing into objects and then repeatedly creating strings, which get passed out to outer calls which then reserialize those string into bigger strings etc. Andreas and I have been looking to see if we can optimize this somehow.
| Assignee | ||
Updated•3 years ago
|
| Assignee | ||
Comment 26•3 years ago
|
||
If this has got to do with slowness of serialization of principals, I'm leaning towards it being mostly expanded principals. As far as I can tell, they mostly (only?) show up in the context of extension. Would you be ok sharing your list of extensions? It could help us get a way to reproduce this issue.
| Assignee | ||
Comment 27•3 years ago
|
||
Dao, I've been looking into changing how we serialize expanded principals, and it seems promising in that it will speed up both serialization and deserialization. The trouble is that it will break the format in session store, and I'm not sure how we usually handle this. I can make newer versions of Firefox still be able to handle the old format, but I can't make older versions handle the new format. Is this OK?
This will probably not fix this issue, but it will make things better. To fix this issue I think we need to break up serialization of Session History into smaller chunks, but before that I was thinking of adding telemetry. Do you know if we already have good probes? I'll look around a bit, but if there aren't any I plan to add a handful of probes for that as well.
| Assignee | ||
Comment 28•3 years ago
|
||
Also ditch Window.btoa!
| Assignee | ||
Comment 29•3 years ago
|
||
Comment 30•3 years ago
|
||
Updated•3 years ago
|
Comment 31•3 years ago
|
||
Depends on D166810
Comment 32•3 years ago
|
||
Depends on D166953
Comment 33•3 years ago
|
||
I've been looking at ContentPrincipal JSON serialization a bit, and managed to make it a bit faster (reduced to about 1/3 of the original time). Here are the times (in ms) for the different patches for serializing a million ContentPrincipal objects:
| Trunk | 4670 |
| Cache the Json::StreamWriterBuilder | 2260 |
| Avoid a Json::Value copy | 1990 |
| Optimize serializing enums | 1570 |
There might be more we could do, but it would probably require patching the JsonCpp code.
We should also probably land telemetry first, so that we can verify the effect of our patches in the real world.
| Assignee | ||
Comment 34•3 years ago
|
||
Created Bug 1810704, let's land that and wait a release until trying to land the patches for this bug.
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
| Assignee | ||
Comment 35•3 years ago
|
||
The expanded principal serialization patch is running on try.
Updated•3 years ago
|
Comment 36•3 years ago
|
||
| Assignee | ||
Comment 37•3 years ago
|
||
Comment 38•3 years ago
|
||
Comment 39•3 years ago
|
||
Backed out 5 changesets (bug 1795312) for causing build bustage at BasePrincipal.h
Backout: https://hg.mozilla.org/integration/autoland/rev/4f67e1a353a51251ee82eb82e7b7a762fb12eaa1
Failure push: https://treeherder.mozilla.org/jobs?repo=autoland&revision=2594a213283cb5f14e460b73c4bbaaba01f110c8
Failure log: https://treeherder.mozilla.org/logviewer?job_id=405030240&repo=autoland&lineNumber=10061
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 40•3 years ago
|
||
Comment 41•3 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/903aa182c02d
https://hg.mozilla.org/mozilla-central/rev/ff5c9e4043b0
https://hg.mozilla.org/mozilla-central/rev/cfada6ca624f
https://hg.mozilla.org/mozilla-central/rev/c126577344db
Updated•2 years ago
|
Description
•