Closed Bug 1795312 Opened 3 years ago Closed 3 years ago

Firefox GUI lags for a few seconds serializing principals in session restore code

Categories

(Firefox :: Session Restore, defect)

Firefox 105
defect

Tracking

()

RESOLVED FIXED
112 Branch
Performance Impact ?
Tracking Status
firefox112 --- fixed

People

(Reporter: germano.massullo, Assigned: farre)

References

Details

(Keywords: perf:responsiveness, Whiteboard: Confirmed on Firefox 107.0)

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:105.0) Gecko/20100101 Firefox/105.0

Steps to reproduce:

Firefox GUI sometimes lags for a few seconds. It happens on both webpages and Firefox GUI.
CPU Ryzen 5 3600, 128 GB RAM, Radeon RX480
Fedora 36 KDE, xorg session, Plasma 5.25.5, kernel 5.19.14-200.fc36.x86_64, videodriver amdgpu, Firefox 105.0.2

I attached the output file of https://profiler.firefox.com/
At the moment I don't have the opportunity to test a new Firefox profile

Attached file Firefox_profile.tar.xz

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core

If you are trying to import the file into profiler.firefox.com and you get an error, is because I had to break it into multiple lines to be able to inspect it and remove sensitive data.
To achieve that I had to use
sed 's/},/}\n/g' input.json > output.json
I filled https://bugzilla.mozilla.org/show_bug.cgi?id=1795303 to complain about Firefox creating such huge single line JSON file

I tried to create a new Firefox profile, but I was not able to reproduce the problem

I would like to debug this bug as soon as possible because I don't know how long I will be using the machine were the problem is reproducible

Can you try Wayland session and/or upstream binaries?
https://fedoraproject.org/wiki/How_to_debug_Firefox_problems#Testing_Mozilla_binaries
May be as well caused by Fedora packages.
Thanks.

Flags: needinfo?(germano.massullo)
Priority: -- → P3

Also upstream binary causes the same problem.

Flags: needinfo?(germano.massullo)

Here's a shareable version of the profile above: https://share.firefox.dev/3DzkFUq

Seems like the jank comes from SessionRestore, serializing a principal to JSON in nsIScriptSecurityManager.principalToJSON, and same in serializeCSP. My guess is that that code is either not intended to take too long, or if it is it shouldn't be running in the main thread...

It might be useful to have a problematic session store file attached to the bug. Reporter, could you attach your sessionstore.jsonlz4 file from your profile if you're comfortable with it? You can see your profile directory in about:profiles, there's a sessionstore.jsonlz4 file that is the relevant one here I believe. That's full of personal info, but when creating the attachment you can tick Make attachment and comment private checkbox to make sure as little people as possible have access to it.

Status: UNCONFIRMED → NEW
Performance Impact: --- → ?
Component: Widget: Gtk → Session Restore
Ever confirmed: true
Flags: needinfo?(germano.massullo)
Product: Core → Firefox
Summary: Firefox GUI lags for a few seconds → Firefox GUI lags for a few seconds serializing principals in session restore code

The product::component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit auto_nag documentation.

Priority: P3 → --

Reporter, could you attach your sessionstore.jsonlz4 file from your profile if you're comfortable with it? [...]

Hello, I prefer to not disclose such file.
Is there anything else I can do to help you debugging?

Flags: needinfo?(germano.massullo)

is there any patch that I could test?

Are there any other options here?

Flags: needinfo?(emilio)

I don't know, I'm not familiar with the session restore code. Farre / Barret, do you know?

Flags: needinfo?(emilio)
Flags: needinfo?(brennie)
Flags: needinfo?(afarre)

Yeah, without a test case we're just guessing here. Did this use to happen in a content process before SHIP, and now that we do it in the parent process we jank the UI?

Does that sound plausible Peter?

I guess doing it in the background is the way to go in that case.

Flags: needinfo?(afarre)

I looked at the code and nothing jumped out at me. I'd have to guess the object being serialized is large enough that it takes long enough to jank the thread.

Flags: needinfo?(brennie)

I do currently have some lagging in a session of 1,700 tabs, mainly when typing in edit fields (like bug comments) and in the address bar. (up until a couple weeks ago I didn't have lagging). So perhaps my sessionstore has the same issue. I am willing to share privately in email. Just let me know who to send it to. Perhaps Germano would be willing to do the same?

I'd prefer for Barret / Farre to poke because they're more familiar with this code, but if they're not available feel free to mail it to either my personal or my moco email (emilio [at] mozilla.com). If you could take a profile of it to confirm it's the same issue that'd also be super-helpful.

(In reply to Wayne Mery (:wsmwk) from comment #16)

[...]I am willing to share privately in email. Just let me know who to send it to. Perhaps Germano would be willing to do the same?

I am sorry, I will not share my Firefox tabs list

Flags: needinfo?(peterv)

(In reply to Andreas Farre [:farre] from comment #14)

Does that sound plausible Peter?

It's plausible, sure. The BasePrincipal::ToJSON doesn't look super-efficient too.

Flags: needinfo?(peterv)

(In reply to Peter Van der Beken [:peterv] from comment #19)

The BasePrincipal::ToJSON doesn't look super-efficient too.

Bug 1508939 is where we switched the serialization code to the current mechanism (afaict from writing directly to a stream to creating Json::Value objects containing string maps, which then get serialized to a string recursively).

Another thing I noticed but I haven't mentioned before. You experience a short lag every time you save a file on your disk

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Impact on browser: Causes noticeable jank
+5
Configuration: Rare
×0.3

Performance Impact: ? → low

(In reply to Frank Doty [:fdoty] from comment #22)

The Performance Impact Calculator has determined this bug's performance impact to be low. If you'd like to request re-triage, you can reset the Performance Impact flag to "?" or needinfo the triage sheriff.

Impact on browser: Causes noticeable jank
+5
Configuration: Rare
×0.3

Well I don't agree at all. This bug can causes lags up to ~15 seconds

The severity field is not set for this bug.
:dao, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dao+bmo)
Performance Impact: low → ?
Whiteboard: Confirmed on Firefox 107.0

IMO the main thing to look at here is the JSON serialization library (jsoncpp). We're spending a ton of time in there, recursing into objects and then repeatedly creating strings, which get passed out to outer calls which then reserialize those string into bigger strings etc. Andreas and I have been looking to see if we can optimize this somehow.

Assignee: nobody → afarre
Status: NEW → ASSIGNED

If this has got to do with slowness of serialization of principals, I'm leaning towards it being mostly expanded principals. As far as I can tell, they mostly (only?) show up in the context of extension. Would you be ok sharing your list of extensions? It could help us get a way to reproduce this issue.

Flags: needinfo?(germano.massullo)

Dao, I've been looking into changing how we serialize expanded principals, and it seems promising in that it will speed up both serialization and deserialization. The trouble is that it will break the format in session store, and I'm not sure how we usually handle this. I can make newer versions of Firefox still be able to handle the old format, but I can't make older versions handle the new format. Is this OK?

This will probably not fix this issue, but it will make things better. To fix this issue I think we need to break up serialization of Session History into smaller chunks, but before that I was thinking of adding telemetry. Do you know if we already have good probes? I'll look around a bit, but if there aren't any I plan to add a handful of probes for that as well.

Attachment #9312221 - Attachment description: WIP: Bug 1795312 - Improve performance of serialization of principals. → WIP: Bug 1795312 - Cache the Json::StreamWriterBuilder for serializing principals.

I've been looking at ContentPrincipal JSON serialization a bit, and managed to make it a bit faster (reduced to about 1/3 of the original time). Here are the times (in ms) for the different patches for serializing a million ContentPrincipal objects:

Trunk 4670
Cache the Json::StreamWriterBuilder 2260
Avoid a Json::Value copy 1990
Optimize serializing enums 1570

There might be more we could do, but it would probably require patching the JsonCpp code.

We should also probably land telemetry first, so that we can verify the effect of our patches in the real world.

Depends on: 1810704

Created Bug 1810704, let's land that and wait a release until trying to land the patches for this bug.

Attachment #9312221 - Attachment description: WIP: Bug 1795312 - Cache the Json::StreamWriterBuilder for serializing principals. → Bug 1795312 - Cache the Json::StreamWriterBuilder for serializing principals. r?farre!
Attachment #9312495 - Attachment description: WIP: Bug 1795312 - Avoid a Json::Value copy. → Bug 1795312 - Avoid a Json::Value copy. r?farre!
Attachment #9312496 - Attachment description: WIP: Bug 1795312 - Optimize serializing enums as JSON string keys for principals. → Bug 1795312 - Optimize serializing enums as JSON string keys for principals. r?farre!

The expanded principal serialization patch is running on try.

Attachment #9312167 - Attachment description: WIP: Bug 1795312 - Fully serialize/deserialize expanded principals with jsoncpp. → Bug 1795312 - Don't Base64 encode principals when serializing. r=peterv!
Pushed by pvanderbeken@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a732c5bcc392 Don't Base64 encode principals when serializing. r=peterv https://hg.mozilla.org/integration/autoland/rev/38f9e767cc32 Cache the Json::StreamWriterBuilder for serializing principals. r=farre https://hg.mozilla.org/integration/autoland/rev/bc1c5b1c2d85 Avoid a Json::Value copy. r=farre https://hg.mozilla.org/integration/autoland/rev/2594a213283c Optimize serializing enums as JSON string keys for principals. r=farre
Pushed by afarre@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c4f9e74d6164 Remove newly introduced intermittent test. r=peterv
Attachment #9316578 - Attachment is obsolete: true
Pushed by pvanderbeken@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/903aa182c02d Don't Base64 encode principals when serializing. r=peterv https://hg.mozilla.org/integration/autoland/rev/ff5c9e4043b0 Cache the Json::StreamWriterBuilder for serializing principals. r=farre https://hg.mozilla.org/integration/autoland/rev/cfada6ca624f Avoid a Json::Value copy. r=farre https://hg.mozilla.org/integration/autoland/rev/c126577344db Optimize serializing enums as JSON string keys for principals. r=farre
See Also: → 1860460
See Also: → 1861787
Flags: needinfo?(germano.massullo)
Flags: needinfo?(dao+bmo)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: