Open Bug 1849393 Opened 1 year ago Updated 27 days ago

Firefox regularly hangs for over half a second in SessionWriter.sys.mjs calling IOUtils.writeJSON serializing a massive JS object when saving tab state

Categories

(Firefox :: Session Restore, defect, P3)

Firefox 116
defect

Tracking

()

People

(Reporter: torokati44, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: QA-not-reproducible)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/116.0

Steps to reproduce:

Over several years, accumulated ~6500 open tabs in one window, and ~3000 in another.

Actual results:

Firefox regularly hangs completely for several hundred milliseconds (videos and animations hiccup annoyingly) every couple of minutes, when the tab state is saved.
See: https://profiler.firefox.com/public/skbxz2aw31jg2mwd6hbntr4s2axfcqjgrpg931g/flame-graph/?globalTrackOrder=0wj&range=2125m697&thread=0&v=10

Expected results:

No hiccups, certainly not regular ones, no matter how many tabs are open.

The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Widget: Gtk
Product: Firefox → Core
Component: Widget: Gtk → Storage: localStorage & sessionStorage

In the profile, ~half of the time seems to be spent inside Quote; the other ~half is a memcpy that seems to come directly from IOUtils::WriteJSON. It might be that these copies are unavoidable and there is an outer loop or something else that could be avoided when persisting the tab state. But there might be some useless conversion/string copy in between, too, and this profile seems interesting enough to take a look.

Unfortunately I cannot browse the source code within that profile; is that coming from a non-official Mozilla build (it seems to be on Linux, so it's probably some distribution's build)?

In any case I'd propose to start the investigation from IOUtils::WriteJSON in "DOM: Core".

Component: Storage: localStorage & sessionStorage → DOM: Core & HTML

The JSON serialization stacks seem reasonable here; it sounds like it's just a massive single object being JSON-stringified. Moving to the Session Restore component, where the profiler stacks identify code in SessionWriter as the immediate caller.

Some mechanisms have been proposed in the past to reduce the size of what Session Store needs to store inline by allowing SessionStorage to be stored in a LocalStorage-like manner (bug 1445464). That would not handle the History.pushState structured serialization storage, although the same idea potentially holds there for storing state in what amounts to a separate KV store from the primary session storage. Using separate stores does potentially create additional coherency / GC issues, however, as well as changing the I/O pattern from large sequential writes/reads to scattered random access, which could require consumers to work in an async fashion.

Component: DOM: Core & HTML → Session Restore
Product: Core → Firefox
Summary: Firefox regularly hangs for over half a second in IOUtils.writeJSON when saving tab state → Firefox regularly hangs for over half a second in SessionWriter.sys.mjs calling IOUtils.writeJSON serializing a massive JS object when saving tab state

The severity field is not set for this bug.
:dao, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(dao+bmo)
Blocks: ss-perf
Severity: -- → S3
Flags: needinfo?(dao+bmo)
Keywords: perf
Priority: -- → P3

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #3)

reduce the size of what Session Store needs to store inline by allowing SessionStorage to be stored in a LocalStorage-like manner (bug 1445464). That would not handle the History.pushState structured serialization storage, although the same idea potentially holds there for storing state in what amounts to a separate KV store from the primary session storage. Using separate stores does potentially create additional coherency / GC issues, however, as well as changing the I/O pattern from large sequential writes/reads to scattered random access, which could require consumers to work in an async fashion.

Just for clarity - "session store" here is the thing that stores all the user's tabs and windows for reuse after a crash / restart, and it is slow as per the profile (because it's re-writing the 6500 tabs to disk via JSON, and just serializing that info to JSON takes a long time and happens on the main thread because the data came from the main thread - the IO is not on the main thread).

"SessionStorage" is the LocalStorage-related DOM api.

Confusingly, "session (re)store" can contain DOM "SessionStorage" data (as well as e.g. session cookies), for restoration post-restart/crash.

Do we think the issue here is actually related to DOM SessionStorage, or only to session store?

Flags: needinfo?(bugmail)
Depends on: 1854434

(In reply to :Gijs (he/him) from comment #5)

Do we think the issue here is actually related to DOM SessionStorage, or only to session store?

The issue here is exclusively Session Store serializing a large object graph in its entirety synchronously. I don't know the distribution of data in the reporter's persisted file, but that could be interesting!

I mentioned bug 1445464 because one can imagine 2 main classes of fixes to this problem:

  1. Store less data in the graph! Bug 1445464 is one potential way to reduce the amount of data other than reducing the number of windows/tabs to be serialized.
  2. Leverage continuity of data between the successive serializations to cache portions of the serialized data. For example, use a (Weak)Map to store the JSON representation of each tab, invalidating the cached value if a mutation happens on the tab. For tabs that are unloaded/were never loaded, we would expect stability. Then, instead of using writeJSON, a somewhat-manually merged JSON string can be stitched together (a rough sketch follows this list), either leveraging string ropes for efficiency[1] and using the existing IOUtils.writeUTF8, or potentially leveraging the ability to create multi-part Blobs by passing an array to the Blob constructor and adding a method like "writeBlob" to IOUtils. Note that if going with the blob approach, it might be appropriate to pre-materialize any strings into Blobs and cache that, as I think the Blob may copy the string it's provided[2].
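
To make option 2 concrete, here is a minimal sketch assuming chrome JS; tabStateCache, serializeTab, serializeWindow and writeSessionState are made-up names, and only JSON.stringify and IOUtils.writeUTF8 are real APIs:

const tabStateCache = new WeakMap();

function serializeTab(tabState) {
  // Reuse the cached JSON fragment unless something invalidated it.
  let cached = tabStateCache.get(tabState);
  if (cached === undefined) {
    cached = JSON.stringify(tabState);
    tabStateCache.set(tabState, cached);
  }
  return cached;
}

// Whatever code mutates a tab's state would call tabStateCache.delete(tabState),
// so only changed tabs get re-stringified on the next flush.

function serializeWindow(winState) {
  const tabFragments = winState.tabs.map(serializeTab).join(",");
  const { tabs, ...rest } = winState;
  const restJson = JSON.stringify(rest); // e.g. {"selected":1,...}
  const prefix = restJson === "{}" ? "{" : restJson.slice(0, -1) + ",";
  return `${prefix}"tabs":[${tabFragments}]}`;
}

async function writeSessionState(path, state) {
  const windowFragments = state.windows.map(serializeWindow).join(",");
  const { windows, ...rest } = state;
  const restJson = JSON.stringify(rest);
  const prefix = restJson === "{}" ? "{" : restJson.slice(0, -1) + ",";
  const json = `${prefix}"windows":[${windowFragments}]}`;
  await IOUtils.writeUTF8(path, json);
}

This avoids re-running the serialization over data that didn't change; whether the remaining string concatenation is cheap enough would still need measuring.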

Realistically, bug 1445464 will not happen anytime soon absent explicit guidance from product, as it would be a pretty serious task to implement in the storage space and would require heavy coordination with the session store team too.

1: There are rules around when ropes are used versus copying, and I don't know them offhand and remember it not being trivial to look up.

2: This is potentially a bit easier to look up, but if you never need to look at the contents of the blob after you create it, it's definitely simplest to just create the Blobs ASAP since the ownership is much more straightforward (as long as no one tries to get clever with slicing the blobs).

Flags: needinfo?(bugmail)

IOUtils::writeJSON is about 5% of all the reported hangs for Nightly: https://fqueze.github.io/hang-stats/#date=20230916&row=0&filter=writeJSON

Hello! I have tried to reproduce the issue with Firefox 120.0a1 (2023-10-20) on Ubuntu 22.04; unfortunately, I wasn't able to reproduce it on my end. I will mark this as QA-not-reproducible.

Have a nice day!

Whiteboard: QA-not-reproducible
Depends on: 1866239
See Also: → 1866237

I don't see this regularly, but I caught this in the profile, and bug 1866239 should shave a bit of time off these.

I wonder if there's a faster thing that IOUtils::WriteJSON could do, like serializing the object graph to some binary representation first, then shipping that to the background thread...

Status: UNCONFIRMED → NEW
Ever confirmed: true
Duplicate of this bug: 1866237

Adding a see-also to bug 1867137, which is one possibility for reducing the size of this file, which would reduce its severity for at least some users.

But regardless, it seems like we should have a better way to (ideally incrementally) update this persisted state. Either with a different I/O strategy (I think there are stream-based JSON parsers, right? Like this one.) Or at least moving that file I/O to a different thread/process. I guess maybe incremental updates run a greater risk of data corruption, which defeats the whole point of this file... Can I have fast and safe?

I would expect Session Restore isn't unique in this requirement however, and a solution would come from nearer to IOUtils?

See Also: → 1867137

(In reply to Sam Foster [:sfoster] (he/him) from comment #11)

Adding a see-also to bug 1867137, which is one possibility for reducing the size of this file, which would reduce its severity for at least some users.

But regardless, it seems like we should have a better way to (ideally incrementally) update this persisted state. Either with a different I/O strategy (I think there are stream-based JSON parsers, right? Like this one.) Or at least moving that file I/O to a different thread/process. I guess maybe incremental updates run a greater risk of data corruption, which defeats the whole point of this file... Can I have fast and safe?

The problem isn't parsing (well, maybe that's also slow, but it only happens once rather than regularly while the browser is running, and it's not the subject of this bug) but writing. There's a huge object graph (an object with more object/array refs to more objects/arrays ... down to eventually strings/numbers/bools/whatever) that needs to be written to disk in some way. Right now, we serialize to a JSON string and pass the string to another thread for async IO. This means the IO cost as such is ~99% offloaded, so it's not what is making things slow. What's slow is creating the giant JSON string.
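
To make the cost split concrete, what we do today is roughly equivalent to the following (illustrative fragment only; writeJSON does its serialization in native code, but the effect on the main thread is the same; state is the session state object, path the target file):

const json = JSON.stringify(state);  // synchronous, on the main thread, scales with the size of the graph (the jank)
await IOUtils.writeUTF8(path, json); // the disk I/O itself then happens on a background thread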

But JS objects are not threadsafe, so we can't create the string away from the main thread. There isn't really a good way to 'incrementally' create the JSON string from the object graph, especially not because the graph might mutate if we return control to the event loop (i.e. let more JS run before we finish serializing the object graph). Ironically, the work from bug 1505572 may have made the "write this object graph to a string" work harder rather than easier (because the individual bits are "spread out" more than if we had previously amalgamated them into strings and cached them, kind of sort of like part of the suggestion in comment #6).

There was some discussion about optimizing this elsewhere, but I don't remember where and can't find it off-hand. :-(

I would expect Session Restore isn't unique in this requirement however, and a solution would come from nearer to IOUtils?

Pretty sure session restore is the only consumer with JSON files anywhere near this order of magnitude in terms of size and complexity. And one of the suggestions in comment 6 is swapping out the storage system completely! So although a solution may need to part-live in IOUtils in terms of efficiency (using C++ and threads rather than keeping it all in JS), I'm not sure that the "session restore isn't unique" bit is true here.

Duplicate of this bug: 1867161

I have reliable STR to trigger this bug (starting with a fresh profile) over in bug 1867161 comment 0 -- "just" create a window with 500 https://bugzilla.mozilla.org/ tabs (quick steps to do this over in the other bug). (Bugzilla is particularly useful for this since its favicon is kinda heavyweight, and it gets encoded into the sessionstore for each tab as a data URI, per comment 11.)

With that window open in the background, Firefox seems to just repeatedly jank for 500ms and undergo a persistent ~225MB increase in parent-process memory usage, every 15s. (Presumably the memory increase resets periodically once it hits some GC threshold, and that massive multi-gigabyte GC presumably causes another period of jank.)

Here's a profile: https://share.firefox.dev/41fFP5D

One thing I thought of measuring when looking at this last week in comment 6 etc, but didn't get to, is whether structured cloning the object into a worker would be significantly faster than doing the JSON serialization. If so, even though it's effectively more work in the end, it seems it should be possible to ship the session-restore data to a worker first, and perform the serialization in the worker.

Another potential relatively-low-effort / high-gain thing to do, assuming the graph needs to be built in JS which kinda seems to be the case, is making a WebIDL dictionary with the data, and create a let's say ChromeUtils or IOUtils method to write it to disk.

Conversion to a dictionary should be relatively fast, and the native objects can be sent off-thread (if they don't contain JS things, which shouldn't in this case because it's just data). WebIDL also supports auto-generating the JSON serialization.

So what I'd love to have the time to do would be something like getting a big sessionstore JSON blob, parsing it, and comparing (a rough benchmark sketch follows this list):

  • JSON.stringify (which is effectively what writeJSON does)
  • structuredClone (maybe without the overhead of allocating new JS objects, not sure how structured clone in native code works exactly).
  • Passing it to a WebIDL dictionary.
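
For the first two bullets, something like this hypothetical snippet, run from a privileged (chrome) context such as the browser console, would give rough numbers; it assumes IOUtils.readJSON accepts the same decompress option as readUTF8, and the dictionary option can't be measured from JS until such a dictionary exists:

const path = PathUtils.join(
  PathUtils.profileDir,
  "sessionstore-backups",
  "recovery.jsonlz4"
);
const state = await IOUtils.readJSON(path, { decompress: true });

let t0 = performance.now();
const json = JSON.stringify(state);
console.log("JSON.stringify:", (performance.now() - t0).toFixed(1), "ms,", json.length, "chars");

t0 = performance.now();
structuredClone(state);
console.log("structuredClone:", (performance.now() - t0).toFixed(1), "ms");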

Gijs / Sam, do those approaches seem reasonable to you? Do you know who might have the cycles to profile or prototype something like that if so? Do you know where the schema of this data lives? Or is the schema basically this?

Flags: needinfo?(sfoster)
Flags: needinfo?(gijskruitbosch+bugs)

I wrote comment 15 mid-airing with Daniel's comment 14. Daniel's STR are probably-massively improvable by having the data URIs / tab state fields deduplicated in some way. But not sure if "duplicate favicons" are the general case here. It does seem likely that people with hundreds of tabs would have at least a fair chunk of them from the same site.

FWIW, here's a longer profile where I placed a heavy object on my 0 key for 7 minutes at paste.mozilla.org to mimic continuous rapid typing:
https://share.firefox.dev/3GmKzN0

This shows the jank from sessionstore-writing every 15 seconds, with memory usage increasing by ~225MB every time. I was hoping to see a massive GC at some point, but I never did (maybe that happened after I captured the profile). The parent process memory usage was 5.7GB higher at the end vs. the start of the profile, which is all presumably JS garbage from the sessionstore serialization, based on the stairstep pattern that can be observed there.

(In reply to Daniel Holbert [:dholbert] from comment #17)

This shows the jank from sessionstore-writing every 15 seconds

Side note: this magic 15 second interval seems to come from the pref browser.sessionstore.interval:
https://searchfox.org/mozilla-central/rev/877adfeec838cdad97369441865bac605109427d/modules/libpref/init/StaticPrefList.yaml#1576-1579

# Minimal interval between two save operations in milliseconds (while the user is active).
- name: browser.sessionstore.interval
  type: RelaxedAtomicUint32
  value: 15000
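
As an aside, for local experimentation (e.g. to make the jank easier to catch in a profile) the interval can be shortened temporarily from the browser console; this just sets the pref quoted above:

// Flush session state every 5 seconds instead of every 15; revert with clearUserPref.
Services.prefs.setIntPref("browser.sessionstore.interval", 5000);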

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

One thing I thought of measuring when looking at this last week in comment 6 etc, but didn't get to, is whether structured cloning the object into a worker would be significantly faster than doing the JSON serialization.

I think SessionStore used to do this and it was also a performance problem. Specifically my comments in https://bugzilla.mozilla.org/show_bug.cgi?id=1537694#c25 and https://bugzilla.mozilla.org/show_bug.cgi?id=1546847#c17 suggest this.

Another potential relatively-low-effort / high-gain thing to do, assuming the graph needs to be built in JS which kinda seems to be the case, is making a WebIDL dictionary with the data, and create a let's say ChromeUtils or IOUtils method to write it to disk.

Yes, I think this is the best magic possible for a fast hand-off of data. I do want to emphasize that having ChromeUtils/IOUtils expose the consuming method is very important here; involving postMessage with this would be strictly worse.

(In reply to :Gijs (he/him) from comment #12)

(In reply to Sam Foster [:sfoster] (he/him) from comment #11)
But JS objects are not threadsafe, so we can't create the string away from the main thread. There isn't really a good way to 'incrementally' create the JSON string from the object graph, especially not because the graph might mutate if we return control to the event loop (ie let more JS run before we finish serializing the object graph).

Ok that is the bit I was hoping wasn't true. There are some pretty clean lines we could segment this large graph along - I'm thinking of the windows array/property. Each entry in there is self-contained, and for the most part any (user) change to one necessarily means the others haven't changed. However, if the site navigates itself in the background the state changes and our snapshot becomes out of date. It is true that every update we persist is a 99% match for the existing file. I don't have good suggestions for how to fix that - other than database-something-something.

Pretty sure session restore is the only consumer with JSON files anywhere near this order of magnitude in terms of size and complexity.

Hmm, yeah I was thinking of logins.json (and the other .json files in the profile) but they are likely at least an order of magnitude smaller if not 2. And they are typically updated in response to a user action, rather than the continual background tick we see here.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Gijs / Sam, do those approaches seem reasonable to you? Do you know who might have the cycles to profile or prototype something like that if so?

Yes, and no in that order.

Do you know where the schema of this data lives? Or is the schema basically this?

We don't have a formal schema that I'm aware of. It might be a useful exercise to create one - if only to use in tests. I'll make a note to file that.

Flags: needinfo?(sfoster)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #19)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

One thing I thought of measuring when looking at this last week in comment 6 etc, but didn't get to, is whether structured cloning the object into a worker would be significantly faster than doing the JSON serialization.

I think SessionStore used to do this and it was also a performance problem. Specifically my comments in https://bugzilla.mozilla.org/show_bug.cgi?id=1537694#c25 and https://bugzilla.mozilla.org/show_bug.cgi?id=1546847#c17 suggest this.

Yes, this is right. SessionStore used to have a dedicated worker for writing to disk, which used the now-removed worker-only synchronous IO (!) APIs to write to disk from the worker thread. To get the data to the worker, it implicitly used structuredClone (by using postMessage to talk to the worker). We removed the worker and OS.File because OS.File was a js-ctypes wrapper around C file IO things and generally had a distinct set of bugs from nsIFile and was awful in that and several other respects; also because setting up the worker and loading OS.File into it etc. was decidedly slow/janky - a ChromeUtils wrapper around doing async IO (IOUtils) was set up instead.

If structuredClone overhead is better than JSON.stringify, this might do something, but I'm not sure how much. You'd want to experiment with changing writeJSON in IOUtils to call whatever the JSAPI equivalent of structuredClone is to get a clone of the object into a different compartment (handwave how you set that up for a thread that otherwise doesn't run JS, I think...) and then dealing with the stringification off-mainthread. (Though as Emilio noted, this ends up doing strictly more work because we'd do both structuredClone + JSON.stringify.)

Another potential relatively-low-effort / high-gain thing to do, assuming the graph needs to be built in JS which kinda seems to be the case, is making a WebIDL dictionary with the data, and create a let's say ChromeUtils or IOUtils method to write it to disk.

Yes, I think this is the best magic possible for a fast hand-off of data. I do want to emphasize that having ChromeUtils/IOUtils expose the consuming method is very important here; involving postMessage with this would be strictly worse.

I'm not sure I understand why/how this would be better than serializing from a "raw" object graph. Doesn't the profile have us mostly spend time in memcpy type stuff? Presumably, when sending a writeable webidl dictionary to another thread, we still have to do those copies, as allowing (write) access from both threads would not be safe, so we can't pass by reference - unless webidl dictionaries are completely threadsafe? - and the caller has a ref so we can't std::move... (ni for this)

(as Sam said, SessionStore is so old that schemas weren't a thing / weren't used when it started, and so we still don't have one now. It wouldn't be a bad idea to add one. But it's non-trivial. There's also enough depth/complexity to the object graph that we'd need a few different nested dictionaries / arrays of dictionaries etc. - not only do we have tabs across windows, we then also have closed tab/window data, history entries in those tabs, form input inside those history entries, favicon data we may wish to de-duplicate as per the other bug, etc. etc.)

(In reply to Sam Foster [:sfoster] (he/him) from comment #20)

I'm thinking of the windows array/property. Each entry in there is self-contained,

This stops being true if we try to deduplicate across entries.

and for the most part any (user) change to one necessarily means the others haven't changed.

This already isn't true as website can navigate themselves and clear or manipulate their form field contents, cookies, scroll positions, and/or other session-restore-stored data while in the background. :-(


At what point do we re-visit the storage mechanism here? Because it feels like fundamentally we're fighting how JSON works: it's always an all/nothing update when you write a new file, and if the new file is tens or hundreds of MB then doing that every 15s is just a losing proposition, even if we optimize how we stringify and help the CPU and jank bits with that.

Even if we "just" wrote chunks of "these tabs in these windows have updated" in sequential files, that'd be an improvement - we could periodically reconcile the set of files entirely off-main-thread (though we'd have to be careful about sequencing that with new writes), as a kind of poor man's "database" - as then all the JSON reading, parsing and writing could live OMT.
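
Purely as an illustration of that idea (none of these names exist in the tree; ordering, locking and error handling are omitted, and the reconcile step would really need to run in a worker or in native code to stay off the main thread):

let deltaSeq = 0;

async function writeDelta(dir, changedWindowsById) {
  // Each flush only serializes the windows/tabs that changed since the last one.
  const path = PathUtils.join(dir, `delta-${deltaSeq++}.json`);
  await IOUtils.writeJSON(path, changedWindowsById);
}

async function reconcile(dir, snapshotPath) {
  // Periodically fold the accumulated deltas back into the full snapshot.
  const snapshot = await IOUtils.readJSON(snapshotPath);
  for (const child of await IOUtils.getChildren(dir)) {
    if (!PathUtils.filename(child).startsWith("delta-")) {
      continue;
    }
    const delta = await IOUtils.readJSON(child);
    Object.assign(snapshot.windowsById, delta); // assumes a hypothetical by-id layout
    await IOUtils.remove(child);
  }
  await IOUtils.writeJSON(snapshotPath, snapshot);
}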

Stepping back, I don't know why (compressed) json was chosen as the storage format here back in the Firefox 3/4 days, and if/when we should consider switching to sqlite or something similar. Marco, do you remember and/or do you have sage advice given I've uttered the words "sqlite"? :-)

Flags: needinfo?(mak)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(emilio)

Serializing to json on a different thread may be a problem for js, but we could use an external Rust library, like serde_json? Though, there's still that need to access the data to save it on disk, that likely means creating a const copy of it? IOUtils could grow a method receiving a js object, copy and serialize it off the main thread. This would be a minimum effort I think, not many code changes required. For a short term solution it could be fine.

Assuming we care only about the serialization problem and ignore memory usage concerns (thus retain the current system with in-memory graph), ideally we'd like to avoid serialization, just dump a memory graph to disk and retrieve it. There may be libraries managing a graph in memory and able to dump/retrieve it from disk efficiently. Those are more likely to use a binary format (compressed memory dump?), that could potentially make it harder to just copy urls from a corrupt file. Though we may have thrown away easy recovery with lz4 already. This would be a medium effort, with more code changes necessary, provided we find a good library.
Is memory usage a concern?

If we also care about reducing memory usage (stop having a memory object), we need a graph on disk to:

  • read the whole graph to restore everything
  • read part of the graph to restore something
  • write part of the graph to update something

Sqlite is a possibility, but it feels again like we put all our eggs into the same basket for lack of better options.
My concerns are around:

  • detecting corruption: we'd need to notice corruption on reads and writes and act accordingly; not impossible, but it's a project in itself.
  • it's not optimized as a graph db. Alternatives like https://github.com/kuzudb/kuzu exist, but that seems like overkill for the use cases above.
  • in most cases we read the whole data on startup, which doesn't seem exactly what an RDBMS is optimized for.

This may be a huge effort, as we'd have to rewrite everything around an async API. I'd indeed avoid concurrent synchronous and asynchronous access to the db, for many reasons that are still causing us headaches.
Again, there may be better solutions in the market supporting those few operations with less overhead. Though I don't know them.

Flags: needinfo?(mak)

(In reply to Marco Bonardo [:mak] from comment #22)

Serializing to json on a different thread may be a problem for js, but we could use an external Rust library, like serde_json? Though, there's still that need to access the data to save it on disk, that likely means creating a const copy of it? IOUtils could grow a method receiving a js object, copy and serialize it off the main thread. This would be a minimum effort I think, not many code changes required. For a short term solution it could be fine.

The disk writes are already off-mainthread. Serializing the JS object on the main thread is the expensive bit, and the data from previous efforts here (see references to structuredClone above, esp. from asuth's comments in https://bugzilla.mozilla.org/show_bug.cgi?id=1537694#c25 and https://bugzilla.mozilla.org/show_bug.cgi?id=1546847#c17 ) suggests - though nobody has done like-for-like tests! - that "just" copying the JS object to make it usable on another thread was also expensive.

Is memory usage a concern?

Not for the graph as such, I think, though the amount of GC churn from repeatedly generating new copies/instances of the graph or its JSON representation was problematic (especially when we used a worker, so using structuredClone may have similar downsides). And naturally, as we have jank that effectively scales with the (memory) size of the graph, reducing the size of the graph is beneficial in reducing the jank... but it feels like we're missing a smarter solution.

Again, there may be better solutions in the market supporting those few operations with less overhead. Though I don't know them.

Yes... I'd hope that we have options in this realm but I don't know what they are.

(In reply to :Gijs (he/him) from comment #21)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #19)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Another potential relatively-low-effort / high-gain thing to do, assuming the graph needs to be built in JS which kinda seems to be the case, is making a WebIDL dictionary with the data, and create a let's say ChromeUtils or IOUtils method to write it to disk.

Yes, I think this is the best magic possible for a fast hand-off of data. I do want to emphasize that having ChromeUtils/IOUtils expose the consuming method is very important here; involving postMessage with this would be strictly worse.

I'm not sure I understand why/how this would be better than serializing from a "raw" object graph. Doesn't the profile have us mostly spend time in memcpy type stuff? Presumably, when sending a writeable webidl dictionary to another thread, we still have to do those copies, as allowing (write) access from both threads would not be safe so we can't pass by reference - unless webidl dictionaries are completely threadsafe? - and the caller has a ref so we can't std:move... (ni for this)

Clear light of morning thought: having the dictionaries themselves be threadsafe probably isn't enough; if we end up writing while the main thread is partway through updating them we could end up with inconsistent/broken data on disk, as the session store code does not expect something to be writing its object graph potentially mid-update-notification from a content process / tabswitch / whatever. It's not a real DB so there aren't transactions or something like that to protect us.

(In reply to :Gijs (he/him) from comment #21)

I'm not sure I understand why/how this would be better than serializing from a "raw" object graph. Doesn't the profile have us mostly spend time in memcpy type stuff? Presumably, when sending a writeable webidl dictionary to another thread, we still have to do those copies, as allowing (write) access from both threads would not be safe so we can't pass by reference - unless webidl dictionaries are completely threadsafe? - and the caller has a ref so we can't std:move... (ni for this)

WebIDL dictionaries are converted to a (thread-safe) C++ object at the point you call into WebIDL. At that point you are no longer pointing to the original JS object, and JS can do whatever, and C++ can do whatever with that object, even moving it to another thread and serialize it to JSON there.

So turning the JS object into a WebIDL dictionary ought to be a lot faster than performing the JSON serialization, especially if we manage to share the big string buffers between JS and C++ (which is possible, especially if those came from C++ to begin with, like nsIURI.spec).

Does that answer your question?

Flags: needinfo?(emilio) → needinfo?(gijskruitbosch+bugs)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #25)

(In reply to :Gijs (he/him) from comment #21)

I'm not sure I understand why/how this would be better than serializing from a "raw" object graph. Doesn't the profile have us mostly spend time in memcpy type stuff? Presumably, when sending a writeable webidl dictionary to another thread, we still have to do those copies, as allowing (write) access from both threads would not be safe so we can't pass by reference - unless webidl dictionaries are completely threadsafe? - and the caller has a ref so we can't std:move... (ni for this)

WebIDL dictionaries are converted to a (thread-safe) C++ object at the point you call into WebIDL. At that point you are no longer pointing to the original JS object, and JS can do whatever, and C++ can do whatever with that object, even moving it to another thread and serialize it to JSON there.

So turning the JS object into a WebIDL dictionary ought to be a lot faster than performing the JSON serialization, especially if we manage to share the big string buffers between JS and C++ (which is possible, especially if those came from C++ to begin with, like nsIURI.spec).

Does that answer your question?

Emilio and I discussed a bit more on matrix. Unfortunately URI.spec being accessed by JS currently still leads to copies. bug 1576076 will help there but it may not be sufficient. The biggest strings in the data right now are likely the page and favicon URIs, followed by form data and cookies and the like (the latter may be optimizable in this way, I haven't checked). Even so, Emilio expects that using webidl here would still be faster. Of course, the proof would be in the proverbial pudding. Coming up with all the requisite typed data structures here would be non-trivial, and it's not yet entirely clear how big an improvement we'd get. That said, having schemas encoded in webidl dictionaries would still be helpful in terms of having the sessionstore file format effectively documented in a way that it isn't right now, so it may be useful work in that sense either way.

So I think it would make sense to at least try to start defining explicitly what the data structure here looks like. I (and I expect Emilio) can help with the webidl side of things if that's useful; worst case, we end up finding out that the improvement is limited (for now), but either way we'd end up with better documentation and perhaps better stability if we have a more definite understanding of the moving pieces here. Sam/Sarah, does the team have cycles for this in the near-ish term?

Flags: needinfo?(sfoster)
Flags: needinfo?(sclements)
Flags: needinfo?(gijskruitbosch+bugs)

(In reply to :Gijs (he/him) from comment #26)

So I think it would make sense to at least try to start defining explicitly what the data structure here looks like. I (and I expect Emilio) can help with the webidl side of things if that's useful; worst case, we end up finding out that the improvement is limited (for now), but either way we'd end up with better documentation and perhaps better stability if we have a more definite understanding of the moving pieces here. Sam/Sarah, does the team have cycles for this in the near-ish term?

We are trying to carve out some session store time in the next quarter, and this seems like a good candidate. So yes, this is a useful discussion to have at this time.

Flags: needinfo?(sfoster)

For what it's worth, I did a mozregression investigation and found that my STR from bug 1867161 experienced a substantial regression (from ~fine to current levels of badness) way back in 2018.

My STR: open the 500 tabs that all use the 200KB bugzilla.m.o favicon, served by a local web server; wait for those tabs to finish loading and for web requests to stop; then, in a https://paste.mozilla.org tab, hold down the a key and watch the character repeat indefinitely to fill a textbox. Watch for half-second jank.

Good: 2018-10-05
Bad: 2018-10-06
Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=54cb6a2f028b033ef567f00af2f82f5fb97ab437&tochange=07c609fc8eb89140f602ef0c838b900c2964287a

  • Before this range, these STR produce occasional small hiccups but nothing that seems particularly janky, and often I can hold down a for tens of seconds without any noticeable pausing.
  • After this range, there's a very-noticeable half-second jank every 15s (as in current nightly)

Adding more "stress testing" to my STR: if I switch from the 200KB favicon to this 6MB one, then, when I'm mashing my a key, the "good" build sees ~3s jank every 15 seconds, whereas the "bad" build janks for ~30s (so the periodic reserialization is ~10x worse).

I'm not sure what in there would be responsible; maybe it was Bug 1493150, which changed string deserialization? (apparently an optimization, but maybe this particular setup is a perf cliff in the new optimized world)

(In reply to Daniel Holbert [:dholbert] from comment #28)

Adding more "stress testing" to my STR: if I switch from the 200KB favicon to this 6MB one

(er, that favicon is actually larger than what current Nightly will accept, so it's not useful for stress testing anymore. I'll post a tarball with a still-usable heavier icon, and clearer methodology, over on my dupe Bug 1867161, to avoid cluttering this bug up too much.)

Let me throw out some additional ideas.
We could potentially use a page-icon: URI when we restore a tab without a defined icon. Then, if the profile is not in permanent PB mode and history is enabled, we could avoid storing the favicon. That would fetch the favicon from Places on restore. Advantages: we save a lot of bytes. Disadvantages: the risk of not having an icon increases a bit, porting the session to a different profile may miss a lot of icons (this is probably the deal breaker), and the icon may miss indicators (like a count of unread mails).
We could store the favicon's screen representation instead of the downloaded favicon data, potentially in WebP format, if that saves space. Advantages: we store smaller optimized images. Disadvantage: we must encode an image per tab.

(In reply to Daniel Holbert [:dholbert] from comment #28)

For what it's worth, I did a mozregression investigation and found that my STR from bug 1867161 experienced a substantial regression (from ~fine to current levels of badness) way back in 2018.

My STR: open the 500 tabs that all use the 200KB bugzilla.m.o favicon, served by a local web server; wait for those tabs to finish loading and for web requests to stop; then, in a https://paste.mozilla.org tab, hold down the a key and watch the character repeat indefinitely to fill a textbox. Watch for half-second jank.

Good: 2018-10-05
Bad: 2018-10-06
Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=54cb6a2f028b033ef567f00af2f82f5fb97ab437&tochange=07c609fc8eb89140f602ef0c838b900c2964287a

  • Before this range, these STR produce occasional small hiccups but nothing that seems particularly janky, and often I can hold down a for tens of seconds without any noticeable pausing.
  • After this range, there's a very-noticeable half-second jank every 15s (as in current nightly)

Adding more "stress testing" to my STR: if I switch from the 200KB favicon to this 6MB one, then, when I'm mashing my a key, the "good" build sees ~3s jank every 15 seconds, whereas the "bad" build janks for ~30s (so the periodic reserialization is ~10x worse).

I'm not sure what in there would be responsible; maybe it was Bug 1493150, which changed string deserialization? (apparently an optimization, but maybe this particular setup is a perf cliff in the new optimized world)

Let's ask Tom. :-)

Flags: needinfo?(tschuster)

The linked bug does look suspicious, but structured cloning isn't used for JSON.stringify. Even InlineCharBuffer doesn't seem to be used directly at least. It's still possible, but it's not a totally obvious connection.
The only thing I could think of right now is that we don't deflate the string (turn 16bit -> 8bit) anymore when reading the structured clone data, and we might have to inflate the buffer used to hold the serialized JSON string when quoting a string (coming from structured cloning somehow) in JSON: https://searchfox.org/mozilla-central/rev/033cc91cd088a992ab8728bab9821624cee62ce5/js/src/builtin/JSON.cpp#177-179. (Somebody should check if that happens often.) The profile in bug 1867161 comment 4 at least points to that function, but it's also quite possible that the data we are serializing is just mainly strings (e.g. URLs).

I am going to needinfo Steve, because he worked on optimizing JSON.stringify recently (bug 1837410).

Flags: needinfo?(tschuster) → needinfo?(sphink)

The only thing I could think of right now is that we don't deflate the string (turn 16bit -> 8bit) ..

I just noticed that we have variants for encoding 8bit and 16bit strings in structured data, that makes that explanation a bit more unlikely.

My fear was that this was serializing most of the string using the new fast path, then bailing out and redoing it with the slower path, but that's not happening—it's staying completely in the fast path in the profile above.

Eliminating all of the memory copying looks like it could potentially double the speed here, which doesn't solve the architectural issues but would certainly be nice. It looks like half the memory copying is when building the JSON string, and half when copying it into the nsAString result. Perhaps the latter could be eliminated by having the result Adopt the JSON string data? Unless that string has stuff before and after the main JSON data, but that seems relatively easy to work around.

Eliminating copying while building the string would be more invasive. I can think of two approaches: (1) pass in a callback for writing the output immediately, or (2) have a mode that constructs a nested rope instead of a string, and then add an interface or interfaces for traversing a rope. (2) would not automatically deduplicate common strings, although there would be a natural place to inject a deduplication lookup if desired. Also, if the majority of the strings encountered require no escaping, we could probably accelerate both approaches with an optimistic scan for chars that need escaping (since currently anything that requires quoting will copy the data at least once).

(In reply to :Gijs (he/him) from comment #21)

Even if we "just" wrote chunks of "these tabs in these windows have updated" in sequential files, that'd be an improvement - we could periodically reconcile the set of files entirely off-main-thread (though we'd have to be careful about sequencing that with new writes), as a kind of poor man's "database" - as then all the JSON reading, parsing and writing could live OMT

Heh, sounds like a DB's WAL. Maybe this is what you're talking about, but to me this suggests: have a worker thread that maintains a consistent snapshot of a JS-only object graph and periodically serializes it to JSON and writes it to disk. Ship updates (the WAL) from the main thread to this worker thread. Is it really possible to capture all updates to the object graph? Maybe I misunderstood what updates you were talking about—was it the actual changes to the object graph, or just a list of tabs that have updated?

Flags: needinfo?(sphink)
See Also: 1866237

I was able to capture a profile of my STR before and after the comment 28 regression (more details in bug 1867161 comment 3), by profiling using mstange's samply tool (I haven't been able to get the built-in Gecko profiler to work in those Nightlies yet, but samply bypasses those issues). Due to using an external profiler, the profiles have no markers and no JS function names, but we do have C++ symbols, which is pretty good in this case, given that the suspected regressor is a C++ JS-engine change (bug 1493150).

"good" 2018-10-05: https://share.firefox.dev/3t8PE8v
"bad" 2018-10-06: https://share.firefox.dev/419tVtD

Both profiles are focused only on samples in IdleRun since that's the caller of this periodic sessionstore-serialization code. Both profiles show three serialization operations, separated by 15 seconds. In the "before" profile, each operation is ~1s long (3072 samples total). In the "after" profile, each operation is ~7s long (21010 samples total).

Maybe by comparing them we can see some clues about how bug 1493150 might've regressed this use-case?
[EDITED to fix bug number copypaste error, now referencing bug 1493150]

Flags: needinfo?(evilpies)

Both of my prev. comment's profiles spend most of the time in JS_WriteStructuredClone, but much more time there in the "bad" vs. the "good" profile (~18s vs ~1.5s).

One superficial difference that I'm noticing when comparing them: the "bad" profile seems to be spending a higher proportion of its JS_WriteStructuredClone time inside of mozilla::BufferList::AllocateBytes. It spends 5354/17936 samples or roughly 30% of the time, vs. the "good" profile spends 118/1526 samples or roughly 8% of the time (dividing AllocateBytes sample-count by JS_WriteStructuredClone sample-count).

(In reply to Steve Fink [:sfink] [:s:] from comment #34)

(In reply to :Gijs (he/him) from comment #21)

Even if we "just" wrote chunks of "these tabs in these windows have updated" in sequential files, that'd be an improvement - we could periodically reconcile the set of files entirely off-main-thread (though we'd have to be careful about sequencing that with new writes), as a kind of poor man's "database" - as then all the JSON reading, parsing and writing could live OMT

Heh, sounds like a DB's WAL. Maybe this is what you're talking about,

I mean, it's a WAL built out of JSON. If I was a cleverer person I'd give it a snappy name and publish it on the web as a good idea.

To be clear, it's not actually a good idea. It's a terrible idea, it's just slightly less terrible than the thought of completely redesigning the storage system behind session restore, esp. when an RDBMS isn't a good fit for it per Marco's comment. We could potentially investigate a different storage mechanism that is closer to what we do... but our hands have been burned lots of times in the "the storage model for this data is dumb, let's change it" area (mentat, the new cert/psm store thingy, I think some localstorage stuff, the new xulstore / rkv stuff, etc. etc.)
It doesn't sound like there's an obvious, straightforward, small, reliable, license-compatible etc. etc. alternative here. And sticking with JSON helps with the feasibility of implementing it in comparatively limited time etc. vs the time required to make sure that swapping it out entirely for millions upon millions of people isn't going to lose them mission-critical stuff.

but to me this suggests: have a worker thread that maintains a consistent snapshot of a JS-only object graph and periodically serializes it to JSON and writes it to disk. Ship updates (the WAL) from the main thread to this worker thread. Is it really possible to capture all updates to the object graph? Maybe I misunderstood what updates you were talking about—was it the actual changes to the object graph, or just a list of tabs that have updated?

I'm not sure I fully understand the distinction you're drawing. But here are some notes, and maybe this helps and maybe not, in which case feel free to re-ask:

  • writing giant JSON files remains a bad idea, even if we update how we generate the giant JSON files. It's bad for SSD writes and all sorts. See also bug 1304389.
  • there are various way session state changes. Tabs open, close, navigate, move between windows, scroll, update cookies or form data, etc. etc. I honestly don't know how costly it would be to constantly communicate all this to a worker. Logically though, it can't be worse than copying the whole graph to a separate thread every 15s as it's necessarily a superset of the information. And logically, if the worker needs to do the writing, we'll need to communicate the entire delta, not just "this tab changed, figure it out".
  • From a memory PoV it seems inefficient to have a constant-in-memory duplicate of all session restore data. But perhaps we could drop the mainthread copy that session restore keeps at the moment, if the relevant APIs are or can be async, so we can fetch the information from the worker.
  • The unfortunate thing is that JS workers don't generally have access to things we want - not even utility libraries or Services or any other xpcom stuff. OS.File reimplemented require for workers and effectively eval'd a bunch of stuff in order to do this. I have to believe there's a better way to do that, but it means that a priori, "use a worker from JS" also does not fill me with the joys of spring - that's the bad thing we did before that we ripped out because the perf was horrible!

So it's possible bug 1493150 is unrelated, actually -- I just found something else interesting when comparing the actual sessionstore contents from the "good" vs "bad" build -- their sessionstore-backups/recovery.jsonlz4 files are actually substantially different sizes. So this might not be a platform JS regression after all (i.e. bug 1493150 might be irrelevant).

The "good" recovery.jsonlz4 is 8.6M while the "bad" one is 21MB. The extracted JSON is on the same order of magnitude as the compressed data (9.6MB for "good" vs 23MB for "bad"); the bulk of the file is already-compressed PNG data, so lz4 doesn't do much.

In both sessionstore files, most of the 500 tabs are actually shown with "image": null! The key difference is that in the "good" one, 5 of the tabs have "image": [bugzilla favicon data URI], whereas in the "bad" one, 12 of them do (over twice as many).

If I search-and-replace to remove that specific data URI from the sessionstore JSON, then both JSON files end up being ~350KB, roughly the same size and quite small/easy-to-serialize.

So it's really the decision about whether to serialize 5 vs. 12 repetitions of that data URI that's responsible for the size difference there, and that's quite possibly partly or entirely responsible for the regression that I'm seeing in comment 28.

(Note: the bugzilla favicon that I'm using in this experiment is a little under 1.9MB - I artificially scaled it up to make this bug easier to see (and I'm hosting it on a local web server). So that's why 5 repetitions * 1.9MB produce roughly 9.5MB of JSON output, vs. 12 * 1.9MB producing 22.8MB of JSON output.)

SO: The question then is, what push made us go from serializing 5 vs. 12 copies of the favicon, in the regression pushlog? My first guess is that it's from Bug 1470280 - Increase process count to 8 on Nightly (previously 4 processes). 8 isn't quite 12, but that probably lets us load more (maybe 12 for some reason) copies of the background-tab? And all the ones that are "dormant" get serialized with "image": null in the JSON, I guess?

(In reply to Daniel Holbert [:dholbert] from comment #38)

SO: The question then is, what push made us go from serializing 5 vs. 12 copies of the favicon, in the regression pushlog? My first guess is that it's from Bug 1470280 - Increase process count to 8 on Nightly (previously 4 processes). 8 isn't quite 12, but that probably lets us load more (maybe 12 for some reason) copies of the background-tab? And all the ones that are "dormant" get serialized with "image": null in the JSON, I guess?

IIUC we are looking at a pre-Fission Firefox here (from 2018), and IIRC we set aside ("recycle") only one extra process in e10s. But I might be overlooking something; in 2018 I was not yet looking at that code at all... In any case, what you describe sounds like deduplicating the favicons could reduce the data significantly here (maybe even in memory, before serializing anything)?

Possible correction to my theorizing on "what made us go from serializing 5 vs. 12 copies of the favicon, in the regression pushlog?"

Even in the "good" 2018-10-05 build, if I hold down ctrl+PageDown to view every tab, my recovery.jsonlz4 "only" increases to 39MB with 21 copies of the data URI -- not 500 copies -- and the periodic idle-reserialize time doesn't change, somehow. In my "bad" build, I wasn't able to get a recovery.jsonlz4 file yet; Firefox may have crashed at shutdown while regenerating it, I think (?)

Here are profiles of the good/bad builds with that setup (all of the 500 tabs having been viewed)
"good" 2018-10-05: https://share.firefox.dev/481K8U1 (1s per serialization)
"bad" 2018-10-06: https://share.firefox.dev/3Rojzmx (7s per serialization)

The durations look roughly the same as in comment 35 (where I hadn't viewed every tab). So how-many-tabs-have-been-viewed-or-loaded-via-some-processCount-based-minimum doesn't seem to be a relevant difference after all.

Also: If I run the "good" build with an explicit processCount of 8 set up-front in prefs.js (and confirm 8 content processes shown in a profile), that doesn't seem to change its metrics; its serialize operation is still ~1s.

So comment 38 and the process-how-many-tabs-have-you-viewed metric may not actually be relevant, sorry.

(In reply to Jens Stutte [:jstutte] from comment #39)

In any case what you describe sounds like deduplicating the favicons could reduce the data significantly here (maybe even in memory before serializing anything) ?

That's very likely true, yeah. That's bug 1867137.

Both of my prev. comment's profiles spend most of the time in JS_WriteStructuredClone, but much more time there in the "bad" vs. the "good" profile (~18s vs ~1.5s).

Weird. My patch modified reading, not writing, structured clone data.

Flags: needinfo?(evilpies)

(In reply to Daniel Holbert [:dholbert] from comment #40)

Possible correction to my theorizing on "what made us go from serializing 5 vs. 12 copies of the favicon, in the regression pushlog?"

Even in the "good" 2018-10-05 build, if I hold down ctrl+PageDown to view every tab, my recovery.jsonlz4 "only" increases to 39MB with 21 copies of the data URI -- not 500 copies -- and the periodic idle-reserialize time doesn't change, somehow. In my "bad", build, I wasn't able to get a recovery.jsonlz4 file yet; Firefox may have crashed at shutdown while regenerating it, I think (?)

Here are profiles of the good/bad builds with that setup (all of the 500 tabs having been viewed)
"good" 2018-10-05: https://share.firefox.dev/481K8U1 (1s per serialization)
"bad" 2018-10-06: https://share.firefox.dev/3Rojzmx (7s per serialization)

Do these profiles still correspond to files that are different sizes, or are these now writing similarly-sized files? Any chance something like https://www.jsondiff.com/ or piping through jq locally can identify differences between the two files? (even a similar-ish size may have different processing times if there are different kinds of objects nested in them or something)

Also, can you elaborate on what you mean by:

the periodic idle-reserialize time doesn't change, somehow

? Are you saying the "good" case isn't writing to disk every 15s or so but the "bad" case is ?

Also, are there any new tab (blank) pages in there? There appear to have been some activity stream / new tab changes in the pushlog you linked. Replacing them with about:blank may be helpful to isolate if that's related (thinking that if those are, for whatever reason, tripping the "something changed" session restore hook in the bad case, but not in the good case, that could potentially explain the jankiness).

Flags: needinfo?(dholbert)

Based on bug 1870100, SessionStore also seems to slow down window switching :/
Not blocking the next paint when switching windows should be a relatively easy fix I would hope, so that might be worth having a separate bug for. Edit: Or we can just repurpose bug 1870100.

See Also: → 1870245
No longer blocks: 1870100
See Also: → 1546847

(In reply to :Gijs (he/him) from comment #42)

(In reply to Daniel Holbert [:dholbert] from comment #40)

[...] in the "good" 2018-10-05 build, if I hold down ctrl+PageDown to view every tab, my recovery.jsonlz4 "only" increases to 39MB with 21 copies of the data URI -- not 500 copies -- and the periodic idle-reserialize time doesn't change, somehow. In my "bad", build, I wasn't able to get a recovery.jsonlz4 file yet; Firefox may have crashed at shutdown while regenerating it, I think (?)

Do these profiles still correspond to files that are different sizes, or are these now writing similarly-sized files? Any chance something like https://www.jsondiff.com/ or piping through jq locally can identify differences between the two files? (even a similar-ish size may have different processing times if there are different kinds of objects nested in them or something)

As noted there, in the "bad" build, I thought I wasn't getting any recovery.jsonlz4 file after I viewed all tabs. I think it was really just 4KB, which I thought was unusually small, so I assumed it was corrupt/truncated. Trying again today, I'm still seeing that same behavior.

After viewing all the tabs with Ctrl+PageDown (as a modification to the steps in bug 1867161 comment 3), the "Good" 2018-10-05 build gets me a 31MB recovery.jsonlz4 file (34MB when extracted), while the "bad" build gets me a 4KB recovery.jsonlz4 file (72KB when extracted). The "good" build's JSON has 18 copies of the bugzilla favicon data URI, and the bad build has zero copies. If I use search-and-replace to replace all of those data-URIs with just the string data-uri, then the good build's recovery JSON drops to 308KB, with each saved tab looking like this:

{"entries": [{"url": "http://localhost:8999/", "title": "http://localhost:8999/", "cacheKey": 0, "ID": 1, "docshellUUID": "{54a7828e-a8a7-4953-ab08-38383ac5a886}", "originalURI": "http://localhost:8999/", "resultPrincipalURI": null, "principalToInherit_base64": "vQZuXxRvRHKDMXv9BbHtkAAAAAAAAAAAwAAAAAAAAEYAAAA4bW96LW51bGxwcmluY2lwYWw6ezdiMjhhMzc2LWE5YWQtNDgwZi1iMWFhLTRiYmU0MWNhYmJhOH0AAAAA", "triggeringPrincipal_base64": "SmIS26zLEdO3ZQBgsLbOywAAAAAAAAAAwAAAAAAAAEY=", "docIdentifier": 1, "persist": true}], "lastAccessed": 1702672395485, "hidden": false, "attributes": {}, "userContextId": 0, "index": 1, "image": "data-uri"}, 

(with image:null for most of the entries, but 18 having "data-uri" which was originally the full data URI)

The "Bad" build's JSON has each saved tab instead looking like this:

{"entries": [], "lastAccessed": 1702672623672, "hidden": false, "attributes": {}, "image": null, "userTypedValue": "http://localhost:8999", "userTypedClear": 1}, 

On another attempt where I didn't view every tab, the "Bad" build's JSON was a bit bigger (208KB when extracted) and it looked more similar to the "good" builds output, but still without any data URIs:

{"entries": [{"url": "http://localhost:8999/", "title": "http://localhost:8999/", "cacheKey": 0, "ID": 2, "docshellUUID": "{825d8691-cf43-4dfa-8def-3c0aa1bf7173}", "originalURI": "http://localhost:8999/", "resultPrincipalURI": null, "principalToInherit_base64": "vQZuXxRvRHKDMXv9BbHtkAAAAAAAAAAAwAAAAAAAAEYAAAA4bW96LW51bGxwcmluY2lwYWw6e2M5NThlZjk0LTgxMDItNDBkNi05Mzg0LTViNzdlY2IzNGI5Zn0AAAAA", "triggeringPrincipal_base64": "SmIS26zLEdO3ZQBgsLbOywAAAAAAAAAAwAAAAAAAAEY=", "docIdentifier": 2, "persist": true}], "lastAccessed": 1702674579980, "hidden": false, "attributes": {}, "userContextId": 0, "index": 1, "image": null}, 

I'm also not entirely sure these recovery.jsonlz4 files are really indicative of what we're spending our time on here, though, because (a) their last-modified timestamp (shown with ls -l) seems to correspond to a time close to startup, rather than a later time, even if I let Firefox run for a few minutes while I'm testing; and (b) I don't see paste.mozilla.org in there at all for some reason, even though that's one of my open tabs and that's where I'm doing the typing that is prompting Firefox to reserialize its sessionstore data.

Also, can you elaborate on what you mean by:

the periodic idle-reserialize time doesn't change, somehow
Are you saying the "good" case isn't writing to disk every 15s or so, but the "bad" case is?

No, I meant they're still both writing to disk every 15s, with the same durations as noted in comment 35. Every 15s of typing, the "good" build is still janking for ~1s and the bad build is still janking for ~7s. (I only mentioned this as being notable because I was expecting that the durations might change if I viewed all the tabs; but it seems viewing the tabs doesn't make a difference to the jank durations.)

Also, are there any new tab (blank) pages in there? There appear to have been some activity stream / new tab changes in the pushlog you linked. Replacing them with about:blank may be helpful to isolate if that's related (thinking that if those are, for whatever reason, tripping the "something changed" session restore hook in the bad case, but not in the good case, that could potentially explain the jankiness).

No, there aren't any new-tab pages in my list of open tabs (blank or otherwise).

Flags: needinfo?(dholbert)
See Also: → 464350
Duplicate of this bug: 1759085
Duplicate of this bug: 1827433
See Also: → 1876766

This happens every few months when they put out an update, and it lasts a month or so until another update somehow fixes it, until they end up breaking it again.

Duplicate of this bug: 1876783
No longer duplicate of this bug: 1876783
See Also: 1876766
Duplicate of this bug: 1821535

I duped my bug, which has a profile: https://share.firefox.dev/3J3EZQw

There was a lot of older work and discussion of options ~13 years ago, especially in bug 669603 and bug 669034 ("[meta] Re-architect session restore to avoid periodic freezes"). Perhaps some of those ideas might still be interesting/useful.

Flags: needinfo?(sclements)

Here is another profile: https://share.firefox.dev/3KwJ80k
It records 76 seconds of idling and includes 4 attempts to serialize the whole session state, each taking a bit over half a second to complete. Despite the idling, the file is written every 15 seconds with nothing but the lastUpdate timestamp changing. It seems feasible to optimize for this case: avoid writing if the state is known not to have changed.
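To make that concrete, here is a minimal sketch of the "skip the write when nothing changed" idea. The dirty flag and the helper names are hypothetical, not anything that exists in SessionStore today, and the compress option on IOUtils.writeJSON is assumed to match what the session writer already uses:

// Hypothetical sketch: mutations mark the session dirty, and the periodic
// flush becomes a no-op while the session is idle, instead of re-serializing
// and rewriting an unchanged state every 15 seconds.
let sessionDirty = false;

function markSessionDirty() {
  sessionDirty = true;
}

async function periodicFlush(path, collectState) {
  if (!sessionDirty) {
    // Nothing changed since the last write: skip state collection, JSON
    // serialization, compression and the disk write entirely.
    return;
  }
  sessionDirty = false;
  const state = collectState(); // build the (large) state object only when needed
  await IOUtils.writeJSON(path, state, { compress: true });
}

The catch is deciding what counts as "dirty": if lastUpdate is bumped unconditionally on every cycle, it has to be excluded from that check.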

The session covers 2666 tabs (7624 entries), mostly in one window.
Compressed recovery.jsonlz4 file is 20 MB.
Uncompressed it is 96 MB (with https://github.com/avih/dejsonlz4 ).

After decompressing and pretty-printing sessionstore-backups/recovery.jsonlz4 and recovery.baklz4 and then diffing them, the only difference is the lastUpdate field, generated at https://searchfox.org/mozilla-central/rev/dc08a7321a22918d5a26b7641f9b10cd2a09d98e/browser/components/sessionstore/SessionStore.sys.mjs#4663

Example commands:

# decompress and pretty-print the current session file
dejsonlz4 recovery.jsonlz4 rec.json
jq . rec.json > pretty.rec.json

# same for the previous backup
dejsonlz4 recovery.baklz4 bak.json
jq . bak.json > pretty.bak.json

# compare the two
diff pretty.rec.json pretty.bak.json

Output:

227640c227640
<     "lastUpdate": 1717413670478,
---
>     "lastUpdate": 1717413686335,

(In reply to Rob Wu [:robwu] from comment #51)

Here is another profile: https://share.firefox.dev/3KwJ80k
Despite idling, the file is written every 15 seconds while only changing the lastUpdate timestamp. It seems that it should be feasible to optimize for this case: avoid writing if the state is known to not have changed.

In this profile, the unloadNonRequiredTabs/this.unloadTimer<[AsyncTabSwitcher.sys.mjs]:JS timer fires every 300ms (except when there's jank caused by this bug); I don't think that's the intended behavior. https://share.firefox.dev/3Vrt6LR

I think this means https://searchfox.org/mozilla-central/rev/dc08a7321a22918d5a26b7641f9b10cd2a09d98e/browser/components/tabbrowser/AsyncTabSwitcher.sys.mjs#756 is always true in your browsing session. If you have steps to reproduce, I would recommend filing a separate bug.

(In reply to Florian Quèze [:florian] from comment #52)

I think this means https://searchfox.org/mozilla-central/rev/dc08a7321a22918d5a26b7641f9b10cd2a09d98e/browser/components/tabbrowser/AsyncTabSwitcher.sys.mjs#756 is always true in your browsing session. If you have steps to reproduce, I would recommend filing a separate bug.

I filed bug 1900989 with the details that I have.

See Also: → 1304389

(In reply to Marco Bonardo [:mak] from comment #22)

Sqlite is a possibility, but it feels again like we put all our eggs into the same basket for lack of better options.
My concerns are around:

  • detecting corruption: we'd need to notice corruption on reads and writes and act accordingly; not impossible, but it's a project in itself.

I'm not sure why you would like to detect corruption in a SQLite database; the database guarantees Consistency in case of crash. Only system problems (RAM, disk) or SQLite implementation bugs would cause corruption that might render the file unreadable.
In any case, SQLite has an extension that adds a checksum to every page: https://sqlite.org/cksumvfs.html

Do you really need a graph db for listing tabs? A properly normalized schema should be enough for storing many lists of URLs (each tab with its list of history). Maybe I'm missing something, since I don't know the internals of the jsonlz4 file.

  • in most cases we read the whole data on startup, which doesn't seem to be exactly what an RDBMS is optimized for.

Indeed, the CPU time might be higher for reading and reassembling the whole normalized DB. Still, 10K rows (the number of tabs the OP has) is a very small table by today's standards, and the schema could be optimized to keep the history of each tab in a separate table and not read it until accessed.
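As a rough sketch of that normalization (the table and column names are invented for illustration, and the Sqlite.sys.mjs openConnection/execute calls are my assumption about the wrapper API, not code from this bug):

const { Sqlite } = ChromeUtils.importESModule("resource://gre/modules/Sqlite.sys.mjs");
const conn = await Sqlite.openConnection({ path: "sessionstore.sqlite" });

// One small row per tab; the bulky per-entry payload lives in a separate table.
await conn.execute(`
  CREATE TABLE IF NOT EXISTS tabs (
    id INTEGER PRIMARY KEY,
    window_id INTEGER NOT NULL,
    position INTEGER NOT NULL,
    last_accessed INTEGER
  )`);
await conn.execute(`
  CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    tab_id INTEGER NOT NULL REFERENCES tabs(id),
    idx INTEGER NOT NULL,
    url TEXT NOT NULL,
    title TEXT
  )`);

// Restoring the tab strip only touches the small table...
const tabs = await conn.execute(
  "SELECT id, position, last_accessed FROM tabs WHERE window_id = :win ORDER BY position",
  { win: 1 });

// ...while a tab's history is read only when that tab is actually selected.
const history = await conn.execute(
  "SELECT idx, url, title FROM entries WHERE tab_id = :tab ORDER BY idx",
  { tab: tabs[0].getResultByName("id") });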

This may be a huge effort, as we'd have to rewrite everything around an async API. I'd indeed avoid concurrent synchronous and asynchronous access to the db, for many reasons that are still causing us headaches.

This is a valid argument that I can't really dispute. Mozilla has tons of experience from the Places DB, so you know better whether you want to use SQLite for more. I just want to mention that a SQLite database configured to be in WAL mode and with PRAGMA synchronous=NORMAL guarantees ACI (no D) semantics, which means it doesn't fsync() on every change, so delays are minimal, but you might lose a few (configurable) seconds of changes in a hard crash.
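For reference, that configuration is just standard SQLite pragmas, shown here through the same assumed connection wrapper as in the sketch above:

// Write-ahead logging plus relaxed durability: commits no longer fsync on
// every write, so a power loss can drop the last few transactions, but the
// database itself stays consistent.
await conn.execute("PRAGMA journal_mode = WAL");
await conn.execute("PRAGMA synchronous = NORMAL");
// Optionally bound how much un-checkpointed WAL can accumulate (in pages).
await conn.execute("PRAGMA wal_autocheckpoint = 512");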

Is memory usage a concern?

If we also care about reducing memory usage (stop having a memory object), we need a graph on disk to:

  • read the whole graph to restore everything
  • read part of the graph to restore something
  • write part of the graph to update something

In fact, an advanced sessionstore implementation with a SQLite backend can reduce memory usage by keeping things in separate tables and not reading everything from SQLite into memory, but only reading what is needed for displaying the windows, and reading/writing small chunks as the user clicks around.

In my view such an implementation would solve the issue in this ticket (frequent hangs while serializing the world) and other issues like I/O intensity and memory usage. I agree it's a huge undertaking though. :-)
Plugging bug 937651 for re-evaluation.

(In reply to Dimitrios Apostolou from comment #54)

Do you really need a graph db for listing tabs? A properly normalized schema should be enough for storing many lists of URLs (each tab with its list of history). Maybe I'm missing something, since I don't know the internals of the jsonlz4 file.

Yeah, the session store file has state for the open windows, state for all the tabs in those windows, icons, scroll positions, form state, history, cookies, session storage, etc. It's quite a deep graph and much, much more than the array of URLs per tab you are imagining.

I filed bug 1905958 for creating the schema we've discussed here. I'm not sure if that necessarily blocks any progress on this bug - though it would undoubtedly be useful to have - so I've just made it a see-also for now.

See Also: → 1905958

(In reply to Dimitrios Apostolou from comment #54)

I'm not sure why you would like to detect corruption in a SQLite database; the database guarantees Consistency in case of crash.

Yes, as long as:

  1. memory is sane (which today is not a given, because everyone loves to overclock memory)
  2. the filesystem doesn't lie (unfortunately we have evidence that some do)
  3. the disk is sane

SQLite in itself is totally fine, but we have clear evidence that databases do get corrupted, just not often. Regardless, we don't want to leave users with a broken experience, even if it's just a few hundred of them.
An additional periodic backup, plus corruption callbacks, would likely help there.
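A hedged sketch of what that could look like: PRAGMA integrity_check and VACUUM INTO are standard SQLite, while the helper name, the backup file name and the row accessor are my assumptions for illustration.

// Hypothetical periodic maintenance task: verify the database and refresh a
// backup copy while it is known to be good.
async function checkAndBackup(conn) {
  const rows = await conn.execute("PRAGMA integrity_check");
  const ok = rows.length === 1 && rows[0].getResultByIndex(0) === "ok";
  if (!ok) {
    // Corruption callback: let the caller fall back to the last known-good
    // backup instead of leaving the user with a broken session.
    return false;
  }
  // VACUUM INTO (SQLite 3.27+) writes a compact, consistent copy of the database.
  await conn.execute("VACUUM INTO 'sessionstore.backup.sqlite'");
  return true;
}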

In fact, an advanced sessionstore implementation with a SQLite backend can reduce memory usage by keeping things in separate tables and not reading everything from SQLite into memory.

It is a possibility, but it doesn't come for free out of the box. We can surely configure a connection to use the smallest possible amount of memory, or even use mmap (though we'd need to intercept I/O error signals). And we should try to avoid having the data repeated both in a live JS object and in SQLite memory pages, or we'd double memory use.

In my view such an implementation would solve the issue in this ticket (frequent hangs while serializing the world) and other issues like I/O intensity and memory usage. I agree it's a huge undertaking though. :-)

I think a first step in the right direction would be to ensure all reads and writes to/from session store are asynchronous; that would make it easier to replace the underlying storage, which then could no longer be just a plain JS object. I don't know the session store code well enough to tell how easy that would be.

I was recently pondering using SQLite as a way to offload JSON management. We could basically load the JSON into an in-memory database, read/write single values or parts of the structure using its JSON functions, and periodically ask SQLite to write the JSON (or JSONB) back to disk. If we could do all reads and writes async, we'd offload everything to the storage helper thread. These are just some rough thoughts without a full plan in mind. Of course it's again "abusing" a tool for something it was not exactly built for.
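To illustrate the shape of that idea: json_set, json_extract and json() are standard SQLite JSON1 functions, while the in-memory connection, the single-row table, and initialSessionState (standing in for however the current state object is obtained) are assumptions, not a proposal of an actual API.

// Load the session JSON once into an in-memory database (assuming the wrapper
// can open a ":memory:" connection).
const mem = await Sqlite.openConnection({ path: ":memory:" });
await mem.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, doc TEXT)");
await mem.execute("INSERT INTO session (id, doc) VALUES (1, :json)",
                  { json: JSON.stringify(initialSessionState) });

// Update a single value in place instead of re-stringifying the whole object...
await mem.execute(
  "UPDATE session SET doc = json_set(doc, '$.session.lastUpdate', :now) WHERE id = 1",
  { now: Date.now() });

// ...and read back just the fragment that is needed.
const fragment = await mem.execute(
  "SELECT json_extract(doc, '$.windows[0].tabs[0].entries') AS entries FROM session WHERE id = 1");

// Periodically ask SQLite for the full text and hand it to the async writer.
const [row] = await mem.execute("SELECT json(doc) AS doc FROM session WHERE id = 1");
await IOUtils.writeUTF8("sessionstore.json", row.getResultByName("doc"));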

Thank you for keeping this a constructive discussion.

(In reply to Marco Bonardo [:mak] from comment #57)

I think a first step in the right direction would be to ensure all reads and writes to/from session store are asynchronous; that would make it easier to replace the underlying storage, which then could no longer be just a plain JS object. I don't know the session store code well enough to tell how easy that would be.

I believe all the reads are async and all the writes are, too.

(In reply to :Gijs (he/him) from comment #58)

I believe all the reads are async and all the writes are, too.

That's reading and writing the whole object; I was also thinking about reads/writes of single values or parts, as the costs here seem to be in generating the representation rather than just storing it to disk.

I don't think this was mentioned in this thread, but this also causes YouTube to stutter while saving my 41 MB session file.
