[Session Restore] Split collect and write across distinct ticks

Status: RESOLVED WONTFIX (defect)
Opened: 5 years ago
Closed: 4 years ago

People: Reporter: Yoric; Assignee: Unassigned

Keywords: perf
Version: unspecified
Points: 2
Bug Flags: firefox-backlog +, qe-verify ?
Firefox Tracking Flags: (Not tracked)

Attachments: 1 attachment, 1 obsolete attachment

At the moment, SessionSaver:
1. collects data;
2. serializes and writes it.

Telemetry tells us (95th percentile) that step 1 can take 30ms+, while step 2 can take 20ms+. Each step, taken independently, is ok, but since we execute both steps immediately after each other, this causes jank.

We should ensure that these steps are executed across distinct ticks, to reduce jank.
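For illustration only, a minimal sketch of what splitting across ticks could look like; collectState and writeState are hypothetical stand-ins for the real SessionStore/SessionFile entry points, not the actual SessionSaver code:

const { utils: Cu } = Components;
Cu.import("resource://gre/modules/Task.jsm");

// Illustrative sketch, assuming hypothetical collectState/writeState helpers.
function saveStateAcrossTicks(collectState, writeState) {
  return Task.spawn(function* () {
    // First tick: collect the session data on the main thread.
    let state = collectState();

    // Give the event loop a chance to breathe before the expensive
    // serialize + write step. A resolved promise resumes us later; a real
    // patch might prefer a timer or dispatch to guarantee a full tick.
    yield Promise.resolve();

    // Later tick: serialize and hand the data to the worker for writing.
    yield writeState(state);
  });
}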
Blocks: fxdesktopbacklog
No longer blocks: fxdesktoptriage
Whiteboard: p=0
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Whiteboard: p=0 → p=2
Posted patch Draft (obsolete) — Splinter Review
Here's a first draft. Do we need to add some locking to make it play nicely with async shutdown?
Assignee: nobody → dteller
Attachment #8417420 - Flags: feedback?(ttaubert)
I slightly reorganized SessionSaver.run() so that:
- collecting, serializing and writing are split across three ticks;
- we now use AsyncShutdown to ensure that _saveState cannot be interrupted by shutdown (see the sketch below), which will give us more breathing space should we decide to split further.

A natural followup would be to take advantage of this asynchronicity to reduce the duration of the longest serialization/communication operation to ~0.
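For reference, a rough sketch of the AsyncShutdown part mentioned above; the blocker name and the currentSave bookkeeping are illustrative and not the actual patch:

const { utils: Cu } = Components;
Cu.import("resource://gre/modules/AsyncShutdown.jsm");

// Track the save currently in flight, if any (illustrative).
let currentSave = null;

function runSave(doSaveState) {
  currentSave = doSaveState().then(
    () => { currentSave = null; },
    () => { currentSave = null; }
  );
  return currentSave;
}

// Block shutdown until any in-flight save has completed, so a save that is
// split across several ticks cannot be cut short by shutdown.
AsyncShutdown.profileBeforeChange.addBlocker(
  "SessionSaver: finish writing the session state",
  () => currentSave
);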
Attachment #8417420 - Attachment is obsolete: true
Attachment #8417420 - Flags: feedback?(ttaubert)
Attachment #8421677 - Flags: review?(ttaubert)
Keywords: perf
Comment on attachment 8421677 [details] [diff] [review]
Split collect, serialize and write across three ticks.

Review of attachment 8421677 [details] [diff] [review]:
-----------------------------------------------------------------

David, thank you for working on that. Telemetry however tells us that collecting data seems to be the biggest offender, as well as serializing it to JSON. I wonder, why are we trying to come up with band-aid fixes instead of concentrating on sending differential updates to the worker? That would certainly lower the serialization overhead (or we just stop doing that on the main thread), serialize the whole state OMT and get rid of explicit state collection. And we want/need that for journaled storage in the future anyway.

::: browser/components/sessionstore/src/SessionFile.jsm
@@ +164,5 @@
>        // write instructions.
>        isFinalWrite = this._isClosed = true;
>      }
>  
> +    return Task.spawn(function task() {

Could as well make it |function* task() {| if we touch that line.

::: browser/components/sessionstore/src/SessionSaver.jsm
@@ +208,5 @@
>      PrivacyFilter.filterPrivateWindowsAndTabs(state);
>  
> +    // As we are not sure that we are the only ones touching `state`,
> +    // we need to remove any possible aliasing.
> +    state = Cu.cloneInto(state, {});

Cloning the whole state seems costly. Measurements in bug 972243 suggest that this may have a non-negligible overhead, esp. for big sessions.
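For illustration, the aliasing that the clone is meant to remove, mirroring the patch's Cu.cloneInto(state, {}) call (toy state, not the real session format); the concern above is that the cost of this copy grows with the size of the session:

const { utils: Cu } = Components;

let state = { windows: [{ tabs: [{ url: "about:robots" }] }] };

// Structured clone: later mutations of `state` by event handlers can no
// longer leak into the copy we are about to serialize and write.
let snapshot = Cu.cloneInto(state, {});

state.windows[0].tabs.push({ url: "about:blank" }); // snapshot is unaffected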

@@ +247,5 @@
>  
> +        stopWatchFinish("COLLECT_DATA_MS", "COLLECT_DATA_LONGEST_OP_MS");
> +
> +
> +        yield this._writeState(state);

Why can't we just call _writeState() off of the next tick if we wanted to split these steps? That means we could do that without touching/rewriting the whole function.
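Purely as a sketch of that simpler alternative (writeState stands in for SessionSaver._writeState), with the caveat raised in the reply below that, without a clone or a lock, an event handler could still mutate `state` before the deferred write runs:

// Keep collection synchronous as it is today; only push the write to a
// later turn of the event loop.
function deferWrite(state, writeState) {
  return Promise.resolve().then(() => writeState(state));
}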

@@ +307,3 @@
>     */
>    _writeState: function (state) {
> +    return Task.spawn(function* () {

Splitting _writeState() into multiple chunks seems like optimizing for the 2%.

Serializing the data [1] takes more than 29ms only for ~2% of users with Telemetry enabled. ~6% are between 12 and 29ms. This doesn't seem to be a problem for most of our users? Sending the serialized data to the worker [2] takes more than 8ms for only ~2% of Telemetry users.

[1] http://telemetry.mozilla.org/#nightly/32/FX_SESSION_RESTORE_SERIALIZE_DATA_MS
[2] http://telemetry.mozilla.org/#nightly/32/FX_SESSION_RESTORE_SEND_SERIALIZED_STATE_LONGEST_OP_MS
Attachment #8421677 - Flags: review?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #3)
> Comment on attachment 8421677 [details] [diff] [review]
> Split collect, serialize and write across three ticks.
> 
> Review of attachment 8421677 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> David, thank you for working on that. Telemetry however tells us that
> collecting data seems to be the biggest offender, as well as serializing it
> to JSON. I wonder, why are we trying to come up with band-aid fixes instead
> of concentrating on sending differential updates to the worker? That would
> certainly lower the serialization overhead (or we just stop doing that on
> the main thread), serialize the whole state OMT and get rid of explicit
> state collection. And we want/need that for journaled storage in the future
> anyway.

Well, according to the data in comment 0, we have (in the 95th percentile) ~50ms spent on collect + serialize + write. This quick fix is self-contained and can hopefully reduce the longest operation to 30-40 milliseconds (depending on the duration of Cu.cloneInto).

Sending differential updates is a much steeper task. I might be able to look at it next term, after discussing it with both you and Vladan. However, my sentiment is that landing the present bug would be kind of sufficient to remove the need to work on differential updates in the near future.

> ::: browser/components/sessionstore/src/SessionSaver.jsm
> @@ +208,5 @@
> >      PrivacyFilter.filterPrivateWindowsAndTabs(state);
> >  
> > +    // As we are not sure that we are the only ones touching `state`,
> > +    // we need to remove any possible aliasing.
> > +    state = Cu.cloneInto(state, {});
> 
> Cloning the whole state seems costly. Measurements in bug 972243 suggest
> that this may have a non-negligible overhead, esp. for big sessions.

Indeed, we win only if this is clearly faster than JSON.stringify on the same object.

> 
> @@ +247,5 @@
> >  
> > +        stopWatchFinish("COLLECT_DATA_MS", "COLLECT_DATA_LONGEST_OP_MS");
> > +
> > +
> > +        yield this._writeState(state);
> 
> Why can't we just call _writeState() off of the next tick if we wanted to
> split these steps? That means we could do that without touching/rewriting
> the whole function.

Because we have no guarantee that `state` won't be touched in the meantime by any event handler, do we?
Now, we could add a lock "don't touch any data structure while we're writing". It's just more complicated.


> @@ +307,3 @@
> >     */
> >    _writeState: function (state) {
> > +    return Task.spawn(function* () {
> 
> Splitting _writeState() into multiple chunks seems like optimizing for the
> 2%.

Certainly, the largest win is yield just after "// Collection is complete. We may now be safely interrupted." in `_saveState`. This additional cut is basically because it's free.
Flags: needinfo?(ttaubert)
To get the final word on performance, we could land this, wait for 1-2 weeks and revert it if it regresses COLLECT_DATA_LONGEST_OP & co.
(In reply to David Rajchenbach Teller [:Yoric] from comment #4)
> > > +    // As we are not sure that we are the only ones touching `state`,
> > > +    // we need to remove any possible aliasing.
> > > +    state = Cu.cloneInto(state, {});
> > 
> > Cloning the whole state seems costly. Measurements in bug 972243 suggest
> > that this may have a non-negligible overhead, esp. for big sessions.
> 
> Indeed, we win only if this is clearly faster than JSON.stringify on the
> same object.

Well, before your patch we were calling JSON.stringify() once; now we're additionally doing a structured clone. That doesn't feel better to me?

> > Why can't we just call _writeState() off of the next tick if we wanted to
> > split these steps? That means we could do that without touching/rewriting
> > the whole function.
> 
> Because we have no guarantee that `state` won't be touched in the meantime
> by any event handler, do we?
> Now, we could add a lock "don't touch any data structure while we're
> writing". It's just more complicated.

Yeah, no locks, please :)

> Certainly, the largest win is yield just after "// Collection is complete.
> We may now be safely interrupted." in `_saveState`. This additional cut is
> basically because it's free.

It doesn't feel free to me judging by how much code we have to touch for that. Looks more complicated too.
Flags: needinfo?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #6)
> Well, before your patch we were calling JSON.stringify() once; now we're
> additionally doing a structured clone. That doesn't feel better to me?

Well, the hope is that Cu.cloneInto is faster than JSON.stringify. That needs to be proved, of course. Alternatively, I have a (quite hackish) idea to lock things until we have finished cloning asynchronously.

> > Certainly, the largest win is yield just after "// Collection is complete.
> > We may now be safely interrupted." in `_saveState`. This additional cut is
> > basically because it's free.
> 
> It doesn't feel free to me judging by how much code we have to touch for
> that. Looks more complicated too.

Well, the big change is putting everything in a `Task.spawn`.
(In reply to David Rajchenbach Teller [:Yoric] from comment #7)
> (In reply to Tim Taubert [:ttaubert] from comment #6)
> > Well, before your patch we were calling JSON.stringify() once; now we're
> > additionally doing a structured clone. That doesn't feel better to me?
> 
> Well, the hope is that Cu.cloneInto is faster than JSON.stringify. That
> needs to be proved, of course. Alternatively, I have a (quite hackish) idea
> to lock things until we have finished cloning asynchronously.

But with the patch we're calling Cu.cloneInto() in _saveState() and JSON.stringify() in _writeState(). So we're increasing the cost here as a whole. The first chunk should now also take longer than it did before where we just collected data. The second chunk should remain the same.

I'd really rather have us tackle the right solution than to fumble around in the dark and discuss wallpaper patches :/
(In reply to Tim Taubert [:ttaubert] from comment #8)
> (In reply to David Rajchenbach Teller [:Yoric] from comment #7)
> > (In reply to Tim Taubert [:ttaubert] from comment #6)
> > > Well before your patch we were calling JSON.stringify() once, now we're
> > > additionally doing a structured clone. That doesn't feel better to me?
> > 
> > Well, the hope is that Cu.cloneInto is faster than JSON.stringify. That
> > needs to be proved, of course. Alternatively, I have a (quite hackish) idea
> > to lock things until we have finished cloning asynchronously.
> 
> But with the patch we're calling Cu.cloneInto() in _saveState() and
> JSON.stringify() in _writeState(). So we're increasing the cost here as a
> whole. The first chunk should now also take longer than it did before where
> we just collected data. The second chunk should remain the same.

Scenario 1 (with Cu.cloneInto):
* we bring down the critical duration from 50ms to 30ms + x, where x is the duration of `Cu.cloneInto`;
* we bring up the total duration from 50ms to 50ms + x.

So if x is faster than JSON.stringify, we win in terms of perceived performance. Otherwise, we lose.

Scenario 2 (with a non-blocking lock):
* we bring down the critical duration from 50ms to 30ms, with an option to bring it further down;
* total duration stays roughly the same.

> I'd really rather have us tackle the right solution than to fumble around in
> the dark and discuss wallpaper patches :/

Sure, entirely rewriting the collection of data would let us decrease the 30ms collection, and the journal would let us decrease the 20ms serialize + write. But that's a much bigger piece of work.
Assignee: dteller → nobody
Points: --- → 2
Flags: qe-verify?
Whiteboard: p=2
Bug died because of lack of momentum/response.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX