Closed Bug 1386867 Opened 3 years ago Closed 3 years ago

3.31% quantum_pageload_google (windows10-64) regression on push 58b579b4ef4e1fb938297bc43a7fc7e4b2168a4a (Wed Aug 2 2017)

Categories

(Firefox :: Session Restore, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 blocking fixed

People

(Reporter: jmaher, Assigned: ttaubert)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file, 1 obsolete file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=58b579b4ef4e1fb938297bc43a7fc7e4b2168a4a

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  3%  quantum_pageload_google summary windows10-64 opt e10s stylo     527.56 -> 545.04


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=8494

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → Session Restore
this appears to be a similar regression for the non stylo case, as well as the pgo case.

:mystor, I see that you authored 2 of the 3 patches in the bug, could you take a look at this and determine why this is happening and help figure out a resolution?
Flags: needinfo?(michael)
I wrote two of the patches, but they were minor fixups - ttaubert probably will have a better idea of what's going on here.
Flags: needinfo?(michael) → needinfo?(ttaubert)
Ok let's start with the obvious: I have no idea. I profiled talos locally and it looks like the only thing we're doing differently and that Google is heavy on might be DOMSessionStorage?

1) Here's the backout as a baseline:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=be582e2778b1bbb0e60a0fb1fc912207bf39d4e0

2) A tiny modification:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee522a878a2022a8ee73a7a8da8c773279014596
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=be582e2778b1bbb0e60a0fb1fc912207bf39d4e0&newProject=try&newRevision=ee522a878a2022a8ee73a7a8da8c773279014596&framework=1&filter=tp6_google&showOnlyImportant=0

[No improvement here.]

3) Commenting out SessionStore.collect():

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e138ccaf815ba905d1e3dd1886a5a0afede4f542
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=be582e2778b1bbb0e60a0fb1fc912207bf39d4e0&newProject=try&newRevision=e138ccaf815ba905d1e3dd1886a5a0afede4f542&framework=1&filter=tp6_google&showOnlyImportant=0

[Surprisingly, no improvement here either.]

4) For fun, deactive the SessionStorageListener:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b580b8cbbf5dc7d1f089733453a90a0791fdc56b
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=be582e2778b1bbb0e60a0fb1fc912207bf39d4e0&newProject=try&newRevision=b580b8cbbf5dc7d1f089733453a90a0791fdc56b&framework=1&filter=tp6_google&showOnlyImportant=0

[That's better than the base revision, but not really something we can land. Probably good to concentrate on DOMSessionStorage some more?]
Flags: needinfo?(ttaubert)
It looks like the MozStorageChanged event handler is somewhat problematic and I don't understand why. Especially not why suddenly after we fixed bug 1373672.

Here's a patch that removes differential updates with "storagechange", that seems to bring us back to the previous tp6_google numbers:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2459dc16c5c66bea2c7c0b8f0cb1a8b5bc4af9da
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=be582e2778b1bbb0e60a0fb1fc912207bf39d4e0&newProject=try&newRevision=2459dc16c5c66bea2c7c0b8f0cb1a8b5bc4af9da&framework=1&filter=tp6_google&showOnlyImportant=0
Michael, wdyt? I've also used the chance to remove `createLazy()`, we don't actually need to cache these functions' results as they're only called once.
Attachment #8894500 - Flags: review?(michael)
BTW, differential updates were introduced, I think, to deal with large DOMSessionStorage messages. These should hopefully not be a problem anymore since bug 1362058 limited what we store to 2KB.
Comment on attachment 8894500 [details] [diff] [review]
0001-Bug-1386867-Fix-quantum_pageload_google-talos-regres.patch

Review of attachment 8894500 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/SessionStorage.jsm
@@ -40,5 @@
>     *         session storage data as strings. For example:
>     *         {"https://example.com^userContextId=1": {"key": "value", "my_number": "123"}}
>     */
> -  collect(docShell, frameTree) {
> -    return SessionStorageInternal.collect(docShell, frameTree);

Were we passing a unused frameTree argument to SessionStorageInternal before?

::: browser/components/sessionstore/content/content-sessionStore.js
@@ -640,5 @@
> -      return;
> -    }
> -
> -    let {url, key, newValue} = event;
> -    let uri = Services.io.newURI(url);

I'm guessing that this callback (probably URL parsing is the most expensive part if I had to guess from eyeballing this) is the expensive bit here?

I don't really like removing the partial session storage updating because I worry it will hurt real-world usage in favour of helping this talos test. I can see why doing this for every SessionStorage update callback could slow down the browser though.

We could probably use native code here to keep the partial updates and speed this up, but it's probably not worth it if we have a maximum storage limit which is fairly low.

Could you try doing something like:
a) Don't parse the string URI on the event,
b) just use the URL as the key in the _changes map,
c) In collect() we actually parse the URIs and fix up the data to actually be useful?

Potentially we could even avoid the URI parsing by instead using some native helper code (on SessionStorageUtils or DOMWindowUtils, iunno), which gets the Principal off of the event.storageArea object, which we could use instead of parsing the URL ourselves.

(NOTE: This is all assuming that the URL parsing is the slow part).
Attachment #8894500 - Flags: review?(michael) → review-
(I am not sure if I made it clear from the comment but I'm not opposed to landing this fundamentally, I just think it might be good to explore some alternatives first because it would be nice to not lose incremental sessionstore.)
(In reply to Tim Taubert [:ttaubert] from comment #10)
> Interestingly, the patch above together with removing the storage quota
> retrieval seems only slightly worse than the baseline:
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=066491f1cd1eeb74a4eea1439994c3f890b78ad8
> https://treeherder.mozilla.org/perf.html#/
> compare?originalProject=try&originalRevision=be582e2778b1bbb0e60a0fb1fc912207
> bf39d4e0&newProject=try&newRevision=066491f1cd1eeb74a4eea1439994c3f890b78ad8&
> framework=1&filter=tp6_google&showOnlyImportant=0
> 
> (I still don't understand why this is suddenly so problematic :/)

What does it look like if you keep that check, but stash nsIDOMWindowUtils in a global?
-    let usage = content.QueryInterface(Ci.nsIInterfaceRequestor)
-                       .getInterface(Ci.nsIDOMWindowUtils)
-                       .getStorageUsage(event.storageArea);

^ I took a quick look at the definition of GetStorageUsage, and it looks like it should be pretty cheap, so I imagine that the expensive part is the interface lookups?
Let me try this again with a global nsIDOMWindowUtils and deferring the .newURI() call.
Page load is a Quantum release criteria and Google.come is a top site. I'd like to track this as a 57 blocker until we have a resolution.
Assignee: nobody → ttaubert
Priority: -- → P1
(In reply to Michael Layzell [:mystor] from comment #7)
> ::: browser/components/sessionstore/SessionStorage.jsm
> @@ -40,5 @@
> >     *         session storage data as strings. For example:
> >     *         {"https://example.com^userContextId=1": {"key": "value", "my_number": "123"}}
> >     */
> > -  collect(docShell, frameTree) {
> > -    return SessionStorageInternal.collect(docShell, frameTree);
> 
> Were we passing a unused frameTree argument to SessionStorageInternal before?

Forgot to answer. Yes, the second argument was unused since the API changed. I modified only the *Internal object's method, but not the public one.
Comment on attachment 8902685 [details] [diff] [review]
0001-Bug-1386867-Fix-quantum_pageload_google-talos-regres.patch

Review of attachment 8902685 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/sessionstore/content/content-sessionStore.js
@@ +797,5 @@
>     *        A function that returns the value that will be sent to the parent
>     *        process.
>     */
>    push(key, fn) {
> +    this._data.set(key, fn);

This seems like it would affect other collectors as well as just SessionStorage. Do we see any perf changes for them?
Attachment #8902685 - Flags: review?(michael) → review+
(In reply to Michael Layzell [:mystor] from comment #19)
> >     *        A function that returns the value that will be sent to the parent
> >     *        process.
> >     */
> >    push(key, fn) {
> > +    this._data.set(key, fn);
> 
> This seems like it would affect other collectors as well as just
> SessionStorage. Do we see any perf changes for them?

I removed it because it actually shouldn't affect any of the collectors. I'm not sure why I ever added it back then, but we never call any of the functions twice. They're called once, or overridden, and then the map is cleared.
Pushed by ttaubert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a972e0fb330b
Fix quantum_pageload_google talos regression r=mystor
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/a972e0fb330b
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
This improved the tp6 tests:

== Change summary for alert #9110 (as of August 30 2017 14:55 UTC) ==

Improvements:

  5%  tp6_google summary windows7-32 opt e10s     544.50 -> 515.21
  5%  tp6_google summary windows10-64 opt e10s    533.06 -> 506.25
  5%  tp6_google summary windows10-64 pgo e10s    423.92 -> 404.42
  5%  tp6_google summary windows10-64 opt e10s stylo526.88 -> 502.75
  5%  tp6_google summary windows10-64 pgo 1_thread e10s stylo441.38 -> 421.46
  4%  tp6_google summary windows7-32 opt e10s stylo525.12 -> 502.42
  4%  tp6_google summary windows10-64 opt 1_thread e10s stylo532.83 -> 510.79
  4%  tp6_google summary windows7-32 pgo e10s     413.44 -> 397.29

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=9110
You need to log in before you can comment on or make changes to this bug.