Closed Bug 1379568 Opened 7 years ago Closed 7 years ago

1.85% quantum_pageload_google (windows7-32) regression on push 807fa40a522ffdaec16d6aa900b3c3d84dd7434d (Thu Jul 6 2017)

Categories

(Core :: DOM: Core & HTML, defect)

Unspecified
Windows
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: igoldan, Assigned: baku)

References

Details

(Keywords: perf, regression, talos-regression)

Attachments

(1 file)

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=807fa40a522ffdaec16d6aa900b3c3d84dd7434d

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  2%  quantum_pageload_google summary windows7-32 opt e10s     1,135.12 -> 1,156.17


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7740

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → DOM
Product: Firefox → Core
:baku Could you take a look over this regression? Is this something we can improve or should we rather accept it?

For the record, I filed this bug for Windows 7, but I happens on Windows 10 also, to a lesser degree. You can view this here:

https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,9fd1d8691d24b84a9986dbd6c0a3248b5493fb5e,1,1%5D&series=%5Bmozilla-inbound,07395148c79800e93f4913146087c5e6ecb008e5,1,1%5D&zoom=1499305411635.8733,1499426720229.9392,1002.6022304832713,1263.5687732342008
Flags: needinfo?(amarchesini)
The only main deference in my patch is that we use the event target of the window, instead of the global/system group one for the dispatching of localStorage/sessionStorage notification runnables. Plus, instead dispatching 1 notification, we have multiple dispatching, one per window. This is needed for quantum DOM and I don't think we want to revert this change.

An optimization we can do, is to have a pre-filtering of the events: currently we dispatch a runnable for any registered window. Maybe we can do this only if we know that the window could be interested in this localStorage/sessionStorage change. This pre-filtering should be done using the principal.

I can write a patch, but I don't think we will have a huge improvement, until the quantum DOM scheduling is completed.
Flags: needinfo?(amarchesini) → needinfo?(wmccloskey)
(In reply to Andrea Marchesini [:baku] from comment #2)
> An optimization we can do, is to have a pre-filtering of the events:
> currently we dispatch a runnable for any registered window. Maybe we can do
> this only if we know that the window could be interested in this
> localStorage/sessionStorage change. This pre-filtering should be done using
> the principal.

That seems like a good idea to me. Maybe we could also just check if the window has any listeners for the event? That would probably be easier.

> I can write a patch, but I don't think we will have a huge improvement,
> until the quantum DOM scheduling is completed.

A filtering patch would probably get us back to baseline performance without having to back out the patch. That seems worth it to me.
Flags: needinfo?(wmccloskey)
> That seems like a good idea to me. Maybe we could also just check if the
> window has any listeners for the event? That would probably be easier.

We cannot, because it's not just about dispatching events, but we use this mechanism also to keep localStorage in sync with what happens on other processes.
Attached patch storage.patchSplinter Review
Assignee: nobody → amarchesini
Attachment #8885155 - Flags: review?(bugmail)
Comment on attachment 8885155 [details] [diff] [review]
storage.patch

Review of attachment 8885155 [details] [diff] [review]:
-----------------------------------------------------------------

Restating: This should notably improve performance by performing some disqualifying checks earlier in the process, which avoids creating and dispatching runnables.  These changes are fallout from bug 1333981.  Previously the early checks weren't necessary (or possible) because the observer service was used and we'd synchronously run the checks for each window.  We still expect the combination of bug 1333981 and these fixes to result in some level of overhead.

It looks like performance wins could also be had by eliminating the for-devtools StorageNotifierRunnable path in Storage::NotifyChange.  The runnable exists regardless of whether devtools cares or not.  (Note that the runnable is immediately run during e10s propagation but is dispatched for local in-process changes.)  The underlying storage actor is already pulling localStorage/sessionStorage off of the inner window (and triggering all the relevant overhead), so it can register "storage"/"MozSessionStorageChanged"/"MozLocalStorageChanged" listeners against the window without any added cost.  (The latter two events are the chrome-only events generated then a window modifies its own localStorage and are necessary for correctness.)


It's probably also worth stating that the changes in bug 1333981 are troublesome for e10s correctness.  It's nsGlobalWindow::CloneStorageEvent() that applies the e10s-propagated changes.  Previously, a single localStorage change might be applied redundantly to the same underlying LocalStorageCache if there were multiple nsGlobalWindows, but they would all be applied during the same turn of the event loop.  It was wasteful but had no correctness impact.  Post-bug 1333981, separate runnables are created for each observer.  In a scenario with the same origin present in separate tab groups with separate event targets and one of the event targets throttled, correctness can be compromised.  Specifically, imagine the following sequence of events:
- other process does foo=bar, this is propagated for e10s purposes and also sent to the parent for persistence to the database.  (These are two different mechanisms, unfortunately.)
- page in tab group A sees foo=bar ObserveStorageNotification, which is applied to the underlying LocalStorage, giving it state {foo: bar} consistent with the database state change triggered by the other process.
- page in tab group A mutates localStorage, setting foo=hat, giving it local and database state {foo: hat}.  The change is also propagated to other processes.
- page in tab group B finally process its foo=bar ObserveStorageNotification, which is applied to the underlying LocalStorage, resulting in local state {foo: bar} but the (correct) database state is {foo: hat}.

(Note that we actually already had a variation on this problem prior to bug 1333981 if the foo=hat mutation occurred during the dispatch of the foo=bar "storage" event.)

This correctness issue can be addressed by altering LocalStorage::ApplyEvent and its passed-in StorageEvent so that ApplyEvent runs at most once for a given StorageEvent (via previously proposed hacky mutation).  Long term fix continues to be the localStorage overhaul.
Attachment #8885155 - Flags: review?(bugmail) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/960a0dc9092f
Filtering for StorageEvent broadcasting, r=asuth
Backed out for frequently timing out web-platform-test /webstorage/storage_setitem.html:

https://hg.mozilla.org/integration/mozilla-inbound/rev/da31f44d2f8a27bfe0d97905346dcf891a5c170f

A push with two failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0c098eb60dd15df7d90d100fb1d181b12d53a2ab&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=113661681&repo=mozilla-inbound

[task 2017-07-12T14:59:15.692622Z] 14:59:15     INFO - PID 8819 | [8819] WARNING: No outer window available!: file /home/worker/workspace/build/src/dom/base/nsGlobalWindow.cpp, line 12853
[task 2017-07-12T14:59:15.693365Z] 14:59:15     INFO - PID 8819 | [8819] WARNING: No outer window available!: file /home/worker/workspace/build/src/dom/base/nsGlobalWindow.cpp, line 12853
[task 2017-07-12T14:59:15.694160Z] 14:59:15     INFO - PID 8819 | [8819] WARNING: No outer window available!: file /home/worker/workspace/build/src/dom/base/nsGlobalWindow.cpp, line 12853
[task 2017-07-12T14:59:15.695105Z] 14:59:15     INFO - TEST-UNEXPECTED-TIMEOUT | /webstorage/storage_setitem.html | expected OK
Flags: needinfo?(amarchesini)
This has been fixed by bug 368116.
Flags: needinfo?(amarchesini)
I don't think we want to uplift this patch. There are too many dependences.
bug 368116 was worksforme, possibly you have the wrong bug number?
for this performance regression, I don't see any change since this was filed.  Are we safe to assume we are recommending closing this as wontfix?
the patch has been backout. I'm working on it.
1 week until we merge to beta, is it possible to get this bug resolved this week?
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a3efdf436e95
Filtering for StorageEvent broadcasting, r=asuth
https://hg.mozilla.org/mozilla-central/rev/a3efdf436e95
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Depends on: 1384908
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.