Closed Bug 1391495 Opened 7 years ago Closed 7 years ago

Intermittent browser/base/content/test/performance/browser_startup.js | RecentWindow.jsm is not allowed before handling user events -

Categories

(Firefox :: General, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

there are 41 failures in the last 6 days, lets start looking into this bug as it is a new failure.  Windows 8 opt/pgo is the primary failure pattern.

:felipe- I saw your name often in the hg log, this test has a lot of random code in it- as this is a new failure within the last week, could you help find someone who could look into this?
Flags: needinfo?(felipc)
Whiteboard: [stockwell needswork]
This is not a problem with the test, but likely some patch that added RecentWindow.jsm usage during some early startup path.

We need to look into the changesets that landed around the time that this started failing.
Flags: needinfo?(felipc) → needinfo?(florian)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #2)
> there are 41 failures in the last 6 days

More if you take into account the failures that were misattributed to bug 1382613.

> Windows 8 opt/pgo is the primary failure pattern.

Isn't this almost perma-fail on this platform?
well it is >30% failure, I would say 50%- we do skip these tests and run them every ~5th push, so the percentage failure is much higher than 16% or 20%.
(In reply to :Felipe Gomes (needinfo me!) from comment #3)

> We need to look into the changesets that landed around the time that this
> started failing.

Hopefully the try server can tell us which code to look at. I just pushed a debug changeset that should print the stack in the log: https://treeherder.mozilla.org/#/jobs?repo=try&revision=41ccfc5e171cbb9d7faf11b588541067498d5bb4
Flags: needinfo?(florian)
Summary: Intermittent browser/base/content/test/performance/browser_startup.js | resource:///modules/RecentWindow.jsm is not allowed before handling user events - → Intermittent browser/base/content/test/performance/browser_startup.js | RecentWindow.jsm is not allowed before handling user events -
(In reply to Florian Quèze [:florian] [:flo] from comment #7)
> (In reply to :Felipe Gomes (needinfo me!) from comment #3)
> 
> > We need to look into the changesets that landed around the time that this
> > started failing.
> 
> Hopefully the try server can tell us which code to look at. I just pushed a
> debug changeset that should print the stack in the log:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=41ccfc5e171cbb9d7faf11b588541067498d5bb4

This didn't run any test jobs, and neither did a repush without --artifact: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8701a632cd2367f673624799f8316da416a5a2ba
So I don't know what's the way to reproduce the Windows 8 x64 pgo mochitest-e10s-browser-chrome-* test runs where this is failing most of the time.
I'm on PTO for the rest of the week, so anybody should feel free to adopt this simple debugging patch and repush it to try with a correct syntax.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b31467ad38629b2266779ba3e8b3b6fc9bdb18c&group_state=expanded (though it doesn't look very helpful).

In theory, it should have just needed "-u mochitest-e10s-bc[Windows 8]" but that didn't work either. "Add new jobs" did, though.
An import to RecentWindow.jsm showed up from three different stacks on these logs, but it's not clear to me which one is causing it. They are:

> 12:26:33     INFO - GECKO(1220) | 1503861993464	Marionette	INFO	Enabled via --marionette
> 12:26:35     INFO - GECKO(1220) | IMPORTING resource:///modules/RecentWindow.jsm
> 12:26:35     INFO - GECKO(1220) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> 12:26:35     INFO - GECKO(1220) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> 12:26:35     INFO - GECKO(1220) | 2 _gatherFirstWindowMeasurements/<() ["resource:///modules/BrowserUITelemetry.jsm":287]
> 12:26:35     INFO - GECKO(1220) | 3 onSuccess() ["jar:file:///C:/slave/test/build/application/firefox/omni.ja!/components> /nsSearchService.js":3884]
> 12:26:37     INFO - GECKO(1220) | 1503861997950	Marionette	INFO	Listening on port 2828

> 12:26:37     INFO - GECKO(1220) | IMPORTING resource:///modules/RecentWindow.jsm
> 12:26:37     INFO - GECKO(1220) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> 12:26:37     INFO - GECKO(1220) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> 12:26:37     INFO - GECKO(1220) | 2 ssi_getMostRecentBrowserWindow() ["resource:///modules/sessionstore/> SessionStore.jsm":4136]
> 12:26:37     INFO - GECKO(1220) | 3 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":3048]
> 12:26:37     INFO - GECKO(1220) | 4 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":350]
> 12:26:37     INFO - GECKO(1220) | 5 _saveState() ["resource:///modules/sessionstore/SessionSaver.jsm":266]
> 12:26:37     INFO - GECKO(1220) | 6 _saveStateAsync() ["resource:///modules/sessionstore/SessionSaver.jsm":347]
> 12:26:37     INFO - GECKO(1220) | 7 saveStateAsyncWhenIdle() ["resource:///modules/sessionstore/SessionSaver.jsm":198]

> 12:25:58     INFO - GECKO(4624) | IMPORTING resource:///modules/RecentWindow.jsm
> 12:25:58     INFO - GECKO(4624) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> 12:25:58     INFO - GECKO(4624) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> 12:25:58     INFO - GECKO(4624) | 2 _attachIndicator/<() ["resource:///modules/DownloadsTaskbar.jsm":138]

I think it's the first two which happens right after a process is created, although I couldn't tell whether that is a chrome or content process.

There's already an exception for RecentWindow.jsm caused by SessionSaver.jsm on non-e10s [1].  Maybe this is now happening on e10s too?

[1] http://searchfox.org/mozilla-central/source/browser/base/content/test/performance/browser_startup.js#124
:felipe, are you working on fixing this, or should we disable the test to reduce the pain?
Flags: needinfo?(felipc)
I'm not.. But this test is very useful from preventing more things being added to startup, so instead of disabling the full test, I'll push a patch (when the tree reopens) to temporarily remove RecentWindow.jsm from the blacklist so we can investigate it without it being a pain to everyone else.
Flags: needinfo?(felipc)
Pushed by felipc@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b14ecb6f1c2
Temporarily remove RecentWindow.jsm from startup blacklist to avoid this test from being disabled. r=me
(In reply to :Felipe Gomes (needinfo me!) from comment #17)
> I'm not.. But this test is very useful from preventing more things being
> added to startup, so instead of disabling the full test, I'll push a patch
> (when the tree reopens) to temporarily remove RecentWindow.jsm from the
> blacklist so we can investigate it without it being a pain to everyone else.

Thanks for ensuring the test doesn't get disabled! :-)

fwiw, I would have preferred to keep RecentWindow.jsm blacklisted for the previous startup phase ("before first paint") to prevent further related regressions.
(In reply to :Felipe Gomes (needinfo me!) from comment #14)
> An import to RecentWindow.jsm showed up from three different stacks on these
> logs, but it's not clear to me which one is causing it.

The first stack shown for the process is almost guaranteed to be the relevant one.

> They are:
> 
> > 12:26:33     INFO - GECKO(1220) | 1503861993464	Marionette	INFO	Enabled via --marionette
> > 12:26:35     INFO - GECKO(1220) | IMPORTING resource:///modules/RecentWindow.jsm
> > 12:26:35     INFO - GECKO(1220) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> > 12:26:35     INFO - GECKO(1220) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> > 12:26:35     INFO - GECKO(1220) | 2 _gatherFirstWindowMeasurements/<() ["resource:///modules/BrowserUITelemetry.jsm":287]
> > 12:26:35     INFO - GECKO(1220) | 3 onSuccess() ["jar:file:///C:/slave/test/build/application/firefox/omni.ja!/components> /nsSearchService.js":3884]

This code runs as soon as the search service is done initializing, and this is done from a .then on a promise, so the micro task likely takes priority over other events.


> > 12:26:37     INFO - GECKO(1220) | 1503861997950	Marionette	INFO	Listening on port 2828
> 
> > 12:26:37     INFO - GECKO(1220) | IMPORTING resource:///modules/RecentWindow.jsm
> > 12:26:37     INFO - GECKO(1220) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> > 12:26:37     INFO - GECKO(1220) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> > 12:26:37     INFO - GECKO(1220) | 2 ssi_getMostRecentBrowserWindow() ["resource:///modules/sessionstore/> SessionStore.jsm":4136]
> > 12:26:37     INFO - GECKO(1220) | 3 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":3048]
> > 12:26:37     INFO - GECKO(1220) | 4 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":350]
> > 12:26:37     INFO - GECKO(1220) | 5 _saveState() ["resource:///modules/sessionstore/SessionSaver.jsm":266]
> > 12:26:37     INFO - GECKO(1220) | 6 _saveStateAsync() ["resource:///modules/sessionstore/SessionSaver.jsm":347]
> > 12:26:37     INFO - GECKO(1220) | 7 saveStateAsyncWhenIdle() ["resource:///modules/sessionstore/SessionSaver.jsm":198]

This code runs from a requestIdleCallback so I would assume it's not the culprit... although bug 1388664 landed only 5 days ago, so it may have a role in the change of behavior we are observing.

> > 12:25:58     INFO - GECKO(4624) | IMPORTING resource:///modules/RecentWindow.jsm
> > 12:25:58     INFO - GECKO(4624) | 0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
> > 12:25:58     INFO - GECKO(4624) | 1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
> > 12:25:58     INFO - GECKO(4624) | 2 _attachIndicator/<() ["resource:///modules/DownloadsTaskbar.jsm":138]

This happens much later from an 'unload' event listener, I think we can ignore it.

> I think it's the first two which happens right after a process is created,
> although I couldn't tell whether that is a chrome or content process.

My debug patch (and this test) only logs import in the chrome process. It makes sense that we are seeing a content process being created during startup; I'm afraid this doesn't help us much.

> There's already an exception for RecentWindow.jsm caused by SessionSaver.jsm
> on non-e10s [1].  Maybe this is now happening on e10s too?
> 
> [1]
> http://searchfox.org/mozilla-central/source/browser/base/content/test/
> performance/browser_startup.js#124

Maybe. But I wonder if bug 1388664 could have already fixed half of this.

So... the first thing I would try here is to add a Services.tm.dispatchToMainThread call within the Services.search.init callback at http://searchfox.org/mozilla-central/rev/18c16ebf818abb86805ce08a6e537e4cd826f044/browser/modules/BrowserUITelemetry.jsm#286
Whiteboard: [stockwell needswork] → [stockwell disabled]
I pushed to try a patch re-enabling the RecentWindow.jsm check, and it's no longer permafail (only failed 3 times out of 31 runs). I wonder if the perma-fail part could have been fixed with bug 1397232.

The 3 failures I got have this stack (I actually just used this bug as a test case for my patches in bug 1398198) :

TEST-UNEXPECTED-FAIL | browser/base/content/test/performance/browser_startup.js | resource:///modules/RecentWindow.jsm is not allowed before handling user events - 
Stack trace:
chrome://mochitests/content/browser/browser/base/content/test/performance/browser_startup.js:null:227
0 XPCU_moduleLambda() ["resource://gre/modules/XPCOMUtils.jsm":333]
1 get() ["resource://gre/modules/XPCOMUtils.jsm":199]
2 ssi_getMostRecentBrowserWindow() ["resource:///modules/sessionstore/SessionStore.jsm":4325]
3 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":3149]
4 getCurrentState() ["resource:///modules/sessionstore/SessionStore.jsm":358]
5 _saveState() ["resource:///modules/sessionstore/SessionSaver.jsm":266]
6 _saveStateAsync() ["resource:///modules/sessionstore/SessionSaver.jsm":347]
7 saveStateAsyncWhenIdle() ["resource:///modules/sessionstore/SessionSaver.jsm":198]
(In reply to Florian Quèze [:florian] [:flo] from comment #26)
> I pushed to try a patch re-enabling the RecentWindow.jsm check, and it's no
> longer permafail (only failed 3 times out of 31 runs).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=eb44a8d3c45793db4d424add6dfd1dc73ae0800f
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.