Closed Bug 1630176 Opened 4 months ago Closed 1 month ago

Intermittent dom/serviceworkers/test/test_notification_openWindow.html | Test timed out.

Categories

(Core :: DOM: Service Workers, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Fission Milestone M4.1

People

(Reporter: intermittent-bug-filer, Assigned: annyG)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(2 files)

Filed by: rgurzau [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=297679526&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/I9pVjMofSNumIQmIo59bVw/runs/0/artifacts/public/logs/live_backing.log


[task 2020-04-15T05:39:14.921Z] 05:39:14 INFO - Waiting for "DONE" from worker.
[task 2020-04-15T05:39:14.922Z] 05:39:14 INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected. Reason: Mock alert service dispatches show and click events.
[task 2020-04-15T05:39:14.922Z] 05:39:14 INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected. Reason: Mock alert service dispatches show and click events.
[task 2020-04-15T05:39:14.922Z] 05:39:14 INFO - Buffered messages logged at 05:34:19
[task 2020-04-15T05:39:14.922Z] 05:39:14 INFO - TEST-PASS | dom/serviceworkers/test/test_notification_openWindow.html | Didn't crash!
[task 2020-04-15T05:39:14.923Z] 05:39:14 INFO - add_task | Leaving test test
[task 2020-04-15T05:39:14.924Z] 05:39:14 INFO - Buffered messages finished
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | Test timed out.
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:299:16
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - reportError@SimpleTest/TestRunner.js:128:22
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - TestRunner._checkForHangs@SimpleTest/TestRunner.js:150:18
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.925Z] 05:39:14 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handlerTestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - setTimeout handler
TestRunner._checkForHangs@SimpleTest/TestRunner.js:184:15
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - TestRunner.runTests/<@SimpleTest/TestRunner.js:420:16
[task 2020-04-15T05:39:14.926Z] 05:39:14 INFO - promise callbackTestRunner.runTests@SimpleTest/TestRunner.js:407:48
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - RunSet.runtests@SimpleTest/setup.js:218:14
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - RunSet.runall@SimpleTest/setup.js:197:12
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - hookupTests@SimpleTest/setup.js:294:12
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:46:13
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:59:28
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - EventHandlerNonNull
getTestManifest@http://mochi.test:8888/manifestLibrary.js:55:3
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - hookup@SimpleTest/setup.js:270:20
[task 2020-04-15T05:39:14.927Z] 05:39:14 INFO - EventHandlerNonNull*@dom/serviceworkers/test?autorun=1&closeWhenDone=1&consoleLevel=INFO&testname=tests/dom/serviceworkers/test/test_notification_openWindow.html&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2020-04-15T05:39:15.753Z] 05:39:15 ERROR - [SimpleTest.finish()] this test already called finish!

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Summary: Intermittent TC dom/serviceworkers/test/test_notification_openWindow.html | Test timed out. → Intermittent dom/serviceworkers/test/test_notification_openWindow.html | Test timed out.

Hi Kashav, I found in the log:

TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected.  Reason: Mock alert service dispatches show and click events.

originating from here.

Does this mean, this test is expected to timeout sometimes and we should not care further?

Flags: needinfo?(kmadan)

This is the full log for the failure from comment #7:

INFO - TEST-START | dom/serviceworkers/test/test_notification_openWindow.html
...
INFO - Registering service worker.
INFO - Setting browser.link.open_newwindow to 1.
INFO - Showing notification.
INFO - Waiting for "DONE" from worker.
INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected.  Reason: Mock alert service dispatches show and click events. 
INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected.  Reason: Mock alert service dispatches show and click events. 
INFO - TEST-PASS | dom/serviceworkers/test/test_notification_openWindow.html | Didn't crash! 
INFO - Setting browser.link.open_newwindow to 2.
INFO - Showing notification.
INFO - Waiting for "DONE" from worker.
INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected.  Reason: Mock alert service dispatches show and click events. 
INFO - TEST-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | The author of the test has indicated that flaky timeouts are expected.  Reason: Mock alert service dispatches show and click events. 
INFO - TEST-UNEXPECTED-FAIL | dom/serviceworkers/test/test_notification_openWindow.html | Test timed out. 

So we're timing out waiting for the "DONE" message, which could mean a few different things: 1) we're not clicking the notification, so we're not even running the code that starts the test here (this is the case that is expected to timeout, so we can ignore it), or 2) we're just not seeing the "DONE" message from the test file, which is sent here.

It's very unlikely that we're in the second case, so I think it's fine to ignore this, but this started re-appearing on May 27 (right after bug 1630323 landed), so redirecting to annyG in case I'm missing something. We can probably add some more logs to confirm which case we're in.

Flags: needinfo?(kmadan) → needinfo?(agakhokidze)

I'm uncertain how the patches from Bug 1637632 could have caused this test to permafail. The nature of this test seems unrelated to the code modified/added in Bug 1637632.

Flags: needinfo?(lelouch.cpp)

I'm currently investigating this. Out of the bugs that landed on June 15th, bug 1644943 has changed some code in the relevant area, but I haven't determine exactly what's wrong yet.

Based on this https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-05-27&endday=2020-06-18&tree=trunk&bug=1630176 it looks like on June 15th when bug 1644943 landed, this test became a permafailure.

I have a theory that before June 15th, the intermittents that were happening were due to not requesting a long enough timeout in the test and eventually when Matt landed his patches on June 15th, they became permafails. I checked out a version of mozilla-central at the point where Matt's patches haven't landed yet, added some logs and requested a longer timeout in the test and this is the resulting try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=414e32b62ef6ee48017b6b43c6a159a238f9f570 . There are no timeouts here, which makes me think that my theory is correct. I also tried adding a longer timeout, but it didn't help the issue because we never load the desired gClientUrl from the test.

I then debugged the timeout in the test locally on top of the latest mozilla-central. I'm noticing that when we try to open a window here https://searchfox.org/mozilla-central/rev/5e6c7717255ca9638b2856c2b2058919aec1d21d/dom/serviceworkers/test/notification_openWindow_worker.js#24 we do end up all the way in DocumentChannel with the correct gClientUrl url, but then a newly started load of an "about:newtab" cancels our gClientUrl load. The "about:newtab" load is caused by https://searchfox.org/mozilla-central/rev/5e6c7717255ca9638b2856c2b2058919aec1d21d/browser/base/content/browser.js#2324 .

To provide some extra info, this is the js stack I'm seeing when we "about:newtab" load starts

0 loadURI(aURI = ""about:newtab"", aLoadURIOptions = "[object Object]") ["resource://gre/modules/RemoteWebNavigation.jsm":115:46]
    this = [object Object]
1 loadURI/<() ["chrome://global/content/elements/browser-custom-element.js":901:27]
2 _wrapURIChangeCall(fn = [function]) ["chrome://global/content/elements/browser-custom-element.js":832:8]
    this = [object XULFrameElement]
3 loadURI(aURI = ""about:newtab"", aParams = "[object Object]") ["chrome://global/content/elements/browser-custom-element.js":900:11]
    this = [object XULFrameElement]
4 maybeCreatePreloadedBrowser(window = "[object Window]") ["resource:///modules/NewTabPagePreloading.jsm":149:12]
    this = [object Object]
5 _schedulePerWindowIdleTasks/<() ["chrome://browser/content/browser.js":2324:27]
6 idleTaskRunner("[object IdleDeadline]") ["chrome://browser/content/browser.js":2259:10]
    this = [object Window]

I'm going to ask Matt if he has any suggestions on what might be causing this or knows why NewTabPagePreloading.maybeCreatePreloadedBrowser is now cancelling other loads.

Anny, Jens, would it be ok to disable the test until a fix is in place? It already has 144 total failures since the 16th: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-06-13&endday=2020-06-20&tree=trunk&bug=1630176
Retriggers show that this was permafailing even before https://bugzilla.mozilla.org/show_bug.cgi?id=1644943#c8 last landed here. It first landed on the 15th but it was backed out right after that.

Flags: needinfo?(jstutte)

For me it is ok to disable, if we cannot expect a fast fix here. Anny?

Severity: normal → S3
Flags: needinfo?(jstutte)
Priority: -- → P2

You can disable it as I don't have a fix at the moment. Matt left some comments on my pernosco trace here https://pernos.co/debug/Vmsx9Mf0tgt8pm5vSqFh3w/index.html#f{m[D3xQ,AbMg_,t[Sw,A0Q_,f{e[D3xQ,AbL8_,s{af4rAcSAA,bFLo,oAmXFOQ,uAlbO3Q___ so I'm looking into it.

Fission Milestone: --- → M4.1
Assignee: nobody → agakhokidze
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0254a64157c3
Disable test_notification_openWindow.html on linux64 and win64 !debug. r=gbrown

(Brain dump so I can refer to this in future)
The test is failing in nsIBrowserDOMWindow.OPEN_NEWWINDOW case: after we show the notification to the user here https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/dom/serviceworkers/test/test_notification_openWindow.html#68 we try to open a window here https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/dom/serviceworkers/test/notification_openWindow_worker.js#24 but the right window never manages to open and the progress listener that is waiting for gClientURL to open never gets called so it never resolve the promise https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/dom/clients/manager/ClientOpenWindowUtils.cpp#257

I have a pernosco trace here https://pernos.co/debug/-yud_vhka4UWC0Fqm23rOQ/index.html#f{m[FUA+,A+G9_,t[AfU,MrM_,f{e[FUA+,A4lq_,s{af+ieOQAA,bFAg,oEz51gg,uEyO6Wg___ and the promise that never gets resolved is this one https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/docshell/base/CanonicalBrowsingContext.cpp#769. Earlier we call browser->PrepareToChangeRemoteness, which in turn eventually calls SessionStoreInternal.prepareToChangeRemoteness and awaits for the promise that await TabStateFlusher.flush(aBrowser); returns. TabStateFlusher.flush saves the resolve callback here https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#107, then later we call https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#146 once and because we had a native listener, we end up in this case https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#161-165 waiting for one more flush before we can call the resolve callback. This however never happens.

I put some sections in the pernosco notebook for the trace above

More brain dump:

Even though we have this call https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#97 we never receive it in nsFrameMessageManager::ReceiveMessage (this gets called at some point as part of the sendAsyncMessage call chain. Since TabStateFlusher.flush is only called once by UpdateSessionStore and not twice as we expect it to, when we have a native listener here https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#161-166, the promise is never resolved.

The nsFrameMessageManager that receives "SessionStore:flush" message as a result of this call https://searchfox.org/mozilla-central/rev/a87a1c3b543475276e6d57a7a80cb02f3e42b6ed/browser/components/sessionstore/TabStateFlusher.jsm#97 only adds an event listener for this message sometime AFTER we received that message. I wonder if there is a race between browser remoteness change and creation of ContentSessionStore and that's what causes this problem

Flags: needinfo?(agakhokidze)
Attachment #9160130 - Attachment description: Bug 1630176 - Fix the race between process swi switching and the new browser window loading code, r=kmag → Bug 1630176 - Fix the race between process switching and the new browser window loading code, r=kmag
Pushed by agakhokidze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f0e47b29f8c4
Fix the race between process switching and the new browser window loading code, r=kmag
Keywords: leave-open
Status: REOPENED → RESOLVED
Closed: 3 months ago1 month ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.