Closed Bug 1698036 Opened 4 years ago Closed 3 years ago

Intermittent browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | Test timed out -

Categories

(Core :: DOM: Core & HTML, defect)

defect

Tracking

()

RESOLVED FIXED
92 Branch
Fission Milestone M8
Tracking Status
firefox92 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: hsivonen)

References

()

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell needswork:owner])

Attachments

(1 file)

Filed by: malexandru [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=332971562&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WR3xtZ3eTSCOwK_FPZXHvg/runs/0/artifacts/public/logs/live_backing.log


[task 2021-03-12T08:19:00.119Z] 08:19:00     INFO - TEST-OK | browser/base/content/test/fullscreen/browser_fullscreen_permissions_prompt.js | took 3902ms
[task 2021-03-12T08:19:00.137Z] 08:19:00     INFO - checking window state
[task 2021-03-12T08:19:00.157Z] 08:19:00     INFO - TEST-START | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js
[task 2021-03-12T08:20:30.166Z] 08:20:30     INFO - TEST-INFO | started process screentopng
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - TEST-INFO | screentopng: exit 0
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Buffered messages logged at 08:19:00
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Entering test bound setup
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Leaving test bound setup
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Entering test bound test_parentWindowFocus
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Calling window.open()
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Buffered messages logged at 08:19:01
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - re-focusing main window
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - must wait for focus
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Entering full-screen
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Calling window.focus()
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - Buffered messages finished
[task 2021-03-12T08:20:30.527Z] 08:20:30     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | Test timed out - 
[task 2021-03-12T08:20:30.528Z] 08:20:30     INFO - GECKO(1927) | MEMORY STAT | vsize 20975331MB | residentFast 1071MB
[task 2021-03-12T08:20:30.529Z] 08:20:30     INFO - TEST-OK | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | took 90040ms
[task 2021-03-12T08:20:30.530Z] 08:20:30     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T08:20:30.533Z] 08:20:30     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | Found a tab after previous test timed out: http://example.com/browser/browser/base/content/test/fullscreen/open_and_focus_helper.html - 
[task 2021-03-12T08:20:30.533Z] 08:20:30     INFO - checking window state
[task 2021-03-12T08:20:30.534Z] 08:20:30     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T08:20:30.534Z] 08:20:30     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | Found a browser window after previous test timed out - 
[task 2021-03-12T08:20:30.536Z] 08:20:30     INFO - Console message: [JavaScript Warning: "Exited fullscreen because fullscreen element was removed from document." {file: "chrome://browser/content/tabbrowser.js" line: 3960}]
[task 2021-03-12T08:20:30.536Z] 08:20:30     INFO - TEST-START | browser/base/content/test/fullscreen/browser_fullscreen_window_open.js
[task 2021-03-12T08:20:30.537Z] 08:20:30     INFO - GECKO(1927) | [SimpleTest.requestLongerTimeout()] ignoring request, maybe you meant to call the global `requestLongerTimeout` instead?
[task 2021-03-12T08:20:31.036Z] 08:20:31     INFO - GECKO(1927) | JavaScript error: resource:///actors/DOMFullscreenParent.jsm, line 130: Error: TelemetryStopwatch: key "FULLSCREEN_CHANGE_MS" was already initialized
[task 2021-03-12T08:20:31.200Z] 08:20:31     INFO - GECKO(1927) | JavaScript error: resource:///actors/DOMFullscreenParent.jsm, line 100: Error: TelemetryStopwatch: finishing nonexisting stopwatch. Histogram: "FULLSCREEN_CHANGE_MS", key: ""
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Not taking screenshot here: see the one that was previously logged
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Buffered messages logged at 08:20:30
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Entering test bound setup
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Leaving test bound setup
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Entering test bound test_parentWindowOpen
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Entering full-screen
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Buffered messages logged at 08:20:31
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Console message: [JavaScript Error: "Error: TelemetryStopwatch: key "FULLSCREEN_CHANGE_MS" was already initialized" {file: "resource:///actors/DOMFullscreenParent.jsm" line: 130}]
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - handleEvent@resource:///actors/DOMFullscreenParent.jsm:130:28
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - 
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Calling window.open()
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Buffered messages finished
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - TEST-UNEXPECTED-FAIL | browser/base/content/test/fullscreen/browser_fullscreen_window_focus.js | Uncaught exception received from previously timed out test - at chrome://mochitests/content/browser/browser/base/content/test/fullscreen/head.js:60 - ReferenceError: ok is not defined
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - Stack trace:
[task 2021-03-12T08:20:31.201Z] 08:20:31     INFO - testExpectFullScreenExit@chrome://mochitests/content/browser/browser/base/content/test/fullscreen/head.js:60:5
[task 2021-03-12T08:20:31.207Z] 08:20:31     INFO - Console message: [JavaScript Error: "Error: TelemetryStopwatch: finishing nonexisting stopwatch. Histogram: "FULLSCREEN_CHANGE_MS", key: """ {file: "resource:///actors/DOMFullscreenParent.jsm" line: 100}]
[task 2021-03-12T08:20:31.207Z] 08:20:31     INFO - receiveMessage@resource:///actors/DOMFullscreenParent.jsm:100:28
[task 2021-03-12T08:20:31.207Z] 08:20:31     INFO - 
[task 2021-03-12T08:20:32.081Z] 08:20:32     INFO - TEST-PASS | browser/base/content/test/fullscreen/browser_fullscreen_window_open.js | Should leave full-screen - 
[task 2021-03-12T08:20:32.178Z] 08:20:32     INFO - Leaving test bound test_parentWindowOpen
[task 2021-03-12T08:20:32.178Z] 08:20:32     INFO - Entering test bound test_iframeWindowOpen
[task 2021-03-12T08:20:32.500Z] 08:20:32     INFO - Entering full-screen
[task 2021-03-12T08:20:32.768Z] 08:20:32     INFO - Calling window.open()
[task 2021-03-12T08:20:33.716Z] 08:20:33     INFO - TEST-PASS | browser/base/content/test/fullscreen/browser_fullscreen_window_open.js | Should leave full-screen - 
[task 2021-03-12T08:20:33.772Z] 08:20:33     INFO - Leaving test bound test_iframeWindowOpen
[task 2021-03-12T08:20:33.794Z] 08:20:33     INFO - GECKO(1927) | MEMORY STAT | vsize 20975329MB | residentFast 1104MB
[task 2021-03-12T08:20:33.795Z] 08:20:33     INFO - TEST-OK | browser/base/content/test/fullscreen/browser_fullscreen_window_open.js | took 3299ms
[task 2021-03-12T08:20:33.859Z] 08:20:33     INFO - checking window state
[task 2021-03-12T08:20:33.860Z] 08:20:33     INFO - GECKO(1927) | must wait for focus
[task 2021-03-12T08:20:34.733Z] 08:20:34     INFO - GECKO(1927) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-03-12T08:20:35.756Z] 08:20:35     INFO - GECKO(1927) | -----------------------------------------------------
[task 2021-03-12T08:20:35.757Z] 08:20:35     INFO - GECKO(1927) | Suppressions used:
[task 2021-03-12T08:20:35.758Z] 08:20:35     INFO - GECKO(1927) |   count      bytes template
[task 2021-03-12T08:20:35.759Z] 08:20:35     INFO - GECKO(1927) |      14        448 nsComponentManagerImpl
[task 2021-03-12T08:20:35.760Z] 08:20:35     INFO - GECKO(1927) |       2        288 libfontconfig.so
[task 2021-03-12T08:20:35.761Z] 08:20:35     INFO - GECKO(1927) | -----------------------------------------------------
[task 2021-03-12T08:20:35.873Z] 08:20:35     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 1982
[task 2021-03-12T08:20:35.874Z] 08:20:35     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 2088
[task 2021-03-12T08:20:35.890Z] 08:20:35     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 2110
[task 2021-03-12T08:20:35.966Z] 08:20:35     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 1995
[task 2021-03-12T08:20:36.088Z] 08:20:36     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 2053
[task 2021-03-12T08:20:37.139Z] 08:20:37     INFO - GECKO(1927) | Completed ShutdownLeaks collections in process 1927
[task 2021-03-12T08:20:37.139Z] 08:20:37     INFO - TEST-START | Shutdown```
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → INCOMPLETE

@Henri , could you take a look at these failure
They seem to start from your push

Flags: needinfo?(hsivonen)
Assignee: nobody → hsivonen
Status: REOPENED → ASSIGNED
Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #12)

Suspected JS line of concern:
https://searchfox.org/mozilla-central/rev/bea2345ce636dc329ac7133fc0a13c9171777e40/browser/base/content/test/fullscreen/head.js#41

On try, the very first instance of that line appears to be what stalls:
https://treeherder.mozilla.org/logviewer?job_id=342971008&repo=try&lineNumber=14432

This shows that whatever stalled my local rr chaos mode run was something completely different and is useless for debugging this.

(In reply to Henri Sivonen (:hsivonen) from comment #14)

(In reply to Henri Sivonen (:hsivonen) from comment #12)

Suspected JS line of concern:
https://searchfox.org/mozilla-central/rev/bea2345ce636dc329ac7133fc0a13c9171777e40/browser/base/content/test/fullscreen/head.js#41

On try, the very first instance of that line appears to be what stalls:
https://treeherder.mozilla.org/logviewer?job_id=342971008&repo=try&lineNumber=14432

In a successful run, this hasFocus() check is completely in-process involving BrowsingContexts that became active and focused due to a newly-opened native window getting focused by the native toolkit.

We've previously identified oddities related to the reliability of Gtk focusing newly-opened windows in CI, but if that was the explanation here, why would the hasFocus() change be implicated?

I'm going add more logging and see what happens.

Fission Milestone: --- → M8

There have been 42 total failures in the last 7 days (recent log).
Affected platforms are:

  • linux1804-64 debug & opt
  • linux1804-64-asan & linux1804-64-shippable
Whiteboard: [stockwell needswork:owner] → [stockwell needswork]

Still having trouble finding a clue of what's going wrong. A local rr chaos mode run with test assertions failing shows the same sequence of true/false return values from document.hasFocus() as a non-chaos-mode run that doesn't show test assert failures.

Clearly, my local failure run is something totally different from the try server failure.

So process 13 is the one we care about. It sets the focused BC from a parent-initiated action actionPid 0, action 75.
Then process 15 sets its focused BC from a self-initiated action actionPid 15, action 2.
Neither wins. Instead, the processes receive each others' updates. Process 15 get the focused BC set by process 13. Process 13 gets the BC set by process 15. And then process 13 gets stuck with the wrong focused BC.

I suspect the policy that parent-initiated actions are special is probably to blame here. The logging so far is insufficient to confirm.

[task 2021-06-17T15:28:09.273Z] 15:28:09 INFO - GECKO(1547) | SETACTIVEBROWSINGCONTEXTINCONTENT ownPid: 13, actionPid: 0, action: 75, context: 0x7f0845542800
[task 2021-06-17T15:28:09.274Z] 15:28:09 INFO - GECKO(1547) | SETFOCUSEDBROWSINGCONTEXT ownPid: 13, actionPid: 0, action: 75, context: 0x7f0845542800
[task 2021-06-17T15:28:09.287Z] 15:28:09 INFO - GECKO(1547) | SETFOCUSEDBROWSINGCONTEXT ownPid: 15, actionPid: 15, action: 2, context: 0x7fd363944000
[task 2021-06-17T15:28:09.295Z] 15:28:09 INFO - GECKO(1547) | SETACTIVEBROWSINGCONTEXTINCONTENT ownPid: 15, actionPid: 0, action: 74, context: (nil)
[task 2021-06-17T15:28:09.295Z] 15:28:09 INFO - GECKO(1547) | SETFOCUSEDBROWSINGCONTEXT ownPid: 15, actionPid: 0, action: 74, context: (nil)
[task 2021-06-17T15:28:09.313Z] 15:28:09 INFO - GECKO(1547) | SETACTIVEBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 15, actionPid: 0, action: 75, context: 0x7fd363943800
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | SETFOCUSEDBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 15, actionPid: 0, action: 75, context: 0x7fd363943800
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | SETFOCUSEDBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 13, actionPid: 15, action: 2, context: 0x7f083e239000
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - Entering full-screen
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | CHANGEFULLSCREEN; WILL WAIT FOR FOCUS
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS TRUE: ownPid: 0, bc: 0x7f7aae632800, active: 0x7f7aae632800, focused: 0x7f7aae632800
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | CHANGEFULLSCREEN; WAITED FOR FOCUS
[task 2021-06-17T15:28:09.314Z] 15:28:09 INFO - GECKO(1547) | CHANGEFULLSCREEN; WILL WAIT FOR HASFOCUS
[task 2021-06-17T15:28:09.414Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000
[task 2021-06-17T15:28:09.514Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000
[task 2021-06-17T15:28:09.615Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000
[task 2021-06-17T15:28:09.716Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000
[task 2021-06-17T15:28:09.815Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000
[task 2021-06-17T15:28:09.916Z] 15:28:09 INFO - GECKO(1547) | HASFOCUS NOT IN FOCUSED: ownPid: 13, bc: 0x7f0845542800, focused: 0x7f083e239000

AFAICT, in the process we care about, the unwanted focusing genuinely starts later than the wanted one both in terms of when the actual focusing starts and in terms of when the parent becomes aware of the action and establishes the parent-managed order.

In that sense, it looks like this should be fixed by making the test not have an opportunity for a race like this.

However, the processes do appear to end up with a mutually inconsistent notion of what is focused. That seems bad but doesn't look like the cause of this timeout. Not sure what the best solution is for late-arriving IPC messages that cross their ways in just the unfortunate way. One option, that intuitively feels bad, would be setting a timer in the parent when some active/focused BC traffic happens and once the traffic has settled, pushing the state of the world from the parent to children from the timer.

INSERTNEWFOCUSACTIONID actionPid: 0, action: 277 // THE ACTION WE WANT TAKES QUEUE POSITION
SETACTIVEBROWSINGCONTEXTINCONTENT ownPid: 42, actionPid: 0, action: 277, context: 0x7f281fc4f000
SETFOCUSEDBROWSINGCONTEXT ownPid: 42, actionPid: 0, action: 277, context: 0x7f281fc4f000 // THE IN-PROCESS SET THAT WE WANT
SETFOCUSEDBROWSINGCONTEXT ownPid: 43, actionPid: 43, action: 2, context: 0x7f789d650800 // THE UNWANTED SET
INSERTNEWFOCUSACTIONID actionPid: 0, action: 278
SETACTIVEBROWSINGCONTEXTINCONTENT ownPid: 43, actionPid: 0, action: 276, context: (nil)
SETFOCUSEDBROWSINGCONTEXT ownPid: 43, actionPid: 0, action: 276, context: (nil)
PROCESSPENDINGACTIVE actionPid: 0, action: 277
RECVSETACTIVEBROWSINGCONTEXT actionPid: 0, action: 277
PROCESSPENDINGFOCUSED actionPid: 0, action: 277
RECVSETFOCUSEDBROWSINGCONTEXT actionPid: 0, action: 277
SETACTIVEBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 43, actionPid: 0, action: 277, context: 0x7f789d650000
SETFOCUSEDBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 43, actionPid: 0, action: 277, context: 0x7f789d650000 // THE ACTION WE WANT REACHES THE OTHER PROCESS
INSERTNEWFOCUSACTIONID actionPid: 43, action: 2 // THE UNWANTED ACTION TAKES QUEUE POSITION
PROCESSPENDINGFOCUSED actionPid: 43, action: 2
RECVSETFOCUSEDBROWSINGCONTEXT actionPid: 43, action: 2
SETFOCUSEDBROWSINGCONTEXTFROMOTHERPROCESS ownPid: 42, actionPid: 43, action: 2, context: 0x7f2818a11000 // THE UNWANTED SYNC
PROCESSPENDINGACTIVE NOTFOUND actionPid: 0, action: 276
REVISEACTIVEBROWSINGCONTEXT IGNORED STALE ownPid: 43, oldActionPid: 0, oldAction: 276 newActionPid: 0, newAction: 277, context: 0x7f789d650000
[task 2021-07-06T14:48:55.911Z] 14:48:55 INFO - Entering full-screen
CHANGEFULLSCREEN; WILL WAIT FOR FOCUS
HASFOCUS TRUE: ownPid: 0, bc: 0x7fa84b295800, active: 0x7fa84b295800, focused: 0x7fa84b295800
CHANGEFULLSCREEN; WAITED FOR FOCUS
CHANGEFULLSCREEN; WILL WAIT FOR HASFOCUS
INFO - GECKO(1554) | HASFOCUS NOT IN FOCUSED: ownPid: 42, bc: 0x7f281fc4f000, focused: 0x7f2818a11000
INFO - GECKO(1554) | HASFOCUS NOT IN FOCUSED: ownPid: 42, bc: 0x7f281fc4f000, focused: 0x7f2818a11000

It seems to me that all the awaiting is just for parent-process state, which progresses so quickly that the child state, which isn't awaited on races.

Looks like a the minimal pause in sufficient to fix this. Next: Whether it's feasible to make the native window opening explicitly wait for the appropriate OOP focusing taking place.

See Also: → 1719659

Filed bug 1719659 about a proper fix.

Sigh. The minimal pause wasn't enough after all.

A 500-millisecond pause fixes the failure on opt, but Linux debug has a different failure mode, which isn't fixed.

No progress yet on bug 1719659. Also, since there are two different failure modes here, let's proceed with the currently-attached patch for one of them and file yet another bug for the other one.

Note to sheriff: This patch is expected to make the test fail less but doesn't remove all failures. Please don't back this out due to the remaining failure mode showing up on test-verify when this lands.

Let's file a new bug about the remaining failure mode.

Pushed by hsivonen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dcc1962393f8 Pause for OOP focus to stabilize in browser_fullscreen_window_focus.js. r=edgar
Status: ASSIGNED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

A very non-scientific look at the post-landing results suggests that we should expect the remaining failure mode to be a) debug-only and b) rare.

(As noted in comment 43, when the remaining failure mode eventually shows up, please file a new bug.)

Regressions: 1720489

Bug 1720489 is the new bug for the remaining failure mode.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: