Closed Bug 1586503 Opened 5 years ago Closed 2 years ago

Intermittent Fis remote/cdp/test/browser/runtime/browser_executionContextEvents.js | Test timed out -

Categories

(Remote Protocol :: CDP, defect, P1)

defect
Points:
1

Tracking

(firefox70 unaffected, firefox71 disabled, firefox111 disabled, firefox112 disabled, firefox113 fixed)

RESOLVED FIXED
113 Branch
Tracking Status
firefox70 --- unaffected
firefox71 --- disabled
firefox111 --- disabled
firefox112 --- disabled
firefox113 --- fixed

People

(Reporter: aiakab, Assigned: whimboo)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #1547961 +++

Filed by: rgurzau [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=243601513&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/ES4Ma6OvTj-y9mRSo6Vtgg/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/ES4Ma6OvTj-y9mRSo6Vtgg/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


12:25:34 INFO - TEST-START | remote/test/browser/browser_runtime_executionContext.js
12:25:35 INFO - GECKO(823) | 1556627135707 RemoteAgent INFO Remote debugging agent listening on ws://localhost:9222/devtools/browser/dcebb924-f922-af49-b959-d305c1deab66
12:25:48 INFO - GECKO(823) | 2019-04-30 12:25:48.040 firefox[823:6126] Persistent UI failed to open file file:///Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightly.savedState/window_1.data: No such file or directory (2)
12:26:20 INFO - TEST-INFO | started process screencapture
12:26:20 INFO - TEST-INFO | screencapture: exit 0
12:26:20 INFO - Buffered messages logged at 12:25:34
12:26:20 INFO - Entering test bound
12:26:20 INFO - Buffered messages logged at 12:25:36
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | CDP client has been instantiated -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | Runtime domain has been enabled -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context has an id -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context is the default one -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context has a frame id set -
12:26:20 INFO - Navigate to a new URL
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | A new page has been loaded -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | Page.navigate returns the same frameId than executionContextCreated -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The destroyed event reports the previous context id -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context has an id -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The new execution context has a different id -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context is the default one -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The execution context frame id is the same than the one returned by Page.navigate -
12:26:20 INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The destroyed id is different from the created one -
12:26:20 INFO - Navigate back to the previous document
12:26:20 INFO - Buffered messages finished
12:26:20 INFO - TEST-UNEXPECTED-FAIL | remote/test/browser/browser_runtime_executionContext.js | Test timed out -
12:26:20 INFO - GECKO(823) | MEMORY STAT | vsize 5441MB | residentFast 310MB | heapAllocated 89MB
12:26:20 INFO - TEST-OK | remote/test/browser/browser_runtime_executionContext.js | took 45065ms
12:26:20 INFO - Not taking screenshot here: see the one that was previously logged
12:26:20 INFO - TEST-UNEXPECTED-FAIL | remote/test/browser/browser_runtime_executionContext.js | Found a tab after previous test timed out: data:text/html;charset=utf-8,test-page -
12:26:20 INFO - checking window state
12:26:21 INFO - GECKO(823) | Completed ShutdownLeaks collections in process 831
12:26:21 INFO - GECKO(823) | Completed ShutdownLeaks collections in process 829
12:26:21 INFO - GECKO(823) | Completed ShutdownLeaks collections in process 832
12:26:21 INFO - GECKO(823) | Completed ShutdownLeaks collections in process 833

Regressed by: 1547961

Looks like it still happens one and off but only for debug builds on MacOS 10.14. I will have a look at it over the next days. So far I haven't been able to reproduce the problem yet.

No longer depends on: 1547961, 1585464
OS: Unspecified → macOS
Priority: P1 → P5
Hardware: Unspecified → All

Something fails at the beginning of the test:

[task 2019-10-05T13:02:56.402Z] 13:02:56     INFO - TEST-START | remote/test/browser/browser_runtime_executionContext.js
[task 2019-10-05T13:02:56.733Z] 13:02:56     INFO - GECKO(1660) | DevTools listening on ws://localhost:9222/devtools/browser/dae68fe8-a65e-304e-be2a-51be017530ac
[task 2019-10-05T13:03:41.420Z] 13:03:41     INFO - TEST-INFO | started process screencapture

So devtools started to listen on the web socket, but the test didn't get started. It's mainly debug builds, but also happens for opt builds.

I can reproduce this hang locally now. My assumption from the last comment is actually not correct. The hang happens while navigating back:

[task 2019-10-09T23:30:26.949Z] 23:30:26     INFO - TEST-PASS | remote/test/browser/browser_runtime_executionContext.js | The destroyed id is different from the created one - 
[task 2019-10-09T23:30:26.949Z] 23:30:26     INFO - Navigate back to the previous document

After the last info output nothing more happens. Having a look into that shows me that in working cases we get a pagehide event right before pageshow which I would expect here to happen. But when it fails this pagehide event is missing.

Working case:

 0:13.33 GECKO(44681) ** event: pagehide [object HTMLDocument] true
 0:13.33 GECKO(44681) [Child 44691, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /builds/worker/workspace/build/src/dom/base/ThirdPartyUtil.cpp, line 217
 0:13.33 GECKO(44681) ** event: pageshow [object HTMLDocument] true

Failiing case:

0:17.44 GECKO(44681) ** event: pageshow [object HTMLDocument] false

And only having pageshow the persisted flag has a false value, which means we do not emit the context-created event:

https://searchfox.org/mozilla-central/rev/05a22d864814cb1e4352faa4004e1f975c7d2eb9/remote/domains/ContextObserver.jsm#86

There is a huge increase of intermittent failures recently, and this one actually turns into a perma failure on beta. It means that we should mark the test as skipped on MacOS again for now, while trying to investigate the problem.

Summary: Intermittent remote/test/browser/browser_runtime_executionContext.js | Test timed out - → Intermittent Mac remote/test/browser/browser_runtime_executionContext.js | Test timed out -
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a7ca5ad33f3d Temporarily skip remote/test/browser/browser_runtime_executionContext.js on Mac. r=remote-protocol-reviewers,maja_zf
Whiteboard: [stockwell unknown] → [stockwell disabled]

Because the new failures here were all on windows filed bug 1590930.

See Also: → 1590930

Updating summary for new location of the test.

Summary: Intermittent Mac remote/test/browser/browser_runtime_executionContext.js | Test timed out - → Intermittent Mac remote/test/browser/runtime/browser_executionContext.js | Test timed out -
Summary: Intermittent Mac remote/test/browser/runtime/browser_executionContext.js | Test timed out - → Intermittent Mac remote/cdp/test/browser/runtime/browser_executionContext.js | Test timed out -
Component: Agent → CDP
Has Regression Range: --- → yes
See Also: → 1752185

The test is also disabled on Windows (bug 1590930) and we have to do the same for Linux (bug 1752185).

I'll come up with a patch for Linux and update the bug references in the manifest file.

Blocks: 1752185
See Also: 1590930, 1752185
Summary: Intermittent Mac remote/cdp/test/browser/runtime/browser_executionContext.js | Test timed out - → Intermittent Fis remote/cdp/test/browser/runtime/browser_executionContext.js | Test timed out -
OS: macOS → All
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3d89bba5d547 [remote] Skip browser_runtime_executionContext.js for Fission. r=webdriver-reviewers,jdescottes
Severity: normal → S3

Looks like that bug 1644657 needs to be taken care of first before this test can be re-enabled.

Blocks: 1644657

For what it's worth, when this was disabled on Fission with comment #22, it was because the assert about event order at L117 was failing. We expect to get [DESTROYED, CLEARED, DESTROYED, CREATED] and instead we get [DESTROYED, CLEARED, CREATED, DESTROYED] (last 2 events are swapped).

This is still the case today, but it's not clear why we have this event ordering issue. The particular test which is failing is asserting an iframe navigation. We seem to rely on different events to detect that an execution context is destroyed between fission and non-fission.

I have not investigated in details, but when running with Fission enabled, for the failing tests we emit context-destroyed from DOMWindowCreated and inner-window-destroyed.

Whereas when Fission is disabled, we emit context-destroyed from DOMWindowCreated and pagehide.

Since different events are being used, it doesn't seem too surprising that we get a different order.

The test module name has been changed. It's browser_executionContextEvents.js now.

I think that the first thing we should do here is to figure out what the exact order of events should actually be, or if that matters at all. If we won't have to care about then I would suggest we simply adjust the expected list of events.

Summary: Intermittent Fis remote/cdp/test/browser/runtime/browser_executionContext.js | Test timed out - → Intermittent Fis remote/cdp/test/browser/runtime/browser_executionContextEvents.js | Test timed out -

(In reply to Julian Descottes [:jdescottes] from comment #25)

I have not investigated in details, but when running with Fission enabled, for the failing tests we emit context-destroyed from DOMWindowCreated and inner-window-destroyed.

Whereas when Fission is disabled, we emit context-destroyed from DOMWindowCreated and pagehide.

This is just because we have slightly different rules on whether something is allowed into the BFCache with session history in the child vs session history in the parent. You can easily get the same difference in behaviour without Fission (1) by making the page use a feature that blocks it from the BFCache for both session history implementations.

I'm not sure I understand why we don't use pagehide for all cases here? It is fired whenever we leave a page. Alexandre, it looks like you implemented this in bug 1543098, is there a reason to use inner-window-destroyed for pages that don't go into the BFCache, and to not just use pagehide for all pages?

Note that this is one of the tests that blocks us from turning on session history in the parent without Fission. I could just annotate this test as failing with session history in the parent as opposed to Fission, but that will effectively turn off the test completely on desktop.

(1) without session history in the parent really

Flags: needinfo?(poirot.alex)

(In reply to Peter Van der Beken [:peterv] from comment #27)

Note that this is one of the tests that blocks us from turning on session history in the parent without Fission. I could just annotate this test as failing with session history in the parent as opposed to Fission, but that will effectively turn off the test completely on desktop.

Oh this is news to me and I haven't heard before that this is a blocker for SHIP to be enabled for non-Fission. Nevertheless given that we do not claim that we will support non-Fission we agreed on to just update the test and switch the order of events. No CDP consumer complained about that yet and it seems to be fine to not emit those events in the right order. Our test might be too strict.

Also CDP is basically in maintenance mode and we are only going to fix important issues. This bug is not that important that any time investment would be wisely spent.

I'll attach a patch which will work with Fission enabled and skips the test with Fission disabled.

Flags: needinfo?(poirot.alex)
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Points: --- → 1
Keywords: leave-open
Priority: P5 → P1

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #28)

Oh this is news to me and I haven't heard before that this is a blocker for SHIP to be enabled for non-Fission.

It's fairly recent news to us too :-).

Also CDP is basically in maintenance mode and we are only going to fix important issues. This bug is not that important that any time investment would be wisely spent.

Fine by me. As I said, using pageHide should work, so if this breaks something we should have a potential fix.

Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4449a6de2a70 [cdp] Fix test eventsWhenNavigatingFrameSet for expected event order. r=webdriver-reviewers,jdescottes
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
Blocks: 1736121
Keywords: test-disabled
Whiteboard: [stockwell disabled]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: