Closed Bug 1641874 Opened 4 years ago Closed 4 years ago

Perma Assertion failure: mDocGroup->MatchesKey(docGroupKey), at /builds/worker/checkouts/gecko/dom/base/Document.cpp:3752 when Gecko 78 merges to Beta on 2020-06-01

Categories

(Core :: DOM: postMessage, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- unaffected
firefox77 --- unaffected
firefox78 + fixed
firefox79 --- verified

People

(Reporter: aryx, Assigned: tt)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 1 obsolete file)

central-as-beta simulation: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&revision=a7415c718528d68105906a186ba8a78a05e4fc43&selectedTaskRun=biHcQ4jMRjC8ncoG25OKAg-0

Haven't verified it's from bug 1601594.

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=304256296&repo=try

[task 2020-05-29T12:48:37.075Z] 12:48:37 INFO - TEST-START | /workers/postMessage_clone_port.htm
[task 2020-05-29T12:48:37.076Z] 12:48:37 INFO - Clearing pref browser.tabs.remote.useCrossOriginOpenerPolicy
[task 2020-05-29T12:48:37.092Z] 12:48:37 INFO - Clearing pref browser.tabs.remote.useCrossOriginEmbedderPolicy
[task 2020-05-29T12:48:37.095Z] 12:48:37 INFO - Clearing pref javascript.options.shared_memory
[task 2020-05-29T12:48:37.102Z] 12:48:37 INFO - Clearing pref dom.postMessage.sharedArrayBuffer.withCOOP_COEP
[task 2020-05-29T12:48:37.118Z] 12:48:37 INFO - Closing window 45
[task 2020-05-29T12:48:37.139Z] 12:48:37 INFO - PID 3660 | Assertion failure: mDocGroup->MatchesKey(docGroupKey), at /builds/worker/checkouts/gecko/dom/base/Document.cpp:3752
[task 2020-05-29T12:48:37.170Z] 12:48:37 INFO - STDOUT: Initializing stack-fixing for the first stack frame, this may take a while...
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - Traceback (most recent call last):
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py", line 126, in run_test
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - return self.executor.run_test(test)
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/base.py", line 255, in run_test
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - self.on_environment_change(test.environment)
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 707, in on_environment_change
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - self.protocol.testharness.load_runner(new_environment["protocol"])
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 136, in load_runner
[task 2020-05-29T12:48:37.600Z] 12:48:37 ERROR - self._close_windows()
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 176, in _close_windows
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - self.marionette.switch_to_window(runner_handle)
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1361, in switch_to_window
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - {"focus": focus, "name": handle, "handle": handle})
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 26, in _
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - return func(*args, **kwargs)
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 602, in _send_message
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - self._handle_error(err)
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 622, in _handle_error
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - raise errors.lookup(error)(message, stacktrace=stacktrace)
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - NoSuchWindowException: Unable to locate window: 7
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - stacktrace:
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - WebDriverError@chrome://marionette/content/error.js:175:5
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - NoSuchWindowError@chrome://marionette/content/error.js:409:5
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - GeckoDriver.prototype.switchToWindow@chrome://marionette/content/driver.js:1602:11
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - despatch@chrome://marionette/content/server.js:305:40
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - execute@chrome://marionette/content/server.js:275:16
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - onPacket/<@chrome://marionette/content/server.js:248:20
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - onPacket@chrome://marionette/content/server.js:249:9
[task 2020-05-29T12:48:37.601Z] 12:48:37 ERROR - _onJSONObjectReady/<@chrome://marionette/content/transport.js:501:20

Assignee: nobody → ttung
Status: NEW → ASSIGNED
Flags: needinfo?(ttung)

Hitting this assertion means the document belongs to two different doc groups. And it's probably because the value CrossOriginIsolated gets change during the lifetime of the document.

I can reproduce this issue with ./mach web-platform-tests --headless testing/web-platform/tests/workers/postMessage_block.https.html testing/web-platform/tests/workers/postMessage_clone_port.htm and central-as-beta sim in my local machine.

The same setting and command didn't hit the assertion for Nightly, so I lean to this is an issue related to Nighly only stuff.

postMessage_block.https.html itself has headers (COOP, COEP) set and thus it should be run in webCoop_Coep process.
postMessage_clone_port.htm itself doesn't have headers set.

Only ./mach web-platform-tests --headless testing/web-platform/tests/workers/postMessage_block.https.html or ./mach web-platform-tests --headless testing/web-platform/tests/workers/postMessage_clone_port.htm doesn't hit the assertion

The interesting frames are that:

 0:57.45 TEST_START: /workers/postMessage_block.https.html
...
 1:29.18 TEST_END: Test OK. Subtests passed 1/1. Unexpected 0
 1:29.18 TEST_START: /workers/postMessage_clone_port.htm
 1:29.19 INFO Clearing pref browser.tabs.remote.useCrossOriginOpenerPolicy
 1:29.30 INFO Clearing pref browser.tabs.remote.useCrossOriginEmbedderPolicy
 1:29.37 INFO Clearing pref javascript.options.shared_memory
 1:29.44 INFO Clearing pref dom.postMessage.sharedArrayBuffer.withCOOP_COEP
 1:29.54 INFO Closing window 28
 1:29.79 pid:30091 Assertion failure: mDocGroup->MatchesKey(docGroupKey), at /home/shes050117/Work/mozilla-central/dom/base/Document.cpp:3752

I wonder if the cause can be changing the prefs after the next test starts. If so, we can enable the prefs for all the tests in the directory to temporarily stop the crash and looking into the issue. (we still need to find the root cause, though)

I wonder if it's something like:
postMessage_block.https.html creates a webCoop_Coep process and use it for the test
postMessage_clone_port.htm reuses the process but it shouldn't because it doesn't have the headers set.

Anyway, I have recorded it with rr and will take a closer look next week.

// Hit the assertion
(rr) p docGroupKey
$1 = {<nsTString<char>> = "https://web-platform.test", static kStorageSize = 64, mInlineCapacity = 63,
  mStorage = "https://web-platform.test\000", '\344' <repeats 15 times>, "䪪", '\252' <repeats 20 times>}
(rr) p NodePrincipal()->mURI->GetSpecOrDefault()
$2 = "https://web-platform.test:8443/workers/postMessage_block.https.html"
(rr) p GetBrowsingContext()->CrossOriginIsolated()
$3 = false
(rr) p GetBrowsingContext()->GetOpenerPolicy()
$6 = (const nsILoadInfo::CrossOriginOpenerPolicy &) @0x7f826111c1f8: nsILoadInfo::OPENER_POLICY_SAME_ORIGIN_EMBEDDER_POLICY_REQUIRE_CORP
(rr) p ContentChild::GetSingleton()->GetRemoteType()
$7 = (const nsAString &) @0x7f82808ca888: {<mozilla::detail::nsTStringRepr<char16_t>> = {mData = 0x7f826128bd88 u"webCOOP+COEP=https://web-platform.test", mLength = 38,
    mDataFlags = (mozilla::detail::StringDataFlags::TERMINATED | mozilla::detail::StringDataFlags::REFCOUNTED), mClassFlags = mozilla::detail::StringClassFlags::NULL_TERMINATED},
  static kMaxCapacity = 1073741817}

So, it looks like the document for postMessage_block is about to be destructed. However, the value of CrossOriginIsolated() was false (which was expected to be true) and thus the docGroupKey was the value of SiteOrigin rather than Origin of the NodePrincipal. So that it hit the assertion.

In short, the issue is that the pref changes during the lifetime of a document. It didn't matter before bug 1601594 because it only affects the result of CrossOriginIsolated(). However, we link CrossOriginIsolated and DocGroup together in bug 1601594 and thus we hit this assertion here.

I guess we can fix this by only evaluating the pref when the browser starts so that the value of CrossOriginIsolated won't be affected by changing the value the pref (dom.postMessage.sharedArrayBuffer.withCOOP_COEP).

Severity: -- → S3
Priority: -- → P2

(In reply to Tom Tung [:tt, :ttung] from comment #3)

I guess we can fix this by only evaluating the pref when the browser starts so that the value of CrossOriginIsolated won't be affected by changing the value the pref (dom.postMessage.sharedArrayBuffer.withCOOP_COEP).

It seems that this approach doesn't work well with the current web-platform test since that appears to set the pref after the browser startup.

I'm considering caching the pref value on the cpp object itself instead.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4530435be2c4ad29121f0b7b129664ee3828faad

From the result of that, it seems that the issue is fixed by the patch.

Depends on D78282

Attachment #9153378 - Attachment is obsolete: true
Pushed by ttung@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/45045a6a1b24 Make the pref unchangeable at runtime; r=nika https://hg.mozilla.org/integration/autoland/rev/df279d4082d8 Update test expectations; r=baku

Ah, it's probably because I removed setting dom.postMessage.sharedArrayBuffer.bypassCOOP_COEP.insecure.enabled to true in the second patch and I shouldn't. I will verify that. Sorry for this.

Flags: needinfo?(ttung)
Pushed by ttung@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aa72c8fda60e Make the pref unchangeable at runtime; r=nika https://hg.mozilla.org/integration/autoland/rev/156de44a647f Update test expectations; r=baku

Thank you for fixing this. Please request beta approval.

Flags: needinfo?(ttung)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #16)

Thank you for fixing this. Please request beta approval.

I will tomorrow (just to ensure there is no other issue comes out)

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

Comment on attachment 9154188 [details]
Bug 1641874 - Update test expectations;

Beta/Release Uplift Approval Request

  • User impact if declined: If they flip the pref during runtime, it might cause a document to find an incorrect doc group. Note that this wouldn't affect most users/websites. It should only affect websites that have applied COOP and COEP stuff.

Note that we don't have a test to directly test this behavior, but we flip the pref in wpt tests at runtime. So I put "unknown" below.

  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It only makes the pref be evaluated once at startup so should not be risky.
  • String changes made/needed:
Flags: needinfo?(ttung)
Attachment #9154188 - Flags: approval-mozilla-beta?
Attachment #9154187 - Flags: approval-mozilla-beta?

Comment on attachment 9154187 [details]
Bug 1641874 - Make the pref unchangeable at runtime;

approved for 78.0b6

Attachment #9154187 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9154188 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: