Closed Bug 1588357 Opened 3 months ago Closed 1 month ago

Crash in [@ mozilla::dom::ServiceWorkerRegistrationInfo::AddInstance] Assertion firing: "MOZ_DIAGNOSTIC_ASSERT(aDescriptor.Id() == mDescriptor.Id())"

Categories

(Core :: DOM: Service Workers, defect, P1, critical)

Unspecified
All
defect

Tracking

()

VERIFIED FIXED
mozilla73
Tracking Status
firefox-esr68 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- verified

People

(Reporter: jseward, Assigned: ytausky)

References

Details

(Keywords: crash, regression, testcase)

Crash Data

Attachments

(2 files)

This bug is for crash report bp-1291561f-af57-432e-9828-455a00191010.

This crash was reported 3 times in 2 different instances in the Windows nightly 20191010092637. Maybe related to bug 1588356?

Top 10 frames of crashing thread:

0 xul.dll void mozilla::dom::ServiceWorkerRegistrationInfo::AddInstance dom/serviceworkers/ServiceWorkerRegistrationInfo.cpp:101
1 xul.dll void mozilla::dom::ServiceWorkerRegistrationProxy::InitOnMainThread dom/serviceworkers/ServiceWorkerRegistrationProxy.cpp:88
2 xul.dll nsresult mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1176
3 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
4 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
5 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:110
6 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
7 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
8 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
9 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:406

Flags: needinfo?(bugmail)

Hello! It seems that I get this crash report after entering on https://news.google.com/?hl=en-US&gl=US&ceid=US:en and fast middle-clicking on a random link multiple times (for me it crashes at around 30-40 middle clicks). It may be intermittent. I also observed that the crash doesn't happen on every link from google news. This example link crashes for me: Facebook's digital currency dealt another blow. I can't reproduce this crash while dom.serviceWorkers.parent_intercept is set on false.

Priority: -- → P2

So I believe the situation is this:

  • We lookup the registration in SWRP::InitOnMainThread via swm->GetRegistration(mDescriptor.PrincipalInfo(), mDescriptor.Scope()) here
  • Note that we don't pass the id, just the principal and the scope.
    • The scope lookup is exact, the scope matching algorithm is not run.
    • If the lookup returns null, we do NS_ENSURE_TRUE_VOID which causes us to early return and trigger the MakeScopeExit() cleanup.
  • Then we assert the id is the one on the descriptor we created it from.

A new registration has replaced the old registration for this to be the case. This either happens because of a register() job or in LoadRegistration (which really shouldn't happen).

So perhaps the most likely scenario is that the google news site has some logic in place that calls unregister() followed by register() and things are backlogged enough that this particular race happens.

Flags: needinfo?(bugmail)
Summary: Crash in [@ mozilla::dom::ServiceWorkerRegistrationInfo::AddInstance] → Crash in [@ mozilla::dom::ServiceWorkerRegistrationInfo::AddInstance] Assertion firing: "MOZ_DIAGNOSTIC_ASSERT(aDescriptor.Id() == mDescriptor.Id())"

¡Hola!

Got this crash on 1st startup after updating to Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0 ID:20191103213857

bp-7c4f6639-59d8-44a3-bc3e-5722e0191104 03/11/2019 09:36 p. m.

Updated status flags per the stats section.

¡Gracias!
Alex

The situation in 70 and 71 does seem identical to my comment 2 investigation.

Duplicate of this bug: 1599271

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

FYI the fuzzers are hitting this issue frequently.

OS: Windows 10 → All
Priority: P2 → P1
Assignee: nobody → ytausky
Blocks: 1601024

So far I cannot reproduce this. I'll write a tentative patch, but I'm still not sure how to test if it works or not.

It's possible that a registration will be replaced by another one
before its ServiceWorkerRegistration has finished initializing. In
that case, we shouldn't add the old registration as an instance of
the new one.

Attached file testcase.html

(In reply to Yaron Tausky [:ytausky] from comment #8)

So far I cannot reproduce this. I'll write a tentative patch, but I'm still not sure how to test if it works or not.

This testcase is reliable on Linux. It will reload a few times but it works.
STR:

  • download test case to a temporary directory
  • run a local web server to server the content. python -m SimpleHTTPServer from the temporary directory will do the trick
  • open the test case http://localhost:8000/testcase.html
  • wait 30 seconds
Flags: in-testsuite?
Keywords: testcase

Thanks, that's crashing on Mac as well, and the patch seems to solve it.

Unassigning myself before going away for 3 weeks, someone from the team will pick it up.

Assignee: ytausky → nobody
Assignee: nobody → ytausky
Status: NEW → ASSIGNED

needinfo to file spec issue you documented at https://phabricator.services.mozilla.com/D57132#1742345

Flags: needinfo?(perry)
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/1c513665e7fb
Handle edge case r=asuth,perry,dom-workers-and-storage-reviewers
Flags: needinfo?(perry)
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

Is there a real-world user impact here which would justify considering this for Beta uplift approval or can this fix ride Fx73 to release? If this is just a diagnostic assert, we won't see the crashes when 72 goes to release, but I'm not sure if there's an underlying issue which still warrants consideration.

Flags: needinfo?(ytausky)

It's possible that this bug could be observed by a website if it did the right sequence of things (like what the testcase is doing), but I don't see any realistic use case for a "well behaved" site to do such things. So in theory there is real-world user impact, but in practice I'd say unlikely? (There is also no risk of a security vulnerability if a site does observe this bug).

Flags: needinfo?(ytausky)

Sounds like we can just let it ride with Fx73 then. Thanks!

This is also sw-e10s-only, which is 73-only.

Flags: qe-verify+

Reproduced this with Firefox 71.0a1 (20191013213650) on Windows 10x64 using the test case from comment 10. Firefox crashed after a while with the test case. Here is the crash report.
The issue is verified fixed with Firefox 73.0b2 (20200107212705) on Windows 10x64, macOS 10.15 and Ubuntu 18.04. No crashes encountered while loading the attached test case.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.