Closed Bug 1604943 Opened 4 years ago Closed 4 years ago

Assertion failure: newest (The Update algorithm should have been aborted already if there wasn't a newest worker), at dom/serviceworkers/ServiceWorkerManager.cpp:2410

Categories

(Core :: DOM: Service Workers, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox-esr68 --- disabled
firefox72 --- disabled
firefox73 --- wontfix
firefox74 --- fixed

People

(Reporter: tsmith, Assigned: perry)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, crash, Whiteboard: [fuzzblocker])

Crash Data

Attachments

(1 file)

Found with m-c: 20191218-609b5711fb9d

Reducing a test case for this issue has been difficult. I will create a Pernosco session for the time being.

Assertion failure: newest (The Update algorithm should have been aborted already if there wasn't a newest worker), at dom/serviceworkers/ServiceWorkerManager.cpp:2410

#0 mozilla::dom::ServiceWorkerManager::UpdateInternal(nsIPrincipal*, nsTSubstring<char> const&, mozilla::dom::ServiceWorkerUpdateFinishCallback*) /src/dom/serviceworkers/ServiceWorkerManager.cpp:2408:3
#1 mozilla::dom::ServiceWorkerManager::Update(nsIPrincipal*, nsTSubstring<char> const&, mozilla::dom::ServiceWorkerUpdateFinishCallback*) /src/dom/serviceworkers/ServiceWorkerManager.cpp:2362:5
#2 mozilla::dom::ServiceWorkerRegistrationProxy::DelayedUpdate::Notify(nsITimer*) /src/dom/serviceworkers/ServiceWorkerRegistrationProxy.cpp:309:8
#3 nsTimerImpl::Fire(int) /src/xpcom/threads/nsTimerImpl.cpp:564:39
#4 nsTimerEvent::Run() /src/xpcom/threads/TimerThread.cpp:259:11
#5 nsThread::ProcessNextEvent(bool, bool*) /src/xpcom/threads/nsThread.cpp:1241:14
#6 NS_ProcessNextEvent(nsIThread*, bool) /src/xpcom/threads/nsThreadUtils.cpp:486:10
#7 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /src/ipc/glue/MessagePump.cpp:87:21
#8 RunInternal /src/ipc/chromium/src/base/message_loop.cc:315:10
#9 RunHandler /src/ipc/chromium/src/base/message_loop.cc:308:3
#10 MessageLoop::Run() /src/ipc/chromium/src/base/message_loop.cc:290:3
#11 nsBaseAppShell::Run() /src/widget/nsBaseAppShell.cpp:137:27
#12 nsAppStartup::Run() /src/toolkit/components/startup/nsAppStartup.cpp:272:30
#13 XREMain::XRE_mainRun() /src/toolkit/xre/nsAppRunner.cpp:4600:22
#14 XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /src/toolkit/xre/nsAppRunner.cpp:4737:8
#15 XRE_main(int, char**, mozilla::BootstrapConfig const&) /src/toolkit/xre/nsAppRunner.cpp:4818:21
#16 do_main /src/browser/app/nsBrowserApp.cpp:217:22
#17 main /src/browser/app/nsBrowserApp.cpp:339:16

A Pernosco session is available here: https://pernos.co/debug/TvrVCsuNk3OOu5OXkycrhA/index.html

Priority: -- → P3

I had a short look into the pernosco trace. I do not see much algorithm before that could have aborted the update. What does this assert want to check? (see also my notes in pernosco)

Flags: needinfo?(perry)

BTW: Without assert we would have a null ptr access. So just removing it would make things worse.

Assignee: nobody → perry

The assertion is verifying that the check in step 3 of ServiceWorkerRegistration.update() was performed, the invariant being that once a ServiceWorkerRegistration has a newest worker (and passes the check in step 3), it should never not have a newest worker (although the newest worker may change). I believe we do maintain this invariant, but the issue appears to be that because Update looks up the associated ServiceWorkerRegistrationInfo by scope and is an async process, it's possible that a new ServiceWorkerRegistrationInfo with the same scope has replaced the previous one while the update is running. The new ServiceWorkerRegistrationInfo possibly has no newest worker, triggering the assertion.

With that said, Update does appear to account for if there isn't a newest worker (e.g. step 4 and 9.2), so I'm inclined to believe that this assertion should be removed.

Flags: needinfo?(perry)

Probably rather than only removed replaced with a graceful return, otherwise we will probably crash here: null ptr ?

Flags: needinfo?(perry)
  • Cleaned up some comments.
  • Changed some const nsACString& parameter types to nsCString to accept r-value
    and l-value nsCStrings.
Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/889c88363688
ServiceWorkerRegistration.update should capture newest worker script URL at call-time r=dom-workers-and-storage-reviewers,asuth

(In reply to Jens Stutte [:jstutte] from comment #5)

Probably rather than only removed replaced with a graceful return, otherwise we will probably crash here: null ptr ?

Replacing it with a graceful return makes sense to me, but it looks like the Update algorithm still proceeds at that step 4 even if the newest worker is null. In that case, it is possible that there is a truly "newest" worker whose script URL we should be using instead, which is why gracefully returning and not proceeding with our outdated script URL sounds like the right move. Seems worthy of a spec issue!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

IIUC, this bug only applies to SW-e10s. That said, it looks pretty scary to uplift to Fx73. LMK if you strongly disagree, though.

I believe this is a pretty far out there edge case, so I don't have any strong opinions about uplifting.

Crash Signature: [@ mozilla::dom::ServiceWorkerManager::UpdateInternal ]

I see 66 crashes in 2020 with 5 of them being in 71 or 72. I don't know if that warrants a scary uplift but just wanted to point to the data we have.

Comment on attachment 9118992 [details]
Bug 1604943 - ServiceWorkerRegistration.update should capture newest worker script URL at call-time r?#dom-workers-and-storage

Beta/Release Uplift Approval Request

  • User impact if declined: Browser will crash sometimes. The conditions required for this crash to occur rely on a race between two ServiceWorkers being updated/registered.

Not verified fixed in Nightly because the patch involves removing the assertion that fired, but there has been nothing regressed by the patch as far as I know.

  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): While the LOC changed in the patch is relatively high, the change is simply copying a string from a content process to be consumed by the parent process. Previously, the consumed string would be read from an object in the parent process. The vast majority of the changes involve copying/moving this string around, which I would say makes it low risk.
  • String changes made/needed:
Attachment #9118992 - Flags: approval-mozilla-beta?

Comment on attachment 9118992 [details]
Bug 1604943 - ServiceWorkerRegistration.update should capture newest worker script URL at call-time r?#dom-workers-and-storage

this crash seems rare enough that I'm going to stick to the assessment in comment 11, for 73.

Attachment #9118992 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: