Closed Bug 1609665 Opened 3 months ago Closed 20 days ago

Crash in [@ mozilla::dom::ServiceWorkerRegistrationInfo::MaybeScheduleTimeCheckAndUpdate]

Categories

(Core :: DOM: Service Workers, defect, P1)

73 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla76
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- unaffected
firefox73 --- disabled
firefox74 --- fixed
firefox75 --- fixed
firefox76 --- fixed

People

(Reporter: philipp, Assigned: perry)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files, 2 obsolete files)

This bug is for crash report bp-3f7cd4e1-1124-45ef-a3d4-6dddc0200116.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ServiceWorkerRegistrationInfo::MaybeScheduleTimeCheckAndUpdate dom/serviceworkers/ServiceWorkerRegistrationInfo.cpp:460
1 xul.dll std::_Func_impl_no_alloc<`lambda at z:/task_1579054948/build/src/dom/serviceworkers/ServiceWorkerPrivateImpl.cpp:460:7', void, mozilla::dom::ServiceWorkerOpResult&&>::_Do_call 
2 xul.dll mozilla::MozPromise<mozilla::dom::ServiceWorkerOpResult, mozilla::ipc::ResponseRejectReason, 1>::ThenValue<`lambda at z:/task_1579054948/build/src/dom/serviceworkers/ServiceWorkerPrivateImpl.cpp:840:7'>::DoResolveOrRejectInternal xpcom/threads/MozPromise.h:793
3 xul.dll mozilla::MozPromise<RefPtr<mozilla::AudioData>, mozilla::MediaResult, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:402
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1241
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
6 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290
9 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

another serviceworker-related crash signature that's popping up during the 73.0b cycle.

This seems to be caused by a ServiceWorkerRegistrationInfo nullptr. I see several calls to MaybeScheduleTimeCheckAndUpdate:

	dom/serviceworkers/FetchEventOpChild.cpp
165	mRegistration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::(anonymous namespace)::SynthesizeResponseWatcher::CancelInterception
470	mRegistration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::FetchEventOpChild::Recv__delete__
678	mRegistration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::FetchEventOpChild::MaybeScheduleRegistrationUpdate
	dom/serviceworkers/ServiceWorkerPrivate.cpp
386	mRegistration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::(anonymous namespace)::RegistrationUpdateRunnable::Run
1573	registration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::ServiceWorkerPrivate::SendFetchEvent
	dom/serviceworkers/ServiceWorkerPrivateImpl.cpp
463	registration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::ServiceWorkerPrivateImpl::SendPushEventInternal
466	registration->MaybeScheduleTimeCheckAndUpdate(); // found in mozilla::dom::ServiceWorkerPrivateImpl::SendPushEventInternal

in this case we are inside ServiceWorkerPrivateImpl. None of these callers checks before, if mRegistration is nullptr. But I see several points where it is assigned mRegistration = nullptr;, so probably we need to understand, if we can do anything more meaningful than crash if mRegistration is nullptr (we could crash/assert only in debug builds?).

Flags: needinfo?(perry)
Assignee: nobody → perry
Flags: needinfo?(perry)

Based on stack frame 2 (or 1, depending on the report) in the crash report

mozilla::MozPromise<mozilla::dom::ServiceWorkerOpResult, mozilla::ipc::ResponseRejectReason, 1>::ThenValue<`lambda at z:/task_1579058349/build/src/dom/serviceworkers/ServiceWorkerPrivateImpl.cpp:840:7'>::DoResolveOrRejectInternal(mozilla::MozPromise<mozilla::dom::ServiceWorkerOpResult, mozilla::ipc::ResponseRejectReason, 1>::ResolveOrRejectValue&)

the null ServiceWorkerRegistrationInfo is used in ExecServiceWorkerOp's callback(s) (ServiceWorkerPrivateImpl.cpp:840:7 is ExecServiceWorkerOp). Out of all calls to ExecServiceWorkerOp, only SendPushEventInternal does MaybeScheduleTimeCheckAndUpdate in the callbacks.

Priority: -- → P1
See Also: → 1468042

Going to add some assertions to narrow down the crash, so marking leave-open.

Keywords: leave-open

Attempting to narrow down the crash of a null ServiceWorkerRegistrationInfo.

Many things may go wrong if mOrderedScopes isn't actually ordered. We can put
more of that responsibility on the STL.

Depends on D60326

Attachment #9121673 - Attachment is obsolete: true
Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e65f67152a00
promote diagnostic assert to release assert r=asuth

Comment on attachment 9121667 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Beta/Release Uplift Approval Request

  • User impact if declined: We believe there will be no impact because this MOZ_RELEASE_ASSERT should catch the nullptr dereference before it actually happens and crash earlier instead. If so, then the patch won't change the amount of crashes. If not, then there will be a new assertion failure to fix.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not risky because it is only changing a MOZ_DIAGNOSTIC_ASSERT to MOZ_RELEASE_ASSERT. This bug's crash is only showing up on beta, where MOZ_DIAGNOSTIC_ASSERTs are removed.
  • String changes made/needed:
Attachment #9121667 - Flags: approval-mozilla-beta?

Comment on attachment 9121667 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Shifts the crash from a nullptr crash to a release assert so we can hopefully get better diagnostics. Approved for 73.0b7.

Attachment #9121667 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9121667 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Clearing the approval flag to get this off the needs-uplift radar.

Attachment #9121667 - Flags: approval-mozilla-beta+

Comment on attachment 9122518 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Beta/Release Uplift Approval Request

Flags: needinfo?(perry)
Attachment #9122518 - Flags: approval-mozilla-beta?

Comment on attachment 9122518 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Same as comment 9. Approved for 73.0b9.

Attachment #9122518 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9122518 [details]
Bug 1609665 - promote diagnostic assert to release assert r?asuth

Pushed to Beta:
https://hg.mozilla.org/releases/mozilla-beta/rev/eca62a891f1d

Also clearing the approval flag to get this off the needs-uplift radar.

Attachment #9122518 - Flags: approval-mozilla-beta+
Attachment #9121667 - Attachment is obsolete: true

sw-e10s was disabled in 73.0b9

I'm working on this, but the crash is coming from a callback for push notifications and it's difficult to tell what the state is when the callback is registered. It's possible that the Service Worker's "ordered scopes" is buggy (which could explain the crash), so I'll upload a patch for that as a first step.

Attachment #9126299 - Attachment description: Bug 1609665 - use library functions to maintain ordered scopes r?#dom-workers-and-storage → Bug 1609665 - use ScopeContainer instead of plain nsTArray for SW scopes r?#dom-workers-and-storage
Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/353a831e9eab
use ScopeContainer instead of plain nsTArray for SW scopes r=dom-workers-and-storage-reviewers,sg,asuth

crashes with the assert are now showing up with the [@ mozilla::dom::ServiceWorkerManager::SendPushEvent ] signature

Crash Signature: [@ mozilla::dom::ServiceWorkerRegistrationInfo::MaybeScheduleTimeCheckAndUpdate] → [@ mozilla::dom::ServiceWorkerRegistrationInfo::MaybeScheduleTimeCheckAndUpdate] [@ mozilla::dom::ServiceWorkerManager::SendPushEvent ]
See Also: → 1616254

Also adds ScopeToPrincipal function to convert scope nsIURI/strings to principals.

Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/2d889781b71a
push notifications should use registrations with an exact scope match r=dom-workers-and-storage-reviewers,asuth

Comment on attachment 9128265 [details]
Bug 1609665 - push notifications should use registrations with an exact scope match r?#dom-workers-and-storage-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: nullptr dereference when handling push notifications, which may crash.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not risky because it just does an early-return when it sees a particular nullptr, instead of saving it and trying to de-reference later.
  • String changes made/needed:
Attachment #9128265 - Flags: approval-mozilla-beta?

Comment on attachment 9128265 [details]
Bug 1609665 - push notifications should use registrations with an exact scope match r?#dom-workers-and-storage-reviewers

Service-worker crash fix, one of the dependencies to allow shipping sw-e10s ship in 74, uplift approved for 74.0b8, thanks.

Attachment #9128265 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: NEW → RESOLVED
Closed: 1 month ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Perry, this seems to have spiked big again on Nightly. Any ideas why?

Status: RESOLVED → REOPENED
Flags: needinfo?(perry)
Resolution: FIXED → ---
Target Milestone: mozilla75 → ---

Bug 1407276 looks like the only plausible change in that range.

Flags: needinfo?(echuang)

With any luck, the spike should go away in the next set of Nightlies with bug 1407276 backed out.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #31)

Bug 1407276 looks like the only plausible change in that range.

Yes, it is caused by bug 1407276's patch. The core dump stack must be only produced by the patch.

Flags: needinfo?(echuang)
Flags: needinfo?(perry)
Duplicate of this bug: 1621746
Status: REOPENED → RESOLVED
Closed: 1 month ago20 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
You need to log in before you can comment on or make changes to this bug.