Closed Bug 1461181 Opened 6 years ago Closed 6 years ago

Crash in mozilla::dom::ClientHandle::Control

Categories

(Core :: DOM: Service Workers, defect, P2)

61 Branch
All
Unspecified
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 + fixed
firefox62 --- fixed

People

(Reporter: philipp, Assigned: bkelly)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-f33d376f-c841-485d-9110-608710180513.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ClientHandle::Control dom/clients/manager/ClientHandle.cpp:123
1 xul.dll mozilla::dom::ServiceWorkerManager::StartControllingClient dom/serviceworkers/ServiceWorkerManager.cpp:344
2 xul.dll mozilla::dom::ServiceWorkerManager::StartControlling dom/serviceworkers/ServiceWorkerManager.cpp:1886
3 xul.dll nsDocShell::MaybeCreateInitialClientSource docshell/base/nsDocShell.cpp:2862
4 xul.dll nsDocShell::DoChannelLoad docshell/base/nsDocShell.cpp:11172
5 xul.dll nsDocShell::DoURILoad docshell/base/nsDocShell.cpp:10968
6 xul.dll nsDocShell::InternalLoad docshell/base/nsDocShell.cpp:10291
7 xul.dll nsDocShell::LoadURI docshell/base/nsDocShell.cpp:1011
8 xul.dll nsFrameLoader::ReallyStartLoadingInternal dom/base/nsFrameLoader.cpp:558
9 xul.dll nsIDocument::MaybeInitializeFinalizeFrameLoaders dom/base/nsDocument.cpp:6719

=============================================================

this crash signature is rising in volume on the beta channel with the start of the 61.0b rollout. perhaps that's related to bug 1425975.
Address suggests a nullptr de-ref somewhere.  I'll investigate on Monday.
Flags: needinfo?(bkelly)
Its not obvious to me how we could be getting a nullptr here unless TLS is lying to us.  I'm going to add some diagnostic assertions.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Depends on: 1461326
Priority: -- → P2
AFAICT none of my new assertions in bug 1461326 have triggered.  That's good, I guess.

Looking closer at this code I think I see maybe what could be happening now.  The SWM method is effectively doing:


  MOZ_DIAGNOSTIC_ASSERT(aRegistrationInfo->GetActive());

  const ServiceWorkerDescriptor& active =
    aRegistrationInfo->GetActive()->Descriptor();

  RefPtr<ClientHandle> clientHandle =
    ClientManager::CreateHandle(aClientInfo,
                                SystemGroup::EventTargetFor(TaskCategory::Other));

  ref = Move(clientHandle->Control(active));

I think maybe GetActive() is returning nullptr.  The diagnostic assert does not catch it because we are on beta.  Perhaps accessing GetActive()->Descriptor() does not trigger a crash either because we're just storing a pointer reference.  Its not until we try to access the contents of that pointer reference later that we crash.
Comment on attachment 8979238 [details] [diff] [review]
Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth

Andrew, see comment 3 for what I think is happening.  This patch tries to ensure that we don't call StartControllingClient() if the registration does not have an active worker.
Attachment #8979238 - Flags: review?(bugmail)
Attachment #8979238 - Flags: review?(bugmail) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5ec7b9c344a
Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth
Ryan, if comment 6 fixes this issue, do you think we should revert the release asserts I have added back to diagnostic asserts to avoid shipping them unnecessarily on release?  In theory it should be fine, but it is more release asserts than we normally use.
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/d5ec7b9c344a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
What are the trade-offs to keeping them? Perf? I assume we'd just crash somewhere else, so stability wouldn't be affected much one way or the other?
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> What are the trade-offs to keeping them? Perf? I assume we'd just crash
> somewhere else, so stability wouldn't be affected much one way or the other?

Some very minor perf hit.  Its possible they might trigger on general memory corruption issues which might be more noisy in general release population.  Not sure catching those are bad, though.

I guess I'll leave them for now.
Comment on attachment 8979238 [details] [diff] [review]
Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1425975
[User impact if declined]: Low frequency crashes.  I believe this probably only triggers when a service worker does clients.claim() on activation and takes control of a page in a different process.  Its basically our cross-process service worker propagation problem.
[Is this code covered by automated tests?]: We have a lot of tests, but it does not trigger the race condition.
[Has the fix been verified in Nightly?]: Its landed in nightly, but we don't have STR to verify.
[Needs manual test from QE? If yes, steps to reproduce]:  No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal
[Why is the change risky/not risky?]: This adds some additional checks for nullptr and a release assertion that we don't ever pass a nullptr here.
[String changes made/needed]: None
Attachment #8979238 - Flags: approval-mozilla-beta?
Comment on attachment 8979238 [details] [diff] [review]
Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth

Simple fix for a ServiceWorker crash. Approved for 61.0b8.
Attachment #8979238 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.