Closed Bug 1461181 Opened 4 years ago Closed 4 years ago
Crash in mozilla::dom::Client
This bug was filed from the Socorro interface and is report bp-f33d376f-c841-485d-9110-608710180513. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::dom::ClientHandle::Control dom/clients/manager/ClientHandle.cpp:123 1 xul.dll mozilla::dom::ServiceWorkerManager::StartControllingClient dom/serviceworkers/ServiceWorkerManager.cpp:344 2 xul.dll mozilla::dom::ServiceWorkerManager::StartControlling dom/serviceworkers/ServiceWorkerManager.cpp:1886 3 xul.dll nsDocShell::MaybeCreateInitialClientSource docshell/base/nsDocShell.cpp:2862 4 xul.dll nsDocShell::DoChannelLoad docshell/base/nsDocShell.cpp:11172 5 xul.dll nsDocShell::DoURILoad docshell/base/nsDocShell.cpp:10968 6 xul.dll nsDocShell::InternalLoad docshell/base/nsDocShell.cpp:10291 7 xul.dll nsDocShell::LoadURI docshell/base/nsDocShell.cpp:1011 8 xul.dll nsFrameLoader::ReallyStartLoadingInternal dom/base/nsFrameLoader.cpp:558 9 xul.dll nsIDocument::MaybeInitializeFinalizeFrameLoaders dom/base/nsDocument.cpp:6719 ============================================================= this crash signature is rising in volume on the beta channel with the start of the 61.0b rollout. perhaps that's related to bug 1425975.
Address suggests a nullptr de-ref somewhere. I'll investigate on Monday.
Its not obvious to me how we could be getting a nullptr here unless TLS is lying to us. I'm going to add some diagnostic assertions.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
AFAICT none of my new assertions in bug 1461326 have triggered. That's good, I guess. Looking closer at this code I think I see maybe what could be happening now. The SWM method is effectively doing: MOZ_DIAGNOSTIC_ASSERT(aRegistrationInfo->GetActive()); const ServiceWorkerDescriptor& active = aRegistrationInfo->GetActive()->Descriptor(); RefPtr<ClientHandle> clientHandle = ClientManager::CreateHandle(aClientInfo, SystemGroup::EventTargetFor(TaskCategory::Other)); ref = Move(clientHandle->Control(active)); I think maybe GetActive() is returning nullptr. The diagnostic assert does not catch it because we are on beta. Perhaps accessing GetActive()->Descriptor() does not trigger a crash either because we're just storing a pointer reference. Its not until we try to access the contents of that pointer reference later that we crash.
Comment on attachment 8979238 [details] [diff] [review] Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth Andrew, see comment 3 for what I think is happening. This patch tries to ensure that we don't call StartControllingClient() if the registration does not have an active worker.
Attachment #8979238 - Flags: review?(bugmail)
Attachment #8979238 - Flags: review?(bugmail) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d5ec7b9c344a Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth
Ryan, if comment 6 fixes this issue, do you think we should revert the release asserts I have added back to diagnostic asserts to avoid shipping them unnecessarily on release? In theory it should be fine, but it is more release asserts than we normally use.
What are the trade-offs to keeping them? Perf? I assume we'd just crash somewhere else, so stability wouldn't be affected much one way or the other?
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9) > What are the trade-offs to keeping them? Perf? I assume we'd just crash > somewhere else, so stability wouldn't be affected much one way or the other? Some very minor perf hit. Its possible they might trigger on general memory corruption issues which might be more noisy in general release population. Not sure catching those are bad, though. I guess I'll leave them for now.
Comment on attachment 8979238 [details] [diff] [review] Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth Approval Request Comment [Feature/Bug causing the regression]: Bug 1425975 [User impact if declined]: Low frequency crashes. I believe this probably only triggers when a service worker does clients.claim() on activation and takes control of a page in a different process. Its basically our cross-process service worker propagation problem. [Is this code covered by automated tests?]: We have a lot of tests, but it does not trigger the race condition. [Has the fix been verified in Nightly?]: Its landed in nightly, but we don't have STR to verify. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Minimal [Why is the change risky/not risky?]: This adds some additional checks for nullptr and a release assertion that we don't ever pass a nullptr here. [String changes made/needed]: None
Attachment #8979238 - Flags: approval-mozilla-beta?
Comment on attachment 8979238 [details] [diff] [review] Don't call ServiceWorkerManager::StartControllingClient() if there is no active worker. r=asuth Simple fix for a ServiceWorker crash. Approved for 61.0b8.
Attachment #8979238 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.