Crash in [@ AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data]
Categories
(Core :: DOM: Service Workers, defect, P3)
Tracking
()
People
(Reporter: whimboo, Unassigned)
References
Details
(Keywords: crash, leave-open)
Crash Data
Attachments
(4 files)
We got this report from a user running Selenium tests with geckodriver inside a docker image, and by running Firefox in headless mode (https://github.com/mozilla/geckodriver/issues/1663):
This bug is for crash report bp-804666b6-3adb-4e4b-a607-25c610191209.
Top 10 frames of crashing thread:
0 firefox-bin mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 libxul.so Abort xpcom/base/nsDebugImpl.cpp:439
2 libxul.so NS_DebugBreak xpcom/string/nsSubstring.cpp
3 libxul.so nsDebugImpl::Abort xpcom/base/nsDebugImpl.cpp:133
4 libxul.so NS_InvokeByIndex
5 libxul.so XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1149
6 libxul.so XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:943
7 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:539
8 libxul.so Interpret js/src/vm/Interpreter.cpp:594
9 libxul.so js::RunScript js/src/vm/Interpreter.cpp:424
It happens for him while navigating to http://google.com
, specifically between the DOMContentLoaded
and pageshow
events.
Interesting are the app notes in the bug report:
Ubuntu 18.04.2 LTSFP(D00-L1000-W00000000-T000) GLXtest process failed (exited with status 1): Unable to open a connection to the X server
WR? WR- OMTP? OMTP+4 xpcom_runtime_abort(###!!! ABORT: file /builds/worker/workspace/build/src/dom/serviceworkers/ServiceWorkerRegistrar.cpp, line 1111)
Comment 1•3 years ago
|
||
The crash rate seems rather high, but it doesn't happen on Nightly. Maybe it's fixed by the new ServiceWorker?
Reporter | ||
Comment 2•3 years ago
|
||
When did that land? I could ask the reporter to check those specific nightly builds (before and after the landing).
Comment 3•3 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #2)
When did that land? I could ask the reporter to check those specific nightly builds (before and after the landing).
I meant the pref (dom.serviceWorkers.parent_intercept). There are lots of changes protected by that pref. It's turned on by default on Nightly only and it's tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1588154.
Reporter | ||
Comment 4•3 years ago
|
||
As the reporter just replied the headless problem is fixed for him in Firefox 71 while it crashes in 70. So maybe we should just remove this use-case here and keep this bug more general?
Comment 5•3 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)
As the reporter just replied the headless problem is fixed for him in Firefox 71 while it crashes in 70. So maybe we should just remove this use-case here and keep this bug more general?
Sounds good to me!
There seems to have multiple causes because we still have crashes on 72.0b3 (AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data). And, apparently, it's another issue since the reporter replied it's fixed for him in 71.
Comment 6•3 years ago
|
||
From the latest crash reports, it seems we have added a shutdown blocker that has never been removed until it gets into our way.
:asuth, can you clarify next steps here?
Comment 7•3 years ago
•
|
||
Looking closer, I see we add the blocker with a string literal:
rv = GetShutdownPhase()->AddBlocker(
this, NS_LITERAL_STRING(__FILE__), __LINE__,
NS_LITERAL_STRING("ServiceWorkerRegistrar: Flushing data"));
but the only place we have the same literal is the function
NS_IMETHODIMP
ServiceWorkerRegistrar::GetName(nsAString& aName) {
aName = NS_LITERAL_STRING("ServiceWorkerRegistrar: Flushing data");
return NS_OK;
}
which seems to have some overloads. I have no evidence, that this causes the crash, but in any case we should use the GetName
function to add the blocker instead of literals anyway, I'd say.
Updated•3 years ago
|
Comment 8•3 years ago
|
||
As discussed on Zoom, AddBlocker
's 4th parameter and nsIAsyncShutdownClient::GetName
are only used to add crash metadata, so they are unlikely to be related to the crash itself, but the string duplication should be removed.
Comment 9•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Pushed by pjiang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/237b6fc723ea remove ServiceWorkerRegistrar string duplication r=dom-workers-and-storage-reviewers,asuth
Comment 11•3 years ago
|
||
bugherder |
Comment 12•3 years ago
|
||
As far as I can tell, ServiceWorkerRegistrar shutdown hang could only occur if
writing data to disk stalls or the shutdown message from the main thread to
background thread doesn't complete. (Although I still cannot piece together the
exact possible sequence of events that would lead to these hangs.)
Comment 13•3 years ago
|
||
Pushed by pjiang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/43921ad138e3 add debugging info to ServiceWorkerRegistrar shutdown blocking r=asuth
Comment 14•3 years ago
|
||
bugherder |
Comment 15•3 years ago
|
||
The crash reports that include the changes from D78222 show "shuttingDown":false
in the metadata (e.g. https://crash-stats.mozilla.org/report/index/074c3341-1168-4921-9fe2-980530200708#tab-metadata). This most definitely means that ServiceWorkerRegistrar::Shutdown
was never called by BackgroundParentImpl::RecvShutdownServiceWorkerRegistrar
. The best explanation right now for why this hang is happening is that the IPDL background thread is too busy to issue the IPDL call before the shutdown blocker timeout/timer expires.
Comment 16•2 years ago
|
||
It appears that we're dealing with legacy LocalStorage blocking PBackground on a sync preload:
- https://crash-stats.mozilla.org/report/index/53940689-957f-455d-8573-82cbe0201029#allthreads
- https://crash-stats.mozilla.org/report/index/78b383a2-a058-4b3c-bd9a-6de350201029#allthreads
- https://crash-stats.mozilla.org/report/index/0d245d42-1122-46f3-a5ab-8681f0201029#allthreads
- https://crash-stats.mozilla.org/report/index/e25e5cb1-867f-415b-8517-c7ccc0201028#allthreads
Comment 17•2 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Comment 18•2 years ago
|
||
This is supposed to be fixed by rolling out LSNG, see bug 1711922.
Comment 19•1 year ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?
Comment 20•1 year ago
|
||
There are still quite some instances here. Looking at ServiceWorkerRegistrar::MaybeScheduleShutdownCompleted
I read:
if (mRunnableDispatched || !mShuttingDown) {
return;
}
RefPtr<Runnable> runnable =
NewRunnableMethod("dom::ServiceWorkerRegistrar::ShutdownCompleted", this,
&ServiceWorkerRegistrar::ShutdownCompleted);
MOZ_ALWAYS_SUCCEEDS(NS_DispatchToMainThread(runnable.forget()));
So we dispatch the ServiceWorkerRegistrar::ShutdownCompleted
only if there is no mRunnableDispatched
. This was introduced by bug 1452373 apparently to protect serviceworker.txt from multiple writes.
IIUC this can lead to a situation where we just never unblock our shutdown blocker though we did all expected work. I am also wondering if the way we protect serviceworker.txt from parallel writes could not be improved a bit, as just dropping events seems bad in general.
Comment 21•1 year ago
|
||
Hmm, looking at the annotations of the crashes I see the AsyncShutdownTimeout
:
{
"name": "ServiceWorkerRegistrar: Flushing data",
"state": {
"saveDataRunnableDispatched": false,
"shuttingDown": false
},
"filename": "/builds/worker/checkouts/gecko/dom/serviceworkers/ServiceWorkerRegistrar.cpp",
"lineNumber": 1240,
"stack": "ServiceWorkerRegistrar: Flushing data"
}
with no sign of shutdown at all (and no saveDataRunnableDispatched
, too). So mRunnableDispatched
seems not involved here. There might be a minimal chance that child->SendShutdownServiceWorkerRegistrar() failed and we never entered shutdown. We are coming directly from ServiceWorkerRegistrar::BlockShutdown and I assume we can expect the async shutdown blocker to have called BlockShutdown
if we see ourselves as a blocker in the annotation.
Comment 22•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
Comment 23•1 year ago
|
||
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6d7b934ff5aa Rename mRunnableDispatched to mSaveDataRunnableDispatched and crash on shutdown message dispatch failure in ServiceWorkerRegistrar r=dom-worker-reviewers,jesup
Comment 24•1 year ago
|
||
bugherder |
Comment 25•11 months ago
|
||
Comment 26•10 months ago
|
||
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/099bc1f51e2f Fake ShutdownComplete() for all error conditions of ServiceWorkerRegistrar::ProfileStopped. r=dom-worker-reviewers,jesup
Comment 27•10 months ago
|
||
bugherder |
Comment 28•10 months ago
|
||
Unassigning as I do not have other plans here for now. The last patch here will land in release 102, such that we won't know before if it helped.
Updated•6 months ago
|
Comment 29•20 days ago
|
||
As of today this accounts for only 0,12% of all AsyncShutdownTimeout
s. In general it seems that we track even much more frequent shutdown hangs as S3.
Description
•