Open Bug 1602393 Opened 2 years ago Updated 2 months ago

Crash in [@ AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data]

Categories

(Core :: DOM: Service Workers, defect, P2)

Unspecified
Linux
defect

Tracking

()

People

(Reporter: whimboo, Unassigned)

References

Details

(Keywords: crash, leave-open)

Crash Data

Attachments

(2 files)

We got this report from a user running Selenium tests with geckodriver inside a docker image, and by running Firefox in headless mode (https://github.com/mozilla/geckodriver/issues/1663):

This bug is for crash report bp-804666b6-3adb-4e4b-a607-25c610191209.

Top 10 frames of crashing thread:

0 firefox-bin mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 libxul.so Abort xpcom/base/nsDebugImpl.cpp:439
2 libxul.so NS_DebugBreak xpcom/string/nsSubstring.cpp
3 libxul.so nsDebugImpl::Abort xpcom/base/nsDebugImpl.cpp:133
4 libxul.so NS_InvokeByIndex 
5 libxul.so XPCWrappedNative::CallMethod js/xpconnect/src/XPCWrappedNative.cpp:1149
6 libxul.so XPC_WN_CallMethod js/xpconnect/src/XPCWrappedNativeJSOps.cpp:943
7 libxul.so js::InternalCallOrConstruct js/src/vm/Interpreter.cpp:539
8 libxul.so Interpret js/src/vm/Interpreter.cpp:594
9 libxul.so js::RunScript js/src/vm/Interpreter.cpp:424

It happens for him while navigating to http://google.com, specifically between the DOMContentLoaded and pageshow events.

Interesting are the app notes in the bug report:

Ubuntu 18.04.2 LTSFP(D00-L1000-W00000000-T000) GLXtest process failed (exited with status 1): Unable to open a connection to the X server

WR? WR- OMTP? OMTP+4 xpcom_runtime_abort(###!!! ABORT: file /builds/worker/workspace/build/src/dom/serviceworkers/ServiceWorkerRegistrar.cpp, line 1111)

The crash rate seems rather high, but it doesn't happen on Nightly. Maybe it's fixed by the new ServiceWorker?

Priority: -- → P2

When did that land? I could ask the reporter to check those specific nightly builds (before and after the landing).

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #2)

When did that land? I could ask the reporter to check those specific nightly builds (before and after the landing).

I meant the pref (dom.serviceWorkers.parent_intercept). There are lots of changes protected by that pref. It's turned on by default on Nightly only and it's tracked in https://bugzilla.mozilla.org/show_bug.cgi?id=1588154.

As the reporter just replied the headless problem is fixed for him in Firefox 71 while it crashes in 70. So maybe we should just remove this use-case here and keep this bug more general?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #4)

As the reporter just replied the headless problem is fixed for him in Firefox 71 while it crashes in 70. So maybe we should just remove this use-case here and keep this bug more general?

Sounds good to me!

There seems to have multiple causes because we still have crashes on 72.0b3 (AsyncShutdownTimeout | profile-before-change | ServiceWorkerRegistrar: Flushing data). And, apparently, it's another issue since the reporter replied it's fixed for him in 71.

From the latest crash reports, it seems we have added a shutdown blocker that has never been removed until it gets into our way.
:asuth, can you clarify next steps here?

Flags: needinfo?(bugmail)

Looking closer, I see we add the blocker with a string literal:

rv = GetShutdownPhase()->AddBlocker(
      this, NS_LITERAL_STRING(__FILE__), __LINE__,
      NS_LITERAL_STRING("ServiceWorkerRegistrar: Flushing data"));

but the only place we have the same literal is the function

NS_IMETHODIMP
ServiceWorkerRegistrar::GetName(nsAString& aName) {
  aName = NS_LITERAL_STRING("ServiceWorkerRegistrar: Flushing data");
  return NS_OK;
}

which seems to have some overloads. I have no evidence, that this causes the crash, but in any case we should use the GetName function to add the blocker instead of literals anyway, I'd say.

Flags: needinfo?(perry)
Flags: needinfo?(bugmail)

As discussed on Zoom, AddBlocker's 4th parameter and nsIAsyncShutdownClient::GetName are only used to add crash metadata, so they are unlikely to be related to the crash itself, but the string duplication should be removed.

Flags: needinfo?(perry)
Assignee: nobody → perry
Status: NEW → ASSIGNED
Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/237b6fc723ea
remove ServiceWorkerRegistrar string duplication r=dom-workers-and-storage-reviewers,asuth

As far as I can tell, ServiceWorkerRegistrar shutdown hang could only occur if
writing data to disk stalls or the shutdown message from the main thread to
background thread doesn't complete. (Although I still cannot piece together the
exact possible sequence of events that would lead to these hangs.)

Pushed by pjiang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/43921ad138e3
add debugging info to ServiceWorkerRegistrar shutdown blocking r=asuth

The crash reports that include the changes from D78222 show "shuttingDown":false in the metadata (e.g. https://crash-stats.mozilla.org/report/index/074c3341-1168-4921-9fe2-980530200708#tab-metadata). This most definitely means that ServiceWorkerRegistrar::Shutdown was never called by BackgroundParentImpl::RecvShutdownServiceWorkerRegistrar. The best explanation right now for why this hang is happening is that the IPDL background thread is too busy to issue the IPDL call before the shutdown blocker timeout/timer expires.

Assignee: perry → nobody
Status: ASSIGNED → NEW

The leave-open keyword is there and there is no activity for 6 months.
:jstutte, maybe it's time to close this bug?

Flags: needinfo?(jstutte)

This is supposed to be fixed by rolling out LSNG, see bug 1711922.

Flags: needinfo?(jstutte)
See Also: → 1711922
You need to log in before you can comment on or make changes to this bug.