Closed Bug 1776209 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::RemoteLazyInputStreamThread::Dispatch]

Categories

(Core :: DOM: File, defect, P2)

Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox-esr91 --- wontfix
firefox-esr102 --- wontfix
firefox104 --- wontfix
firefox105 --- wontfix
firefox106 --- fixed

People

(Reporter: sefeng, Assigned: jstutte)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/a3338930-499c-4c52-97a3-7bc6e0220622

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0 libxul.so mozilla::RemoteLazyInputStreamThread::Dispatch dom/file/ipc/RemoteLazyInputStreamThread.cpp:149
1 libxul.so mozilla::dom::LSObject::DoRequestSynchronously dom/localstorage/LSObject.cpp:867
2 libxul.so mozilla::dom::LSObject::EnsureDatabase dom/localstorage/LSObject.cpp:920
3 libxul.so mozilla::dom::LSObject::SetItem dom/localstorage/LSObject.cpp:614
4 libxul.so mozilla::dom::Storage_Binding::setItem dom/bindings/StorageBinding.cpp:233
5 libxul.so bool mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ThrowExceptions> dom/bindings/BindingUtils.cpp:3272
6 libxul.so Interpret js/src/vm/Interpreter.cpp:3325
7 libxul.so js::Call js/src/vm/Interpreter.cpp:606
8 libxul.so js::fun_call js/src/vm/JSFunction.cpp:971
9 libxul.so Interpret js/src/vm/Interpreter.cpp:3325

Firefox 100 seems to be the starting point. All the crashes has 0x0000000000000000 as the address.

Note that the vast majority of crashes seem to come from a single user.

Lots of different users in more recent crashes (the spike was one user I think). Looks like mThread is null for some reason.
mThread should only be null if a) we haven't called ::Initialize() yet, or b) if we observe thread-shutdown. We take a mutex lock when doing this, and on shutdown set gShutdownHasStarted to true (this is a true global).
In Dispatch(), which is where the crashes are, we take the mutex and check if shutdown has started (and if so return an error). That's not happening, so shutdown must not have happened, so we must not have called Initialize().
However, the LS code gets the RemoteLazyInputStreamThread using ::GetOrCreate(), which always calls Initialize() before returning RemoteLazyInputStreamThread.

So, how can it be null? Note also that there are hits from other places, though most are from LSObject. (There are a few from Blob)
https://crash-stats.mozilla.org/report/index/764ca895-66ef-4082-8450-b680b0220719
https://crash-stats.mozilla.org/report/index/6373bcc3-174a-4e28-aa43-6e13f0220701

Ideas? Frequency has gone up since spring, even discounting the spike

Flags: needinfo?(nika)
Flags: needinfo?(jvarga)
Priority: -- → P2

If GetOrCreate returns nullptr, we won't be calling domFileThread->Dispatch() (and if we did it would crash there).
We do have a few cases which crash in BindChildActor (called from the Blob code IIRC); so the missing check may well be the cause of those crashes.
https://crash-stats.mozilla.org/signature/?proto_signature=~BindChild&signature=mozilla%3A%3ARemoteLazyInputStreamThread%3A%3ADispatch&date=%3E%3D2022-01-28T14%3A07%3A00.000Z&date=%3C2022-07-28T14%3A07%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1

Flags: needinfo?(rjesup)

There are no crashes for the other GetOrCreate() missing result checks, though we should fix all of them

As jens noted, it is theoretically possible for us to fail to create a thread with NS_NewNamedThread which would lead to this crash as GetOrCreate would fail. It doesn't appear that we're during shutdown here, so I'm guessing it's not caused by gShutdownHasStarted being set meaning that seems like the most likely answer.

In the case of https://crash-stats.mozilla.org/report/index/764ca895-66ef-4082-8450-b680b0220719 it appears that this could be a null thread which wasn't checked, and it's one of the cases :jstutte mentioned.

In the case of https://crash-stats.mozilla.org/report/index/6373bcc3-174a-4e28-aa43-6e13f0220701 it interestingly looks a bit different. thread is clearly non-nullptr here because it's checked (https://searchfox.org/mozilla-central/rev/2bbb0c0a90df20702df8c8011a8996536a83cb75/dom/file/ipc/RemoteLazyInputStream.cpp#1082-1084,1086), but we're still crashing somewhere.

A bit of me wonders if this might be connected to the slightly janky implementation of GetOrCreate which will if the thread doesn't exist, attempt to create & initialize it, but won't clear it if initialization fails meaning it will return the created but uninitilized thread next time, potentially crashing elsewhere (e.g. when actually accessing the mThread member).

I would generally just have changed RemoteLazyInputStreamThread to using a TaskQueue rather than a thread if it wasn't for the fact that it's also used outside of RemoteLazyINputStream due to its history as the "DOM File thread" (https://searchfox.org/mozilla-central/rev/2bbb0c0a90df20702df8c8011a8996536a83cb75/dom/localstorage/LSObject.cpp#1068, https://searchfox.org/mozilla-central/rev/2bbb0c0a90df20702df8c8011a8996536a83cb75/dom/localstorage/LocalStorageManager2.cpp#477, https://searchfox.org/mozilla-central/source/dom/file/TemporaryFileBlobImpl.cpp#71). Some of those look like they'd be simple enough to move to a normal background task (e.g. https://searchfox.org/mozilla-central/rev/2bbb0c0a90df20702df8c8011a8996536a83cb75/dom/file/TemporaryFileBlobImpl.cpp#69-79 could probably just be dispatched as a background task, though I'm less certain about the LocalStorageManager2 and LSObject types as I'm not as familiar with them, but they seem to start PBackground so probably can't use a task queue yet.

Flags: needinfo?(nika)

A bit of me wonders if this might be connected to the slightly janky implementation of GetOrCreate which will if the thread doesn't exist, attempt to create & initialize it, but won't clear it if initialization fails meaning it will return the created but uninitilized thread next time, potentially crashing elsewhere (e.g. when actually accessing the mThread member).

Aha... That would fit - but there are still issues with this theory. On Initialize() failure it returns nullptr but leaves gRemoteLazyThread set, which sets us up for this crash on any future call to GetOrCreate(). Good so far... but what causes Initialize() to fail? The only thing is if NS_NewNamedThread() fails. That will only happen if we can't create the thread, or if attempting to Dispatch() an initial event fails -- and Initialize() doesn't supply an initial event. We can fail to create if we're late in shutdown (mInitialized is false), or if PR_CreateThread() fails.

It seems unlikely that we could be that far in shutdown when this happens from the reports, and the reports don't imply any OOM condition that would cause CreateThread() to fail. So we're back to "how could Initialize() fail here?" I don't see that it would.

That said, GetOrCreate absolutely should clear gRemoteLazyThread on Initialize() failure.

I think this is a duplicate of bug 1788368.

See Also: → 1788368

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #7)

A bit of me wonders if this might be connected to the slightly janky implementation of GetOrCreate which will if the thread doesn't exist, attempt to create & initialize it, but won't clear it if initialization fails meaning it will return the created but uninitilized thread next time, potentially crashing elsewhere (e.g. when actually accessing the mThread member).

That said, GetOrCreate absolutely should clear gRemoteLazyThread on Initialize() failure.

(In reply to Jens Stutte [:jstutte] from comment #8)

I think this is a duplicate of bug 1788368.

The fix there did not cover this specific aspect, we need an additional patch here.

Depends on: 1788368
See Also: 1788368
Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/001f506af412 Improve RemoteLazyInputStreamThread::InitializeXxx error handling. r=dom-storage-reviewers,asuth
Keywords: leave-open
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5930f781e39f RemoteLazyInputStreamThread: Replace gShutdownHasStarted with appropriate IsInOrBeyond check. r=dom-storage-reviewers,asuth
Keywords: leave-open
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch
Regressions: 1796687
Regressions: 1870423
Regressions: 1926722
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: