Closed Bug 1530223 Opened 10 months ago Closed 3 months ago

Crash in mozilla::dom::WorkerPrivate::Notify

Categories

(Core :: DOM: Workers, defect, P2, critical)

65 Branch
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed

People

(Reporter: philipp, Assigned: ytausky)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

+++ This bug was initially created as a clone of Bug #1509585 +++

This bug is for crash report bp-43f5925d-4b57-42b7-b1be-4475e0190224.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::WorkerPrivate::Notify dom/workers/WorkerPrivate.cpp:1496
1 xul.dll void mozilla::dom::RemoteWorkerChild::CloseWorkerOnMainThread dom/workers/remoteworkers/RemoteWorkerChild.cpp:449
2 xul.dll nsresult mozilla::detail::RunnableFunction<`lambda at z:/build/build/src/dom/workers/remoteworkers/RemoteWorkerChild.cpp:504:9'>::Run xpcom/threads/nsThreadUtils.h:546
3 xul.dll mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:299
4 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1157
5 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:468
6 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
7 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:307
8 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:289
9 xul.dll nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

this crash signature - primarily a content crash is newly showing up in firefox 65 - bug 1509585 already tried to fix this during the beta cycle and has reduced the crash volume, but hasn't got rid of the crash totally.

This is fairly high volume for beta 66. Andrew, can you ask someone to take a look at these crashes?

Flags: needinfo?(overholt)
Flags: needinfo?(overholt) → needinfo?(ytausky)
Priority: -- → P2

I looked at it yesterday for a bit and it looks like another worker shutdown problem (we already have several of those...). I'll give it a deeper look once I finish my current task.

Flags: needinfo?(ytausky)

I would suggest first changing this assertion to a MOZ_RELEASE_ASSERT and see if it's triggered. It looks like there might be a race condition there.

Assignee: nobody → ytausky
Status: NEW → ASSIGNED

Instead of trying to figure out which pointer is nullptr, add a more
powerful assertion. Since the nullptr already causes a crash, this won't
cause any regression.

Keywords: leave-open

Looking again, RemoteWorkerChild::mWorkerState seems to be written and read from different threads without synchronization.

Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/19c4dc203724
Promote assertion to MOZ_RELEASE_ASSERT r=asuth

This adds runtime checks that verify thread safety.

Pushed by ytausky@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ed0f9b04cb90
Use ThreadBound for RemoteWorkerChild members accessed on launcher thread r=perry,asuth

Reading and writing data without synchronization from multiple threads
is undefined behavior.

Note that this commit does not attempt to reason about the during of mutex
locking; its purpose is to first establish correctness.

Hi Yaron, are we waiting on softfreeze to land this last patch?

Flags: needinfo?(ytausky)

No, there's ongoing work on the same code by :perry, and since the patch is more of a cleanup to prepare the ground for further work, we decided to not apply it yet. I set :asuth as a blocking reviewer for this; if he thinks there's no harm in pushing it, we could try it now and see if it helps. I'd just like to point out that it's quite possible this will have no effect on the crash -- it's an error I noticed in the class where the crash happens, but I cannot say it's directly related to it. It's hard to say anything more meaningful that's directly related to the crash, because of all the undefined behavior in the code around it.

Flags: needinfo?(ytausky)
Pushed by ytausky@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e6e7b9953ede
Guard RemoteWorkerChild's shared data with a mutex r=perry,asuth

Putting this here so it's not buried on Phabricator...we believe the race condition described here https://phabricator.services.mozilla.com/D22573#629809 still holds and needs to be fixed (see Yaron's followup comments on Phabricator for a fix).

Depends on: 1539535
No longer depends on: 1539535
Regressions: 1539535
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.