Closed
Bug 1493591
Opened 6 years ago
Closed 5 years ago
Crash in mozilla::dom::WorkerPrivate::~WorkerPrivate
Categories
(Core :: DOM: Service Workers, defect, P1)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | + | wontfix |
firefox64 | + | wontfix |
People
(Reporter: gsvelto, Assigned: ytausky)
References
Details
(4 keywords, Whiteboard: DWS_NEXT)
Crash Data
Attachments
(1 obsolete file)
This bug was filed from the Socorro interface and is report bp-b4df91d0-9d5c-401a-9eaf-253cf0180923. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::dom::WorkerPrivate::~WorkerPrivate dom/workers/WorkerPrivate.cpp:2757 1 xul.dll void mozilla::dom::WorkerPrivate::ClearSelfAndParentEventTargetRef dom/workers/WorkerPrivate.h:154 2 xul.dll ?Run@TopLevelWorkerFinishedRunnable@?A0x30C6BC27@dom@mozilla@@EEAA?AW4nsresult@@XZ$37675862552aedd537fb67d4bee880bd dom/workers/WorkerPrivate.cpp:301 3 xul.dll mozilla::ThrottledEventQueue::Inner::Executor::Run xpcom/threads/ThrottledEventQueue.cpp:72 4 xul.dll mozilla::ThrottledEventQueue::Inner::Executor::Run xpcom/threads/ThrottledEventQueue.cpp:72 5 xul.dll mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:337 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1166 7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:519 8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:97 9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:318 ============================================================= This looks like a new crash. The first instance seems to have appeared with build 20180918075510.
Updated•6 years ago
|
status-firefox64:
--- → affected
Updated•6 years ago
|
Priority: -- → P2
Comment 1•6 years ago
|
||
This signature just appeared in 63.0b13 with 12 crashes. The pushlog for this build is: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=0082062b759f&tochange=1e0b0e805d0a :baku, could you investigate please ?
Updated•6 years ago
|
Crash Signature: [@ mozilla::dom::WorkerPrivate::~WorkerPrivate] → [@ mozilla::dom::WorkerPrivate::~WorkerPrivate]
[@ @0x0 | mozilla::dom::WorkerPrivate::~WorkerPrivate]
Updated•6 years ago
|
Whiteboard: DWS_NEXT
Updated•6 years ago
|
status-firefox62:
--- → unaffected
status-firefox-esr60:
--- → unaffected
tracking-firefox63:
--- → +
tracking-firefox64:
--- → +
Updated•6 years ago
|
Group: dom-core-security
Comment 2•6 years ago
|
||
Marking as security sensitive since during Triage today we noticed some possible UAF signatures.
Updated•6 years ago
|
Keywords: csectype-uaf,
sec-high
Comment 3•6 years ago
|
||
Sec high, readjusting priorities. Yaron, can you give this a look? Make it a top priority.
Flags: needinfo?(ytausky)
Updated•6 years ago
|
Assignee: nobody → ytausky
Updated•6 years ago
|
Priority: P2 → P1
Comment 4•6 years ago
|
||
Such as https://crash-stats.mozilla.com/report/index/727a101e-20dc-4e38-bf68-353170181023 (e5 UAF)
Assignee | ||
Comment 5•6 years ago
|
||
I'm assuming that "e5 UAF" refers to the address of the crashing use-after-free access? Anyway, it looks like it happens in an inlined destructor of one of the members of WorkerPrivate. Unfortunately, that class is gigantic, so it's hard to guess which one it is. I'll see if I can figure it out from the executable.
Flags: needinfo?(ytausky)
Assignee | ||
Comment 6•6 years ago
|
||
This report suggests that the problem is a mismanaged refcount: https://crash-stats.mozilla.com/report/index/534b24dc-47d8-4728-892f-fefdc0181025
Status: NEW → ASSIGNED
Assignee | ||
Comment 7•6 years ago
|
||
The crash occurs at the end of this snippet (macOS build): 0x1106884de <+526>: cmpq $0x0, 0x488(%r14) 0x1106884e6 <+534>: je 0x21134f8 ; <+552> 0x1106884e8 <+536>: leaq 0x488(%r14), %rax 0x1106884ef <+543>: movq (%rax), %rdi 0x1106884f2 <+546>: movq (%rdi), %rax 0x1106884f5 <+549>: callq *0x8(%rax) We have a virtual function call through an address stored 0x488 bytes into a WorkerPrivate object. The next step would be to figure out which one that is.
Assignee | ||
Comment 8•6 years ago
|
||
I couldn't find out how to get the debug symbols for this executable, so I built the the same revision locally, hoping for the best. In debug mode I'm not getting the same offsets as in the released executable, so I'm guessing a bit here, but if the offsets are biased uniformly (0x18 bytes), the offending member is WorkerPrivate::mPerformanceStorage. I'll try to see what I can get from a local release build, but in the meantime I noticed something strange about that member: it's accessed both from the main thread [1] and from the worker's parent thread [2]. I don't see any synchronization around those accesses, so it's possible that it's a race condition. Such a race condition could cause the crash seen here. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#5399 [2] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.h#154
Comment 9•6 years ago
|
||
Good find!
Assignee | ||
Comment 10•6 years ago
|
||
Another thing that seems racy: mPerformanceCounter is accessed from the main thread without synchronization [1], yet it's also accessible from any other thread with synchronization [2]. This might cause a similar crash in WorkerPrivate::~WorkerPrivate. I think we need a more systematic approach to fixing this problem, especially since WorkerPrivate is so big. I would suggest wrapping data with an object that only grants access to them if a certain condition holds, e.g. a lock is held or the current thread is the expected one. As for testing, I don't know of any reliable way to cause a race, so testing that it doesn't happen any more is similarly problematic. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerDebugger.cpp#519 [2] https://searchfox.org/mozilla-central/source/dom/workers/WorkerThread.cpp#165
Assignee | ||
Comment 11•6 years ago
|
||
I dug a little deeper while trying to implement a solution and found out that I was previously wrong: while accesses to mPerformanceStorage are not synchronized by the caller, it only gets assigned objects of type PerformanceStorageWorker [1], which have a thread-safe refcount [2]. I'm going to take a step back and look again at all the places this member is used in. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#3487 [2] https://searchfox.org/mozilla-central/source/dom/performance/PerformanceStorageWorker.h#23
Assignee | ||
Comment 12•6 years ago
|
||
Just a note: mPerformanceStorage is *still* unsafe because there are raw pointers to it all over the place, so the assumption that refcount = 0 means the object is safe to deallocate does not hold. I just don't see how it could cause this particular crash, because the refcount itself seems to be protected, and WorkerPrivate stores its pointer in a RefPtr. The crash happens when trying to call a virtual function, so the memory must have been deallocated before WorkerPrivate::~WorkerPrivate touches the refcount itself.
Comment 13•6 years ago
|
||
See also bug 1502419 which I just filed
Assignee | ||
Comment 14•6 years ago
|
||
By sharing a raw pointer to a refcounted object it's possible for it to get used after its destruction. It's not clear if that is really the case here, but it doesn't seem like it would hurt performance.
Assignee | ||
Comment 16•6 years ago
|
||
After the initial findings I couldn't find any smoking gun. To recap, this looks like a double destruction (which makes it a use-after-free), so the real problem is likely just before the first destruction, which is no longer visible in a stack trace. To mitigate the problem, in addition to the patch on this bug, I also opened bug 1504638. (That's just a step in a journey towards stricter enforcement of threading restrictions, so it won't solve the problem either.) I'm pretty sure that improper refcounting somewhere along the chain is the root cause here, but unfortunately we currently don't have tools to enforce it automatically. My next step here would be to improve the refcounting situation and see if it helps.
Flags: needinfo?(ytausky)
Assignee | ||
Comment 17•6 years ago
|
||
I am trying to see if I can get some lead from TSan. So far I'm just trying to weed out what looks like false positives.
Assignee | ||
Comment 18•6 years ago
|
||
:mrbkap, could you please share your opinion about this crash? So far I wasn't able to find a definitive cause, so unless there's some new trail, I would mark this as stalled. (There are general safety improvement that can be pursued and would hopefully eventually catch this crash, but I wouldn't do them as part of this bug since they're indirect.)
Flags: needinfo?(mrbkap)
Comment 19•6 years ago
|
||
This is clearly some type of regression, and also is fairly high volume. Pulling the dump tells me that this->mPerformanceStorage.mRawPtr = 0xe5e5e5e5
Comment 20•6 years ago
|
||
Looking at the reports, I don't see any crashes in Nightly (that is 65.0a) or anything after 64b3. Am I misinterpreting crash-stats? I find it interesting that in the regression range reported in comment 1 contains be85aa71510b (bug 1493629, which I can't access) and the range of fixes between 64b3 and b4 includes 3cf4f972200c (bug 1498510, also inaccessible). Both patches are touching the same code and the CSP object is just before mPerformanceStorage in WorkerPrivate.
Updated•6 years ago
|
Flags: needinfo?(mrbkap)
Comment 21•6 years ago
|
||
Crashes stopped with 63.0.3 so maybe the signature morphed but I couldn't find a similar signature with any volume of crashes in 63.03. Given the drop, the fact that the bug is stalled, that we don't have drivers for a new dot release and that we ship 64 in 3 weeks, this is wontfix for 63.
Updated•5 years ago
|
Updated•5 years ago
|
Flags: needinfo?(amarchesini)
Comment 22•5 years ago
|
||
I suggest we close this; we've had one 67 crash, but clearly something changed after 63. The remaining crash may not be related to the original reason this bug was filed, or may be a more random trashing or memory error.
Flags: needinfo?(ytausky)
Assignee | ||
Comment 23•5 years ago
|
||
I agree, let's close this. If someone finds this bug because this function starts crashing again, please open a new bug.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(ytausky)
Resolution: --- → WORKSFORME
Updated•4 years ago
|
Group: dom-core-security
Updated•11 months ago
|
Attachment #9021150 -
Attachment is obsolete: true
You need to log in
before you can comment on or make changes to this bug.
Description
•