Closed Bug 1493591 Opened 7 years ago Closed 6 years ago

Crash in mozilla::dom::WorkerPrivate::~WorkerPrivate

Categories

(Core :: DOM: Service Workers, defect, P1)

Product:

Component:

Platform:

Unspecified

Windows 10

Type:

defect

Priority:

P1

Severity:

critical

Tracking

()

Status:

RESOLVED WORKSFORME

Tracking Flags:

Tracking

Status

firefox-esr60

---

unaffected

firefox62

---

unaffected

firefox63

+

wontfix

firefox64

+

wontfix

People

(Reporter: gsvelto, Assigned: ytausky)

References

Details

(4 keywords, Whiteboard: DWS_NEXT)

Crash Data

Attachments

(1 obsolete file)

Bug 1493591: Don't share raw pointers to WorkerPrivate::mPerformanceStorage 7 years ago Yaron Tausky [:ytausky] 47 bytes, text/x-phabricator-request		Details \| Review

Gabriele Svelto [:gsvelto]

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is report bp-b4df91d0-9d5c-401a-9eaf-253cf0180923. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::dom::WorkerPrivate::~WorkerPrivate dom/workers/WorkerPrivate.cpp:2757 1 xul.dll void mozilla::dom::WorkerPrivate::ClearSelfAndParentEventTargetRef dom/workers/WorkerPrivate.h:154 2 xul.dll ?Run@TopLevelWorkerFinishedRunnable@?A0x30C6BC27@dom@mozilla@@EEAA?AW4nsresult@@XZ$37675862552aedd537fb67d4bee880bd dom/workers/WorkerPrivate.cpp:301 3 xul.dll mozilla::ThrottledEventQueue::Inner::Executor::Run xpcom/threads/ThrottledEventQueue.cpp:72 4 xul.dll mozilla::ThrottledEventQueue::Inner::Executor::Run xpcom/threads/ThrottledEventQueue.cpp:72 5 xul.dll mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:337 6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1166 7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:519 8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:97 9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:318 ============================================================= This looks like a new crash. The first instance seems to have appeared with build 20180918075510.

Pascal Chevrel:pascalc

Updated

•

7 years ago

status-firefox64: --- → affected

Perry Jiang [:perry] [no longer employee, use ni?]

Updated

•

7 years ago

Priority: -- → P2

Calixte Denizet (:calixte)

Comment 1

•

7 years ago

This signature just appeared in 63.0b13 with 12 crashes. The pushlog for this build is: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=0082062b759f&tochange=1e0b0e805d0a :baku, could you investigate please ?

status-firefox63: --- → affected

Flags: needinfo?(amarchesini)

Keywords: regression

Marcia Knous [:marcia]

Updated

•

7 years ago

Crash Signature: [@ mozilla::dom::WorkerPrivate::~WorkerPrivate] → [@ mozilla::dom::WorkerPrivate::~WorkerPrivate] [@ @0x0 | mozilla::dom::WorkerPrivate::~WorkerPrivate]

Marion Daly [:mdaly]

Updated

•

7 years ago

Whiteboard: DWS_NEXT

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox62: --- → unaffected

status-firefox-esr60: --- → unaffected

tracking-firefox63: --- → +

tracking-firefox64: --- → +

Marcia Knous [:marcia]

Updated

•

7 years ago

Group: dom-core-security

Marcia Knous [:marcia]

Comment 2

•

7 years ago

Marking as security sensitive since during Triage today we noticed some possible UAF signatures.

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Keywords: csectype-uaf, sec-high

Marion Daly [:mdaly]

Comment 3

•

7 years ago

Sec high, readjusting priorities. Yaron, can you give this a look? Make it a top priority.

Flags: needinfo?(ytausky)

Marion Daly [:mdaly]

Updated

•

7 years ago

Assignee: nobody → ytausky

Marion Daly [:mdaly]

Updated

•

7 years ago

Priority: P2 → P1

Randell Jesup [:jesup] (needinfo me)

Comment 4

•

7 years ago

Such as https://crash-stats.mozilla.com/report/index/727a101e-20dc-4e38-bf68-353170181023 (e5 UAF)

Yaron Tausky [:ytausky]

Assignee

Comment 5

•

7 years ago

I'm assuming that "e5 UAF" refers to the address of the crashing use-after-free access? Anyway, it looks like it happens in an inlined destructor of one of the members of WorkerPrivate. Unfortunately, that class is gigantic, so it's hard to guess which one it is. I'll see if I can figure it out from the executable.

Flags: needinfo?(ytausky)

Yaron Tausky [:ytausky]

Assignee

Comment 6

•

7 years ago

This report suggests that the problem is a mismanaged refcount: https://crash-stats.mozilla.com/report/index/534b24dc-47d8-4728-892f-fefdc0181025

Status: NEW → ASSIGNED

Yaron Tausky [:ytausky]

Assignee

Comment 7

•

7 years ago

The crash occurs at the end of this snippet (macOS build): 0x1106884de <+526>: cmpq $0x0, 0x488(%r14) 0x1106884e6 <+534>: je 0x21134f8 ; <+552> 0x1106884e8 <+536>: leaq 0x488(%r14), %rax 0x1106884ef <+543>: movq (%rax), %rdi 0x1106884f2 <+546>: movq (%rdi), %rax 0x1106884f5 <+549>: callq *0x8(%rax) We have a virtual function call through an address stored 0x488 bytes into a WorkerPrivate object. The next step would be to figure out which one that is.

Yaron Tausky [:ytausky]

Assignee

Comment 8

•

7 years ago

I couldn't find out how to get the debug symbols for this executable, so I built the the same revision locally, hoping for the best. In debug mode I'm not getting the same offsets as in the released executable, so I'm guessing a bit here, but if the offsets are biased uniformly (0x18 bytes), the offending member is WorkerPrivate::mPerformanceStorage. I'll try to see what I can get from a local release build, but in the meantime I noticed something strange about that member: it's accessed both from the main thread [1] and from the worker's parent thread [2]. I don't see any synchronization around those accesses, so it's possible that it's a race condition. Such a race condition could cause the crash seen here. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#5399 [2] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.h#154

Randell Jesup [:jesup] (needinfo me)

Comment 9

•

7 years ago

Good find!

Yaron Tausky [:ytausky]

Assignee

Comment 10

•

7 years ago

Another thing that seems racy: mPerformanceCounter is accessed from the main thread without synchronization [1], yet it's also accessible from any other thread with synchronization [2]. This might cause a similar crash in WorkerPrivate::~WorkerPrivate. I think we need a more systematic approach to fixing this problem, especially since WorkerPrivate is so big. I would suggest wrapping data with an object that only grants access to them if a certain condition holds, e.g. a lock is held or the current thread is the expected one. As for testing, I don't know of any reliable way to cause a race, so testing that it doesn't happen any more is similarly problematic. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerDebugger.cpp#519 [2] https://searchfox.org/mozilla-central/source/dom/workers/WorkerThread.cpp#165

Yaron Tausky [:ytausky]

Assignee

Comment 11

•

7 years ago

I dug a little deeper while trying to implement a solution and found out that I was previously wrong: while accesses to mPerformanceStorage are not synchronized by the caller, it only gets assigned objects of type PerformanceStorageWorker [1], which have a thread-safe refcount [2]. I'm going to take a step back and look again at all the places this member is used in. [1] https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#3487 [2] https://searchfox.org/mozilla-central/source/dom/performance/PerformanceStorageWorker.h#23

Yaron Tausky [:ytausky]

Assignee

Comment 12

•

7 years ago

Just a note: mPerformanceStorage is *still* unsafe because there are raw pointers to it all over the place, so the assumption that refcount = 0 means the object is safe to deallocate does not hold. I just don't see how it could cause this particular crash, because the refcount itself seems to be protected, and WorkerPrivate stores its pointer in a RefPtr. The crash happens when trying to call a virtual function, so the memory must have been deallocated before WorkerPrivate::~WorkerPrivate touches the refcount itself.

Randell Jesup [:jesup] (needinfo me)

Comment 13

•

7 years ago

See also bug 1502419 which I just filed

Yaron Tausky [:ytausky]

Assignee

Comment 14

•

7 years ago

Attached file Bug 1493591: Don't share raw pointers to WorkerPrivate::mPerformanceStorage (obsolete) — Details

By sharing a raw pointer to a refcounted object it's possible for it to get used after its destruction. It's not clear if that is really the case here, but it doesn't seem like it would hurt performance.

Marion Daly [:mdaly]

Comment 15

•

7 years ago

Yaron: Status on this?

Flags: needinfo?(ytausky)

Yaron Tausky [:ytausky]

Assignee

Comment 16

•

7 years ago

After the initial findings I couldn't find any smoking gun. To recap, this looks like a double destruction (which makes it a use-after-free), so the real problem is likely just before the first destruction, which is no longer visible in a stack trace. To mitigate the problem, in addition to the patch on this bug, I also opened bug 1504638. (That's just a step in a journey towards stricter enforcement of threading restrictions, so it won't solve the problem either.) I'm pretty sure that improper refcounting somewhere along the chain is the root cause here, but unfortunately we currently don't have tools to enforce it automatically. My next step here would be to improve the refcounting situation and see if it helps.

Flags: needinfo?(ytausky)

Yaron Tausky [:ytausky]

Assignee

Comment 17

•

7 years ago

I am trying to see if I can get some lead from TSan. So far I'm just trying to weed out what looks like false positives.

Yaron Tausky [:ytausky]

Assignee

Comment 18

•

7 years ago

:mrbkap, could you please share your opinion about this crash? So far I wasn't able to find a definitive cause, so unless there's some new trail, I would mark this as stalled. (There are general safety improvement that can be pursued and would hopefully eventually catch this crash, but I wouldn't do them as part of this bug since they're indirect.)

Flags: needinfo?(mrbkap)

Randell Jesup [:jesup] (needinfo me)

Comment 19

•

7 years ago

This is clearly some type of regression, and also is fairly high volume. Pulling the dump tells me that this->mPerformanceStorage.mRawPtr = 0xe5e5e5e5

Blake Kaplan (:mrbkap) (inactive)

Comment 20

•

7 years ago

Looking at the reports, I don't see any crashes in Nightly (that is 65.0a) or anything after 64b3. Am I misinterpreting crash-stats? I find it interesting that in the regression range reported in comment 1 contains be85aa71510b (bug 1493629, which I can't access) and the range of fixes between 64b3 and b4 includes 3cf4f972200c (bug 1498510, also inaccessible). Both patches are touching the same code and the CSP object is just before mPerformanceStorage in WorkerPrivate.

Blake Kaplan (:mrbkap) (inactive)

Updated

•

7 years ago

Flags: needinfo?(mrbkap)

Pascal Chevrel:pascalc

Comment 21

•

7 years ago

Crashes stopped with 63.0.3 so maybe the signature morphed but I couldn't find a similar signature with any volume of crashes in 63.03. Given the drop, the fact that the bug is stalled, that we don't have drivers for a new dot release and that we ship 64 in 3 weeks, this is wontfix for 63.

status-firefox63: affected → wontfix

Yaron Tausky [:ytausky]

Assignee

Updated

•

7 years ago

Keywords: stalled

Julien Cristau [:jcristau]

Updated

•

6 years ago

status-firefox64: affected → wontfix

Andrea Marchesini [:baku]

Updated

•

6 years ago

Flags: needinfo?(amarchesini)

Randell Jesup [:jesup] (needinfo me)

Comment 22

•

6 years ago

I suggest we close this; we've had one 67 crash, but clearly something changed after 63. The remaining crash may not be related to the original reason this bug was filed, or may be a more random trashing or memory error.

Flags: needinfo?(ytausky)

Yaron Tausky [:ytausky]

Assignee

Comment 23

•

6 years ago

I agree, let's close this. If someone finds this bug because this function starts crashing again, please open a new bug.

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

Flags: needinfo?(ytausky)

Resolution: --- → WORKSFORME

BugBot [:suhaib / :marco/ :calixte]

Updated

•

6 years ago

Keywords: stalled

Jens Stutte [:jstutte]

Updated

•

6 years ago

See Also: → 1539508

Daniel Veditz [:dveditz]

Updated

•

5 years ago

Group: dom-core-security

Phabricator Automation

Updated

•

2 years ago

Attachment #9021150 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.