Closed
Bug 1342420
Opened 7 years ago
Closed 4 months ago
Crash in mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run
Categories
(Core :: XPCOM, defect, P3)
Tracking
()
People
(Reporter: philipp, Unassigned)
References
Details
(4 keywords)
Crash Data
Attachments
(1 file)
1.97 KB,
patch
|
bholley
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-60c71aa5-f0df-4e3b-98b6-eae892170221. ============================================================= Crashing Thread (23) Frame Module Signature Source 0 xul.dll mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() obj-firefox/dist/include/mozilla/TaskDispatcher.h:191 1 xul.dll mozilla::TaskQueue::Runner::Run() xpcom/threads/TaskQueue.cpp:236 2 xul.dll nsThreadPool::Run() xpcom/threads/nsThreadPool.cpp:225 3 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1067 4 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:311 5 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 6 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:225 7 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:205 8 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:465 9 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 10 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 11 ucrtbase.dll _o__CIpow 12 kernel32.dll BaseThreadInitThunk 13 ntdll.dll __RtlUserThreadStart 14 ntdll.dll _RtlUserThreadStart these cross-platform crashes have been around for a while. the crashing address of a portion of them indicates a uaf situation.
Comment 1•7 years ago
|
||
The majority of crashes with this signature are a null deref, but there are a decent number of UAF-looking crash addresses too.
I think this code is typically used by media, so I'm going to move it there. Hopefully that's the right component.
Component: IPC → Audio/Video
Comment 3•7 years ago
|
||
I had to do some digging to make the link to Audio/Video code from the stack. But it seems correct: 1) The relation between the crashing code and Audio/Video is TaskQueue, which is used by our video decoders. 2) The crash stack shows one thread inside msmpeg2vdec.dll CH264DecoderSoftware::GetDataFromStream(char**, int*, InputSampleInfo_t*) which indicates an H264 video decode is ongoing. Jesup, my best guess is that something is adding a Runnable but not keeping it alive long enough or so? Got an idea who can own this?
Flags: needinfo?(rjesup)
Comment 4•7 years ago
|
||
gcp's comment about CH264DecoderSoftware::GetDataFromStream() in msmpeg2vdec.dll implies strongly this has something to do with Platform Decoders.
Flags: needinfo?(rjesup) → needinfo?(jyavenard)
Updated•7 years ago
|
Component: Audio/Video → Audio/Video: Playback
Comment 5•7 years ago
|
||
This is an old bug (42b7 at least). Looking at some older crashes, like https://crash-stats.mozilla.com/report/index/18813b11-fec2-45f5-9373-b3e332170328 and https://crash-stats.mozilla.com/report/index/ef44c07e-8c06-4a5c-9b76-bdcb52170328, they seem to imply that mTasks is invalid (this is on a "for (size_t i = 0; i < mTasks->mStateChangeTasks.Length(); ++i)"), and some of the other reports hit on the mTasks->mRegularTasks.Length(), and a bunch more hit on one of the mTasks->...[i]->Run() lines. Several of these i looked at had nothing running that was media-related (other than cubeb idle in the background; just means that audio was used sometime in the session). This all makes me think it's a flaw somewhere in TaskDispatcher - mTasks lifetime? Or a cache issue (missing locking for multi-thread access)? I haven't looked deeply at the code in TaskDispatcher yet. Since mStateChangeTasks and mRegularTasks are both nsTArray<nsCOMPtr<nsIRunnable>>'s, it seems unlikely that it's a runnable that has been freed. CC'd a number of people who've touched TaskDispatcher.h for their awareness/thoughts. Given some of these are EXEC crashes (widen search to a few months and I see them; didn't look at the 1-week view), and it's a clear UAF with apparent reallocation in many cases, I'm tempted to bump it to sec-critical - al, dan? Updated affected list. No 55 crashes so far, but not high frequency in nightly/aurora in any rev.
status-firefox55:
--- → ?
status-firefox-esr45:
--- → affected
status-firefox-esr52:
--- → affected
Flags: needinfo?(n.nethercote)
Flags: needinfo?(jyavenard)
Flags: needinfo?(dveditz)
Flags: needinfo?(bobbyholley)
Flags: needinfo?(abillings)
Comment 6•7 years ago
|
||
I've only touched TaskDispatcher.h a little bit. I don't really know anything about it, sorry.
Flags: needinfo?(n.nethercote)
Comment 7•7 years ago
|
||
You want JW here.
Flags: needinfo?(jwwang)
Flags: needinfo?(dveditz)
Flags: needinfo?(bobbyholley)
Flags: needinfo?(abillings)
Comment 8•7 years ago
|
||
We have https://crash-stats.mozilla.com/report/index/978fff99-50b0-406f-9d39-a0ecf2170329 https://crash-stats.mozilla.com/report/index/6bfbd5fb-b02f-4292-8549-9db5b2170329 https://crash-stats.mozilla.com/report/index/9edc3dbe-9217-485d-ae27-4f7da2170330 which look like a null deref and crashed on the main thread. I will add some null checks in the hope of reducing null deref crash so other UAF issues can stand out.
Flags: needinfo?(jwwang)
Comment 9•7 years ago
|
||
Add null checks to avoid null deref.
Comment 10•7 years ago
|
||
Thanks for looking at this. It might be better to file a separate public bug for the null deref crashes.
Updated•7 years ago
|
Attachment #8853361 -
Flags: review?(bobbyholley) → review+
Comment 11•7 years ago
|
||
Comment on attachment 8853361 [details] [diff] [review] 1342420_P1_add_null_checks.patch Review of attachment 8853361 [details] [diff] [review]: ----------------------------------------------------------------- null-check will slightly help stability... but it doesn't solve anything. Most likely the null-derefs are "the memory was reallocated and set to null". We can search for non-null crashes today. And the null-check doesn't help the security aspect of the bug, I'm afraid.
Comment 12•7 years ago
|
||
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1338025#c7, the crash address might be incorrect for some reason. The null checks makes debugging easier and less confusing.
Comment 13•7 years ago
|
||
If we really can't rely on the crash address, we need to either flag those value/platform combos in crash-stats, or omit them (or put <unknown>) -- but if 0 or fffff are "unknown", we'll have to consider a lot more bugs as potential sec issues. Ted? Is this true? Can we do anything?
Flags: needinfo?(ted)
Comment 14•7 years ago
|
||
I have Breakpad patches but they're complicated--it's basically pulling in a better disassembler and disassembling the faulting instruction to try to figure out the actual fault address. The x86-64 fault literally doesn't give out the faulting address when it's a non-canonical pointer.
Flags: needinfo?(ted)
Comment 15•7 years ago
|
||
Too late for firefox 52, mass-wontfix.
Comment 16•7 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #10) > Thanks for looking at this. It might be better to file a separate public bug > for the null deref crashes. Bug is filed for null checks.
Updated•7 years ago
|
Keywords: testcase-wanted
Comment 17•7 years ago
|
||
Sec-high triage: this bug hasn't been updated for one month. JW, is there any update?
Flags: needinfo?(jwwang)
Comment 18•7 years ago
|
||
No, I don't have new findings. The crashes so far all look like random memory corruption to me.
Flags: needinfo?(jwwang)
Updated•7 years ago
|
Priority: -- → P1
Comment 19•7 years ago
|
||
I'm pretty concerned about this one, since it's an EXEC crash on a runnable, and has been happening regularly Very likely this is a runnable that was somehow released or overwritten - I'll place my bets on released, even though we're holding a ref to it. Now the event queue should hold a ref to it, so if it is released somehow there's a refcounting mistake. The other option would be the PerThreadTaskGroup object itself somehow to be overwritten or destroyed. That's passed around in UniquePtr's, so it should be correct. In the TaskDispatcher, these runnables are held in a nsTArray<nsCOMPtr<nsIRunnable>>, so it shouldn't get released out from under us, and thus the assumption that there's a refcounting mistake in some other code, or something overwrote memory (and does so very regularly). If the PerThreadTaskGroup were destroyed, all the runnables would be likely freed, but if so I'm unclear how we'd get into here. Also, as noted, many of these crashes are EXECs to these pointers, so this might be exploitable. I also don't see any obvious stacks in the media code (we have media decoder/etc threads, but those are part of pools that hang around normally). It's certainly possible we're running media code, though. This doesn't appear to be truly random memory corruption, but that's not impossible. Does anyone have any thoughts, or additional tripwires/release-asserts we could land?
Comment 20•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #19) > Does anyone have any thoughts, or additional tripwires/release-asserts we > could land? Who did you mean to address with this question? Please forward the needinfo accordingly.
Flags: needinfo?(rjesup)
Comment 21•7 years ago
|
||
> ho did you mean to address with this question? Please forward the needinfo accordingly.
Anyone on this bug, really. It's very unclear where this is coming from.
Flags: needinfo?(rjesup) → needinfo?(wmccloskey)
Comment 22•7 years ago
|
||
https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3AAutoTaskDispatcher%3A%3ATaskGroupRunnable%3A%3ARun&date=%3E%3D2017-01-20T06%3A06%3A47.000Z&date=%3C2017-07-20T06%3A06%3A47.000Z There are no reports after 54.0.1, I guess it's been fixed by some other bugs.
Comment 23•7 years ago
|
||
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #22) > https://crash-stats.mozilla.com/signature/ > ?product=Firefox&signature=mozilla%3A%3AAutoTaskDispatcher%3A%3ATaskGroupRunn > able%3A%3ARun&date=%3E%3D2017-01-20T06%3A06%3A47.000Z&date=%3C2017-07- > 20T06%3A06%3A47.000Z > > There are no reports after 54.0.1, I guess it's been fixed by some other > bugs. Still happening in 55b7 (reports from beta and nightly are always much lower in volume). About 10 or 12 in the last month or so in 55. https://crash-stats.mozilla.com/report/index/d0637fc6-b9d5-4316-b24d-aab6d0170705 (clear UAF)
status-firefox56:
--- → ?
The code looks correct to me. All I can think to do is to add some thread safety assertions, since this code is threaded and it doesn't have any assertions right now. However, I don't actually see any threading bugs just from inspection.
Flags: needinfo?(wmccloskey)
Comment 25•7 years ago
|
||
Bill, can *you* add those thread safety assertions?
Flags: needinfo?(wmccloskey)
I'm pretty busy. I might have time in a couple weeks. This is really JW's area, though.
Flags: needinfo?(wmccloskey) → needinfo?(jwwang)
Comment 27•7 years ago
|
||
Turned out my plate is full of media tasks and won't have time for XPCOM stuff. Maybe some other XPCOM guy is interested in taking this bug.
Assignee: jwwang → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(jwwang)
Updated•7 years ago
|
Component: Audio/Video: Playback → XPCOM
Keywords: stale-bug
Comment 29•7 years ago
|
||
The null derefs seemed to go away in more recent versions(In reply to JW Wang [:jwwang] [:jw_wang] from comment #16) > (In reply to Andrew McCreight [:mccr8] from comment #10) > > Thanks for looking at this. It might be better to file a separate public bug for the null deref crashes. > > Bug is filed for null checks. The initial comment was filed on a UAF crash and explicitly mentions UAF as the reason for filing the bug.
Assignee: nobody → nfroyd
Keywords: stale-bug
Comment 30•7 years ago
|
||
I don't have much to add; the following is just me trying to write everything out so maybe somebody else can poke holes in this logic. OK, so we have a UAF / null deref in this code: // State change tasks get run all together before any code is run, so // that all state changes are made in an atomic unit. for (size_t i = 0; i < mTasks->mStateChangeTasks.Length(); ++i) { mTasks->mStateChangeTasks[i]->Run(); } mTasks is a UniquePtr; mStateChangeTasks is an array of nsCOMPtrs. So, assuming those are used correctly, we shouldn't have any issues. The only way mStateChangeTasks gets touched is in AutoTaskDispatcher::AddStateChangeTask(), where we pass in an existing strong reference. AddStateChangeTask is only called by AbstractThread::DispatchStateChange, which in turn is only called by places that allocate a runnable and immediately pass it in. So, that looks like a dead end, and the likely UAF is mTasks? TaskGroupRunnable::mTasks is a UniquePtr<PerThreadTaskGroup>, so we should start by figuring out how data flows into that. TaskGroupRunnable is constructed in AutoTaskDispatcher::DispatchTaskGroup, which receives a UniquePtr of the appropriate type. DispatchTaskGroup is called in two places: 1. ~AutoTaskDispatcher; 2. AutoTaskDispatcher::DispatchTasksFor. Both of these places are moving UniquePtr<PerThreadTaskGroup> out of AutoTaskDispatcher::mTaskGroups, which is an nsTArray<UniquePtr<PerThreadTaskGroup>>. That seems safe enough (though we have suspicious looking array indexing crashes in ~AutoTaskDispatcher...). How does mTaskGroups get added to? In one of: 1. AutoTaskDispatcher::EnsureTaskGroup; or 2. AutoTaskDispatcher::AddTask. EnsureTaskGroup is only called in AutoTaskDispatcher::AddStateChangeTask. So where would AddStateChangeTask or AddTask get called from? Well, where would AutoTaskDispatcher get used? It gets used in two places: 1. TaskQueue::AutoTaskGuard (subclassed); 2. EventTargetWrapper, as a Maybe<> member. Assuming I understand things correctly, EventTargetWrapper and its use of AutoTaskDispatcher can only be used on cycle-collected threads, because when we construct EventTargetWrapper::mTailDispatcher in EventTargetWrapper::TailDispatcher, we call nsContentUtils::RunInStableState, which calls into the CC...if we weren't on a CC thread, we'd crash. But what if we *weren't* on a CC thread when TailDispatcher was called for the nth time, and we handed out a reference to memory that FireTailDispatcher was about to destroy on a different, CC'd thread? That seems like a semi-plausible scenario; a lot of the crashes are happening on non-CC'd threads (generally threads in an nsThreadPool). That leaves TaskQueue::AutoTaskGuard. AutoTaskGuard's constructor takes a TaskQueue*, and sets the TaskQueue's mTailDispatcher member to the address AutoTaskGuard being constructed. So maybe TaskQueue is handing out pointers/references to this thing that lives on the stack, and those pointers are being used--as an AutoTaskDispatcher *after* the AutoTaskGuard has been deconstructed? The AutoTaskDispatcher's members would still be reasonable values (subject to the vagaries of stack use by other functions), but they'd point to freed memory, perhaps, and we'd get hosed? The only problem with that theory (and the one around EventTargetWrapper, above), is that TaskQueue::TailDispatcher (and EventTargetWrapper::TailDispatcher) is only called to immediately do something with the result; the result of TailDispatcher is not stored away for later use. So the problematic scenario is something like: 1. Call TailDispatcher, get reference back; 2. Destroy the underlying AutoTaskDispatcher on a different thread; 3. Use the TailDispatcher...which is overwritten with freed memory at this point? That scenario doesn't seem to work for TaskQueue::AutoTaskGuard, since AutoTaskGuard is stack-scoped, so I think that leaves EventTargetWrapper as the likeliest place for problems? We could test that theory by doing some combination of: 1. converting the MOZ_ASSERT in EventTargetWrapper::TailDispatcher into a MOZ_RELEASE_ASSERT; and/or 2. Making Maybe::reset() actually scribble over the destroyed memory with the poison value. Actually, the second one sounds like a really good idea, irrespective of whether it does something in this bug. I'll file that as a separate, probably security-sensitive bug?
Flags: needinfo?(nfroyd)
Comment 31•6 years ago
|
||
Nathan: did you file that other bug? If so please link to it from here or add it to "See Also". Do we have any other ways forward on this bug?
Flags: needinfo?(nfroyd)
Comment 32•6 years ago
|
||
(In reply to Daniel Veditz [:dveditz] from comment #31) > Nathan: did you file that other bug? If so please link to it from here or > add it to "See Also". Bug 1414901, added to "See Also". > Do we have any other ways forward on this bug? I can convert the MOZ_ASSERT in EventTargetWrapper::TailDispatcher into a release or diagnostic assert, I guess? We get a few hits on this crash each week on beta and nightly, so that might provide a little information. Bug 1414901 would help some here, too.
Flags: needinfo?(nfroyd)
See Also: → 1414901
Comment 33•6 years ago
|
||
The most recent (and likely most interesting) crash is on 60.0b7 from a 2018-03-29 06:17:37 build https://crash-stats.mozilla.com/report/index/8540ccb4-b2fe-4fd3-8e49-909290180329 There's nothing else much. What else do you think we need to do with the bugs mentioned above fixed, Nathan?
Flags: needinfo?(nfroyd)
Comment 34•6 years ago
|
||
(In reply to Frederik Braun [:freddyb] from comment #33) > The most recent (and likely most interesting) crash is on 60.0b7 from a > 2018-03-29 06:17:37 build > https://crash-stats.mozilla.com/report/index/8540ccb4-b2fe-4fd3-8e49- > 909290180329 > > There's nothing else much. What else do you think we need to do with the > bugs mentioned above fixed, Nathan? I can't tell from that crash if we're crashing somewhere in a poisoned region or not. Did we remove the poison address information from crash reports? =/ Also, why does that crash not show up when searching for the signature?
Flags: needinfo?(nfroyd)
Comment 35•6 years ago
|
||
The poison info annotations are still there: https://dxr.mozilla.org/mozilla-central/rev/c23c7481957f7b982cffc0ce1d25979c69ca2c2f/toolkit/xre/nsAppRunner.cpp#4455 ...but apparently we don't send the parent process ones for child processes (which sorta makes sense, as we could ostensibly choose a different address per process): https://dxr.mozilla.org/mozilla-central/rev/c23c7481957f7b982cffc0ce1d25979c69ca2c2f/toolkit/crashreporter/nsExceptionHandler.cpp#321 I guess we should make child processes explicitly annotate these as well.
Updated•5 years ago
|
Updated•3 years ago
|
Assignee: froydnj+bz → nobody
Updated•2 years ago
|
Severity: critical → S2
Comment 37•4 months ago
|
||
We're working to close out stalled bugs that cannot be actioned. There are still a few coming in every release but at this point we're going to close it out and can reopen or refile and reinvestigate if we want to.
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → INCOMPLETE
Comment 38•4 months ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.
Keywords: stalled
Updated•3 months ago
|
Group: dom-core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•