Closed
Bug 1498610
Opened 6 years ago
Closed 5 years ago
Crash in std::__push_heap<T>
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | --- | unaffected |
firefox64 | --- | wontfix |
firefox65 | --- | ? |
firefox66 | --- | ? |
People
(Reporter: calixte, Unassigned)
References
(Blocks 1 open bug)
Details
(4 keywords)
Crash Data
This bug was filed from the Socorro interface and is report bp-35423ee0-539d-44eb-8f55-0d4c10181012. ============================================================= Top 10 frames of crashing thread: 0 libxul.so void std::__push_heap<mozilla::ArrayIterator<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >&, nsTArray<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> > > >, long, mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >, __gnu_cxx::__ops::_Iter_comp_val<bool xpcom/ds/nsTArray.h:1112 1 libxul.so void std::__pop_heap<mozilla::ArrayIterator<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >&, nsTArray<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> > > >, __gnu_cxx::__ops::_Iter_comp_iter<bool clang/include/c++/4.9.4/bits/stl_heap.h:228 2 libxul.so TimerThread::Run clang/include/c++/4.9.4/bits/stl_heap.h:310 3 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1252 4 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530 5 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:364 6 libxul.so nsThread::ThreadFunc ipc/chromium/src/base/message_loop.cc:325 7 libnspr4.so _pt_root nsprpub/pr/src/pthreads/ptthread.c:201 8 libpthread-2.27.so libpthread-2.27.so@0x8090 9 libc-2.27.so libc-2.27.so@0x118efe ============================================================= There are 2 crashes (from 1 installation) in nightly 64 with buildid 20181012103207. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1299118. [1] https://hg.mozilla.org/mozilla-central/rev?node=9f83ebaabaa3
Flags: needinfo?(rjesup)
Comment 1•6 years ago
|
||
I'm assuming from the signature that this is a sec bug - addresses are real addresses. Though the documentation in nsITimer.idl doesn't say you have to Cancel() manually, I notice that 95% of calls to blahTimer = nullptr are preceded by blahTimer->Cancel(). However, in nsTimerImpl.cpp in nsTimer::Release(), if the timer is active (count = 1), then it CancelImpl()'s the timer - so Cancel() shouldn't be needed. Given that, I don't see how bug 1299118 could be causing this. And there have been no other crashes so far -- though perhaps they just haven't been lucky enough. This did add a one-shot timer, and it doesn't do anything specific to cancel it - which should be ok, though I think I should land a followup to null the timer (and mTTFI) on Clear(). Any thoughts Nathan?
Group: core-security
Component: Layout → XPCOM
Flags: needinfo?(rjesup) → needinfo?(nfroyd)
Keywords: csectype-uaf,
sec-high
OS: Linux → All
Updated•6 years ago
|
Group: core-security → dom-core-security
Comment 2•6 years ago
|
||
nulling out the timer would be a good first step. I'm not sure I like the implication that forgetting to null out the timer and just Release()'ing it normally from an nsCOMPtr means that you've introduced a security bug...
Flags: needinfo?(nfroyd)
Comment 3•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #2) > nulling out the timer would be a good first step. I'm not sure I like the > implication that forgetting to null out the timer and just Release()'ing it > normally from an nsCOMPtr means that you've introduced a security bug... So we have 2 reports, both from the same install on the same day, a few minutes apart. push_heap() is called in two places: in AddTimerInternal(), and in FindNextFireTimeForCurrentThread(). Interestingly, there's a spike of crashes at RemoveElementsAtUnsafe | TimerThread::RemoveFirstTimerInternal() and in InvalidArrayIndex called from TimerThread::RemoveLeadingCanceledTimersInternal() (again on one installation, this from 10/11), plus 2 crashes (1 e5 UAF) from nsTArray_Impl<T>::RemoveElementsAtUnsafe | TimerThread::RemoveLeadingCanceledTimersInternal() also on 10/11 (and none before that). Also std::_Pop_heap_hole_unchecked<T> (2, on 10/11) See https://crash-stats.mozilla.com/report/index/124df73c-d634-42d6-92b4-d6fad0181015 It looks like something happened on 10/11; bug 1299118 landed on 10/12 See https://crash-stats.mozilla.com/search/?proto_signature=~TimerThread%3A%3ARun&product=Firefox&version=63.0b14&date=%3E%3D2018-10-10T21%3A07%3A00.000Z&date=%3C2018-10-15T21%3A07%3A10.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature There are certainly several new Timer signatures, starting 10/11, 10/12, 10/13 that we don't see before that, mostly around the list of timers. Most are called from TimerThread::RemoveFirstTimerInternal() So right now I think the link to bug 1299118 is unlikely given Occam's razor and a bunch of new signatures starting 10/11 in this area. I wonder what landed 10/10-10/11? Nathan, thoughts?
Flags: needinfo?(nfroyd)
Comment 4•6 years ago
|
||
Just note for the stack in comment 0 we're looking at a *pop_heap* call that internally ends up calling __push_heap. The invalid array indexes look something like: > ElementAt(aIndex = 2264521068808, aLength = 18) So we're most likely looking at some sort of negative index. Also note for each of those crashes the main thread is blocked waiting to remove a timer. This seems to imply that `mTimers` is getting destroyed or resized while we're operating on it w/ heap operations.
Comment 5•6 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #4) > Just note for the stack in comment 0 we're looking at a *pop_heap* call that > internally ends up calling __push_heap. > > The invalid array indexes look something like: > > > ElementAt(aIndex = 2264521068808, aLength = 18) > > So we're most likely looking at some sort of negative index. Also note for > each of those crashes the main thread is blocked waiting to remove a timer. > > This seems to imply that `mTimers` is getting destroyed or resized while > we're operating on it w/ heap operations. This seems really unlikely since all the heap operations are (and do, just checked) happening under a lock. Unless we're holding interior pointers to the heap *outside* of a lock, and then somehow corrupting those? But that doesn't line up with the scenario you describe...
Flags: needinfo?(nfroyd)
Comment 6•6 years ago
|
||
6 crashes yesterdayday from the same user (a pattern...) - a few minutes apart, one with a facebook URL. https://crash-stats.mozilla.com/signature/?product=Firefox&version=63.0b14&proto_signature=~TimerThread%3A%3ARun&signature=nsTArray_Impl%3CT%3E%3A%3ARemoveElementsAtUnsafe%20%7C%20TimerThread%3A%3ARemoveFirstTimerInternal&date=%3E%3D2018-10-10T07%3A15%3A34.000Z&date=%3C2018-10-17T07%3A15%3A34.000Z&_columns=date&_columns=product&_columns=version&_columns=reason&_columns=address&_columns=install_time&_columns=url&_columns=uptime&_sort=-date&page=1#reports Note that none appear to have a timer removal in any of the other threads. WRITE failures.
Comment 7•6 years ago
|
||
A single user makes me think that this is some weird heap corruption issue.
Comment 8•6 years ago
|
||
Ok, so that crash was on Beta (and several of the others I flagged); bug 1299118 is only on 64 currently. There are a bunch of ugly TimerThread::Run() (on the stack) crashes I see in beta; those still could use investigation (like the one I flagged above, like the arraybounds errors). As erahm indicates, MainThread is waiting on a timer Cancel() when this happens in the crashes initially flagged here. Also: Both crashes reported with this signature are the same crash: same install time, same age since install (1510 seconds); though one says uptime 1507 seconds, the other says 500 seconds. Says to me that (somehow) these are the same crash. I can land a followup to Cancel(), but I really doubt that's the problem.
Updated•5 years ago
|
Comment 9•5 years ago
|
||
We haven't seen this since December, lets close for now.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Updated•4 years ago
|
Group: dom-core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•