Closed Bug 1498610 Opened 6 years ago Closed 5 years ago

Crash in std::__push_heap<T>

Categories

(Core :: XPCOM, defect)

Unspecified
All
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- unaffected
firefox64 --- wontfix
firefox65 --- ?
firefox66 --- ?

People

(Reporter: calixte, Unassigned)

References

(Blocks 1 open bug)

Details

(4 keywords)

Crash Data

This bug was filed from the Socorro interface and is
report bp-35423ee0-539d-44eb-8f55-0d4c10181012.
=============================================================

Top 10 frames of crashing thread:

0 libxul.so void std::__push_heap<mozilla::ArrayIterator<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >&, nsTArray<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> > > >, long, mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >, __gnu_cxx::__ops::_Iter_comp_val<bool  xpcom/ds/nsTArray.h:1112
1 libxul.so void std::__pop_heap<mozilla::ArrayIterator<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> >&, nsTArray<mozilla::UniquePtr<TimerThread::Entry, mozilla::DefaultDelete<TimerThread::Entry> > > >, __gnu_cxx::__ops::_Iter_comp_iter<bool  clang/include/c++/4.9.4/bits/stl_heap.h:228
2 libxul.so TimerThread::Run clang/include/c++/4.9.4/bits/stl_heap.h:310
3 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1252
4 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:530
5 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:364
6 libxul.so nsThread::ThreadFunc ipc/chromium/src/base/message_loop.cc:325
7 libnspr4.so _pt_root nsprpub/pr/src/pthreads/ptthread.c:201
8 libpthread-2.27.so libpthread-2.27.so@0x8090 
9 libc-2.27.so libc-2.27.so@0x118efe 

=============================================================

There are 2 crashes (from 1 installation) in nightly 64 with buildid 20181012103207. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1299118.

[1] https://hg.mozilla.org/mozilla-central/rev?node=9f83ebaabaa3
Flags: needinfo?(rjesup)
I'm assuming from the signature that this is a sec bug - addresses are real addresses.  

Though the documentation in nsITimer.idl doesn't say you have to Cancel() manually, I notice that 95% of calls to blahTimer = nullptr are preceded by blahTimer->Cancel().  However, in nsTimerImpl.cpp in nsTimer::Release(), if the timer is active (count = 1), then it CancelImpl()'s the timer - so Cancel() shouldn't be needed.  

Given that, I don't see how bug 1299118 could be causing this.  And there have been no other crashes so far -- though perhaps they just haven't been lucky enough.  This did add a one-shot timer, and it doesn't do anything specific to cancel it - which should be ok, though I think I should land a followup to null the timer (and mTTFI) on Clear().


Any thoughts Nathan?
Group: core-security
Component: Layout → XPCOM
Flags: needinfo?(rjesup) → needinfo?(nfroyd)
OS: Linux → All
Group: core-security → dom-core-security
nulling out the timer would be a good first step.  I'm not sure I like the implication that forgetting to null out the timer and just Release()'ing it normally from an nsCOMPtr means that you've introduced a security bug...
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #2)
> nulling out the timer would be a good first step.  I'm not sure I like the
> implication that forgetting to null out the timer and just Release()'ing it
> normally from an nsCOMPtr means that you've introduced a security bug...

So we have 2 reports, both from the same install on the same day, a few minutes apart.

push_heap() is called in two places: in AddTimerInternal(), and in FindNextFireTimeForCurrentThread().

Interestingly, there's a spike of crashes at RemoveElementsAtUnsafe | TimerThread::RemoveFirstTimerInternal() and in InvalidArrayIndex called from TimerThread::RemoveLeadingCanceledTimersInternal() (again on one installation, this from 10/11), plus 2 crashes (1 e5 UAF) from nsTArray_Impl<T>::RemoveElementsAtUnsafe | TimerThread::RemoveLeadingCanceledTimersInternal() also on 10/11 (and none before that).  Also std::_Pop_heap_hole_unchecked<T> (2, on 10/11)

See https://crash-stats.mozilla.com/report/index/124df73c-d634-42d6-92b4-d6fad0181015

It looks like something happened on 10/11; bug 1299118 landed on 10/12

See https://crash-stats.mozilla.com/search/?proto_signature=~TimerThread%3A%3ARun&product=Firefox&version=63.0b14&date=%3E%3D2018-10-10T21%3A07%3A00.000Z&date=%3C2018-10-15T21%3A07%3A10.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

There are certainly several new Timer signatures, starting 10/11, 10/12, 10/13 that we don't see before that, mostly around the list of timers.  Most are called from TimerThread::RemoveFirstTimerInternal()

So right now I think the link to bug 1299118 is unlikely given Occam's razor and a bunch of new signatures starting 10/11 in this area.  I wonder what landed 10/10-10/11?  Nathan, thoughts?
Flags: needinfo?(nfroyd)
Just note for the stack in comment 0 we're looking at a *pop_heap* call that internally ends up calling __push_heap.

The invalid array indexes look something like:

> ElementAt(aIndex = 2264521068808, aLength = 18)

So we're most likely looking at some sort of negative index. Also note for each of those crashes the main thread is blocked waiting to remove a timer.

This seems to imply that `mTimers` is getting destroyed or resized while we're operating on it w/ heap operations.
(In reply to Eric Rahm [:erahm] from comment #4)
> Just note for the stack in comment 0 we're looking at a *pop_heap* call that
> internally ends up calling __push_heap.
> 
> The invalid array indexes look something like:
> 
> > ElementAt(aIndex = 2264521068808, aLength = 18)
> 
> So we're most likely looking at some sort of negative index. Also note for
> each of those crashes the main thread is blocked waiting to remove a timer.
> 
> This seems to imply that `mTimers` is getting destroyed or resized while
> we're operating on it w/ heap operations.

This seems really unlikely since all the heap operations are (and do, just checked) happening under a lock.  Unless we're holding interior pointers to the heap *outside* of a lock, and then somehow corrupting those?  But that doesn't line up with the scenario you describe...
Flags: needinfo?(nfroyd)
A single user makes me think that this is some weird heap corruption issue.
Ok, so that crash was on Beta (and several of the others I flagged); bug 1299118 is only on 64 currently.  There are a bunch of ugly TimerThread::Run() (on the stack) crashes I see in beta; those still could use investigation (like the one I flagged above, like the arraybounds errors).

As erahm indicates, MainThread is waiting on a timer Cancel() when this happens in the crashes initially flagged here.

Also: Both crashes reported with this signature are the same crash: same install time, same age since install (1510 seconds); though one says uptime 1507 seconds, the other says 500 seconds.  Says to me that (somehow) these are the same crash.  

I can land a followup to Cancel(), but I really doubt that's the problem.

We haven't seen this since December, lets close for now.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.