Closed Bug 1440166 Opened 7 years ago Closed 7 years ago

AddressSanitizer: heap-use-after-free xpcom/threads/nsThreadUtils.cpp:50:1 in mozilla::Runnable::Release()

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla61

Tracking Flags:

Tracking

Status

firefox61

---

fixed

People

(Reporter: bc, Assigned: froydnj)

References

(Blocks 1 open bug,
URL
)

Details

(Keywords: csectype-uaf, Whiteboard: [debug only])

Attachments

(7 files)

debug asan report 7 years ago Bob Clary [:bc] (inactive) 21.90 KB, text/plain		Details
2018-02-21-asan-mozilla-runnable-release.txt 7 years ago Bob Clary [:bc] (inactive) 11.84 KB, text/plain		Details
huffingtonpost.com debug log 7 years ago Bob Clary [:bc] (inactive) 56.60 KB, text/plain		Details
youtube debug log 7 years ago Bob Clary [:bc] (inactive) 68.75 KB, text/plain		Details
moonwars debug log 7 years ago Bob Clary [:bc] (inactive) 64.79 KB, text/plain		Details
moonwars.txt 7 years ago Bob Clary [:bc] (inactive) 70.67 KB, text/plain		Details
avoid races in refcount logging with named classes 7 years ago Nathan Froyd [:froydnj] 3.27 KB, patch	mccr8 : review+	Details \| Diff \| Splinter Review

Bob Clary [:bc] (inactive)

Reporter

Description

•

7 years ago

Attached file debug asan report — Details

1. https://moonwars.drakemoon.com/ Linux x86_64 debug asan build on Fedora 27 4.15.3 may need to try multiple times or reload 2. ==5599==ERROR: AddressSanitizer: heap-use-after-free on address 0x6070005942e0 at pc 0x7f4f91eff4eb bp 0x7f4f7c56bf30 sp 0x7f4f7c56bf28 READ of size 8 at 0x6070005942e0 thread T26 (Softwar~cThread) #0 0x7f4f91eff4ea in mozilla::Runnable::Release() /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:50:1 #1 0x7f4f987054fa in mozilla::layout::VsyncParent::NotifyVsync(mozilla::TimeStamp) /builds/worker/workspace/build/src/layout/ipc/VsyncParent.cpp:60:1 #2 0x7f4f987059bc in non-virtual thunk to mozilla::layout::VsyncParent::NotifyVsync(mozilla::TimeStamp) /builds/worker/workspace/build/src/layout/ipc/VsyncParent.cpp #3 0x7f4f97b65105 in mozilla::RefreshTimerVsyncDispatcher::NotifyVsync(mozilla::TimeStamp) /builds/worker/workspace/build/src/widget/VsyncDispatcher.cpp:117:29 #4 0x7f4f941538b6 in mozilla::gfx::VsyncSource::Display::NotifyVsync(mozilla::TimeStamp) /builds/worker/workspace/build/src/gfx/thebes/VsyncSource.cpp:68:33 #5 0x7f4f941532a2 in SoftwareDisplay::NotifyVsync(mozilla::TimeStamp) /builds/worker/workspace/build/src/gfx/thebes/SoftwareVsyncSource.cpp:108:12

Eric Rahm [:erahm]

Comment 1

•

7 years ago

This seems like more of a layout bug.

Group: core-security → layout-core-security

Component: XPCOM → Layout

Andrew McCreight [:mccr8]

Comment 2

•

7 years ago

Well, the actual code involved is pretty bog standard IPDL runnable stuff, AFAICT. Any ideas, Nathan? From the stacks, it looks like this is a runnable because created on the vsync thread, which gets destroyed on the IPDL background thread, then we try to release it on the vsync thread. The only way I can imagine that happening is if the refcounting is messed up somehow. But this looks like a super common pattern, so I don't know what could be going wrong.

Flags: needinfo?(nfroyd)

Bob Clary [:bc] (inactive)

Reporter

Comment 3

•

7 years ago

Attached file 2018-02-21-asan-mozilla-runnable-release.txt — Details

pseudo-stacks for the top 5 frames and counts for all of the instances of this error that I have seen in total. They are mostly in mozilla::Runnable::Release() nsThreadPool::Run() non-virtual thunk to nsThreadPool::Run() nsThread::ProcessNextEvent(bool, bool*) NS_ProcessNextEvent(nsIThread*, bool) while only a few have mozilla::Runnable::Release() mozilla::layout::VsyncParent::NotifyVsync(mozilla::TimeStamp) non-virtual thunk to mozilla::layout::VsyncParent::NotifyVsync(mozilla::TimeStamp) See bug 1416529 comment 7 for an example where I found mozilla::net::CacheIOThread::ThreadFunc() on the stack at https://ngodung.tk/. Even though I have retested these urls multiple times, Bughunter has for the most part only been able to hit the error once per url though I have a handful of urls where it was reproduced 2-3 times. I'll look for better reproducible instances and update here when I find one.

Nathan Froyd [:froydnj]

Assignee

Comment 4

•

7 years ago

I have no idea what's going on here: // Called on hardware vsync thread. We should post to current ipc thread. MOZ_ASSERT(!IsOnBackgroundThread()); nsCOMPtr<nsIRunnable> vsyncEvent = NewRunnableMethod<TimeStamp>("layout::VsyncParent::DispatchVsyncEvent", this, &VsyncParent::DispatchVsyncEvent, aTimeStamp); MOZ_ALWAYS_SUCCEEDS(mBackgroundThread->Dispatch(vsyncEvent, NS_DISPATCH_NORMAL)); 1. We create a runnable method, taking a strong reference to VsyncParent. 2. We addref the runnable. 3. We dispatch the runnable, which means that the thread's event queue takes (another) strong reference to the runnable. 4. We release the runnable. According to the stack in comment 0, step 4 means that we're touching freed memory? The only plausible way that could happen is if somebody dropped the strong reference the event queue takes (fine, that happens when the runnable actually runs, as one can see from the ASan log) *and* dropped another (non-existent) strong reference as well. But soooo many things would break elsewhere if that happened that I have a hard time imagining a plausible scenario for that. We can't have both threads being in Runnable::Release() and attempting to delete the runnable simultaneously, because the runnable's reference count is atomic...if we did, all sorts of things would be broken. Maybe the compiler deferred the AddRef() of `vsyncEvent` until *after* the Dispatch()? But I don't see how the compiler could be permitted to do that... Comment 0 claims this is a *debug* build, which suggests that perhaps the Release() in debug builds is doing bad things? Debug-only logging, or something like that? But then I'd expect that normal ASan runs (even our test suite) would show badness like this in dozens of other places. I'm pretty mystified.

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Group: layout-core-security → dom-core-security

Component: Layout → XPCOM

Keywords: sec-high

Bob Clary [:bc] (inactive)

Reporter

Comment 5

•

7 years ago

I can provide full logs of instances where this reproduced if that would help. Just let me know.

Nathan Froyd [:froydnj]

Assignee

Comment 6

•

7 years ago

(In reply to Bob Clary [:bc:] from comment #5) > I can provide full logs of instances where this reproduced if that would > help. Just let me know. What do full logs include that the logs you have already posted do not?

Flags: needinfo?(nfroyd) → needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 7

•

7 years ago

Attached file huffingtonpost.com debug log — Details

Normal Gecko debug spew. Example mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync related stack. https://www.huffingtonpost.com/entry/parkland-florida-school-shooting_us_5a849439e4b0774f31d19627

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 8

•

7 years ago

Attached file youtube debug log — Details

https://www.youtube.com/watch?v=au54r1157T0 is an example of cases where I see a number of INFO:audioipc_client::stream messages. This does not have the vsync frames but does have Error: PLayerTransaction::Msg_ReleaseLayer Processing error: message was deserialized, but the handler returned false (indicating failure)

Bob Clary [:bc] (inactive)

Reporter

Comment 9

•

7 years ago

Attached file moonwars debug log — Details

moonwars debug log from a local run. Note I am using marionette to launch the browser. Maybe that is part of the issue?

Eric Rahm [:erahm]

Comment 10

•

7 years ago

Nathan Froyd [:froydnj]

Assignee

Comment 11

•

7 years ago

I like that idea, but Dispatch(nsIRunnable*) is literally just: nsresult Dispatch(nsIRunnable* aEvent, uint32_t aFlags) { return Dispatch(nsCOMPtr<nsIRunnable>(aEvent).forget(), aFlags); } Hardly anything exciting there. I guess maybe we could just pass in an already_AddRefed thing to Dispatch and then the release-after-Run() inside nsThread would take care of the lone reference, and we wouldn't have to worry about underflow. I'd *really* like to understand how this error is coming about. In limited testing with a Valgrind-enabled build, I wasn't able to trigger the use-after-free from comment 0. It could be linked to marionette, I guess (I wasn't using marionette), but again, I have trouble seeing how that happens.

Bob Clary [:bc] (inactive)

Reporter

Comment 12

•

7 years ago

Attached file moonwars.txt — Details

Note I've found that the follow reproduces the one of the failure modes reliably: 1. https://moonwars.drakemoon.com/ in debug asan build 2. press Ctrl-Shift-K to open Web Console. 3. enter window.open() in the input to open a new tab. 4. press Ctrl-Shift-K in the new tab to open a Web Console. 5. enter setInterval('opener.document.location.reload()', 30000); 6. wait for the crash. Doesn't seem to affect opt asan builds at all.

Nathan Froyd [:froydnj]

Assignee

Comment 13

•

7 years ago

I can confirm that bc's instructions reproduce for me on Ubuntu 16.04; I had to wait a little longer (~20 mins?) than bc did on the original Fedora machine (5-10 mins). I get the same stacks as comment 0, which completely mystifies me for all the reasons in comment 4. I'm going to try to catch this in rr, and then see if poking at the runnable yields interesting information.

Nathan Froyd [:froydnj]

Assignee

Comment 14

•

7 years ago

OK, I spent some quality time with rr today and I *believe* this is a race induced by ASan instrumentation, at least for the stack demonstrated by comment 0. For code that looks like: nsCOMPtr<nsIRunnable> r = ...; mOtherThread->Dispatch(r, ...); the sequence of problematic steps on the main thread (MT) and the off-main thread (OT) is: 1. MT: AddRef the runnable after its creation (refcnt 1) 2. MT: Dispatch the runnable (refcnt 2) 3. MT: Release the runnable (refcnt 2) 4. MT: Enter ThreadSafeAutoRefCnt::operator-- (refcnt 2) 5. MT: Atomically decrement the refcnt (refcnt 1) 6. MT: *Do not* return from operator--. Get interrupted, and... 7. OT: Pull runnable out of the event queue (refcnt 1) 8. OT: Run runnable (refcnt 1) 9. OT: Release runnable (refcnt 0) 10. OT: Destroy runnable, running ~Runnable, and freeing the memory (refcnt 0). 11. OT: ASan will scribble appropriate bytes in its shadow memory denoting freed memory for the runnable. However... 12. MT: Come back to Release() from ThreadSafeAutoRefCnt::operator--. 13. MT: We know that the refcnt was 1, so we don't have to destroy the runnable. But... 14. MT: Check that the memory for the runnable is still valid (I'm not entirely sure why it does this, given that there are no *actual* memory references in Release() at this point. But the fact is that it does, and then...) 15. MT: Crash the program because it looks like we were touching freed memory. This sequence of steps neatly explains why we had to run the testcase in comment 12 so long--we're racing between the Release() on the main thread and the Release() on the off-main thread. Also explains why things seem to be intermittent in comment 0. It would be interesting to see what the instrumented code in non-debug ASan builds looks like and whether there's still the check at step 14 after the refcount has been atomically decremented, or whether this is just an artifact of debug builds (and whether this is potentially fixed in newer clangs?).

Christian Holler (:decoder)

Comment 15

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #14) > 14. MT: Check that the memory for the runnable is still valid (I'm not > entirely sure why it does this, given that there are no *actual* memory > references in Release() at this point. But the fact is that it does, and > then...) I think this is the right question to ask.. Why is this check happening here? ASan does not check if memory is valid or not without an actual access being there. If that access is from us, then it is a bug in our codebase. So we need to find out why exactly there is something that causes this check to be present. Maybe you could look at the decompilation of Release() in this case (maybe with and without ASan), to see if you can see the access itself that is happening here?

Nathan Froyd [:froydnj]

Assignee

Comment 16

•

7 years ago

Oh, that's a good point, I should have looked a little farther and thought about this being a debug build. The disassembly looks like: 0x00007fea28b660a8 <+72>: mov %r14,%rdi 0x00007fea28b660ab <+75>: callq 0x7fea28948140 <mozilla::ThreadSafeAutoRefCnt::operator--()> => 0x00007fea28b660b0 <+80>: mov %rax,%r13 0x00007fea28b660b3 <+83>: cmpb $0x0,0x7fff8000(%rbx) 0x00007fea28b660ba <+90>: jne 0x7fea28b661a3 <mozilla::Runnable::Release()+323> 0x00007fea28b660c0 <+96>: mov 0x20(%r15),%rdx 0x00007fea28b660c4 <+100>: mov %r15,%rdi 0x00007fea28b660c7 <+103>: mov %r13,%rsi 0x00007fea28b660ca <+106>: callq 0x7fea28a04520 <NS_LogRelease(void*, nsrefcnt, char const*)> and the cmpb check at +83 is verifying that the access to mName for the call to NS_LogRelease is still valid. Which it's not, necessarily, because that might be freed memory at this point. Andrew, do you want to have a go at fixing that up? Relatedly, if all the things bc have seen in this bug are debug-only, then I don't think there's any reason for this to be a security bug. bc, is that a correct assumption? It certainly seems to be so from the logs...

Flags: needinfo?(continuation)

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 17

•

7 years ago

It certainly seems that this is not security sensitive. I would really like a landed patch that I could use to re-test all of the urls where I have seen this, but that is probably not necessary. Clear the security group if you like.

Flags: needinfo?(bob)

Nathan Froyd [:froydnj]

Assignee

Comment 18

•

7 years ago

Actually, the patch for this is not that hard. I will fix things up. ni? for mccr8 stays, but we'll repurpose it to making the bug not a security bug?

Assignee: nobody → nfroyd

Andrew McCreight [:mccr8]

Comment 19

•

7 years ago

Fun! Thanks for investigating and fixing this, Nathan.

Group: dom-core-security

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Updated

•

7 years ago

Keywords: sec-high

Whiteboard: [debug only]

Nathan Froyd [:froydnj]

Assignee

Comment 20

•

7 years ago

Attached patch avoid races in refcount logging with named classes — Details — Splinter Review

Attachment #8958804 - Flags: review?(continuation)

Andrew McCreight [:mccr8]

Comment 21

•

7 years ago

Comment on attachment 8958804 [details] [diff] [review] avoid races in refcount logging with named classes Review of attachment 8958804 [details] [diff] [review]: ----------------------------------------------------------------- ::: xpcom/base/nsISupportsImpl.h @@ +691,5 @@ > +// Use this macro to ensure that we have a local copy of the name prior to > +// the refcnt decrement. (We use a macro to make absolutely sure the name > +// isn't loaded in builds where it wouldn't be used.) > +#ifdef NS_BUILD_REFCNT_LOGGING > +#define NS_LOAD_NAME_BEFORE_RELEASE(localname, _name) \ Having a macro feels like overkill to me, but I suppose it is fine.

Attachment #8958804 - Flags: review?(continuation) → review+

Nathan Froyd [:froydnj]

Assignee

Comment 22

•

7 years ago

(In reply to Andrew McCreight [:mccr8] from comment #21) > ::: xpcom/base/nsISupportsImpl.h > @@ +691,5 @@ > > +// Use this macro to ensure that we have a local copy of the name prior to > > +// the refcnt decrement. (We use a macro to make absolutely sure the name > > +// isn't loaded in builds where it wouldn't be used.) > > +#ifdef NS_BUILD_REFCNT_LOGGING > > +#define NS_LOAD_NAME_BEFORE_RELEASE(localname, _name) \ > > Having a macro feels like overkill to me, but I suppose it is fine. I also wasn't sure if there would be unused variable warnings if the code was inserted directly into the named release macro...

Pulsebot

Comment 23

•

7 years ago

Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a3c51e3bb6d9 avoid races in refcount logging with named classes; r=mccr8

Pulsebot

Comment 24

•

7 years ago

Pushed by nfroyd@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4fcc2c3fa993 actually avoid races in refcount logging with named classes; rs=erahm

Andreea Pavel [:apavel]

Comment 25

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a3c51e3bb6d9

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox61: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Stefan Hindli [:stefan_hindli]

Comment 26

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4fcc2c3fa993

Sylvestre Ledru [:Sylvestre]

Updated

•

5 years ago

Blocks: asan-maintenance

You need to log in before you can comment on or make changes to this bug.