Closed
Bug 1371203
Opened 7 years ago
Closed 7 years ago
AddressSanitizer: heap-use-after-free nsCOMPtr.h:834:7 in nsCOMPtr
Categories
(Core :: Networking: HTTP, defect)
Core
Networking: HTTP
Tracking
()
VERIFIED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | + | fixed |
People
(Reporter: cbook, Assigned: schien)
References
(Blocks 1 open bug, )
Details
(4 keywords, Whiteboard: [post-critsmash-triage])
Attachments
(2 files, 1 obsolete file)
13.00 KB,
text/plain
|
Details | |
2.79 KB,
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
Found via bughunter and seems not reproducible on fedora but seems on ubuntu
Steps to reproduce:
(Warning NSFW URL!!!)
-Load->
http://www.xvideos.com/video18880039/japinhas_putas_trocando_esperma_de_boca_-_cumswap_japanese_asian_chicks_60fps?pl=16226303&plname=asian_bang
i guess the crash might have to do with this video playback maybe
SUMMARY: AddressSanitizer: heap-use-after-free /home/worker/workspace/build/src/obj-firefox/dist/include/nsCOMPtr.h:834:7 in nsCOMPtr
Shadow bytes around the buggy address:
0x0c388000afc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000afd0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000afe0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000aff0: fd fd fd fd fa fa fa fa fa fa fa fa fa fa fa fa
0x0c388000b000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c388000b010: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd[fd]
0x0c388000b020: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000b030: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000b040: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000b050: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
0x0c388000b060: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==13782==ABORTING
Comment 1•7 years ago
|
||
Stack involves HTTP.
Group: core-security → network-core-security
Component: General → Networking: HTTP
Assignee | ||
Comment 3•7 years ago
|
||
@tomcat, which version does this bug report on?
Flags: needinfo?(cbook)
Reporter | ||
Comment 4•7 years ago
|
||
bc: do you know the version number of the ubuntu maschines in production ?
Flags: needinfo?(cbook) → needinfo?(bob)
Assignee | ||
Comment 5•7 years ago
|
||
Additional question, is this bug filed on Firefox 55?
Assignee | ||
Comment 6•7 years ago
|
||
Oh, I see PBackgroundHttp related class in the call stack so it must be Firefox 55.
Flags: needinfo?(schien)
Assignee | ||
Comment 7•7 years ago
|
||
From the free call stack
> #0 0x4bb62b in __interceptor_free /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:47:3
> #1 0x7f9c14223db1 in mozilla::net::HttpChannelChild::Release() /home/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:232:5
> #2 0x7f9c147f8a2e in DestroySubtree /home/worker/workspace/build/src/obj-firefox/ipc/ipdl/PHttpBackgroundChannelChild.cpp:417:5
> #3 0x7f9c147f8a2e in mozilla::net::PHttpBackgroundChannelChild::OnMessageReceived(IPC::Message const&) /home/worker/workspace/build/src/obj-firefox/ipc/ipdl/PHttpBackgroundChannelChild.cpp:370
> #4 0x7f9c1470043f in mozilla::ipc::PBackgroundChild::OnMessageReceived(IPC::Message const&) /home/worker/workspace/build/src/obj-firefox/ipc/ipdl/PBackgroundChild.cpp:1608:28
> #5 0x7f9c14651614 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) /home/worker/workspace/build/src/ipc/glue/MessageChannel.cpp:2075:25
> #6 0x7f9c1464e257 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) /home/worker/workspace/build/src/ipc/glue/MessageChannel.cpp:2001:17
> #7 0x7f9c1464ff64 in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) /home/worker/workspace/build/src/ipc/glue/MessageChannel.cpp:1870:5
https://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/netwerk/protocol/http/HttpBackgroundChannelChild.cpp#436
Should be the code cause the release of HttpChannelChild, which means it enters normal channel release procedure. So, the problem is not on the free part.
So the root cause is that there is a ChannelEventQueue::CompleteResume dispatched and channel release procedure kicks in before the CompleteResuem is executed. ChannelEventQueue is kept alive because the RunnableMethod holds the last strong reference.
This bug should be introduced by bug 1320744.
Assignee | ||
Comment 8•7 years ago
|
||
If the HttpChannelChild::Release happens on the same thread as the ChannelEventQueue::CompleteResume, we can simply null the mOwner and check the nullity before use.
However, if the HttpChannelChild::Release happens on different thread, there is no way you can guarantee that mOwner is not half way of the destruction. The only way is to hold a strong reference along with the CompleteResume runnable.
@mayhemer, do you think the solution above is the correct way to solve this bug?
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 9•7 years ago
|
||
here is the corresponding patch for comment #8.
Comment 10•7 years ago
|
||
Very good catch.
I just hope this is the only place where this in ChannelEventQueue:
// Keep ptr to avoid refcount cycle: only grab ref during flushing.
nsISupports *mOwner;
pays off like that...
Should we do an audit? I'd like to make sure this can't happen in the future. Some redesign would be nice (in a followup).
Flags: needinfo?(honzab.moz)
Comment 11•7 years ago
|
||
Comment on attachment 8876095 [details] [diff] [review]
test.patch
Review of attachment 8876095 [details] [diff] [review]:
-----------------------------------------------------------------
If your analyzes is correct, then this should solve this crash.
::: netwerk/ipc/ChannelEventQueue.cpp
@@ +142,5 @@
> mSuspended = false;
> return;
> }
>
> + // Hold a strong reference of mOwner to avoid channel released
'to avoid the channel release'
@@ +143,5 @@
> return;
> }
>
> + // Hold a strong reference of mOwner to avoid channel released
> + // before CompleteResume is executed.
'was executed'
Attachment #8876095 -
Flags: feedback?(honzab.moz) → feedback+
Comment 12•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #10)
> Some redesign would be nice (in a followup).
or at least make sure that _ANY_ addref(release) on the ChannelEventQueue object also does a corresponding addref(release) on its mOwner
Assignee | ||
Comment 13•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #12)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > Some redesign would be nice (in a followup).
>
> or at least make sure that _ANY_ addref(release) on the ChannelEventQueue
> object also does a corresponding addref(release) on its mOwner
Three places that will touch mOwner:
1) ChannelEventQueue constructor:
ChannelEventQueue is created as a member of the owner/channel object. No life cycle issue at this point.
2) ChannelEventQueue::RunOrEnqueue:
This function is all called directly by IPC message handler of the owner/channel object. No life cycle issue at this point.
3) ChannelEventQueue::FlushQueue:
Maybe reached by two path:
a) ChannelEventQueue::CompleteResume, which will be asynchronously called by dispatching a runnable method. Will be fixed by this patch.
b) AutoEventEnqueuer destructor, an RAII helper which used by owner/channel object. If owner/channel object is released on the stack, we might hit the same issue. Need to hold a strong reference of mOwner as well.
Assignee | ||
Comment 14•7 years ago
|
||
hold strong reference in AutoEventEnqueuer as well.
Attachment #8876153 -
Flags: review?(honzab.moz)
Assignee | ||
Updated•7 years ago
|
Attachment #8876095 -
Attachment is obsolete: true
Assignee | ||
Comment 15•7 years ago
|
||
@tomcat, is it possible to ask bug reporter to help verify my patch?
My ubuntu machine uses corporation network which is not allowed to access the test URL.
Flags: needinfo?(cbook)
Comment 16•7 years ago
|
||
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #13)
> (In reply to Honza Bambas (:mayhemer) from comment #12)
> > (In reply to Honza Bambas (:mayhemer) from comment #10)
> > > Some redesign would be nice (in a followup).
> >
> > or at least make sure that _ANY_ addref(release) on the ChannelEventQueue
> > object also does a corresponding addref(release) on its mOwner
>
> Three places that will touch mOwner:
>
> 1) ChannelEventQueue constructor:
> ChannelEventQueue is created as a member of the owner/channel object. No
> life cycle issue at this point.
>
> 2) ChannelEventQueue::RunOrEnqueue:
> This function is all called directly by IPC message handler of the
> owner/channel object. No life cycle issue at this point.
>
> 3) ChannelEventQueue::FlushQueue:
> Maybe reached by two path:
> a) ChannelEventQueue::CompleteResume, which will be asynchronously
> called by dispatching a runnable method. Will be fixed by this patch.
> b) AutoEventEnqueuer destructor, an RAII helper which used by
> owner/channel object. If owner/channel object is released on the stack, we
> might hit the same issue. Need to hold a strong reference of mOwner as well.
Those are good, but it's not exactly what I wanted. You check on access to mOwner. I wanted to check access to |this| - when |this| is somehow addreffed then mOwner must be as well, for the same period.
It's a coincidence only that AutoEventEnqueuer touches mOwner. It also refers the queue itself. So the fix is correct, but from a different reason.
Comment 17•7 years ago
|
||
Comment on attachment 8876153 [details] [diff] [review]
fix.patch
Review of attachment 8876153 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks! This LGTM.
Attachment #8876153 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 19•7 years ago
|
||
Blocks: 1320744
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox55:
--- → affected
status-firefox-esr52:
--- → unaffected
tracking-firefox55:
--- → ?
Keywords: checkin-needed
Comment 20•7 years ago
|
||
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #7)
> So the root cause is that there is a ChannelEventQueue::CompleteResume
> dispatched and channel release procedure kicks in before the CompleteResuem
> is executed.
Could ~nsHttpChannel (last Release()) be called on thread 1 just before thread 2 executes mOwner->AddRef in ChannelEventQueue::ResumeInternal?
If so, this is still not fixed and we have "reference count atomicity illusion problem" as described at https://www.janbambas.cz/illusion-atomic-reference-counting/, with some abstraction to map on what I describe above.
The situation I describe would not be possible if we were 100% sure that all calls to AddRef() on a ChannelEventQueue object were always preceded by a call to mOwner->AddRef() on the same thread. Are we sure?
Flags: needinfo?(schien)
Comment 21•7 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Keywords: csectype-uaf,
sec-critical
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 22•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #20)
> (In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment
> #7)
> > So the root cause is that there is a ChannelEventQueue::CompleteResume
> > dispatched and channel release procedure kicks in before the CompleteResuem
> > is executed.
>
> Could ~nsHttpChannel (last Release()) be called on thread 1 just before
> thread 2 executes mOwner->AddRef in ChannelEventQueue::ResumeInternal?
>
> If so, this is still not fixed and we have "reference count atomicity
> illusion problem" as described at
> https://www.janbambas.cz/illusion-atomic-reference-counting/, with some
> abstraction to map on what I describe above.
>
> The situation I describe would not be possible if we were 100% sure that all
> calls to AddRef() on a ChannelEventQueue object were always preceded by a
> call to mOwner->AddRef() on the same thread. Are we sure?
This is guaranteed in current code base since
1) RunOrEnqueue will hold additional reference of mOwner before ChannelEventQueue::ResumeInternal
2) FlushQueue will hold additional reference mOwner (which is now guaranteed to be lived) before next ResumeInternal cycle.
The first additional reference is added on the call stack of Resume, which is either by RunOrEnqueue or directly mEventQueue->Resume by the HttpChannelChild.
3) AutoEventEnqueuer is called by mOwner, I don't see any "this" pointer use-after-free issue reported by HttpChannelChild.
Flags: needinfo?(schien)
Reporter | ||
Comment 23•7 years ago
|
||
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #15)
> @tomcat, is it possible to ask bug reporter to help verify my patch?
> My ubuntu machine uses corporation network which is not allowed to access
> the test URL.
Bob: can we let the production bughunter that detected this orginally verify this issue ? thanks!
Flags: needinfo?(cbook) → needinfo?(bob)
Comment 24•7 years ago
|
||
I ran this url 30 times through bughunter and could not reproduce though we only had the original crash to begin with. As far as I can tell with the intermittent nature of this bug, it seems resolved.
Flags: needinfo?(bob)
Comment 25•7 years ago
|
||
I also ran this on a vm from the command line loading the url 100 times and did not reproduce.
Updated•7 years ago
|
Updated•7 years ago
|
Group: network-core-security → core-security-release
Comment 26•7 years ago
|
||
Marking verified as per comment 24 and comment 25.
Status: RESOLVED → VERIFIED
Whiteboard: [post-critsmash-triage]
Updated•7 years ago
|
Group: core-security-release
Comment 28•5 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Keywords: regression
Updated•5 years ago
|
Blocks: asan-maintenance
Updated•3 years ago
|
Comment 32•3 years ago
|
||
Restricting comments to cut down on spam on this old bug.
Restrict Comments: true
You need to log in
before you can comment on or make changes to this bug.
Description
•