AddressSanitizer: heap-use-after-free @ compare_exchange_strong | mozilla::net::HttpChannelChild::TrySendDeletingChannel()

RESOLVED FIXED in Firefox 58

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
9 months ago

People

(Reporter: bc, Assigned: schien)

Tracking

(Blocks 1 bug, 4 keywords)

57 Branch
mozilla59
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox57 wontfix, firefox58+ fixed, firefox59+ fixed)

Details

(Whiteboard: [necko-triaged][post-critsmash-triage][adv-main58+], )

Attachments

(6 attachments, 4 obsolete attachments)

Reporter

Description

2 years ago
1. https://m.imgur.com/r/gaming/ggNcSBj

   This is one of several urls where I have seen this error over the last
   few weeks. I have attempted to reproduce this error manually but have
   failed. I am giving up attempting to reproduce before filing the bug
   and am just going to file it in the hope that someone will be able to
   figure out the problem. I'll attach a list of recent urls in case
   they are of help.

2. AddressSanitizer: heap-use-after-free on address 0x61d000f0395c at pc 0x7f096609a58c bp 0x7ffc67a80980 sp 0x7ffc67a80978
WRITE of size 4 at 0x61d000f0395c thread T0 (Web Content)

    #0 0x7f096609a58b in std::__atomic_base<unsigned int>::compare_exchange_strong(unsigned int&, unsigned int, std::memory_order, std::memory_order) /builds/worker/workspace/build/src/clang/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4/../../../../include/c++/4.9.4/bits/atomic_base.h:581:9
    #1 0x7f096609a58b in mozilla::detail::IntrinsicMemoryOps<unsigned int, (mozilla::MemoryOrdering)2>::compareExchange(std::atomic<unsigned int>&, unsigned int, unsigned int) /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Atomics.h:212
    #2 0x7f0966836b62 in mozilla::net::HttpChannelChild::TrySendDeletingChannel() /builds/worker/workspace/build/src/netwerk/protocol/http/HttpChannelChild.cpp:3399:29
Reporter

Comment 1

2 years ago
Posted file recent sample urls
Lots of interesting assertions in the ASAN log attachment. Have anyone who can take a look at this, Jason?
Flags: needinfo?(jduell.mcbugs)
Group: core-security → network-core-security

Comment 3

2 years ago
Can you take a look at this Dragana?  Pass the ni back to me if not.
Flags: needinfo?(jduell.mcbugs) → needinfo?(dd.mozilla)
We have a lot of assertions before this UAF.

There is a lot of warnings:
[Parent 12648, Socket Thread] WARNING: '!workerPrivate', file /builds/worker/workspace/build/src/dom/base/nsContentUtils.cpp, line 2384
probably they are harmless.

Further there are 4:
Child 12688, Main Thread] ###!!! ASSERTION: nsHTMLDocument::Close(): Trying to remove nonexistent wyciwyg channel!: 'mWyciwygChannel', file /builds/worker/workspace/build/src/dom/html/nsHTMLDocument.cpp, line 1886

and also:
[Child 12688, Main Thread] ###!!! ASSERTION: Wrong Document Channel: 'request == mDocumentRequest', file /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 1454


HttpChannelChild dispatch TrySendDeletingChannel to main thread and before that gets executed PContentChild::OnChannelClose() is called on WebConntent thread that releases the HttpChannelChild.
In HttpChannelChild::Release() there is a comment that IPDL holds reference to HttpChannelChild.
I think this uaf is a consequence of the assertions beforehand.

Henri, can you figures something here? I am not sure really how could take a look.
Flags: needinfo?(dd.mozilla) → needinfo?(hsivonen)
(In reply to Dragana Damjanovic [:dragana] from comment #4)
> There is a lot of warnings:
> [Parent 12648, Socket Thread] WARNING: '!workerPrivate', file
> /builds/worker/workspace/build/src/dom/base/nsContentUtils.cpp, line 2384
> probably they are harmless.

I know nothing about this.

> Further there are 4:
> Child 12688, Main Thread] ###!!! ASSERTION: nsHTMLDocument::Close(): Trying
> to remove nonexistent wyciwyg channel!: 'mWyciwygChannel', file
> /builds/worker/workspace/build/src/dom/html/nsHTMLDocument.cpp, line 1886

I can't recall what my comment there means. I didn't add the assertions. (They come from bug 35340.) I don't recall if my comment means that the assertions firing is normal if a document.opened document document.closes itself.

> and also:
> [Child 12688, Main Thread] ###!!! ASSERTION: Wrong Document Channel:
> 'request == mDocumentRequest', file
> /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 1454

Probably worth trying to catch this one in rr and seeing why it happens.

Sorry about this not being a very useful comment.
Flags: needinfo?(hsivonen)
Moving it to dom. It would be good if someone from dom could take a look at this assertions.
Component: Networking → DOM
Boris, do these DOM assertions from comment 4 suggest to you what might be causing a use-after-free in networking code here? Thanks.
Flags: needinfo?(bzbarsky)
Not offhand.  In particular, I see no way those assertions would affect things like ipdl lifetime management.  They indicate that we may have multiple document channels in flight at once (generally bad), but not much else.
Flags: needinfo?(bzbarsky)

Comment 9

2 years ago
Hi Andrew:

I have assigned these security bugs to you to reassign them to appropriate developers in your team to investigate and fix them.

Thanks!

Wennie

Updated

2 years ago
Assignee: nobody → overholt
Andrew McCreight and I discussed this bug the other day and it is pretty unactionable from our standpoint. Samael, Andrew, Ben: anything here trigger a thought of what could be a cause or where to investigate more?
Flags: needinfo?(sawang)
Flags: needinfo?(bugmail)
Flags: needinfo?(bkelly)
Amy told me that :schien is working on a similar security bug. CC'd him to see if he has some comments.
Although I still don't know how to reproduce it and what error causes the channel shuts down, it looks the UAF is because the TrySendDeletingChannel runnable was dispatched before a channel error caused PContentChild::OnChannelClose, but executed after that.

I read from bug 1320744 that the runnable was created tho NewNonOwningRunnableMethod [1] to prevent additional refcount on HttpChannelChild::Release (when mRefCnt becomes 1), but maybe it's worth to reconsider NewRunnableMethod instead?

[1] http://searchfox.org/mozilla-central/rev/1c4901d060e3953e41088c038b62a0c026c1e1fb/netwerk/protocol/http/HttpChannelChild.cpp#3501-3503
Flags: needinfo?(sawang)
The HttpChannelChild is destroyed because PContent is shutting down. I think the proper fix is to cancel the NewNonOwningRunnableMethod mentioned in comment #12 in the dtor of HttpChannelChild, since there is no need to manually close IPC since the entire object is going to die.

I'll suggest to split the DOM assertion to another bug. This UAF is triggered by an unusual code flow as comment #4 mentioned.
Flags: needinfo?(bkelly)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #13)
> The HttpChannelChild is destroyed because PContent is shutting down. I think
> the proper fix is to cancel the NewNonOwningRunnableMethod mentioned in
> comment #12 in the dtor of HttpChannelChild, since there is no need to
> manually close IPC since the entire object is going to die.

Can you do that, S-C?
Flags: needinfo?(bugmail) → needinfo?(schien)
I'll take this bug. @overholt can you file another bug for continuing the investigation of DOM assertion?
Component: DOM → Networking: HTTP
Flags: needinfo?(schien) → needinfo?(overholt)
(In reply to Dragana Damjanovic [:dragana] from comment #4)
> and also:
> [Child 12688, Main Thread] ###!!! ASSERTION: Wrong Document Channel:
> 'request == mDocumentRequest', file
> /builds/worker/workspace/build/src/uriloader/base/nsDocLoader.cpp, line 1454

Note, I recently fixed this assertion to properly accept internal redirects in some cases where it did not before.

https://searchfox.org/mozilla-central/diff/9ed7291825223cdcf0704b88b1daf62ea2bfb096/uriloader/base/nsDocLoader.cpp#1456

It seems this assertion is from a version before I made that fix.  Does this assertion fire on latest nightly 58?
Assignee: overholt → schien
Priority: -- → P1
Whiteboard: [necko-triaged]
Reporter

Comment 17

2 years ago
(In reply to Ben Kelly [:bkelly] from comment #16)

> It seems this assertion is from a version before I made that fix.  Does this
> assertion fire on latest nightly 58?

Bughunter stopped seeing the assertion Wrong Document Channel on Nightly builds after 2017-10-09. Beta continues to see the assertion though not in 100% of the cases.

The assertion is not a necessary to reproduce the asan message as Nightly continues to be able to reproduce the heap errors. It does appear that it is more likely for Beta to have both the assertion and the asan heap error together though there are cases where Beta does not have the assertion but does have the asan heap error.
(In reply to Bob Clary [:bc:] from comment #17)
> (In reply to Ben Kelly [:bkelly] from comment #16)
> > It seems this assertion is from a version before I made that fix.  Does this
> > assertion fire on latest nightly 58?
> 
> Bughunter stopped seeing the assertion Wrong Document Channel on Nightly
> builds after 2017-10-09. Beta continues to see the assertion though not in
> 100% of the cases.

That's when I landed my fix in bug 1391693.  So that checks out.  Thanks for looking.
Posted patch [WIP]bug1401459.patch (obsolete) — Splinter Review
This is the sketch of the solution I mentioned in comment #13, cancel the runnable of TrySendDeletingChannel while IPDL is destroyed. Not completely fix the multi-thread issue but I think this is a good starting point.
Attachment #8924592 - Flags: feedback?(honzab.moz)
No need to file bug for DOM assertion according to comment #17.
Flags: needinfo?(overholt)
Attachment #8924592 - Flags: feedback?(honzab.moz) → feedback-
What's the next step for this bug? Looks like we need a new patch?
Flags: needinfo?(schien)
Yes, I'll update my patch soon. The complete solution might be more complicated according to my prior discussion with @mayhemer.

The mechanism of TrySendDeletingChannel is not safe to be triggered by non-main-thread. All the flags used by TrySendDeletingChannel should only be evaluated on main thread. In addition, we should keep one reference count when doing this to prevent the object being destroyed if IPC closed by other procedure.
Flags: needinfo?(schien)
Posted patch bug1401459.patch (obsolete) — Splinter Review
extra refcnt for kept-alive channel child, removed when TrySendDeletingChannel is invoked on main thread.
Attachment #8924592 - Attachment is obsolete: true
Attachment #8930417 - Flags: feedback?(honzab.moz)
Generally, using the ref counter this way is wrong.  There is always a chance the counter will be 2 but it the actual state of things will be different.  Maybe not today, but when we change something in the future not keeping this super-fragile thing in mind.  I'll take a look at the patch when I find time, but I think in advance I may want yet different solution here, not refcounter-tight.
Then we'll need to remove the refcount hack for IPDL lifecycle as well. I'm thinking about adding a HttpChannelChildSet per process to maintain the IPDL lifecycle out side of HttpChannelChild. In this case, we'll know exactly if the HttpChannelChild is not referenced by any object except the IPDL, using real RefPtr.
Comment on attachment 8930417 [details] [diff] [review]
bug1401459.patch

Review of attachment 8930417 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +3484,5 @@
> +  MOZ_ASSERT(mIPCOpen);
> +  MOZ_ASSERT(!mKeptAlive);
> +  MOZ_ASSERT(!mRequireDeletingChannel);
> +
> +  mKeptAlive = true;

do we still need this flag?  since mRequireDeletingChannel seems to me like a sufficient indication.  but it's easy to loose track here for me what's needed and what is not...

@@ +3486,5 @@
> +  MOZ_ASSERT(!mRequireDeletingChannel);
> +
> +  mKeptAlive = true;
> +  mRequireDeletingChannel = true;
> +  RefPtr<HttpChannelChild> self = this;

hypothetically:

  ref count is 2 (this and some other thread has a ref)

mRequireDeletingChannel = true;

  someone on another thread releases -> we trigger the "count == 2" logic in Release()

RefPtr<HttpChannelChild> self = this;

  refcnt goes to 2

f- based on this.

@@ +3500,5 @@
>        return;
>      }
>  
> +    if (mIPCOpen) {
> +      Unused << PHttpChannelChild::SendDeletingChannel();

should mKeepAlive be dropped here?  what about its atomicity?  or can this method now be called only from Release()? - that would be OK then and I justified mKeepAlive is not needed.
Attachment #8930417 - Flags: feedback?(honzab.moz) → feedback-
Can you please make another attempt?
Flags: needinfo?(schien)
Duplicate of this bug: 1427116
Posted file bug1401459.pdf
Trying to document the root cause and proposed solution in comment #25.

I'm writing the corresponding patch and maybe @mayhemer can take a look at my proposal in parallel.
Attachment #8930417 - Attachment is obsolete: true
Flags: needinfo?(schien)
Attachment #8939764 - Flags: feedback?(honzab.moz)
sc, I'm so sorry I didn't get to this sooner, somehow I missed the f? request among all others.

it would be good to talk about this directly some day.  not sure I'll make it for the Thursday meeting, but I may be available around yours 5-6pm tomorrow or on Thursday later during that day if that's ok.

in the meantime, I would be interested in any wip patches or at least explanation what the HttpChannelChildActor, mKeptAliveSet and mActiveSet do exactly, it's not clear from the pdf.
Flags: needinfo?(schien)
Let's talk about my proposal after the Thursday meeting. I plan to upload a WIP today so that you can take a glance before our discussion.
Flags: needinfo?(schien)
Posted patch wip-bug1401459.patch (obsolete) — Splinter Review
This is the WIP for my proposal.
Haven't fixed all the interaction between HttpChannelChild and HttpChannelChildActor but worth to take a look about how HttpChannelChildActorSet works.
Attachment #8941377 - Flags: feedback?(honzab.moz)
Comment on attachment 8939764 [details]
bug1401459.pdf

when I'm looking at the first page, why don't we do something like this:

https://searchfox.org/mozilla-central/rev/03877052c151a8f062eea177f684a2743cd7b1d5/dom/storage/LocalStorageCache.cpp#99

simply said - always re-dispatch HttpChannelChild::Release() to the main thread.

if the last non-main thread (T1) throws the ref while the child is kept-alive, we would:
- T1: refcnt == 2, dispatch(&Release) ; after this, only IPC can manipulate the refcnt, what happens strictly only on the main thread, since no one keeps a reference except IPC now
- MT: ActorDestroy: refcnt goes from 2 to 1 -> ipc closed - hence no TrySendDeletingChannel, refcnt != 0 - hence no |delete this|
- no one can addref after this point (this was release of the IPC reference and the last reference release is on its way, no one has access to this anymore)
- MT: processing the dispatched Release(): refcnt goes from 1 to 0 -> delete this
Attachment #8939764 - Flags: feedback?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #33)
> Comment on attachment 8939764 [details]
> bug1401459.pdf
> 
> when I'm looking at the first page, why don't we do something like this:
> 
> https://searchfox.org/mozilla-central/rev/
> 03877052c151a8f062eea177f684a2743cd7b1d5/dom/storage/LocalStorageCache.cpp#99
> 
> simply said - always re-dispatch HttpChannelChild::Release() to the main
> thread.
> 
> if the last non-main thread (T1) throws the ref while the child is
> kept-alive, we would:
> - T1: refcnt == 2, dispatch(&Release) ; after this, only IPC can manipulate
> the refcnt, what happens strictly only on the main thread, since no one
> keeps a reference except IPC now
> - MT: ActorDestroy: refcnt goes from 2 to 1 -> ipc closed - hence no
> TrySendDeletingChannel, refcnt != 0 - hence no |delete this|
> - no one can addref after this point (this was release of the IPC reference
> and the last reference release is on its way, no one has access to this
> anymore)
> - MT: processing the dispatched Release(): refcnt goes from 1 to 0 -> delete
> this

Yes, always do `Release` on main thread will make the refcount happy. I can try this approach first to see if we can solve this UAF with a less scary patch. However I believe HttpChannelChild needs some refactory to fully support concurrency.
Dispatch HttpChannelChild::Release to main thread if called on other threads.
Attachment #8941697 - Flags: review?(honzab.moz)
Comment on attachment 8941697 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Review of attachment 8941697 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +252,5 @@
> +                                               &HttpChannelChild::Release));
> +
> +    // Continue Release procedure if failed to dispatch to main thread.
> +    if (!NS_WARN_IF(NS_FAILED(rv))) {
> +      return mRefCnt;

you can't touch |this| after you have posted the event.  cache mRefCnt locally and return the cached valued (maybe subtracted by 1)
Attachment #8941697 - Flags: review?(honzab.moz) → feedback+
Comment on attachment 8941697 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Review of attachment 8941697 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +252,5 @@
> +                                               &HttpChannelChild::Release));
> +
> +    // Continue Release procedure if failed to dispatch to main thread.
> +    if (!NS_WARN_IF(NS_FAILED(rv))) {
> +      return mRefCnt;

nice catch! will fix it and redo the test.
Comment on attachment 8941377 [details] [diff] [review]
wip-bug1401459.patch

dropping f? since we have a better patch coming
Attachment #8941377 - Flags: feedback?(honzab.moz)
update according to review comment.
Attachment #8941377 - Attachment is obsolete: true
Attachment #8941697 - Attachment is obsolete: true
Attachment #8942038 - Flags: review?(honzab.moz)
Duplicate of this bug: 1405818
Comment on attachment 8942038 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Review of attachment 8942038 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!
Attachment #8942038 - Flags: review?(honzab.moz) → review+
Reporter

Comment 42

Last year
schien: We need Security Approval before landing don't we? https://wiki.mozilla.org/Security/Bug_Approval_Process
Shih-Chiang, can you request approval to land this on beta and release, please? The patch applies cleanly.
Flags: needinfo?(schien)
Comment on attachment 8942038 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
Not likely.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
no
Which older supported branches are affected by this flaw?
release 57, 58 (supported branch after release 55)
If not all supported branches, which bug introduced the flaw?
bug 1320744
Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
Yes, the same patch can apply to all affected branches
How likely is this patch to cause regressions; how much testing does it need?
low, dispatching Release to main thread is a proven technique to solve XPCOM; Test on bughunter and treeherder should be enough
Attachment #8942038 - Flags: sec-approval?
(In reply to Bob Clary [:bc:] from comment #42)
> schien: We need Security Approval before landing don't we?
> https://wiki.mozilla.org/Security/Bug_Approval_Process

I was too excited to land the fix, a late sec_approval is submitted.
Reporter

Comment 47

Last year
I no longer see the compare_exchange_strong related asan failures on Nightly but do on Beta.

However the number and variety of asan heap use after free errors at mozilla::Runnable::Release that remain on Nightly do not give me confidence the entire issue is resolved. These existed prior to the patch landing going back to at least 2017-12-29 which is when I cleared bughunter's database due to the crash reporting collection issue.
Reporter

Comment 48

Last year
Yesterday, I retested all of my asan related errors since 2017-12-29 and found these mozilla::Runnable::Release() related signatures where there was not also an associated SEGV in asan.
Reporter

Comment 49

Last year
These mozilla::Runnable::Release() related signatures where there was also an associated SEGV in asan.
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #46)
> (In reply to Bob Clary [:bc:] from comment #42)
> > schien: We need Security Approval before landing don't we?
> > https://wiki.mozilla.org/Security/Bug_Approval_Process
> 
> I was too excited to land the fix, a late sec_approval is submitted.

You've just endangered our users by doing this. We have no betas left and are shipping Firefox 58 in a week with the release candidate being built in the next day or so. We now have to decide whether to take a fix into 58 with no betas left or to risk our users because the fix is public in trunk because you checked in without sec-approval. This is the WHOLE point that sec-approval is meant to mitigate against. PLEASE don't do this in the future.

Gerry, this fix was checked into trunk by the dev. We probably need to take it in 58. The patch is pretty small overall but this is not an ideal scenario.
Flags: needinfo?(rkothari)
Flags: needinfo?(gchang)
Blocks: 1320744
Flags: needinfo?(dveditz)
Given the sec rating, low-risk (tell me if that's a wrong assumption), landed in central for 3 days, I think we can include this in RC2. NI Julien, in case he will gtb RC2.
Flags: needinfo?(rkothari) → needinfo?(jcristau)
Comment on attachment 8942038 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Giving sec-approval+ based on Ritu's comment.
Attachment #8942038 - Flags: sec-approval? → sec-approval+
Comment on attachment 8942038 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Approval Request Comment
[Feature/Bug causing the regression]: 1320744
[User impact if declined]: random crash while browsing web page.
[Is this code covered by automated tests?]: yes by bughunter
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: no
[Is the change risky?]: no
[Why is the change risky/not risky?]: remove concurrency so that execution order is more predictable
[String changes made/needed]: no
Flags: needinfo?(schien)
Attachment #8942038 - Flags: approval-mozilla-release?
Attachment #8942038 - Flags: approval-mozilla-beta?
Comment on attachment 8942038 [details] [diff] [review]
bug1401459-release-on-main-thread.patch

Per comment #50, this is a sec-high and we need this in 58. Beta58+.
Flags: needinfo?(gchang)
Attachment #8942038 - Flags: approval-mozilla-release?
Attachment #8942038 - Flags: approval-mozilla-release+
Attachment #8942038 - Flags: approval-mozilla-beta?
Attachment #8942038 - Flags: approval-mozilla-beta+
Flags: needinfo?(jcristau)
Flags: qe-verify-
Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]
Group: network-core-security → core-security-release
Whiteboard: [necko-triaged][post-critsmash-triage] → [necko-triaged][post-critsmash-triage][adv-main58+]
Group: core-security-release
See Also: → 1491030
You need to log in before you can comment on or make changes to this bug.