Closed Bug 1365894 Opened 7 years ago Closed 7 years ago

AddressSanitizer: heap-use-after-free RefPtr.h:40:11 Release ~RefPtr ~SchedulerEventTarget (anonymous namespace)::SchedulerEventTarget::Release()

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 + fixed
firefox58 + fixed

People

(Reporter: cbook, Assigned: JamesCheng)

References

(Blocks 1 open bug, )

Details

(4 keywords, Whiteboard: [adv-main57+][post-critsmash-triage])

Attachments

(4 files, 3 obsolete files)

Attached file asan report
found by bughunter and working on reproduce this

Bughunter found this on 
https://www.horizon.tv/de_de/live-sender/live-channel.html/29438502968/location/97245734974

with ==17792==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d00000a7a0 at pc 0x7efe626d9611 bp 0x7efe5f72c5e0 sp 0x7efe5f72c5d8
dveditz, do you know who could take a look at this ?
Flags: needinfo?(dveditz)
Strange thing is that this crash seems to happen on ubuntu but no on fedora. So for reproducing this crash or fixing seems to require a ubuntu vm/system
This is crashing inside a GMP plugin. Since the site asks to have "DRM" enabled maybe it's in the linux CRM?

cpearce: any ideas on this one?

CC'ing billm because scheduler shows up in the stack, too.
Component: General → Audio/Video: Playback
Flags: needinfo?(dveditz) → needinfo?(cpearce)
I'd have thought that this code could only run on shutdown.

Is this a regression from bug 1346678? Maybe the GeckoMediaPluginServiceParent's reference to the main thread's event target is referring to something which has already been freed?
Flags: needinfo?(cpearce)
http://searchfox.org/mozilla-central/rev/24c443a440104dabd9447608bd41b8766e8fc2f5/xpcom/threads/SchedulerGroup.cpp#32

The crash happened in the destructor of SchedulerEventTarget when releasing |mDispatcher| (which is the only member of the RefPtr<> type I guess). I can't see how |mDispatcher| could be deleted despite SchedulerEventTarget keeping it alive.
Group: core-security → media-core-security
In the dupe, it looks like one of the stacks is hitting the MOZ_CRASH("MessageChannel destroyed without being closed") in MessageChannel::Clear, being called from ~PGMPContentParent().
bill, thoughts?
Flags: needinfo?(wmccloskey)
Nathan, is this something you could take a look at?
Flags: needinfo?(nfroyd)
What does "AddressSanitizer: nested bug in the same thread, aborting." mean?  That there was an additional problem when trying to print out more information, or that ASan had already found a bug prior to the use-after-free?

It would be nice if there was more information in the ASan report...but I am just as puzzled by the stack as JW, unless it's not |mDispatcher| that's being freed, but something else.
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #10)
> What does "AddressSanitizer: nested bug in the same thread, aborting." mean?
> That there was an additional problem when trying to print out more
> information, or that ASan had already found a bug prior to the
> use-after-free?

A second bug in another thread is found while AddressSanitizer is reporting a bug (e.g., the user-after-free).
Looking at the AddressSanitizer source it seems they are afraid of deadlocks and races, so they don't report something meaningful for the second bug.

See https://chromium.googlesource.com/chromiumos/third_party/compiler-rt/+/59a9c97922c02a4cd76893a8d55614d5a3814d29/lib/asan/asan_report.cc#641 for more
Who can move this bug forward?  JW, Nathan, CPearce, bill?
Attached file asan stack
i maybe found a better testcase 

http://www.showtimeanytime.com/#/play/3418480 crashed according to bughunter with the same signature as this bug and i could reproduce this with latest m-c tinderbox debug build and latest ubuntu 17.4
Attached file new bughunter stack
also from the new url
Chris, can you take a look at this?
asan stack trace from comment 14 lists GMP seems useful.
Flags: needinfo?(cpearce)
(In reply to Frederik Braun [:freddyb] from comment #15)
> Chris, can you take a look at this?
> asan stack trace from comment 14 lists GMP seems useful.

I'm busy, but someone on bwu's team may be available.
Flags: needinfo?(cpearce) → needinfo?(bwu)
Component: Audio/Video: Playback → Audio/Video: GMP
James and Kilik, 
Can anyone of you have a look?
Flags: needinfo?(kikuo)
Flags: needinfo?(jacheng)
Flags: needinfo?(bwu)
IIUC,

[1] is the only place that creates "SchedulerEventTarget" and the "this" here is 

being recorded by "RefPtr<SchedulerGroup> SchedulerEventTarget::mDispatcher;"

By the comment [2], the concrete class should be SystemGroup(but truely it's SystemGroupImpl)

The SystemGroupImpl's AddRef and Release [3] are special which return a const value that means 

even mDispatcher records the instance, it will never delete the pointer by calling its Release().

[4] indicate it's a singleton and only be destroyed by [5].

so it seems

"SystemGroup::Shutdown()" is executing before "~GMPProcessParent()", so we access the dangling pointer

destroyed by "UniquePtr<SystemGroupImpl> sSingleton".

I'm not familiar with the process termination calling flow so I need someone to help indicating how to ensure the "~GMPProcessParent()" runs before "SystemGroup::Shutdown()".

Hi cpearce,
Could you please take a look of my comment and recommend me a right person for me? Hope we can make this bug move on. (Sorry for bothering you since you're busy)
Thanks.

[1]
http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/xpcom/threads/SchedulerGroup.cpp#285

[2]
http://searchfox.org/mozilla-central/rev/30a47c4339bd397b937abdb2305f99d3bb537ba6/xpcom/threads/SchedulerGroup.h#39-41

[3]
http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/xpcom/threads/SystemGroup.cpp#29,33

[4]
http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/xpcom/threads/SystemGroup.cpp#40

[5]
http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/xpcom/threads/SystemGroup.cpp#59
Flags: needinfo?(jacheng) → needinfo?(cpearce)
Flags: needinfo?(kikuo)
Flags: needinfo?(gsquelart)
By the comment 18,

I don't think it is necessary to use RefPtr<SchedulerGroup> to hold that instance.

Use raw pointer would be enough and the destructor has no chance to touch the destroyed instance.

Please review this patch or help me to transfer the review request to more appropriate person.

Thanks.

BTW, I don't know the intention why we use RefPtr to share the ownership with UniquePtr, if we can change this design we can solve this issue fundamentally.
Attachment #8895204 - Flags: review?(ehsan)
TabGroup is one of the concrete class of SchedulerGroup, so I need the check to let TabGroup well ref-counted.
Attachment #8895204 - Attachment is obsolete: true
Attachment #8895204 - Flags: review?(ehsan)
Attachment #8895213 - Flags: review?(ehsan)
This patch is quite a hack...  Bill, do you remember why <https://searchfox.org/mozilla-central/rev/0f16d437cce97733c6678d29982a6bcad49f817b/xpcom/threads/SystemGroup.cpp#40> needs to be a UniquePtr?  Couldn't we switch it to be a StaticRefPtr and make it have normal refcounting so that we don't have to resort to hacks like this?
Comment on attachment 8895213 [details] [diff] [review]
Bug-1365894-Skip-the-SchedulerGroupImpl-instance-inv.patch

Feedback?billm instead of needinfo since you can't set that flag twice.
Attachment #8895213 - Flags: feedback?(wmccloskey)
Yes, it's hacky... I hope we can use StaticRefPtr instead.
Comment on attachment 8895213 [details] [diff] [review]
Bug-1365894-Skip-the-SchedulerGroupImpl-instance-inv.patch

Review of attachment 8895213 [details] [diff] [review]:
-----------------------------------------------------------------

I think the right thing to do here is to go back to the original design for this code:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1337537&attachment=8834672

That is, we can use StaticRefPtr and use normal refcounting functions (not the special singleton ones that return 2 and 1). That way, SystemGroupImpl should live until the last user of it is done with it.

Sorry I took so long to respond to this. I confused it with a different needinfo that's not as important.
Attachment #8895213 - Flags: feedback?(wmccloskey)
Ugh, I see now that the way the code is now is kind of my fault.  I'm so sorry about that.  :-(
Attachment #8895213 - Flags: review?(ehsan)
It sounds like this is under control, so I'm dropping my ni? request.
Flags: needinfo?(cpearce)
Flags: needinfo?(gsquelart)
So we will go back using StaticRefPtr for this?

Please mark the dependency once the new bug had been created or resolved.

Thanks.
https://hg.mozilla.org/mozilla-central/annotate/5322c03f4c8587fe526172d3f87160031faa6d75/xpcom/threads/SystemGroup.cpp#l40

indicates that the change is caused by bug 1337537, but in that bug it used StaticRefPtr...

Do I miss something? Which bug made it UniquePtr?
It landed as UniquePtr, ehsan was referring to a non-landed patch: https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1337537&attachment=8834672
Flags: needinfo?(jacheng)
Thanks, it makes me confused.
Flags: needinfo?(jacheng)
James, any luck in pursuing this further?
Needinfo'ing James in case he didn't see Bob's question.
Flags: needinfo?(jacheng)
Unless we can get rid of UniquePtr [1] and use ref-counted smart pointer, we cannot solve this issue.

ni Ehsan and Bill to know their plan for fixing this.

I'm not sure if there is an technique difficulty using StaticRefPtr as comment 29 said.

Or adopt my workaround patch to handle this case.

[1]
http://searchfox.org/mozilla-central/rev/e76c0fee79a34a3612740b33508276f4b37c3ee8/xpcom/threads/SystemGroup.cpp#40
Flags: needinfo?(jacheng) → needinfo?(ehsan)
Not a GMP bug...
Rank: 7
Component: Audio/Video: GMP → XPCOM
Priority: P1 → --
Doesn't comment 24 answer your question?

> I think the right thing to do here is to go back to the original design for
> this code:
> https://bugzilla.mozilla.org/page.cgi?id=splinter.
> html&bug=1337537&attachment=8834672
> 
> That is, we can use StaticRefPtr and use normal refcounting functions (not
> the special singleton ones that return 2 and 1). That way, SystemGroupImpl
> should live until the last user of it is done with it.

I think we should fix this bug exactly as Bill suggested.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(ehsan)
Please help to review this patch.
Thanks.
Attachment #8895213 - Attachment is obsolete: true
Attachment #8908515 - Flags: review?(ehsan)
Comment on attachment 8908515 [details] [diff] [review]
Make-SystemGroupImpl-be-a-normal-ref-cou.patch

Review of attachment 8908515 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!

::: xpcom/threads/SystemGroup.cpp
@@ +27,2 @@
>  private:
> +  ~SystemGroupImpl() {}

Nit:

  ~SystemGroupImpl() = default;
Attachment #8908515 - Flags: review?(ehsan) → review+
Assignee: nobody → jacheng
carry r+ and fix nit.
thanks for reviewing.

try looks ok though there are some existing intermittent failures.
Attachment #8908515 - Attachment is obsolete: true
Attachment #8908675 - Flags: review+
Comment on attachment 8908675 [details] [diff] [review]
Make-SystemGroupImpl-be-a-normal-ref-cou.patch

Actually, I don't think this bug is sec-high. It just happened when closing firefox. And this is my first time handling sec-high bug, I think eliminate the severity to sec-low.

[Security approval request comment]
How easily could an exploit be constructed based on the patch? not easy.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Yes

Which older supported branches are affected by this flaw? gecko 54

If not all supported branches, which bug introduced the flaw? bug 1337537

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? no risk, the uaf only occurred during firefox shutdown.

How likely is this patch to cause regressions; how much testing does it need? 
we make the object ref-counted to and it should live until the last user of it is done with it. The only way to test is to run AddressSanitizer again.
Attachment #8908675 - Flags: sec-approval?
BTW, Are all UAF bug classified into security bug (always mark as sec-high)?

Since the root cause of this bug is not related to security.
Note that bug 1400206 has reproducible STR that don't require any
external pages, just the Gecko profiler.
(In reply to James Cheng[:JamesCheng] from comment #39)
> Actually, I don't think this bug is sec-high. It just happened when closing firefox.

If the bughunter automated system can find it then it can happen to users. Is it global shut down or could it also happen at child process shutdown?

(In reply to James Cheng[:JamesCheng] from comment #40)
> BTW, Are all UAF bug classified into security bug (always mark as sec-high)?

Most of the browser 0-days and pwn2own contest exploits over the past decade or so (not just Firefox, browsers as a whole) have been based on use-after-free bugs. They're plentiful and attackers have gotten very good at plugging them into their exploit toolkits. We have to assume the worst.
This has missed the current release cycle (we're shipping in less than two weeks and making final builds next week). As such, this isn't going to get sec-approval to land until about a month from now (two weeks into the next cycle). At that point, we'll want a 57 (beta) patch nominated for this as well.
sec-approval for trunk checkin on October 9, two weeks into the cycle. Do not check this in before that date, please.

We'll want a beta patch made and nominated for 57 at that time.
Whiteboard: [checkin on 10/9]
Attachment #8908675 - Flags: sec-approval? → sec-approval+
Ok, I didn't handle sec-bug before, 

Do I need to manually mark "check-in-needed" on or after October 9?

Or it will automatically check-in since the whiteboard has been marked as "[checkin on 10/9]".

Thanks.
Flags: needinfo?(abillings)
(In reply to James Cheng[:JamesCheng] from comment #46)
> Or it will automatically check-in since the whiteboard has been marked as
> "[checkin on 10/9]".

RyanVM and maybe some of the sheriffs keep an eye on that whiteboard tag, so you probably shouldn't need to do anything.
Flags: needinfo?(abillings)
Can you create a patch for beta and request uplift, please? Thanks!  This can still make the 57 beta 8 build later this week.
Flags: needinfo?(jacheng)
https://hg.mozilla.org/mozilla-central/rev/3959033a3166

This grafts cleanly to Beta, so we just need an approval request on the existing patch.
Target Milestone: --- → mozilla58
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Ryan VanderMeulen [:RyanVM] from comment #50)
> https://hg.mozilla.org/mozilla-central/rev/3959033a3166
> 
> This grafts cleanly to Beta, so we just need an approval request on the
> existing patch.

As Ryan said, I will do a? by this patch instead.

Thank you.
Flags: needinfo?(jacheng)
Comment on attachment 8908675 [details] [diff] [review]
Make-SystemGroupImpl-be-a-normal-ref-cou.patch

Approval Request Comment
[Feature/Bug causing the regression]: 
The design of Bug 1337537 makes the leak.
[User impact if declined]: leak when closing.
[Is this code covered by automated tests?]: NA
[Has the fix been verified in Nightly?]:NA
[Needs manual test from QE? If yes, steps to reproduce]: 
we make the object ref-counted to and it should live until the last user of it is done with it. The only way to test is to run AddressSanitizer again.
[List of other uplifts needed for the feature/fix]:NA
[Is the change risky?]:NO
[Why is the change risky/not risky?]:make ref-counted instead of leak.
[String changes made/needed]:NA
Attachment #8908675 - Flags: approval-mozilla-beta?
Comment on attachment 8908675 [details] [diff] [review]
Make-SystemGroupImpl-be-a-normal-ref-cou.patch

Sec-high, Beta57+
Attachment #8908675 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Group: media-core-security → core-security-release
Whiteboard: [adv-main57+]
Flags: qe-verify-
Whiteboard: [adv-main57+] → [adv-main57+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: