Move PCMedia ctor to after thread acquisition in PCImpl

VERIFIED FIXED in Firefox 22

Status

()

defect
--
critical
VERIFIED FIXED
7 years ago
6 years ago

People

(Reporter: ekr, Assigned: abr)

Tracking

({crash, testcase})

Trunk
mozilla23
x86
macOS
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox22+ fixed, firefox23 verified)

Details

(Whiteboard: [WebRTC][blocking-webrtc+])

Attachments

(3 attachments, 1 obsolete attachment)

No description provided.
Posted patch Fix thread crash (obsolete) — Splinter Review
* * *
Bug 856433. Fix thread crash--fix
Comment on attachment 732100 [details] [diff] [review]
Fix thread crash

Review of attachment 732100 [details] [diff] [review]:
-----------------------------------------------------------------

carrying forward r+ from ehugg when this was bug 856433
Attachment #732100 - Flags: review+
Blocking+ as bug 856433 blocked as well
Whiteboard: [WebRTC][blocking-webrtc+]
Assignee: nobody → ekr
Priority: -- → P1
Whiteboard: [WebRTC][blocking-webrtc+] → [WebRTC][blocking-webrtc+][webrtc-uplift]
Can we land the patch on this bug when we land TURN support (Bug 786235)?
Flags: needinfo?(ekr)
Blocks: 860143
Assignee: ekr → adam
The patch applies cleanly, albeit with offsets, and passes unit, mochi, and crash tests. However, when running with the test case from bug 860143, I have once (and only once, out of about a dozen attempts) had it assert when attempting to dispatch to the STS thread.

Notably, in examining the STS thread, you'll see:

  mThread = {
    mRawPtr = 0x0
  }, 

Which I take to mean that the STS thread is gone at this point.

I think the solution here is to make RUN_ON_THREAD resilient to NS_ERROR_NOT_INITIALIZED.
It's probably worth noting that, the one time it did trigger, I was doing repeated reloads of the test case, and then (rather quickly after a reload) shut down the browser with Command-Q. I suspect that's relevant.
Attachment #732100 - Attachment is obsolete: true
Comment on attachment 735967 [details] [diff] [review]
Fix thread crash

ekr -- All I need a review on is the change in runnable_utils.h; the rest is unchanged since ehugg's review. See Comment 5 for rationale.
Attachment #735967 - Flags: review?(ekr)
Comment on attachment 735967 [details] [diff] [review]
Fix thread crash

Carrying forward ehugg's r+
Attachment #735967 - Flags: review+
Flags: needinfo?(ekr)
Excerpts from an IRC discussion on the RUN_ON_THREAD changes:

[10:49am] ekr: I'm not sold that this patch is right

[10:49am] abr: The change to RUN_ON_THREAD?

[10:49am] ekr: So, we want these things to happen.

[10:50am] ekr: and it's not clear to me it's cool to have them just fail

...

[10:50am] abr: Okay. And if the thread is gone, should we just run them on the current thread?

[10:50am] ekr: I am not sure.

[10:50am] ekr: it's a real problem

[10:50am] ekr: The whole shutdown thing is a nightmare

[10:50am] abr: In practice, the only way the threads we're currently using this for (main and sts) could be gone is if (1) the system is shutting down, or (2) the system is messed up beyond description.

[10:51am] ekr: yeah

[10:51am] ekr: So, is it better to have a silent fail or an assert?

[10:51am] ekr: I mean, I bet we are assuming that these succeed

[10:51am] abr: In (1), failing to run this stuff might leak resources. I guess that's bad from an orangefactor perspective.

[10:51am] ekr: so it could easily be creating some sort of "can't happen"

[10:51am] abr: In (2), of course, it doesn't matter.

[10:52am] ekr: But say we have a dispatch for an operation that's supposed to happen and we depend on that

[10:52am] abr: I would think a silent fail? The ensure_success will emit a logging message.

[10:52am] ekr: it might

[10:52am] ekr: i mean, it will emit a logging message

[10:52am] ekr: But suppose it creates some later crash


[10:53am] abr: Are you assuming (1) or (2) above?

[10:53am] ekr: 1

[10:53am] abr: I'm thinking through what that kind of crash might look like.

[10:53am] ekr: not sure...

...

[10:59am] abr: Okay. So, from a practical perspective, I see a handful of choices. (1) Land this patch as-is, which might let some resources hit the floor in a shutdown situation; (2) Remove the RUN_ON_THREAD changes, which may (and probably will) lead to intermittent oranges; (3) Change the patch to run the code on current thread if the target thread is gone (which will require modification to several thread-check asserts); or (4) Try to ferret out and fix the root cause of the STS thread going away before the PC is torn down (which, now that I think about it, is going to involve the relative shutdown orders of GC/CC, Main, and STS).

[11:01am] abr: Actually, that's ineresting. The real sequencing problem here is that STS is torn down before GC/CC finishes its job.

...

[11:06am] abr: Yeah, none of those 4 seem ideal.

[11:06am] ekr: I guess your patch is probably the best choice, but I really don't love it

[11:08am] abr: Understood. I'll add something to Bug 829591 describing this situation, and maybe when we fix that issue (if doing so is even possible), we can take this into account and avoid it happening at all.
Comment on attachment 735967 [details] [diff] [review]
Fix thread crash

r+ from ekr via IRC
Attachment #735967 - Flags: review?(ekr) → review+
re-trying after replacing ASSERT_TRUE with EXPECT_TRUE in constructor:

https://tbpl.mozilla.org/?tree=Try&rev=47b937880eca
Targeted to ship for Fx22, tracking to get on rel mgmt radar.
That push worked just fine, so here we go again:

https://hg.mozilla.org/integration/mozilla-inbound/rev/70f7ccdb5a98
Duplicate of this bug: 861048
Flags: in-testsuite?
Group: qualcomm-confidential
> Group: qualcomm-confidential

Um... What?
Flags: needinfo?(akeybl)
That was probably unintended. Removing.
Group: qualcomm-confidential
Flags: needinfo?(akeybl)
(In reply to Jason Smith [:jsmith] from comment #19)
> That was probably unintended. Removing.

Yep, unintentional thanks.
Severity: normal → critical
Keywords: crash, testcase
https://hg.mozilla.org/mozilla-central/rev/70f7ccdb5a98
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift] → [WebRTC][blocking-webrtc+][webrtc-uplift][qa-]
Was verified and confirmed that this patch fixes the crash in the dupe.
Status: RESOLVED → VERIFIED
Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift][qa-] → [WebRTC][blocking-webrtc+][webrtc-uplift]
Comment on attachment 735967 [details] [diff] [review]
Fix thread crash

[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A

User impact if declined: intermittent (null de-ref) crashes and debug MOZ_ASSERT crashes

Testing completed (on m-c, etc.): On m-c.  ALso used to verify that several other crash bugs reported were actually duplicates of this one. (Not all are marked as dups of this, as some were other bugs that triggered the RUN_ON_THREAD MOZ_ASSERT() as a side-effect.

Risk to taking this patch (and alternatives if risky): Low; moves initialization of some important thread pointers to occur before they can used for Dispatch()es

String or IDL/UUID changes made by this patch: none
Attachment #735967 - Flags: approval-mozilla-aurora?
Triage Comment: 

Since we've left the other untracked TURN bugs un-approved on this round, we'll hold off here just for now, it can be approved if/when bug 786235 and bug 855769 are landed.
This is not a TURN bug, it's just behind TURN in the commit history.
Eric is 100% correct. This problem can and has shown up in a variety of non-TURN scenarios (most notably, if you simply replace "turn" with "stun," the problem remains). Unless we uplift this patch prior to shipping, we're going to end up with a situation that makes it very easy to crash the browser from javascript, both accidentally and maliciously.
Comment on attachment 735967 [details] [diff] [review]
Fix thread crash

Thanks for clarifying, we can take this uplift then since it will not impact whether we enable TURN support in FF22.
Attachment #735967 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Up on Aurora, removing webrtc-uplift
Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift] → [WebRTC][blocking-webrtc+]
Priority: P1 → --
Whiteboard: [WebRTC][blocking-webrtc+] → [WebRTC][blocking-webrtc+][webrtc-uplift]
(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)
> Backed out for bustage. Going to leave WebRTC uplifts for the pros from now
> on...

Even though this isn't a TURN bug, it does need to land on top of the TURN patches (or be ported to Aurora without those patches, which I would not recommend).
Comment on attachment 739874 [details] [diff] [review]
Fix thread crash (patch ported to Aurora)

This is the m-c patch ported to Aurora; the only non-mechanical merge change is changing addStunServer to addServer.

With the signaling_unittest patch from bug 856644 (which we should also uplift), all signaling unittests pass.

spraying the r? request because I'm not sure who's available.
Attachment #739874 - Flags: review?(ethanhugg)
Attachment #739874 - Flags: review?(ekr)
Attachment #739874 - Flags: review?(adam)
Comment on attachment 739874 [details] [diff] [review]
Fix thread crash (patch ported to Aurora)

Review of attachment 739874 [details] [diff] [review]:
-----------------------------------------------------------------

r+ based on comparing it to previously reviewed M-C version.
Attachment #739874 - Flags: review?(ethanhugg) → review+
Comment on attachment 739874 [details] [diff] [review]
Fix thread crash (patch ported to Aurora)

Carrying forward a=lsblakk on the un-bitrot for Aurora though I can't mark it.
Attachment #739874 - Flags: review?(ekr)
Attachment #739874 - Flags: review?(adam)
https://hg.mozilla.org/releases/mozilla-aurora/rev/a582bade4e53
Whiteboard: [WebRTC][blocking-webrtc+][webrtc-uplift] → [WebRTC][blocking-webrtc+]
(In reply to Jason Smith [:jsmith] from comment #22)
> Was verified and confirmed that this patch fixes the crash in the dupe.

Jason, is it safe to mark the status flags verified?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #37)
> (In reply to Jason Smith [:jsmith] from comment #22)
> > Was verified and confirmed that this patch fixes the crash in the dupe.
> 
> Jason, is it safe to mark the status flags verified?

For 23, yeah.
You need to log in before you can comment on or make changes to this bug.