Closed Bug 1214675 Opened 4 years ago Closed 4 years ago

Crash in mozilla::camera::PCameras::Transition(mozilla::camera::PCameras::State, mozilla::ipc::Trigger, mozilla::camera::PCameras::State*)

Categories

(Core :: WebRTC: Audio/Video, defect, P2, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox43 --- affected
firefox44 --- fixed
Blocking Flags:

People

(Reporter: adalucinet, Assigned: gcp)

Details

(Keywords: regression, reproducible)

Crash Data

Attachments

(1 file, 1 obsolete file)

Reproducible with latest 43.0a2 Developer Edition and 44.0a1 Nightly, both e10s enabled/disabled
Affected platform: Windows 7 64-bit, Windows 8 64-bit, Ubuntu 14.04 32-bit

Steps to reproduce:
1. Start Firefox and Click on Hello icon (chat bubble with smiley face).
2. Click on "Start a conversation" and share any window.
3. Restart/Close Firefox.

Expected result: No crash.

Actual result: bp-4b0b12bb-0cf9-40ee-8764-dd34a2151014

Additional info:
1. This signature showed up only at the first crash. The next signatures look like bug 1116884, although is reproducible with e10s disabled too:

bp-158d7b26-caa0-46ca-acc3-db7742151014
bp-0a3b3c90-bff5-4d5c-821c-a7c662151014
bp-32df93c4-53c7-48e7-b66a-b6d922151014
 
2. Not reproducible with 42 beta 6 nor with Mac OS X 10.10.4.
Component: Client → WebRTC: Audio/Video
Product: Hello (Loop) → Core
Crash Signature: [@ mozalloc_abort(char const* const) | NS_DebugBreak | mozilla::camera::PCameras::Transition(mozilla::camera::PCameras::State, mozilla::ipc::Trigger, mozilla::camera::PCameras::State*) ]
Version: unspecified → Trunk
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #1)
> Regression range (m-c):
> Last good: 2015-08-31
> First bad: 2015-09-01 → bp-43b61d54-32a1-4b05-ae09-a1dae2151019

Unfortunately that just tells us this started to fail when the feature landed.

Sounds like some sort of IPC shutdown hang triggering the shutdown watchdog (for the later ones).  The first appears to be here, in Transition():

    case __Dead:
        NS_RUNTIMEABORT("__delete__()d actor");
Flags: needinfo?(gpascutto)
backlog: --- → webrtc/webaudio+
Rank: 21
Priority: -- → P2
I suspect this is a duplicate of bug 1209987. The fix for that landed the same day this was reported, so it was probably not in. Can QA re-verify on latest Nightly?
Flags: needinfo?(gpascutto) → needinfo?(alexandra.lucinet)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #3)
> I suspect this is a duplicate of bug 1209987. The fix for that landed the
> same day this was reported, so it was probably not in. Can QA re-verify on
> latest Nightly?

Still reproducible with latest 44.0a1 (from 2015-10-20) → bp-4bf78de6-2129-4e73-adbf-7b84f2151021
Flags: needinfo?(alexandra.lucinet)
This doesn't reproduce for me following the given STR with

Nightly on 64-bit Linux
Nightly on Windows 7 64-bit
Nightly on Windows 8.1 64-bit

I'm puzzled because comment 4 implies it's very reproducible.

I investigated the code and I see something suspect, but without a reproducible testcase it's going to be guesswork.
CloseEngines calls RecvStopCapture but the latter has is missing a check
for an alive IPC connection. I'll provide a try build as I can't reproduce
the original failure, but I'm reasonably confident the patch will solve it.
Attachment #8679495 - Flags: review?(rjesup)
Attachment #8679495 - Flags: review?(rjesup) → review+
Attachment #8679495 - Attachment is obsolete: true
Comment on attachment 8679986 [details] [diff] [review]
Factor out cleanup functions to avoid deadlock/dispatch-without-IPC

Review of attachment 8679986 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8679986 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/692dbb16a94976c90cd7700604bb02be54f3d20b
Bug 1214675 - Factor out cleanup functions to avoid deadlock/dispatch-without-IPC. r=jesup
Jesup, can you verify that this patch fixes the deadlock warnings you were seeing?
Flags: needinfo?(rjesup)
https://hg.mozilla.org/mozilla-central/rev/692dbb16a949
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
According to the reporter, this affects Firefox 43 (current beta). This is caused by bug 1104619. Unfortunately the fix in this bug also depends on the fixes for bug 1209987, which is very complicated, and would need to go onto beta as well (and probably everything inbetween, such as bug 1205164). I could do a simplified fix for beta, but that would be entirely untested code then.

The fact that the STR in the original report are such a basic Hello use case would make me think there's pressing reasons to uplift those patches anyway, but I was never able to reproduce them (see comment 5).

So without having any idea how broken Firefox 43 is in this regard, I can't even really make a judgement call whether to ask approval for mozilla-beta or not :-/
Flags: needinfo?(mreavy)
Gian-Carlo -- Can you do a risk analysis (based on what you know) of taking the patch in Fx43 vs not?  Basically, write an uplift request here in the bug without asking for uplift. I'll then circle with Jesup and decide if we really need this in Fx43 or not. One or both of us will ping you if we have any questions when we read your analysis.  Thanks!
Assignee: nobody → gpascutto
Flags: needinfo?(mreavy) → needinfo?(gpascutto)
[Feature/regressing bug #]: Bug 1104619 - Sandboxing/Proxy for video capture
[User impact if declined]: Potential hang on shutdown, likely requires a specific timing. I did not reproduce it on any of my machines, although from this bug obviously our QA does.
[Describe test coverage new/current, TreeHerder]: Our tests exercise the code but due to the likelihood of this being timing specific won't really tell us anything. Bug 1209987, the main dependency of this patch, landed on Nightly a month ago, and the patch here about 3 weeks ago.
[Risks and why]: The patch here requires a bunch of other patches that substantially modify the relevant code and are quite complex, and could cause other hangs/crashes during webrtc use.
[String/UUID change made/needed]: NA
Flags: needinfo?(gpascutto)
Thanks, Gian-Carlo, for your analysis.  I talked with Jesup, and we agree that (given risks vs rewards) we should just let this fix ride the trains.

Also, Jesup verifies that he's no longer seeing the deadlock warnings with this fix.
Flags: needinfo?(rjesup)
Confirming that the crash is no longer reproducible with latest 45.0a1 (from 2015-11-17), under Windows 7 64-bit and Ubuntu 12.04 32-bit. Although, latest 44.0a1 with e10s enable/disabled is still crashing under Windows 7, but not on Ubuntu, with the same STR as in comment 0.
Crash report: 
→ bp-b6285491-c5c0-479f-bdf4-d45ac2151118bp-7c281621-617c-45d4-9849-d37192151118bp-c8d0e367-71b9-416a-8fcb-82a332151118

Gian-Carlo, any ideas? It might happen because of the dependency bug(s) mentioned in comment 16?
Flags: needinfo?(gpascutto)
Sorry, it's 44.0a2 Aurora still crashing, not 44.0a1.
Unless I missed something, there are no patches I mentioned that are in Firefox 44 but not in 45. So I have no idea how 45 could be working, but not 44.

I guess the crashes are just a bit random and timing dependent, as already hypothesized in the past comments?

>https://crash-stats.mozilla.com/report/index/7c281621-617c-45d4-9849-d37192151118

This is an old build that predates this patch.
Flags: needinfo?(gpascutto)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #20)

> I guess the crashes are just a bit random and timing dependent, as already
> hypothesized in the past comments?
No, it's not random nor timing dependent - every time I follow STR via comment 0, latest 44.0a2 crashes but with different signatures:
- bp-48e24e47-3bf1-49c8-8888-a18a82151119 (same as bug 1225266)
- bp-34c6b010-1566-468c-98eb-db6272151119 (same as bug 1225266)
- bp-a06cd444-5be9-443b-8835-975992151119

> >https://crash-stats.mozilla.com/report/index/7c281621-617c-45d4-9849-d37192151118
> 
> This is an old build that predates this patch.
Sorry, pasted the wrong link.
(In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #21)
> (In reply to Gian-Carlo Pascutto [:gcp] from comment #20)
> 
> > I guess the crashes are just a bit random and timing dependent, as already
> > hypothesized in the past comments?
> No, it's not random nor timing dependent - every time I follow STR via
> comment 0, latest 44.0a2 crashes but with different signatures:

But it's happening in 44 and not in 45?!

I don't understand at all and I can't reproduce this on any machine nor with any OS I have - and I tried a lot.

Is it possible to spell out the STR in as much exact detail as possible? For example '"Start a conversation" and share any window."' isn't really possible with Hello, there are some extra clicks and actions involved there. Which window or tab do you share. Are there any other tabs open? Empty profile?

We must be doing something subtly different if this reproduces 100% for you and never for me.
Flags: needinfo?(alexandra.lucinet)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #22)
> (In reply to Alexandra Lucinet, QA Mentor [:adalucinet] from comment #21)
> > (In reply to Gian-Carlo Pascutto [:gcp] from comment #20)
> > 
> > > I guess the crashes are just a bit random and timing dependent, as already
> > > hypothesized in the past comments?
> > No, it's not random nor timing dependent - every time I follow STR via
> > comment 0, latest 44.0a2 crashes but with different signatures:
> 
> But it's happening in 44 and not in 45?!
Yes, 44.0a2 was mentioned above (please see comment 18 and comment 19).
 
> I don't understand at all and I can't reproduce this on any machine nor with
> any OS I have - and I tried a lot.
> 
> Is it possible to spell out the STR in as much exact detail as possible? For
> example '"Start a conversation" and share any window."' isn't really
> possible with Hello, there are some extra clicks and actions involved there.
> Which window or tab do you share. Are there any other tabs open? Empty
> profile?

STR:
1. Launch latest 44.0a2 with a clean profile.
2. Click on Hello icon.
3. Click on 'Get Started' button.
4. Click on 'Start a conversation' button as soon as Hello panel shows up.
5. In the newly opened conversation window, click on 'Share your screen' button and select 'Share other windows' option.
6. From the 'Window to share' dropdown select any listed window and click on 'Share Selected items' button.
7. Restart Firefox.
*maybe here try to restart via Developer toolbar.

Also reproduced with 2 different Windows 10 32-bit machines:
- bp-6a43ae34-3208-44cf-b85d-fdfdc2151119
- bp-529146d5-8b7d-4410-bfd0-7c3a62151119

Let me know if I can help more.
Flags: needinfo?(alexandra.lucinet)
I think I was able to reproduce this with 44, once. The "restart" command may be a critical point here.

Also, while trying to log/trace this I stumbled into bug 1218799. So that's a difference between 44 and 45 that may affect the reproducibility of this bug.
You need to log in before you can comment on or make changes to this bug.