Closed Bug 1156182 Opened 5 years ago Closed 4 years ago

Intermittent browser_devices_get_user_media.js | application terminated with exit code -5

Categories

(Core :: Graphics: Layers, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox40 --- wontfix
firefox41 --- wontfix
firefox42 --- fixed
firefox43 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: nical)

References

()

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted])

Attachments

(2 files)

Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-browser-chrome-1

https://treeherder.mozilla.org/logviewer.html#?job_id=9049468&repo=mozilla-inbound

22:14:33 WARNING - TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_devices_get_user_media.js | application terminated with exit code -5
Looks like crashes while running gum tests. Is this related to bug 1166707?
Component: General → WebRTC: Audio/Video
Product: Firefox → Core
Crashes in Layers:
xul.dll!nsWindow::PreRender()
Component: WebRTC: Audio/Video → Graphics: Layers
This is interesting, it appears we're crashing while entering the mPresentLock Critical Section. When we look at the main thread we can see that we're currently waiting for the compositor to go down from the nsBaseWidget destructor. When this destructor is running the nsWindow destructor has already run and uninitialized the critical section.

As far as I can tell there is nothing preventing this race from happening and this is just a bug in our shutdown sequence. Presumably the Ubuntu bug that was in the first report is caused by a similar race although I cannot get to the report for that.

Nical, what do you think? It seems to be like perhaps destroying the compositor should be done before we destroy the critical section, i.e. on windows it would need to be run from the nsWindow destructor rather than from the nsBaseWidget destructor.
Flags: needinfo?(nical.bugzilla)
Whiteboard: [gfx-noted]
(In reply to Bas Schouten (:bas.schouten) from comment #11)
> Nical, what do you think? It seems to be like perhaps destroying the
> compositor should be done before we destroy the critical section, i.e. on
> windows it would need to be run from the nsWindow destructor rather than
> from the nsBaseWidget destructor.

Yes, I think that we should do that.
Flags: needinfo?(nical.bugzilla)
Actually, ~nsWindow() should have called Destroy() which calls DestroyLayerManager() which calls DestroyCompositor().

Unless mWnd was null before ~nsWindow but I don't see how it could have happened without Destroy() being called in the first place.
I am not sure why mWnd could be null while we have a layer tree and all, but calling Destroy() more than once will early-return after the first time, so let's play it safe.
Assignee: nobody → nical.bugzilla
Attachment #8628239 - Flags: review?(bas)
Attachment #8628239 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/65c580e2cb9b
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Please nominate this for Aurora/Beta approval when you get a chance.
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8628239 [details] [diff] [review]
unconditionally call Destroy() in ~nsWindow

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: A crash caught intermittently on tbpl on Windows. The volume doesn't appear very high but the fix is very simple.
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: rather low, the patch is simple. The crash volume isn't too worrying as far as I can tell so we could still let the patch bake in nightly/aurora for a bit before we take it in beta.
[String/UUID change made/needed]:
Flags: needinfo?(nical.bugzilla)
Attachment #8628239 - Flags: approval-mozilla-beta?
Attachment #8628239 - Flags: approval-mozilla-aurora?
Reopening because there are two similar signatures here, one on linux and one on windows, and my patch only affects windows.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8628239 [details] [diff] [review]
unconditionally call Destroy() in ~nsWindow

Approving for uplift to Aurora. This has been in m-c for a few days so seems like a stable fix.
Attachment #8628239 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Target Milestone: mozilla42 → ---
Comment on attachment 8628239 [details] [diff] [review]
unconditionally call Destroy() in ~nsWindow

We can take this fix on Beta as well. We'll want to keep status as affected so that we can follow up with the Linux fix. Beta+
Attachment #8628239 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
The interesting thing about the remaining ubuntu stack in this bug is that the main thread is waiting for the destruction of the compositor, in ~nsBaseWidget. But we should already have destroyed the compositor and layer manager in ~nsWindow. And took care of doing it early so that the underlying X/gtk/gl/whatever platform resources are destroyed after we make sure the compositor is done using them. So my guess is that if the wind comes from the wrong direction, we end up in a path where GetLayerManager gets called after we destroyed the compositor in ~nsWindow and before we attempt to destroy the compositor again in ~nsBaseWidget (which should always skip since we have already done it in ~nsWindow), causing the compositor to be re-created in the middle, probably with soon-to-be-destroyed or already-destroyed platform resources. See the discussion in bug 1133426 about reentrant madness causing the compositor to be recreated during shutdown.
This is a shot in the dark but even if it does not fix the issue, we should take it since it removes some of the mental overhead of thinking about all of the horrible things that could happen if GetLayerManager is called somewhere in the middle of the widget's shutdown.
Attachment #8661220 - Flags: review?(sotaro.ikeda.g)
Attachment #8661220 - Flags: review?(sotaro.ikeda.g) → review+
https://hg.mozilla.org/mozilla-central/rev/c6b0c072e68f
Status: REOPENED → RESOLVED
Closed: 5 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
nical: so  would need "Early-return GetLayerManager if we can see that we are shutting down" also an uplift to aurora ?
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8661220 [details] [diff] [review]
Early-return GetLayerManager if we can see that we are shutting down

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: shutdown crashes
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: Rather low. The change is rather simple and it hasn't caused any issue on mozilla-central so far.
[String/UUID change made/needed]: none
Flags: needinfo?(nical.bugzilla)
Attachment #8661220 - Flags: approval-mozilla-aurora?
Comment on attachment 8661220 [details] [diff] [review]
Early-return GetLayerManager if we can see that we are shutting down

Fix a shutdown crash, taking it.
Attachment #8661220 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.