Closed
Bug 1374164
Opened 7 years ago
Closed 7 years ago
Intermittent Main app process exited normally | application crashed [@ mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log]
Categories
(Core :: Graphics, defect, P1)
Core
Graphics
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | fixed |
firefox57 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: jesup)
References
Details
(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed:product])
Crash Data
Attachments
(2 files, 2 obsolete files)
1.79 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
9.14 KB,
patch
|
jib
:
review+
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=108025253&repo=mozilla-central https://queue.taskcluster.net/v1/task/Er8MrMolS_GWWhXMZCreOA/runs/0/artifacts/public/logs/live_backing.log
Comment 1•7 years ago
|
||
[task 2017-06-18T19:35:58.600574Z] 19:35:58 INFO - GECKO(2930) | Assertion failure: [GFX1]: Texture deallocated too late during shutdown, at /home/worker/workspace/build/src/gfx/2d/Logging.h:519 [task 2017-06-18T19:36:31.027046Z] 19:36:31 INFO - GECKO(2930) | #01: mozilla::layers::DeallocateTextureClient [gfx/layers/client/TextureClient.cpp:326] [task 2017-06-18T19:36:31.028023Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.028093Z] 19:36:31 INFO - GECKO(2930) | #02: mozilla::layers::TextureClient::Destroy [gfx/layers/client/TextureClient.cpp:423] [task 2017-06-18T19:36:31.028707Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.029412Z] 19:36:31 INFO - GECKO(2930) | #03: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:601] [task 2017-06-18T19:36:31.029959Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.030512Z] 19:36:31 INFO - GECKO(2930) | #04: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:605] [task 2017-06-18T19:36:31.031038Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.031589Z] 19:36:31 INFO - GECKO(2930) | #05: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:138] [task 2017-06-18T19:36:31.032138Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.032742Z] 19:36:31 INFO - GECKO(2930) | #06: mozilla::layers::TextureClientHolder::Release [memory/mozalloc/mozalloc.h:218] [task 2017-06-18T19:36:31.032827Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.033400Z] 19:36:31 INFO - GECKO(2930) | #07: mozilla::layers::TextureClientRecycleAllocator::Destroy [gcc/include/c++/4.9.4/bits/stl_stack.h:218] [task 2017-06-18T19:36:31.033943Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.034497Z] 19:36:31 INFO - GECKO(2930) | #08: mozilla::layers::CompositableClient::Destroy [gfx/layers/client/CompositableClient.cpp:85] [task 2017-06-18T19:36:31.034669Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.035345Z] 19:36:31 INFO - GECKO(2930) | #09: mozilla::layers::CompositableClient::~CompositableClient [mfbt/RefPtr.h:77] [task 2017-06-18T19:36:31.036881Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.037591Z] 19:36:31 INFO - GECKO(2930) | #10: mozilla::layers::ImageClientSingle::~ImageClientSingle [gfx/layers/client/ImageClient.h:87] [task 2017-06-18T19:36:31.038136Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.039653Z] 19:36:31 INFO - GECKO(2930) | #11: mozilla::layers::CompositableClient::Release [gfx/layers/client/CompositableClient.h:75] [task 2017-06-18T19:36:31.040193Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.041120Z] 19:36:31 INFO - GECKO(2930) | #12: mozilla::layers::ImageContainer::~ImageContainer [gfx/layers/ImageContainer.cpp:183] [task 2017-06-18T19:36:31.041697Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.042368Z] 19:36:31 INFO - GECKO(2930) | #13: RefPtr<mozilla::layers::ImageContainer>::~RefPtr [memory/mozalloc/mozalloc.h:218] [task 2017-06-18T19:36:31.042905Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.043777Z] 19:36:31 INFO - GECKO(2930) | #14: mozilla::MediaEngineCameraVideoSource::~MediaEngineCameraVideoSource [dom/media/webrtc/MediaEngineCameraVideoSource.h:75] [task 2017-06-18T19:36:31.044408Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.045265Z] 19:36:31 INFO - GECKO(2930) | #15: mozilla::MediaEngineRemoteVideoSource::~MediaEngineRemoteVideoSource [dom/media/webrtc/MediaEngineRemoteVideoSource.h:98] [task 2017-06-18T19:36:31.045883Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.046789Z] 19:36:31 INFO - GECKO(2930) | #16: mozilla::MediaEngineRemoteVideoSource::Release [dom/media/webrtc/MediaEngineRemoteVideoSource.cpp:28] [task 2017-06-18T19:36:31.047399Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.048459Z] 19:36:31 INFO - GECKO(2930) | #17: mozilla::MediaDevice::~MediaDevice [xpcom/string/nsTString.h:20] [task 2017-06-18T19:36:31.049133Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.050080Z] 19:36:31 INFO - GECKO(2930) | #18: mozilla::VideoDevice::~VideoDevice [dom/media/MediaManager.h:112] [task 2017-06-18T19:36:31.050801Z] 19:36:31 INFO - [task 2017-06-18T19:36:31.051666Z] 19:36:31 INFO - GECKO(2930) | #19: mozilla::MediaDevice::Release [dom/media/MediaManager.cpp:704] [task 2017-06-18T19:36:31.052364Z] 19:36:31 INFO -
Comment 2•7 years ago
|
||
Blake, it seems like the resource that held on media side was deallocated after shutdown. Can you have someone to check it?
Flags: needinfo?(bwu)
Comment 3•7 years ago
|
||
Ni Randell since this is related to WebRTC.
Flags: needinfo?(bwu) → needinfo?(rjesup)
Assignee | ||
Comment 4•7 years ago
|
||
Sigh.... Something is holding a MediaDevice from some object in a CC situation it appears, and the "final-cc" is after we shut down gfx (which causes a lot of problems, btw). #18: mozilla::VideoDevice::~VideoDevice [dom/media/MediaManager.h:112] #19: mozilla::MediaDevice::Release [dom/media/MediaManager.cpp:704] <called from CC code in final shutdown - nsCycleCollector::Shutdown> Given I don't think we can know/stop people from doing this, we'll need to neuter all video sources on Shutdown().
Flags: needinfo?(rjesup)
Assignee | ||
Comment 5•7 years ago
|
||
Speculative fix; we know somehow mImage isn't null at final-cc here.
Attachment #8879633 -
Flags: review?(padenot)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee | ||
Updated•7 years ago
|
Rank: 15
Priority: -- → P1
Updated•7 years ago
|
Attachment #8879633 -
Flags: review?(padenot) → review+
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 10•7 years ago
|
||
this is linux debug in mochitest-media-e10s tests. In the last week this has doubled in the number of failures and we should be taking a look at this. There is a patch with r+, can we just land this?
Flags: needinfo?(padenot)
Whiteboard: [stockwell needswork]
Comment 11•7 years ago
|
||
FWIW I seem able to provoke this same assert quite reliably with my local patch set for bug 1088621: https://treeherder.mozilla.org/#/jobs?repo=try&revision=08d27ef79fc105be8711d300cc5e738dd41ccbd6&selectedJob=111305606 I then added a fix that seemed to help avoid late refs messing up GC for that particular case: https://treeherder.mozilla.org/#/jobs?repo=try&revision=23b79e7e8b76b8b17d87f59cfea2f5427c7be3bc&selectedJob=111396466 I'll try to rebase those and attach them to a new bug and link it here, in case it's helpful.
Comment 12•7 years ago
|
||
clearing NI, per comment 11, jib will handle this.
Flags: needinfo?(padenot)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 20•7 years ago
|
||
(In reply to Paul Adenot (:padenot) from comment #12) > clearing NI, per comment 11, jib will handle this. Much thanks for jib for work on bug 1382095, but failures continue, and may have actually increased in frequency since that change.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 25•7 years ago
|
||
:jesup - Are you working on this? Making progress? This is one of our most frequent failures.
Flags: needinfo?(rjesup)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 27•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #25) > :jesup - Are you working on this? Making progress? This is one of our most > frequent failures. It's a speculative fix.... and bandaid on something that shouldn't be happening. I'd like to at least put out a warning when we hit the "bad" case. We still should never be getting to this point. Pushing a Try before landing
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 30•7 years ago
|
||
jib: I just added the 'Unused << NS_WARN_IF...', because I don't want to totally hide this (but want to resolve the orange since it causes pain)
Attachment #8896962 -
Flags: review?(jib)
Assignee | ||
Updated•7 years ago
|
Attachment #8879633 -
Attachment is obsolete: true
Updated•7 years ago
|
Attachment #8896962 -
Flags: review?(jib) → review+
Comment 31•7 years ago
|
||
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3f44f13e8021 guarantee mImage is released by video sources before final-cc shutdown r=padenot,jib
Assignee | ||
Comment 32•7 years ago
|
||
Pushed the wallpaper patch... This doesn't really fix the bug, but demotes it to a warning. Leave-open
Assignee: rjesup → nobody
Severity: critical → normal
Status: ASSIGNED → NEW
Rank: 15 → 18
Flags: needinfo?(rjesup)
Keywords: leave-open
Whiteboard: [stockwell needswork]
Comment 33•7 years ago
|
||
Thanks! That will clear up a lot of orange on treeherder.
Comment hidden (Intermittent Failures Robot) |
Comment 35•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3f44f13e8021
Comment hidden (Intermittent Failures Robot) |
Comment 37•7 years ago
|
||
This is still happening. Is there anything preventing an actual investigation of this? It seems to happen frequently enough on the linux64-qr M-e10s(mda2) job that it should be debuggable via try pushes even if you can't reproduce locally.
Comment 38•7 years ago
|
||
Yeah, it looks like the wallpaper patch had no effect on failure frequency. This remains the #1 most frequent test failure.
Whiteboard: [stockwell needswork]
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 40•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] (pto Aug 19 - Aug 27) from comment #38) > Yeah, it looks like the wallpaper patch had no effect on failure frequency. > This remains the #1 most frequent test failure. AHA! this is failing on mImageContainer not being nulled, not mImage! Easy fix.
Assignee | ||
Comment 41•7 years ago
|
||
Attachment #8898281 -
Flags: review?(jib)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Comment 42•7 years ago
|
||
Comment on attachment 8898281 [details] [diff] [review] Clear mImageContainer as well as mImage when stopping a gUM stream Review of attachment 8898281 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/media/webrtc/MediaEngineCameraVideoSource.h @@ -71,5 @@ > - // somehow.) (Bug 1374164) > - > - Unused << NS_WARN_IF(mImage); > - > - mImage = nullptr; I see you're backing off of clearing mImage here. Note that this is a different mImage from that in MediaEngineDefault.h. See bug 1391278. I think we should still clear mImage and mImageContainer here. This is a hard requirement of the gfx code, and I don't know that we can guarantee that MediaEngine references are not held too long in all cases, or that stop will always have been called (may be the case). Can we do both, MOZ_ASSERT AND clear?
Attachment #8898281 -
Flags: review?(jib) → review-
Comment 43•7 years ago
|
||
Comment on attachment 8898281 [details] [diff] [review] Clear mImageContainer as well as mImage when stopping a gUM stream From #media, we'd "rather let it be caught by the gfx check than hide it by clearing it", so switching to r+
Attachment #8898281 -
Flags: review- → review+
Comment 44•7 years ago
|
||
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e9e36f27b698 Clear mImageContainer as well as mImage when stopping a gUM stream r=jib
Comment 45•7 years ago
|
||
Backed out for failing e.g. mda's test_getUserMedia_mediaElementCapture_tracks.html and browser-chrome's browser_devices_get_user_media_unprompted_access.js: https://hg.mozilla.org/integration/mozilla-inbound/rev/806d980dba310e653175973bd7f7e1c31b73c2d1 A push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=05358f0058a7a711dce865ea74af6ff451686657&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log mda: https://treeherder.mozilla.org/logviewer.html#?job_id=123953587&repo=mozilla-inbound GECKO(3163) | Assertion failure: mRawPtr != nullptr (You can't dereference a NULL RefPtr with operator->().), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:315 GECKO(3163) | Assertion failure: mDestroyed, at /home/worker/workspace/build/src/gfx/layers/composite/ImageLayerComposite.cpp:44 TEST-UNEXPECTED-FAIL | dom/media/tests/mochitest/test_getUserMedia_mediaElementCapture_tracks.html | application terminated with exit code 11 Failure log bc: https://treeherder.mozilla.org/logviewer.html#?job_id=123953681&repo=mozilla-inbound GECKO(1734) | Assertion failure: mRawPtr != nullptr (You can't dereference a NULL RefPtr with operator->().), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:315 TEST-UNEXPECTED-FAIL | browser/base/content/test/webrtc/browser_devices_get_user_media_unprompted_access.js | Test timed out -
Flags: needinfo?(rjesup)
Comment 46•7 years ago
|
||
Backout by kwierso@gmail.com: https://hg.mozilla.org/mozilla-central/rev/6613ef541bb1 Backed out changeset e9e36f27b698 for failing e.g. mda's test_getUserMedia_mediaElementCapture_tracks.html and browser-chrome's browser_devices_get_user_media_unprompted_access.js. r=backout
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 48•7 years ago
|
||
Also removes hiding of mImage/mImageContainer/mMonitor by MediaEngineDefault, general cleanup of MediaEngineDefault shutdown, and fixing locking around access to protected members.
Attachment #8898834 -
Flags: review?(jib)
Comment 49•7 years ago
|
||
Comment on attachment 8898834 [details] [diff] [review] Clear mImageContainer as well as mImage when stopping a gUM stream Review of attachment 8898834 [details] [diff] [review]: ----------------------------------------------------------------- Lgtm. One question on whether we need to lock twice. ::: dom/media/webrtc/MediaEngineRemoteVideoSource.cpp @@ +435,2 @@ > // implicitly releases last image > mImage = image.forget(); Just wondering if there are ways to avoid locking twice here. Since borh image and mImage are refptrs, couldn't we set mImage ahead of CopyData instead under the first lock? I may not understand the guarantees here, but wanted to point that out, as a potential optimization.
Attachment #8898834 -
Flags: review?(jib) → review+
Assignee | ||
Comment 50•7 years ago
|
||
> ::: dom/media/webrtc/MediaEngineRemoteVideoSource.cpp
> @@ +435,2 @@
> > // implicitly releases last image
> > mImage = image.forget();
>
> Just wondering if there are ways to avoid locking twice here. Since borh
> image and mImage are refptrs, couldn't we set mImage ahead of CopyData
> instead under the first lock? I may not understand the guarantees here, but
> wanted to point that out, as a potential optimization.
I'm locking twice to avoid holding it on a large memcpy/etc. mImage needs a lock to access/modify, since it's touched from multiple threads. If we copy data outside the lock, after setting mImage, then the MSG may come along and grab mImage and send to the compositor before we've finished the copy.
Flags: needinfo?(rjesup)
Assignee | ||
Comment 51•7 years ago
|
||
Green try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=842702adc7a92390cdc4be3f4c1de3192c24a311
Comment 52•7 years ago
|
||
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4d6210358c49 Clear mImageContainer as well as mImage when stopping a gUM stream r=jib
Comment 54•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4d6210358c49
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Comment hidden (Intermittent Failures Robot) |
Comment 57•7 years ago
|
||
:jesup, this has so many failures in the last week, almost all on linux64-qr, but we still see issues on regular linux- can you look into this some more, or should we file a new bug?
Flags: needinfo?(rjesup)
Whiteboard: [stockwell fixed:product] → [stockwell needswork]
Comment 58•7 years ago
|
||
imeantbrasstacks |
Seems fixed to me, the last few on basstacks are on builds without the fix.
Comment 59•7 years ago
|
||
oh, thanks for pointing out my oversight! After the 18th (when the fix was merged everywhere) we just have a few spots on try/beta. sorry for the confusion
Flags: needinfo?(rjesup)
Whiteboard: [stockwell needswork] → [stockwell fixed:product]
Updated•7 years ago
|
Attachment #8898281 -
Attachment is obsolete: true
Comment 60•7 years ago
|
||
This grafts cleanly to Beta and OrangeFactor seems to indicate that the latest fix worked. Please nominate these patches for Beta approval when you get a chance.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → unaffected
status-firefox56:
--- → affected
status-firefox57:
--- → fixed
status-firefox-esr52:
--- → unaffected
Flags: needinfo?(rjesup)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Assignee | ||
Comment 61•7 years ago
|
||
Comment on attachment 8898834 [details] [diff] [review] Clear mImageContainer as well as mImage when stopping a gUM stream Approval Request Comment [Feature/Bug causing the regression]: Timing changes especially with QuantumRender [User impact if declined]: Oranges in automation, possible shutdown crashes [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: yes [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: simple changes it add locking, and avoid shadowing the monitor [String changes made/needed]: none
Flags: needinfo?(rjesup)
Attachment #8898834 -
Flags: approval-mozilla-beta?
Comment 62•7 years ago
|
||
Comment on attachment 8898834 [details] [diff] [review] Clear mImageContainer as well as mImage when stopping a gUM stream Possible crash fix, should help with intermittent orange issues in tests. Please uplift for tomorrow's beta 6 build.
Attachment #8898834 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee | ||
Comment 63•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/84ec3c9fb45645b45ee29ec96877e3c5973cae2b https://hg.mozilla.org/releases/mozilla-beta/rev/c9f31d1cd425266f7a7440b09dc03e2fb4dcf1de
Comment 64•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #61) > [Is this code covered by automated tests?]: yes > > [Has the fix been verified in Nightly?]: yes > > [Needs manual test from QE? If yes, steps to reproduce]: no Setting qe-verify- based on Randell's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•