Closed
Bug 1397407
Opened 8 years ago
Closed 8 years ago
Crash in mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image
Categories
(Core :: Graphics: WebRender, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox56 | --- | unaffected |
firefox57 | --- | unaffected |
People
(Reporter: kah0922, Assigned: nical)
References
Details
(Whiteboard: [wr-mvp] [gfx-noted])
Crash Data
Attachments
(1 file, 1 obsolete file)
5.55 KB,
patch
|
Gankra
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20170906100107
Steps to reproduce:
It seems to happen randomly, especially from switching tabs.
I've encountered this on both of my systems running Intel and AMD graphics.
Crash reports: https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozalloc_abort%20%7C%20abort%20%7C%20core%3A%3Aoption%3A%3Aexpect_failed%20%7C%20webrender%3A%3Aresource_cache%3A%3AResourceCache%3A%3Aget_cached_image
Actual results:
The browser crashed.
Expected results:
It should not crash.
Reporter | ||
Updated•8 years ago
|
Component: Untriaged → Graphics: WebRender
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
I'm assuming you're running with webrender enabled.
Blocks: stage-wr-nightly
Updated•8 years ago
|
Crash Signature: [@ core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image]
[@ mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image]
Reporter | ||
Comment 3•8 years ago
|
||
Yeah, webrender is enanbled, as is webrenderest, layers-free, and blob-images.
Whiteboard: [gfx-noted]
Updated•8 years ago
|
Priority: -- → P3
Updated•8 years ago
|
Priority: P3 → P2
Whiteboard: [gfx-noted] → [wr-mvp] [gfx-noted]
Updated•8 years ago
|
status-firefox56:
--- → unaffected
status-firefox57:
--- → unaffected
Comment 4•8 years ago
|
||
Yeah you can get this pretty reliably in a ~minute just opening tabs to websites and toggling youtube videos in and out of fullscreen.
This is probably(?) the last major blocker for making webrender dogfoodable.
Updated•8 years ago
|
![]() |
||
Comment 6•8 years ago
|
||
This is the #9 Windows topcrash in Nightly 20170908220146.
Comment 7•8 years ago
|
||
Did some investigation tonight into why this is happening:
* Always a resource with this format `ImageRequest { key: ImageKey(IdNamespace(1), 309), rendering: Auto, tile: None }`
* Always removed as a result of running the ~WebRenderImageData destructor.
* Problem destructor generally shows up in a big batch of destructors, suggesting some kind of bulk event.
* Image seems to always be removed very close to the crash, suggesting it's used on the same/next frame.
Updated•8 years ago
|
Assignee: nobody → a.beingessner
Updated•8 years ago
|
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: P2 → P1
Target Milestone: --- → mozilla57
Updated•8 years ago
|
Crash Signature: [@ core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image]
[@ mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image] → [@ core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image]
[@ mozalloc_abort | abort | core::option::expect_failed | webrender::resource_cache::ResourceCache::get_cached_image]
[@ mozalloc_abort | abort | core::option::…
OS: Linux → All
Comment 8•8 years ago
|
||
Some more notes:
* It's a WrImageMask that's going missing; looks like the mask that fades out text that's too long to fit on a tab.
* It's possible this is related to a graphical glitch where the whole chrome flickers if your tabs are full enough to fill the title bar.
Comment 10•8 years ago
|
||
Oh, we were suspicious that maybe it was an EmptyTransaction thing, but changing WebRenderLayerManager::EndEmptyTransaction to just `return false` doesn't fix it.
Comment 11•8 years ago
|
||
(In reply to Alexis Beingessner [:Gankro] from comment #8)
> * It's possible this is related to a graphical glitch where the whole chrome
> flickers if your tabs are full enough to fill the title bar.
If this bug occurs with webrendest, then it has nothing to do with the tab text glitch. The tab text glitch (bug 1389476) is due to changing layerization, but with webrendest there is no layerization.
Flags: needinfo?(mstange)
Comment 12•8 years ago
|
||
Ok so we've figured this out. The issue is that when we go to EndTransaction, we don't atomically batch up deleting images from the previous display list (DiscardImages -> update_resources) with the sending of the new display list. So we end up deleting the images, then sending the display list (set_display_list), creating a race condition where a render can happen between those two events.
The solution is to get the ResourceUpdates from DiscardImages and stuff them in set_display_list, so that this is properly atomic.
Nical has volunteered to fix this, as he's in the middle of refactoring all of this code anyway.
Updated•8 years ago
|
Assignee: a.beingessner → nical.bugzilla
Assignee | ||
Comment 13•8 years ago
|
||
This should do.
I think that fonts have the same issue (although we churn them a lot less so it's less likely to happen).
Attachment #8907172 -
Flags: review?(a.beingessner)
Comment 14•8 years ago
|
||
Comment on attachment 8907172 [details] [diff] [review]
Apply deferred image key deletions to the next transaction
Review of attachment 8907172 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/wr/WebRenderLayerManager.cpp
@@ +822,5 @@
> + for (const auto& key : mImageKeysToDelete) {
> + resourceUpdates.DeleteImage(key);
> + }
> + mImageKeysToDelete.Clear();
> +
1:28.60 In file included from /Users/ABeingessner/dev/gecko/obj-x86_64-apple-darwin16.7.0/gfx/layers/Unified_cpp_gfx_layers11.cpp:74:
1:28.60 /Users/ABeingessner/dev/gecko/gfx/layers/wr/WebRenderLayerManager.cpp:809:24: error: invalid range expression of type 'mozilla::wr::ResourceUpdateQueue'; no viable 'begin' function available
1:28.60 for (const auto& key : mImageKeysToDelete) {
1:28.60 ^ ~~~~~~~~~~~~~~~~~~
1:28.60 /Users/ABeingessner/dev/gecko/gfx/layers/wr/WebRenderLayerManager.cpp:810:5: error: use of undeclared identifier 'resourceUpdates'
1:28.60 resourceUpdates.DeleteImage(key);
1:28.60 ^
1:28.60 2 errors generated.
Assignee | ||
Comment 15•8 years ago
|
||
Yeah the patch is rebased on top of bug 1393031. In the current state of mozilla-central mImageKeysToDelete is not an nsTArray (yet).
Comment 16•8 years ago
|
||
Comment on attachment 8907172 [details] [diff] [review]
Apply deferred image key deletions to the next transaction
Tried to crash the browser for 10 minutes with this, and nothing! \o/
Attachment #8907172 -
Flags: review?(a.beingessner) → review+
Comment 17•8 years ago
|
||
Note to checkin people, this patch is based on top of https://bugzilla.mozilla.org/show_bug.cgi?id=1393031, which has been pushed to inbound but hasn't hit unified yet. Dunno what the process is!
Keywords: checkin-needed
Updated•8 years ago
|
Comment 18•8 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5dff64340068
Apply deferred image key deletions to the next transaction. r=Gankro
Keywords: checkin-needed
![]() |
||
Comment 19•8 years ago
|
||
Backed out because it depends on bug 1393031:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac08b84acbcc781b4542ceca12bf417da73ac563
Flags: needinfo?(nical.bugzilla)
Comment 20•8 years ago
|
||
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/70f5f23a429f
Apply deferred image key deletions to the next transaction. r=Gankro
Assignee | ||
Comment 21•8 years ago
|
||
Although not all of bug 1393031 has made it to m-c yet, the part needed by this patch is in now.
Flags: needinfo?(nical.bugzilla)
Comment 22•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
![]() |
||
Comment 23•8 years ago
|
||
Backed out for letting Talos g1 timeout and spam its log on Linux x64 QuantumRender opt:
https://hg.mozilla.org/mozilla-central/rev/c99a1520c7a5af8f3769d3a4fce38697ce47cbb2
Merge to mozilla-central with failing job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=7aceaf8bcb9f582db0f93488b48ef7019e348dba&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Log of failing job (>50MB due to the log spam): https://treeherder.mozilla.org/logviewer.html#?job_id=131349043&repo=mozilla-central
It contains many lines like these:
> Delete the non-exist key:ImageKey(IdNamespace(1), 1)
Status: RESOLVED → REOPENED
Flags: needinfo?(nical.bugzilla)
Resolution: FIXED → ---
Target Milestone: mozilla57 → ---
Updated•8 years ago
|
Target Milestone: --- → mozilla57
![]() |
||
Comment 24•8 years ago
|
||
This is the #7 Windows topcrash in Nightly 20170915100121.
Assignee | ||
Comment 25•8 years ago
|
||
D'oh! I think that this is caused by our bad pattern of creating an image and immediately queueing it's destruction in various places. That'll take a bit longer to fix so I'll start by delaying the deletion of the images by one transaction which sucks for memory usage but should fix the issue while we work through the nastiness of the image key management.
Flags: needinfo?(nical.bugzilla)
Updated•8 years ago
|
Nical, this is still WebRender only, right?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 27•8 years ago
|
||
Same patch with an additional array of image keys that buffers the deletion and to delay them to the transaction after next.
try push looks good so far: try push with this patch looks good so far: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c27c1f61b9bae2e9d2e645d599b916b4c13c3a9
Attachment #8907172 -
Attachment is obsolete: true
Flags: needinfo?(nical.bugzilla)
Attachment #8909345 -
Flags: review?(a.beingessner)
Assignee | ||
Comment 28•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #26)
> Nical, this is still WebRender only, right?
Absolutely.
Comment 29•8 years ago
|
||
Comment on attachment 8909345 [details] [diff] [review]
Apply deferred image key deletions to the next transaction (delayed by one transaction)
Review of attachment 8909345 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with typo fix
::: gfx/layers/wr/WebRenderLayerManager.h
@@ +249,5 @@
>
> private:
> nsIWidget* MOZ_NON_OWNING_REF mWidget;
> nsTArray<wr::ImageKey> mImageKeysToDelete;
> + // TODO - This is needed because we have some code that creates a image keys
typo: "a images" -> "images"
Attachment #8909345 -
Flags: review?(a.beingessner) → review+
Comment 30•8 years ago
|
||
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f8c6be7d2f4
Apply deferred image key deletions to the next transaction. r=Gankro
Comment 31•8 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•