Closed Bug 1452513 Opened 6 years ago Closed 6 years ago

blob-invalidation: Crash in mozalloc_abort | abort | libxul.so@0x3d1f718 | libxul.so@0x3d1f708 | libxul.so@0x3d0f260 | webrender::resource_cache::ResourceCache::update_resources

Categories

(Core :: Graphics: WebRender, defect, P2)

x86_64
All
defect

Tracking

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 --- disabled
firefox62 --- disabled
firefox63 --- disabled

People

(Reporter: jan, Assigned: aosmond)

References

(Blocks 1 open bug)

Details

(Keywords: crash, nightly-community)

Attachments

(2 files)

Nightly 61 x64 20180408100251 de_DE @ Debian Testing, KDE, Radeon RX480
main profile: gpu process, webrender, blob-invalidation, etc.

I wanted to know how I could make a Gtk window with Rust, so I opened some examples. Then I dragged a tab of https://github.com/ab0v3g4me/epicwar-downloader down a bit to open it in a new window. I grabbed the tab there and placed it back at its original position in the first window. Then I middle-clicked to activate autoscroll and scrolled down a bit. I could reproduce the crash once. (Only tested once.)

bp-d4184df8-9c01-4fcf-b637-cca210180408	09.04.18 00:08
bp-0e17119d-53c7-4726-bc4c-6e0e30180408	09.04.18 00:06
> MOZ_CRASH Reason 	Attempt to update non-existent image

https://github.com/servo/webrender/blob/df73569c5a9fba35a7c268d176a3eb0d6c52b249/webrender/src/resource_cache.rs#L501
fresh profile: gfx.webrender.all, gfx.webrender.blob.invalidation
Open 2 tabs of https://github.com/ab0v3g4me/epicwar-downloader, move one out of the window, move the tab back to the first window. Hover something or try to scroll down. Browser crash.

Not reproducible without blob invalidation.

bp-06ef322f-eeb4-4973-a088-1d5a20180409 09.04.18 02:06
bp-2eb4d2ad-51d1-4d44-99a4-f96ec0180409 09.04.18 02:06
bp-53bcb452-bc72-4660-83a9-fbe470180409 09.04.18 02:06
bp-59c5c9d0-1f40-4764-8da4-8973f0180409 09.04.18 02:04
Blocks: 1388842
Crash Signature: [@ mozalloc_abort | abort | libxul.so@0x3d1f718 | libxul.so@0x3d1f708 | libxul.so@0x3d0f260 | webrender::resource_cache::ResourceCache::update_resources ] → [@ mozalloc_abort | abort | libxul.so@0x3d1f718 | libxul.so@0x3d1f708 | libxul.so@0x3d0f260 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d4b668 | libxul.so@0x3d4b658 | libxul.so@0x3d3b1b0 | webre…
Summary: Crash in mozalloc_abort | abort | libxul.so@0x3d1f718 | libxul.so@0x3d1f708 | libxul.so@0x3d0f260 | webrender::resource_cache::ResourceCache::update_resources → blob-invalidation: Crash in mozalloc_abort | abort | libxul.so@0x3d1f718 | libxul.so@0x3d1f708 | libxul.so@0x3d0f260 | webrender::resource_cache::ResourceCache::update_resources
Win10, Radeon RX480: STR from comment 1. Not reproducible without blob invalidation.
bp-422dfe39-2572-44dd-a7cf-5bf020180409 09.04.2018 16:46 [@ static void std::panicking::rust_panic_with_hook ]
> Attempt to update non-existent image
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] → webrender::resource_cache::ResourceCache::update_resources ] [@ static void std::panicking::rust_panic_with_hook ]
OS: Linux → All
Nightly 61 x64 20180412100111 de_DE 246c614e160586c1eb3167cff866dd550be35e03 @ Debian Testing, KDE, Radeon RX480
fresh profile: gfx.webrender.all, gfx.webrender.blob.invalidation

bp-72266f65-30a6-4571-b81d-bad230180412 12.04.18 17:40
> Attempt to update non-existent image

https://hg.mozilla.org/mozilla-central/graph/246c614e160586c1eb3167cff866dd550be35e03
This build includes the patch from bug 1451458.
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] [@ static void std::panicking::rust_panic_with_hook ] → webrender::resource_cache::ResourceCache::update_resources ] [@ static void std::panicking::rust_panic_with_hook ] [@ mozalloc_abort | abort | libxul.so@0x3d147d8 | libxul.so@0x3d147c8 | libxul.so@0x3d04320 | webrender::resource_cache::ResourceCache::u…
Has STR: --- → yes
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] → webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d268b8 | libxul.so@0x3d268a8 | libxul.so@0x3d16400 | webrender::resource_cache::ResourceCache::update_resources ]
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d268b8 | libxul.so@0x3d268a8 | libxul.so@0x3d16400 | webrender::resource_cache::ResourceCache::update_resources ] → webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d268b8 | libxul.so@0x3d268a8 | libxul.so@0x3d16400 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.s…
https://hg.mozilla.org/mozilla-central/rev/789e30ff2e3d6e1fcfce1a373c1e5635488d24da includes the latest WebRender update.
bp-49deb394-f7c0-4a58-8c30-3c0fa0180418 18.04.18 03:34

try build from bug 1454659 comment 6:
mozregression --repo mozilla-inbound --launch 7b816219f708 --pref gfx.webrender.all:true gfx.webrender.blob.invalidation:true startup.homepage_welcome_url:'https://github.com/ab0v3g4me/epicwar-downloader|https://github.com/ab0v3g4me/epicwar-downloader'
-> Crash. (Process exited with code 11)
Crash Signature: libxul.so@0x3d08e98 | libxul.so@0x3d08e88 | libxul.so@0x3cf89e0 | webrender::resource_cache::ResourceCache::update_resources ] → libxul.so@0x3d08e98 | libxul.so@0x3d08e88 | libxul.so@0x3cf89e0 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d0a6c8 | libxul.so@0x3d0a6b8 | libxul.so@0x3cfa210 | webrender::resource_cache::Resou…
Not "try", I meant "inbound".
I'll take a look at this tomorrow.
Assignee: nobody → jmuizelaar
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] → webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3cdd828 | libxul.so@0x3cdd818 | libxul.so@0x3ccd370 | webrender::resource_cache::ResourceCache::update_resources ] [@ static void webrender::resource_c…
I have a recording in rr. Will debug tomorrow.
Crash Signature: webrender::resource_cache::ResourceCache::update_resources::h548867be502942f8 ] → webrender::resource_cache::ResourceCache::update_resources::h548867be502942f8 ] [@ mozalloc_abort | abort | libxul.so@0x3d031f8 | libxul.so@0x3d031e8 | libxul.so@0x3cf2d40 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort…
Crash Signature: mozalloc_abort | abort | libxul.so@0x3d1eeb8 | libxul.so@0x3d1eea8 | libxul.so@0x3d0ea00 | webrender::resource_cache::ResourceCache::update_resources ] → mozalloc_abort | abort | libxul.so@0x3d1eeb8 | libxul.so@0x3d1eea8 | libxul.so@0x3d0ea00 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d0d818 | libxul.so@0x3d0d808 | libxul.so@0x3cfd360 | webrend…
This happens for me when I move a tab with video playback out of the main browser window (with WebRender enabled).
Crash Signature: webrender::resource_cache::ResourceCache::update_resources ] → webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.so@0x3d0a6d8 | libxul.so@0x3d0a6c8 | libxul.so@0x3cfa220 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | libxul.s…
This might be fixed by bug 1453801 which landed on inbound this morning.
Flags: needinfo?(jan)
https://hg.mozilla.org/integration/mozilla-inbound/graph/d1479b21a284

> Bug 1453801 - Part 3. Fix race condition shutting down the render thread and shared surfaces. r=sotaro
mozregression --repo mozilla-inbound --launch d1479b21a284 --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://github.com/ab0v3g4me/epicwar-downloader|https://github.com/ab0v3g4me/epicwar-downloader'
no crash

> Bug 1454398 - Comment out another invalid assert I missed in 42e037e0b8d1. r=me
mozregression --repo mozilla-inbound --launch e5b94fa417c8 --pref gfx.webrender.all:true startup.homepage_welcome_url:'https://github.com/ab0v3g4me/epicwar-downloader|https://github.com/ab0v3g4me/epicwar-downloader'
crash

Promising, thanks! :)
Flags: needinfo?(jan)
Seen on Socorro.
bp-11139dd5-260b-46b8-a900-6b9290180425 build 20180424220100 MacOS
> Attempt to update non-existent image
Crash Signature: libxul.so@0x3cfd4e8 | libxul.so@0x3cfd4d8 | libxul.so@0x3ced030 | webrender::resource_cache::ResourceCache::update_resources ] → libxul.so@0x3cfd4e8 | libxul.so@0x3cfd4d8 | libxul.so@0x3ced030 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::h978ef7b666262248 ]
FWIW, I believe this is somehow related to concurrency: I seem to encounter it when I open-in-new-tab many, many tabs from the same website (i.e. tabs that share the same resources).
Crash Signature: libxul.so@0x3cfd4e8 | libxul.so@0x3cfd4d8 | libxul.so@0x3ced030 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::h978ef7b666262248 ] → libxul.so@0x3cfd4e8 | libxul.so@0x3cfd4d8 | libxul.so@0x3ced030 | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::h978ef7b666262248 ] [@ mozalloc_abort…
Crash Signature: mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources ] → mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hb1727d001ad478db ]
Crash Signature: mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hb1727d001ad478db ] → mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hb1727d001ad478db ] [@ mozalloc_abort | abort | webrender::resource_cache::Res…
I'm reproducing this a lot on today's nightly by having some YouTube videos open and dragging a video tab to another window (the 2nd screen is an external monitor, if that matters). As soon as I drop the tab, Firefox crashes.
I think the patch from bug 1455597 might help here. At least, I saw this crash once while trying to reproduce that crash and it seemed to be the same sort of problem.
Depends on: 1455597
Crash Signature: webrender::resource_cache::ResourceCache::update_resources::h14f93eb22c5a2253 ] → webrender::resource_cache::ResourceCache::update_resources::h14f93eb22c5a2253 ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::haae17d0d229b1a4c ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceC…
Crash Signature: webrender::resource_cache::ResourceCache::update_resources::h06673922f16d42c5 ] → webrender::resource_cache::ResourceCache::update_resources::h06673922f16d42c5 ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hed991ea73c6eb688 ]
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> I think the patch from bug 1455597 might help here. At least, I saw this crash once while trying to reproduce that crash and it seemed to be the same sort of problem.

Last crash with build 20180604100129.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Someone uploaded a new report:
bp-f1e96392-4d98-46b1-b04f-679170180606 build 20180605220158 Win10
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Win10 with GTX 1060. Same STR as comment 19, but I didn't open and install the AMO link.
bp-b33ffc9e-4140-408c-9aaf-20b520180625 25 Jun 2018 14:46
bp-0b86c0d4-e5f5-4d35-92b8-499210180625 25 Jun 2018 14:46
bp-b725d547-b7ce-4a62-99b9-82e3c0180625 25 Jun 2018 14:46
bp-b46ce205-2cfd-4a3d-83b5-0a4190180625 25 Jun 2018 14:46
See Also: → 1468732
Crash Signature: webrender::resource_cache::ResourceCache::update_resources::h06673922f16d42c5 ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hed991ea73c6eb688 ] → webrender::resource_cache::ResourceCache::update_resources::h06673922f16d42c5 ] [@ mozalloc_abort | abort | webrender::resource_cache::ResourceCache::update_resources::hed991ea73c6eb688 ] [@ mozalloc_abort | abort | _ZN11panic_abort18__rust_start_panic…
Crash Signature: _ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE | __rust_start_panic | _ZN3std9panicking20rust_panic_with_hook17h608586f043d70222E ] → _ZN11panic_abort18__rust_start_panic5abort17hfb98714efe360e0fE | __rust_start_panic | _ZN3std9panicking20rust_panic_with_hook17h608586f043d70222E ] [@ mozalloc_abort | abort | _ZN11panic_abort18__rust_start_panic5abort17h8d2f6471e2847f91E | __rust_start…
When the WebRenderApi object is destroyed, it will trigger RenderApi::drop which clears the namespace. This can happen while the scene builder is still busy, and has yet to send its SceneBuilderResult::Transaction message. As a result we process the ApiMsg::ClearNamespace message beforehand, and then we hit the panic once the scene is ready. We need synchronization on the teardown. I'll see if I can figure it out.
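A minimal sketch of the ordering problem described above, under stated assumptions: `ToyResourceCache`, `Msg` and `replay` are hypothetical stand-ins, not WebRender's real types, and the sketch returns an error where the real code panics. It only shows that an image update arriving after the namespace clear finds nothing to update.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for WebRender's IdNamespace and ImageKey.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct IdNamespace(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ImageKey(pub IdNamespace, pub u32);

/// Messages in the order they reach the (toy) render backend.
pub enum Msg {
    AddImage(ImageKey),
    UpdateImage(ImageKey),
    ClearNamespace(IdNamespace),
}

#[derive(Default)]
pub struct ToyResourceCache {
    /// Maps each known image to the number of updates applied to it.
    images: HashMap<ImageKey, u32>,
}

impl ToyResourceCache {
    pub fn process(&mut self, msg: Msg) -> Result<(), String> {
        match msg {
            Msg::AddImage(key) => {
                self.images.insert(key, 0);
                Ok(())
            }
            Msg::UpdateImage(key) => match self.images.get_mut(&key) {
                Some(count) => {
                    *count += 1;
                    Ok(())
                }
                // The real code panics at this point; the sketch reports an error.
                None => Err(format!("Attempt to update non-existent image: {:?}", key)),
            },
            Msg::ClearNamespace(ns) => {
                // Drop every image that belongs to the cleared namespace.
                self.images.retain(|key, _| key.0 != ns);
                Ok(())
            }
        }
    }
}

/// Feeds the messages to a fresh cache in order; stops at the first error.
pub fn replay(msgs: Vec<Msg>) -> Result<(), String> {
    let mut cache = ToyResourceCache::default();
    for msg in msgs {
        cache.process(msg)?;
    }
    Ok(())
}
```

Replaying add, update, clear succeeds; replaying add, clear, update fails, which corresponds to the case where the scene builder's result is delivered after ClearNamespace has already been processed.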
Yeah, I have that patch in my tree, but for some reason, the flush doesn't stop a transaction from starting afterwards. Event ordering is something like this:

api flush scene builder send
api flush scene builder sent
render backend send scene builder flush
scene builder transaction start
render backend sent scene builder flush
scene builder transaction end
scene builder flush start
scene builder flush end
api flush scene builder recv
clear_namespace: IdNamespace(14)
scene builder transaction start
scene builder transaction end
update_image_template: ImageKey(IdNamespace(12), 82)

Investigation continues.
Oh I was missing the most important part of the log, because there were two flushes, whoops...

api flush scene builder send
api flush scene builder sent
render backend send scene builder flush
render backend sent scene builder flush
scene builder transaction start
scene builder transaction end
scene builder flush start
scene builder flush end
api flush scene builder recv
clear_namespace: IdNamespace(12)

api flush scene builder send
api flush scene builder sent
render backend send scene builder flush
scene builder transaction start
render backend sent scene builder flush
scene builder transaction end
scene builder flush start
scene builder flush end
api flush scene builder recv
clear_namespace: IdNamespace(14)

scene builder transaction start
scene builder transaction end
update_image_template: ImageKey(IdNamespace(12), 82)
Adding the thread IDs to the logging proved important. It appears that the images were originally handled on one WRRenderBackend thread; we flushed and cleared the namespace on that thread, and then tried to use the same namespace on a different WRRenderBackend thread. Thus even if we hadn't done the clear, this would have panicked, since the resource caches are different. In the successful transitions, it seems to generate new keys for a new namespace.
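To illustrate why the clear is not the root cause, here is a rough model in which each render backend owns its own resource cache. `Backend`, `Namespace` and `Key` are hypothetical names invented for this sketch, and the namespace numbers are only illustrative.

```rust
use std::collections::HashSet;

// Hypothetical model: every render backend thread has a private cache,
// and keys are minted in the namespace that backend handed out.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Namespace(pub u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Key(pub Namespace, pub u32);

pub struct Backend {
    namespace: Namespace,
    known: HashSet<Key>,
}

impl Backend {
    pub fn new(namespace: Namespace) -> Self {
        Backend { namespace, known: HashSet::new() }
    }

    /// Mints a key in this backend's namespace and registers it locally.
    pub fn add_image(&mut self, id: u32) -> Key {
        let key = Key(self.namespace, id);
        self.known.insert(key);
        key
    }

    /// True if this backend's cache has an entry that `key` could update.
    pub fn can_update(&self, key: Key) -> bool {
        self.known.contains(&key)
    }
}
```

A key created through one backend is simply unknown to another, whether or not the first backend's namespace was ever cleared. That matches the observation that the update would have failed on the second thread regardless.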
I think this line causes us to inject stale transaction data into the new WebRenderAPI object:

https://searchfox.org/mozilla-central/rev/6ef785903fee6c0b16a1eab79d722373d940fd78/gfx/layers/wr/WebRenderBridgeParent.cpp#657
Assignee: jmuizelaar → aosmond
Comment on attachment 8990305 [details] [diff] [review]
0001-Bug-1452513-Avoid-issuing-transactions-with-WebRende.patch, v1

Review of attachment 8990305 [details] [diff] [review]:
-----------------------------------------------------------------

Hm, so before I added this AutoSender thingy in https://searchfox.org/mozilla-central/diff/4cb71672840e62bff04aa2e4184f88ed048043ee/gfx/layers/wr/WebRenderBridgeParent.cpp#622 we were explicitly sending the transaction after ProcessWebRenderParentCommands, regardless of the namespace check. Do we want to restore that behaviour? The ProcessWebRenderParentCommands looks like it does some useful cleanup stuff which we probably want to still run even if the namespaces don't match.

Can you also update the commit message to specify exactly which commands (UpdateImageBuffer, per your response on IRC) were getting sent in the transaction that were undesirable in the case of a namespace change?
Comment on attachment 8990305 [details] [diff] [review]
0001-Bug-1452513-Avoid-issuing-transactions-with-WebRende.patch, v1

Review of attachment 8990305 [details] [diff] [review]:
-----------------------------------------------------------------

Dropping flag for now, see comment above.
Attachment #8990305 - Flags: review?(bugmail)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #28)
> Comment on attachment 8990305 [details] [diff] [review]
> 0001-Bug-1452513-Avoid-issuing-transactions-with-WebRende.patch, v1
> 
> Review of attachment 8990305 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Hm, so before I added this AutoSender thingy in
> https://searchfox.org/mozilla-central/diff/4cb71672840e62bff04aa2e4184f88ed048043ee/gfx/layers/wr/WebRenderBridgeParent.cpp#622
> we were explicitly sending the transaction after
> ProcessWebRenderParentCommands, regardless of the namespace check. Do we
> want to restore that behaviour? The ProcessWebRenderParentCommands looks
> like it does some useful cleanup stuff which we probably want to still run
> even if the namespaces don't match.
> 
> Can you also update the commit message to specify exactly which commands
> (UpdateImageBuffer, per your response on IRC) were getting sent in the
> transaction that were undesirable in the case of a namespace change?

Hmmm, so the transactions created by ProcessWebRenderParentCommands are all for pipeline removal, but any stale pipelines should have been removed by ClearResources when the namespace was changed in UpdateWebRender. Similarly, any stale texture hosts and shared surfaces should have been released. However, this raises the question of whether we should even call ProcessWebRenderParentCommands at all. Surely we don't want to *add* stale pipelines, or compositor animations...
So looking at this more closely, you seem to be mostly right. The one discrepancy I see between ClearResources and ProcessWebRenderParentCommands is that the latter also removes the compositable pipeline [1] whereas the former does not [2]. However we can fix that pretty easily by also removing the pipeline in ClearResources. With that in place I guess your patch is fine.

The only other consideration is Sotaro's change in bug 1475187, but if I'm understanding you correctly we should be ok to skip that pipeline-addition in the case where the namespace id has changed, because it would be stale anyway. Is that right?

[1] https://searchfox.org/mozilla-central/rev/a80651653faa78fa4dfbd238d099c2aad1cec304/gfx/layers/wr/WebRenderBridgeParent.cpp#1089
[2] https://searchfox.org/mozilla-central/rev/a80651653faa78fa4dfbd238d099c2aad1cec304/gfx/layers/wr/WebRenderBridgeParent.cpp#1662-1665
Crash Signature: __rust_start_panic | _ZN3std9panicking20rust_panic_with_hook17hc80c992e8db03b06E ] → __rust_start_panic | _ZN3std9panicking20rust_panic_with_hook17hc80c992e8db03b06E ] [@ mozalloc_abort | abort | panic_abort::__rust_start_panic::abort | __rust_start_panic | webrender::resource_cache::ResourceCache::update_resources ]
See Also: → 1477571
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> So looking at this more closely, you seem to be mostly right. The one
> discrepancy I see between ClearResources and ProcessWebRenderParentCommands
> is that the latter also removes the compositable pipeline [1] whereas the
> former does not [2]. However we can fix that pretty easily by also removing
> the pipeline in ClearResources. With that in place I guess your patch is
> fine.
> 

I'll wait for bug 1477571 to land.

> The only other consideration is Sotaro's change in bug 1475187, but if I'm
> understanding you correctly we should be ok to skip that pipeline-addition
> in the case where the namespace id has changed, because it would be stale
> anyway. Is that right?
> 
> [1] https://searchfox.org/mozilla-central/rev/a80651653faa78fa4dfbd238d099c2aad1cec304/gfx/layers/wr/WebRenderBridgeParent.cpp#1089
> [2] https://searchfox.org/mozilla-central/rev/a80651653faa78fa4dfbd238d099c2aad1cec304/gfx/layers/wr/WebRenderBridgeParent.cpp#1662-1665

Yes I believe so.
Comment on attachment 8990305 [details] [diff] [review]
0001-Bug-1452513-Avoid-issuing-transactions-with-WebRende.patch, v1

Review of attachment 8990305 [details] [diff] [review]:
-----------------------------------------------------------------

With bug 1479939 landed this patch should be fine, after appropriate rebasing.
Attachment #8990305 - Flags: review+
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd62249f8b51
Avoid issuing transactions with WebRender when the namespace has changed. r=kats
https://hg.mozilla.org/mozilla-central/rev/bd62249f8b51
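The idea in the commit summary can be sketched roughly as follows. This is not the landed patch (which is C++ in WebRenderBridgeParent.cpp); `ToyBridge`, `ResourceUpdates` and `NamespaceId` are hypothetical names, and the sketch assumes each incoming batch carries the namespace it was built for.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NamespaceId(pub u32);

/// Hypothetical stand-in for a batch of commands received from content.
pub struct ResourceUpdates {
    pub namespace: NamespaceId,
    pub commands: Vec<String>,
}

pub struct ToyBridge {
    current: NamespaceId,
    sent: Vec<String>,
}

impl ToyBridge {
    pub fn new(current: NamespaceId) -> Self {
        ToyBridge { current, sent: Vec::new() }
    }

    /// Switches to a new namespace, e.g. after moving to a new API object.
    pub fn update_namespace(&mut self, ns: NamespaceId) {
        self.current = ns;
    }

    /// Forwards the batch only if it was built for the current namespace.
    /// Returns true when the commands were forwarded, false when dropped.
    pub fn handle(&mut self, updates: ResourceUpdates) -> bool {
        if updates.namespace != self.current {
            // Stale batch: issue no transaction at all.
            return false;
        }
        self.sent.extend(updates.commands);
        true
    }

    /// Commands forwarded so far, in order.
    pub fn sent(&self) -> &[String] {
        &self.sent
    }
}
```

With this guard, a stale UpdateImageBuffer that was built for the old namespace is dropped instead of being forwarded to a backend that has never seen the key.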
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63