Crash in [@ core::option::expect_failed | webrender::renderer::TextureResolver::bind]

ASSIGNED
Assigned to

Status

()

defect
P2
critical
ASSIGNED
27 days ago
5 hours ago

People

(Reporter: darkspirit, Assigned: dthayer)

Tracking

(Blocks 2 bugs, {crash, nightly-community, regression})

Trunk
Unspecified
All
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox68 affected)

Details

(Whiteboard: [fxperf:p1], crash signature)

Attachments

(1 attachment)

Happens on Twitter if gfx.webrender.split-render-roots is true.

This bug is for crash report bp-b5ec86b8-cc89-46c9-b65e-ee4ca0190324.

Top 10 frames of crashing thread:

0 libxul.so MOZ_Crash mfbt/Assertions.h:314
1 libxul.so GeckoCrash toolkit/xre/nsAppRunner.cpp:5093
2 libxul.so gkrust_shared::panic_hook toolkit/library/rust/shared/lib.rs:237
3 libxul.so core::ops::function::Fn::call src/libcore/ops/function.rs:78
4 libxul.so std::panicking::rust_panic_with_hook src/libstd/panicking.rs:495
5 libxul.so std::panicking::continue_panic_fmt src/libstd/panicking.rs:398
6 libxul.so rust_begin_unwind src/libstd/panicking.rs:325
7 libxul.so core::panicking::panic_fmt src/libcore/panicking.rs:95
8 libxul.so core::option::expect_failed src/libcore/option.rs:1008
9 libxul.so webrender::renderer::TextureResolver::bind src/libcore/option.rs:322

Hello Doug - is this related to document splitting?

Flags: needinfo?(dothayer)
Priority: -- → P2
(Assignee)

Comment 2

25 days ago

Yes, this presents itself with document splitting on and the gfx.webrender.debug.texture-cache.disable-shrink pref off. I've created Bug 1538710 to track the work of no longer depending on this pref.

Flags: needinfo?(dothayer)
(Assignee)

Updated

25 days ago
Depends on: 1538710
Duplicate of this bug: 1538944

The TextureResolver::bind signature is currently the #1 GPU process crash for WR-enabled nightly builds. crash-stats search results. There's ~127 crashes in the last 7 days. It seems unlikely that this is only triggered when document-splitting is enabled.

The crash signature was very low-volume until March 23, when it suddenly spiked. That's the same time the document-splitting stuff landed. So I'm fairly sure the spike in crashes is a result of those patches. Doug, do you have cycles to prioritize this, since it seems to affect users even in the default configuration of document-splitting disabled?

Flags: needinfo?(dothayer)
(Assignee)

Comment 6

4 days ago

Yeah. I'm going to prioritize landing the work for bug 1538710 and see how that affects these numbers. Then if they're still present (likely based on comment 5), I'll dig into what else might be causing this.

Flags: needinfo?(dothayer)

Looks like this is still happening. Here's a crash from today's nightly: https://crash-stats.mozilla.com/report/index/147b0aab-757b-43b8-acaa-aef9d0190417

(Assignee)

Updated

2 days ago
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Whiteboard: [fxperf:p1]
(Assignee)

Comment 8

a day ago

I haven't found any smoking gun yet. However, here is a summary of my spelunking so far:

  • Something is missing from the TextureCache's texture_cache_map

  • This means one of two possibilities:

    • The item has not been added yet, and yet we render with a frame that holds the key. This should not be possible, as in order to be added to the frame the item must have been queued to be added to texture_cache_map via TextureCacheAllocationKind::Alloc
    • The item has been removed, and we render with a frame that has not been updated. This is likely what's going on, and it's what was going on in bug 1538710. However, that was specific to multiple documents, where one document's frame build would trigger a clearing of cached resources, and the other document's frame would not be built, leading to a stale frame when it came time to render
  • Progressing from the assumption that we are removing the relevant texture_cache_map item, triggering a render, and yet not building a frame, we need to examine how we remove items from texture_cache_map.

  • The item is actually removed here

  • To hit this, we must go through TextureUpdateList::push_free

    • This is called from TextureCache::free for a standalone entry, and from TextureCache::clear_shared

      • TextureCache::free is called from three places:

        • TextureCache::expire_old_entries - this is ruled out as it is only called during a frame build
        • TextureCache::upsert_entry - this is ruled out for the same reason
        • TextureCache::clear_all - this can be called independently outside of a frame build
      • TextureCache::clear_shared is called from two places:

        • TextureCache::maybe_reclaim_shared_memory - this was the cause of bug 1538710, in which it was moved to TextureCache::before_frames, where it forces us to build a frame by setting require_frame_build to true
        • TextureCache::clear_all - (as stated above) this can be called independently outside of a frame build
  • This leaves us with the conclusion that either we've missed something, or we have to call TextureCache::clear_all without doing a frame build, but still trigger a render.

  • So, going down that rabbit hole, there's two places where we can trigger TextureCache::clear_all

  • So, assuming NotifyMemoryPressure is the culprit, because it's a possible way of clearing items from texture_cache_map which isn't directly tied with a frame build, the question becomes whether we can then trigger a render while continuing to avoid the frame build.

  • Before the fixes for bug 1538710, this was relatively easy - if we called TextureCache::clear_all from a memory pressure event, we could have the main document with frame_is_valid equal to true, causing render_frame to be true and build_frame to be false inside RenderBackend::update_document, which would lead us to call wr_notifier_new_frame_ready, without having built an actual frame (?). After bug 1538710, it seems trickier, though there is this strange comment referencing what seems to be a similar problem in the handling of ResultMsg::UpdateResources, though that code seems to be dead. EDIT: ResultMsg::UpdateResources is indeed used, and seems to be used correctly. I just missed it because it only shows up in searchfox for text searches. I think my next step is to do some archaeology on that code and see if anything pops out. However, I still don't have any hunches as to how the document splitting patches would have affected any of this.

(Assignee)

Comment 9

5 hours ago

This is a bit of a stab in the dark, but I wanted to rule it out
as a potential cause. See comment in the code for some justification.

You need to log in before you can comment on or make changes to this bug.