Open Bug 1645841 Opened 1 year ago Updated 7 months ago

Crash in [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image]

Categories

(Core :: Graphics: WebRender, defect, P3)

Tracking Status
firefox82 --- affected
firefox83 --- affected
firefox84 --- affected
firefox85 --- affected
firefox86 --- affected
firefox87 --- affected
firefox88 --- affected

People

(Reporter: agi, Assigned: aosmond)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [geckoview][retriggered][stockwell unknown])

Crash Data

We're seeing this bug when opening a WebExtension page on Fenix: https://github.com/mozilla-mobile/android-components/issues/7393

This bug is for crash report bp-7be62153-139b-46e1-b058-0ac310200615.

Top 10 frames of crashing thread:

0 libxul.so RustMozCrash mozglue/static/rust/wrappers.cpp:17
1 libxul.so mozglue_static::panic_hook mozglue/static/rust/lib.rs:89
2 libxul.so core::ops::function::Fn::call src/libcore/ops/function.rs:72
3 libxul.so std::panicking::rust_panic_with_hook src/libstd/panicking.rs:474
4 libxul.so rust_begin_unwind src/libstd/panicking.rs:378
5 libxul.so core::panicking::panic_fmt src/libcore/panicking.rs:85
6 libxul.so core::option::expect_failed src/libcore/option.rs:1203
7 libxul.so webrender_api::resources::ApiResources::update_blob_image gfx/wr/webrender_api/src/resources.rs:183
8 libxul.so webrender_api::resources::ApiResources::update gfx/wr/webrender_api/src/resources.rs:111
9 libxul.so wr_api_send_transaction gfx/wr/webrender_api/src/api.rs:1651
Whiteboard: [geckoview]

I have been unable to reproduce this on my Pixel 2 (Android 9 and 10), OnePlus 6 (Android 10), or Moto G7 Play (Android 9).

Blocking wr-adreno5xx6xx even though this is unlikely device specific, because we will want to fix it before shipping to more devices.

I'll ask on the github ticket whether the reporter could try a custom build with some logging added.

Severity: -- → S2

I got a new phone (a OnePlus 8 Pro) and am now able to reproduce this. Very easily on the official Fenix build, and frustratingly less often on a local build with logging.

A key thing I noticed in my logcat just before the crash is:

[GFX1-]: DataSourceSurface of SharedSurfaces does not exist for extId:8589934603

I see this is also present in all the crash reports. If this warning is logged, then WebRenderBridgeParent::AddSharedExternalImage() will return false, and WebRenderBridgeParent::UpdateResources() will return early. If there were some AddBlobImage commands in the resource list after the AddSharedExternalImage command which fails, then they will not be processed. Then, if in the next display list there are some SetBlobImageVisibleArea commands which refer to those blob images, we will hit this assertion.
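
A toy model of that sequence, to make the ordering problem concrete. This is only a sketch in Python, not the actual WebRenderBridgeParent or ApiResources code: the class ToyResources, the helper replay(), and the key "blob-1" are hypothetical, and the only value taken from the log above is the extId.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResources:
    """Hypothetical stand-in for the compositor-side blob image table."""
    blob_images: set = field(default_factory=set)

    def update_resources(self, commands, mappable_ext_ids):
        """Apply commands in order and stop at the first external image
        that cannot be acquired.

        Returns False on the early exit. Commands that come after the
        failing one in the same list are never applied.
        """
        for kind, key in commands:
            if kind == "AddSharedExternalImage":
                if key not in mappable_ext_ids:
                    return False  # everything after this command is skipped
            elif kind == "AddBlobImage":
                self.blob_images.add(key)
        return True

    def set_blob_image_visible_area(self, key):
        """Model of the failing lookup: an unknown key is a hard error."""
        if key not in self.blob_images:
            raise KeyError("Attempt to update non-existent blob image")


def replay():
    """Replay the suspected sequence; returns (update_ok, crashed)."""
    res = ToyResources()
    first_transaction = [
        ("AddSharedExternalImage", 8589934603),  # surface cannot be acquired
        ("AddBlobImage", "blob-1"),              # never reached
    ]
    ok = res.update_resources(first_transaction, mappable_ext_ids=set())
    try:
        # The next display list still refers to the blob image.
        res.set_blob_image_visible_area("blob-1")
        crashed = False
    except KeyError:
        crashed = True
    return ok, crashed
```

In this model replay() reports that the update failed and that the later visible-area update hit the missing key. If the extId is acquirable, the same command list registers "blob-1" and the lookup succeeds.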

So the question is, why does the call to SharedSurfacesParent::Acquire() fail? From adding some logging, I can see that SharedSurfacesParent::RemoveSameProcess() is being called a couple of milliseconds before the call to Acquire(), from a different thread. I will continue looking into why this is happening, but it's unfamiliar code. Any ideas, Andrew?

Flags: needinfo?(aosmond)
No longer blocks: wr-80
No longer blocks: wr-81

Removing from 82 since this tracks wr-adreno5xx6xx. Doesn't look too serious based on crash rate.

No longer blocks: gfx-82
Crash Signature: [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] → [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api::resources::ApiResources::update_blob_image]
Crash Signature: [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api::resources::ApiResources::update_blob_image] → [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image]

This crash also affects Windows, macOS, and Linux. Fortunately, the crash volume is very low.

bp-113edbb5-3551-435e-983b-6f6550201117

OS: Android → All
Hardware: Unspecified → All
Crash Signature: [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] → [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] [@ mozglue_static::panic_hook]
Duplicate of this bug: 1680736
Crash Signature: [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] [@ mozglue_static::panic_hook] → [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] [@ mozglue_static::panic_hook] [@ std::sys_common::backtrace::__rust_e…

I see 1-3 instances of this crash signature a day in 85 Nightly. I don't witness the crash. The reports are found by the unsubmitted crash report check when Nightly restarts.

All 3 instances from yesterday are for the same Google Sheet. I leave this sheet open in a tab all day and it has been reproducing the "Google Sheet turns black" bug 1674147 (though I don't currently have Software WebRender enabled like the STR in that bug).

Crash Signature: [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] [@ mozglue_static::panic_hook] [@ → [@ core::option::expect_failed | webrender_api::resources::ApiResources::update_blob_image] [@ core::option::expect_failed | webrender::api_resources::ApiResources::update_blob_image] [@ mozglue_static::panic_hook] [@
See Also: → 1674147

There have been 64 failures in the last 7 days

  • 51 failures on linux64-shippable-qr
  • 13 failures on macosx1014-64-shippable-qr
Flags: needinfo?(jmathies)
Whiteboard: [geckoview][retriggered] → [geckoview][retriggered][[stockwell needswork]]
Blocks: gfx-triage
Flags: needinfo?(jmathies)
Flags: needinfo?(aosmond)

Jim, do you know who can work on this? The failure rate doesn't seem to be going down.

Flags: needinfo?(jmathies)

Andrew will take a look.

Flags: needinfo?(jmathies) → needinfo?(aosmond)
Whiteboard: [geckoview][retriggered][][stockwell disable-recommended] → [geckoview][retriggered][][stockwell needswork:owner]
Whiteboard: [geckoview][retriggered][][stockwell disable-recommended] → [geckoview][retriggered][stockwell needswork:owner]
Flags: needinfo?(aosmond)
Flags: needinfo?(aosmond)

Hi Mike, this started on this merge: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=d4b21131a094a0819b8856fbf4165cfb32d671c6

Could it be from bug 1651311? Failures are PROCESS-CRASH | tart | application crashed [@ mozglue_static::panic_hook]

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=325490183&repo=mozilla-central&lineNumber=1860

[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - PROCESS-CRASH | tart | application crashed [@ mozglue_static::panic_hook]
[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - Crash dump filename: /tmp/tmpMMCOg6/profile/minidumps/5f26dffe-9a79-6c96-ddfe-2924ea519825.dmp
[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - Mozilla crash reason: Attempt to update non-existent blob image
[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - Operating system: Linux
[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - 0.0.0 Linux 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64
[task 2020-12-28T11:57:26.650Z] 11:57:26 INFO - CPU: amd64
[task 2020-12-28T11:57:26.651Z] 11:57:26 INFO - family 6 model 94 stepping 3
[task 2020-12-28T11:57:26.651Z] 11:57:26 INFO - 8 CPUs
[task 2020-12-28T11:57:26.651Z] 11:57:26 INFO - GPU: UNKNOWN
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - Crash reason: SIGSEGV /SEGV_MAPERR
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - Crash address: 0x0
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - Process uptime: not available
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - Thread 37 (crashed)
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - 0 libxul.so!RustMozCrash [wrappers.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 17 + 0xa]
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - rax = 0x000055f4ac106440 rdx = 0x00007f92cb7c279a
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - rcx = 0x00007f92cb7c252c rbx = 0x00007f92e28af590
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - rsi = 0x00000000000000c5 rdi = 0x00007f92cb7c2592
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - rbp = 0x00007f92cb7c2580 rsp = 0x00007f92cb7c2580
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - r8 = 0x00007f92cb7c25a6 r9 = 0x00007f92ec4008c0
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - r10 = 0x00007f92cb7c25a6 r11 = 0x000000000000001a
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - r12 = 0x00000000000000c5 r13 = 0x0000000000000025
[task 2020-12-28T11:57:26.652Z] 11:57:26 INFO - r14 = 0x00007f92e05c16f7 r15 = 0x00007f92b7141e50
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - rip = 0x00007f92dfdadbee
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - Found by: given as instruction pointer in context
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - 1 libxul.so!mozglue_static::panic_hook [lib.rs:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 89 + 0x9]
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - rbx = 0x00007f92e28af590 rbp = 0x00007f92cb7c29d0
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2590 r12 = 0x00000000000000c5
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - r13 = 0x0000000000000025 r14 = 0x00007f92e05c16f7
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - r15 = 0x00007f92b7141e50 rip = 0x00007f92ddaaa408
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - 2 libxul.so!core::ops::function::Fn::call [function.rs : 70 + 0xc]
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - rbx = 0x0000000000000001 rbp = 0x00007f92cb7c29e0
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - rsp = 0x00007f92cb7c29e0 r12 = 0x0000000000000001
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - r13 = 0x00007f92e083bcb0 r14 = 0x00007f92e2952248
[task 2020-12-28T11:57:26.653Z] 11:57:26 INFO - r15 = 0x00007f92cb7c2a60 rip = 0x00007f92ddaaa04c
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - 3 libxul.so!std::panicking::rust_panic_with_hook [panicking.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 581 + 0x6]
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - rbx = 0x0000000000000001 rbp = 0x00007f92e28e4ad8
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - rsp = 0x00007f92cb7c29f0 r12 = 0x0000000000000001
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - r13 = 0x00007f92e083bcb0 r14 = 0x00007f92e2952248
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - r15 = 0x00007f92cb7c2a60 rip = 0x00007f92dfb0c7c6
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - 4 libxul.so!std::panicking::begin_panic_handler::{{closure}} [panicking.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 484 + 0x1b]
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - rbx = 0x00007f92cb7c2ad8 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2a60 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92e28af590
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dfb0c3c9
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.654Z] 11:57:26 INFO - 5 libxul.so!std::sys_common::backtrace::__rust_end_short_backtrace [backtrace.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 153 + 0x17]
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - rbx = 0x00007f92cb7c2ad8 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2a90 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92e28af590
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dfb07f08
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - 6 libxul.so!rust_begin_unwind [panicking.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 483 + 0x10]
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - rbx = 0x00007f92cb7c2ad8 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2aa0 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92e28af590
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dfb0c398
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - 7 libxul.so!core::panicking::panic_fmt [panicking.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 85 + 0x6]
[task 2020-12-28T11:57:26.655Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2ad0 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92b29a4bd0
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dfb6a0b1
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - 8 libxul.so!core::option::expect_failed [option.rs:7eac88abb2e57e752f3302f02be5f3ce3d7adfb4 : 1226 + 0x50]
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2b00 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92b29a4bd0
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dfb69c93
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - 9 libxul.so!webrender::api_resources::ApiResources::update_blob_image [api_resources.rs:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 195 + 0x19]
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x00007f92cb7c2c10
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2b60 r12 = 0x00000d4600000009
[task 2020-12-28T11:57:26.656Z] 11:57:26 INFO - r13 = 0x00007f92b75f8700 r14 = 0x00007f92b29a4bd0
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - r15 = 0x0000000000000d46 rip = 0x00007f92dd82b7ae
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - 10 libxul.so!webrender::api_resources::ApiResources::update [api_resources.rs:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 114 + 0x18]
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - rbx = 0x00007f92b726ba00 rbp = 0x00007f92cb7c2e80
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2c20 r12 = 0x00007f92b726ba58
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - r13 = 0x00007f92b726ba0c r14 = 0x00007f92b726bab0
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - r15 = 0x00007f92e05b409c rip = 0x00007f92dd82abe1
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - 11 libxul.so!webrender::render_api::RenderApi::send_transaction [render_api.rs:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 1321 + 0xc]
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - rbx = 0x00007f92c22a0d60 rbp = 0x00007f92cb7c3000
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - rsp = 0x00007f92cb7c2e90 r12 = 0x0000000000004302
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - r13 = 0x00007f92b21df400 r14 = 0x00007f92b29a4ba0
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - r15 = 0x0000000000000001 rip = 0x00007f92dd830a57
[task 2020-12-28T11:57:26.657Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - 12 libxul.so!wr_api_send_transaction [bindings.rs:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 2127 + 0x39]
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - rbx = 0x00007f92b1f1f380 rbp = 0x00007f92cb7c3150
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3010 r12 = 0x0000000000004302
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - r13 = 0x00007f92b21df400 r14 = 0x00007f92b29a4ba0
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - r15 = 0x0000000000000001 rip = 0x00007f92dd631240
[task 2020-12-28T11:57:26.658Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - 13 libxul.so!mozilla::layers::WebRenderBridgeParent::SetDisplayList(mozilla::gfx::RectTyped<mozilla::LayoutDevicePixel, float> const&, mozilla::ipc::ByteBuf&&, mozilla::wr::BuiltDisplayListDescriptor const&, nsTArray<mozilla::layers::OpUpdateResource> const&, nsTArray<mozilla::layers::RefCountedShmem> const&, nsTArray<mozilla::ipc::Shmem> const&, mozilla::TimeStamp const&, mozilla::wr::TransactionBuilder&, mozilla::wr::Epoch, bool, bool) [WebRenderBridgeParent.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 1080 + 0xf]
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - rbx = 0x00007f92cb7c3560 rbp = 0x00007f92cb7c3280
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3160 r12 = 0x0000000000004302
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - r13 = 0x00007f92b21df400 r14 = 0x00007f92cb7c32d8
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - r15 = 0x00007f92cb7c34e4 rip = 0x00007f92daee03ee
[task 2020-12-28T11:57:26.660Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - 14 libxul.so!mozilla::layers::WebRenderBridgeParent::ProcessDisplayListData(mozilla::layers::DisplayListData&, mozilla::wr::Epoch, mozilla::TimeStamp const&, bool, bool) [WebRenderBridgeParent.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 1115 + 0x1c]
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - rbx = 0x00007f92b21df400 rbp = 0x00007f92cb7c3320
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3290 r12 = 0x00007f92cb7c32d8
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - r13 = 0x00007f92cb7c3570 r14 = 0x0000000000000000
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - r15 = 0x0000000000000001 rip = 0x00007f92daee06cf
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.661Z] 11:57:26 INFO - 15 libxul.so!mozilla::layers::WebRenderBridgeParent::RecvSetDisplayList(mozilla::layers::DisplayListData&&, nsTArray<mozilla::layers::OpDestroy>&&, unsigned long const&, mozilla::layers::BaseTransactionId<mozilla::layers::TransactionIdType> const&, bool const&, mozilla::layers::BaseTransactionId<mozilla::VsyncIdType> const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, mozilla::TimeStamp const&, nsTString<char> const&, mozilla::TimeStamp const&, nsTArray<mozilla::layers::CompositionPayload>&&) [WebRenderBridgeParent.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 1162 + 0x12]
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x00007f92cb7c33e0
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3330 r12 = 0x0000000000000009
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - r13 = 0x0000000000000009 r14 = 0x00007f92cb7c34e0
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - r15 = 0x00007f92b21df400 rip = 0x00007f92daee0c5f
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - 16 libxul.so!mozilla::layers::PWebRenderBridgeParent::OnMessageReceived(IPC::Message const&) [PWebRenderBridgeParent.cpp: : 403 + 0x59]
[task 2020-12-28T11:57:26.662Z] 11:57:26 INFO - rbx = 0x00007f92cb7c3498 rbp = 0x00007f92cb7c35f0
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - rsp = 0x00007f92cb7c33f0 r12 = 0x00007f92cb7c3470
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - r13 = 0x00007f92cb7c3490 r14 = 0x00007f92b7fd5188
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - r15 = 0x00007f92b21df400 rip = 0x00007f92dacd6c61
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - 17 libxul.so!mozilla::layers::PCompositorManagerParent::OnMessageReceived(IPC::Message const&) [PCompositorManagerParent.cpp: : 205 + 0xd]
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - rbx = 0x00007f92ad5f81e0 rbp = 0x00007f92cb7c3690
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3600 r12 = 0x0000000000000000
[task 2020-12-28T11:57:26.663Z] 11:57:26 INFO - r13 = 0x00007f92b7fd5180 r14 = 0x00007f92c8260c00
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - r15 = 0x00007f92c8378f00 rip = 0x00007f92debdc28c
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - 18 libxul.so!mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) [MessageChannel.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 2077 + 0x7c]
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - rbx = 0x0000000000000001 rbp = 0x00007f92cb7c39a0
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - rsp = 0x00007f92cb7c36a0 r12 = 0x0000000000000000
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - r13 = 0x00007f92c8260cc0 r14 = 0x00007f92c8378fd0
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - r15 = 0x00007f92c8378f00 rip = 0x00007f92de0f54c9
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - 19 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 1200 + 0x17b]
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - rbx = 0x00007f92b7fd5180 rbp = 0x00007f92cb7c3bd0
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - rsp = 0x00007f92cb7c39b0 r12 = 0x00007f92cf3b8030
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - r13 = 0x00007f92c8260cc0 r14 = 0x00007f92b7fd5120
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - r15 = 0x00007f92c83fe660 rip = 0x00007f92de0ba576
[task 2020-12-28T11:57:26.664Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - 20 libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 302 + 0x2b]
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x00007f92cb7c3c50
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3be0 r12 = 0x0000000000000001
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - r13 = 0x00007f92cb7c3cb0 r14 = 0x00007f92cc060800
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - r15 = 0x00007f92cf3b8030 rip = 0x00007f92deb9b4b8
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - 21 libxul.so!MessageLoop::Run() [message_loop.cc:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 309 + 0xc]
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - rbx = 0x00007f92cb7c3c68 rbp = 0x00007f92cb7c3c90
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3c60 r12 = 0x000000000000000a
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - r13 = 0x00007f92cc18cdc0 r14 = 0x00007ffd8a009ad0
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - r15 = 0x00007f92cf3b8030 rip = 0x00007f92deb79bcf
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - 22 libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 441 + 0x8]
[task 2020-12-28T11:57:26.665Z] 11:57:26 INFO - rbx = 0x00007f92cb7c3cb0 rbp = 0x00007f92cb7c3e80
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3ca0 r12 = 0x000000000000000a
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - r13 = 0x00007f92cc18cdc0 r14 = 0x00007ffd8a009ad0
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - r15 = 0x00007f92cf3b8030 rip = 0x00007f92de9dc8cd
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - 23 libnspr4.so!_pt_root [ptthread.c:74c33e8ce86d9ff26e2bcf402039b436556f97ed : 201 + 0x8]
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - rbx = 0x0000000000000002 rbp = 0x00007f92cb7c3ed0
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3e90 r12 = 0x00007f92ec364d68
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - r13 = 0x00007f92cc18cdc0 r14 = 0x00007f92cb7c4700
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - r15 = 0x00007f92ec3b36c0 rip = 0x00007f92ed6ffe01
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - 24 libpthread.so.0 + 0x76ba
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - rbx = 0x0000000000000000 rbp = 0x0000000000000000
[task 2020-12-28T11:57:26.666Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3ee0 r12 = 0x0000000000000000
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - r13 = 0x00007ffd8a00997f r14 = 0x00007f92cb7c49c0
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - r15 = 0x00007f92cc18cdc0 rip = 0x00007f92ed3856ba
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - Found by: call frame info
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - 25 libc.so.6 + 0x10682d
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - rsp = 0x00007f92cb7c3f80 rip = 0x00007f92ec61682d
[task 2020-12-28T11:57:26.667Z] 11:57:26 INFO - Found by: stack scanning

Flags: needinfo?(mconley)

Whiteboard: [geckoview][retriggered][stockwell disable-recommended] → [geckoview][retriggered][stockwell needswork:owner]

Can we make the crash happen earlier to get a better idea of what's going on?

Flags: needinfo?(aosmond)
Flags: needinfo?(aosmond)
Whiteboard: [geckoview][retriggered][stockwell disable-recommended] → [geckoview][retriggered][stockwell needswork:owner]
Flags: needinfo?(mconley)

I apologize for not getting back sooner. Yes, my patch might have caused this to get worse because it re-enabled TART (which had been disabled in https://hg.mozilla.org/integration/autoland/rev/aa09cdc39dad). That change is likely not the source of the failure, but is causing it to trigger more reliably with the test runs.

Flags: needinfo?(aosmond)

I'm actively investigating this now.

Priority: -- → P1
Assignee: nobody → aosmond
Depends on: 1688144
No longer depends on: 1688144
See Also: → 1688144

Update:

There have been 36 failures within the last 7 days:

  • 7 failures on OS X 10.14 WebRender Shippable opt
  • 29 failures on Linux x64 WebRender Shippable opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=328340339&repo=mozilla-central&lineNumber=2856

Flags: needinfo?(aosmond)

Andrew, are there any updates here?

Flags: needinfo?(aosmond)
See Also: → 1553522

Bug 1621887 might be a similar bug.

See Also: → 1621887

I hit this crash almost every day (in 32-bit Firefox Nightly on Windows 10).

bp-149f1f59-5737-4d7f-82d2-5ed7f0210203

MOZ_CRASH Reason: Attempt to update non-existent blob image

Depends on: 1690821
Depends on: 1690857

I've landed new diagnostic asserts, but the latest crash report from Chris may have shed some new light on the problem:

From the critical log I can see:

[G274][GFX1-]: SSP:Add 176093660498 init (t=218368)
[G275][GFX1-]: DataSourceSurface of SharedSurfaces does not exist for extId:176093660498 (t=218368)

If we fail to map the surface into the compositor process memory space, that would indeed cause the problem we see. The other crash reports don't have that first print, but the buffer often seems to be completely filled, so it might have been truncated (at least in the ones I've looked at).
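
For anyone triaging other reports, here is a small sketch of how the two lines could be paired up automatically. It assumes the critical log lines look like the two quoted above; the regular expressions and the function name failed_mappings are my own guesses based on those two lines only, not a documented log format.

```python
import re

# Patterns inferred from the two quoted log lines (an assumption).
ADD_INIT = re.compile(r"SSP:Add (\d+) init")
MISSING = re.compile(r"does not exist for extId:(\d+)")

SAMPLE = [
    "[G274][GFX1-]: SSP:Add 176093660498 init (t=218368)",
    "[G275][GFX1-]: DataSourceSurface of SharedSurfaces does not exist "
    "for extId:176093660498 (t=218368)",
]


def failed_mappings(log_lines):
    """Return extIds whose 'SSP:Add ... init' line is later followed by a
    'does not exist' line for the same id, in the order they were seen."""
    attempted = set()
    failed = []
    for line in log_lines:
        added = ADD_INIT.search(line)
        if added:
            attempted.add(int(added.group(1)))
            continue
        missing = MISSING.search(line)
        if missing and int(missing.group(1)) in attempted:
            failed.append(int(missing.group(1)))
    return failed
```

On SAMPLE this returns the single extId from the report. A log that only contains the "does not exist" line (for example because the earlier line was truncated out of the buffer) yields an empty list, so an empty result does not rule out a mapping failure.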

Functionally that crash report would be treated as an OOM had it happened in other places.

Taking stock of the aggregated critical log errors paints a different story than what we've seen before. Most of them don't refer to imagelib's shared surfaces; they actually log font issues:

[0][GFX1-]: Failed to initialize shared font list, falling back to in-process list.

|[0][GFX1-]: Failed sanitizing font 331 (t=58.2875)

In particular,

https://crash-stats.mozilla.org/report/index/18ef1b5a-95b1-46b8-83cb-908300210201

|[0][GFX1-]: Failed sanitizing font 5 (t=5331.69) |[1][GFX1-]: TOpAddRawFont failed (t=5331.69)

is fascinating because I would have expected this to result in an IPC_FAIL return from the IPDL protocol ::Recv handler it came in on, as well as an early bail from the UpdateResources call. If we kept receiving messages instead of bailing, that could easily trigger the blob image key missing path.

Okay, so the relationship with imagelib is here:

https://searchfox.org/mozilla-central/rev/f982032c7c7618c626165bb557968f478a1952dc/gfx/layers/wr/WebRenderBridgeParent.cpp#558

If we fail to create the imagekey mapping, we bail early on the UpdateResources. Similar issues with fonts. Probably the blob image creation is done here, and we never get to it.

I think this was coded with the expectation that UpdateResources failing would cause the IPDL instance to be destroyed and possibly the content process to crash, but obviously that didn't happen here despite the IPC_FAIL return value to the IPDL call.

I was too focused on trying to understand how the resource management could have failed on the imagelib side (keeping the surfaces alive, etc.) and did not consider a failure to map the surface into memory. The true cause of the crash (bailing early on UpdateResources, but still continuing on) would have been obvious if I hadn't been fixated on that.

As of this moment, it isn't immediately obvious what the fix is, but I now have a decent grasp of what the most likely cause of the majority of the crashes is at least.

Depends on: 1691016

Bug 1691016 will likely make this crash signature go away, unless there are multiple root causes. It doesn't solve the crash. Solving the crash is difficult given most appear to be process memory space/OOM related. It probably shows up today without WebRender with very different signatures.

Flags: needinfo?(aosmond)

If the crashes in nightly don't go away entirely between the three assert bugs I have landed/will land, then the remaining will require further investigation.

Very interesting -- bug 1691025 tells us there are actually multiple root causes then, because we used a bad key.

Depends on: 1691065

(In reply to Intermittent Failures Robot from comment #45)

44 failures in 3966 pushes (0.011 failures/push) were associated with this bug in the last 7 days.

This is the #22 most frequent failure this week.

** This test has failed more than 150 times in the last 21 days. It should be disabled until it can be fixed. **

Repository breakdown:

  • autoland: 10
  • mozilla-central: 21
  • try: 5
  • mozilla-beta: 8

Platform and build breakdown:

  • macosx1014-64-shippable-qr: 8
    • opt: 8
  • linux64-shippable-qr: 36
    • opt: 36

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1645841&startday=2021-02-01&endday=2021-02-07&tree=all

Please don't disable these tests. Bug 1691065 should have fixed (some of) the issues we see in CI.

Depends on: 1691475
See Also: → 1691309
Whiteboard: [geckoview][retriggered][stockwell disable-recommended] → [geckoview][retriggered][stockwell needswork:owner]

It appears the intermittent test failure is solved now; it went down to 0 after bug 1691475 and bug 1691065 got uplifted.

See Also: → 1691556
No longer blocks: gfx-triage
Priority: P1 → P3

I hit this crash (and similar WR crash signatures) on Windows almost every day. Is there any information you would like about my computer? Or debugging steps you'd like me to take?

bp-913aac17-eabd-4205-879f-aa0b20210223

(In reply to Chris Peterson [:cpeterson] from comment #49)

Your crash reports usually seem to be hitting this code path:

https://searchfox.org/mozilla-central/rev/f47a4b67643b3048ef9a2e2ac0c34edf6d1ebff3/gfx/layers/SourceSurfaceSharedData.cpp#45

We fail to map shared memory in the compositor process, and it later crashes as a result. Most of the time this will be because we are out of address space. The vast majority of crash reports seeing SSP:Add init failures in the gfx critical error log are from 32-bit hosts (Windows x86 and Android ARM), which corroborates this, since we are far more likely to see OOMs there. Also, all of the top signatures with similar gfx critical error logs are OOMs in various places:

https://crash-stats.mozilla.org/search/?graphics_critical_error=~SSP%3AAdd&date=%3E%3D2021-02-16T18%3A05%3A00.000Z&date=%3C2021-02-23T18%3A05%3A00.000Z&_facets=signature&_facets=cpu_arch&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature

Images are particularly vulnerable here because we need to map the images from all the content processes into the compositor process address space in order to use them. Due to file handle limits on some systems, we never unmap the memory until the buffer is released by the content process (and by any short-lived dependencies in the compositor process after that). I don't think Windows has that limit, so I'm not sure if heroics are possible here to unmap images for content processes that aren't currently displayed...
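
As a rough back-of-the-envelope illustration of why 32-bit hosts are hit hardest, the sketch below computes an upper bound on how many decoded images can be mapped at once. It assumes 4 bytes per pixel and that the entire 2^32-byte address space is available for image mappings, which is far more generous than reality. `mapped_bytes` and `max_mappable_images` are hypothetical helpers written for this comment, not Gecko functions.

```rust
/// Bytes of address space needed to map one decoded image,
/// assuming 4 bytes per pixel (an assumption for this sketch).
fn mapped_bytes(width: u64, height: u64) -> u64 {
    width * height * 4
}

/// Upper bound on how many images of the given size fit into an
/// address space of `address_bits` bits, ignoring everything else
/// that lives in the process (code, heaps, fragmentation).
fn max_mappable_images(address_bits: u32, width: u64, height: u64) -> u64 {
    let address_space: u128 = 1u128 << address_bits;
    (address_space / mapped_bytes(width, height) as u128) as u64
}
```

Under those assumptions a 4096x4096 image takes 64 MiB, so `max_mappable_images(32, 4096, 4096)` is 64, and the real limit is lower once everything else in the process is accounted for.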

One question that has come up in discussions with jrmuizel is whether we forward OOM events (virtual memory) to the content processes so that they can free up memory. If that isn't happening, doing so could in theory help alleviate (although not solve) the problem. We already do it from the parent process to content.

Depends on: 1694480

(In reply to Chris Peterson [:cpeterson] from comment #49)

Let me know if bug 1694480 reduces the frequency at all for you. It made it into 20210225215504 and later.

(In reply to Andrew Osmond [:aosmond] from comment #52)

Good news! In the 10 days since comment 52, I have not seen this crash signature once, whereas I used to see it almost every day.

However, I did still hit webrender::resource_cache::ResourceCache::update_image_template (bug 1553522) twice. I don't know if that is related.

bp-78f28399-df80-4cbb-a7c1-dcf9f0210303
bp-24bf14f2-7897-495c-9a2f-71e1e0210304

Flags: needinfo?(aosmond)

(In reply to Chris Peterson [:cpeterson] from comment #53)

Same root cause as for the reports I looked at in comment 50. So it is more or less the same crash, but at a reduced frequency. We need to re-architect how we do image decoding to fix this, but that isn't on any existing roadmap. I don't think there is any low-hanging fruit left to further reduce the crash rate.

Flags: needinfo?(aosmond)
URL: 1699224
See Also: → 1699224