Open Bug 1826257 Opened 2 years ago Updated 3 months ago

Crash in [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow ]

Categories

(Core :: Graphics: ImageLib, defect)

Unspecified
Android
defect

Tracking


Tracking Status
firefox-esr128 --- affected
firefox126 --- wontfix
firefox127 --- wontfix
firefox128 --- affected
firefox129 --- affected
firefox130 --- affected

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: crash, topcrash)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/e158d570-766d-46ee-ae0d-4d5ec0230403

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libc.so  libc.so@0x1abc0  
1  libxul.so  mozilla::image::SurfaceFilter::AdvanceRow  image/SurfacePipe.h:128
1  libxul.so  mozilla::image::SwizzleFilter<mozilla::image::BlendAnimationFilter<mozilla::image::SurfaceSink> >::DoAdvanceRowFromBuffer  image/SurfaceFilters.h:98
2  libxul.so  mozilla::image::SurfaceFilter::AdvanceRow  image/SurfacePipe.h:141
2  libxul.so  mozilla::image::SurfaceFilter::WriteBuffer<unsigned int>  image/SurfacePipe.h:300
2  libxul.so  mozilla::image::SurfacePipe::WriteBuffer<unsigned int>  image/SurfacePipe.h:705
2  libxul.so  mozilla::image::nsPNGDecoder::WriteRow  image/decoders/nsPNGDecoder.cpp:851
3  libxul.so  MOZ_PNG_push_proc_row  media/libpng/pngpread.c
3  libxul.so  MOZ_PNG_proc_IDAT_data  media/libpng/pngpread.c:879
4  libxul.so  MOZ_PNG_push_read_IDAT  media/libpng/pngpread.c:755

I think this might be related to bug 1753060.

See Also: → 1753060
Severity: -- → S3
Crash Signature: [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] → [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow ]
Summary: Crash in [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] → Crash in [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow ]

Bug 1895527 means that the stacks for this crash tend to coalesce on a smaller number of signatures, which I've added. I may have misinterpreted the results from our bit-flip detection heuristic in bug 1753060. It's true that it triggers frequently for some signatures, but the addresses that we detect as potential bit-flips are very close to a very large allocation, so it looks more like we've overflowed a large buffer than an actual bit-flip. We know that the bit-flip detection logic can give false positives in this case, and the Android crashes look even more like overflows because they often happen on a page boundary. We should look into this again.

Crash Signature: [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow ] → [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::ResetToFirstRow] [@ memcpy | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] [@ …

Note: the crash isn't spiking; it's just a signature change, and the volume hasn't changed much over time.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on release

:tnikkel, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(tnikkel)
Keywords: topcrash

Looks like we have some new libc crash signatures on Android starting a few months ago.

Crash Signature: [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::ResetToFirstRow] [@ memcpy | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] [@ … → [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::ResetToFirstRow] [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so@0x52ba0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ lib…
Flags: needinfo?(tnikkel)

I think I may have found something useful for diagnosing the bug. Many comments mention one or more of these three things: watching video, Firefox being very slow (probably due to swapping), and the UI briefly flashing white. This led me to check the contents of the GraphicsCriticalError annotation, and practically all the crashes I've looked at have this error:

CompositorBridgeChild receives IPC close with reason=AbnormalShutdown

So IIUC the GPU process crashed and the crash we're experiencing here is likely fallout from this issue.

Crash Signature: [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::ResetToFirstRow] [@ libc.so@0x1abc0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so@0x52ba0 | mozilla::image::SurfaceFilter::AdvanceRow] [@ lib… → [@ libc.so | mozilla::image::SurfaceFilter::AdvanceRow] [@ libc.so | mozilla::image::SurfaceFilter::ResetToFirstRow] [@ memcpy | mozilla::image::BlendAnimationFilter<T>::DoAdvanceRow] [@ memcpy | mozilla::image::BlendAnimationFilter<T>::DoResetToFirstR…

This seems to have spiked around the time of Fx126 shipping in case that points to anything obvious.

Spiked at exactly the same time as bug 1903810, which apparently only occurs after the GPU process has been disabled ( https://bugzilla.mozilla.org/show_bug.cgi?id=1907135#c2 )

See Also: → 1903810

I think that's probably a coincidence. My assumption is that the spike in bug 1903810 was probably due to a software update. We had previously encountered the same crash on different Samsung devices in bug 1868825, which also required the GPU process to have been disabled.

Comment 6 here linked this bug to a GPU process crash. Could whatever caused that GPU process crash be behind the GPU process getting disabled for bug 1903810?

GPU process crashes will happen for a large variety of reasons, and a single crash will not cause the GPU process to be disabled.

It seems we do have an issue with the GPU process being disabled too frequently, and whilst the app is in the background, so the user doesn't notice. My hunch is that whilst the app is in the background, the process is repeatedly getting launched for some reason, then killed by the OS to free resources. This was probably responsible for the vast majority of cases in bug 1903810. We're tracking that in bug 1907135.

Comment 6 here indicates that users are seeing the GPU process crash whilst the app is in the foreground, so it is likely a genuine crash as opposed to an OS kill. A quick glance at some crash reports in these signatures shows that the launch count is low and that the GPU process is still enabled. So it seems to be related to the GPU process crashing, but not to it being disabled.

Could we perhaps be attempting to write to a shmem that has become invalid following a GPU process crash?
