Closed Bug 1871069 Opened 1 year ago Closed 1 year ago

Crash in [@ core::slice::<T>::copy_from_slice] on Linux

Categories

(Core :: Graphics: WebRender, defect, P3)

Platform: Unspecified / Linux

Tracking


RESOLVED FIXED
123 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox121 --- unaffected
firefox122 --- unaffected
firefox123 --- fixed

People

(Reporter: Sylvestre, Assigned: sotaro)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/2909ff05-38fb-4fc4-9a7c-a60c10231220

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libc.so.6  __memcpy_evex_unaligned_erms  sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:265
1  libxul.so  core::intrinsics::copy_nonoverlapping  library/core/src/intrinsics.rs:2687
1  libxul.so  core::slice::<impl [T]>::copy_from_slice  library/core/src/slice/mod.rs:3619
1  libxul.so  webrender::device::gl::TextureUploader::upload  gfx/wr/webrender/src/device/gl.rs:4693
2  libxul.so  webrender::renderer::upload::upload_to_texture_cache  gfx/wr/webrender/src/renderer/upload.rs:180
2  libxul.so  webrender::renderer::Renderer::update_texture_cache  gfx/wr/webrender/src/renderer/mod.rs:1933
3  libxul.so  webrender::renderer::Renderer::render_impl  gfx/wr/webrender/src/renderer/mod.rs:1478
4  libxul.so  webrender::renderer::Renderer::render  gfx/wr/webrender/src/renderer/mod.rs:1235
5  libxul.so  wr_renderer_render  gfx/webrender_bindings/src/bindings.rs:619
6  libxul.so  mozilla::wr::RendererOGL::UpdateAndRender  gfx/webrender_bindings/RendererOGL.cpp:190
OS: Unspecified → Linux
Summary: Crash in [@ core::slice::<T>::copy_from_slice] → Crash in [@ core::slice::<T>::copy_from_slice] on Linux
Blocks: wr-linux

I can't interpret the assembly in memmove, and the source line points to the start of that function, so it's not obvious which clause we are failing. Based on the copy_nonoverlapping qualifier further up the stack, I wonder whether we are supplying an overlapping data pointer, and whether we could add checks here to ensure that we aren't. Glenn, what do you think?
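For context on the overlap question: `copy_from_slice` panics on a length mismatch before it reaches `copy_nonoverlapping`, and safe Rust cannot produce overlapping `&mut [T]` and `&[T]` slices, so an overlap could only come from a slice built unsafely from a raw pointer (for example with `std::slice::from_raw_parts` over a mapped buffer). An overlapping copy would also more likely corrupt pixels than fault with SEGV_MAPERR. The sketch below shows the kind of debug check being asked about; `checked_copy`, `ranges_overlap` and `CopyCheckError` are hypothetical names, not existing WebRender code.

```rust
/// Hypothetical error type for the pre-copy check.
#[derive(Debug, PartialEq)]
enum CopyCheckError {
    LengthMismatch { dst: usize, src: usize },
    Overlap,
}

/// True if the byte ranges [a, a + a_len) and [b, b + b_len) share an address.
fn ranges_overlap(a: usize, a_len: usize, b: usize, b_len: usize) -> bool {
    // Empty ranges never overlap anything.
    if a_len == 0 || b_len == 0 {
        return false;
    }
    a < b.saturating_add(b_len) && b < a.saturating_add(a_len)
}

/// Check the preconditions of `copy_from_slice` / `copy_nonoverlapping`
/// before copying. Only useful when `src` was built from a raw pointer,
/// since safe Rust already rules out overlapping `&mut` and `&` slices.
fn checked_copy(dst: &mut [u8], src: &[u8]) -> Result<(), CopyCheckError> {
    if dst.len() != src.len() {
        return Err(CopyCheckError::LengthMismatch { dst: dst.len(), src: src.len() });
    }
    let d = dst.as_ptr() as usize;
    let s = src.as_ptr() as usize;
    if ranges_overlap(d, dst.len(), s, src.len()) {
        return Err(CopyCheckError::Overlap);
    }
    dst.copy_from_slice(src);
    Ok(())
}
```

Note that such a check only compares the lengths the slices claim to have; it cannot detect a source whose memory has already been unmapped.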

Severity: -- → S3
Flags: needinfo?(gwatson)
Priority: -- → P3

I think we might be dealing with several different crashes under this signature. The nightly crashes all have in common that they seem to happen right past the end of a buffer, as if they were overflowing. In one case the buffer has a name: anon_inode:i915.gem. Note that the buffer is not always writable; in some crashes it's not accessible but it is mapped, as if something had changed its access permissions either before or right after the crash. This doesn't change the nature of the crash, as the address we're accessing is always one past the end of the buffer.
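To illustrate how a copy can end up reading just past the end of a buffer: a strided upload touches `(height - 1) * stride + row_bytes` source bytes, so code that instead assumes `height * stride` bytes, or trusts a stale length, can overrun a buffer that is sized exactly. Once the source is a slice built from a raw pointer with too large a length, neither indexing nor `copy_from_slice` can catch this, and memcpy simply walks off the mapping. The sketch below is hypothetical and does not reflect what `TextureUploader::upload` actually does; `required_len` and `copy_rows` are illustrative names. It shows how an explicit length check turns such an overrun into an error instead of a fault.

```rust
/// Source bytes a strided upload actually touches: every row except the
/// last spans `stride` bytes, while the last row needs only `row_bytes`.
fn required_len(row_bytes: usize, stride: usize, height: usize) -> Option<usize> {
    if height == 0 {
        return Some(0);
    }
    stride.checked_mul(height - 1)?.checked_add(row_bytes)
}

/// Hypothetical helper: copy `height` rows of `row_bytes` each from a strided
/// `src` into a tightly packed `dst`, refusing to read past the end of `src`.
fn copy_rows(
    dst: &mut [u8],
    src: &[u8],
    row_bytes: usize,
    stride: usize,
    height: usize,
) -> Result<(), String> {
    if stride < row_bytes {
        return Err(format!("stride {} is smaller than row size {}", stride, row_bytes));
    }
    let needed = required_len(row_bytes, stride, height).ok_or("size overflow")?;
    if src.len() < needed {
        return Err(format!("source has {} bytes, upload needs {}", src.len(), needed));
    }
    let packed = row_bytes.checked_mul(height).ok_or("size overflow")?;
    if dst.len() < packed {
        return Err(format!("destination has {} bytes, upload needs {}", dst.len(), packed));
    }
    for row in 0..height {
        let s = row * stride;
        let d = row * row_bytes;
        dst[d..d + row_bytes].copy_from_slice(&src[s..s + row_bytes]);
    }
    Ok(())
}
```

With 3 rows of 4 bytes at a stride of 8, the upload needs 20 source bytes, not 3 × 8 = 24; a 20-byte buffer is exactly enough, and assuming 24 would read 4 bytes past its end.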

See Also: → 1218607

My PC was crashing from this very often. It's random; I can't find any STR. Maybe it's because I had gfx.webrender.all set to true; I set it for some other testing a long time ago and never removed it. Otherwise, I don't know anything that could help. Sometimes it happens while I'm using Firefox, sometimes when I'm not. It doesn't matter which page I'm viewing. Maybe it depends on which sites I've had loaded since it started this evening, but Firefox had been fine while the sun was up ;-).

I have:
NVIDIA RTX 2060
Ryzen 3900X
Linux Mint
Firefox Nightly

gfx.webrender.all made no difference.

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

I think we might be dealing with several different crashes under this signature.

There is definitely a recent nightly regression that is getting mixed up with pre-existing release and ESR volume. Based on nightly volume aggregated by build ID, the recent regression most likely started with build ID 20231219231600, since that is when we started to receive continuous volume:

Rank  Build ID        Crashes  Share
6     20231028092407  1        3.33 %
7     20231124214933  1        3.33 %
8     20231215050055  1        3.33 %
3     20231219231600  3        10.00 %
2     20231220041048  4        13.33 %
5     20231220221923  2        6.67 %
4     20231221052522  3        10.00 %
1     20231221170215  14       46.67 %
9     20231222164453  1        3.33 %
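The percentages in the table above are each build's share of the 30 crashes listed, and the regression-window argument rests on finding where volume becomes continuous. A small sketch of both calculations follows; `CRASHES` holds the table's rows, while `start_of_continuous_volume` is a hypothetical helper that only gives a meaningful answer when fed every nightly build in order, including builds with zero crashes, which the table omits.

```rust
/// (build ID, crash count) rows from the table above, in build-ID order.
const CRASHES: [(u64, u32); 9] = [
    (20231028092407, 1),
    (20231124214933, 1),
    (20231215050055, 1),
    (20231219231600, 3),
    (20231220041048, 4),
    (20231220221923, 2),
    (20231221052522, 3),
    (20231221170215, 14),
    (20231222164453, 1),
];

/// Each build's share of the total crash count, in percent.
fn shares(rows: &[(u64, u32)]) -> Vec<(u64, f64)> {
    let total: u32 = rows.iter().map(|&(_, n)| n).sum();
    if total == 0 {
        return Vec::new();
    }
    rows.iter()
        .map(|&(id, n)| (id, 100.0 * f64::from(n) / f64::from(total)))
        .collect()
}

/// First build of the trailing run in which every build has crash reports.
/// `ordered` must list every nightly build in build-ID order, zero-crash
/// builds included; otherwise the "continuous" run is meaningless.
fn start_of_continuous_volume(ordered: &[(u64, u32)]) -> Option<u64> {
    let mut start = None;
    for &(id, n) in ordered {
        if n == 0 {
            start = None; // a gap breaks the run
        } else if start.is_none() {
            start = Some(id);
        }
    }
    start
}
```

Applied to the table, `shares` reproduces the listed percentages; for example, 14 of 30 crashes is 46.67 %.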

As far as I can tell, the nightly crashes occur because we are passing memcpy a source buffer that is not (or no longer) mapped. Given the call stack, I think that means our update_list contains a texture to upload whose source buffer is not (or no longer) mapped.

These two considerations make the two recent commits from bug 1829026 highly suspicious to me, since they are the only changes between builds 20231219152636 and 20231219231600, and one of them talks about removing waiting texture IDs, which I guess could lead to "no longer mapped". I'm setting it as the regressor, but feel free to change that if it's incorrect.
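The "no longer mapped" hypothesis is a lifetime question: an upload that only holds a raw pointer, or a slice built with `from_raw_parts`, has no say in when its source is released, whereas one that holds an owning handle keeps the data alive even if the bookkeeping entry is removed first. The sketch below models that contrast with `Arc`; `Registry`, `PendingUpload`, `waiting` and `remove_waiting` are hypothetical names, and the model is not meant to reflect how WebRender actually tracks waiting textures or GPU mappings.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical queued upload that owns a reference to its source bytes.
struct PendingUpload {
    id: u64,
    data: Arc<Vec<u8>>,
}

/// Hypothetical bookkeeping: `waiting` maps IDs to source buffers, and
/// `queue` holds uploads that will read those buffers later.
struct Registry {
    waiting: HashMap<u64, Arc<Vec<u8>>>,
    queue: Vec<PendingUpload>,
}

impl Registry {
    fn new() -> Self {
        Registry { waiting: HashMap::new(), queue: Vec::new() }
    }

    fn add(&mut self, id: u64, bytes: Vec<u8>) {
        let data = Arc::new(bytes);
        // The queued upload takes its own strong reference.
        self.queue.push(PendingUpload { id, data: Arc::clone(&data) });
        self.waiting.insert(id, data);
    }

    /// Drops only the registry's reference; queued uploads keep theirs,
    /// so the bytes are freed after the last upload that needs them.
    fn remove_waiting(&mut self, id: u64) {
        self.waiting.remove(&id);
    }

    /// Drain the queue, returning (id, bytes read) for each upload.
    fn flush(&mut self) -> Vec<(u64, usize)> {
        self.queue.drain(..).map(|u| (u.id, u.data.len())).collect()
    }
}
```

Removing the entry from `waiting` before `flush` is harmless here because each `PendingUpload` owns a clone of the `Arc`; with a borrowed raw pointer instead, the same ordering would read freed memory.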

(In reply to Gabriele Svelto [:gsvelto] from comment #2)

The nightly crashes all have in common that they seem to happen right past the end of a buffer, as if they were overflowing. In one case the buffer has a name: anon_inode:i915.gem. Note that the buffer is not always writable; in some crashes it's not accessible but it is mapped, as if something had changed its access permissions either before or right after the crash. This doesn't change the nature of the crash, as the address we're accessing is always one past the end of the buffer.

Maybe what comes after the end of the buffer you see used to be the start of another buffer? (The crashing addresses are always page-aligned.)
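The page-alignment observation fits reading just past the end of a mapping: `mmap` returns page-aligned addresses and maps whole pages, so the first byte after a mapping is always page-aligned and is either unmapped or the start of a neighboring mapping. A minimal sketch, assuming a 4 KiB page size (the usual base page size on x86_64 Linux; real code should query it, for example with `sysconf(_SC_PAGESIZE)`); the helper names are illustrative.

```rust
/// Assumed base page size (4 KiB); not queried from the OS here.
const PAGE_SIZE: usize = 4096;

fn is_page_aligned(addr: usize) -> bool {
    addr % PAGE_SIZE == 0
}

/// Round a requested length up to whole pages, the granularity mmap maps at.
fn round_up_to_page(len: usize) -> usize {
    (len + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// First address past a mapping of `len` requested bytes starting at `base`.
fn one_past_mapping(base: usize, len: usize) -> usize {
    base + round_up_to_page(len)
}
```

For a 5000-byte request, an overread faults at `base + 8192` rather than `base + 5000`, because the rest of the partially used last page is still mapped; small overruns inside that page go unnoticed, which is why the faults show up at page boundaries.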

Keywords: regression
Regressed by: 1829026

Set release status flags based on info from the regressing bug 1829026

Duplicate of this bug: 1871978
Flags: needinfo?(lsalzman)

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:gw, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(gwatson)
Keywords: topcrash
Flags: needinfo?(lsalzman)
Flags: needinfo?(gwatson)

:lsalzman, since you are the author of the regressor, bug 1829026, could you take a look?

Flags: needinfo?(lsalzman)
Flags: needinfo?(lsalzman)
Depends on: 1872522

If the patch at fault is "Bug 1829026 - Remove waiting texture ids if nothing uses them.", then bug 1872522 should fix it. To be clear, it will not fix any crashes here that predate that patch, since this signature clearly also covers a separate cause that predates any of my work, though some of my work might have caused a spike. To keep things simple, I split out bug 1872522 to address just that fix.

It seems like Sotaro's work in bug 1868928 improved this. The builds that include that patch aren't showing up in new nightly reports.

See Also: → 1868928

Sotaro, it really looks like bug 1868928 almost entirely fixes this. Can you guess why?

Flags: needinfo?(sotaro.ikeda.g)

(In reply to Lee Salzman [:lsalzman] from comment #12)

Sotaro, it really looks like bug 1868928 almost entirely fixes this. Can you guess why?

Sorry, I am not sure about the reason.

Flags: needinfo?(sotaro.ikeda.g)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

Keywords: topcrash

We don't have any crashes for this signature in 123 beta. Can we mark this bug as fixed by bug 1868928? Thanks

Status: NEW → RESOLVED
Closed: 1 year ago
Depends on: 1868928
Resolution: --- → FIXED
Assignee: nobody → sotaro.ikeda.g
Target Milestone: --- → 123 Branch