Closed Bug 1415151 Opened 2 years ago Closed 1 year ago

Crash in av1_temporal_filter_apply_sse2 (but appears to be an OOM in rust code)

Categories

(Toolkit :: Crash Reporting, defect, critical)

Unspecified
Windows 10
defect
Not set
critical

RESOLVED WONTFIX

People

(Reporter: marcia, Unassigned)

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-2ac57400-3b59-478a-931d-e9ff80171014.
=============================================================

Seen while looking at nightly crash data: http://bit.ly/2Aq7EuT. 48 crashes across 20 installations.

One comment:

Firefox Nightly 58 64-bit is the only Firefox I use, and I'm the only one who uses it at all. I only use it on an HP laptop running the Windows 10 Creators Update. This is the first time I crashed on the URL today. I'm still using the settings from WebRender newsletter #5.
This looks like a stylo crash.
Component: Graphics: WebRender → CSS Parsing and Computation
(Note that although the user comment quoted in comment 0 indicates WebRender was probably enabled, this crash happens without WebRender as well, because the crash report linked in comment 0 has "WR? WR-" in the app notes which means WebRender is off)
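The app-notes check described above can be sketched roughly as follows. This is only an illustration: the function name `webrender_reported_off` is hypothetical, and splitting the notes on whitespace is an assumption about how the field is laid out. The only fact taken from this bug is that "WR? WR-" in the app notes means WebRender is off.

```rust
/// Hypothetical helper: returns true when the app notes contain the
/// "WR-" marker, which per the comment above means WebRender is off.
/// Assumption: markers in the app notes are whitespace-separated.
fn webrender_reported_off(app_notes: &str) -> bool {
    app_notes.split_whitespace().any(|token| token == "WR-")
}
```

Calling it on the string quoted above, `webrender_reported_off("WR? WR-")`, returns true; notes without the marker return false.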
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #2)
> (Note that although the user comment quoted in comment 0 indicates WebRender
> was probably enabled, this crash happens without WebRender as well, because
> the crash report linked in comment 0 has "WR? WR-" in the app notes which
> means WebRender is off)

So this looks like a crash in rust code, but I can't make any sense of the stack traces at all.

Here's one that doesn't look stylo related, for example: https://crash-stats.mozilla.com/report/index/a24c752c-6e44-4f0b-960a-d664a0171107
So there are a handful of them in stylo code and a handful of others in WR code, so I suspect neither stylo nor WR is to blame per se, but something deeper...

Kats, have you seen anything like this before?
Flags: needinfo?(bugmail)
Hm, interesting. I haven't seen one like this before. CC'ing Gankro and Jeff in case they have any thoughts on the WR stack.

It looks like the crashes are all on Windows, all on 58.0 nightly, and started on the 20171014100219 buildid. I believe that gives us a regression range of https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a31334a65a1c&tochange=d43c1c0fa038 which includes the rust update to 1.20.0.
Flags: needinfo?(bugmail)
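For readers unfamiliar with build IDs, here is a minimal sketch of how the first affected build ID can be turned into a date when looking for a regression range. It assumes the 14-digit ID is laid out as YYYYMMDDhhmmss; the `BuildTime` type and `parse_build_id` function are hypothetical names used only for illustration.

```rust
/// Calendar fields decoded from a 14-digit build ID.
/// Assumption: the layout is YYYYMMDDhhmmss.
#[derive(Debug, PartialEq)]
struct BuildTime {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
}

/// Hypothetical helper: splits a build ID into its date and time fields.
/// Returns None if the ID is not 14 ASCII digits or a field is out of range.
fn parse_build_id(id: &str) -> Option<BuildTime> {
    if id.len() != 14 || !id.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // All bytes are ASCII digits, so slicing by byte offset is safe here.
    let field = |range: std::ops::Range<usize>| id[range].parse::<u16>().ok();
    let time = BuildTime {
        year: field(0..4)?,
        month: field(4..6)? as u8,
        day: field(6..8)? as u8,
        hour: field(8..10)? as u8,
        minute: field(10..12)? as u8,
        second: field(12..14)? as u8,
    };
    // Coarse range check only; it does not validate days per month.
    let plausible = (1..=12).contains(&time.month)
        && (1..=31).contains(&time.day)
        && time.hour < 24
        && time.minute < 60
        && time.second < 60;
    if plausible {
        Some(time)
    } else {
        None
    }
}
```

Under that assumption, `parse_build_id("20171014100219")` yields 2017-10-14 10:02:19, which matches the date embedded in the crash ID from comment 0.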
Also FWIW the av1_temporal_filter_apply_sse2 symbol comes from http://searchfox.org/mozilla-central/rev/7e090b227f7a0ec44d4ded604823d48823158c51/third_party/aom/av1/encoder/x86/temporal_filter_apply_sse2.asm#28 and doesn't seem to have been touched recently. I also don't see any aom library updates around the time the crashes started, so it's probably unrelated to the aom library and that symbol is just getting randomly blamed.
Ah! It looks like the "OOM | unknown | alloc::oom::oom | style::properties::StyleStructRef<T>::mutate<T>" signature stopped at around the same time - see the buildid list at [1]. This OOM also affected windows only, but goes back to 56 and 57 as well. So most likely the signature just morphed as a result of the rust compiler update.

[1] https://crash-stats.mozilla.com/signature/?version=58.0a1&signature=OOM%20%7C%20unknown%20%7C%20alloc%3A%3Aoom%3A%3Aoom%20%7C%20style%3A%3Aproperties%3A%3AStyleStructRef%3CT%3E%3A%3Amutate%3CT%3E&date=%3E%3D2017-08-07T02%3A22%3A04.000Z&date=%3C2017-11-06T23%3A22%3A04.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#aggregations
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #7)
> Ah! It looks like the "OOM | unknown | alloc::oom::oom |
> style::properties::StyleStructRef<T>::mutate<T>" signature stopped at around
> the same time

Are you saying this is a crash reporter issue? Correctly reporting this crash as an OOM (if it's actually an OOM) sounds like the bug here. Pointing at random assembly sources is very bad (if it's a reporting issue) but is catastrophic if Firefox executes a random instruction on OOM.
Flags: needinfo?(bugmail)
I suspect that this is a case of the former (it's a reporting issue) rather than Firefox actually executing a random instruction. The rust folks might have a better handle on what's going on here. Moving to crash reporting per Ted's suggestion on IRC.
Component: CSS Parsing and Computation → Crash Reporting
Flags: needinfo?(bugmail)
Product: Core → Toolkit
Summary: Crash in av1_temporal_filter_apply_sse2 → Crash in av1_temporal_filter_apply_sse2 (but appears to be an OOM in rust code)
The rest of the stack of the crash in comment 0 looks pretty sensible, so I don't think it's completely broken or anything, just frame 0 is labeled wrong somehow. Clicking through to the source link in frame 1 (I wouldn't recommend it, properties.rs is *huge*) shows that that line is:
            StyleStructRef::Vacated => panic!("Accessed vacated style struct")

So this is an intentional panic in stylo code. Rust panics do manifest as `EXCEPTION_ILLEGAL_INSTRUCTION`, so that's sensible. The only issue is why the top frame is mislabeled. This could be due to an issue with Rust debug symbols, certainly.
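To illustrate the pattern behind that line, here is a simplified sketch. It is not the real stylo type: this `StyleStructRef` has only two variants, and the `take` method and the `mutate_after_take_panics` function are hypothetical. Only the `Vacated` arm and its panic message come from the line quoted above. The sketch uses the default unwinding behaviour so the panic can be observed with `catch_unwind`; how a panic surfaces in a crash report depends on how the binary was built.

```rust
use std::panic;

/// Simplified, hypothetical stand-in for stylo's `StyleStructRef`.
enum StyleStructRef<T> {
    Owned(T),
    Vacated,
}

impl<T> StyleStructRef<T> {
    /// Returns a mutable reference to the value, panicking if the slot
    /// has already been vacated (same message as the quoted line).
    fn mutate(&mut self) -> &mut T {
        match self {
            StyleStructRef::Owned(value) => value,
            StyleStructRef::Vacated => panic!("Accessed vacated style struct"),
        }
    }

    /// Hypothetical helper: moves the value out and leaves `Vacated` behind.
    fn take(&mut self) -> StyleStructRef<T> {
        std::mem::replace(self, StyleStructRef::Vacated)
    }
}

/// Returns true if mutating a vacated slot panicked.
fn mutate_after_take_panics() -> bool {
    panic::catch_unwind(|| {
        let mut slot = StyleStructRef::Owned(1u32);
        let _taken = slot.take();
        // The slot is now Vacated, so this call hits the panic arm.
        *slot.mutate() += 1;
    })
    .is_err()
}
```

In this sketch, mutating an `Owned` slot works normally, while `mutate_after_take_panics()` returns true because the slot was vacated before the second access.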
Duplicate of this bug: 1404633
Closing because no crashes have been reported in the last 12 weeks.
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX