Closed Bug 1404050 Opened 8 years ago Closed 8 years ago

stylo: Crash in cssparser::tokenizer::consume_name on family 6 model 61 stepping 4 (Broadwell-U) with old microcode (<=0x19)

Categories

(Core :: CSS Parsing and Computation, defect, P3)

58 Branch
Unspecified
Windows 8
defect

Tracking

()

RESOLVED WONTFIX
Tracking Status
relnote-firefox --- 57+
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- affected
firefox58 --- affected

People

(Reporter: calixte, Unassigned)

References

Details

(Keywords: crash)

Crash Data

This bug was filed from the Socorro interface and is report bp-1f4e5f2f-7945-4b86-90f7-db36b0170928. ============================================================= There are 39 crashes in nightly 57-58 starting with buildid 20170903100443. There are 4 crashes in beta 57.
Hrm, no clear correlations in the crash reports. It seems to be crashing around next_byte_unchecked, which might suggest we're reading past the end somehow, but it's really hard to say exactly what's going on without analyzing a minidump.
Priority: -- → P3
These crash reports started showing up on september 3rd, so it seems plausible it's a regression from bug 1354989. Henri/jdm, thoughts?
Flags: needinfo?(josh)
Flags: needinfo?(hsivonen)
Crashing in next_byte_unchecked due to reading past the end would be surprising, since we check whether the tokenizer has reached eof right before matching on the next byte: https://github.com/servo/rust-cssparser/blob/bd213ed25eab0fd61094c556a6326fcf1240ad9c/src/tokenizer.rs#L918-L921
Flags: needinfo?(josh)
As Josh said, next_byte_unchecked is "unchecked" in the sense that it returns u8 rather than Option<u8>, but it panics if called at EOF. In that sense it is different from "unchecked" in the standard library where it is bound checking that’s not done, so incorrect use of [T]::get_unchecked could cause a segfault. (Maybe we should rename next_byte_unchecked to something else.) But it is still possible for next_byte_unchecked to segfault if the &str that was originally given to csssparser is not valid. At the FFI boundary, Servo_StyleSheet_FromUTF8Bytes receives (*const u8, usize), trusts them, and uses std::slice::from_raw_parts and std::str::from_utf8_unchecked. So a bug introduced in bug 1354989 is plausible.
The Windows crash location info looks bogus. The Linux crash location info looks to me like the &self of the &str might have gotten corrupted. The call to mozilla::css::Loader::ParseSheet() (which in turn calls Servo_StyleSheet_FromUTF8Bytes()) relies on XPCOM strings converting to Span correctly. Per bug 1403083, it seems that bug 1393230 partially broke the conversions but that change landed a few days later than these crash reports started. Also, it those conversions being broken were the cause, I'd expect things to be much more broken much more clearly. (In reply to Simon Sapin (:SimonSapin) from comment #4) > But it is still possible for next_byte_unchecked to segfault if the &str > that was originally given to csssparser is not valid. At the FFI boundary, > Servo_StyleSheet_FromUTF8Bytes receives (*const u8, usize), trusts them, and > uses std::slice::from_raw_parts and std::str::from_utf8_unchecked. So a bug > introduced in bug 1354989 is plausible. Do we have any knowledge of input that triggers this bug? cargo-fuzz hasn't found cases of encoding_rs producing invalid output, so if the bug is in encoding_rs, it isn't an easy-to-find one.
Flags: needinfo?(hsivonen)
(In reply to Henri Sivonen (:hsivonen) from comment #5) > Do we have any knowledge of input that triggers this bug? cargo-fuzz hasn't > found cases of encoding_rs producing invalid output, so if the bug is in > encoding_rs, it isn't an easy-to-find one. In addition to <link rel=stylesheet> cases that involve encoding_rs, there are stacks that involve <style>, which goes through UTF-16 to UTF-8 conversion provided by XPCOM strings (i.e. encoding_rs not participating).
I have fuzzed encoding_rs more, and no new bugs have been found on the decode code path. (On bug was found on the encode code path, but that's irrelevant to this bug.) Also, a bug in encoding_rs wouldn't explain the <style> case. I guess one thing we could do is temporarily changing std::str::from_utf8_unchecked to UTF-8 validation to see if the timing in coincidental and there's a parser bug that was introduced around the time bug 1354989 landed.
(In reply to Henri Sivonen (:hsivonen) from comment #6) > there > are stacks that involve <style>, which goes through UTF-16 to UTF-8 > conversion provided by XPCOM strings (i.e. encoding_rs not participating). It looks like XPCOM string conversions deal gracefully with bogus UTF-16, so that's not the cause, but that not being the cause makes the <style> crashes all the more interesting.
Also: Why aren't we seeing any Mac crashes?
And why are there crashes on illegal instruction on both Windows and Linux?
(In reply to Henri Sivonen (:hsivonen) from comment #10) > And why are there crashes on illegal instruction on both Windows and Linux? All the Linux crashes have CPU info: family 6 model 61 stepping 4 | 1. All the Windows crashes have CPU info: family 6 model 61 stepping 4 | 4.
(In reply to Henri Sivonen (:hsivonen) from comment #11) > All the Linux crashes have CPU info: family 6 model 61 stepping 4 | 1. > All the Windows crashes have CPU info: family 6 model 61 stepping 4 | 4. It appears that those numbers mean various i3-xxxxU, i5-xxxxU and i7-xxxxU-branded Broadwell-U CPUs. This looks weird and concerning. We have a small number of crashes, but can they all be Broadwell-U by chance?
I sent a PI request about fuzzing encoding_rs on Broadwell-U.
Observations: * This continues to happen only with family 6 model 61 stepping 4 | N where N is 1 on Linux and N is 4 or 2 on Windows. (I don't now what N, i.e. the number after the vertical bar, means.) This family/model/stepping seems to mean Broadwell-U. * No occurrences on Mac, even though "Early 2015" MacBook Pros and MacBook Airs use Broadwell-U. * It looks like we don't collect microcode revision info on Mac. * The highest microcode revision this crash occurs with is 0x19. * Other crash reports for the same family/model/stepping/verticalbar show that there have been many microcode revisions after 0x19. * Bug 1401060 is a panic where the panic message shows an out-of-bounds index whose value doesn't look like it could plausibly arise from a bug in either encoding_rs or in mozilla::Span. I'm confident enough to guess that the explanation is that this is a Broadwell-U CPU bug that was fixed/mitigated in microcode revision 0x1A and later and Macs consistently have new-enough microcode that I'm resolving this as WONTFIX. Users who encounter this bug should unblock system updates so that they get newer Intel microcode via Windows Update or their Linux distro.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Summary: stylo: Crash in cssparser::tokenizer::consume_name → stylo: Crash in cssparser::tokenizer::consume_name on family 6 model 61 stepping 4 (Broadwell-U) with old microcode (<=0x19)
Filed bug 1410841 about adding a SUMO article about microcode updates.
Release Note Request (optional, but appreciated) > [Why is this notable]: Firefox 57+ crash that can be remedied by user action (but not by Mozilla), and the fix will likely make the system less crashy in general. > [Affects Firefox for Android]: No > [Suggested wording]: On Windows and Linux, Firefox crashes occasionally on Intel Broadwell-U processors with old microcode. Windows users should ensure Windows Update is set to install updates. Linux users should ensure that the distribution package for Intel microcode is installed. > [Links (documentation, blog post, etc)]: https://support.mozilla.org/en-US/kb/broadwell-u-microcode/
relnote-firefox: --- → ?
Added to the release notes using Henri's wording. Thanks
You need to log in before you can comment on or make changes to this bug.