Closed Bug 1489744 Opened Last year Closed Last year

Startup crash in prefs_parser::Parser::get_token

Categories

(Core :: Preferences: Backend, defect, critical)

60 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla64
Tracking Status
firefox-esr60 62+ verified
firefox62 --- wontfix
firefox63 --- verified
firefox64 --- verified

People

(Reporter: philipp, Assigned: njn)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-94518b24-5877-43de-b0e5-b65f50180509.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll std::panicking::rust_panic_with_hook src/libstd/panicking.rs:583
1 xul.dll std::panicking::begin_panic<alloc::string::String> src/libstd/panicking.rs:538
2 xul.dll std::panicking::begin_panic_fmt src/libstd/panicking.rs:522
3 xul.dll core::panicking::panic_fmt src/libcore/panicking.rs:71
4 xul.dll core::panicking::panic_bounds_check src/libcore/panicking.rs:58
5 xul.dll prefs_parser::Parser::get_token modules/libpref/parser/src/lib.rs:701
6 xul.dll prefs_parser::Parser::error_and_recover modules/libpref/parser/src/lib.rs:532
7 xul.dll prefs_parser::prefs_parser_parse modules/libpref/parser/src/lib.rs:144
8 xul.dll Parser::Parse modules/libpref/Preferences.cpp:1119
9 xul.dll mozilla::openPrefFile modules/libpref/Preferences.cpp:3634

=============================================================

this startup crash signature is showing up since firefox 60, so it's looking related to the preference parser rewrite from bug 1423840. the reports are commonly showing up with a moz crash reason field like "index out of bounds: the len is 267 but the index is 267".

the crash is much more prevalent for esr users than on release, which suggests it might be triggered by some config choices that are prevalent in an enterprise environment (also the comment to this report says the crash is going away with a new profile).

this signature is the #2 top browser crash in current data for 60.2.0esr accounting for 3.28% of all browser crashes - though it's hitting individual installations repeatedly (~6 crashes per install).
Flags: needinfo?(n.nethercote)
Unfortunately the stack traces aren't making much sense, particularly the
most important frames within the prefs parser. Perhaps some of the addresses
are incorrect. So I don't know which frames to trust, which makes things
difficult.

For example, frame 7 in the abovementioned crash report is in
`prefs_parser_parse`, and the frame points to this line:

> assert!(buf.last() == Some(&EOF));

That suggests that this assertion is failing. But there are a couple more
frames within the prefs parser below that one, which doesn't make sense. Nor
does it make sense that the crash report error message would be "index out of
bounds". So that frame is probably wrong, and probably should point to the
`parser.parse()` call three lines later.

Similarly, the last stack frame before the bounds error functions points to
this line:

> return info.token;

which doesn't make sense, because there is no slice access on that line. So
that frame is probably wrong, too, and it's not clear to me what the correct
line of code would be.
Assignee: nobody → n.nethercote
Flags: needinfo?(n.nethercote)
Mike: the new prefs parser hasn't exhibited a single known problem since it shipped in FF60. But it's hitting this crash regularly now that ESR is on 60. So there must be something particular about preference files in ESR releases that is different to non-ESR releases. Can you think of anything?

libpref does try to parse any .js file in the "PrfDef" directory, so I wonder if autoconfig.js is involved.

If I could get my hands on a pref file that triggers the crash I'm confident I could diagnose and fix this in about five minutes. Without that, and with the messed-up stack traces described in comment 1, I'm groping around in the dark.
After staring at the code for a while, I have identified one bug: if a malformed token consisting of a single '/' char occurs as the last char of a pref data file, the parser will abort in the same manner as the crashes. E.g. this will trigger it:

> pref("my.pref", 123); /

I don't know how or why multiple ESR users would have trailing '/' chars like this in their pref data files. Also, the code paths involved in this case aren't all that close to the frames in the crashes' stack traces. But it's the only lead I have for now.

I will post my patch once my try run passes. The parser currently has a subtle invariant that must be maintained when dealing with EOFs. This '/' case is the only place I can see that violates the invariant. However, my patch eliminates the invariant, so there's a chance it'll fix the crashes even if they are caused by something other than the '/' problem.
Currently, if a get_char() call returns EOF, the index moves beyond the
buffer's bounds and get_char() cannot be called again without triggering a
panic. As a result, everywhere that encounters an EOF and then does subsequent
parsing ungets the EOF... except there was one place that failed to do that:
the match case for CharKind::Slash in get_token(). This meant that a single '/'
at the end of the input could trigger a bounds violation.

This EOF-unget requirement is subtle and easy to get wrong, so this patch
eliminates it. get_char() now can be called repeatedly after an EOF, and will
return EOF on each subsequent call. This means that some of the existing
unget_char() calls can be removed. Some others are still necessary to get line
numbers correct in error messages, but the outcome of mishandled cases is now
much less drastic -- an incorrect line number in an error message instead of an
abort.

The patch also clarifies a couple of related comments.
Attachment #9007676 - Flags: review?(mh+mozilla)
Mike: I forgot to needinfo you on comment 2.
Flags: needinfo?(mozilla)
I wrote a very simple fuzzer that just throws random ASCII strings at the parser. It found the trailing '/' problem instantly but hasn't found any other problems.
It's interesting how almost 100% of the crashes have the crash reason of "index out of bounds: the len is 267 but the index is 267". I.e. they all have the same length. I wonder if all those files have the same contents, and if so, what those contents might be.
> I wrote a very simple fuzzer that just throws random ASCII strings at the parser.
> It found the trailing '/' problem instantly but hasn't found any other problems.

After making the fuzzer a bit smarter I found one other problem: bug 1490115.
So I'm guess at some point someone put comments in their autoconfig.js file?

What would have happened in that case with the old parser?
Flags: needinfo?(mozilla)
Also, note that I haven't heard any talk of this on the lists...
Comments are handled fine, both // and /* */ styles. This bug only triggers if (a) last char in the file is a '/', and (b) that '/' is the start of a new token.

As for the old parser, it had the curious and dubious and possibly unintentional ability to skip over a wide variety of junk chars (including '/') between valid prefs, so it would have just skipped any trailing '/'.
Attachment #9007676 - Flags: review?(mh+mozilla) → review+
Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5097db630d1f
Fix a bounds violation crash in the prefs parser. r=glandium
Comment on attachment 9007676 [details] [diff] [review]
Fix a bounds violation crash in the prefs parser


[ESR Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: #2 ESR topcrash, and it's a startup crash to boot. I'm not 100% certain that this patch fixes the crash, due to a lack of steps to reproduce and some dodgy stack traces in crash reports. But it's the only explanation I can come up with, and fuzzing corroborates it (see below).

User impact if declined: Firefox won't start for some users, even in safe mode. A new profile must be made to make the problem go away.

Fix Landed on Version: 64

Risk to taking this patch (and alternatives if risky): Low. The change is small, and the code is in Rust. I wrote a small fuzzer to stress the prefs parser. This fuzzer reproduced this crash immediately, and found one other obscure problem (bug 1490115), and nothing beyond that.

String or UUID changes made by this patch: None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.


Beta Approval Request Comment
[Feature/Bug causing the regression]: bug 1423840.

[User impact if declined]: possible startup crashes, though in practice it's only affecting ESR users, for unclear reasons.

[Is this code covered by automated tests?]: yes.

[Has the fix been verified in Nightly?]: not yet.

[Needs manual test from QE? If yes, steps to reproduce]: no.

[List of other uplifts needed for the feature/fix]: none.

[Is the change risky?]: No.

[Why is the change risky/not risky?]: See ESR approval request above.

[String changes made/needed]: None.
Attachment #9007676 - Flags: approval-mozilla-esr60?
Attachment #9007676 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/5097db630d1f
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
If we need to dot release ESR 60.2, I'd consider taking this as a ride-along.
Comment on attachment 9007676 [details] [diff] [review]
Fix a bounds violation crash in the prefs parser

Approving for 63.0b6 for the time-being while we decide what to do for ESR60.
Attachment #9007676 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 9007676 [details] [diff] [review]
Fix a bounds violation crash in the prefs parser

Fixes a common startup crash for ESR60 users migrating from ESR52. Approved for ESR 60.2.1.
Attachment #9007676 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
ESR 60.2.1 clearly shows that this patch was effective in resolving the crashes.
\o/

Thank you for the verification. I'd love to see the specific files that people have that were triggering this crash... they'll now give syntax errors instead of crashing, but right at the end of the file where it doesn't matter, because the preceding data will get processed anyway.
You need to log in before you can comment on or make changes to this bug.