Closed Bug 1274887 Opened 8 years ago Closed 5 years ago

Crash in Fold

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1547561
Tracking Status
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox59 --- wontfix

People

(Reporter: n.nethercote, Assigned: jorendorff)

References

Details

(Keywords: crash, csectype-wildptr, sec-high, Whiteboard: [#jsapi:crashes-retriage])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-ae9c8549-8a52-4ae8-b7ab-79fd22160521.
=============================================================

This is currently the #254 topcrash on 46.0, and there have been over 600 occurrences across all versions in the past 7 days. Despite no bug having been filed previously, I suspect it's an old bug because we have crashes in many versions; 26 is the lowest I saw from the past 7 days.

In the cases I looked at, the crash was always on this line:

> switch (pn->getKind()) 

or this line:

> MOZ_CRASH("shouldn't reach here");

but there may be other lines in crash reports I didn't look at.

jorendorff, any ideas?
Flags: needinfo?(jorendorff)
First I tried looking at stacks to (sort of) figure out what kind of grammar causes the crash.

But it happens with all sorts of different grammar. One report crashes in FoldForInOrOf inside a JS try block; another crashes folding JS function call arguments.

I also took a peek at other threads, but in the reports I checked, they weren't doing anything relevant.

Presumably we crash in constant folding because that's the first place where we touch every node of a tree after parsing. The bug is in the parser. It's building a bad tree. Unfortunately the number of ways a parse node could get messed up is effectively infinite.

The most likely thing I can think of, given the variety of crash signatures, is that freeTree is involved - we freed a node that is still pointed-to by some other part of the AST, then re-allocated that node as something else. Even this is not super likely.
Do we have any kind of sanity-checking pass for the AST? And if we did, would it help if the problem was prematurely freed nodes?
(In reply to Nicholas Nethercote [:njn] from comment #2)
> Do we have any kind of sanity-checking pass for the AST?

No. What sort of thing do you have in mind?

I filed bug 1275292 to get drop freeTree. It's a lot of code for not much benefit.
Flags: needinfo?(jorendorff) → needinfo?(n.nethercote)
(In reply to Jason Orendorff [:jorendorff] from comment #3)
> (In reply to Nicholas Nethercote [:njn] from comment #2)
> > Do we have any kind of sanity-checking pass for the AST?
> 
> No. What sort of thing do you have in mind?

I've been wondering implementing detailed sanity checks ("extended assertions") all over Firefox. You'd trigger them manually or programmatically (so fuzzers could request them) and each sub-component would then (probably via nsObserverService) check its own data structure for integrity.

But the AST may be transient, so it might not fit in well with that model.
Flags: needinfo?(n.nethercote)
Yes, it's transient. It's also all used, so if it's buggy you'll either have crashed already, or the bug will be immortalized in a later artifact, like bytecode.

Bytecode (and JSScripts generally) we could sanity-check. Want to file a bug for it?
(In reply to Jason Orendorff [:jorendorff] from comment #5)
> Yes, it's transient. It's also all used, so if it's buggy you'll either have
> crashed already, or the bug will be immortalized in a later artifact, like
> bytecode.

Well, ok, but wouldn't it be nice to try to catch any errors earlier? E.g. we have IR-checking passes in the JITs.

> Bytecode (and JSScripts generally) we could sanity-check. Want to file a bug
> for it?

Sure: bug 1276097.
Crash volume for signature 'Fold':
 - nightly (version 50): 12 crashes from 2016-06-06.
 - aurora  (version 49): 38 crashes from 2016-06-07.
 - beta    (version 48): 1025 crashes from 2016-06-06.
 - release (version 47): 2516 crashes from 2016-05-31.
 - esr     (version 45): 63 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          2          3          2          3          1          0
 - aurora           5          5         14          3          4          3          2
 - beta           192        136        175        161        136        126         44
 - release        398        333        411        398        375        343        132
 - esr             12          3          5          5         12          6          4

Affected platforms: Windows, Mac OS X, Linux
Crash volume for signature 'Fold':
 - nightly (version 51): 1 crash from 2016-08-01.
 - aurora  (version 50): 26 crashes from 2016-08-01.
 - beta    (version 49): 546 crashes from 2016-08-02.
 - release (version 48): 566 crashes from 2016-07-25.
 - esr     (version 45): 105 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        7       9       1
 - beta        200     146      45
 - release     175     140      85
 - esr          13       8      11

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #506
 - aurora  #492      #55
 - beta    #92       #36
 - release #139      #23
 - esr     #581
Crash volume for signature 'Fold':
 - nightly (version 52): 1 crash from 2016-09-19.
 - aurora  (version 51): 1 crash from 2016-09-19.
 - beta    (version 50): 214 crashes from 2016-09-20.
 - release (version 49): 702 crashes from 2016-09-05.
 - esr     (version 45): 142 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly       1       0
 - aurora        1       0
 - beta        152      62
 - release     527     173
 - esr           8      13

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly #651
 - aurora  #1024
 - beta    #194      #43
 - release #193      #39
 - esr     #916
Too late for firefox 52, mass-wontfix.
Blocks: 1429566
https://crash-stats.mozilla.com/signature/?signature=Fold&date=%3E%3D2017-12-01T22%3A59%3A00.000Z&date=%3C2018-01-12T22%3A59%3A58.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

Most of the crashes on Windows are EXCEPTION_ACCESS_VIOLATION_READ with a random address or 0xffffffffffffffff. 

I assume we should make that bug a security bug?
Group: core-security
OS: Windows 7 → All
Hardware: Unspecified → All
(In reply to Henrik Skupin (:whimboo) from comment #12)
> Most of the crashes on Windows are EXCEPTION_ACCESS_VIOLATION_READ with a
> random address or 0xffffffffffffffff. 
> 
> I assume we should make that bug a security bug?

The exception_breakpoint crashes are troubling:  MOZ_CRASH("shouldn't reach here");
https://hg.mozilla.org/releases/mozilla-release/annotate/c548334d172b/js/src/frontend/FoldConstants.cpp#l1919

The couple EXCEPTION_ACCESS_VIOLATION_WRITE are almost certainly vulnerabilities, and the fact that they're at different places (PNK_IF and PNK_NAME) might indicate a more general corruption.

One of the WRITE violations is writing to a stack variable -- how does that happen? Maybe crash-stats is reporting the wrong location, or some tricky optimization is hiding under there.
https://hg.mozilla.org/releases/mozilla-release/annotate/afa87f9be3a8/js/src/frontend/FoldConstants.cpp#l1638

The random read addresses are troubling. The ones I looked at were on the "switch (pn->getKind())" line which would imply the pn structure (or pn itself?) is bad. on Win64 0xffffffffffffffff itself just means crash-stats hit a value that was too big to be a valid pointer; you need to look at the "raw dump" tab to make progress. Sometimes it'll be clear one or more of the register values contains our UAF marker (a clear vulnerability), but not in this case. On 64-bit builds the crashing value (when it's reasonable) is the value of rbx, and when it's 0xffffffffffffffff the value of rbx is a "too large" value (but not the UAF marker). On 32-bit windows the crash address seems to be the rsi address with no offset.

Seems fair to call this a potential vulnerability because something is corrupted.
Group: core-security → javascript-core-security
Priority: -- → P1
Steve: please find an appropriate assignee for this bug (or keep it for yourself if you don't have anyone at the moment). We need an assigned responsible person for severe security bugs.
Assignee: nobody → sdetar
Steve, can you help us find someone to assign this to?
Flags: needinfo?(sdetar)
Jason, do you know who would be a good person to continue making progress on this bug?
Flags: needinfo?(sdetar) → needinfo?(jorendorff)
My current plan is, in order of increasing desperation:
1. Land bug 1275292 and see if that perturbs things.
2. Add a Nightly-only sanity-check pass over the parse tree. This wouldn't be so hard, and it would be more fuzzer fodder.
3. Consider adding more information to the crash report.
Taking, leaving n-i?me.
Assignee: sdetar → jorendorff
Bug 1275292 has landed.
Flags: needinfo?(jorendorff)
Flags: needinfo?(jorendorff)
The problem with messups in Fold is it's too late, sort of -- we have a corrupt ParseNode and don't know anything about its ostensible context.  I would investigate this by changing Fold to return a tristate { Success, Error, Corrupt }, then make the caller, if returned a Corrupt, dump some info about itself into crash data and then crash.

Also bug 1275292 comment 2 casting shade at freeTree being to "aggressively" "verify" has it all wrong.  :-)  This bug -- these crashes -- are *exactly* why that verification is good, 'cause without the freeing would we definitely know there's some problem afoot?  I don't think it's a certainty we would.  The more we take steps that depend upon tree integrity, the more warning we're going to get, earlier, when something goes wrong.
Verification is good. Real verification of parse trees would look like comment 18 step 2, or the `{ Success, Error, Corrupt }` option you propose (since constant folding is optional, I slightly prefer a separate DEBUG-only pass). We can check that ParseNode pointers actually point into the arena, that atoms are real atoms--it sounds good.

freeTree() did not actually "verify" anything, except by guaranteeing that if we *did* use-after-free a ParseNode, it was likely to have been recycled already, and thus likely to lead to undefined behavior. As the bard wrote:

And some kind of help is the kind of help
That helping's all about
And some kind of help is the kind of help
We all can do without

(Shel Silverstein)
Since beta 61, there is no longer as many crashes in beta. (only 1 in 61.0b3)
Maybe we should backport Bug 1275292 to ESR 60 ?
4 crashes on 61 or 62, total, and all are Linux nullptr crashes.  See nbp's comment 23 - should we backport to ESR60?
(Jason, Waldo)?
Flags: needinfo?(jwalden+bmo)
This bug is weird. Our reading of crashstats is that this has never happened frequently in beta. Only in release. We think maybe it happens on a specific web site that basically isn't used by our typical beta user. (URL data does not support this theory, but doesn't rule it out IMO.)

That would mean comment 23 is wrong, there was no change; and we shouldn't expect any change until 61 ships to release. Then we'll see.
Stalled. For FF61, the signature was "static bool Fold". Crash rate hasn't changed.
Crash Signature: [@ Fold] → [@ Fold] [@ static bool Fold]
Flags: needinfo?(jwalden+bmo)
Keywords: stalled
Whiteboard: #jsapi:crashes-retriage]
Whiteboard: #jsapi:crashes-retriage] → [#jsapi:crashes-retriage]
I'll note (to comment 25) that it's definitely hitting in beta (at about 1/10th of release crash rate); and even a 64a1 crash already.

This has morphed into bug 1547561. The signatures here only occur prior to 67.

Duping forward to the new bug, which has recent activity.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(jorendorff)
Resolution: --- → DUPLICATE
Group: javascript-core-security
You need to log in before you can comment on or make changes to this bug.