Closed Bug 1252034 Opened 4 years ago Closed 4 years ago

crash in js::jit::BuildDominatorTree

Categories

(Core :: JavaScript Engine: JIT, defect, critical)

46 Branch
Unspecified
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 - wontfix
firefox48 --- fixed
firefox49 --- fix-optional
firefox-esr45 --- affected
firefox50 --- fix-optional

People

(Reporter: cbook, Assigned: nbp)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

also crashes windows opt nightly builds -> https://crash-stats.mozilla.com/report/index/7e94f420-f73b-4b2d-8abc-b2b6f2160229

nbp, jandem told me you was working on some related last week.
Blocks: 532972
Flags: needinfo?(nicolas.b.pierron)
This issue is more than likely caused by Bug 1221872.
Do we know if this signature is also present in aurora, beta, esr?
Group: javascript-core-security
Flags: needinfo?(cbook)
Flags: needinfo?(cbook)
Philipp, I hear the website this happens on is being heavily advertised in Germany right now, so we may see this pop up on SUMO.
(In reply to Nicolas B. Pierron [:nbp] from comment #2)
> This issue is more than likely caused by Bug 1221872.
> Do we know if this signature is also present in aurora, beta, esr?

beta and aurora opt does not seem to crash, maybe this is because Bug 1238592 was not uplifted so far. Maybe this is a regression or depend somehow as you mentioned on that bug. Trunk Debug Builds just exits on the link of the live-site - no assertion failure or so. Windbg stack etc is on the way
Attached file windbg stack
Ok, I managed to reproduce the issue on a nightly build and under rr.
So far the issue seems to be a null-deref issue caused by the reset of the immediateDominator.

The problem I see is that we have no fixup block before a loop header.  Thus, the number of predecessors of the loopHeader block is 1, which is the backedge of the loop.

When we run IntersectDominators (under ComputeImmediateDominators), the immediate dominator of the loop header is set to its backedge, while its backedge has not been visited yet, thus its immediate dominator is still nullptr.

This issue should not be exploitable.
(rr) bt
[…]
#2 in js::jit::MBasicBlock::removePredecessorWithoutPhiOperands at jit/MIRGraph.cpp:1431
#3 in js::jit::MBasicBlock::removePredecessor at jit/MIRGraph.cpp:1445
#4 in js::jit::RemoveUnmarkedBlocks at jit/IonAnalysis.cpp:1967
#5 in js::jit::ValueNumberer::cleanupOSRFixups at jit/ValueNumbering.cpp:1125
#6 in js::jit::ValueNumberer::run at jit/ValueNumbering.cpp:1217

Ok, the problem seems to be that we removed one of the last non-backedge predecessor of the loop under cleanupOSRFixups (Bug 1238592), which causes the buildDominatorTree failure described in comment 7.
9:35.87 INFO: Last good revision: ccaa937df50d72e8945ccea449bae0b8a6017d49
9:35.87 INFO: First bad revision: 4504452a90fff486e8ffc212f13ce39d31438653
9:35.87 INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ccaa937df50d72e8945ccea449bae0b8a6017d49&tochange=4504452a90fff486e8ffc212f13ce39d31438653

9:36.92 INFO: Looks like the following bug has the changes which introduced the regression:
https://bugzilla.mozilla.org/show_bug.cgi?id=1238592

so yeah its confirmed, regression from bug 1238592
Keywords: regression
Duplicate of this bug: 1252120
This is a null-deref on nightly only.  The null pointer is coming from a reset, and due to earlier mutation of the graph, one block is missing to prevent the intersectDominators function to walk back the backedge.

We can safely open this bug, as this is not a security issue.
Group: javascript-core-security
I'm contemplating fixing this by completely disabling UCE in functions containing OSR entries.
From what I understand, from my debugging session, and from the test case from Bug 1252120, is that we have multiple loops.

An outer loop above which we add a fixup block, a middle loop with nothing special, and an inner loop which has the OSR block.  My guess is that we then remove edges (I have not yet checked the mir graph of the test case) to the backedge of the outer loop, thus making the outer loop dead.

When we clean fixup blocks, we get rid of this one, as it is no longer reachable but are faced with the middle loop which has no fixup blocks, but is reachable through its backedge.

I think we can replace the logic we have today by something which is much simpler.  Today we add fixup blocks only when we are removing edges.  What we can do instead, is always add fixup block to self-dominated loops.  As these loops are self-dominated, the addition of the fixup block should not changed the domination relation, if we remove the backedge of a loop, then this will change the domination relation, but it is safe to keep the fixup block longer and remove it later with the clean-up phase.  Ideally we should remove it when we remove the backedge, but this is should not be a big deal.

What do you think?
Flags: needinfo?(sunfish)
This patch should properly handlea all the cases involving the removal of
basic blocks by eagerly inserting fixup blocks on top of all loops which are
reachable from the OSR entry point.

As we add an extra predecessor to some loop headers, we have to weaken some
of the assertion, only when we are under Value Numbering.

The cleaupOSRFixup function now has to handle extra cases, where the fixup
block was not necessary, and ensure that we remove the fixup block if the
original loop predecessor is still reachable.

I think this simplifies the handling of OSR, as we do not conditionally add
the fixup blocks, but we conditionally remove them when they are not
necessary. Having extra fixup blocks remaining in the graph, should be
caught in debug builds, and be safer in optimized build.
Attachment #8725681 - Flags: review?(sunfish)
Flags: needinfo?(sunfish)
Flags: needinfo?(nicolas.b.pierron)
Blocks: 1238592
Comment on attachment 8725681 [details] [diff] [review]
Value Numbering: Unconditionally generate fixup blocks.

Review of attachment 8725681 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonAnalysis.cpp
@@ +2574,5 @@
>  
>          if (block->isLoopHeader()) {
> +            if (underValueNumberer && block->numPredecessors() == 3)
> +                // Fixup block.
> +                MOZ_ASSERT(block->getPredecessor(1)->numPredecessors() == 0);

This should also MOZ_ASSERT(graph.osrBlock()), which will help ensure that other code doesn't accidentally start depending on it, and will help serve as documentation that this is yet another OSR-induced complication.

::: js/src/jit/MIRGraph.h
@@ +428,5 @@
>          MOZ_ASSERT(numPredecessors() >= 2);
> +        if (numPredecessors() == 2)
> +            return true;
> +        if (numPredecessors() == 3) // fixup block added by ValueNumbering phase.
> +            return getPredecessor(1)->numPredecessors() == 0;

I'm not really happy about this awkward special case here, but I don't currently have any better suggestions.
Attachment #8725681 - Flags: review?(sunfish) → review+
[Tracking Requested - why for this release]:
This bug got introduced by Bug 1238592.
https://hg.mozilla.org/mozilla-central/rev/a794e56f887d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Nicolas, I was looking at the crash-stats on this signature. For 28 days worth of crash data, this has occurred only twice. Based on that data and the size of the patch, I do not feel the need to uplift this to Fx47. Do you agree?
Flags: needinfo?(nicolas.b.pierron)
(In reply to Ritu Kothari (:ritu) from comment #19)
> Nicolas, I was looking at the crash-stats on this signature. For 28 days
> worth of crash data, this has occurred only twice. Based on that data and
> the size of the patch, I do not feel the need to uplift this to Fx47. Do you
> agree?

I guess this set of issues has been in the tree for a long time, and skipping a few versions should not be dramatic.

This patch is a simplification of the current mechanism that we have today. Thus, this would probably be the good first thing to backport if we see again such issues in the future.
Flags: needinfo?(nicolas.b.pierron)
(In reply to Nicolas B. Pierron [:nbp] from comment #20)
> (In reply to Ritu Kothari (:ritu) from comment #19)
> > Nicolas, I was looking at the crash-stats on this signature. For 28 days
> > worth of crash data, this has occurred only twice. Based on that data and
> > the size of the patch, I do not feel the need to uplift this to Fx47. Do you
> > agree?
> 
> I guess this set of issues has been in the tree for a long time, and
> skipping a few versions should not be dramatic.
> 
> This patch is a simplification of the current mechanism that we have today.
> Thus, this would probably be the good first thing to backport if we see
> again such issues in the future.

Thanks! I will resolve this as wontfix for 47. We can reopen it if the crash signature spikes in 47.
Assignee: nobody → nicolas.b.pierron
Version: unspecified → 46 Branch
Crash volume for signature 'js::jit::BuildDominatorTree':
 - nightly (version 50): 3 crashes from 2016-06-06.
 - aurora  (version 49): 7 crashes from 2016-06-07.
 - beta    (version 48): 140 crashes from 2016-06-06.
 - release (version 47): 357 crashes from 2016-05-31.
 - esr     (version 45): 14 crashes from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          1          1          1
 - aurora           1          1          0          2          0          2          0
 - beta            20         19         16         20         18         27          5
 - release         56         55         60         49         52         48         17
 - esr              0          1          2          4          0          0          2

Affected platforms: Windows, Mac OS X, Linux
Volume reduced significantly with Nicolas' patch. Marking fix-optional for 49 and 50. If this spikes please revert.
You need to log in before you can comment on or make changes to this bug.