Closed Bug 1238592 Opened 8 years ago Closed 8 years ago

Crash at xul!js::InlineSpaghettiStackIterator<js::jit::MStoreToRecover>::InlineSpaghettiStackIterator<js::jit::MStoreToRecover>+0x00000012)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla47
Tracking Status
firefox44 --- wontfix
firefox45 --- wontfix
firefox46 + fixed
firefox47 + verified
firefox-esr38 --- wontfix
firefox-esr45 46+ verified

People

(Reporter: cbook, Assigned: sunfish)

References

()

Details

(Keywords: crash, sec-moderate, Whiteboard: [adv-main46+][adv-esr45.1+]required to fix bug 1221872 fully)

Attachments

(8 files, 6 obsolete files)

26.17 KB, text/plain
Details
187.93 KB, text/plain
Details
1.68 KB, application/x-javascript
Details
1.54 MB, image/png
Details
973.44 KB, image/png
Details
4.51 KB, patch
Details | Diff | Splinter Review
8.43 KB, patch
Details | Diff | Splinter Review
8.29 KB, patch
nbp
: review+
Details | Diff | Splinter Review
crash found via bughunter and reproduced on a win7 debug build based on todays m-c tip

steps to reproduce:
--> Load http://fashion.aeonsquare.net/tomicakoutu/
--> Crash on Load 

xul!js::InlineSpaghettiStackIterator<js::jit::MStoreToRecover>::InlineSpaghettiStackIterator<js::jit::MStoreToRecover>+0x00000012)
Attached file bughunter crash stack
jan is this something for you again ?
Flags: needinfo?(jdemooij)
Forwarding to nbp because of recover instructions.

FWIW, I think I've seen similar signatures on crash-stats.
Flags: needinfo?(jdemooij) → needinfo?(nicolas.b.pierron)
dupe of bug 1221872 ?
See Also: → 1221872
Group: core-security → javascript-core-security
(In reply to Bob Clary [:bc:] from comment #4)
> dupe of bug 1221872 ?

Probably, I will check if I can reproduce this on my computer.
Ok, I am able to reproduce this issue (unfortunately not under rr yet), the issue is a near null dereference caused by the lack of MResumePoint while lowering the MRegExp instruction.

When assignSafepoint is used, both the mir()->resumePoint() and the lastResumePoint_ are null.  This suggest that the block did not had any MResumePoint.  Only a few code path can produce such basic blocks, either the Split-Edge phase (Bug 1186006) or the Value Numbering (Bug 1221872).

Normally, we have a work-around in the Lowering to propagate the lastResumePoint_ from the previous block, but this does not work if the block has multiple predecessors.

I will try to extract this function and build a shell test case.
"near null dereference" might be safe-ish, depends on how variable the "near" bit is.
Keywords: sec-moderate
Attached file minimal html test case (so far) (obsolete) —
So far I am stuck with this HTML test case, I will try to emulate all the DOM functions, and see if I can build a JS shell test case.

This code is mostly coming from jQuery 1.2.6.
Attached file js shell test case
I can reproduce this issue with this test case on x64 and with a debug build while running with --no-threads.

Note, using --ion-eager, or even --baseline-eager cause the test to spin indefinitely.

The problem seems to be related to the Value Numbering phase. (Bug 1221872)
I will try Bug 1221872 patch tomorrow, and see what is missing to fix the issue.
Flags: needinfo?(nicolas.b.pierron)
Attachment #8707527 - Attachment is obsolete: true
Ok, the problem seems to comes from the fact that Value Numbering is removing blocks as it goes and not as a later phase.

In the following test minimal case (x64 debug build, running --no-threads):


function find(c, unbox, t) {
    if (typeof t != "string") return; // (1)
    while (true) { // (2)
        while (t == "" && !inJit());
        c--;
        if (c == 0) {
            while (unbox) { // (3)
                if (inIon()) return;
            }
        }
    }
};
find(1, {}, '');


What happens for reasons which I yet have to investigate, the condition (1) if infered as always being false.  This leaves the MIRGraph in 2 separated graphs, because of the OSR entry point into (3).  As we have an OSR block, and that it can loop back into the beginning of the while loop (2), we inster a ficup OSR block before the loop (2).  This fixup block is here to keep the domination order clean, and prevents big graphs re-ordering around the OSR block.

Later during Value Numbering, we infer that the condition of (3) will always be true-ish (= object), and thus the exit path of the while loop (1) no longer exists.

The problem here is that the fixup block which was made to keep the messed-up immediate dominator tree caused by the 2 entry points, no longer work as expected and dominates the outer loop (2) which can conditionally go into the inner loop (3) which can only return.

Thus, LICM is able to move invariant code (t == "") in the fixup-block, which does not have any resume point.
Flags: needinfo?(sunfish)
Ok, I have been thinking for a while about this issue and making up test cases.

I think one solution to this problem would be to record all the fixup block that are added to the graph during the Value Numbering phase, and at the end of the Value Numbering iteration, we should:

  1) Recompute the dominator tree(s).
  2) Remove any block which is only dominated by fixup blocks.

Also note that it is possible to have more than one fixup block in the MIRGaph.

I will make a bunch of test cases, now that I am getting familiar with the trickiness of hitting this code path.

(In reply to Nicolas B. Pierron [:nbp] from comment #12)
> What happens for reasons which I yet have to investigate, the condition (1)
> is inferred as always being false.

It is probably being inferred as being false, as we never recorded any call to "typeof" as we only entered the function while executing the interpreter.

I managed to reproduce the same behaviour by using the UCEfault-like pattern, and by calling the replaced function over-and-over to force the inlining of the replaced function.


var f = function () {
  f = function () { return true; };
  return false;
};

function find(c, unbox, t) {
    if (f()) return;
    while (true) {
        while (t == "" && !inJit());
        c--;
        if (c == 0) {
            while (unbox) {
                if (f() && inIon()) return;
            }
        }
    }
};
find(1, {}, '');
Attachment #8709947 - Flags: review?(sunfish)
(not working yet)

I am on a dead end at trying to reverse engineer the current code, in order
to reuse it.  I really do not want to add extra branches in the current
code, as I am not sure to grasp all corner cases.

I undertsand that the initial goal was to do only one traversal of the MIR
Graph, but now I think it would be easier understand and potentially more
efficient if we were to only flag Unreachable basic block as such, instead of
removing them from the graph, at the expense of additional reruns.

Now I am more thinking about re-using the code made for PruneUnusedBranches
second phase, to implement the removal of basic blocks.

Sunfish, as you might know this code better, would you take over this patch?
Attachment #8709952 - Flags: feedback?(sunfish)
Attached patch bug1238592.patch (obsolete) — Splinter Review
Here's a new patch which implements the same idea slightly differently.

In short, after GVN completely done, if any OSR fixup blocks have been created, do a full mark and sweep of the CFG and delete any completely unreachable blocks, other than OSR fixup blocks for reachable loops.

This fixes the reduced testcase, and is a conceptually simpler patch. What do you think?
Flags: needinfo?(sunfish)
Attachment #8710179 - Flags: review?(nicolas.b.pierron)
Attachment #8709952 - Flags: feedback?(sunfish)
Attachment #8709947 - Flags: review?(sunfish)
Assignee: nobody → sunfish
Comment on attachment 8710179 [details] [diff] [review]
bug1238592.patch

Review of attachment 8710179 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good, Thanks :)
Have you tried with the test cases that are attached on this bug?  I am confident this should work but better safe than sorry.

::: js/src/jit/ValueNumbering.cpp
@@ +1121,5 @@
> +        }
> +    }
> +
> +    // And sweep.
> +    return RemoveUnmarkedBlocks(mir_, graph_, numMarked);

RemoveUnamrkedBlocks does not flag the operands of all instructions as having removed uses, which we should do, as otherwise we could make a truncation assuming that we know everything about Math instructions, and not keep a copy of the instruction for the bailout path.

Note this issue would be a correctness issue, but not a safety issue.
Attachment #8710179 - Flags: review?(nicolas.b.pierron) → review+
Attached patch bug1238592.patch (obsolete) — Splinter Review
Carrying forward r+ from nbp. This patch adds a call to FlagAllOperandsAsHavingRemovedUses in RemoveUnmarkedBlocks as requested in the review.
Attachment #8710179 - Attachment is obsolete: true
Attachment #8714145 - Flags: review+
Attached patch bug1238592.patch (obsolete) — Splinter Review
That fix caused a crash on the included testcase. Here's a new patch which also includes a fix to FlagAllOperandsAsHavingRemovedUses to handle the case where a block's entryResumePoint() is null. Rerequesting r? for review of this additional change.
Attachment #8714145 - Attachment is obsolete: true
Attachment #8714146 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8714146 [details] [diff] [review]
bug1238592.patch

Review of attachment 8714146 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/ion/bug1238592.js
@@ +1,4 @@
> +// bug 1238592
> +
> +function find(c, unbox, t) {
> +    if (typeof t != "string") return; // (1)

Note, as this bug affect multiple version we should probably not land the test case at the same time.

::: js/src/jit/IonAnalysis.cpp
@@ +1932,5 @@
>  
> +            // As we are going to remove edges and basic blocks, we have to mark
> +            // instructions which would be needed by baseline if we were to
> +            // bailout.
> +            FlagAllOperandsAsHavingRemovedUses(block);

This function was made assuming that the graph has no removed blocks yet, and is faster if applied in with a PostorderIterator.  I will recommend to make a separate loop only for calling this function, as done the the Prune Branch optimization.

Also note, that if we backport this patch, we might have to include this function as well, as it got added in Gecko a few versions ago.
Attachment #8714146 - Flags: review?(nicolas.b.pierron) → review+
Attached patch bug1238592.patch (obsolete) — Splinter Review
Updated patch to address review feedback: Drop the testcase, and put the call to FlagAllOperandsAsHavingRemovedUses in its own loop using a PostorderIterator.
Attachment #8714146 - Attachment is obsolete: true
Attachment #8715383 - Flags: review+
Carrying forward the r+.
Comment on attachment 8715383 [details] [diff] [review]
bug1238592.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily. It's difficult to create testcases that trigger the OSR fixup block condition in the first place; triggering this bug requires triggering it in a very specific way.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No; the patch no longer contains the testcase.

Which older supported branches are affected by this flaw?

mozilla35 and later.

If not all supported branches, which bug introduced the flaw?

Bug 1029830.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

I don't, but they'd be fairly easy and straightforward to create.

How likely is this patch to cause regressions; how much testing does it need?

Unlikely. Normal treeherder testing is probably sufficient.
Attachment #8715383 - Flags: sec-approval?
Comment on attachment 8715383 [details] [diff] [review]
bug1238592.patch

This is sec-moderate rated so it doesn't need sec-approval to land.
Attachment #8715383 - Flags: sec-approval?
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/bbb2e9a90816 - jsapi tests, https://treeherder.mozilla.org/logviewer.html#?job_id=21187406&repo=mozilla-inbound, at least some shell builds like https://treeherder.mozilla.org/logviewer.html#?job_id=21185086&repo=mozilla-inbound, and, oddly, web-platform-reftest, https://treeherder.mozilla.org/logviewer.html#?job_id=21190544&repo=mozilla-inbound, all agreed on "Assertion failure: pred->numPredecessors() == 0 (OSR fixup block should have no predecessors), at c:/builds/moz2_slave/m-in-w64-d-0000000000000000000/build/src/js/src/jit/ValueNumbering.cpp:1111"
Attached patch relax-assert.patch (obsolete) — Splinter Review
The fix was backed out for causing jsapi test testJitGVN_FixupOSROnlyLoopNested to fail, because there's an over-aggressive assert. This patch fixes it.
Attachment #8716575 - Flags: review?(nicolas.b.pierron)
Attached patch 283247.patchSplinter Review
To avoid confusion, here is a single patch containing both the main fix and the additional over-zealous assert fix.
Attachment #8715383 - Attachment is obsolete: true
Attachment #8716575 - Attachment is obsolete: true
Attachment #8716575 - Flags: review?(nicolas.b.pierron)
Attachment #8716799 - Flags: review?(nicolas.b.pierron)
Attachment #8716799 - Flags: review?(nicolas.b.pierron) → review+
https://hg.mozilla.org/mozilla-central/rev/4504452a90ff
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
need uplift to 45/46 or ?
Flags: needinfo?(sledru)
It depends the risk. We had about 30 crashes with the signature mentioned in the "see also" bug in 44.
Flags: needinfo?(sledru)
Group: javascript-core-security → core-security-release
Depends on: 1252034
Want to request uplift to beta (46)?  It looks like that would also fix bug 1221872.
Flags: needinfo?(sunfish)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> Want to request uplift to beta (46)?  It looks like that would also fix bug
> 1221872.

Note, if we take this patch, we should also backport Bug 1252034 as well.
Isn't bug 1221872 fixed by its own patch?

I don't know how to evaluate the importance of backporting the patch here. If people who do think it's important, I'll backport it.
Flags: needinfo?(sunfish)
Whiteboard: required to fix bug 1221872 fully
Too late for 46, but this will ship with 47.
Comment on attachment 8716799 [details] [diff] [review]
283247.patch

Taking it to esr for fix bug 1221872
Attachment #8716799 - Flags: approval-mozilla-esr45+
I missed that we needed this for bug 1221872 the other day. We should take this for 46 on beta and m-r.
Does the esr patch apply for mozilla-release and mozilla-beta too?
Flags: needinfo?(sunfish)
Comment on attachment 8716799 [details] [diff] [review]
283247.patch

Let's also take this patch for 46 beta and m-r since bug 1221872 appears to need it
Attachment #8716799 - Flags: approval-mozilla-release+
Attachment #8716799 - Flags: approval-mozilla-beta+
Whiteboard: required to fix bug 1221872 fully → [adv-main46+][adv-esr45.1+]required to fix bug 1221872 fully
I don't know of any reason it shouldn't work on release and beta, though I haven't tested it myself.
Flags: needinfo?(sunfish)
Flags: qe-verify+
Under Windows 7 64-bit, with m-c debug build (from 2016-01-11), I've encountered a tab crash (but no option to submit a report), along with an Application Error window - https://goo.gl/hcbLyh.
Verified fixed with Firefox 47 beta 8 (Build ID: 20160523113146) and latest ESR45 tinderbox - build, across platforms - no crash found.

[1] Windows 7 64-bit, Mac OS X 10.10.5 and Ubuntu 14.04 32-bit.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Group: core-security-release
Depends on: 1502013
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: