Closed Bug 1365518 Opened 7 years ago Closed 7 years ago

Assertion failure: exprStackSlots == expectedDepth, at BaselineBailouts.cpp:1058

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 + verified

People

(Reporter: cbook, Assigned: jandem)

References

()

Details

(Keywords: assertion, csectype-wildptr, sec-critical, Whiteboard: [adv-main55-][post-critsmash-triage])

Attachments

(4 files, 2 obsolete files)

Attached file bughunter stack
found via bughunter reproduced on latest tinderbox debug build trunk on windows 7

Steps to reproduce:
-> Load https://plans.thegrid.io/lifetime/
--> Assertion failure: exprStackSlots == expectedDepth, at c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/src/js/src/jit/BaselineBailouts.cpp:1058

filing as s-s just in case
arai, shu: could you take a look at this ?
Flags: needinfo?(shu)
Flags: needinfo?(arai.unmht)
When I load that site I get a crash in jitcode, but no assertion.
I only get the crash if I scroll.
Hm, not having a lot of success debugging this. Here's a brain dump of where I am.

The crash is in jitcode that accesses 0x5b8(0x10($rax)). 0x10($rax), a NativeObject's slots_, is 0x0. 0x5b8 is some slot number.

$rax is a CallObject:

object 0x7fffb55a8880 from global 0x7fffd5e731a0 [Window]
class 0x7fffed194d78 Call
group 0x7fffcabf20d0
flags: delegate varobj
proto null
reserved slots:
   0 (reserved) = <Call object at 0x7fffb556cec0>
   1 (reserved) = <function Wa< (https://plans.thegrid.io/lifetime/ds.js:1) at 0x7fffb55c2bc0>
properties:
    ((js::Shape*) 0x7fffbdd3aa88) enumerate permanent JSString* (0x7fffd9c0cea0) = Latin1Char * (0x7fffd9c0cea8) = "lr"
: slot 2 = <function lr (https://plans.thegrid.io/lifetime/ds.js:1) at 0x7fffb56b7f00>

Let $enclosing = $rax->enclosingEnvironment().

$enclosing is another CallObject with hundreds of slots.

$enclosing->numFixedSlots() is 16.

(0x5b8 / 8) + 16 is 199.

Slot 199 on $enclosing is
    ((js::Shape*) 0x7fffbdd36240) enumerate permanent JSString* (0x7fffd9c10e00) = Latin1Char * (0x7fffd9c10e08) = "te"
: slot 199 = <function te (https://plans.thegrid.io/lifetime/ds.js:1) at 0x7fffb5582fc0>

The function lr above has a

00043:   1  getaliasedvar "te" (hops = 1, slot = 199) # te

So it looks like there's a bug somewhere in codegen for a getaliasedvar not skipping the right number of environments?
The script with the getaliasedvar "te" that's causing crashes is at ds.js:1:281406.
As best as I can tell, getaliasedvar "te" is 0-hop and supposed to 0-hop in the function at ds.js:1:281406. It's getting getting inlined and the environment chain is incorrect.
I've tracked it to this: poly inline dispatch is dispatching to the wrong
inlined function. I don't know how to fix, but this super hacky printf
instrumentation illustrates the problem.

I hacked up MDebugger to take a JSFunction* and print out its script's
toStringStart offset. I make new MDebuggers only when getting aliased vars
named "te" at large slots on scripts whose toStringStart() == 281406.
The MDebugger then prints out at runtime callee->nonLazyScript()->toStringStart().

This usually works, until eventually it dispatches incorrectly. Example output:

>>SHU callee 0x7fffb54b3300 toStringStart 281406
...
>>SHU callee 0x7fffb54b3300 toStringStart 281406
>>SHU callee 0x7fffb54b3f80 toStringStart 281719

Jan, how do I debug this from here?
Attachment #8868880 - Flags: feedback?(jdemooij)
Okay, I'm pretty sure the bug is that for some ObjectGroup group, group->maybeInterpretedFunction() may be != group. I'm not sure why that would be, but what Ion does is:

1. In getPolyCallTargets, find non-singleton targets via observed ObjectGroups' maybeInterpretedFunction().
2. Do poly inline dispatch based on target->group(). But since group isn't guaranteed to be the same roundtripping through maybeInterpretedFunction(), the inline dispatch is just wrong.
(In reply to Shu-yu Guo [:shu] from comment #8)
> Okay, I'm pretty sure the bug is that for some ObjectGroup group,
> group->maybeInterpretedFunction() may be != group. I'm not sure why that
> would be, but what Ion does is:
> 

Err, I meant group->maybeInterpretedFunction()->group() may be != group.
Shu, are you able to get some form of testcase saved that reproduces locally (even if un-minimized)? I'm worried about the site changing out from us before debugging is finished.
I'm afraid I'm busy this weekend and won't be able to look into this shortly.
Flags: needinfo?(arai.unmht)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #10)
> Shu, are you able to get some form of testcase saved that reproduces locally
> (even if un-minimized)? I'm worried about the site changing out from us
> before debugging is finished.

I plan to do that. Should be possible.
I have a saved version that reproduces an asan nullptr crash when scrolling on fedora. Note that bughunter's vms don't have real displays and may not support h264. The assertion was only seen once on ubuntu but the but the other crashes were seen on both fedora and ubuntu in:

js::jit::DoTypeMonitorFallback or js::jit::GetNativeDataPropertyByValue<false> or js::Lambda or js::TypeScript::SetArgument
I haven't come up with a minimal test case, but I'm pretty sure the bug is that due to setting a function's __proto__, it is possible for a function's ObjectGroup |group| to be such that |group->maybeInterpretedFunction()->group() != group|. When this happens, the poly inline dispatch logic in Ion is wrong. Ion's assumption is that for each ObjectGroup in the observed callee typeset at some callsite, ObjectGroup identity is preserved through roundtripping maybeInterpretedFunction: codegen for it automatically calls the last inlined function if all previous ObjectGroup checks fail. Ultimately that means a function will be dispatched to the wrong inline block.

I don't have time to write a full fix tonight; Jan should be back tomorrow, and he can decide if we should band-aid this now or fix this in a more principled fashion.
Flags: needinfo?(shu)
Attached patch Bandaid (obsolete) — Splinter Review
Attachment #8869285 - Flags: review?(jdemooij)
Attached patch BandaidSplinter Review
Attachment #8869286 - Flags: review?(jdemooij)
Attachment #8869285 - Attachment is obsolete: true
Attachment #8869285 - Flags: review?(jdemooij)
Group: core-security → javascript-core-security
Comment on attachment 8869286 [details] [diff] [review]
Bandaid

Review of attachment 8869286 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is a regression from bug 1357680 part 3. In that case this bandaid works but it will disable inlining of functions that had their __proto__ mutated.

I've been wanting to make some changes in this area, so let me see how hard it is to support this.
Attachment #8869286 - Flags: review?(jdemooij)
Comment on attachment 8868880 [details] [diff] [review]
super hacky printf instrumentation

Thanks a lot for debugging this, Shu!
Attachment #8868880 - Flags: feedback?(jdemooij)
Flags: needinfo?(jdemooij)
The testcase below fails with --no-threads, passes with --no-ion:

$ dist/bin/js --no-threads test.js
Error: Assertion failed: got 2, expected 1


function init() {
    foo = () => 1;
    bar = () => 2;
    foo.__proto__ = function() {};
}
function test() {
    var arr = [foo, bar];
    for (var i = 0; i < 10000; i++) {
        assertEq(arr[i % 2](), i % 2 + 1);
    }
}
init();
test();
Only 55 affected, or other branches too?
Attached patch Patch (obsolete) — Splinter Review
This changes the ObjectVector storing the inlining targets to an InliningTargets Vector that stores both the function and the group to guard on.

With this patch I can no longer reproduce the browser crash and it fixes the test in comment 19.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8870332 - Flags: review?(shu)
Attached patch PatchSplinter Review
Attachment #8870332 - Attachment is obsolete: true
Attachment #8870332 - Flags: review?(shu)
Attachment #8870335 - Flags: review?(shu)
Comment on attachment 8870335 [details] [diff] [review]
Patch

Review of attachment 8870335 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!

::: js/src/jit/IonBuilder.cpp
@@ +4011,5 @@
>      return InliningDecision_Inline;
>  }
>  
>  AbortReasonOr<Ok>
> +IonBuilder::selectInliningTargets(const InliningTargets& targets, CallInfo& callInfo, BoolVector& choiceSet,

Nit: >100 columns
Attachment #8870335 - Flags: review?(shu) → review+
https://hg.mozilla.org/mozilla-central/rev/3bfd7a302267
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Version: unspecified → Trunk
Group: javascript-core-security → core-security-release
Whiteboard: [adv-main55-]
Flags: qe-verify+
Whiteboard: [adv-main55-] → [adv-main55-][post-critsmash-triage]
I have managed to reproduce the issue mentioned in comment 0 using Firefox 55.0a1 debug (Build Id:20170517210721) on Windows 10 64 bit, Ubuntu 16.04 64bit and macOS 10.11.6

This issue is no longer reproducible on Firefox 55.0 debug build (Build Id:20170731093223) on Windows 10 64 bit, Ubuntu 16.04 64 bit and macOS 10.11.6.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: