Closed Bug 1343723 Opened 3 years ago Closed 3 years ago

Crash [@ js::jit::MachineState::read] involving Promise

Categories

(Core :: JavaScript Engine: JIT, defect, P1, critical)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 + fixed
firefox55 + verified

People

(Reporter: gkw, Assigned: tcampbell)

References

(Blocks 2 open bugs)

Details

(5 keywords, Whiteboard: [jsbugmon:])

Crash Data

Attachments

(2 files, 1 obsolete file)

The following testcase crashes on mozilla-central revision e91de6fb2b3d (build with --enable-debug --enable-more-deterministic, run with --fuzzing-safe --no-threads --ion-eager):

// jsfunfuzz-generated
oomTest(function() {
    // Adapted from randomly chosen test: js/src/tests/ecma_6/Promise/iterator-close.js
    class MyPromiseStaticResolveGetterThrows extends Promise {}
    (function() {
        for (y of [0]) {
            let x;
            z;
            (function() {
                return x;
            })();
        }
    })()
})


Backtrace:

#0  js::jit::MachineState::read (reg=..., this=0x7ffeefca51d0) at js/src/jit/Registers.h:242
#1  js::jit::SnapshotIterator::fromRegister (this=0x7ffeefca5020, reg=...) at js/src/jit/JitFrameIterator.h:418
#2  js::jit::SnapshotIterator::allocationValue (this=this@entry=0x7ffeefca5020, alloc=..., rm=rm@entry=js::jit::SnapshotIterator::RM_Normal) at js/src/jit/JitFrames.cpp:1855
#3  0x0000000000811c99 in js::jit::SnapshotIterator::read (this=this@entry=0x7ffeefca5020) at js/src/jit/JitFrameIterator.h:540
#4  0x0000000000e6fbfb in InitFromBailout (caller=..., ionScript=<optimized out>, excInfo=0x7ffeefca55c0, callPC=<synthetic pointer>, nextCallee=..., startFrameFormals=..., builder=..., invalidate=true, iter=..., script=..., fun=..., callerPC=0x0, cx=0x7fc29a250000) at js/src/jit/BaselineBailouts.cpp:725
#5  js::jit::BailoutIonToBaseline (cx=0x7fc29a250000, activation=0x7ffeefca5db0, iter=..., invalidate=invalidate@entry=true, bailoutInfo=bailoutInfo@entry=0x7ffeefca5148, excInfo=excInfo@entry=0x7ffeefca55c0) at js/src/jit/BaselineBailouts.cpp:1620
/snip

For detailed crash information, see attachment.
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/039475e39939
user:        Jeff Walden
date:        Thu Feb 09 17:12:56 2017 -0800
summary:     Bug 1341951 - Use alignas/alignof (rather than a union that attempts to replicate the same thing) inside RInstructionStorage.  r=nbp

Waldo, is bug 1341951 a likely regressor?
Flags: needinfo?(jwalden+bmo)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #2)
> autoBisect shows this is probably related to the following changeset:
> 
> The first bad revision is:
> changeset:   https://hg.mozilla.org/mozilla-central/rev/039475e39939
> user:        Jeff Walden
> date:        Thu Feb 09 17:12:56 2017 -0800
> summary:     Bug 1341951 - Use alignas/alignof (rather than a union that
> attempts to replicate the same thing) inside RInstructionStorage.  r=nbp
> 
> Waldo, is bug 1341951 a likely regressor?

Gary, does this reproduce with Bug 1341951 comment 12?
Group: javascript-core-security
Flags: needinfo?(gary)
I had made sure to retest *after* that patch landed, so yes.
Flags: needinfo?(gary) → needinfo?(nicolas.b.pierron)
That MachineState in the stack is filled with uninitialized regs_.  (Or in debug builds, regs_ initialized to 0x100, 0x101, et seq.)  In a build without bug 1341951's changes, those fields are filled in by a prior js::jit::Bailout call, if memory serves from a quick rr session.  I ran out of time to do a full comparison between pre-change and post-change.  This is probably better handled by nbp.
Flags: needinfo?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5)
> In a build
> without bug 1341951's changes, those fields are filled in by a prior
> js::jit::Bailout call

This is the correct behaviour which is expected, as these fields are supposed to be filled by the Trampoline function which dump all registers on the stack, and then this structure is interpreted as being a MachineState.

I have no idea how Bug 1341951 is supposed to change the way the trampoline behaves, nor the way the MachineState is read.
Component: JavaScript Engine → JavaScript Engine: JIT
Priority: -- → P1
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 5fe5dcf1c10a).
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:bisectfix]
Whiteboard: [jsbugmon:bisectfix] → [jsbugmon:]
JSBugMon: Fix Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/f98a326bcf8d
user:        Jan de Mooij
date:        Mon Mar 20 14:00:33 2017 +0100
summary:     Bug 1328140 - Improve handling of IC failures, add megamorphic IC stubs. r=h4writer

This iteration took 264.685 seconds to run.
Jan, is bug 1328140 a likely fix, or did it merely mask the issue?
Flags: needinfo?(jdemooij)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #9)
> Jan, is bug 1328140 a likely fix, or did it merely mask the issue?

It's unlikely, the bug is probably still there.
Flags: needinfo?(jdemooij)
Sounds potentially exploitable, so I'll mark this high.
Flags: needinfo?(jdemooij)
I think this is a regression from bug 1273858.

LexicalEnvironmentObject::create throws an OOM exception. Then we end up in the exception handler and do a bailout to Baseline. The bailout reads the environment Value:

  Value v = iter.read();

However the snapshot expects it to be in eax, which is the return value of the LexicalEnvironmentObject::create call. eax is not included in the safepoint so it's not initialized in the MachineState.

This happens because in IonBuilder::jsop_pushlexicalenv (and jsop_copylexicalenv) we do:

    current->add(ins);
    current->setEnvironmentChain(ins);

    return resumeAfter(ins);

The resumepoint/snapshot captures the environment chain and that's not compatible with the exception handler (because it also uses the snapshot).

The simplest way to fix it might be for these bytecode ops to push the environment chain on the stack, and then have a separate JSOp to set the env slot (the current->setEnvironmentChain(ins) part). That will fix the bug because when we're doing a bailout for try-catch we ignore stack slots.

Ted, what do you think?
Flags: needinfo?(tcampbell)
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(jdemooij)
I'm also wondering if the fix for bug 1354275 is incomplete, as MNewArray for instance doesn't have a resumepoint but can throw on OOM... Would the separate bytecode op fix these issues too?
Jan, I agree with your assessment that we should consider splitting JSOP_PUSHLEXICALENV into two opcodes. The first being a rough analogue to JSOP_NEWOBJECT, and the second being the infallible update of environment slot.

I also wonder if prologue env slots risks this problem too, but am not clear right now on that.

Between PTO/Easter/Weekend, I'm off until Monday. I'll give it some more thought then. The resume points stuff is new to me and between this bug and bug 1354275, I'm still missing some of the nuance.
Flags: needinfo?(tcampbell)
Assignee: nobody → tcampbell
Ted, what's the status on this bug?
Flags: needinfo?(tcampbell)
I should resolve it this week. The fix Jan proposed likely will work, I'm just trying to understand the problem a little better since it is easy to fix the testcase without fixing the underlying problem.
Okay, I think I have a fix that doesn't require adding extra opcodes.

In this patch, I remove the |resumeAfter| and instead roll back to previous snapshot and previous $ENV. This should result in OOM appearing to fire earlier, but this effect should not be observable. As mentioned above, MNewArray already can OOM and doesn't have a |resumeAfter|. This behavior is more correct anyways as the old environment is recovered instead.

===

When comparing the generated LIR/regalloc/safepoints for before and after this patch, I see the same result. Only the recovery data is different, because we resume to last snapshot.

I don't understand why this case is different than the typical..

> ins = MSomeOpThatCanOOM()
> current->add(ins)
> current->push(ins)
> resumeAfter(ins)

The |current->setEnvironmentChain| has the same effect as |current->push| in that it updates |MBasicBlock::slots_|.
Flags: needinfo?(tcampbell)
Attachment #8869648 - Flags: review?(jdemooij)
Attachment #8869648 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8869648 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

Review of attachment 8869648 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Maybe add a testcase that fails without the patch? Follow-up patch is fine if this is security sensitive.
Attachment #8869648 - Flags: review?(jdemooij) → review+
Comment on attachment 8869648 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

Review of attachment 8869648 [details] [diff] [review]:
-----------------------------------------------------------------

Nice and simple, Thanks!
I guess the problem is inherent to the scope chain, as we are trying to read-it when we go in the exception handler, right? So we won't have similar issues with throwing calls which require a resumeAfter call?
Attachment #8869648 - Flags: feedback?(nicolas.b.pierron) → feedback+
Automated testcase is a little tricky since it depends on register allocation. I'm not sure how to write it. I was able to confirm patch against specific revision the fuzzbug was reported against.
Comment on attachment 8869648 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
- Requires control of regalloc and OOM at specific VM call in order for recover to load uninitialized memory as an object which is then used to exploit. Would require intimate understanding of recovery mechanism to see problem from patch.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- Bug number in commit message.
- Fix does not make clear the chain of events needed to trigger the risky state (MachineState structure corruption).

Which older supported branches are affected by this flaw?
- ff54
- ff55

If not all supported branches, which bug introduced the flaw?
- Introduce by bug 1273858 (ff54)

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- The code hasn't been changed since bug introduced. We should probably backport to ff54.

How likely is this patch to cause regressions; how much testing does it need?
- The original feature has given a few rare crashes since lexical environments in Ion have been tricky. Risk of this patch to stability is low, and risk to security should be much lower than it currently is without the fix.
Attachment #8869648 - Flags: sec-approval?
sec-approval+.
Please nominate a patch for Beta (54) as well.
Attachment #8869648 - Flags: sec-approval? → sec-approval+
Comment on attachment 8869648 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

Approval Request Comment
[Feature/Bug causing the regression]:
- Introduced by bug 1273858 (ff54)
[User impact if declined]:
- This is a security risk
[Is this code covered by automated tests?]:
- Code is hit by existing tests
- There is no automated test for specific sec-bug
- Test cases for specific issue is fragile and hard to automate, also sec risk
- Tested against fuzzbug for specific revision. Also relevant tests run locally against master.
[Has the fix been verified in Nightly?]:
- Not landed yet.
[Needs manual test from QE? If yes, steps to reproduce]: 
- No.
[List of other uplifts needed for the feature/fix]:
- None. Patch is complete.
[Is the change risky?]:
- Low-med
[Why is the change risky/not risky?]:
- This patch is an improvement over initial feature and has had more scrutiny. Risk is there is a reduced window for fuzzers and this feature has had a few fuzzbugs flush out edge cases in past. Still seems worth doing.
[String changes made/needed]:
- No.
Attachment #8869648 - Flags: approval-mozilla-beta?
Keywords: checkin-needed
More failures:
https://treeherder.mozilla.org/logviewer.html#?job_id=101411346&repo=mozilla-inbound

You'll probably want to give this a run through Try to ensure that any other issues are sorted too.
Well, that's embarrassing.

Added missing alias analysis case. Forwarding r+ from previous version. Debug checks now pass locally for me.
Attachment #8869648 - Attachment is obsolete: true
Attachment #8869648 - Flags: approval-mozilla-beta?
Flags: needinfo?(tcampbell)
Attachment #8870658 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/9d23e7a1da5e

Please request Beta approval on this as soon as you can.
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(tcampbell)
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment on attachment 8870658 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

Approval Request Comment
[Feature/Bug causing the regression]:
- Introduced by bug 1273858 (ff54)
[User impact if declined]:
- This is a security risk
[Is this code covered by automated tests?]:
- Code is hit by existing tests
- There is no automated test for specific sec-bug. Test cases for specific issue is fragile and hard to automate
- Tested against fuzzbug for specific revision. Also relevant tests run locally against master.
[Has the fix been verified in Nightly?]:
- Yes.
[Needs manual test from QE? If yes, steps to reproduce]: 
- No.
[List of other uplifts needed for the feature/fix]:
- None. Patch is complete.
[Is the change risky?]:
- Low-med
[Why is the change risky/not risky?]:
- This patch is an improvement over initial feature and has had more scrutiny. Risk is there is a reduced window for fuzzers and this feature has had a few fuzzbugs flush out edge cases in past. Still seems worth doing.
[String changes made/needed]:
- No.
Flags: needinfo?(tcampbell)
Attachment #8870658 - Flags: approval-mozilla-beta?
Comment on attachment 8870658 [details] [diff] [review]
0001-Bug-1343723-ResumePoint-not-needed-for-MNewLexicalEn.patch

Fix a sec-high. Beta54+. Should be in 54 beta 11.
Attachment #8870658 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Group: javascript-core-security → core-security-release
(In reply to Ted Campbell [:tcampbell] from comment #31)
> [Is this code covered by automated tests?]:
> - Code is hit by existing tests
> - There is no automated test for specific sec-bug. Test cases for specific
> issue is fragile and hard to automate
> - Tested against fuzzbug for specific revision. Also relevant tests run
> locally against master.
> [Has the fix been verified in Nightly?]:
> - Yes.
> [Needs manual test from QE? If yes, steps to reproduce]: 
> - No.

Setting qe-verify- based on Ted's assessment on manual testing needs.
Flags: qe-verify-
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.