Closed Bug 1096138 Opened 10 years ago Closed 9 years ago

Assertion failure: *to != *moves_[i].to(), at jit/LIR.cpp

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla37
Tracking Status
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 + fixed
firefox37 + verified
firefox-esr31 36+ fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: gkw, Assigned: sunfish)

References

Details

(4 keywords, Whiteboard: [jsbugmon:update][adv-main36+][adv-esr31.5+][b2g-adv-main2.2+])

Attachments

(4 files)

(function(x, y) {
    var r2 = r1
    var r3 = y ^ 4
    var r4 = (function(){})()
    var r6 = r5
    x = 5 % r3
    r4 - r2
    r3 - 6
    return x
})()

asserts js debug shell on m-c changeset eb0d3b3c0b22 with --ion-eager --no-threads at Assertion failure: *to != *moves_[i].to(), at jit/LIR.cpp.

Debug configure options:

LD=ld CROSS_COMPILE=1 CC="clang -Qunused-arguments -msse2 -mfpmath=sse -arch i386" RANLIB=ranlib CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments -msse2 -mfpmath=sse" AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 HOST_CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse" sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=i386-apple-darwin9.2.0 --enable-macos-target=10.5 --enable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/7e2e5dffbb2e
user:        Kannan Vijayan
date:        Sun Aug 17 15:58:16 2014 -0400
summary:     Bug 1054340 - Remove PcOffset IR instructions. r=h4writer

This seems 32-bit only. Setting s-s and assuming sec-critical because LIR is involved.

Kannan, is bug 1054340 a possible regressor?
Flags: needinfo?(kvijayan)
Attached file stack
(lldb) bt 5
* thread #1: tid = 0x729b2, 0x0034c58d js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LMoveGroup::add(this=<unavailable>, from=<unavailable>, to=<unavailable>, type=<unavailable>) + 573 at LIR.cpp:538, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0034c58d js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LMoveGroup::add(this=<unavailable>, from=<unavailable>, to=<unavailable>, type=<unavailable>) + 573 at LIR.cpp:538
    frame #1: 0x00352827 js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LinearScanAllocator::reifyAllocations() [inlined] js::jit::LiveRangeAllocator<js::jit::LinearScanVirtualRegister, true>::addMove(moves=<unavailable>, from=<unavailable>, to=<unavailable>, type=<unavailable>) + 2231 at LiveRangeAllocator.h:673
    frame #2: 0x00352810 js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LinearScanAllocator::reifyAllocations() [inlined] js::jit::InstructionDataMap::operator[](this=<unavailable>, this=<unavailable>, ins=<unavailable>, from=<unavailable>, to=<unavailable>, type=<unavailable>) + 96 at LiveRangeAllocator.h:687
    frame #3: 0x003527b0 js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LinearScanAllocator::reifyAllocations(this=0x00000000) + 2112 at LinearScan.cpp:378
    frame #4: 0x002e5968 js-dbg-opt-32-dm-nsprBuild-darwin-eb0d3b3c0b22`js::jit::LinearScanAllocator::go(this=0xbfffe868) + 392 at LinearScan.cpp:1316
(lldb)
(In reply to Gary Kwong [:gkw] [:nth10sd] GMT+8 after Oct 9 from comment #0)
> This seems 32-bit only. Setting s-s and assuming sec-critical because LIR is
> involved.
> 
> Kannan, is bug 1054340 a possible regressor?

Unlikely.  That patch just removed an IR instruction, and didn't change any compile behaviour that would be in effect in this case (the IR instruction that was removed was only emitted when the profiler was turned on anyway).
Flags: needinfo?(kvijayan)
(In reply to Kannan Vijayan [:djvj] from comment #2)
> > Kannan, is bug 1054340 a possible regressor?
> 
> Unlikely.

Not sure how to move this forward then. Jan, thoughts?
Flags: needinfo?(jdemooij)
Can somebody look at this?  It is a few weeks old at this point.
Jan mentioned in-person that we might fix this by switching entirely to the backtracking allocator and removing the lsra one, but not sure when that will happen yet.
Flags: needinfo?(sunfish)
During reification, just before reifying v6[2], we have this (from dumpInstructions()):

...
[56,57 ModI] [def v19<i>:edx] [temp v18<g>:eax] [use v17:r] [use v13:r] [use arg:4] [use arg:0] [use stack:12] [use v7:*] [use v11:*] [use v12:*] [use v13:*] [use v15:*] [use v16:*]
[58,59 Nop]
[60,61 BinaryV] [def v20<t>:ecx] [def v21<p>:edx] [use v11:r] [use v12:r] [use v6:r] [use v7:r]
...

We're at "v6[2] req(r) has(eax) [59,60) eax@60". Note that this is LSRA, so it's my understanding that liveness positions are sometimes intentionally off by one, which is why we appear to have a use at 60 covered by a (half-open) range at [59,60). Intuitively, a copy into eax for v6 should be placed in the "input" movegroup for the BinaryV. However, v6[2] actually has a start position of 59, which is the output side of the Nop, so reification puts the copy in the "output" movegroup for the Nop.

Later, when attempting to reify "v19[0] req(edx@68?) has(eax) [58,59)", we place the copy at the minimal def end after the Nop. Since the "after" movegroup for the Nop already has a write to eax, we crash.

If the copy for v6 had been in the "input" movegroup for the BinaryV, as would have been intuitive, there would be no conflict. So, it's possible that some code here is not properly accounting for the off-by-one. Or, possibly some code is not properly skipping the Nop when it should.
Attached patch mop.patchSplinter Review
The LIR Nop exists to give some space after an instruction with fixed definitions, to prevent the live ranges from colliding with the inputs in the subsequent instruction. But, proper handling of OsiPoints requires several parts of register allocation to implicitly scan down from an instruction past its OsiPoint (minimalDefEnd). Since Nops are inserted before OsiPoints, minimalDefEnd must skip past Nops to find OsiPoints. However, this means that fixed definition live ranges to extend down into territory where they conflict with the inputs in the subsequent instruction again.

The attached patch inserts a new kind of nop, which I'm tentatively calling a Mop, Mops are like Nops, except that (a) they go *after* any OsiPoint, and (b) minimalDefEnd does not have to skip over them. This allows them to fulfill the same function as Nops, providing space to prevent collisions from fixed definition live ranges.

This fixes the given testcase here, doesn't seem to break anything, and is the most comprehensive fix for this bug I've found so far. I've not yet requested a review, because I'm not yet convinced. Input welcome.

Note that the backtracking allocator does not use Nops, so it won't use Mops with this patch either.
Assignee: nobody → sunfish
Flags: needinfo?(sunfish)
Comment on attachment 8535949 [details] [diff] [review]
mop.patch

Ok, I've tried several different approaches to fixing this, and this Mop patch is the only one that fixes it without breaking anything else. I realize that this isn't ideal, so I'm open to suggestions of how else to fix the bug here.
Flags: needinfo?(jdemooij)
Attachment #8535949 - Flags: review?(jdemooij)
Comment on attachment 8535949 [details] [diff] [review]
mop.patch

Review of attachment 8535949 [details] [diff] [review]:
-----------------------------------------------------------------

Could really use a comment, but since this is a sec bug we can add one later.
Attachment #8535949 - Flags: review?(jdemooij) → review+
Comment on attachment 8535949 [details] [diff] [review]
mop.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easy, I'd guess. Unless assisted by a testcase to point the way, one would require fairly detailed knowledge of JIT internals to figure this out.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

There is a comment which gives some clues of where one might look for the actual bug, but no bulls-eye on the bug itself.

Which older supported branches are affected by this flaw?

status-firefox34: 	wontfix
status-firefox35: 	affected
status-firefox36: 	affected
status-firefox37: 	affected 

If not all supported branches, which bug introduced the flaw?

Unknown. Bisection turned up bug 1054340, but that seems unlikely.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Backports should be easy to create, and low-risk.

How likely is this patch to cause regressions; how much testing does it need?

Unlikely. The patch itself introduces straight-forward logic in well-understood places.
Attachment #8535949 - Flags: sec-approval?
Comment on attachment 8535949 [details] [diff] [review]
mop.patch

sec-approval+ for trunk. Please create and nominate Aurora and Beta patches as well. We can get them in once trunk is green.
Attachment #8535949 - Flags: sec-approval? → sec-approval+
It's not clear to me if this affects b2g32 or not based on the response to the sec-approval questions.
status-b2g-v2.0: --- → ?
Flags: needinfo?(sunfish)
As a correction, while the testcase here doesn't exhibit the bug on old versions, we don't know that they're not affected, because we don't know when the underlying bug was introduced. For safety, I'll mark all older versions as affected, unless we learn otherwise.
https://hg.mozilla.org/mozilla-central/rev/d8d78db85c5d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
So this has almost-certainly missed Fx35 now (the RC build is due to be spun today). Can we at least get approval requests for Aurora and esr31?
Flags: needinfo?(sunfish)
Group: javascript-core-security
[Tracking Requested - why for this release]:
security bug that affects ESR, although the known testcase doesn't reproduce the problem there. Safe enough patch, we should just take it rather than risk it.
Dan, could you fill the uplift request for aurora and esr?
Thanks
Attached patch aurora.patchSplinter Review
I apologize for the delay. Attached is a version of the patch ported to aurora. It's essentially the same, though I did need to update a few things to work with the changes introduced in bug 1106947.
Flags: needinfo?(sunfish)
Comment on attachment 8546252 [details] [diff] [review]
aurora.patch

Approval Request Comment
[Feature/regressing bug #]:

Unknown

[User impact if declined]:

Potential security bug.

[Describe test coverage new/current, TBPL]:

TBPL

[Risks and why]:

The patch is low-risk because what it's making fairly simple changes in well-understood areas.

[String/UUID change made/needed]:

None.
Attachment #8546252 - Flags: approval-mozilla-aurora?
Attached patch esr31.patchSplinter Review
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:

It's sec-critical.

User impact if declined: 

Security bug.

Fix Landed on Version:

mozilla37

Risk to taking this patch (and alternatives if risky): 

Low; the patch makes simple changes in well-understood areas.

String or UUID changes made by this patch: 

None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8546275 - Flags: approval-mozilla-esr31?
Attachment #8546252 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8546275 - Flags: approval-mozilla-esr31? → approval-mozilla-esr31+
Whiteboard: [jsbugmon:update] → [jsbugmon:update][adv-main36+][adv-esr31.5+]
Whiteboard: [jsbugmon:update][adv-main36+][adv-esr31.5+] → [jsbugmon:update][adv-main36+][adv-esr31.5+][b2g-adv-main2.2+]
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.