Closed Bug 1150337 Opened 8 years ago Closed 8 years ago

Odin: crash that appears to be related to typed array access

Categories

(Core :: JavaScript Engine: JIT, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox38 --- unaffected
firefox39 + fixed
firefox40 + fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected
b2g-v1.4 --- unaffected
b2g-v2.0 --- unaffected
b2g-v2.0M --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.1S --- unaffected
b2g-v2.2 --- unaffected
b2g-master --- fixed

People

(Reporter: dougc, Assigned: sunfish)

Details

(Keywords: sec-critical)

Attachments

(3 files, 2 obsolete files)

Seeing some unexpected behaviour testing asm.js pointer-masking code. While trying to narrow it down with a differential analysis, found some code that crashes the browser. It looks like an int8 typed array access with a dehoisted index offset, and the index adds up to be beyond the 32 bit boundary. Perhaps some issue with the typed system, or an edge case in this optimization. Are you aware of any issues here?

   0x00007fffab0cb50e:	mov    %r10b,(%r15,%r8,1)
   0x00007fffab0cb512:	mov    %esi,%r8d
   0x00007fffab0cb515:	imul   $0xffffa7e7,%r8d,%r8d
   0x00007fffab0cb51c:	add    %edi,%r8d
   0x00007fffab0cb51f:	imul   $0xffff492e,%eax,%eax
   0x00007fffab0cb525:	add    %eax,%r8d
   0x00007fffab0cb528:	sar    $0x10,%r8d
=> 0x00007fffab0cb52c:	movsbl 0x97a8(%r15,%r8,1),%eax

(gdb) info reg r8
r8             0xfffffffb	4294967291
Marking s-s. If possible, please attach a testcase.
Group: javascript-core-security
No simple test case yet, but it's from this asm.js code from poppler which might give some clues.

function __ZN9DCTStream10readMCURowEv(i1) {
...
      // This is the bad HEAP8 access.
      i13 = HEAP8[(i16 + (Math_imul(i18, -22553) | 0) + (Math_imul(i17, -46802) | 0) >> 16) + 38824 >> 0] | 0;

This appears to have been translated from:
    // color space conversion
    if (colorXform) {
      // convert YCbCr to RGB
      if (numComps == 3) {
	for (y2 = 0; y2 < mcuHeight; ++y2) {
	  for (x2 = 0; x2 < mcuWidth; ++x2) {
	    pY = rowBuf[0][y2][x1+x2];
	    pCb = rowBuf[1][y2][x1+x2] - 128;
	    pCr = rowBuf[2][y2][x1+x2] - 128;
	    pR = ((pY << 16) + dctCrToR * pCr + 32768) >> 16;
	    rowBuf[0][y2][x1+x2] = dctClip[dctClipOffset + pR];
	    pG = ((pY << 16) + dctCbToG * pCb + dctCrToG * pCr + 32768) >> 16;
	    rowBuf[1][y2][x1+x2] = dctClip[dctClipOffset + pG];
	    pB = ((pY << 16) + dctCbToB * pCb + 32768) >> 16;
	    rowBuf[2][y2][x1+x2] = dctClip[dctClipOffset + pB];
	  }
	}
      // convert YCbCrK to CMYK (K is passed through unchanged)
Might have overlooked the sign extensions of the index register, %rd8 to %r8 above. If %rd8 were signed extended to %r8, rather than being zero extended, then perhaps the address calculation would be valid. Perhaps this optimization should only be applied when the lhs of the addition is known to be positive, not just when the resulting index is known to be within bounds.

This code pattern is also interesting as it computes a signed index into a table, and then adds 32768 to make it a non-negative index. This pattern probably needs to be optimized well too, so perhaps skipping the offset dehoisting optimization in this case is not a good long term fix. Also this code pattern challenges an inline bounds checking strategy that assumes small negative pre-offset indexes are not expected and can be handled OOL.

fwiw I see that LLVM has a lot of code to dealing with sign extension optimization for 64 bit processors, and it is still not complete. Related issues are blocking V8 and JSC performance - the 32 bit index to 64 bit addressing mode optimization. This might be an area needing some development work.
dougc: by any chance does the problem you're seeing go away when you create a fresh profile?  I ask b/c it's possible you are getting a cache hit from asm.js-cached code from a previous build.  While the asm.js cache is keyed on FF buildid (which changes on every build), the buildid only changes when the 'firefox' executable is rebuilt (but not, e.g., when you rebuild just libxul.so).

I ask b/c the displacement 0x97a8 is > AsmJSCheckedImmediateRange which shouldn't be possible when !MAsmJSHeapAccess::needsBoundsCheck in tryAddDisplacement (which should only be true for constant or masked indices).
(In reply to Luke Wagner [:luke] from comment #4)
> dougc: by any chance does the problem you're seeing go away when you create
> a fresh profile?

Yes, it still crashes with a fresh profile. It reports compiling the asm.js module, not just loading from the cache. Massive modifies the asm.js code to frustrate caching.

> I ask b/c the displacement 0x97a8 is > AsmJSCheckedImmediateRange which
> shouldn't be possible when !MAsmJSHeapAccess::needsBoundsCheck in
> tryAddDisplacement (which should only be true for constant or masked
> indices).

Guessing that it's derived that the bounds check is not needed. It should be possible to write a simple test for this. Note the value of r8 which as an int32 would be negative and adding the offset would make the 32 bit index positive. The problem might be that r8d was not sign extended when used as r8 in the 64 bit address mode calculation so adding the offset gives an unexpected value.
Well, we actually *want* all int32's that flow into loads/stores to be zero in the high uint32; this guarantees they hit w/in the 4gb+AsmJSCheckedImmediateRange when interpreted as a signed 64-bit index in the load/store.

Could you try commenting out the removeBoundsCheck() calls in jit/RangeAnalysis.cpp for asm.js to see if that is the culprit?  I forgot this is the third case where we can have !needsBoundsCheck.
This looks like an example, but is does not crash the js shell here. Range analysis can derive that the bounds check is not necessary so the optimization is applied. When %rax is 0xffffffff this would appear to access memory well beyond the array buffer, and beyond the reserved area?

 movsbl   0xd000(%r15,%rax,1), %eax
Ah, is the issue here that RangeAnalysis needs to take into account MAsmJSHeapAccess::offset()?
(In reply to Luke Wagner [:luke] from comment #6)
> Well, we actually *want* all int32's that flow into loads/stores to be zero
> in the high uint32; this guarantees they hit w/in the
> 4gb+AsmJSCheckedImmediateRange when interpreted as a signed 64-bit index in
> the load/store.

Yes, but then the assumption that '4gb+AsmJSCheckedImmediateRange' is adequate might not be valid, and it might need 8gb?

> Could you try commenting out the removeBoundsCheck() calls in
> jit/RangeAnalysis.cpp for asm.js to see if that is the culprit?  I forgot
> this is the third case where we can have !needsBoundsCheck.

Disabling the dehoisting optimization avoids the crash. The needsBoundsCheck optimization does not look like the cause, rather it tickles the problem. It appears necessary to avoid the dehoisting optimization on 64-bit when the pre-offset index can be negative, and perhaps this could be flagged during range analysis too, but this will hurt performance on this code pattern. This is also the solution proposed for V8 which does not use the memory protection scheme, see 'Check that the argument index is positive to ensure that the loss of truncate is not an issue.' in https://codereview.chromium.org/860283004/

If the allocation can be expanded to 8gb then this would be better for performance?

Note that this code pattern frustrates pointer-masking too, and requires the access be emitted as a[(i+c)&m] which will result in less optimal code. It's a challenging pattern and it's importance was missed.
(In reply to Douglas Crosher [:dougc] from comment #9)
> Yes, but then the assumption that '4gb+AsmJSCheckedImmediateRange' is
> adequate might not be valid, and it might need 8gb?

No, the intention was that we only allow immediates < AsmJSCheckedImmediateRange.  (We could of course have AsmJSCheckedImmediateRange = 4gb.)

> > Could you try commenting out the removeBoundsCheck() calls in
> > jit/RangeAnalysis.cpp for asm.js to see if that is the culprit?  I forgot
> > this is the third case where we can have !needsBoundsCheck.
> 
> The needsBoundsCheck
> optimization does not look like the cause, rather it tickles the problem.

Stepping through the example in comment 7, this does appear to be the cause: RangeAnalysis::analyze is calling removeBoundsCheck() when it shouldn't.  minHeapLength is 65536 (due to the constant store) and, despite the fact that the result can clearly be negative, range->lower() == 53120 and range->upper() == 53375.  I don't know enough about the invariants of range analysis to know if this is a bug in range analysis or the interpretation of the results, but given that the test includes (range->lower() >= 0), it seems like the former.

On a side note, instrumenting right now and running over all the humble bundle and Unity demos, this range analysis only seems to call removeBoundsCheck() (when needsBoundCheck() was previously true) on <1% of total AsmJSHeapAccesses.
(In reply to Luke Wagner [:luke] from comment #10)
...
> > > Could you try commenting out the removeBoundsCheck() calls in
> > > jit/RangeAnalysis.cpp for asm.js to see if that is the culprit?  I forgot
> > > this is the third case where we can have !needsBoundsCheck.
> > 
> > The needsBoundsCheck
> > optimization does not look like the cause, rather it tickles the problem.
> 
> Stepping through the example in comment 7, this does appear to be the cause:
> RangeAnalysis::analyze is calling removeBoundsCheck() when it shouldn't. 

The index is always within bounds in the test example, so it is correct to remove the bounds check. The naming of 'needsBoundsCheck' leads to endless confusion and I really wish it were renamed to provenInBounds or something similar.

> minHeapLength is 65536 (due to the constant store) and, despite the fact
> that the result can clearly be negative, range->lower() == 53120 and
> range->upper() == 53375.  I don't know enough about the invariants of range
> analysis to know if this is a bug in range analysis or the interpretation of
> the results, but given that the test includes (range->lower() >= 0), it
> seems like the former.

It's only the pre-offset index that can be negative, the 'i' in 'i+c'. The bounds check tests the index 'i+c', not 'i'. I don't see an issue with range analysis. If the protected area is not extended to 8gb then the suggestion is to check the range of 'i' in 'i+c' and skip the addressing mode matching when 'i' can be negative. There might have been a test for this in the past, in the x64 lowering code, but it has been reworked.

> On a side note, instrumenting right now and running over all the humble
> bundle and Unity demos, this range analysis only seems to call
> removeBoundsCheck() (when needsBoundCheck() was previously true) on <1% of
> total AsmJSHeapAccesses.

The loop analysis has never been extended to deal with a stride other than one and it might not be working with the asm.js patterns. Lots of work to do here. Try code generated by Emscripten with the POINTER_MASKING option and you'll see most bounds checking is removed.
(In reply to Douglas Crosher [:dougc] from comment #11)
> The naming of 'needsBoundsCheck' leads to endless
> confusion and I really wish it were renamed to provenInBounds or something
> similar.

Patches welcome.  I don't think the name isn't the source of the confusion here.  Rather, I think the root bug is the mistaken assumption that if 'i+c' is in-bounds, then so is 'i' (and thus we can use any immediate displacement that fits (which is false in the case where i is negative or large positive with large c)) combined with the fact that 64-bit effective addressing uses 64-bit arith so out-of-bounds i+c just goes farther out-of-bounds instead of going back in-bounds as it would on 32-bit.

I agree that the most direct solution is to map 8gb and I'm inclined to do that.  I think we could also remove AsmJSCheckedImmediateRange since it'd no longer be necessary.

> It's only the pre-offset index that can be negative, the 'i' in 'i+c'. The
> bounds check tests the index 'i+c', not 'i'. I don't see an issue with range
> analysis. 

Ahh, I see, I had assumed range analysis was running after we'd folded the +c into the heap access but I see now that EAA runs right after and ins->ptr() is still the add.

> The loop analysis has never been extended to deal with a stride other than
> one and it might not be working with the asm.js patterns. Lots of work to do
> here. Try code generated by Emscripten with the POINTER_MASKING option and
> you'll see most bounds checking is removed.

But I assume that removal happens a priori in CheckArrayAccess(), not in RangeAnalysis::analyze?
On PTO thus week, but at a quick read through the conclusion here seems right. This is a case I missed in bug 986981. Extending the guard region isn't the only possible fix, but it seems like it should work. Also, it'd only be 6GiB, not 8, since the displacement is signed. Negative offsets are a different kettle of fish.
6gb; sold!
How bad is this, security-wise?  Would this allow any kind of controllable crash, or is it always going to crash on a particular address?
Carefully crafted web content can read and write arbitrary memory within a 2 GiB region in the address space immediately following the asm.js heap area, on x64.

The problem was introduced in bug 986981.
[Tracking Requested - why for this release]: sec-critical in the same general code area as one of our Pwn2Own exploits, so more people will be looking at it.  Plus the demo crash is in a standard benchmark.
Attached patch ao-fix.patch (obsolete) — Splinter Review
I am currently testing this patch.
Assignee: nobody → sunfish
Attached patch ao-fix.patch (obsolete) — Splinter Review
This version also adds a testcase. Posting for review now.
Attachment #8589755 - Attachment is obsolete: true
Attachment #8589785 - Flags: review?(luke)
Comment on attachment 8589785 [details] [diff] [review]
ao-fix.patch

Review of attachment 8589785 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/EffectiveAddressAnalysis.h
@@ +20,5 @@
> +    template<typename MAsmJSHeapAccessType>
> +    bool TryAddDisplacement(MAsmJSHeapAccessType *ins, int32_t o);
> +
> +    template<typename MAsmJSHeapAccessType>
> +    void AnalyzeAsmHeapAccess(MAsmJSHeapAccessType* ins);

Since this is non-static, I think the correct naming style is lowercase first letter of member functions.

::: js/src/jit/MIRGenerator.h
@@ +259,5 @@
> +        if (sizeof(intptr_t) == sizeof(int32_t))
> +            return !access->needsBoundsCheck();
> +
> +        // Otherwise, only allow the checked size.
> +        return false;

It'd be nice to document our case analysis a bit more to explain what we want to assume and why that's not true in 64-bit effective addressing.  Also, can this function return the immediate limit instead of a bool (so we can more directly see the connection with this explanation)?
Attachment #8589785 - Flags: review?(luke) → review+
Attached patch ao-fix.patchSplinter Review
Attached is an updated patch which addresses your review comments here and from irc. Carrying forward r+.
Attachment #8589785 - Attachment is obsolete: true
Attachment #8589875 - Flags: review+
Comment on attachment 8589875 [details] [diff] [review]
ao-fix.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Once you know how to trigger the bug, accessing arbitrary memory in a 2 GiB region after the asm.js heap on x64 is pretty easy (and the included testcase demonstrates how). Finding something sensitive in that region without crashing the process probably requires some luck on systems with ASLR, especially given that this affects x64 only.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Yes, the test included in the patch paints a bulls-eye on the security problem. I can take the test out for now if that'd be better.

Which older supported branches are affected by this flaw?

mozilla-aurora

If not all supported branches, which bug introduced the flaw?

bug 986981

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

I don't, but it'd be fairly easy to create, and pretty low risk. This code hasn't changed a lot since the aurora branch.

How likely is this patch to cause regressions; how much testing does it need?

Pretty unlikely. The main change in this patch is pretty straightforward.
Attachment #8589875 - Flags: sec-approval?
Comment on attachment 8589875 [details] [diff] [review]
ao-fix.patch

sec-approval+ for trunk. If this applies "as is" to Aurora, please nominate it for that as well.
Attachment #8589875 - Flags: sec-approval? → sec-approval+
And a followup, https://hg.mozilla.org/integration/mozilla-inbound/rev/bd9cb02d4da8

to fix an assert which I had rearranged to avoid a compiler warning, incorrectly, and it failed on ARM.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #26)
> Backed out for OOM failures.

Notice it failed on a 32 bit machine, so might not have been caused by the actual patch to the compiler which would only have affected 64-bit, but rather the tests using too much memory?

The amount of memory used in some of these test files caused problems testing on ARM devices with limited memory (which was necessary before we had the simulator). Perhaps just split the test file.
The problem was in the testcase, which attempted to create a very large array in an attempt to maximize the likelyhood of detecting a problem on x64, but it led to OOMs on x32. I've now reduced the size of the array in the testcase, which fixes the problem on Windows XP on try.

https://hg.mozilla.org/integration/mozilla-inbound/rev/a1b728c86053
https://hg.mozilla.org/mozilla-central/rev/a1b728c86053
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Attached patch aurora.patchSplinter Review
Approval Request Comment
[Feature/regressing bug #]: 986981
[User impact if declined]: sec-critical bug
[Describe test coverage new/current, TreeHerder]: TreeHerder
[Risks and why]: Pretty low-risk; the actual patch here is fairly simple, and it has seen a lot of testing.
[String/UUID change made/needed]: none
Attachment #8597540 - Flags: approval-mozilla-aurora?
Attachment #8597540 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
AWFY detected a regression/improvement on:
- slave: Mac OS X 10.10 64-bit (Mac Pro, shell)
- mode: Ion

Regression(s)/Improvement(s):
- asmjs-apps: zlib-throughput: -0.79% (improvement)

Recorded range:
- http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e08e45fd8922&tochange=a1b728c86053

More details: http://arewefastyet.com/regressions/#/regression/734156
Group: javascript-core-security → core-security
Group: core-security
You need to log in before you can comment on or make changes to this bug.