OdinMonkey: Reproducible Nightly crashes with ArrayBuffer and Worker

RESOLVED FIXED in mozilla27

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: sstangl, Assigned: luke)

Tracking

unspecified
mozilla27
x86_64
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 3 obsolete attachments)

Reporter

Description

6 years ago
In attempting to write basic prototypes for SharedArrayBuffer, I appear to have accidentally written a program that 100% reproducibly crashes Nightly x86_64. The generated code attempts to mimic the provided "main.c" logic, but supplies a normal ArrayBuffer instead of a SharedArrayBuffer. The crash appears to be deep within Odin JITted code. Probably s-c.

https://crash-stats.mozilla.com/report/index/840dbd74-7e95-41df-bd80-a4d6d2131010

Interestingly, the same codebase also always crashes the latest stable Chrome build.
Assignee

Comment 1

6 years ago
Ah hah, this is a recent regression from the constant load/store optimization: on x64, when an ArrayBuffer is neutered (ArrayBufferObject::neuterAsmJSBuffer), we mprotect the entire region so that *every* load/store faults, simulating byteLength == 0.  The crash (which is not s-s since it's guaranteed to be inside this protected range) is because the AsmJSFaultHandler only handles faults at known heap accesses (to avoid covering up real bugs) and bug 865516 stopped noting constant loads/stores.  My fault (pun intended) for not remembering during review.
Group: core-security
Assignee

Comment 2

6 years ago
Posted patch fix-x64-neuter-crash (obsolete) — Splinter Review
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #815390 - Flags: review?(dtc-moz)
Assignee

Updated

6 years ago
Comment 0 is private: false
Comment on attachment 815390 [details] [diff] [review]
fix-x64-neuter-crash

Review of attachment 815390 [details] [diff] [review]:
-----------------------------------------------------------------

The patch looks fine in terms of noting the heap accesses.

I have some reservations about allowing the 'neutering' of an asm.js heap buffer while the code is still live, and perhaps it would be better to simply not allow this to occur.  I presume this occurs in the 'steal buffer' paths.  The heap length is baked into the code in many ways, and it's not really possible to change the buffer length to zero - all the same issues with resizing the heap would apply.

Is it really necessary to note all the heap access in production code?  Could the fault handler just assume that a fault within the heap buffer, from code within the asm.js code, is a valid heap reference?  The handler could check that the code is an expected pattern, which it would need to do anyway.
Attachment #815390 - Flags: review?(dtc-moz) → review+
While your looking over this, consider:

ArrayBufferObject::stealContents
...
    // Neuter the donor ArrayBufferObject and all views of it
    ArrayBufferObject::setElementsHeader(header, 0);

Wouldn't this clear the flag indicating that the buffer was a special mmapped asm.js buffer, and thus lead to it calling 'free' to release the buffer rather than special x64 release function than unmaps the buffer?

Also the call ArrayBufferObject::setElementsHeader(newheader, length) just above appears redundant as the length is set by AllocateArrayBufferContents?
Assignee

Comment 5

6 years ago
(In reply to Douglas Crosher [:dougc] from comment #3)
> I have some reservations about allowing the 'neutering' of an asm.js heap
> buffer while the code is still live, and perhaps it would be better to
> simply not allow this to occur.

Well, "allowing" isn't really a choice since we don't have a great way to prevent this, atm.  One thought was to require the incoming ArrayBuffer to be frobbed in some way that makes neutering impossible but there's not a clean, obvious way I can see.

> paths.  The heap length is baked into the code in many ways, and it's not
> really possible to change the buffer length to zero - all the same issues
> with resizing the heap would apply.

Yes, there is technically a safe-but-incorrect bug on ARM/x86 atm (that's the TODO in the !x64 neuterAsmJSArrayBuffer), but all we'd need to do is maintain a list of AsmJSModules so we could go down the list and re-AsmJSModule::initHeap any linked to the given ArrayBuffer.

> Is it really necessary to note all the heap access in production code? 
> Could the fault handler just assume that a fault within the heap buffer,
> from code within the asm.js code, is a valid heap reference?

For loads, we need to know the destination register.  Doing "just enough decompilation" on x86 seems really unpleasant.

(In reply to Douglas Crosher [:dougc] from comment #4)

But arg, you are correct.  I've been meaning to add a shell builtin so we could actually, ya know, test this.  I'll file a bug.
(In reply to Luke Wagner [:luke] from comment #5)
> (In reply to Douglas Crosher [:dougc] from comment #3)
> > I have some reservations about allowing the 'neutering' of an asm.js heap
> > buffer while the code is still live, and perhaps it would be better to
> > simply not allow this to occur.
> 
> Well, "allowing" isn't really a choice since we don't have a great way to
> prevent this, atm.  One thought was to require the incoming ArrayBuffer to
> be frobbed in some way that makes neutering impossible but there's not a
> clean, obvious way I can see.

There is room to add flags to the elements header, but perhaps just bend the spec a little for asm.js code and do not neuter buffers flagged as asm.js buffers.  Might it be cleaner to throw an exception if attempting to transfer an asm.js heap buffer.

It would seem that the x64 solution could work for the x86 and ARM. So long as the complete region of memory reserved for the heap buffer is retained.  It could be protected and reads and writes to a zero length array could be emulated in the signal handler.  The memory could as least be discarded to reduce the RSS, but it would still use virtual address space which is a limited resource on the 32 bit archs.

There are compiled in assumptions about the heap length (bounds checks that have been optimized away etc) which require the complete virtual address region of the heap to be retained.

If the heap buffer were to be shared between workers then it is not clear what 'neutering' would be expected to do, and there might be little to gain from the complexity of supporting this.

> > paths.  The heap length is baked into the code in many ways, and it's not
> > really possible to change the buffer length to zero - all the same issues
> > with resizing the heap would apply.
> 
> Yes, there is technically a safe-but-incorrect bug on ARM/x86 atm (that's
> the TODO in the !x64 neuterAsmJSArrayBuffer), but all we'd need to do is
> maintain a list of AsmJSModules so we could go down the list and
> re-AsmJSModule::initHeap any linked to the given ArrayBuffer.

Not sure if that would help.  Re-patching the bounds checks would not work because there are also compiled-in optimizations and many will have been optimized away.

But the x64 approach would appear to be a possibility, but still a lot of complexity for what seems like little practical gain, and perhaps not event practical if the heap buffers are shared between workers.
Assignee

Comment 7

6 years ago
It's not the bounds checks we have to patch on x86, it's the base addresses which we would repoint to a newly-mmaped-PROT_NONE region so that every load/store faults.  This would allow us to free/transfer the original buffer.  (We could almost do the same for x64 except for the annoyance that clients are going to try to free() it and we need them to munmap.)  It is very annoying, but so far we've held the line on preserving JS semantics with this x86 bug being the only exception I know of and that is useful.
(In reply to Luke Wagner [:luke] from comment #7)
> It's not the bounds checks we have to patch on x86, it's the base addresses
> which we would repoint to a newly-mmaped-PROT_NONE region so that every
> load/store faults.  This would allow us to free/transfer the original
> buffer.  (We could almost do the same for x64 except for the annoyance that
> clients are going to try to free() it and we need them to munmap.)  It is
> very annoying, but so far we've held the line on preserving JS semantics
> with this x86 bug being the only exception I know of and that is useful.

Ok, thank you for the clarification.  The cons:

1. This would still use virtual address space as the newly-mmaped-PROT_NONE region would need to be as large as the heap length for the benefit of the baked in heap length assumptions.  However it might be possible to share this region.  The ability to transfer the heaps would be most useful for larger heaps, but for large heaps on 32 bit archs it becomes more likely that a PROT_NONE region will not be available, which might limit the practical usefulness anyway.

2. Constraining the asm.js compiled code to support synchronous changes to the heap base address would limit some useful optimizations such as hoisting the heap base pointer.  This optimization is not currently performed by Odin, but perhaps a language spec. should look ahead.

Asm.js code is optimized to work with one heap buffer, so can not transferred data in an array buffer efficiently, so limiting the transfer of asm.js heap buffers might not be such a significant implementation limitation.  Perhaps the shared buffers will solve the use cases for asm.js code.

The current state of the code does not support the transfer of the heap buffer.  The x64 supports a copying of the buffer, but it appears to be buggy and even if fixed would not be much more useful than just copying.  Thus perhaps an alternative fix for this bug would be to simply generate an error reporting an implementation limitation when attempting to transfer an asm.js heap buffer, and perhaps this would be appropriate for the x86 and ARM now.
Assignee

Comment 9

6 years ago
(In reply to Douglas Crosher [:dougc] from comment #8)
> Thus perhaps an alternative fix for this bug would be to simply generate an
> error reporting an implementation limitation when attempting to transfer an
> asm.js heap buffer, and perhaps this would be appropriate for the x86 and
> ARM now.

That still walks the line of changing JS semantics, even if "allocation overflow" exceptions can technically pop up almost everywhere.  I'll send a mail to asmjs@mozilla.com.
Assignee

Comment 10

6 years ago
Ok, after some discussion, for now, I agree with Douglas, let's throw an internal error on an attempt to neuter an ArrayBuffer that is actively being used by asm.js on the stack or on an attempt to call an asm.js method whose linked heap has been neutered.
Assignee

Comment 11

6 years ago
Adding a 'neuter()' shell testing function in another bug to avoid cluttering this bug with pre-existing bugs found by fuzzing using neuter.
Depends on: 929786
Assignee

Comment 12

6 years ago
Posted patch fix-asm.js-x64-neuter (obsolete) — Splinter Review
This patch fixes the bug Douglas pointed out in comment 4.
Attachment #820689 - Flags: review?(dtc-moz)
Assignee

Comment 13

6 years ago
Posted patch throw-on-live-neuter (obsolete) — Splinter Review
This patch throws as described in comment 10 and thus obsoletes the previous fix.
Attachment #815390 - Attachment is obsolete: true
Attachment #820690 - Flags: review?(dtc-moz)
Assignee

Comment 14

6 years ago
Hah, coincidentally, Steve just changed all this code a few minutes ago, fixing the one of these bugs in the process.  Probably Steve should review this new patch, then.  (See comment 10 for what this patch is trying to do and previous comments for discussion.)
Attachment #820689 - Attachment is obsolete: true
Attachment #820690 - Attachment is obsolete: true
Attachment #820689 - Flags: review?(dtc-moz)
Attachment #820690 - Flags: review?(dtc-moz)
Attachment #820713 - Flags: review?(sphink)
Comment on attachment 820713 [details] [diff] [review]
fix-asm.js-neutering

Review of attachment 820713 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/AsmJSLink.cpp
@@ +335,5 @@
> +    // which is normally immutable except for the neuter operation that occurs
> +    // when an ArrayBuffer is transfered. Throw an internal error if we try to
> +    // run with a neutered heap.
> +    if (module.maybeHeapBufferObject() && module.maybeHeapBufferObject()->isNeutered()) {
> +        js_ReportOverRecursed(cx);

Why this particular error?

::: js/src/vm/TypedArrayObject.cpp
@@ +331,5 @@
>      return reinterpret_cast<OldObjectRepresentationHack*>(obj->getElementsHeader())->views;
>  }
>  
> +bool
> +ArrayBufferObject::neuterViews(JSContext *cx)

\o/

@@ +400,3 @@
>      uint32_t byteLen = 0;
>      updateElementsHeader(getElementsHeader(), byteLen);
> +    getElementsHeader()->setIsNeuteredBuffer();

Why is this 2nd call necessary? Seems like you'd only need one or the other, doesn't matter which. (Probably a leftover from before I fixed the inadvertent flags zeroing.)

@@ +564,5 @@
> +    }
> +    if (!act)
> +        return true;
> +
> +    // TODO: attempt to handle this with sufficiently clever hackery

What is "this", exactly? Neutering a buffer while it is bound to a module with a live frame on the stack, I guess?

Please file a bug & convert to a bug comment, and be a little more explicit.
Attachment #820713 - Flags: review?(sphink) → review+
Assignee

Comment 16

6 years ago
(In reply to Steve Fink [:sfink] from comment #15)
> Why this particular error?

It is somewhat arbitrary, but I picked it because too-much-recursion exceptions do pop out of calls.  js_ReportOutOfMemory is special, so I didn't want to use it.  js_ReportAllocationOverflow was another ambiguous choice.

> @@ +400,3 @@
> >      uint32_t byteLen = 0;
> >      updateElementsHeader(getElementsHeader(), byteLen);
> > +    getElementsHeader()->setIsNeuteredBuffer();
> 
> Why is this 2nd call necessary? Seems like you'd only need one or the other,
> doesn't matter which. (Probably a leftover from before I fixed the
> inadvertent flags zeroing.)

Oh hah, some sort of crazy merge conflict when I was folding patches...
https://hg.mozilla.org/mozilla-central/rev/dc598f50a09a
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
You need to log in before you can comment on or make changes to this bug.