Closed Bug 979594 Opened 6 years ago Closed 5 years ago

Atomics and locks for SharedArrayBuffer: Plain JS

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: sstangl, Assigned: lth)

References

(Blocks 7 open bugs)

Details

(Keywords: dev-doc-complete, Whiteboard: [DocArea=JS])

Attachments

(3 files, 17 obsolete files)

61.45 KB, patch
luke
: review+
Details | Diff | Splinter Review
112.22 KB, patch
sstangl
: review+
Details | Diff | Splinter Review
25.57 KB, patch
sstangl
: review+
dougc
: review+
Details | Diff | Splinter Review
Basic prototypes for SharedArrayBuffer would like mutexes and condition variables.
(This bug will probably morph but it's a good place to start.)
Assignee: nobody → lhansen
Attached patch WIP: Add Atomics class. (obsolete) — Splinter Review
WIP patch that adds a global Atomics object ala Math, in jsatomics.cpp. Atomics.inc() inline support for x86 is given; the rest are not yet inlined. OdinMonkey doesn't recognize the Atomics class, but that's easy to add.
(In reply to Sean Stangl [:sstangl] from comment #2)
> Created attachment 8464954 [details] [diff] [review]
> WIP: Add Atomics class.
> 
> WIP patch that adds a global Atomics object ala Math, in jsatomics.cpp.
> Atomics.inc() inline support for x86 is given; the rest are not yet inlined.
> OdinMonkey doesn't recognize the Atomics class, but that's easy to add.

The atomic operations are great to have. It will be interesting to see what frameworks people can develop on top of these. The compare-and-swap operation would be a good start and for asm.js the other atomic ops could just be implemented on top of this.

For translated C code, are you exploring implementing a lock wait operation? There's no wait or sleep function in JS? Also condition variable signals? Or is it expected that code will be reworked to avoid locks and condition variables?
(In reply to Douglas Crosher [:dougc] from comment #3)
> 
> For translated C code, are you exploring implementing a lock wait operation?
> There's no wait or sleep function in JS? Also condition variable signals? Or
> is it expected that code will be reworked to avoid locks and condition
> variables?

There is an evolving spec, and more code will appear here soon.  I expect to publish a draft spec once there is some initial POC code.

For waiting, the current idea is to provide a futex-like API that allows the implementation of higher-level locks and condition variables.  Futexes are too low-level for use by mortals but is probably the best substrate on which to build locks for both asm.js/emscripten and for hand-written JS.
Attached file Simple test case / sample of API use (obsolete) —
A simple program that distributes a SharedArrayBuffer among three workers that compute on that buffer.  Without mutual exclusion the computed values will be incorrect with very high probability.  With mutual exclusion they will be correct.

The program uses the proposed Atomics API, including futexes, to implement a simple mutex object that it then uses for mutual exclusion.
Provides Atomics.{compareExchange, load, store, fence, futexWait, futexWake}, and superficial specs for these.  Compiles on Linux based on mozilla-inbound a couple of days back, might also compile on Mac OS X (requires GCC / Clang intrinsics).

This is probably "roughly right" but there are some really gross hacks in the Worker code, which I don't know very well yet.

This is not well tested at all but it does behave in the desired way with the test case I just posted.

There is no JIT support whatsoever at this stage.
Looks neat! Here's some random thoughts..

The main thread can't block on a mutex, otherwise a worker could stall the thread. I notice you poll the finish of the workers with timeouts in the main thread - wonder how that is going to be treated?

All workers need to duplicately load the script sources. I remember sstangl needed to do the same in Emscripten pthreads experiments he was doing. That requirement will make spawning new threads super-costly and they'll eat up a _lot_ of memory in large Emscripten applications. Is there a way we could avoid this?

I'm probably stating something that Luke et al. have already thought out well in advance already: we'll need to have the atomics directly available in asm.js at some point, so that we can implement the higher primitives in straight asm.js and not have to jump out to non-asm.js code.
(In reply to Jukka Jylänki from comment #7)
> Looks neat! Here's some random thoughts..
> 
> The main thread can't block on a mutex, otherwise a worker could stall the
> thread. I notice you poll the finish of the workers with timeouts in the
> main thread - wonder how that is going to be treated?

That's just a stopgap.  There are several ways to slice this.  In the isolated case of the test program it's possible to post "done" values back to the main worker.  In general, the idea (see the Google Docs document for futexWaitCallback) is to allow the main thread to set up a callback on a futex in such a way that it receives the callback when there is a wakeup on it, but the call to set up the callback won't block.

> All workers need to duplicately load the script sources. I remember sstangl
> needed to do the same in Emscripten pthreads experiments he was doing. That
> requirement will make spawning new threads super-costly and they'll eat up a
> _lot_ of memory in large Emscripten applications. Is there a way we could
> avoid this?

I don't know, and it's probably out of scope for the SharedArrayBuffer work per se, but it's something we can think about.

> I'm probably stating something that Luke et al. have already thought out
> well in advance already: we'll need to have the atomics directly available
> in asm.js at some point, so that we can implement the higher primitives in
> straight asm.js and not have to jump out to non-asm.js code.

Yes, that will come.  What you're seeing here is just stuff I've hacked together for a POC this week.  Again, the Google Docs document outlines a lot of work items, including changes to Emscripten and OdinMonkey, and in general JIT support.  I haven't tried to estimate the work, but we're likely talking many weeks.

However, now that you're starting to see how things are hanging together you can at least start thinking about pthreads support and so on; rapid feedback from you (in the form you just sent) is highly useful.
Rebased to m-i tip + with proper diff header.
Attachment #8466229 - Attachment is obsolete: true
Thanks, that works! I can now continue on implementing atomics, mutexes and other primitives to the pthread API.
(In reply to Lars T Hansen [:lth] from comment #8)
> (In reply to Jukka Jylänki from comment #7)
> 
> > All workers need to duplicately load the script sources. I remember sstangl
> > needed to do the same in Emscripten pthreads experiments he was doing. That
> > requirement will make spawning new threads super-costly and they'll eat up a
> > _lot_ of memory in large Emscripten applications. Is there a way we could
> > avoid this?
> 
> I don't know, and it's probably out of scope for the SharedArrayBuffer work
> per se, but it's something we can think about.

Another answer is that as things are now, workers are not, and cannot be, threads, since there is a ceiling on how many workers that can be active in a domain at any one time.  Once that number of workers is active, the rest are queued, they are not run at all until other workers are closed.  (The max in the code is 10, but on desktop the number appears to be 20, presumably the default pref value.)  Without concurrency we'll have deadlocks.

At a minimum we'd need to be able to pass a flag to the Worker constructor asking for an error return if the worker cannot have a thread to run on.  Ideally there'd be a more sophisticated policy than a fixed number of threads per domain.
Test cases, TODO list, etc are accumulating here:
https://github.com/lars-t-hansen/moz-sandbox/tree/master/sab
Testing out atomics with the patch, I'm getting an error "Permission denied to access object" when I tried to call any of the atomics ops. I ended up polyfilling the atomics like shown in your sample, I wonder if that message was a result of them being unimplemented?

How about atomic ops on 64-bit types? Those are interesting and particularly used in games to solve the ABA problem in lock-free data structures (e.g. http://www.stroustrup.com/isorc2010.pdf ) so they would be very useful. The atomic ops on a Float32Array and a Float64Array are peculiar, and there is no native counterpart. Since we don't have a Int64Array, perhaps having an 64-bit atomic op on a In32Array that requires that the offset is even so that the address is 8-aligned?

Operating with atomics only on signed Arrays instead of unsigned feels peculiar, especially on bit ops which would effectively operate on an unsigned bit field. How is the highest bit treated there on signed inputs? Could the atomics work on both signed and unsigned typed arrays?

Nevertheless, I now have basic pthreads running with mutexes, which work as well. Next I'll try to extend the pthreads api to cover condition variables, thread-local storage and the other parts, and then move on to the non-pthread cases with e.g. C++11 atomics.

The ARM weak memory model that does not give sequential consistency is an interesting question. Currently we only have a fence that is a full ll+ls+sl+ss barrier. How about other fence types for more fine-grained control?
(In reply to Jukka Jylänki from comment #13)
> Testing out atomics with the patch, I'm getting an error "Permission denied
> to access object" when I tried to call any of the atomics ops. I ended up
> polyfilling the atomics like shown in your sample, I wonder if that message
> was a result of them being unimplemented?

If you mean add/sub/and/or/xor, they are not implemented in the patch above, but that work is finished and a patch is coming today, most likely.

> How about atomic ops on 64-bit types? Those are interesting and particularly
> used in games to solve the ABA problem in lock-free data structures (e.g.
> http://www.stroustrup.com/isorc2010.pdf ) so they would be very useful.

I agree they are useful.

> The atomic ops on a Float32Array and a Float64Array are peculiar, and there is
> no native counterpart. Since we don't have a Int64Array, perhaps having an
> 64-bit atomic op on a In32Array that requires that the offset is even so
> that the address is 8-aligned?

I need to investigate this more, because it comes down to what the underlying architectures can support, at least for the first cut.  That said, we probably need to support enough operations to support a reasonably straightforward translation of C++ code.

> Operating with atomics only on signed Arrays instead of unsigned feels
> peculiar, especially on bit ops which would effectively operate on an
> unsigned bit field. How is the highest bit treated there on signed inputs?
> Could the atomics work on both signed and unsigned typed arrays?

I don't understand this remark.  The compareExchange/load/store as well as add/sub/and/or/xor operations are supported on all unsigned and signed integer array types.  It is only the futex operations that are only supported on Int32Array.

> Nevertheless, I now have basic pthreads running with mutexes, which work as
> well. Next I'll try to extend the pthreads api to cover condition variables,
> thread-local storage and the other parts, and then move on to the
> non-pthread cases with e.g. C++11 atomics.

That's great!

> The ARM weak memory model that does not give sequential consistency is an
> interesting question. Currently we only have a fence that is a full
> ll+ls+sl+ss barrier. How about other fence types for more fine-grained
> control?

I have considered that to be very low priority indeed, but by all means put forward arguments for why we should add the extra complexity at this stage :)
(In reply to Lars T Hansen [:lth] from comment #14)
> 
> > How about atomic ops on 64-bit types? Those are interesting and particularly
> > used in games to solve the ABA problem in lock-free data structures (e.g.
> > http://www.stroustrup.com/isorc2010.pdf ) so they would be very useful.
> 
> I agree they are useful.
> 
> > The atomic ops on a Float32Array and a Float64Array are peculiar, and there is
> > no native counterpart. Since we don't have a Int64Array, perhaps having an
> > 64-bit atomic op on a In32Array that requires that the offset is even so
> > that the address is 8-aligned?
> 
> I need to investigate this more, because it comes down to what the
> underlying architectures can support, at least for the first cut.  That
> said, we probably need to support enough operations to support a reasonably
> straightforward translation of C++ code.

Some preliminary findings.

C++11 supports add/sub/and/or/xor only on integral (char and integer) types, including 64-bit integer types.  On some 32-bit platforms it's likely that the 64-bit accesses will have to be implemented by means of additional spinlocks (see next case) because there are no direct hardware mappings.  I believe there are also algorithms for two-word compare-and-swap that can be brought to bear.

C++11 supports compareExchange/load/store on arbitrary types, even aggregate types of arbitrary size.  Apart from some kind of "LOCK REP MOVS" type construction (that may or may not be valid) on x86 most platforms would implement these operations by something like a per-type or global spinlock and a memcpy (I need to read more).

I will note these issues in the evolving specification doc, but probably flag them as future extensions until the more fundamental parts of what we already have are stable (futex functionality, JIT support, Emscripten support).
This patch provides native support for Atomics.add/sub/and/or/xor, and fixes a number of bugs and makes coercion behavior consistent and probably correct.

The canonical spec for the Atomics API is now at the head of js/src/builtin/AtomicsObject.cpp (in this patch).  What's on Google Docs is considered stale at this time.

With this patch the test case is obsolete because it no longer needs the polyfill.  For the current test case, as well as other tests, go to my github repo referenced in an earlier comment.
Attachment #8466223 - Attachment is obsolete: true
Attachment #8466923 - Attachment is obsolete: true
One thing I was wondering about, which may be out of scope for this bug (or even orthogonal to it): would these interfaces give a way to interact (from generated code) with mozilla::Atomic objects? Is such a thing planned? (the way I'm envisioning this is that you would call an addressOf function to get some atomic object, and then take care to do the appropriate atomic ops on it in the generated code, e.g. loadPtrAtomic/storePtrAtomic).
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #17)
> One thing I was wondering about, which may be out of scope for this bug (or
> even orthogonal to it): would these interfaces give a way to interact (from
> generated code) with mozilla::Atomic objects?

Relatedly, using mozilla::Atomic to underpin all of this would be great, rather than digging around in more compiler intrinsics.  I think it's possible that the < 32-bit cases can be handled by MSVC, but nobody has done the digging to confirm it all works on XP.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #17)

> One thing I was wondering about, which may be out of scope for this bug (or
> even orthogonal to it): would these interfaces give a way to interact (from
> generated code) with mozilla::Atomic objects? Is such a thing planned? (the
> way I'm envisioning this is that you would call an addressOf function to get
> some atomic object, and then take care to do the appropriate atomic ops on
> it in the generated code, e.g. loadPtrAtomic/storePtrAtomic).

That is not being contemplated as part of this work, and I expect some of the people who have (with a twinge of distaste?) given this project their blessing would recoil in horror at the prospect :)

Out of curiosity, do you have any concrete use cases for such an API?
(In reply to Nathan Froyd (:froydnj) from comment #18)

> Relatedly, using mozilla::Atomic to underpin all of this would be great,
> rather than digging around in more compiler intrinsics.  I think it's
> possible that the < 32-bit cases can be handled by MSVC, but nobody has done
> the digging to confirm it all works on XP.

I agree; the code in the patch is expedient but far from final.  For now I'll let the API evolve to meet expected needs and then we can deal with updating mozilla::Atomic down the line.
Attached patch WIP: Ion support for the atomics (obsolete) — Splinter Review
This substantially extends the second part of Sean's patch by adding MCallOptimize and Ion support for compareExchange, load, store, and fence, as well as the framework for the binary operations (lowest level not yet implemented for those).  The present patch probably does not apply quite cleanly above the previously posted patch because it depends on some names being made public.  (I'm posting this patch simply to make the work visible.)
Attachment #8464954 - Attachment is obsolete: true
Summary: Prototype locks for SharedArrayBuffer → Atomics and locks for SharedArrayBuffer
Comment on attachment 8468356 [details] [diff] [review]
WIP: Atomics API with add/sub/and/or/xor and many fixes

Patch is obsolete; go to patch queue on github (https://github.com/lars-t-hansen/atomics-queue).
Attachment #8468356 - Attachment is obsolete: true
Comment on attachment 8470024 [details] [diff] [review]
WIP: Ion support for the atomics

Patch is obsolete; go to patch queue on github (https://github.com/lars-t-hansen/atomics-queue).
Attachment #8470024 - Attachment is obsolete: true
(In reply to Lars T Hansen [:lth] from comment #16)
> 
> The canonical spec for the Atomics API is now at the head of
> js/src/builtin/AtomicsObject.cpp (in this patch).  What's on Google Docs is
> considered stale at this time.

The spec has been removed from that file and moved back to a shared, commentable Google Doc.  See links attached to bug 1054841.
This is a set of five patches that implement the Atomics object in SpiderMonkey.  I'm going to ask for a review now since I suspect we'll do a couple of rounds anyway, even though there are three work items remaining:

- these patches still use their own machine atomics, because the mozilla
  atomics are not good enough and i've not taken the time to attempt to
  flesh those out
- the asm.js signal handlers must reliably perform a memory fence on an
  out-of-bounds access and I've not yet written that code
- there's an annoying pessimization having to do with atomic accesses
  to uint32 arrays, where inlining only kicks in if the expected return
  type is double

In addition to that, the DOM code (which I'll also attach in its current form, but not for review) is not finished and thus there may be some API changes.
Attachment #8491500 - Flags: review?(luke)
Attachment #8491501 - Flags: review?(luke)
Attached patch Stubs for ARM code generation (obsolete) — Splinter Review
Attached patch Stubs for 'none' code generation (obsolete) — Splinter Review
Attachment #8491507 - Flags: review?(luke)
Attached patch DOM hooks (obsolete) — Splinter Review
This is clearly unfinished, notably there is no facility for breaking any lock when a worker is terminated.

I'm curious to know what you think about the approach and about the API between DOM and the JS engine.
Attachment #8491509 - Flags: feedback?(luke)
Comment on attachment 8491501 [details] [diff] [review]
Ion support for using Atomics from plain JS

Probably need a proper Ion reviewer for this one.  Handing to Jan, but Jan feel free to reassign as you see fit.
Attachment #8491501 - Flags: review?(luke) → review?(jdemooij)
(In reply to Luke Wagner [:luke] from comment #31)
> Comment on attachment 8491501 [details] [diff] [review]
> Ion support for using Atomics from plain JS
> 
> Probably need a proper Ion reviewer for this one.  Handing to Jan, but Jan
> feel free to reassign as you see fit.

That reminds me, there are some unexplored optimization opportunities in that patch.  The ones I know about are listed starting at row 77 here:
https://docs.google.com/spreadsheets/d/1PFa3aDxY6mffT8uoflCaFitX9lKj_Y4_aZwtMApIRiI/edit?usp=sharing
Keywords: dev-doc-needed
Whiteboard: [DocArea=JS]
Comment on attachment 8491501 [details] [diff] [review]
Ion support for using Atomics from plain JS

Review of attachment 8491501 [details] [diff] [review]:
-----------------------------------------------------------------

Forwarding to Sean. He's more familiar with SAB etc and I'll be on PTO until Thursday.
Attachment #8491501 - Flags: review?(jdemooij) → review?(sstangl)
Comment on attachment 8491500 [details] [diff] [review]
Atomics object + interpreter support

Review of attachment 8491500 [details] [diff] [review]:
-----------------------------------------------------------------

I started by looking at the futex stuff.  Generally looks quite nice, here is an initial round of comments and nits and maybe a bug.

::: js/src/builtin/AtomicsObject.cpp
@@ +27,5 @@
> + *
> + * If MOZ_HAVE_CXX11_ATOMICS is set we'll use C++11 atomics.
> + * Otherwise we'll fall back on gcc/Clang intrinsics.  If neither
> + * option is available then the build must disable shared memory, or
> + * compilation will fail with a predictable error.

So does msvc give us atomics (and, thus, do we get SharedArrayBuffer on msvc)?

@@ +90,5 @@
> +reportNoFutexes(JSContext *cx)
> +{
> +    JS_ReportErrorNumber(cx, js_GetErrorMessage, nullptr, JSMSG_ATOMICS_NOT_INSTALLED);
> +    return false;
> +}

These above 2 static functions should be capitalized.  (The commented out one should be removed.)

@@ +101,5 @@
> +    RootedObject obj(cx, &v.toObject());
> +    if (!obj->is<SharedTypedArrayObject>())
> +        return reportBadArrayType(cx);
> +    Rooted<SharedTypedArrayObject *> view(cx, &obj->as<SharedTypedArrayObject>());
> +    viewp.set(view);

Seems like you don't really need the intermediate Rooted 'view'.

@@ +111,5 @@
> +GetSharedTypedArrayIndex(JSContext *cx, Value v, Handle<SharedTypedArrayObject *> view, uint32_t* offset, bool* inRange)
> +{
> +    // The spec says "ToLength(ToNumber(v))", but for the purposes of
> +    // checking the range we can compare directly to the view's length,
> +    // we don't have to first clamp to 2^53-1.

I think it's more complicated than that; search for CanonicalNumericIndex+IntegerIndexedElementGet.  This definition has changed a bit and it may change in the future, so it'd be best to reuse the same predicate we've staked out for use in normal typed array views: IsTypedArrayIndex (even if it's not 100% what the currently says; we can update it).  This takes a jsid, so you'd want to use ValueToId to get that.  What's nice is that there should be a pretty fast through both these methods when the input isInt32.

@@ +673,5 @@
> +    JS::JSFutexAPI* fx = rt->futexAPI;
> +    if (!fx)
> +        return reportNoFutexes(cx);
> +    // if (!fx->isOnWorkerThread())
> +    //     return reportNotOnWorkerThread(cx);

Generally we try not to land commented code like this.

@@ +687,5 @@
> +        return false;
> +    double numberValue;
> +    if (!ToNumber(cx, valv, &numberValue))
> +        return false;
> +    int32_t value = (int32_t)numberValue;

I think this is equivalent (but faster and a bit more clear) to ToInt32(cx, valv, &value)

@@ +693,5 @@
> +    if (!ToNumber(cx, timeoutv, &timeout))
> +        return false;
> +    timeout = ToInteger(timeout);
> +    if (timeout < 0)
> +        timeout = 0;

There is a ToInteger(cx, HandleValue, double*) you can use that is equivalent to (and faster than) this.

@@ +701,5 @@
> +        return true;
> +    }
> +
> +    // This lock also protects the "waiters" field on SharedArrayRawBuffer.
> +    fx->lock();

Instead of requiring all return paths to fx->unlock(), can you introduce an AutoLockFutexAPI RAII class to do this for you?

@@ +720,5 @@
> +        js_ReportOutOfMemory(cx);
> +        return false;
> +    }
> +
> +    FutexWaiter w(SAB_ID, offset, token);

Since the list of waiters is per-SharedArrayRawBuffer, why is the SAB_ID necessary?  That is, shouldn't all waiters have the same SAB_ID in the list?

@@ +723,5 @@
> +
> +    FutexWaiter w(SAB_ID, offset, token);
> +    w.waiting = true;
> +    FutexWaiter *waiters = sarb->waiters();
> +    if (waiters) {

if (FutexWaiter *waiters = sarb->waiters()) {

@@ +729,5 @@
> +        w.back = waiters->back;
> +        waiters->back->lower_pri = &w;
> +        waiters->back = &w;
> +    }
> +    else {

} else {

@@ +736,5 @@
> +    }
> +
> +    bool retval = true;
> +    switch (fx->wait(timeout)) {
> +    case JS::JSFutexAPI::Woken:

SM style is to indent 'case' by two spaces.

@@ +798,5 @@
> +    if (!ToNumber(cx, countv, &count))
> +        return false;
> +    count = ToInteger(count);
> +    if (count < 0)
> +        count = 0;

ToInteger(cx, HandleValue, double*) here and in WakeOrReque.

@@ +907,5 @@
> +                    ++woken;
> +                    --count;
> +                }
> +            }
> +            else {

} else {

@@ +920,5 @@
> +                    c->lower_pri = last->lower_pri;
> +
> +                    // Make new adjacent nodes point to c.
> +                    last->lower_pri->back = c;
> +                    last->lower_pri = c;

Perhaps these 4 statements could instead be in a block with the comment "Insert c at the end of the list."

@@ +926,5 @@
> +                    // c is now last.
> +                    last = c;
> +                }
> +            }
> +        } while (iter != waiters);

Can we iloop here if offset1 == offset2?

::: js/src/jsapi.h
@@ +656,5 @@
> +    // Wake the thread represented by the token.  Returns true if the
> +    // thread was woken, false if the thread no longer exists.
> +    //
> +    // The lock must be held around this call, see lock() and unlock().
> +    virtual bool wake(WorkerToken *token) = 0;

It's probably worth spelling out that the awoken waiters will not run until after wake() returns and the caller of wake() calls unlock().  (The impl seems to depend on this, assuming that the waiter list does not mutate while calling wake().)

On the flip side, it'd be good to specify that the implementation of wake() can rely on the invariant that wake() is called at most once for any given 'token'.
Depends on: 1071618
Comment on attachment 8491500 [details] [diff] [review]
Atomics object + interpreter support

Review of attachment 8491500 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/AtomicsObject.cpp
@@ +167,5 @@
> +
> +    // CAS always sets oldval to the old value of the cell.
> +    // addr must be a T*, and oldval and newval should be variables of type T
> +
> +

Can you remove the second \n?

@@ +687,5 @@
> +        return false;
> +    double numberValue;
> +    if (!ToNumber(cx, valv, &numberValue))
> +        return false;
> +    int32_t value = (int32_t)numberValue;

Actually, I think there is a significant difference between ToInt32(v) and (int32_t)ToNumber(v): check out all the goofiness in ToInt32(double d) in SM.  My guess is that the cast is correct when numberValue is in the int32 range but we get into UB when the int32 is outside this range.
Attachment #8491500 - Flags: review?(luke) → review+
Comment on attachment 8491500 [details] [diff] [review]
Atomics object + interpreter support

Review of attachment 8491500 [details] [diff] [review]:
-----------------------------------------------------------------

Whoooops, I r+'d out of habit before clicking 'Publish'... still some open questions on this patch.

::: js/src/builtin/AtomicsObject.cpp
@@ +159,5 @@
> +    if (!ToNumber(cx, oldv, &oldCandidate))
> +        return false;
> +    double newCandidate;
> +    if (!ToNumber(cx, newv, &newCandidate))
> +        return false;

Given the previous comment, I think we want ToInt32 to take care of the double->int32 hop and then cast from there.

@@ +172,5 @@
> +#if defined(CXX11_ATOMICS)
> +#define CAS(T, addr, oldval, newval)                                    \
> +    do {                                                                \
> +        std::atomic_compare_exchange_strong(reinterpret_cast<std::atomic<T>*>(addr), &oldval, newval); \
> +    } while(0)

ooc, why do you need the do{}while(0)s here?  Or is it just habit for macros?

@@ +243,5 @@
> +bool
> +js::atomics_compareExchange(JSContext *cx, unsigned argc, Value *vp)
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    return atomics_compareExchange_impl(cx, args.get(0), args.get(1), args.get(2), args.get(3), args.rval());

Why are most natives separated into the native and the _impl native?  It seems like you could just as well have all the 'HandleValue objv = args.get(0);' as variable decls in a single function.  Perhaps you've seen this pattern in other natives in SM.  The usual reason we do this split is when we have a "non-generic method" which needs to do special backflips when called with a cross-compartment 'this': we need to enter the compartment of 'this' and re-call the _impl portion of the native.  None of these natives are methods (no 'this'), so that shouldn't be a problem here.
Attachment #8491500 - Flags: review+
Comment on attachment 8491509 [details] [diff] [review]
DOM hooks

Review of attachment 8491509 [details] [diff] [review]:
-----------------------------------------------------------------

Great to see the tests!

::: dom/base/nsJSEnvironment.cpp
@@ +631,5 @@
>                                    js_options_dot_str, this);
> +
> +    JSFutexAPIImpl *fx = new JSFutexAPIImpl(mContext); // TODO: OOM failure
> +    fx->init();                                        // TODO: OOM failure
> +    ::JS_SetRuntimeFutexAPI(sRuntime, fx);

IIUC, there is one nsJSContext per browsing context so one per tab/iframe/etc.  That means each context after the first will clobber the runtime futex API slot.  Probably you want to put this in nsJSContext::EnsureStatics (which is called once, for the main thread's JSRuntime), next to, e.g., JS::SetAsmJSCacheOps.  (Perhaps also you can put your new JSAPI in namespace JS?)  Also, you'd want to take a JSRuntime* as constructor argument (which is fortunately all you need, looking at the impl).

@@ +2759,5 @@
> +  // Sort of hacky, we really want module initialization to do this
> +  PRLock *l = PR_NewLock();
> +  if (!l)
> +    return false;
> +  if (!futex_lock.compareExchange(nullptr, l))

I think we can store a lock in global memory (just as a static of this file) and rely on creation of the main-thread's JSRuntime (which will call EnsureStatics) to happen-before creation of any workers.  You could assert this and verify with whoever does the final DOM review.  I wouldn't worry about destruction since there is one per process (we seem to already do this in other places: http://mxr.mozilla.org/mozilla-central/ident?i=sAllocationLock).

Alternatively, it occurred to me that, just as you have a newTokenForThread, you could have a newTokenForBuffer and this new data structure could own the lock.  As a bonus, you'd get finer-granularity locking which would reduce contention in cases where you had different workers pounding on different SABs.

@@ +2803,5 @@
> +  }
> +
> +  // A hack, but at least platform independent: 4000s is the max.
> +  if (timeout_ns > 4000000000.0)
> +    return JS::JSFutexAPI::ErrorTooLong;

Probably want to see what setTimeout() does and mimic that.

@@ +2819,5 @@
> +bool
> +JSFutexAPIImpl::wake(JS::JSFutexAPI::WorkerToken *token)
> +{
> +  JSRuntime *rt = static_cast<JSFutexAPIWorkerTokenImpl*>(token)->rt;
> +  PR_NotifyCondVar(static_cast<JSFutexAPIImpl*>(JS_GetRuntimeFutexAPI(rt))->cond_);

If all you need is the the condvar, can you just case the condvar* (or the JSFutexAPIImpl*) to the token?  Given that, perhaps we don't even need destroyToken?

::: dom/base/nsJSEnvironment.h
@@ +174,5 @@
>  
>    static bool DOMOperationCallback(JSContext *cx);
>  };
>  
> +class JSFutexAPIImpl : public JS::JSFutexAPI

Perhaps you could name this "JSPerThreadFutexImpl" or something to make it clear that the reason this is per-JSRuntime is that it isn't just an API, it contains per-thread data.  It'd also be good to throw JS_AbortIfWrongThread into all the member functions to make sure, e.g., this doesn't accidentally get called from a helper thread.
Attachment #8491509 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #34)

Style and other suggestions will be addressed as suggested, by and large.

> So does msvc give us atomics (and, thus, do we get SharedArrayBuffer on
> msvc)?

We get atomics starting with VC++ 2012, which provides the C++11 atomics.  For VC++ 2010 and earlier we'll need to build something on inline assembly, most likely.  I believe we're now building with 2012 by default, so the point is probably moot.  I'd be curious to know your opinion about that.

> @@ +111,5 @@
> > +GetSharedTypedArrayIndex(JSContext *cx, Value v, Handle<SharedTypedArrayObject *> view, uint32_t* offset, bool* inRange)
> 
> I think it's more complicated than that; search for
> CanonicalNumericIndex+IntegerIndexedElementGet.

Neither of those functions appear to exist.  Perhaps they disappeared when Waldo boiled the TypedArray ocean last week.

The definition here follows the SharedTypedArray spec, which follows the ES6 TypedArray spec.  I'll look around a little bit more for clues as to what to do, but so far as I know this code is basically correct.

> @@ +720,5 @@
> > +        js_ReportOutOfMemory(cx);
> > +        return false;
> > +    }
> > +
> > +    FutexWaiter w(SAB_ID, offset, token);
> 
> Since the list of waiters is per-SharedArrayRawBuffer, why is the SAB_ID
> necessary?  That is, shouldn't all waiters have the same SAB_ID in the list?

Nice catch, it's a holdover from an old design that had a global list.

> @@ +926,5 @@
> > +                    // c is now last.
> > +                    last = c;
> > +                }
> > +            }
> > +        } while (iter != waiters);
> 
> Can we iloop here if offset1 == offset2?

No, because the termination condition is iter == waiters; iter always advances through the list; and waiters is constant and points to the sentinel node.

I'm fairly sure I considered the case where the two locations are the same but I'm not completely sure.  I'll double-check.
(In reply to Luke Wagner [:luke] from comment #36)
> Comment on attachment 8491500 [details] [diff] [review]
> Atomics object + interpreter support
> 
> Review of attachment 8491500 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +172,5 @@
> > +#if defined(CXX11_ATOMICS)
> > +#define CAS(T, addr, oldval, newval)                                    \
> > +    do {                                                                \
> > +        std::atomic_compare_exchange_strong(reinterpret_cast<std::atomic<T>*>(addr), &oldval, newval); \
> > +    } while(0)
> 
> ooc, why do you need the do{}while(0)s here?  Or is it just habit for macros?

Just habit.

> @@ +243,5 @@
> > +bool
> > +js::atomics_compareExchange(JSContext *cx, unsigned argc, Value *vp)
> > +{
> > +    CallArgs args = CallArgsFromVp(argc, vp);
> > +    return atomics_compareExchange_impl(cx, args.get(0), args.get(1), args.get(2), args.get(3), args.rval());
> 
> Why are most natives separated into the native and the _impl native?  It
> seems like you could just as well have all the 'HandleValue objv =
> args.get(0);' as variable decls in a single function.  Perhaps you've seen
> this pattern in other natives in SM.  The usual reason we do this split is
> when we have a "non-generic method" which needs to do special backflips when
> called with a cross-compartment 'this': we need to enter the compartment of
> 'this' and re-call the _impl portion of the native.  None of these natives
> are methods (no 'this'), so that shouldn't be a problem here.

A combination of following the dominant style, a holdover from an earlier version where I needed the function pointer, and not knowing what I'd be needing in subsequent patches.  I'm happy to change this, it does not look like subsequent patches need the split.
(In reply to Lars T Hansen [:lth] from comment #38)
> We get atomics starting with VC++ 2012, which provides the C++11 atomics. 
> For VC++ 2010 and earlier we'll need to build something on inline assembly,
> most likely.  I believe we're now building with 2012 by default, so the
> point is probably moot.  I'd be curious to know your opinion about that.

It's worth noting that we don't define MOZ_HAVE_CXX11_ATOMICS for VC++ 2012, because of a severe bug in its implementation:

http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#44
(In reply to Nathan Froyd (:froydnj) from comment #40)
> (In reply to Lars T Hansen [:lth] from comment #38)
> > We get atomics starting with VC++ 2012, which provides the C++11 atomics. 
> > For VC++ 2010 and earlier we'll need to build something on inline assembly,
> > most likely.  I believe we're now building with 2012 by default, so the
> > point is probably moot.  I'd be curious to know your opinion about that.
> 
> It's worth noting that we don't define MOZ_HAVE_CXX11_ATOMICS for VC++ 2012,
> because of a severe bug in its implementation:
> 
> http://mxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#44

Thanks, that's useful to know.  I saw the discussion but obviously I did not internalize the consequences.

I'll add an item to my to-do list to worry about this.  That may just amount to including <atomic> myself for VC++ 2012, since at the moment I only worry about the sequentially consistent memory model.
(In reply to Luke Wagner [:luke] from comment #35)

> Actually, I think there is a significant difference between ToInt32(v) and
> (int32_t)ToNumber(v): check out all the goofiness in ToInt32(double d) in
> SM.  My guess is that the cast is correct when numberValue is in the int32
> range but we get into UB when the int32 is outside this range.

It is probably at least subject to platform and compiler bugs^Wdependencies (that was my experience at Opera, anyhow - endless pain).  I'll do as you suggest here, it is potentially a hair faster anyway when the input is already an integer.
(In reply to Lars T Hansen [:lth] from comment #39)
> (In reply to Luke Wagner [:luke] from comment #36)
> > 
> > > +#define CAS(T, addr, oldval, newval)                                    \
> > > +    do {                                                                \
> > > +        std::atomic_compare_exchange_strong(reinterpret_cast<std::atomic<T>*>(addr), &oldval, newval); \
> > > +    } while(0)
> > 
> > ooc, why do you need the do{}while(0)s here?  Or is it just habit for macros?
> 
> Just habit.

I should also have mentioned - because I now remember that it was an important part of the reason - that it ensures that the macro is only used in a statement position, never in an expression position.
(In reply to Lars T Hansen [:lth] from comment #38)
> We get atomics starting with VC++ 2012, which provides the C++11 atomics. 
> For VC++ 2010 and earlier we'll need to build something on inline assembly,
> most likely.  I believe we're now building with 2012 by default, so the
> point is probably moot.  I'd be curious to know your opinion about that.

Talking to ehsan and others on #developers, we apparently can't depend on <atomic> since we use STLPort on b2g.  mfbt/Atomics.h is our solution to this problem so we probably need to (eventually) extend mfbt/Atomics.h.  As long as we're #ifdef NIGHTLY, the current solution seems fine.

> > I think it's more complicated than that; search for
> > CanonicalNumericIndex+IntegerIndexedElementGet.
> 
> Neither of those functions appear to exist.  Perhaps they disappeared when
> Waldo boiled the TypedArray ocean last week.

Those are the names of spec methods (never in SM, afaik):
http://people.mozilla.org/~jorendorff/es6-draft.html#sec-canonicalnumericindexstring
http://people.mozilla.org/~jorendorff/es6-draft.html#sec-integerindexedelementget

> > Can we iloop here if offset1 == offset2?
> 
> No, because the termination condition is iter == waiters; iter always
> advances through the list; and waiters is constant and points to the
> sentinel node.

Lets say there are two waiters that get requeued: we reque the first (now it is inbetween the old end and waiters), then we requeue the second (now it is inbetween the first-requeued and waiters), now we get to the requeued-first and repeat.

(In reply to Lars T Hansen [:lth] from comment #39)
> A combination of following the dominant style, a holdover from an earlier
> version where I needed the function pointer, and not knowing what I'd be
> needing in subsequent patches.  I'm happy to change this, it does not look
> like subsequent patches need the split.

In general, we don't split into _impl unless we have a non-generic-method, so it'd be nice to remerge these.

(In reply to Lars T Hansen [:lth] from comment #43)
> I should also have mentioned - because I now remember that it was an
> important part of the reason - that it ensures that the macro is only used
> in a statement position, never in an expression position.

That's a good reason.
(In reply to Luke Wagner [:luke] from comment #44)
> (In reply to Lars T Hansen [:lth] from comment #38)
> > We get atomics starting with VC++ 2012, which provides the C++11 atomics. 
> > For VC++ 2010 and earlier we'll need to build something on inline assembly,
> > most likely.  I believe we're now building with 2012 by default, so the
> > point is probably moot.  I'd be curious to know your opinion about that.
> 
> Talking to ehsan and others on #developers, we apparently can't depend on
> <atomic> since we use STLPort on b2g.  mfbt/Atomics.h is our solution to
> this problem so we probably need to (eventually) extend mfbt/Atomics.h.  As
> long as we're #ifdef NIGHTLY, the current solution seems fine.

Uncertain what b2g has to do with anything.  For GCC and Clang we'd use GCC/Clang
intrinsics, not the <atomic> functions; the gcc intrinsics are as old as the
hills.  I had assumed that b2g is compiled with  some gcc-compatible toolchain.
Is that not so?
Ah, I forgot that we had the GCC intrinsic fallback and the issue is only with msvc.  On IRC, ted said that, when we upgrade primary builders to a new msvc version (the 2013 upgrade apparently coming in weeks) we usually give everyone a release or two before deprecating support for the old.  So perhaps we'll be fine by the time this would be released.
Attached patch bug979594-AtomicsAPI.diff (obsolete) — Splinter Review
Base Atomics object, v2.

This takes into account the review comments and has a few bug fixes.
Attachment #8491500 - Attachment is obsolete: true
Attachment #8495399 - Flags: review?(luke)
Attached patch bug979594-AtomicsIonSupport.diff (obsolete) — Splinter Review
First of two patches: generic Ion support for atomics + x86/x64/none.
Attachment #8491501 - Attachment is obsolete: true
Attachment #8491502 - Attachment is obsolete: true
Attachment #8491505 - Attachment is obsolete: true
Attachment #8491501 - Flags: review?(sstangl)
Attachment #8495402 - Flags: review?(sstangl)
Attached patch bug979594-AtomicsARM.diff (obsolete) — Splinter Review
Second of two patches: ARM back-end for Atomics.
Attachment #8495405 - Flags: review?(sstangl)
(In reply to Lars T Hansen [:lth] from comment #48)
> Created attachment 8495402 [details] [diff] [review]
> bug979594-AtomicsIonSupport.diff
> 
> First of two patches: generic Ion support for atomics + x86/x64/none.

I should note that the handling of SharedUint32Array is suboptimal (because inlining only kicks in in some cases), though I believe it is correct.  I'm planning to improve that in a followup patch, if that's OK.
I've forked this bug; asm.js discussion moves to bug 1073096.
Summary: Atomics and locks for SharedArrayBuffer → Atomics and locks for SharedArrayBuffer: Plain JS
Comment on attachment 8491507 [details] [diff] [review]
Support for OdinMonkey and asm.js

Moved to bug 1073096.
Attachment #8491507 - Attachment is obsolete: true
Attachment #8491507 - Flags: review?(luke)
Comment on attachment 8491509 [details] [diff] [review]
DOM hooks

Moved to bug 1073096.
Attachment #8491509 - Attachment is obsolete: true
Comment on attachment 8495402 [details] [diff] [review]
bug979594-AtomicsIonSupport.diff

Review of attachment 8495402 [details] [diff] [review]:
-----------------------------------------------------------------

Partial review, will wait for re-upload before reviewing codegen.

Big patch, but largely uncontroversial. I am concerned about the implementation of all atomic operations in terms of CMPXCHG and looping, when a LOCK prefix with a direct instruction would suffice. The code also seems to only permit inlining Uint32 values that are not required to be doubles by JSValue representation, and I would like greater clarity on that part of the code. It would also be prudent to prefer "LOCK ADD" to "LOCK XADD" when the return value is unused.

::: js/src/jit/AtomicOp.h
@@ +11,5 @@
> +namespace jit {
> +
> +// Types of atomic operation, shared by MIR and LIR.
> +
> +enum AtomicOp {

Is there some reason this should not be in MIR.h, perhaps within the definition of some MInstruction?

@@ +36,5 @@
> +    MembarAllbits = 31
> +};
> +
> +}
> +}

} // namespace jit
} // namespace js

::: js/src/jit/MCallOptimize.cpp
@@ +45,5 @@
> +        native == atomics_sub ||
> +        native == atomics_and ||
> +        native == atomics_or ||
> +        native == atomics_xor)
> +        return inlineAtomicsBinop(callInfo, target);

Multi-line if() conditional requires {}.

@@ +2201,5 @@
> +    if (!atomicsPreconditions(callInfo, &arrayType))
> +        return InliningStatus_NotInlined;
> +
> +    MDefinition *oldval = callInfo.getArg(2);
> +    if (oldval->type() != MIRType_Int32)

Requiring MIRType_Int32 does not work with end-range Uint32 values.

@@ +2219,5 @@
> +        MCompareExchangeTypedArrayElement::New(alloc(), elements, index, arrayType, oldval, newval);
> +    current->add(cas);
> +    current->push(cas);
> +
> +    cas->setResultType(arrayType == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

cas->setResultType(getInlineReturnType());

This line duplicates logic from atomicsPreconditions(), which already checks that getInlineReturnType() returns the correct value. Additionally, this removes the arrayType outparam from atomicsPreconditions().

@@ +2245,5 @@
> +                                    DoesRequireMemoryBarrier);
> +    current->add(load);
> +    current->push(load);
> +
> +    load->setResultType(arrayType == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

Ditto to inlineAtomicsCompareExchange().

@@ +2303,5 @@
> +    Scalar::Type arrayType;
> +    if (!atomicsPreconditions(callInfo, &arrayType))
> +        return InliningStatus_NotInlined;
> +
> +    if (callInfo.getArg(2)->type() != MIRType_Int32)

This doesn't work with very large Uint32 addends.

@@ +2333,5 @@
> +        MAtomicTypedArrayElementBinop::New(alloc(), k, elements, index, arrayType, value);
> +    current->add(binop);
> +    current->push(binop);
> +
> +    binop->setResultType(arrayType == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

Ditto to inlineAtomicsCompareExchange().

@@ +2339,5 @@
> +}
> +
> +// Common preconditions for the atomics ops: the first argument is a
> +// SharedTypedArray object; the second argument is an int32 for the
> +// index.

This comment should be broken up and inlined below where relevant.

@@ +2342,5 @@
> +// SharedTypedArray object; the second argument is an int32 for the
> +// index.
> +
> +bool
> +IonBuilder::atomicsPreconditions(CallInfo &callInfo, Scalar::Type *arrayType)

Prefer "atomicsMeetsPreconditions()".

@@ +2348,5 @@
> +    if (callInfo.getArg(0)->type() != MIRType_Object)
> +        return false;
> +
> +    MIRType arg1Type = callInfo.getArg(1)->type();
> +    if (arg1Type != MIRType_Int32)

if (callInfo.getArg(1)->type != MIRType_Int32)

::: js/src/jit/MIR.h
@@ +7698,5 @@
>          setResultType(MIRType_Value);
> +        if (requiresBarrier_)
> +            setGuard();         // Not removable or movable
> +        else
> +            setMovable();

Prefer |requiresBarrier_ ? setGuard() : setMovable();|

@@ +7890,5 @@
> +    {
> +        if (requiresBarrier_)
> +            setGuard();         // Not removable or movable
> +        else
> +            setMovable();

Prefer |requiresBarrier_ ? setGuard() : setMovable();|

@@ +7904,5 @@
> +                                        MDefinition *value, Scalar::Type arrayType,
> +                                        MemoryBarrierRequirement requiresBarrier = DoesNotRequireMemoryBarrier)
> +    {
> +        return new(alloc) MStoreTypedArrayElement(elements, index, value, arrayType,
> +                                                  requiresBarrier);

requiresBarrier can be on previous line.

@@ +11109,5 @@
> +// after the barriered operation, and vice versa, and to prevent the
> +// barriered operation from being removed or hoisted.
> +
> +class MMemoryBarrier
> +    : public MNullaryInstruction

ultra-nit: 2-space indent for :

@@ +11114,5 @@
> +{
> +    // The type is a combination of the memory barrier types in AtomicOp.h.
> +    const int type_;
> +
> +    explicit MMemoryBarrier(int type) : type_(type) {

nit:

explicit MMemoryBarrier(int type)
  : type_(type)
{

@@ +11136,5 @@
> +};
> +
> +class MCompareExchangeTypedArrayElement
> +    : public MAryInstruction<4>
> +    , public MixPolicy< MixPolicy<ObjectPolicy<0>, IntPolicy<1> >, MixPolicy<IntPolicy<2>, IntPolicy<3> > >

ultra-nit: 2-space indent.

@@ +11149,5 @@
> +        initOperand(0, elements);
> +        initOperand(1, index);
> +        initOperand(2, oldval);
> +        initOperand(3, newval);
> +        setResultType(arrayType_ == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

This is already set by the inlining function.

@@ +11153,5 @@
> +        setResultType(arrayType_ == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);
> +        setGuard();             // Not removable
> +    }
> +
> +public:

ultra-nit: 2-space indent.

@@ +11178,5 @@
> +        return getOperand(2);
> +    }
> +    int oldvalOperand() {
> +        return 2;
> +    }

This should be a static integer.

@@ +11192,5 @@
> +};
> +
> +class MAtomicTypedArrayElementBinop
> +    : public MAryInstruction<3>,
> +      public Mix3Policy< ObjectPolicy<0>, IntPolicy<1>, IntPolicy<2> >

ultra-nit: 2-space indent.

@@ +11202,5 @@
> +  protected:
> +    explicit MAtomicTypedArrayElementBinop(AtomicOp op, MDefinition *elements, MDefinition *index,
> +                                           Scalar::Type arrayType, MDefinition *value)
> +        : op_(op)
> +        , arrayType_(arrayType)

nit: 2-space indent from previous line; comma should be on previous line

@@ +11207,5 @@
> +    {
> +        initOperand(0, elements);
> +        initOperand(1, index);
> +        initOperand(2, value);
> +        setResultType(arrayType_ == Scalar::Uint32 ? MIRType_Double : MIRType_Int32);

Already set by inlining function.

::: js/src/jit/none/MacroAssembler-none.h
@@ +295,5 @@
>      template <typename T, typename S> void store16(T, S) { MOZ_CRASH(); }
>  
>      template <typename T> void computeEffectiveAddress(T, Register) { MOZ_CRASH(); }
>  
> +    template <typename T> void compareExchange8SignExtend(const T &mem, Register oldval, Register newval, Register output) { MOZ_CRASH(); }

It looks like all of these lines extend past the 99th column. They could be shortened by removing argument names and not qualifying T or S. If that is insufficient, |template <typename T>| may be on its own line.

::: js/src/jit/shared/Lowering-x86-shared.cpp
@@ +356,5 @@
> +    if (ins->arrayType() == Scalar::Uint32 && IsFloatingPointType(ins->type())) {
> +        tempDef = tempFixed(eax);
> +        newval = useRegister(ins->newval());
> +    }
> +    else {

nit: } else {

@@ +391,5 @@
> +    //
> +    // For ADD and SUB we'll use XADD:
> +    //
> +    //    movl       src, output
> +    //    lock xaddl output, mem

If the value of the atomic addition is never read, then "LOCK ADD"/"LOCK SUB" performs less work than "LOCK XADD" and should be preferred.

@@ +396,5 @@
> +    //
> +    // For the 8-bit variants XADD needs a byte register for the
> +    // output only.
> +    //
> +    // For AND/OR/XOR we need to use a CMPXCHG loop:

All of AND, OR, and XOR happily accept the LOCK prefix and can execute atomically. These should lower to a single instruction on x86.
Attachment #8495402 - Flags: review?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #54)
> Comment on attachment 8495402 [details] [diff] [review]
> bug979594-AtomicsIonSupport.diff
> 
> Review of attachment 8495402 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Partial review, will wait for re-upload before reviewing codegen.
> 
> Big patch, but largely uncontroversial. I am concerned about the
> implementation of all atomic operations in terms of CMPXCHG and looping,
> when a LOCK prefix with a direct instruction would suffice.

That's probably a misunderstanding.  The LOCK prefix with a direct instruction
only works when the result of the JS operation is not used.  When the result is
used, it is invariably required by the spec to be the old value of the cell,
and the CMPXCHG loop is needed to access that value appropriately (except
that for add and subtract the XADD instruction can be used).

The optimization in the case where the result is not needed has been noted,
as have other optimizations (see comment #32) but it did not seem important
to implement that for the first go-around, as the patch is already quite
large.

> The code also
> seems to only permit inlining Uint32 values that are not required to be
> doubles by JSValue representation, and I would like greater clarity on that
> part of the code.

See comment #50, but I've also noted your other comments on the issue.
We should probably discuss this further, I'll take it to email or IRC.

> It would also be prudent to prefer "LOCK ADD" to "LOCK
> XADD" when the return value is unused.

As noted above.  I can implement this optimization now, but I would prefer to
make that a follow-on bug.

> ::: js/src/jit/AtomicOp.h
> @@ +11,5 @@
> > +namespace jit {
> > +
> > +// Types of atomic operation, shared by MIR and LIR.
> > +
> > +enum AtomicOp {
> 
> Is there some reason this should not be in MIR.h, perhaps within the
> definition of some MInstruction?

The direct reason is that the definitions are shared between MIR and LIR.  I had
some problems (I forget - probably circular includes) that precluded including
MIR.h everywhere I needed it.  I can try again, but unless you have major misgivings
I'd prefer to keep it like it is.

Again for reference the spec is here:
https://docs.google.com/document/d/1NDGA_gZJ7M7w1Bh8S0AoDyEqwDdRh4uSoTPSNn77PFk/edit?usp=sharing

(Style and bikeshedding remarks will be taken into account.)
(In reply to Luke Wagner [:luke] from comment #46)
> Ah, I forgot that we had the GCC intrinsic fallback and the issue is only
> with msvc.  On IRC, ted said that, when we upgrade primary builders to a new
> msvc version (the 2013 upgrade apparently coming in weeks) we usually give
> everyone a release or two before deprecating support for the old.  So
> perhaps we'll be fine by the time this would be released.

FWIW, MSVC exposes patterns like LOCK XADD, LOCK op, and LOCK CMPXCHG as intrinsic functions for various operand sizes (search for "x86 Intrinsics List" on msdn maybe) so it's possible to implement a compatibility library for earlier MSVC versions without worrying deeply about what will play on other compilers and systems and without delving into assembler, it's just a bunch of code to crank out.  If the switch to a newer VC++ is long in coming I may do that, because right now I get buildbot failures on Windows because the buildbot is on an old compiler.
Sean, I need your feedback on comment #55.
Flags: needinfo?(sstangl)
(In reply to Lars T Hansen [:lth] from comment #56)
> (In reply to Luke Wagner [:luke] from comment #46)
>
> FWIW, MSVC exposes patterns like LOCK XADD, LOCK op, and LOCK CMPXCHG as
> intrinsic functions for various operand sizes (search for "x86 Intrinsics
> List" on msdn maybe) so it's possible to implement a compatibility library
> for earlier MSVC versions without worrying deeply about what will play on
> other compilers and systems and without delving into assembler, it's just a
> bunch of code to crank out.  If the switch to a newer VC++ is long in coming
> I may do that, because right now I get buildbot failures on Windows because
> the buildbot is on an old compiler.

I decided to just do that, since it turned out to be very easy.
(In reply to Luke Wagner [:luke] from comment #44)
> (In reply to Lars T Hansen [:lth] from comment #38)
> 
> > > Can we iloop here if offset1 == offset2?
> > 
> > No, because the termination condition is iter == waiters; iter always
> > advances through the list; and waiters is constant and points to the
> > sentinel node.
> 
> Lets say there are two waiters that get requeued: we reque the first (now it
> is inbetween the old end and waiters), then we requeue the second (now it is
> inbetween the first-requeued and waiters), now we get to the requeued-first
> and repeat.

You're right.  The intended invariant is that last does not precede waiters but that does not hold.
(In reply to Lars T Hansen [:lth] from comment #55)
> I can implement this optimization now, but I would prefer to make that a follow-on bug.

Leaving it for later would be fine. A nice way to express this intent is by filing a new bug before submitting the patch, then referring to that bug number in a comment around the code.

> (Style and bikeshedding remarks will be taken into account.)

Bikeshedding is a pejorative word!
Flags: needinfo?(sstangl)
(In reply to Sean Stangl [:sstangl] from comment #60)
> (In reply to Lars T Hansen [:lth] from comment #55)
> 
> > (Style and bikeshedding remarks will be taken into account.)
> 
> Bikeshedding is a pejorative word!

It sure is, but I humbly submit that it properly describes review remarks like the following one :)

@@ +7890,5 @@
> +    {
> +        if (requiresBarrier_)
> +            setGuard();         // Not removable or movable
> +        else
> +            setMovable();

Prefer |requiresBarrier_ ? setGuard() : setMovable();|
Comment on attachment 8495399 [details] [diff] [review]
bug979594-AtomicsAPI.diff

New patch forthcoming.
Attachment #8495399 - Flags: review?(luke)
Comment on attachment 8495405 [details] [diff] [review]
bug979594-AtomicsARM.diff

New patches forthcoming for this and the previous one.
Attachment #8495405 - Flags: review?(sstangl)
(In reply to Luke Wagner [:luke] from comment #34)
> Comment on attachment 8491500 [details] [diff] [review]
> Atomics object + interpreter support
> 
> Review of attachment 8491500 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +111,5 @@
> > +GetSharedTypedArrayIndex(JSContext *cx, Value v, Handle<SharedTypedArrayObject *> view, uint32_t* offset, bool* inRange)
> > +{
> > +    // The spec says "ToLength(ToNumber(v))", but for the purposes of
> > +    // checking the range we can compare directly to the view's length,
> > +    // we don't have to first clamp to 2^53-1.
> 
> I think it's more complicated than that; search for
> CanonicalNumericIndex+IntegerIndexedElementGet.  This definition has changed
> a bit and it may change in the future, so it'd be best to reuse the same
> predicate we've staked out for use in normal typed array views:
> IsTypedArrayIndex (even if it's not 100% what the currently says; we can
> update it).  This takes a jsid, so you'd want to use ValueToId to get that. 
> What's nice is that there should be a pretty fast through both these methods
> when the input isInt32.

The comment is a little stale and should be cleaned up, and the "spec" in question is the Atomics spec and these are methods and not built-in accessors, so we could really do what we want for now, but your point is taken.  If the change you suggest does not cause too much turbulence now I will fix it.
Blocks: 1077014
Blocks: 1077027
Blocks: 1077036
Blocks: 1077305
Blocks: 1077317
Blocks: 1077318
Blocks: 1077321
I believe I've addressed all comments here, and also made a few adjustments for comments made on later patches.

Apart from those fixes, notable new code here includes an atomics compatibility layer for Visual Studio 2008 / 2010.
Attachment #8495399 - Attachment is obsolete: true
Attachment #8499638 - Flags: review?(luke)
While I complete the review, could you answer one high-level question I have (I also asked this in comment 37; sorry if I missed a response somewhere): can we remove the whole concept of WorkerToken?  Rather, I think that FutexWaiter can just store the JSPerRuntimeFutexAPI* directly and then 'wake' can be a nullary method that wakes 'this'.  I see in the comments there is the issue of being deletion-safe, so maybe I'm just missing an obvious hazard, but from what I read in the patch, the JSRuntime (and JSPerRuntimeFutexAPI) of a FutexWaiter cannot be deleted while it is in the waiters list (also, the FutexWaiter itself is on the stack).
Comment on attachment 8499638 [details] [diff] [review]
Atomics API, with review comments addressed

Review of attachment 8499638 [details] [diff] [review]:
-----------------------------------------------------------------

r+ pending reply to comment 66.  Overall fantastic patch; it was very nice to read.

Btw, while undefined-on-out-of-bounds behavior is more consistent with typed arrays but I expect TC39 will request throw-on-out-of-bounds (viewing the typed array behavior as a mistake).  I expect ArrayTypes in Typed Objects will also throw on OOB.  Throw-on-OOB is a bit more work to achieve in Odin (but possible), so I'm fine with the current semantics for the time being; just giving a heads-up.

::: js/src/builtin/AtomicsObject.cpp
@@ +55,5 @@
> +
> +using namespace js;
> +
> +#if defined(MOZ_HAVE_CXX11_ATOMICS)
> +#define CXX11_ATOMICS

The not-terribly-well-documented-or-uniformly-practiced SM style is to indent nested preprocessor directives with a single space between the # and the first char of the identifier.  I really appreciate this for any non-trivial processor use.

@@ +170,5 @@
> +}
> +
> +static bool
> +GetSharedTypedArrayOfIntegerType(JSContext *cx, HandleValue v,
> +                                 MutableHandle<SharedTypedArrayObject *> viewp)

This method no longer seems to require "OfIntegerType".  That's fine since it seems the uses switch on view->type() and do the right thing for non-int types, but probably best to update the name.

@@ +187,5 @@
> +GetSharedTypedArrayIndex(JSContext *cx, HandleValue v, Handle<SharedTypedArrayObject *> view,
> +                         uint32_t* offset, bool* inRange)
> +{
> +    RootedId id(cx);
> +    if (!JS_ValueToId(cx, v, &id))

We generally prefer the js:: internal functions when possible (so ValueToId here) since the public JSAPI ones are usually a bit slower to call (public linkage so indirect call; often jsapi methods do extra sanity checking).

@@ +540,5 @@
> +          //  - clamp the input value
> +          //  - perform the operation
> +          //  - clamp the result
> +          //  - store the result
> +          // This requires a CAS loop.

Do you know if there is a use case for atomic ops on Uint8Clamped?  It seems like we could just as well leave it out (as we do with the float view types).

@@ +714,5 @@
> +// header node.
> +
> +class FutexWaiter
> +{
> +public:

SM style is two space before case and visibility labels.  Or you could just make this a struct.

@@ +717,5 @@
> +{
> +public:
> +    FutexWaiter(uint32_t offset, JS::JSPerRuntimeFutexAPI::WorkerToken *token)
> +        : offset(offset)
> +        , token(token)

SM style is two spaces before the :

@@ +719,5 @@
> +    FutexWaiter(uint32_t offset, JS::JSPerRuntimeFutexAPI::WorkerToken *token)
> +        : offset(offset)
> +        , token(token)
> +    {
> +        lower_pri = back = nullptr;

Can these be in the initializer list?

@@ +732,5 @@
> +
> +class AutoLockFutexAPI
> +{
> +    JS::JSPerRuntimeFutexAPI * const fx;
> +public:

two space indent

@@ +803,5 @@
> +
> +    FutexWaiter w(offset, token);
> +    w.waiting = true;
> +    FutexWaiter *waiters = sarb->waiters();
> +    if (waiters) {

Since it's not used after, can you "if (FutexWaiter *waiters = sarb->waiters())"?

@@ +969,5 @@
> +    // first node may change, and the list may be emptied out by the
> +    // operation.
> +
> +    FutexWaiter *waiters = sarb->waiters();
> +    if (waiters) {

To avoid indenting the whole page, perhaps:
  if (!waiters) {
    r.setInt32(0);
    return true;
  }

@@ +986,5 @@
> +        while (iter != &whead) {
> +            FutexWaiter *c = iter;
> +            iter = iter->lower_pri;
> +            if (!c->waiting || c->offset != offset1)
> +                continue;

If this was written:
  for (FutexWaiter *iter = whead.lower_pri; iter != &whead; iter = iter->lower_pri) {
then I don't think there would be a need for a separate 'c' variable and it'd be a bit easier to read.

::: js/src/jsapi.h
@@ +622,5 @@
> +//
> +// The API may differ among clients; for example, the APIs installed by
> +// a worker and by the main window event thread are likely different.
> +
> +class JSPerRuntimeFutexAPI

Now that it's inside namespace JS, you can take off the 'JS' prefix in the name.

@@ +1368,5 @@
>  JS_PUBLIC_API(void)
>  JS_SetRuntimePrivate(JSRuntime *rt, void *data);
>  
> +JS_PUBLIC_API(void)
> +JS_SetRuntimeFutexAPI(JSRuntime *rt, JS::JSPerRuntimeFutexAPI *fx);

What if we changed the semantics of this API to hand over ownership of 'fx' (such that ~JSRuntime deleted 'fx')?  Even though it's at most one, I noticed that workers delete the FutexAPI while the main-thread JSRuntime doesn't (this irregularity might end up being confusing).
(In reply to Luke Wagner [:luke] from comment #67)
> Comment on attachment 8499638 [details] [diff] [review]
> Atomics API, with review comments addressed
> 
> Review of attachment 8499638 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ pending reply to comment 66.

I'll get back to that comment separately.

I'll address style / formatting / naming.  Just a couple of followups here.

> @@ +540,5 @@
> > +          //  - clamp the input value
> > +          //  - perform the operation
> > +          //  - clamp the result
> > +          //  - store the result
> > +          // This requires a CAS loop.
> 
> Do you know if there is a use case for atomic ops on Uint8Clamped?  It seems
> like we could just as well leave it out (as we do with the float view types).

Good question indeed.  If you look at the JIT patches you will see that optimizations do not kick in for Uint8Clamped arrays.

The main reason to support it is if the Uint8ClampedArray is the only array you have, because some other entity has handed that array to you but nothing else.  Now, you can create an alias of that array with a bit of work - at the moment.  I actually would like to remove the ability to go from a SharedTypedArray to a SharedArrayBuffer, and that would remove the ability to create the alias.

The float atomic ops are left out for a different reason.  The Float64 ops are left out because there's no guaranteed availability of 8-byte atomic operations across platforms, while we can pretty much count on 4-byte versions.  (The Float32 ops are left out for symmetry with Float64.)

> @@ +1368,5 @@
> >  JS_PUBLIC_API(void)
> >  JS_SetRuntimePrivate(JSRuntime *rt, void *data);
> >  
> > +JS_PUBLIC_API(void)
> > +JS_SetRuntimeFutexAPI(JSRuntime *rt, JS::JSPerRuntimeFutexAPI *fx);
> 
> What if we changed the semantics of this API to hand over ownership of 'fx'
> (such that ~JSRuntime deleted 'fx')?  Even though it's at most one, I
> noticed that workers delete the FutexAPI while the main-thread JSRuntime
> doesn't (this irregularity might end up being confusing).

I'll look into it.
(In reply to Luke Wagner [:luke] from comment #67)
> Comment on attachment 8499638 [details] [diff] [review]
> Atomics API, with review comments addressed
> 
> Review of attachment 8499638 [details] [diff] [review]:
> -----------------------------------------------------------------

> @@ +986,5 @@
> > +        while (iter != &whead) {
> > +            FutexWaiter *c = iter;
> > +            iter = iter->lower_pri;
> > +            if (!c->waiting || c->offset != offset1)
> > +                continue;
> 
> If this was written:
>   for (FutexWaiter *iter = whead.lower_pri; iter != &whead; iter = iter->lower_pri) {
> then I don't think there would be a need for a separate 'c' variable and
> it'd be a bit easier to read.

I don't think so.  The node that is 'iter' is updated destructively if it is unlinked from the list (to be shuffled to the back), thus making iter->lower_pri wrong when it is executed at the end of the loop.
(In reply to Lars T Hansen [:lth] from comment #68)
> I actually would like to remove the ability to go from a
> SharedTypedArray to a SharedArrayBuffer, and that would remove the ability
> to create the alias.

Interesting; why?  For ArrayBuffer, I could see that providing some anti-detachment guarantees, but SAB isn't detachable.  Even without this Atomics situation, it seems like there would be plenty of situations where you'd want to create a differently-typed view.

(In reply to Lars T Hansen [:lth] from comment #69)
Ah hah, right you are.  I had grepped 'iter' in that loop, but not thought more carefully.
(In reply to Luke Wagner [:luke] from comment #70)
> (In reply to Lars T Hansen [:lth] from comment #68)
> > I actually would like to remove the ability to go from a
> > SharedTypedArray to a SharedArrayBuffer, and that would remove the ability
> > to create the alias.
> 
> Interesting; why?  For ArrayBuffer, I could see that providing some
> anti-detachment guarantees, but SAB isn't detachable.  Even without this
> Atomics situation, it seems like there would be plenty of situations where
> you'd want to create a differently-typed view.

I have had a longer message about this pending for a while, but in brief: making the SAB unavailable  allows an agent to create a bit of shared memory, and then map non-overlapping views onto that memory, and then apportion the views to computation agents (assuming we can transfer shared array views easily among workers, not hard to imagine), and then to /know/ with certainty that the workers are not racing when they are operating on those views.

The inspiration for that is PJS, of course.  There are a few related technologies: read-only views (which I think will be easy + uncontroversial), and the ability for one agent - maybe the holder of the SAB - to neuter a view on the SAB even if that view is on a different agent (which is neither easy or uncontroversial).

In the end it all probably depends on the underlying SAB being unavailable to a holder of a view onto that SAB.

Worthwhile?  Not sure.  But I'm always looking for ideas for making shared memory more useful for plain JS, as opposed to asm.js.
> The inspiration for that is PJS, of course.  There are a few related
> technologies: read-only views (which I think will be easy +
> uncontroversial), and the ability for one agent - maybe the holder of the
> SAB - to neuter a view on the SAB even if that view is on a different agent
> (which is neither easy or uncontroversial).

And of course some fine points about how a view onto shared memory is transfered so that there can only be one owner at a time, and maybe other things - said fine points being why I have not sent the longer message yet :-/
(In reply to Lars T Hansen [:lth] from comment #68)
> (In reply to Luke Wagner [:luke] from comment #67)
> > Comment on attachment 8499638 [details] [diff] [review]
> > Atomics API, with review comments addressed
> > 
> > Review of attachment 8499638 [details] [diff] [review]:
> > -----------------------------------------------------------------
> 
> > @@ +1368,5 @@
> > >  JS_PUBLIC_API(void)
> > >  JS_SetRuntimePrivate(JSRuntime *rt, void *data);
> > >  
> > > +JS_PUBLIC_API(void)
> > > +JS_SetRuntimeFutexAPI(JSRuntime *rt, JS::JSPerRuntimeFutexAPI *fx);
> > 
> > What if we changed the semantics of this API to hand over ownership of 'fx'
> > (such that ~JSRuntime deleted 'fx')?  Even though it's at most one, I
> > noticed that workers delete the FutexAPI while the main-thread JSRuntime
> > doesn't (this irregularity might end up being confusing).

That the main-thread runtime does not delete it is an oversight / bug, as you can see the DOM side is pretty rough still.

I generally like the change you suggest.  That said it doesn't really reduce any API surface: As it is, we can pretty much guarantee that things are set up so that the JSFutexAPI is destroyed before the JSContext or JSRuntime to which the JSFutexAPI refers.  If the JSRuntime's destructor destroys the JSFutexAPI the order is much less clear.  At the moment that does not matter since the destructor is pretty simple.  I'm not sure what it'll look like when the DOM API is complete.

Destruction may also tie in with the issue of WorkerToken, I'll get back to that.
(In reply to Luke Wagner [:luke] from comment #66)
> can we remove the whole concept of WorkerToken?  Rather, I think that
> FutexWaiter can just store the JSPerRuntimeFutexAPI* directly and then
> 'wake' can be a nullary method that wakes 'this'.  I see in the comments
> there is the issue of being deletion-safe, so maybe I'm just missing an
> obvious hazard, but from what I read in the patch, the JSRuntime (and
> JSPerRuntimeFutexAPI) of a FutexWaiter cannot be deleted while it is in the
> waiters list (also, the FutexWaiter itself is on the stack).

That is correct, and it's a nice simplification.  (The system with the WorkerToken dates to an earlier design where there was more asynchrony than there is now; the idea was that the rt field of the token subclass could be set to NULL when the worker was deleted, while the token could remain on a work queue.)

It's a particularly nice simplification because it appears that it will remove the cx_ field from the PerRuntimeFutexAPI subclass, which makes it much less problematic to transfer ownership to the runtime for deletion by the runtime.
Also: cf comment #37, explored in greater detail on bug 1074237, there needs to be an adjustment in how the futex API is attached to the JS engine.  There is one per JSContext, not one per JSRuntime, because each JSRuntime is shared by many tabs in the browser.  In practice this is straightforward.
I believe this patch addresses all comments in the last review round (comment 66, comment 67).  Substantive changes are:

 - deletion protocol, as suggested in review
 - now attaching the futex API to the JSContext rather than the JSRuntime,
   as discussed in comment 75.

I'll attach the sibling patch (WIP) to bug 1074237, so that the effect of those changes is clear.
Attachment #8499638 - Attachment is obsolete: true
Attachment #8499638 - Flags: review?(luke)
Attachment #8501729 - Flags: review?(luke)
I also replied in bug 1074237, but: I don't think we need this to be per-JSContext since JSContexts necessarily execute on the main thread of the JSRuntime (and they must execute LIFO on the stack).  JSContext is actually an incredibly confusing abstraction; it roughly means "a set of virtual registers, an exception state, and a callstack".  With proper stack-y management of this state (which Gecko mostly already does for security reasons), all this state could just as well be on the JSRuntime.  Most of the design of JSContext was inherited from when JSRuntime was an (externally) multi-threaded thing and you could have JSContexts concurrently execute on multiple threads.  Even with that design, there still could've been one JSContext per thread, but somehow, by accident, JSContext became associated (though not strictly) with HTML browsing contexts.

(There is actually a long-standing bug to remove JSContext (bug 650361) which mostly blocks on a bunch of DOM refactoring/simplification.  Fortunately, bholley has been chipping away at this and every few months another terrible old dependency drops.  Hopefully soon the Gecko main thread will only have one JSContext and then the ball will be back in our court to merge the two.)
And once more for luck.  Luke, this changes PerContextFutexAPI back to PerRuntimeFutexAPI, otherwise the changes are as mentioned in the comment on the previous patch.
Attachment #8501729 - Attachment is obsolete: true
Attachment #8501729 - Flags: review?(luke)
Attachment #8502430 - Flags: review?(luke)
Comment on attachment 8502430 [details] [diff] [review]
Atomics API, with review comments addressed, redux redux

Review of attachment 8502430 [details] [diff] [review]:
-----------------------------------------------------------------

Great!

::: js/src/builtin/AtomicsObject.cpp
@@ +748,5 @@
> +        fx->unlock();
> +    }
> +};
> +
> +}

Nit: can you append a "  // unnamed namespace" after the } on this and any other closing } curlies?

::: js/src/jsapi.h
@@ +1353,5 @@
> +// Transfers ownership of fx to rt; if rt's futexAPI field is not null when rt is
> +// deleted then rt's destructor will delete that value.  If fx is null in this
> +// call then ownership of a held nonnull value is transfered away from rt.
> +JS_PUBLIC_API(void)
> +JS_SetRuntimeFutexAPI(JSRuntime *rt, JS::PerRuntimeFutexAPI *fx);

Nit: perhaps move all these new functions to right under PerRuntimeFutexAPI (into namespace JS) and take of JS_ prefix?
Attachment #8502430 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #79)
> Comment on attachment 8502430 [details] [diff] [review]
> Atomics API, with review comments addressed, redux redux
> 
> ::: js/src/jsapi.h
> @@ +1353,5 @@
> > +// Transfers ownership of fx to rt; if rt's futexAPI field is not null when rt is
> > +// deleted then rt's destructor will delete that value.  If fx is null in this
> > +// call then ownership of a held nonnull value is transfered away from rt.
> > +JS_PUBLIC_API(void)
> > +JS_SetRuntimeFutexAPI(JSRuntime *rt, JS::PerRuntimeFutexAPI *fx);
> 
> Nit: perhaps [...] take of JS_ prefix?

Sure, I'll do that, but can you explain why?  The JS_ style otherwise seems to be used pretty extensively at the API level; I'm probably missing some context.
Flags: needinfo?(luke)
I believe this addresses all issues, though I note:

- some issues have been deferred as optimizations, and bugs have been created
  and are referenced from comments in this patch
- some lines in the 'none' platform stubs are left quite long, because there
  were many other very long lines in the same files

Also, there's been some discussion of Double arguments to the primitives, those are handled here by truncating to Uint32 by means of MTruncateToInt32; I have convinced myself that the latter is appropriate.
Attachment #8495402 - Attachment is obsolete: true
Attachment #8502722 - Flags: review?(sstangl)
Attachment #8495405 - Attachment is obsolete: true
Attachment #8502724 - Flags: review?(sstangl)
(In reply to Lars T Hansen [:lth] from comment #80)
> Sure, I'll do that, but can you explain why?  The JS_ style otherwise seems
> to be used pretty extensively at the API level; I'm probably missing some
> context.

In the beginning jsapi.h was a strict C .h and the convention was JS_ prefixes on public functions.  A few years ago, SM switched to allowing full C++ (for a while there was a hybrid #ifdef __cplusplus mix).  Since then, new APIs have gone into namespace JS and old APIs have been incrementally moved to JS.  Eventually someone with some free time will probably batch move the stragglers.  To wit, all internal functions with extern visibility had a js_ prefix and those have also been (much more completely) moved to namespace js.
Flags: needinfo?(luke)
Comment on attachment 8502722 [details] [diff] [review]
Atomics: Ion support

Review of attachment 8502722 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Thanks for addressing the nits and for filing follow-up bugs on the rest.
Attachment #8502722 - Flags: review?(sstangl) → review+
Comment on attachment 8502724 [details] [diff] [review]
Atomics: ARM code generation for Ion

Review of attachment 8502724 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see anything wrong with this patch, but I would also like Marty to take a look.

Note that ARMv8/AArch64 does not share a backend or MacroAssembler with previous ARM architectures.

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +4747,5 @@
> +//
> +// Discussion here:  http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
> +// However note that that discussion uses 'isb' as the trailing fence.
> +// I've not quite figured out why, and I've gone with dmb here which
> +// is safe.  Also see the LLVM source, which uses 'dmb ish' generally.

The memory could be executable. In our case it is not, so DMB is fine.
Attachment #8502724 - Flags: review?(sstangl)
Attachment #8502724 - Flags: review?(mrosenberg)
Attachment #8502724 - Flags: review+
(In reply to Sean Stangl [:sstangl] from comment #85)

> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +4747,5 @@
> > +//
> > +// Discussion here:  http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html.
> > +// However note that that discussion uses 'isb' as the trailing fence.
> > +// I've not quite figured out why, and I've gone with dmb here which
> > +// is safe.  Also see the LLVM source, which uses 'dmb ish' generally.
> 
> The memory could be executable. In our case it is not, so DMB is fine.

That was my explanation too, but I don't find it very satisfactory.  (So then I started worrying about ISB's effect on the pipeline and the store buffer and wondering whether ISB doesn't do double duty as a lightweight DSB.  Remains unresolved in my mind.)
(In reply to Sean Stangl [:sstangl] from comment #85)
> 
> Note that ARMv8/AArch64 does not share a backend or MacroAssembler with
> previous ARM architectures.

Right.  When I wrote "ARMv8" in the patches - likely for where we can use other sync instructions - I meant AArch32.
Attachment #8502724 - Flags: review?(mrosenberg) → review?(dtc-moz)
Comment on attachment 8502724 [details] [diff] [review]
Atomics: ARM code generation for Ion

Review of attachment 8502724 [details] [diff] [review]:
-----------------------------------------------------------------

Nice work.

::: js/src/jit/arm/CodeGenerator-arm.cpp
@@ +2250,5 @@
> +
> +void
> +CodeGeneratorARM::memoryBarrier(int barrier)
> +{
> +    if (barrier == (MembarStoreStore|MembarSynchronizing))

Could note that on the ARMv6 these always perform a full system barrier operation.

@@ +2256,5 @@
> +    else if (barrier & MembarSynchronizing)
> +        masm.ma_dsb();
> +    else if (barrier == MembarStoreStore)
> +        masm.ma_dmb(masm.BarrierST);
> +    else if (barrier)

Could the default be easily checked here? Just to catch invalid 'barrier' arguments?

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +4688,5 @@
> +namespace js {
> +namespace jit {
> +
> +template<>
> +Register

Could consider returning void, and perhaps rename 'r' to 'out'.

@@ +4733,5 @@
> +
> +// General algorithm:
> +//
> +//     ...    ptr, <addr>         ; compute address of item
> +//     dmb

If the programmer is expected to explicitly use an appropriate barrier here then this might not be needed in this sequence. Also for the barrier at the end of this sequence.

@@ +4736,5 @@
> +//     ...    ptr, <addr>         ; compute address of item
> +//     dmb
> +// L0  ldrex* output, [ptr]
> +//     sxt*   output, output, 0   ; sign-extend if applicable
> +//     *xt*   tmp, oldval, 0      ; sign-extend or zero-extend if applicable

Could this be hoisted out of the loop?

@@ +4766,5 @@
> +      case 1:
> +        as_ldrexb(output, ptr);
> +        if (signExtend) {
> +            as_sxtb(output, output, 0);
> +            as_sxtb(ScratchRegister, oldval, 0);

Could this be hoisted, or is it running short on registers?

Is it even necessary? Could it just expect that the oldval is already within range? Or is this needed to be consistent with the x86?

@@ +4767,5 @@
> +        as_ldrexb(output, ptr);
> +        if (signExtend) {
> +            as_sxtb(output, output, 0);
> +            as_sxtb(ScratchRegister, oldval, 0);
> +        }

nit: should the else be on the same line as the closing brace?

@@ +4841,5 @@
> +
> +// General algorithm:
> +//
> +//     ...    ptr, <addr>         ; compute address of item
> +//     dmb

Are these barriers really needed for this sequence? Might these be used for acquire/release operations?
Attachment #8502724 - Flags: review?(dtc-moz) → review+
(In reply to Douglas Crosher [:dougc] from comment #88)
> Comment on attachment 8502724 [details] [diff] [review]
> Atomics: ARM code generation for Ion
> 
> Review of attachment 8502724 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +2256,5 @@
> > +    else if (barrier & MembarSynchronizing)
> > +        masm.ma_dsb();
> > +    else if (barrier == MembarStoreStore)
> > +        masm.ma_dmb(masm.BarrierST);
> > +    else if (barrier)
> 
> Could the default be easily checked here? Just to catch invalid 'barrier'
> arguments?

I don't understand that remark.

> @@ +4733,5 @@
> > +
> > +// General algorithm:
> > +//
> > +//     ...    ptr, <addr>         ; compute address of item
> > +//     dmb
> 
> If the programmer is expected to explicitly use an appropriate barrier here
> then this might not be needed in this sequence. Also for the barrier at the
> end of this sequence.

I believe both barriers are required for sequential consistency, and the programmer is not expected to use explicit barriers.

(For practical purposes the Atomics operations proposed here are equivalent to seq_cst atomics in C++.)

> @@ +4736,5 @@
> > +//     ...    ptr, <addr>         ; compute address of item
> > +//     dmb
> > +// L0  ldrex* output, [ptr]
> > +//     sxt*   output, output, 0   ; sign-extend if applicable
> > +//     *xt*   tmp, oldval, 0      ; sign-extend or zero-extend if applicable
> 
> Could this be hoisted out of the loop?

See next.

> @@ +4766,5 @@
> > +      case 1:
> > +        as_ldrexb(output, ptr);
> > +        if (signExtend) {
> > +            as_sxtb(output, output, 0);
> > +            as_sxtb(ScratchRegister, oldval, 0);
> 
> Could this be hoisted, or is it running short on registers?
> 
> Is it even necessary? Could it just expect that the oldval is already within
> range? Or is this needed to be consistent with the x86?

It can't just expect that the oldval is within range - nobody guarantees that, and the spec says it's not allowed to assume that - but I believe the sign extension may not be necessary so long as we don't examine the high bits.  That is, instead of extend; extend; cmp we could use eor; shift+setcond, which would also reduce register pressure.

re hoisting, it's possible that with reduced register pressure I can hoist some more computations if there are any left to hoist.  I'll look into it.  I may decide to file it as a followup bug, in order to get this one landed.

> @@ +4841,5 @@
> > +
> > +// General algorithm:
> > +//
> > +//     ...    ptr, <addr>         ; compute address of item
> > +//     dmb
> 
> Are these barriers really needed for this sequence? Might these be used for
> acquire/release operations?

The definition says sequentially consistent, so yes, they're required as far as I know.

Nick Bray has suggested that we want acquire/release semantics instead; I remain unconvinced so far, so I'm sticking to my story until we have implementation and usage experience.  Also see comments in the spec draft about the possibility of introducing additional, weaker primitives, pretty much following C++ again.
https://hg.mozilla.org/mozilla-central/rev/ad0fdfc44d48
https://hg.mozilla.org/mozilla-central/rev/ab936277cf4b
https://hg.mozilla.org/mozilla-central/rev/983259897284
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Blocks: 1088598
Depends on: 1088633
Depends on: 1095282
Depends on: 1100080
You need to log in before you can comment on or make changes to this bug.