Closed Bug 1074237 Opened 5 years ago Closed 5 years ago

Provide support code for Atomics "futexes"

Categories

(Core :: DOM: Workers, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 18 obsolete files)

28.52 KB, patch
luke
: feedback+
Details | Diff | Splinter Review
32.41 KB, patch
luke
: review+
Details | Diff | Splinter Review
(Forked from bug 979594.)

With bug 979594 the JS engine provides an "Atomics" object that provides some methods for atomic operations on shared memory and some primitives for implementing blocking locks (based on Linux futexes).  The latter primitives need a little support from the DOM code for suspension and wakeup.

(This technology is Nightly-only at the moment.)
Luke's comments on a previous version of the patch for this bug, taken from bug 979594:

Comment on attachment 8491509 [details] [diff] [review] [diff] [review]
DOM hooks

Review of attachment 8491509 [details] [diff] [review] [diff] [review]:
-----------------------------------------------------------------

Great to see the tests!

::: dom/base/nsJSEnvironment.cpp
@@ +631,5 @@
>                                    js_options_dot_str, this);
> +
> +    JSFutexAPIImpl *fx = new JSFutexAPIImpl(mContext); // TODO: OOM failure
> +    fx->init();                                        // TODO: OOM failure
> +    ::JS_SetRuntimeFutexAPI(sRuntime, fx);

IIUC, there is one nsJSContext per browsing context so one per tab/iframe/etc.  That means each context after the first will clobber the runtime futex API slot.  Probably you want to put this in nsJSContext::EnsureStatics (which is called once, for the main thread's JSRuntime), next to, e.g., JS::SetAsmJSCacheOps.  (Perhaps also you can put your new JSAPI in namespace JS?)  Also, you'd want to take a JSRuntime* as constructor argument (which is fortunately all you need, looking at the impl).

@@ +2759,5 @@
> +  // Sort of hacky, we really want module initialization to do this
> +  PRLock *l = PR_NewLock();
> +  if (!l)
> +    return false;
> +  if (!futex_lock.compareExchange(nullptr, l))

I think we can store a lock in global memory (just as a static of this file) and rely on creation of the main-thread's JSRuntime (which will call EnsureStatics) to happen-before creation of any workers.  You could assert this and verify with whoever does the final DOM review.  I wouldn't worry about destruction since there is one per process (we seem to already do this in other places: http://mxr.mozilla.org/mozilla-central/ident?i=sAllocationLock).

Alternatively, it occurred to me that, just as you have a newTokenForThread, you could have a newTokenForBuffer and this new data structure could own the lock.  As a bonus, you'd get finer-granularity locking which would reduce contention in cases where you had different workers pounding on different SABs.

@@ +2803,5 @@
> +  }
> +
> +  // A hack, but at least platform independent: 4000s is the max.
> +  if (timeout_ns > 4000000000.0)
> +    return JS::JSFutexAPI::ErrorTooLong;

Probably want to see what setTimeout() does and mimic that.

@@ +2819,5 @@
> +bool
> +JSFutexAPIImpl::wake(JS::JSFutexAPI::WorkerToken *token)
> +{
> +  JSRuntime *rt = static_cast<JSFutexAPIWorkerTokenImpl*>(token)->rt;
> +  PR_NotifyCondVar(static_cast<JSFutexAPIImpl*>(JS_GetRuntimeFutexAPI(rt))->cond_);

If all you need is the the condvar, can you just case the condvar* (or the JSFutexAPIImpl*) to the token?  Given that, perhaps we don't even need destroyToken?

::: dom/base/nsJSEnvironment.h
@@ +174,5 @@
>  
>    static bool DOMOperationCallback(JSContext *cx);
>  };
>  
> +class JSFutexAPIImpl : public JS::JSFutexAPI

Perhaps you could name this "JSPerThreadFutexImpl" or something to make it clear that the reason this is per-JSRuntime is that it isn't just an API, it contains per-thread data.  It'd also be good to throw JS_AbortIfWrongThread into all the member functions to make sure, e.g., this doesn't accidentally get called from a helper thread.

Attachment #8491509 [details] [diff] - Flags: feedback?(luke@mozilla.com) → feedback+
(In reply to Lars T Hansen [:lth] from comment #1)
> Luke's comments on a previous version of the patch for this bug, taken from
> bug 979594:
> 
> Comment on attachment 8491509 [details] [diff] [review]
> DOM hooks
> 
> Review of attachment 8491509 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsJSEnvironment.cpp
> @@ +631,5 @@
> >                                    js_options_dot_str, this);
> > +
> > +    JSFutexAPIImpl *fx = new JSFutexAPIImpl(mContext); // TODO: OOM failure
> > +    fx->init();                                        // TODO: OOM failure
> > +    ::JS_SetRuntimeFutexAPI(sRuntime, fx);
> 
> IIUC, there is one nsJSContext per browsing context so one per
> tab/iframe/etc.  That means each context after the first will clobber the
> runtime futex API slot.  Probably you want to put this in
> nsJSContext::EnsureStatics (which is called once, for the main thread's
> JSRuntime), next to, e.g., JS::SetAsmJSCacheOps.  (Perhaps also you can put
> your new JSAPI in namespace JS?)  Also, you'd want to take a JSRuntime* as
> constructor argument (which is fortunately all you need, looking at the
> impl).

This is actually a fairly major bug on my part.  From insufficient reading + testing of the code I had concluded that there was a one-to-one mapping between JSRuntime and JSContext here.  Now I'm worried that attaching the futex API to the runtime is wrong.  (If it is right the setup should clearly go in EnsureStatics, as you say, or maybe inside RuntimeServices::GetRuntime - not sure yet, but mostly a question of mechanics.)
I see where I went wrong: Per-runtime was basically right for workers - workers have one runtime per execution context - but is completely wrong for "main threads", where there are many tabs/documents per runtime.  The idea behind the futexes is to have one condition variable per browsing context, not one shared among many browsing contexts.  As it is, if JS in two tabs block on the shared condition variable then waking up one tab from a third tab will wake both of the waiters, and that was not the idea at all.
Attached patch DOM Futex API (WIP) (obsolete) — Splinter Review
Not for landing, but meant to clarify the effect of changes to patch on bug 979594.
(In reply to Lars T Hansen [:lth] from comment #3)
I think it is not as bad as you are imagining: there is only one main thread and JSRuntime shared by all non-worker browsing contexts, so there can be only one futexWait at a time.
(In reply to Luke Wagner [:luke] from comment #5)
> (In reply to Lars T Hansen [:lth] from comment #3)
> I think it is not as bad as you are imagining: there is only one main thread
> and JSRuntime shared by all non-worker browsing contexts, so there can be
> only one futexWait at a time.

That's unfortunate, because it means that if, say, a SharedWorker is operating concurrently with documents in two windows in such a way that they both have to suspend before the SharedWorker can continue, then progress becomes impossible after the first window goes into futexWait.

(Doesn't affect the issue at hand, of course; just one more strike against using futexWait on the main thread.)
(In reply to Lars T Hansen [:lth] from comment #6)
> That's unfortunate, because it means that if, say, a SharedWorker is
> operating concurrently with documents in two windows in such a way that they
> both have to suspend before the SharedWorker can continue, then progress
> becomes impossible after the first window goes into futexWait.

Yeah.  Even with a process-per-tab model like other browsers have, there is (afaik) a single main-thread event loop in each process so, when windows get grouped in processes (same origin, or just when there are too many processes) they'll be likewise serialized.  I think that just has to be part of the programming model (if SAB is even exposed to the main thread and this is another in the list of reasons why it shouldn't).

> (Doesn't affect the issue at hand, of course; just one more strike against
> using futexWait on the main thread.)

Agreed
Attached patch DOM Futex API (WIP) (obsolete) — Splinter Review
Corresponds to the latest patch on bug 979594.
Attachment #8501730 - Attachment is obsolete: true
Attached patch DOM Futex API (obsolete) — Splinter Review
Context: This patch implements the DOM bits of the proposed (experimental) "futex" API for use with shared memory.  The spec for the full feature is here: https://docs.google.com/document/d/1NDGA_gZJ7M7w1Bh8S0AoDyEqwDdRh4uSoTPSNn77PFk/edit?usp=sharing, you want the section "Atomic Operations for Worker Suspension and Wake-up" and the operations in question are futexWait, futexWake, and futexWakeOrRequeue.

The present patch addresses Luke's concerns from the previous (WIP) patch, and additionally attempts to ensure that workers are not hung when the tab owning them is closed.  A try run with this patch comes up green: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2914dd00522a.

(Shared memory is currently enabled in Nightly desktop and disabled everywhere else.)
Attachment #8502431 - Attachment is obsolete: true
Attachment #8525462 - Flags: review?(khuey)
Attachment #8525462 - Flags: feedback?(luke)
(Actually the try run has one Win7 debug failure, which I will investigate.)
The failure is caused by the browser engine entering a worker's runloop before it initializes the main runtime (hence the global futex_lock, which is initialized in EnsureStatics, is null, hence we assert).  Easy enough to guard against, but the better fix may be reverting to the lazy initialization of futex_lock I had previously.
Just an fyi that I'm unlikely to get to this review until after the event in Portland.  Send me email if that's a problem.
Attached patch DOM Futex API, v2 (obsolete) — Splinter Review
This fixes the bustage in yesterday's patch.  Also there's some cleanup in naming, and some better instructions in the test cases.

Kyle, after PDX will be fine I think, as long as it doesn't drag out much beyond that - I would dearly like to land this before the end of the year...
Attachment #8525462 - Attachment is obsolete: true
Attachment #8525462 - Flags: review?(khuey)
Attachment #8525462 - Flags: feedback?(luke)
Attachment #8526050 - Flags: review?(khuey)
Attachment #8526050 - Flags: feedback?(luke)
Comment on attachment 8526050 [details] [diff] [review]
DOM Futex API, v2

Review of attachment 8526050 [details] [diff] [review]:
-----------------------------------------------------------------

Looks generally good.  "Interrupt" tends to mean "just call the interrupt callback, you can recover after that", but your use of "interrupt" is more like "killing".  I was initially confused by this when I didn't see interrupt_ ever reset to false.  On the subject of interrupts, though, iirc, you don't currently have any limitations on using futexes on the main thread, so do you also need an interrupt-like mechanism so that, e.g., we can show the slow-script dialog if the main thread is hung on a futex?  See:
  http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.cpp#1314
Attachment #8526050 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #15)
> Comment on attachment 8526050 [details] [diff] [review]
> DOM Futex API, v2
> 
> Review of attachment 8526050 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks generally good.  "Interrupt" tends to mean "just call the interrupt
> callback, you can recover after that", but your use of "interrupt" is more
> like "killing".

True.  I can change that.

> On the subject of interrupts, though, iirc,
> you don't currently have any limitations on using futexes on the main
> thread, so do you also need an interrupt-like mechanism so that, e.g., we
> can show the slow-script dialog if the main thread is hung on a futex?  See:
>  
> http://mxr.mozilla.org/mozilla-central/source/js/xpconnect/src/XPCJSRuntime.
> cpp#1314

Nice catch + bad oversight on my part.  I think I have a mental block(!) on that, because I've run into fairly instant deadlocks when trying to block on the main thread (in plain JS code).  But figuring out what's going on with that is the next step, and I'll consider the main-thread interrupt when I do so.
Comment on attachment 8526050 [details] [diff] [review]
DOM Futex API, v2

Canceling review - this code does not handle spurious wakeups properly.
Attachment #8526050 - Flags: review?(khuey)
Discussed the issues here with Ben Turner at the work week.  There are at least these three issues to investigate (in addition to the spurious wakeups) before offering the patch again:

- Understand interaction of worker that is blocked in futexWait with memory report, which needs to
  break the lock.  But that break must not be observable.
- Disable futexWait in the worker's close handler.  See table in WorkerFeature.h.
- Figure out how this interacts with GC.  Can GC run when a worker is blocked?
There is also an interaction with the worker interrupt handler, which calls back into the event loop but should possibly/probably not do that if the interrupt was triggered from a futexWait.  In general the memory reporter and GC also seem to be interacting with the interrupt handler.
Attached patch Adjustments to the JS API (obsolete) — Splinter Review
The adjustments are simple but introduce a new error message in the error database.
Attachment #8558014 - Flags: review?(luke)
Attachment #8526050 - Attachment is obsolete: true
Comment on attachment 8558015 [details] [diff] [review]
DOM Futex API - block on workers, but not on the main thread

Context: This patch implements the DOM bits of the proposed (experimental) "futex" API for use with shared memory.  The spec for the full feature is here: https://docs.google.com/document/d/1NDGA_gZJ7M7w1Bh8S0AoDyEqwDdRh4uSoTPSNn77PFk/edit?usp=sharing, you want the section "Atomic Operations for Worker Suspension and Wake-up" and the operations in question are futexWait, futexWake, and futexWakeOrRequeue.

There are numerous test cases included.

This patch implements blocking wait on workers but disallows it on the main thread (the main thread code is not finished and is in any case more controversial).  I've attempted to implement a nested control-processing event loop as you and Ben suggested to me in SF in December.

This is a Nightly-only feature at the moment and if/when it moves off Nightly it will be behind a pref.
Attachment #8558015 - Flags: review?(khuey)
Comment on attachment 8558014 [details] [diff] [review]
Adjustments to the JS API

Review of attachment 8558014 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/AtomicsObject.cpp
@@ +809,5 @@
>        case JS::PerRuntimeFutexAPI::Timedout:
>          r.setInt32(AtomicsObject::FutexTimedout);
>          break;
> +      case JS::PerRuntimeFutexAPI::ErrorException:
> +        MOZ_ASSERT(JS_IsExceptionPending(cx));

Your jsapi.h comment says "or throw an uncatchable", in which case there would be no exception pending.
Attachment #8558014 - Flags: review?(luke) → review+
Comment on attachment 8558015 [details] [diff] [review]
DOM Futex API - block on workers, but not on the main thread

Redirecting review to Ben (on advice from more senior collaborator :)
Attachment #8558015 - Flags: review?(khuey) → review?(bent.mozilla)
(In reply to Luke Wagner [:luke] from comment #23)
> Comment on attachment 8558014 [details] [diff] [review]
> Adjustments to the JS API
> 
> Your jsapi.h comment says "or throw an uncatchable", in which case there
> would be no exception pending.

Doc bug, will fix that.
(In reply to Lars T Hansen [:lth] from comment #20)
> Created attachment 8558014 [details] [diff] [review]
> Adjustments to the JS API

https://hg.mozilla.org/integration/mozilla-inbound/rev/53cc8f769ca2
Keywords: leave-open
This seems reasonably clean to me.  One test case is included; I'm in the process of writing more.

Some complexity results from an installed interrupt handler being able to signal further interrupts (eg for GC), and being able to run JS code (or so it seems to me, still digging around in that code).

There's a little additional churn for the Worker code in this patch because the JS engine API evolved further.
Attachment #8559842 - Flags: feedback?(luke)
BTW the limitation on the wait time on the main thread (4000s) can be removed without too much fuss, not that that will matter in practice apart from making the API cleaner.
Fixes an initialization bug, rewrites the timeout logic, adds test cases.
Attachment #8559842 - Attachment is obsolete: true
Attachment #8559842 - Flags: feedback?(luke)
Attachment #8561422 - Flags: feedback?(luke)
Comment on attachment 8561422 [details] [diff] [review]
DOM Futex API - block on the main thread, v2

Review of attachment 8561422 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsJSEnvironment.cpp
@@ +2611,5 @@
>  };
>  
> +JS::PerRuntimeFutexAPI::WakeResult
> +JSMainFutexAPIImpl::wait(double timeout_ns)
> +{

I was thinking that it'd be good to maintain a bool (could be #ifdef) indicating the locked status that you could assert at the beginning of all FutexAPIImpl methods.

@@ +2613,5 @@
> +JS::PerRuntimeFutexAPI::WakeResult
> +JSMainFutexAPIImpl::wait(double timeout_ns)
> +{
> +  if (killed_)
> +    return Uncatchable;

There is one JSRuntime shared by main-thread content.  Let's say one origins misbehaves and gets killed, you wouldn't want that to disable all futexes in all origins henceforth until the browser is closed.  I don't really understand this notify() business, so I don't yet know why killed_ needs to be a persistent state as opposed to a simple local variable of the wait()/wake() algo.

@@ +2720,5 @@
> +  PR_NotifyCondVar(cond_);
> +}
> +
> +void
> +JSMainFutexAPIImpl::notify(ThreadStatus status)

Who calls notify again?  I can't find any callers.  I'm hoping its residual b/c it seems like wake(WakeForJSInterrupt) is all you need (on the main thread and in workers).
Attachment #8561422 - Flags: feedback?(luke)
(In reply to Luke Wagner [:luke] from comment #31)
> Comment on attachment 8561422 [details] [diff] [review]
> DOM Futex API - block on the main thread, v2
> 
> Review of attachment 8561422 [details] [diff] [review]:
> -----------------------------------------------------------------

All good suggestions.  Notify was indeed an artifact of the Worker case; kill is as you point out incorrect for the main thread; and the flag for locked status is a useful check to have.
These become necessary to support the subsequent two patches, especially the one that implements blocking on the main thread.
Attachment #8562098 - Flags: review?(luke)
Relatively minor changes, resulting from further testing but also fallout from developing the subsequent patch.
Attachment #8558015 - Attachment is obsolete: true
Attachment #8558015 - Flags: review?(bent.mozilla)
Attachment #8562099 - Flags: review?(bent.mozilla)
Attachment #8561422 - Attachment is obsolete: true
Attachment #8562101 - Flags: review?(bent.mozilla)
Comment on attachment 8562098 [details] [diff] [review]
Further adjustments to the JS API

Review of attachment 8562098 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/AtomicsObject.cpp
@@ +721,5 @@
>      {
>      }
>  
>      bool        waiting;                // Set to true when the worker is on the list and not woken
> +    bool        suspended;              // Set to true when the worker is on the list, not woken, and not wakeable

I was thinking about what the "right" thing to do in this wake()-during-interrupt case.  Initially I thought it was this bizarre corner case that couldn't happen (the wake() happening from within the interrupt code itself), but then I realized that it can happen totally naturally just from another non-interrupted worker wake()ing the interrupted worker.  Since non-fatal interrupts can happen spuriously, this would mean that, every now and then, a wait()er misses its wake() would be bad.  So I think the correct behavior here is to make wake()-during-interrupt Just Work (which is: after the interrupt returns, if execution was not aborted, and another thread (or in the bizarre case, the interrupt handler itself) woke the waiter, it wakes).

::: js/src/vm/Runtime.cpp
@@ +614,5 @@
> +        // tell here whether the main thread will respond to the
> +        // interrupt within its futex code or within script code.  (We
> +        // could perhaps arrange for that information to be available
> +        // at the cost of some inter-thread protocol within the DOM
> +        // code.)

The comment (esp the word "races") made me think there was something subtle going on here but, iiuc, this is exactly what you'd expect given futexes.  The names of all the methods are so self-documenting, I'd say you don't need any comment.
Attachment #8562098 - Flags: review?(luke)
Good call on the wake-during-interrupt, but there are two cases here.

One is if a suspended/interrupted waiter is woken by another agent; I agree that it should wake up when the interrupt handler returns and I've implemented that and it's quite tidy.  

The other case I worry about is that the same runtime can be used to push another element onto the waiters list.  (This requires the runtime to reach a call to futexWait from the interrupt handler, possibly from one of those rare events that are fired synchronously while interrupted, it does not seem at all likely but seem at least possible.)  If a wakeup is then delivered to the main thread from a worker the waking has to happen in LIFO order for waiters using that runtime (for a given wakeup location), which is opposite to the normal FIFO order that is mandated by the spec.
How about this: when calling into the interrupt handler during wait, you set a flag (analogous to suspended in the last patch, but named "handlingInterrupt" or something specific to this case) and then any attempt to wait() just reports an error if handlingInterrupt.
(Btw, it'd be good to write a test for this; the shell lets you run arbitrary JS from an interrupt handler via timeout().)
(In reply to Luke Wagner [:luke] from comment #38)
> How about this: when calling into the interrupt handler during wait, you set
> a flag (analogous to suspended in the last patch, but named
> "handlingInterrupt" or something specific to this case) and then any attempt
> to wait() just reports an error if handlingInterrupt.

That's certainly expedient.  I have another approach which is less heavy-handed but requires a little more engineering, I may save it for another day.
Depends on: 1131943
I think this addresses the concerns.  Test cases follow.
Attachment #8562098 - Attachment is obsolete: true
Attachment #8562941 - Flags: review?(luke)
Attached patch Test cases for API adjustments (obsolete) — Splinter Review
Two test cases here:

One requires two DOM patches to be applied and works in concert with the slow script dialog; it tests that a futexWait that is suspended due to an interrupt is properly woken.

The other requires the JS shell patch from bug 1131953 to be applied first.  It sets up a timeout (as suggested in an earlier comment) and then tries to wait from within the timeout callback.  The attempt causes an error to be thrown.
Comment on attachment 8562099 [details] [diff] [review]
DOM Futex API - block on workers, but not on the main thread, v2

Removing review on this patch due to a bug; new patch coming tomorrow.
Attachment #8562099 - Flags: review?(bent.mozilla)
Comment on attachment 8562941 [details] [diff] [review]
Further adjustments to the JS API, v2

Review of attachment 8562941 [details] [diff] [review]:
-----------------------------------------------------------------

Ah, this looks pretty good.  Two suggested reductions:

::: js/src/builtin/AtomicsObject.cpp
@@ +866,5 @@
> +
> +        w.suspended = true;
> +        {
> +          AutoUnlockFutexAPI unlock(fx);
> +          fx->enterInterrupt();

Why is it necessary to have both suspended and interrupted_?  I realize one is on the FutexWaiter and one is on the FutexPerRuntimeFutexAPI but these are in 1:1 correspondence (as long as the FutexWaiter is alive), so it seems like Wake could just as well test c->fx->isInterrupted() (after enter/leaveInterrupt() have been moved outside scope ofAutoUnlockFutexAPI).

@@ +873,5 @@
> +        }
> +        w.suspended = false;
> +        if (!retval)
> +            break;
> +        if (w.wokenDuringSuspension) {

Is it necessary to have a separate wokenDuringSuspension or can we just check "if (!w.waiting)" here.  If I'm reading correctly, 'waiting' is cleared iff wokenDuringSuspension is set.  Perhaps this (and the above) would help if we allowed nested waiting but I think should keep things as simple as possible until we decide to do that (which I really think isn't worth it since it should never happen in any normal app).

@@ +876,5 @@
> +            break;
> +        if (w.wokenDuringSuspension) {
> +#ifdef DEBUG
> +            fprintf(stderr, "A waiter was woken during suspension, letting it wake up\n");
> +#endif

Remember to remove before landing.

::: js/src/jsapi.h
@@ +648,5 @@
>      // Since the sleeping thread also needs that lock to wake up, the
>      // thread will not actually wake up until the caller of wake()
>      // releases the lock.
> +    //
> +    // If the thread is waiting in a call to wait() and reason is

*the reason

@@ +649,5 @@
>      // thread will not actually wake up until the caller of wake()
>      // releases the lock.
> +    //
> +    // If the thread is waiting in a call to wait() and reason is
> +    // WakeExplicit then that call will return with Woken.

s/that/the wait()/

@@ +652,5 @@
> +    // If the thread is waiting in a call to wait() and reason is
> +    // WakeExplicit then that call will return with Woken.
> +    //
> +    // If the thread is waiting in a call to wait() and the reason is
> +    // WakeForJSInterrupt then that call will either remain active (if

s/that/the wait()/

::: js/src/vm/Runtime.cpp
@@ +609,5 @@
>  
> +    if (mode == JSRuntime::RequestInterruptUrgent) {
> +        // We must both wake the main thread and interrupt its JIT
> +        // code, not one or the other, because we don't know which
> +        // signal will reach the thread.

"which signal will reach the thread" is a bit confusing, since we literally use signals on Unices.  How about "If this interrupt is urgent (for example, the slow script dialog), take additional steps to interrupt corner cases where the above fields are not regularly polled: ilooping JIT code and futexWait."
(In reply to Luke Wagner [:luke] from comment #44)
> Comment on attachment 8562941 [details] [diff] [review]
> Further adjustments to the JS API, v2
> 
> Review of attachment 8562941 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Ah, this looks pretty good.  Two suggested reductions:
> 
> ::: js/src/builtin/AtomicsObject.cpp
> @@ +866,5 @@
> > +
> > +        w.suspended = true;
> > +        {
> > +          AutoUnlockFutexAPI unlock(fx);
> > +          fx->enterInterrupt();
> 
> Why is it necessary to have both suspended and interrupted_?  I realize one
> is on the FutexWaiter and one is on the FutexPerRuntimeFutexAPI but these
> are in 1:1 correspondence (as long as the FutexWaiter is alive), so it seems
> like Wake could just as well test c->fx->isInterrupted() (after
> enter/leaveInterrupt() have been moved outside scope ofAutoUnlockFutexAPI).

I see your point.  I'll think about it.  As you note below, it's an intermediate stage toward supporting nested waiting, but I agree it's unnecessary to pay for that now. 

> @@ +873,5 @@
> > +        }
> > +        w.suspended = false;
> > +        if (!retval)
> > +            break;
> > +        if (w.wokenDuringSuspension) {
> 
> Is it necessary to have a separate wokenDuringSuspension or can we just
> check "if (!w.waiting)" here.  If I'm reading correctly, 'waiting' is
> cleared iff wokenDuringSuspension is set.  Perhaps this (and the above)
> would help if we allowed nested waiting but I think should keep things as
> simple as possible until we decide to do that (which I really think isn't
> worth it since it should never happen in any normal app).

That's a good call, and a nice reduction in complexity (for now).

> @@ +876,5 @@
> > +            break;
> > +        if (w.wokenDuringSuspension) {
> > +#ifdef DEBUG
> > +            fprintf(stderr, "A waiter was woken during suspension, letting it wake up\n");
> > +#endif
> 
> Remember to remove before landing.

I left this in on purpose but in that case it should probably use some other mechanism.  Will remove it.

Comments will be fixed, of course.
Took your suggestion on both points; nice simplification.
Attachment #8562941 - Attachment is obsolete: true
Attachment #8562941 - Flags: review?(luke)
Attachment #8563363 - Flags: review?(luke)
Comment on attachment 8563363 [details] [diff] [review]
Further adjustments to the JS API, v3

Beautiful, thanks.
Attachment #8563363 - Flags: review?(luke) → review+
Many changes as a result from bugs discovered by Jukka, see bug 1131757.
Attachment #8562099 - Attachment is obsolete: true
Attachment #8565067 - Flags: review?(bent.mozilla)
Cleaned up a bit, and added more test cases.
Attachment #8562101 - Attachment is obsolete: true
Attachment #8562101 - Flags: review?(bent.mozilla)
Attachment #8565068 - Flags: review?(bent.mozilla)
Attachment #8565067 - Flags: review?(bent.mozilla)
Attachment #8565068 - Flags: review?(bent.mozilla)
Attachment #8566029 - Flags: review?(luke)
Attached patch Test casesSplinter Review
These tests live in dom/{base,workers}/test at the moment but that doesn't have to be the case, and may not even be natural if all the futex code is in the JS engine.
Attachment #8566035 - Flags: feedback?(luke)
Comment on attachment 8566029 [details] [diff] [review]
Futex API (DOM + shell, workers + main)

Review of attachment 8566029 [details] [diff] [review]:
-----------------------------------------------------------------

Lovely!  One suggested semantic change below.

::: js/src/builtin/AtomicsObject.cpp
@@ +703,5 @@
>  
>  class FutexWaiter
>  {
>    public:
> +    FutexWaiter(uint32_t offset, JSRuntime *fx)

Can you s/fx/rt/ everywhere in this file?

@@ +770,5 @@
>      int32_t value;
>      if (!ToInt32(cx, valv, &value))
>          return false;
> +    double timeout_ns;
> +    if (!ToInteger(cx, timeoutv, &timeout_ns))

For all these high-resolution timing situations, I was thinking it'd be good to use consistent units and performance.now() which uses units of milliseconds (expressing sub-milliseconds as fractional parts): https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/HighResolutionTime/Overview.html#sec-extenstions-performance-interface

@@ +851,5 @@
> +    // work cross-platform.
> +
> +    uint64_t maxSlice = 4000000000LLU;
> +
> +    uint64_t finalEnd = timed ? PRMJ_Now() + (uint64_t)timeout_us : 0;

Perhaps put finalEnd right below the timeout_us > 2e53 test?

@@ +860,5 @@
> +        if (timed) {
> +            sliceStart = PRMJ_Now();
> +            wakeResult = fx->futexWait(cx, PR_MicrosecondsToInterval((uint32_t)Min((uint64_t)timeout_us, maxSlice)));
> +        }
> +        else {

} else {

@@ +962,5 @@
>          return false;
>      if (count < 0)
>          count = 0;
>  
> +    JSRuntime* fx = cx->runtime();

Is this variable dead now?

@@ +1028,5 @@
>          r.setUndefined();
>          return true;
>      }
>  
> +    JSRuntime* fx = cx->runtime();

Also dead?

::: js/src/vm/Runtime.cpp
@@ +710,5 @@
> +JSRuntime::futexWait(JSContext *cx, uint32_t timeout_us)
> +{
> +    MOZ_ASSERT(futexLockHolder_ == PR_GetCurrentThread());
> +
> +    futexState_ = Waiting;

Can you MOZ_ASSERT(futexState_ == Idle)?

@@ +720,5 @@
> +
> +    // PR_FAILURE currently results only from API usage bug or PR_Interrupt(), which
> +    // is not used for thread management in Firefox.
> +    if (PR_WaitCondVar(futexCond_, timeout_us) == PR_FAILURE)
> +        futexState_ = Interrupted;

Everywhere else in SM (and according to bent, in Gecko too) just asserts PR_SUCESS, so I think you can drop the comment and test and use JS_ALWAYS_TRUE().  With this last use of Interrupted removed, you can remove it from WakeResult.

@@ +726,5 @@
> +#ifdef DEBUG
> +    futexLockHolder_ = holder;
> +#endif
> +
> +    WakeResult result = futexState_ != Waiting ? futexState_ : TimedOut;

IIUC, the current code is fine in case of spurious wakeup b/c, even though we report it here as TimedOut, in the caller we'll see that the time hasn't actually elapsed and try again.  I think it'd be nicer for readability/local-reasoning if we could see the loop immediately surrounding the PR_WaiCondVar instead of having to go read the caller.  What if instead we also moved all our timing logic (the 2e53 check, maxSlice, the whole loop-until-time-has-elapsed) into this function and had it just take the 'double timeout'?

::: js/src/vm/Runtime.h
@@ +788,5 @@
>      JSVersion defaultVersion_;
>  
> +    // Global futex lock.
> +    static bool FutexInitialize();
> +    static void FutexDestroy();

Could you factor all this code into a FutexRuntime class (in AtomicsObject.h) that is stored as a simple public member variable in the JSRuntime?  We've done this for JitRuntime/GCRuntime and it's nice b/c it keeps JSRuntime smaller.  If you did that you could also remove many of the futex name prefixes for all these members, since the code will read like "rt->futex.wake()".  Another benefit is the impl code can naturally go in AtomicsObject.cpp which should make it easier to read as part how it's used.
Comment on attachment 8566035 [details] [diff] [review]
Test cases

Even if we can test it from the JS shell (which is way nicer for JS engine devs, so we should mostly have jit-tests), it's a great idea to have browser tests as you've done here to make sure the whole thing works.
Attachment #8566035 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #52)
> Comment on attachment 8566029 [details] [diff] [review]
> Futex API (DOM + shell, workers + main)
> 
> @@ +720,5 @@
> > +
> > +    // PR_FAILURE currently results only from API usage bug or PR_Interrupt(), which
> > +    // is not used for thread management in Firefox.
> > +    if (PR_WaitCondVar(futexCond_, timeout_us) == PR_FAILURE)
> > +        futexState_ = Interrupted;
> 
> Everywhere else in SM (and according to bent, in Gecko too) just asserts
> PR_SUCESS, so I think you can drop the comment and test and use
> JS_ALWAYS_TRUE().  With this last use of Interrupted removed, you can remove
> it from WakeResult.

Nice.

> @@ +726,5 @@
> > +#ifdef DEBUG
> > +    futexLockHolder_ = holder;
> > +#endif
> > +
> > +    WakeResult result = futexState_ != Waiting ? futexState_ : TimedOut;
> 
> IIUC, the current code is fine in case of spurious wakeup b/c, even though
> we report it here as TimedOut, in the caller we'll see that the time hasn't
> actually elapsed and try again.

It's structured the way it is because Waiting is not a valid return value (indeed it's nonsensical), taken as it is.  Of course that could be changed.  And if I take your subsequent refactoring suggestion it's a moot point.

> I think it'd be nicer for
> readability/local-reasoning if we could see the loop immediately surrounding
> the PR_WaiCondVar instead of having to go read the caller.  What if instead
> we also moved all our timing logic (the 2e53 check, maxSlice, the whole
> loop-until-time-has-elapsed) into this function and had it just take the
> 'double timeout'?

I spent a lot of time this afternoon moving that logic out of that function since that logic really doesn't belong in Runtime and having it apart from the futex code in AtomicsObject.cpp was not beneficial, so... sigh :)  Of course, if I refactor with a FutexRuntime class, which seems like a good idea on its own, then the issue disappears in large part.  Seems like the way to go.
(In reply to Luke Wagner [:luke] from comment #52)
> Comment on attachment 8566029 [details] [diff] [review]
> Futex API (DOM + shell, workers + main)
> 
> @@ +770,5 @@
> >      int32_t value;
> >      if (!ToInt32(cx, valv, &value))
> >          return false;
> > +    double timeout_ns;
> > +    if (!ToInteger(cx, timeoutv, &timeout_ns))
> 
> For all these high-resolution timing situations, I was thinking it'd be good
> to use consistent units and performance.now() which uses units of
> milliseconds (expressing sub-milliseconds as fractional parts):
> https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/HighResolutionTime/
> Overview.html#sec-extenstions-performance-interface

I agree.  This is an API change however and has to be scheduled a little bit, since there are now users of the API.  I'll make a note on the spec document and be sure to change this in rev2, which I intend to finalize in march (along with code changes to support it).
I think this addresses all remarks on the previous patch.
Attachment #8558014 - Attachment is obsolete: true
Attachment #8562943 - Attachment is obsolete: true
Attachment #8563363 - Attachment is obsolete: true
Attachment #8565067 - Attachment is obsolete: true
Attachment #8565068 - Attachment is obsolete: true
Attachment #8566029 - Attachment is obsolete: true
Attachment #8566029 - Flags: review?(luke)
Attachment #8566449 - Flags: review?(luke)
(In reply to Lars T Hansen [:lth] from comment #55)
> (In reply to Luke Wagner [:luke] from comment #52)
>
> > For all these high-resolution timing situations, I was thinking it'd be good
> > to use consistent units and performance.now() which uses units of
> > milliseconds (expressing sub-milliseconds as fractional parts):
> > https://dvcs.w3.org/hg/webperf/raw-file/tip/specs/HighResolutionTime/
> > Overview.html#sec-extenstions-performance-interface
> 
> I agree.  This is an API change however and has to be scheduled a little
> bit, since there are now users of the API.  I'll make a note on the spec
> document and be sure to change this in rev2, which I intend to finalize in
> march (along with code changes to support it).

Actually I've talked to Jukka and it causes no hardship for him to change this now, so I'll create an additional patch that fixes that.
Comment on attachment 8566449 [details] [diff] [review]
Futex API (DOM + shell, workers + main), v2

Review of attachment 8566449 [details] [diff] [review]:
-----------------------------------------------------------------

This looks very nice indeed.  I think we can collapse the state space further by folding FutexWaiter::waiting into FutexRuntime::state_ and this would avoid all the awkwardness with 'waiting' in this patch as well as reduce the state space.  What do you think of:
 - remove interruptLevel_ and related methods; the two futexWait callers unconditionally call FutexRuntime::wake()
 - add a new FutexState WaitingInterrupted; FutexRuntime::wait sets state_ = WaitingInterrupted instead of Idle right before calling handleInterrupt
 - remove FutexWaiter::waiting; add FutexRuntime::isWaiting() { return state_ == Waiting || state_ == WaitingInterrupted; } to be queried instead
 - in FutexRuntime::wake, if state_ == WaitingInterrupted { if WakeExplicit, then state_ = Woken, else no state change }

::: js/src/builtin/AtomicsObject.cpp
@@ +1002,5 @@
>  
> +/* static */ bool
> +js::FutexRuntime::initialize()
> +{
> +    lock_ = PR_NewLock();

Can you MOZ_ASSERT(!lock_);?

@@ +1010,5 @@
> +/* static */ void
> +js::FutexRuntime::destroy()
> +{
> +    if (lock_)
> +        PR_DestroyLock(lock_);

To satisfy the above assert in cases of init;destroy;init;destroy, could you null out lock_ here?

@@ +1143,5 @@
> +            }
> +            break;
> +
> +          case FutexRuntime::Woken:
> +            goto finished;

It'd be nice to see '*result = x' immediately before every 'goto finished'.  I realize GCC probably warns if you remove the dominating assignment to *result, so perhaps just assign a safe-but-invalid value there.

@@ +1149,5 @@
> +          case FutexRuntime::WokenForJSInterrupt:
> +            // The interrupt handler may reenter the engine.  In that case
> +            // there are two complications:
> +            //
> +            // - The waiting thread is not actually in a wait() call so we

s/wait()/PR_WaitCondVar/

::: js/src/builtin/AtomicsObject.h
@@ +78,5 @@
> +    //
> +    // wait() will not wake up spuriously.  It will return true and
> +    // set *result to a return code appropriate for
> +    // Atomics.futexWait() on success, and return false on error.
> +    bool wait(JSContext *cx, bool *waiting, double timeout, int32_t *result);

It'd be nice to declare 'enum WakeResult' in this class and use that instead of int32_t for the type of 'result'.

::: js/src/vm/Runtime.cpp
@@ +58,5 @@
>  /* static */ ThreadLocal<PerThreadData*> js::TlsPerThreadData;
>  /* static */ Atomic<size_t> JSRuntime::liveRuntimesCount;
> +/* static */ Atomic<PRLock*> FutexRuntime::lock_;
> +#ifdef DEBUG
> +/* static */ Atomic<PRThread*> FutexRuntime::lockHolder_;

Can you move these to AtomicsObject.cpp?
(In reply to Luke Wagner [:luke] from comment #58)
> Comment on attachment 8566449 [details] [diff] [review]
> Futex API (DOM + shell, workers + main), v2
> 
> Review of attachment 8566449 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks very nice indeed.  I think we can collapse the state space
> further by folding FutexWaiter::waiting into FutexRuntime::state_ and this
> would avoid all the awkwardness with 'waiting' in this patch as well as
> reduce the state space.  What do you think of:
>  - remove interruptLevel_ and related methods; the two futexWait callers
> unconditionally call FutexRuntime::wake()

Ah, you mean the two _futexWake_ callers (I was very lost).

>  - add a new FutexState WaitingInterrupted; FutexRuntime::wait sets state_ =
> WaitingInterrupted instead of Idle right before calling handleInterrupt
>  - remove FutexWaiter::waiting; add FutexRuntime::isWaiting() { return
> state_ == Waiting || state_ == WaitingInterrupted; } to be queried instead
>  - in FutexRuntime::wake, if state_ == WaitingInterrupted { if WakeExplicit,
> then state_ = Woken, else no state change }

I'll try it out.  (Complexity reduction is welcome but I'm not convinced this is it.)

> ::: js/src/builtin/AtomicsObject.h
> @@ +78,5 @@
> > +    //
> > +    // wait() will not wake up spuriously.  It will return true and
> > +    // set *result to a return code appropriate for
> > +    // Atomics.futexWait() on success, and return false on error.
> > +    bool wait(JSContext *cx, bool *waiting, double timeout, int32_t *result);
> 
> It'd be nice to declare 'enum WakeResult' in this class and use that instead
> of int32_t for the type of 'result'.

I've not done that since the values stored in *result may be negative and (so far as I can tell) an enum has unknown signedness.  But I've documented the set of values that can be stored in *result and added an assertion that the value stored is one of the expected ones.
(In reply to Lars T Hansen [:lth] from comment #59)
> I've not done that since the values stored in *result may be negative and
> (so far as I can tell) an enum has unknown signedness.  But I've documented
> the set of values that can be stored in *result and added an assertion that
> the value stored is one of the expected ones.

Good news!  With the recent dropping of old compilers, we can now use 'enum WakeResult : int32_t {...}' in Mozilla code: https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code
I believe this addresses all remarks so far, and it includes the change that interprets the timeout argument for futexWait as milliseconds.  It also allows the timeout argument to be omitted and default to +Infinity, a much needed API adjustment.
Attachment #8566449 - Attachment is obsolete: true
Attachment #8566449 - Flags: review?(luke)
Attachment #8566721 - Flags: review?(luke)
Comment on attachment 8566721 [details] [diff] [review]
Futex API (DOM + shell, workers + main), v3

Review of attachment 8566721 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: js/src/builtin/AtomicsObject.cpp
@@ +1195,5 @@
> +        state_ = Woken;
> +        return;
> +    }
> +    if (state_ != Waiting)
> +        return;

IIUC, this function can now only be called in states Waiting and WaitingInterrupted.  It'd be nice to either (1) MOZ_ASSERT() this as a precondition and remove this "if (state_ != Waiting)" test or (2) switch (state_) and MOZ_CRASH() in the non-waiting states.
Attachment #8566721 - Flags: review?(luke) → review+
One distinction that /may/ have been lost in this latest rewrite is that there may be restrictions on futexWait once a worker is closing.  Intuitively I'd guess that interrupts will take care of that too, as they are required to take care of ilooping close handlers, but it's something that should be tested  carefully.
(In reply to Lars T Hansen [:lth] from comment #63)
> One distinction that /may/ have been lost in this latest rewrite is that
> there may be restrictions on futexWait once a worker is closing. 
> Intuitively I'd guess that interrupts will take care of that too, as they
> are required to take care of ilooping close handlers, but it's something
> that should be tested  carefully.

Onclose event: https://developer.mozilla.org/en-US/docs/Web/API/WorkerGlobalScope.onclose

In WorkerFeature.h we learn that the close handler runs for an unlimited time when triggered by self.close() and worker.terminate() but it gets a short timeout when triggered by navigating away from the page, so that's what we should test.
Try is green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a3d11fa7087

Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf3b9f27efcf

Will move the DOM test cases to a new bug; removing leave-open.
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/cf3b9f27efcf
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(In reply to Lars T Hansen [:lth] from comment #63)
> One distinction that /may/ have been lost in this latest rewrite is that
> there may be restrictions on futexWait once a worker is closing. 
> Intuitively I'd guess that interrupts will take care of that too, as they
> are required to take care of ilooping close handlers, but it's something
> that should be tested  carefully.

Yeah... Ideally we just wouldn't allow any waits in the close handler, but I don't think there's any real problem here. Of course there could always be bugs...
No longer depends on: 1131943
You need to log in before you can comment on or make changes to this bug.