Closed Bug 1131953 Opened 5 years ago Closed 5 years ago

Add futex support for the JS shell

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: lth, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 3 obsolete files)

Currently the futex support is only installed by DOM (main thread and workers), but since the JS shell supports simple workers with evalInWorker there's no reason why we should not add futex functionality to the shell.

That would allow us to have shell tests for futexes.  Those tests would not really test any of the DOM code, but they would test the engine logic, and would allow some tests to be written that are hard to write now, esp around how interrupts and futexes interact.
Adding the futex functionality to the shell is easy enough, but for anything meaningful to happen it needs to be possible to transfer (at a minimum) a shared-memory object to the worker.  Nothing like that seems to exist.  

Candidate solutions:

- Pass one or more SAB objects to the worker by means of evalInWorker; the worker would obtain
  these objects by calling some intrinsic (getWorkerParam(n) maybe).  Easy enough to extend to other
  types, and to use structured cloning to pass values.

- Register a SAB on the master side with setWorkerSAB(n, sab) and obtain it on the worker side with
  getWorkerSAB(n).

I like the first one, so I'll play around with that.
Attached patch WIP (obsolete) — Splinter Review
This compiles and links but has not been tested due to the lack of a way to transmit shared memory objects.
Actually the solution that I'll probably implement is to have a single global mailbox that will hold one SharedArrayBuffer, this mailbox can be read and written by main thread and workers alike.  If they need more coordination than that they can coordinate via shared memory.

New shell functions, then:

  setSharedArrayBuffer(sab)
  getSharedArrayBuffer()
Attached patch WIP v2 (obsolete) — Splinter Review
Implements the one-element mailbox.  Clean, but could use a little more testing before review.
Attachment #8562680 - Attachment is obsolete: true
Attached patch Patch (obsolete) — Splinter Review
Code an test case.

I have an additional test case attached to bug 1074237 that I can land once the jsapi changes on that bug have landed and this patch has landed.
Attachment #8562927 - Attachment is obsolete: true
Attachment #8563370 - Flags: review?(luke)
Comment on attachment 8563370 [details] [diff] [review]
Patch

Review of attachment 8563370 [details] [diff] [review]:
-----------------------------------------------------------------

I like the global mailbox solution.  Once you have shared memory, you can even futexWait on the result of an evalInWorker() from the main thread, which could allow building some nice testing primitives later on.

::: js/src/shell/js.cpp
@@ +323,5 @@
> +                wait = PR_MillisecondsToInterval((timeout_us + 999) / 1000); // Round up always
> +            }
> +            gFutexLockHeld = false;
> +            if (PR_WaitCondVar(cond_, wait) == PR_FAILURE)
> +                interrupted = true;

IIUC, other than user errors (calling w/o lock held), the only valid PR_FAILURE we expect here is spurious condvar wakeup (EINTR).  In that case, we *do* want to continue and try again which is what will happen if we simply ignore the return value (and delete interrupted).

@@ +333,5 @@
> +                else
> +                    timeout_us -= (after - before);
> +            }
> +        } while (!(interrupted || wokenForJSInterrupt_ || woken_ || timedout));
> +        waiting_ = false;

Can you MOZ_ASSERT(waiting_); before clearing?

@@ +337,5 @@
> +        waiting_ = false;
> +        if (interrupted)
> +            return Uncatchable;
> +        if (wokenForJSInterrupt_)
> +            return WokenForJSInterrupt;

I think you need to clear wokenForJSInterrupt_ here or else all future waits() will return WokenForJSInterrupt.

@@ +354,5 @@
> +          case WakeExplicit:
> +            woken_ = true;
> +            break;
> +          case WakeForJSInterrupt:
> +            wokenForJSInterrupt_ = true;

Instead of having two independent bools (with 4 possible pair states, 1 of which is unobservable (both false), 1 of which is impossible (both true)), how about just having a WakeReason as the field and switching on that?  To catch bugs, you could set it to WakeReason(-1) before the PR_WaitCondVar and assert the reason is WakeReason(-1) in wake() before clobbering it.

@@ +2896,5 @@
>      JS_SetErrorReporter(rt, my_ErrorReporter);
>  
> +    ShellFutexAPI *fx = js_new<ShellFutexAPI>();
> +    if (!fx || !fx->init()) {
> +        js_delete(fx);

Random note, I just noticed that ~JSRuntime has 'delete futexAPI_;'; it should use js_delete(futexAPI_) (global SM rule; allows the embedding to switch allocators).  That means that the embedding must also use JS_malloc() to allocate the memory which is a pain and now I'm starting to think that having ~JSRuntime delete the FutexAPI is a bad idea (which was probably mine).  It also prevents the FutexAPI object from being a global or member variable (which I would expect them to be for the main-thread and worker-thread, resp.).

@@ +4429,5 @@
> +        return false;
> +    }
> +
> +    SharedArrayRawBuffer *newBuffer = args[0].toObject().as<SharedArrayBufferObject>().rawBufferObject();
> +    newBuffer->addReference();

Could you have a corresponding dropReference() in main(), right after the JS_DestroyRuntime().  That way, in theory, all SharedArrayRawBuffers would be dead by the time we JS_Shutdown().  Then, to test let this condition be tested/fuzzed, you could keep an #ifdef DEBUG count of total live SharedArrayRawBuffers and assert that number is zero right from inside JS_Shutdown().
Actually, taking my above comment further, you could merge waiting_ in by having a tri-state enum: Waiting, Woken, WokenForJSInterrupt and this would map 1:1 with the FutexAPI states.
Higher-level question: if JS_RequestInterruptCallback() Just Works for interrupting a blocked futex, do we even need a FutexAPI?  I'm having trouble coming up with what it gives us just using PRLock/PRCondVar directly inside AtomicsObject.cpp.  I know workers are doing some exciting stuff with nested event loops; maybe I just don't understand the constraints.
(In reply to Luke Wagner [:luke] from comment #6)
> Comment on attachment 8563370 [details] [diff] [review]
> Patch
> 
> Review of attachment 8563370 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/shell/js.cpp
> @@ +323,5 @@
> > +                wait = PR_MillisecondsToInterval((timeout_us + 999) / 1000); // Round up always
> > +            }
> > +            gFutexLockHeld = false;
> > +            if (PR_WaitCondVar(cond_, wait) == PR_FAILURE)
> > +                interrupted = true;
> 
> IIUC, other than user errors (calling w/o lock held), the only valid
> PR_FAILURE we expect here is spurious condvar wakeup (EINTR).  In that case,
> we *do* want to continue and try again which is what will happen if we
> simply ignore the return value (and delete interrupted).

A spurious wakeup is not reported as a PR_FAILURE, at least not according to the documentation in prcvar.h: PR_FAILURE should only be reported for an explicit PR_Interrupt(), which nobody uses, or for incorrect use of the API (forgetting to lock).

(Indeed I expect "spurious wakeup" and "wakeup" are necessarily indistinguishable, or there would be no reason to have a notion of "spurious wakeup"; the cvar could take care of never reporting that.)

> @@ +333,5 @@
> > +                else
> > +                    timeout_us -= (after - before);
> > +            }
> > +        } while (!(interrupted || wokenForJSInterrupt_ || woken_ || timedout));
> > +        waiting_ = false;
> 
> Can you MOZ_ASSERT(waiting_); before clearing?

I suppose so.

> @@ +337,5 @@
> > +        waiting_ = false;
> > +        if (interrupted)
> > +            return Uncatchable;
> > +        if (wokenForJSInterrupt_)
> > +            return WokenForJSInterrupt;
> 
> I think you need to clear wokenForJSInterrupt_ here or else all future
> waits() will return WokenForJSInterrupt.

No, because it's cleared before the wait loop, but I agree it is good hygiene to clear it after as well.

> @@ +354,5 @@
> > +          case WakeExplicit:
> > +            woken_ = true;
> > +            break;
> > +          case WakeForJSInterrupt:
> > +            wokenForJSInterrupt_ = true;
> 
> Instead of having two independent bools (with 4 possible pair states, 1 of
> which is unobservable (both false), 1 of which is impossible (both true)),
> how about just having a WakeReason as the field and switching on that?  To
> catch bugs, you could set it to WakeReason(-1) before the PR_WaitCondVar and
> assert the reason is WakeReason(-1) in wake() before clobbering it.

Maybe - need to think about the consequences.

> @@ +2896,5 @@
> >      JS_SetErrorReporter(rt, my_ErrorReporter);
> >  
> > +    ShellFutexAPI *fx = js_new<ShellFutexAPI>();
> > +    if (!fx || !fx->init()) {
> > +        js_delete(fx);
> 
> Random note, I just noticed that ~JSRuntime has 'delete futexAPI_;'; it
> should use js_delete(futexAPI_) (global SM rule; allows the embedding to
> switch allocators).  That means that the embedding must also use JS_malloc()
> to allocate the memory which is a pain and now I'm starting to think that
> having ~JSRuntime delete the FutexAPI is a bad idea (which was probably
> mine).

It was indeed your idea :)  It was a nice simplification at the time.  I'll file a followup bug for that.

> It also prevents the FutexAPI object from being a global or member
> variable (which I would expect them to be for the main-thread and
> worker-thread, resp.).

Not sure - depends on the relative deletion order of things and the protocol for that we eventually end up with.

> @@ +4429,5 @@
> > +        return false;
> > +    }
> > +
> > +    SharedArrayRawBuffer *newBuffer = args[0].toObject().as<SharedArrayBufferObject>().rawBufferObject();
> > +    newBuffer->addReference();
> 
> Could you have a corresponding dropReference() in main(), right after the
> JS_DestroyRuntime().  That way, in theory, all SharedArrayRawBuffers would
> be dead by the time we JS_Shutdown().  Then, to test let this condition be
> tested/fuzzed, you could keep an #ifdef DEBUG count of total live
> SharedArrayRawBuffers and assert that number is zero right from inside
> JS_Shutdown().

Sure thing.
(In reply to Luke Wagner [:luke] from comment #8)

> Higher-level question: if JS_RequestInterruptCallback() Just Works for
> interrupting a blocked futex, do we even need a FutexAPI?  I'm having
> trouble coming up with what it gives us just using PRLock/PRCondVar directly
> inside AtomicsObject.cpp.  I know workers are doing some exciting stuff with
> nested event loops; maybe I just don't understand the constraints.

The DOM code probably illustrates the need better than the shell code.  In the DOM, the worker code needs to interact with a nested event loop that has its own (existing) condition variable that it uses for Runnable dispatch and some timeout logic for handling runaway threads that have been canceled.  On the main thread, the FutexAPI uses its own condition variable but needs to manage timeouts that exceed the range of the interval timer used for conditional variable waits.

The FutexAPI looks like a bit of overhead but it has actually allowed for flexibility in how waiting is handled in different contexts.

It's possible that an extended interrupt mechanism could take the place of all that.  (Or perhaps, it's possible the interrupt mechanism should be extended to allow for all of that.)  I doubt it would be much simpler but it might be useful for other things as well.
I'm sorry, I still don't see the need for a nested event loop in either the main-thread or worker case.  In both of those cases, if a futexWait() needs to be interrupted, a watchdog thread requests an interrupt and it is the *interrupt callback* that does the nested event loop.  E.g.:
  http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#4698
This is really no different then how this all works with an ilooping 'while(true){}'.  I think Ben gave the advice to use a nested event loop to implement futexWait() when he thought that we'd unconditionally pthread_cond_wait, but since RequestInterruptCallback will wake the cond var (and we can distinguish this wake from a real futexWake()) everything will Just Work.  I talked with bent about this on irc and so far we can't see any counter-examples.
(Also, in addition to massively simplifying the whole path, it means we can review this all from within js/src.)
(In reply to Luke Wagner [:luke] from comment #11)
> I think Ben gave the advice to use a nested event loop to
> implement futexWait() when he thought that we'd unconditionally
> pthread_cond_wait, but since RequestInterruptCallback will wake the cond var
> (and we can distinguish this wake from a real futexWake()) everything will
> Just Work.

Yep, that was my advice, and my thought process. I didn't know RequestInterruptCallback would break us out of the wait.

As long as it does I think this sounds fine. Even the memory reporter stuff should continue to work correctly.
(In reply to Luke Wagner [:luke] from comment #12)

> (Also, in addition to massively simplifying the whole path, it means we can
> review this all from within js/src.)

A simplification would be most welcome, in this case at the cost of an additional watchdog thread for workers presumably (one should be enough).  I'll try to rework the code, and then we'll see.
A very nice simplification indeed.  And no extra thread needed, that was a misconception on my part.  All my tests are passing in a debug build; will continue to test for a little while before posting patches.  (I don't have any tests that check that the memory reporter works correctly, though.)
Shared memory mailboxes only (cleaned up) since all the other code moved to bug 1074237.
Attachment #8563370 - Attachment is obsolete: true
Attachment #8563370 - Flags: review?(luke)
Attachment #8566024 - Flags: review?(luke)
Comment on attachment 8566024 [details] [diff] [review]
Shared memory mailboxes

Review of attachment 8566024 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/shell/js.cpp
@@ +4302,5 @@
> +        newObj = SharedArrayBufferObject::New(cx, buf);
> +    }
> +    PR_Unlock(sharedArrayBufferMailboxLock);
> +
> +    args.rval().setObjectOrNull(newObj);

I think you need to "if (!newObj) return false" (since SharedArrayBufferObject::New() will have reported an out of memory error).  Then this can be setObject(*newObj).
Attachment #8566024 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #17)
> Comment on attachment 8566024 [details] [diff] [review]
> Shared memory mailboxes
> 
> Review of attachment 8566024 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/shell/js.cpp
> @@ +4302,5 @@
> > +        newObj = SharedArrayBufferObject::New(cx, buf);
> > +    }
> > +    PR_Unlock(sharedArrayBufferMailboxLock);
> > +
> > +    args.rval().setObjectOrNull(newObj);
> 
> I think you need to "if (!newObj) return false" (since
> SharedArrayBufferObject::New() will have reported an out of memory error). 
> Then this can be setObject(*newObj).

Not quite, but you're right about that bug.  The function needs to successfully return null if there is no object in the mailbox.
This does a better job of handling various argument types to setSharedArrayBuffer, notably the buffer argument can be cleared with null.  It also properly allows null to be returned from getSharedArrayBuffer.  Not asking for re-review, just noting the adjustments.
The futex tests are not yet comprehensive but this covers some interesting cases at least.
Attachment #8566400 - Flags: review?(luke)
Memo to self: the tests can be run locally but alone and with the harness, but on try they cause errors in the "J" test suite, see here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f1b2d292aac8

Possibly this is to do with timeout (test takes 5-6 seconds to run), or a missing annotation, or a bug in the tests when run on try - need to investigate.
Comment on attachment 8566400 [details] [diff] [review]
Test cases for shell futexes and shared memory mailbox

Review of attachment 8566400 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/tests/shell/shell.js
@@ +2,5 @@
> + * This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +if (typeof assertThrowsInstanceOf === 'undefined') {

Ah, so jit-tests/lib/asserts.js isn't available to js/src/tests?  Most of us default to putting tests in jit-tests/tests/some_new_dir so I don't have to worry about test harness strangeness (it's weird and mostly historic that we have two test harnesses anyways; they have been incrementally merged; and should one day be one).  Anyhow, if you have a reason for putting these tests in js/src/tests, there is already an assertThrows() in js/src/tests/shell.js, so you could put this right under.
Attachment #8566400 - Flags: review?(luke) → review+
conflict(?) with bug 1101662.
shell/futex.js fails with --tbpl option (especially --no-thread).

https://hg.mozilla.org/integration/mozilla-inbound/rev/e47a04e3dbe6
(In reply to Tooru Fujisawa [:arai] from comment #24)
> conflict(?) with bug 1101662.
> shell/futex.js fails with --tbpl option (especially --no-thread).
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/e47a04e3dbe6

I'll take a look at that in the morning.  --tbpl shouldn't fail but the feature absolutely requires threads (shell workers) so it may just be a matter of disabling the test for some configs.
Depends on: 1135368
Assignee: nobody → lhansen
You need to log in before you can comment on or make changes to this bug.