Open Bug 507718 Opened 11 years ago Updated 2 years ago

Provide a lightweight alternative to NSPR for synchronization+threads with a modern C++ API

Categories

(Core :: General, defect)

x86
macOS
defect
Not set

Tracking

()

People

(Reporter: gal, Unassigned)

References

Details

Attachments

(1 file)

No description provided.
Summary: Provide a lightweight alternative to NSPR with a modern C++ API → Provide a lightweight alternative to NSPR for synchronization+threads with a modern C++ API
It would be nice to have some simple lightweight threading and synchronization primitives with a modern C++ API, i.e.

Atomic<int> foo;

f++ => operator++ => happy times

Or virtual Thread::run(), the rest is hidden.

Performance-critical parts like Atomic should be all-inline and fast and use compiler intrinsics where available (i.e. gcc 4.1).

We can probably borrow code from someone instead of rolling our own from scratch.
We chatted about this last week and decided to implement the C++1x specification for threads, mutexes, condvars, and atomics.  This will also pull in part of the C++1x time API since some methods allow one to wait for a specified duration of time.  I have a nascent pthreads implementation of <thread> and implementing the rest will be quite easy as the API is similar to pthreads'.  A Windows implementation will be harder, but bent has volunteered to do this.  We can borrow code from boost, Google, and NSPR where it makes sense to do so.

This code will live in js/src and be shared between Spidermonkey and Gecko.  Once/if our compilers support this part of the C++1x spec, we can just switch to their implementations.

It would be nice to take this code from boost wholesale, but boost uses exceptions and implements a superset of the C++1x spec.  Instead of throwing exceptions as the spec dictates, we will just abort() on errors.  This code is crucial enough that there's not much point in trying to recover from an error, and it might be counterproductive to do so anyway.

We briefly discussed how to handle unsupported platforms/architectures.  The option we tentatively settled on was #error'ing to force maintainers of other platforms to provide an impl.  We also have the option of wrapping the C++1x API around NSPR, as a slow and crufty fallback, and #warning that the fallback is being used.
This better not take too long and (not or, I'm generous today) retrace too many NSPR footsteps, or I will cry "boondoggle."

/be
Don't get me wrong: C++ sugar won't (ought not!) take long and it will be sweet.

Only downside is kids eat too much (everyone wants to be a lock-free threaded programming superstar) and too many get cavities (bugs from lack of barriers or other subtleties -- Mom and Dad pay the bills).

/be
Nothing fancy is being planned here, just a (i) lightweight, modern wrapper around system threading primitives, and (ii) well-specified implementation of atomic operations.  NSPR has neither.

We'd still need to catch abuses of these APIs in review, just like we have to today.
js/src/threads, then?

/be
Aye.  Or maybe js/src/c++1x/ since it's going to include atomics and sync primitives too.
People will expect to find those in a dir called threads/, I think, and I don't want to really scope-creep this in the way that your directory name choice suggests, ahem!
I'd like to solicit feedback on the mostly-done v0.0, which lives here

http://hg.mozilla.org/users/cjones_mozilla.com/cxx1x/file/63a0c5674d42

I hope this medium is preferable to a big patch.  Implemented so far is most of C++1x's <thread>, <mutex>, <condition_variable> for pthreads systems, and <cstdatomic> (lock-free) for x86/x86-64/ARM (I also implemented a fallback that uses mutexes).  Some parts of their specs seemed useless or worse so I omitted them.

I'm going to finish up the lacking feature (timed condvar.wait()) and work on more tests in the mean time.
My WIP repo listed in comment 9 would lead to the layout

  js/src/threads/
     [platform-neutral headers]
     linux/
     osx/
     posix/
     pthread/
     windows/
     test/

I prefer this because I can use the same file names for all the platform-specific headers and selectively export them.  Is this OK, or does anyone have strong opinions on an alternate layout?
Not really sure who wants to/should be reviewing what, so I'm going to r? spam.
Assignee: nobody → jones.chris.g
Attachment #414176 - Flags: review?
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

Probably want to look over the build goop.
Attachment #414176 - Flags: review?(benjamin)
Attachment #414176 - Flags: review? → review?(brendan)
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

Brendan, do you want to enforce JS code conventions for this code? If so, we have to adjust indents and such a bit.
I'm not sure if the "and such" includes class/function naming conventions, but FWIW I used the same identifiers as the C++1x spec (save s/std/mozstd/).  The reason is to allow us to just dump this code when/if C++1x support starts appearing.
Drive-by nit: mozstd seems weird.  Everything else we are adding is in the mozilla namespace, and this adds a new top level one.
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()



>+unsigned
>+hardware_concurrency()
>+{

indent is 4 spaces (no tabs)

>+  int nproc;
>+  size_t size = sizeof(nproc);
>+
>+  if (sysctlbyname("hw.ncpu", &nproc, &size, NULL, 0))
>+    return 1;
>+
>+  return nproc;
>+}
>+
>+} // namespace mozinternal

>+  if (!fp) {

Don't overbrace if there is a single consequence.

>+  if (line) {
>+    free(line);
>+  }

Those seem to be the most relevant style diffs to JS. However, I think this module should get a style waiver and do whatever chris prefers since its an external component (like nanojit). That's Brendan's call though.

I remember the name rationale and no objections there.
It looks like he's just following the Coding Style docs on MDC:
https://developer.mozilla.org/En/Developer_Guide/Coding_Style
In fact, this code was originally in JS style, and I adapted it for Gecko style before posting this patch.  But I personally prefer SM style, and Andreas has spoken, so ...
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

I'd like brendan to decide about namespace and style before I review... FWIW my opinion would be to use mozilla standard 2-space and either mozilla::thread or mozilla::std::thread
Attachment #414176 - Flags: review?(benjamin)
Chris, are you going to make it so that it is possible to do a
debug build using purely POSIX pthreads primitives, and no magic hacks
with atomic insns, lock free algorithms, or wotnot?  I ask because
the option to map this stuff purely onto pthreads will make it much
easier to use runtime race detection tools in future.

IME the #1 problem such tools have is "seeing" all the relevant
inter-thread synchronisation events; failure to do so leads floods
of false positives and general uselessness.
Could we just compile in some valgrind instrinsics that trigger the right happens-before relations?

Regarding style, I think this code should follow Coding_Style since it's not really JS code, it's new code, and new code follows the style guide; that is not optional. Maybe that means it shouldn't live under js/.

Namespace "mozilla::thread" or "mozilla::threads" sounds good to me.
For future reference, the plan would be to fall back on pthread-synchronized operations across the board, including atomic arithmetic.  My user repo already has fallback implementations of atomic arithmetic that use std::mutex, so we would just need to set a configure flag to make this happen.
(In reply to comment #21)
> Could we just compile in some valgrind instrinsics that trigger the right
> happens-before relations?
> 

Julian raised the point that these APIs are in their infancy and likely to change.  Using pthread sync would be more robust at first, and reduce annotation garbage.
roc, the reason to have it under js/ is that we would like to kill off the remaining ties to NSPR in JS and replace it with this.
AIUI, this code was going to live under js for two reasons: first, so standalone SpiderMonkey does not depend on anything (JS_THREADSAFE currently requires NSPR, but this is considered a misfeature); second, to avoid yet more DSO/DLL hell.

We can revisit either, but I'm fine with using the Mozilla style guide and not bothering about the variation. js/src/xpconnect uses its own style, and while js/src/nanojit happens to use 4-space c-basic-offset, it is from an upstream repo and its style is not as consistent as SpiderMonkey's anyway.

Agree with roc on wanting valgrind no-op hokey-pokey for atomic ops, not ifdef'ed layering on pthreads (which will be so slow and different I worry it won't find the races that matter).

/be
(In reply to comment #23)
> (In reply to comment #21)
> > Could we just compile in some valgrind instrinsics that trigger the right
> > happens-before relations?
> > 
> 
> Julian raised the point that these APIs are in their infancy and likely to
> change.  Using pthread sync would be more robust at first, and reduce
> annotation garbage.

We have been using CAS in JS for over ten years. Perhaps we'll just keep using our jslock.{cpp,h} code and part ways here, but if js/src/threads could host the one portable CAS interface and implementations, that would be better. We can't afford to stop using CAS in production builds, of course. Would valgrind's race detection tools really find the same races if we ran every CAS through a global lock, or address-keyed hashtable of locks?

/be
(In reply to comment #25)
> Agree with roc on wanting valgrind no-op hokey-pokey for atomic ops, not
> ifdef'ed layering on pthreads (which will be so slow and different I worry it
> won't find the races that matter).
>

I don't think it would be /that/ much slower ... in the common case, it would be approximately twice as expensive as a CAS (~two CAS's instead of one).  But in the contended case, it would indeed be much more expensive.

IIRC, helgrind is using hybrid lockset/happens-before detection (thanks roc!), so I don't think locked CAS will affect the "real" races reported.  IOW, I locked CAS shouldn't induce happens-before or add to the lockset for operations that are otherwise racy.
(In reply to comment #26)
> (In reply to comment #23)
> > (In reply to comment #21)
> > > Could we just compile in some valgrind instrinsics that trigger the right
> > > happens-before relations?
> > > 
> > 
> > Julian raised the point that these APIs are in their infancy and likely to
> > change.  Using pthread sync would be more robust at first, and reduce
> > annotation garbage.
> 
> We have been using CAS in JS for over ten years. Perhaps we'll just keep using
> our jslock.{cpp,h} code and part ways here, but if js/src/threads could host
> the one portable CAS interface and implementations, that would be better. We
> can't afford to stop using CAS in production builds, of course.

The C++1x spec has APIs for a whole host of atomic ops, and the semantics are well defined IMHO.  I'd prefer to switch jslock over to the js/threads code (which does use CAS on x86/x86-64/ARM).
I don't really want to switch from NSPR to a yet slower implementation based on pthreads. I think we can wait with switching our CAS code until the proper new implementation is in place.
(In reply to comment #29)
> I don't really want to switch from NSPR to a yet slower implementation based on
> pthreads. I think we can wait with switching our CAS code until the proper new
> implementation is in place.

Julian was asking about pthread-locked atomics for special DEBUG builds only.  Regular builds will use CAS on x86/x86-64/ARM.  If jslock has native atomics support for platforms that js/threads doesn't, I can pretty easily add them to js/threads.
(In reply to comment #26)
> We have been using CAS in JS for over ten years. Perhaps we'll just keep using
> our jslock.{cpp,h} code and part ways here, but if js/src/threads could host
> the one portable CAS interface and implementations, that would be better. We
> can't afford to stop using CAS in production builds, of course. Would
> valgrind's race detection tools really find the same races if we ran every CAS
> through a global lock, or address-keyed hashtable of locks?

We model CAS by ignoring the write and only observing the read,
thereby missing the case where a CAS races against a read.  However,
that might be preferable to the sequence
{ pth-mx-lock; x++; pth-mx-unlock; } since the latter induces 2 h-b
edges and makes the program look more ordered than it really is.

So perhaps an OK compromise is to allow CAS but force all other
inter-thread synch to be done via pthreads?  Remember, this is for
debugging only.  For production, there's no proposal at all to use
pthreads it's deemed undesirable.
Attachment #414176 - Flags: review?(brendan) → review?(lw)
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

I think Luke should bless this, esp. the C++[01]x purity but also the SpiderMonkey-ness apart from coding style. I'm ok with it following the Mozilla C++ style guide, as noted earlier.

/be
Attachment #414176 - Flags: review?(lw) → review+
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

Well, there is only a little to look at, but I like the direction this is heading with cleanly separating the platform/arch-specific stuff.  Looking forward to the rest.

>+unsigned
>+hardware_concurrency()
>+{
>+  int nproc;
>+  size_t size = sizeof(nproc);
>+
>+  if (sysctlbyname("hw.ncpu", &nproc, &size, NULL, 0))
>+    return 1;
>+
>+  return nproc;
>+}

The spec says "if the value is not computable or well defined and implementation should return 0".  Is there another reason to return 1?

Second, I think new tests in js can be public domain (which I hear has a 3 line license you can copy from newer js reftests), so perhaps you can set the precedent for the new test directory you are growing?  (I'm about the farthest thing from a lawyer, so maybe double check that :)

Also, a high-level question: are you planning to simulate rvalue references?
Comment on attachment 414176 [details] [diff] [review]
part 1, v1: adds js/src/threads and mozstd::thread::hardware_concurrency()

Chris: just curious -- is your sysctlbyname code in
js/src/threads/darwin/hardware_concurrency.h a better
way to get the number of processors than the _PR_HAVE_SYSCTL
code in PR_GetNumberOfProcessors?  See
http://mxr.mozilla.org/nspr/ident?i=PR_GetNumberOfProcessors

I'm wondering if I should copy your code into NSPR.

>+    // NSPR doesn't expose this information

You can call PR_GetNumberOfProcessors
(In reply to comment #33)
> >+unsigned
> >+hardware_concurrency()
> >+{
> >+  int nproc;
> >+  size_t size = sizeof(nproc);
> >+
> >+  if (sysctlbyname("hw.ncpu", &nproc, &size, NULL, 0))
> >+    return 1;
> >+
> >+  return nproc;
> >+}
> 
> The spec says "if the value is not computable or well defined and
> implementation should return 0".  Is there another reason to return 1?
> 

Nice catch!

> Second, I think new tests in js can be public domain (which I hear has a 3 line
> license you can copy from newer js reftests), so perhaps you can set the
> precedent for the new test directory you are growing?  (I'm about the farthest
> thing from a lawyer, so maybe double check that :)
> 

Interesting question, seems like a gray area.  The tests sort of depend on an open spec, but the implementation is "mozstd" ... probably best to ask a lawyer ;).

> Also, a high-level question: are you planning to simulate rvalue references?

Not atm, I've just been skipping over those interfaces.
(In reply to comment #34)
> (From update of attachment 414176 [details] [diff] [review])
> Chris: just curious -- is your sysctlbyname code in
> js/src/threads/darwin/hardware_concurrency.h a better
> way to get the number of processors than the _PR_HAVE_SYSCTL
> code in PR_GetNumberOfProcessors?  See
> http://mxr.mozilla.org/nspr/ident?i=PR_GetNumberOfProcessors
> 

Don't think so:

     The sysctlbyname() function accepts an ASCII representation of the name
     and internally looks up the integer name vector.  Apart from that, it
     behaves the same as the standard sysctl() function.

> >+    // NSPR doesn't expose this information
> 
> You can call PR_GetNumberOfProcessors

Gah!  I looked for this and failed to find it.  If I'd known it existed, I would have cribbed from here rather than searching Google ;).
(In reply to comment #35)
> Interesting question, seems like a gray area.  The tests sort of depend on an
> open spec, but the implementation is "mozstd" ... probably best to ask a lawyer
> ;).

The test code itself can by PD, regardless of what specification it implements or what it calls.  Please use the minimal PD license text for the tests!
With bug 577899 and bug 642381, code shared between Gecko/SpiderMonkey is starting to happen.  The code this bug covers would probably live in mfbt/threads, per current plans.  Some lingering issues from past discussion

 - I think there's value in differentiating code written to follow the C++0x spec from code that's not.  The latter kind of code already lives (will live) in mfbt/.  I would prefer to put the C++0x stuff in a namespace that includes the letters "std"; I still like mozstd, but I also don't want to hang up on this.

 - Regardless of the mozstd/mozilla::std/mozilla stuff that follows the C++0x spec, we might still want mozilla::Thread/etc. wrappers in mfbt when we need to break from the C++0x API.  One such break might be moving the xpcom/glue deadlock detector into mfbt/, and have it operate on mozilla::Mutex/etc. wrapped around mozstd::mutex/etc.

 - On valgrind-able platforms, std::mutex/etc. are implemented by pthread types.  No hand-rolled broken glorp.  I don't believe those impls will affect helgrind.  (If xpcom/glue/Mutex.h and PRLock don't affect helgrind on m-c tip, then the new regime wouldn't either.)

 - On valgrind-able platforms and architectures, the std:: atomics are implemented by inline assembly.  The x86/x86_64 instructions used are (of course!) the well-known cmpxchgl, xaddl, the *fences, etc.  (It's a bit more complicated on ARM.)  I don't fully understand the potential problems in helgrind, but we have many options available for helping out helgrind if it doesn't already understand the semantics of x86 atomics: annotation, fallback impls, calls to magic functions, ...  (ARM is more complicated.)  Maybe Julian and I should discuss this somewhere else.

 - C++0x specifies atomics with relaxed-consistency semantics.  There are implementations of these in mozstd.  I don't expect we'll need to use these in either SpiderMonkey or Gecko, but if we do, they will probably present problems for helgrind.  I'd like to cross that bridge when we come to it.
(In reply to comment #2)
> We chatted about this last week and decided to implement the C++1x
> specification for threads, mutexes, condvars, and atomics.  This will also
> pull in part of the C++1x time API since some methods allow one to wait for
> a specified duration of time.  I have a nascent pthreads implementation of
> <thread> and implementing the rest will be quite easy as the API is similar
> to pthreads'.  A Windows implementation will be harder, but bent has
> volunteered to do this.  We can borrow code from boost, Google, and NSPR
> where it makes sense to do so.

This code will live in js/src and be shared
> between Spidermonkey and Gecko.  Once/if our compilers support this part of
> the C++1x spec, we can just switch to their implementations.

It would be
> nice to take this code from boost wholesale, but boost uses exceptions and
> implements a superset of the C++1x spec.  Instead of throwing exceptions as
> the spec dictates, we will just abort() on errors.  This code is crucial
> enough that there's not much point in trying to recover from an error, and
> it might be counterproductive to do so anyway.

I'm not terribly familiar with the code under discussion here, but I thought I'd throw in that boost.lockfree looks like it's going to make it in to boost 1.47.0.  Documentation can be found here:

http://tim.klingt.org/boost_lockfree/

it might fit your needs.  it simulates atomics with available mechanisms (transparently) and provides three lockfree containers (fifo, ring buffer and stack, I believe).  HTH!

We briefly discussed how to
> handle unsupported platforms/architectures.  The option we tentatively
> settled on was #error'ing to force maintainers of other platforms to provide
> an impl.  We also have the option of wrapping the C++1x API around NSPR, as
> a slow and crufty fallback, and #warning that the fallback is being used.
Assignee: jones.chris.g → nobody
You need to log in before you can comment on or make changes to this bug.