Closed Bug 884676 Opened 7 years ago Closed 6 years ago

Use mozilla::Atomic instead of JS_ATOMIC_*

Categories

(Core :: JavaScript Engine, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: jcranmer, Assigned: jcranmer)

References

Details

(Whiteboard: [qa-])

Attachments

(4 files, 6 obsolete files)

The patch here is almost simple:
1. Make ThreadPool::nextId_, JSRuntime::interrupt [1], and JSPrincipals::refcount atomic.
2. Delete JS_ATOMIC_* macros.

[If speed is an issue, nextId_ appears to be Relaxed memory ordering, while interrupt and refcount could be ReleaseAcquire instead of SequentiallyConsistent. But SequentiallyConsistent should square away with what we have now.]

Complications:
1. This means we'd have atomic barriers even if JS_THREADSAFE is not defined. Is this a big deal?
2. Atomic is not copy-constructable by design. There are several people who do:
JSPrincipals gStaticPrincipal = { 1 };
to construct a principal that won't ever get released. Under C++11, that won't compile, since this invokes a copy-initialization of the atomic [2]. In my work-in-progress patch, I've moved to setting the value somewhere before it gets used for the first time (WorkerPrincipal is the hardest). An alternative idea is to provide a constructor:
JSPrincipals(uint32_t initRefcount = 0, uint32_t debugToken = 0).

[1] volatile? For cross-thread memory visibility? Evil! Bad! Broken! DIE!
[2] mozilla::Atomic declares the wrong deleted copy constructor, using Atomic(Atomic&) insead of Atomic(const Atomic&). Still causes the copy constructor to not exist.


Thoughts?
Blocks: 884068
No longer blocks: 884281
My initial thoughts are you should attach the patch.  :-)

We're moving to a world where JS_THREADSAFE is always on (well, the effect is that -- obviously we'll just remove the JS_THREADSAFE knob at that point), so there should be no worries about adding in atomic-ops ops to those builds.

The only hits I see for static JSPrincipals are in testXDR.cpp, which we should feel free to change as needed.  I think, if that's an issue, we should just kill off the possibility of having statically allocated JSPrincipals.

http://mxr.mozilla.org/mozilla-central/search?string=jsprincipal&find=&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

JSRuntime::interrupt is (if memory serves) one of those candidates for relaxed ordering.  Sure, volatile alone is dumb, but it also sort of worked.  :-)
Attached patch Prototype patch (obsolete) — Splinter Review
This shows the biggest issue with the JSPrincipals gunk.
Attachment #764884 - Flags: feedback?(jwalden+bmo)
Attached patch Prototype patch (obsolete) — Splinter Review
Forgot to qref
Attachment #764884 - Attachment is obsolete: true
Attachment #764884 - Flags: feedback?(jwalden+bmo)
Attachment #764886 - Flags: feedback?(jwalden+bmo)
Attached patch Replace all uses (obsolete) — Splinter Review
I don't know who a good reviewer for this patch is, so if you don't want to, redirect to someone else.
Attachment #764886 - Attachment is obsolete: true
Attachment #764886 - Flags: feedback?(jwalden+bmo)
Attachment #770919 - Flags: review?(jwalden+bmo)
Comment on attachment 770919 [details] [diff] [review]
Replace all uses

Review of attachment 770919 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/Principal.cpp
@@ +11,5 @@
>  
>  namespace {
>  
> +static Atomic<uint32_t> initialized(0);
> +JSPrincipals gPrincipal;

These both can/should be statics in GetWorkerPrincipal, I believe.

::: js/src/jsapi-tests/testChromeBuffer.cpp
@@ +3,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "jsapi-tests/tests.h"
>  
> +JSPrincipals system_principals;

Move this down next to the testChromeBuffer block, as a local.

::: js/src/jsapi-tests/testOriginPrincipals.cpp
@@ +13,5 @@
>      sOriginPrincipalsInErrorReporter = report->originPrincipals;
>  }
>  
> +JSPrincipals prin1;
> +JSPrincipals prin2;

Don't see an obvious immediate way to not statically initialize these, so okay, good enough.

::: js/src/jsapi-tests/testXDR.cpp
@@ +80,5 @@
>  BEGIN_TEST(testXDR_principals)
>  {
> +    for (size_t i = 0; i < sizeof(testPrincipals)/sizeof(testPrincipals[0]); ++i) {
> +        testPrincipals[i].refcount = 1;
> +    }

Make the array a local in this BEGIN_TEST(testXDR_principals) block, and use ArrayLength in mozilla/Util.h:

    const JSPrincipals testPrincipals[2];
    for (size_t i = 0; i < ArrayLength(testPrincipals); i++)
        testPrincipals[i].refcount = 1;

::: js/src/jsapi.h
@@ +3704,5 @@
>   * Security protocol.
>   */
>  struct JSPrincipals {
>      /* Don't call "destroy"; use reference counting macros below. */
> +    mozilla::Atomic<int> refcount;

Instead of int here, please use uint32_t so that the type is definitely one Atomic supports.  (It looks like some of the browser users expect this to be a uint32_t, so no uintptr_t for now.)

Please file a followup to encapsulate refcount so that *only* JSPrincipals can use it.  Right now, because this is public, anyone can touch it, and quite possibly does -- which means we can't (at least not without more auditing than I'd like to do here, or than you want to do to make progress here) make it use ReleaseAcquire ordering.

::: js/src/jscntxt.h
@@ +671,5 @@
>      /*
>       * If non-zero, we were been asked to call the operation callback as soon
>       * as possible.
>       */
> +    mozilla::Atomic<int32_t> interrupt;

Yeah, I'm not sure about memory ordering on this, looking slightly harder.  Let's stick to sequentially-consistent for now.

::: js/src/shell/js.cpp
@@ +5155,5 @@
>          *outFile = defaultOut;
>      }
>  }
>  
> +JSPrincipals shellTrustedPrincipals;

Make this a local inside main().

::: js/src/vm/ThreadPool.cpp
@@ +293,5 @@
>      if (!lazyStartWorkers(cx))
>          return false;
>  
>      // Find next worker in round-robin fashion.
> +    size_t id = ++nextId_ % numWorkers();

So, actually, it looks like ThreadPool::submitOne is dead code.  Nobody calls it that I can see.  Which means nextId_ is also dead code.  Could you get rid of the ThreadPool changes from the patch and kill off this field/method in another patch, either in this bug or another one?
Attachment #770919 - Flags: review?(jwalden+bmo) → feedback+
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #5)
> Please file a followup to encapsulate refcount so that *only* JSPrincipals
> can use it.  Right now, because this is public, anyone can touch it, and
> quite possibly does -- which means we can't (at least not without more
> auditing than I'd like to do here, or than you want to do to make progress
> here) make it use ReleaseAcquire ordering.

Bug 892847.
Since I want to move JS_ATOMIC* altogether, this needs to come first.
Attachment #775234 - Flags: review?(jwalden+bmo)
Attached patch Replace all other uses (obsolete) — Splinter Review
Attachment #770919 - Attachment is obsolete: true
Attachment #775235 - Flags: review?(jwalden+bmo)
Comment on attachment 775234 [details] [diff] [review]
Delete ThreadPool::submitOne

Review of attachment 775234 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/ThreadPool.h
@@ +52,1 @@
>  // case, the job will be executed by all worker threads.  This does

First sentence should be

// The way to submit a job is using |submitAll()|, executing the job on all worker threads.

or so, because it still reads as if there were some other way to submit a job.
Attachment #775234 - Flags: review?(jwalden+bmo) → review+
Comment on attachment 775235 [details] [diff] [review]
Replace all other uses

Review of attachment 775235 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi-tests/testXDR.cpp
@@ +79,5 @@
>  {
> +    JSPrincipals testPrincipals[2];
> +    for (size_t i = 0; i < mozilla::ArrayLength(testPrincipals); ++i) {
> +        testPrincipals[i].refcount = 1;
> +    }

JS style doesn't brace single-liners like this.  Also, put a |using mozilla::ArrayLength;| at the top, underneath all the includes, and get rid of the qualification.

::: js/src/jsapi.cpp
@@ +4710,5 @@
>  
>  JS_PUBLIC_API(void)
>  JS_DropPrincipals(JSRuntime *rt, JSPrincipals *principals)
>  {
> +    int rc = --principals->refcount;

Make that uint32_t for consistency.
Attachment #775235 - Flags: review?(jwalden+bmo) → review+
I spent some time duplicating this work yesterday, because I didn't realize this bug existed.  FWIW:

- I did this for each relevant field:

+#ifdef JS_THREADSAFE
+    mozilla::Atomic<int32_t> interrupt;
+#else
+    int32_t interrupt;
+#endif

- Also, I introduced a TestJSPrincipals class that allowed all the static principals in the tests to stay static.

+// Just a wrapper around JSPrincipals that allows static construction.
+class TestJSPrincipals : public JSPrincipals
+{
+  public:
+    TestJSPrincipals(int rc = 0)
+      : JSPrincipals()
+    {
+        refcount = rc;
+    }
+};
Reland part 1 after verifying on try that the regressions were due to part 2:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b130bbc0f39
Whiteboard: [leave open]
Attachment #775234 - Flags: checkin+
To better find out why the original patch failed, I split out the changes for js::Runtime::interrupt from those for the principals. This appears to pass on try, so I'll try landing this as well.
Attachment #775235 - Attachment is obsolete: true
Attachment #790190 - Flags: review?(jwalden+bmo)
Comment on attachment 790190 [details] [diff] [review]
atomic-rt-interrupt

Review of attachment 790190 [details] [diff] [review]:
-----------------------------------------------------------------

Stealing review.  r=me.  Bonus points if you fix the indentation of the trailing '\' chars in CHECK_BRANCH in js/src/vm/Interpreter.cpp.

::: js/src/vm/Runtime.h
@@ +690,5 @@
>      /*
>       * If non-zero, we were been asked to call the operation callback as soon
>       * as possible.
>       */
> +    mozilla::Atomic<int32_t> interrupt;

I had this here:

+#ifdef JS_THREADSAFE
+    mozilla::Atomic<int32_t> interrupt;
+#else
+    int32_t interrupt;
+#endif
+    static_assert(sizeof(interrupt) == sizeof(int32_t),
+                  "|interrupt| must be treatable as int32_t for the JITS");

The static_assert may be overkill, but I think the #ifdef is reasonable.
Attachment #790190 - Flags: review?(jwalden+bmo) → review+
Here's my version of the remainder, FWIW.  I used |{ 1 }| to initialize the
JSPrincipals in dom/workers/Principal.cpp, but I'm not certain if that's
legitimate and portable.  If it is, it could be used to initialize all the
other cases too, removing the need for TestJSPrincipals.
Assignee: Pidgeot18 → n.nethercote
Assignee: n.nethercote → Pidgeot18
According to AWFY, changeset 66e1ed80ba05 has regressed octane-regexp with ±17%
If the AWFY builds are threadsafe, I can sort of understand how this might have caused regressions.  If they are not threadsafe (as is the default for the JS shell), I cannot understand.
AWFY does build/test a threadsafe shell (so, e.g., we can see the benefits of background compilation).

I also noticed that part 2 is blamed for a 7.7% regression in ss-unpack-code, a 5.5% regression in ss-tagcloud, and a 14.9% regression in ss-fasta leading to an overall 2% regression on SunSpider.  Although it's not in the patch hunks, we check rt->interrupt on basically every loop backedge in the VM (JS_CHECK_OPERATION_LIMIT) so any benchmark that spends time in the VM is regressed.

2% is non-trivial, so I think that means we need to back out part 2.
(In reply to Luke Wagner [:luke] from comment #23)
> AWFY does build/test a threadsafe shell (so, e.g., we can see the benefits
> of background compilation).
> 
> I also noticed that part 2 is blamed for a 7.7% regression in
> ss-unpack-code, a 5.5% regression in ss-tagcloud, and a 14.9% regression in
> ss-fasta leading to an overall 2% regression on SunSpider.  Although it's
> not in the patch hunks, we check rt->interrupt on basically every loop
> backedge in the VM (JS_CHECK_OPERATION_LIMIT) so any benchmark that spends
> time in the VM is regressed.
> 
> 2% is non-trivial, so I think that means we need to back out part 2.

There is one trick to reduce the regression: reduce the memory consistency of rt->interrupt. I'm generally hesitant to do this, because it makes reasoning about correctness a lot harder, but since we're in a perf hotspot here, there would be a win.

The two models that we have for use for us are release-acquire and relaxed. I'm not fluent enough in the JS engine to know what is expected, but release-acquire guarantees that every memory operation made by a thread that sets rt->interrupt is seen by a thread that reads that value from rt->interrupt; relaxed makes no guarantee about other memory operands. On x86, they should be equivalent, although our barrier intrinsics in non-<atomic> mode may be too strong.
(In reply to Joshua Cranmer [:jcranmer] from comment #24)
Relaxed is what, effectively, the JS engine was assuming up until the patch: we're not expecting to read anything from the interrupting thread; the interrupting thread is just setting a flag that says "when you see this flag, stop and call the operation callback".
Not a hard patch to write, but confirming that this fixes the regression before landing would be helpful.
Comment on attachment 790793 [details] [diff] [review]
Make rt->interrupt relaxed atomic

Jan says its a speedup; let's see what happens on awfy!
Attachment #790793 - Flags: review+
(In reply to Joshua Cranmer [:jcranmer] from comment #24)
> (In reply to Luke Wagner [:luke] from comment #23)
> > AWFY does build/test a threadsafe shell (so, e.g., we can see the benefits
> > of background compilation).
> > 
> > I also noticed that part 2 is blamed for a 7.7% regression in
> > ss-unpack-code, a 5.5% regression in ss-tagcloud, and a 14.9% regression in
> > ss-fasta leading to an overall 2% regression on SunSpider.  Although it's
> > not in the patch hunks, we check rt->interrupt on basically every loop
> > backedge in the VM (JS_CHECK_OPERATION_LIMIT) so any benchmark that spends
> > time in the VM is regressed.
> > 
> > 2% is non-trivial, so I think that means we need to back out part 2.
> 
> There is one trick to reduce the regression: reduce the memory consistency
> of rt->interrupt. I'm generally hesitant to do this, because it makes
> reasoning about correctness a lot harder, but since we're in a perf hotspot
> here, there would be a win.
> 
> The two models that we have for use for us are release-acquire and relaxed.
> I'm not fluent enough in the JS engine to know what is expected, but
> release-acquire guarantees that every memory operation made by a thread that
> sets rt->interrupt is seen by a thread that reads that value from
> rt->interrupt; relaxed makes no guarantee about other memory operands. On
> x86, they should be equivalent, although our barrier intrinsics in
> non-<atomic> mode may be too strong.

The barrier intrinsics for Linux/Android could use some x86-specific tuning if anybody was so inclined.  The barrier intrinsics for Windows should be Just Right.
Woohoo!  The awfy tests were a little hung up because of an unrelated bug, but it looks like the sunspider regression was fixed.  (There is also different speedup in a preceding cset, so it looks like we are now tied.)
But there was an 18% regression on Octane-pdfjs in the same test run :(
Changelog is http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9c90bda44992&tochange=f25d46b4f39f... is that caused by a different change?
Also, I forgot to mention, the octane-RegExp regression reported above was also fixed.

Nick: I don't see the regression you are referring to.  Is this on awfy?  32-bit OSX is smooth for that range (and 64-bit is temporarily broken for unrelated reasons).
I see the pdfjs regression at
http://www.arewefastyet.com/#machine=11&view=breakdown&suite=octane for rev f25d46b4f39f, which is the same rev that I see octane-regexp un-regress.  That's 32-bit OS X (Mac Pro).

But 
http://www.arewefastyet.com/#machine=12&view=breakdown&suite=octane (64-bit OS X, Mac Pro) sees pdfjs regress and then immediately unregress... and the revs being measured appear to be different.  The 32-bit and 64-bit minis look the same.  So maybe that was a short-lived, unrelated regression.
Ohhh, it's on the GGC builds, which I usually hide.
So it is.  Sorry for the noise -- I think we can say the Atomics change is looking good.
Here's a new version of the ref-count removal patch that merges the best parts
of jcranmer and my patches.  Still lots of test failures on try:
https://tbpl.mozilla.org/?tree=Try&rev=80ac271cb7fb.
Assignee: Pidgeot18 → n.nethercote
This looks like it passes try: <https://tbpl.mozilla.org/?tree=Try&rev=17510d7dbefe> (some other patches dealing with prbit.h are also applied in that run).
Assignee: n.nethercote → Pidgeot18
Attachment #792660 - Attachment is obsolete: true
Attachment #828355 - Flags: review?(jwalden+bmo)
Comment on attachment 828355 [details] [diff] [review]
Convert JSPrincipals to mozilla::Atomic

Review of attachment 828355 [details] [diff] [review]:
-----------------------------------------------------------------

Nice.  Sorry for the reviewing delay.  :-(

::: js/src/jsapi-tests/testChromeBuffer.cpp
@@ +5,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "jsapi-tests/tests.h"
>  
> +TestJSPrincipals system_principals(1);

Should this be static?
Attachment #828355 - Flags: review?(jwalden+bmo) → review+
https://hg.mozilla.org/mozilla-central/rev/c4883720cfb3
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.