Make Atoms threadsafe

RESOLVED FIXED in Firefox 49

Status

()

Core
XPCOM
RESOLVED FIXED
2 years ago
9 months ago

People

(Reporter: Bobby Holley (parental leave - send mail for anything urgent), Assigned: Bobby Holley (parental leave - send mail for anything urgent))

Tracking

(Blocks: 3 bugs)

unspecified
mozilla49
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox49 fixed)

Details

Attachments

(6 attachments, 3 obsolete attachments)

1.68 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.88 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.96 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
1.76 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.65 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
479 bytes, text/html
Details
This was removed in bug 387445, on the grounds that the atom table isn't threadsafe, and so atoms shouldn't be refcounted off the main thread either.

For stylo, we may be able to get away with doing any actual atomization off-main-thread, but we definitely need to refcount the atoms. This is a relatively small and uninvasive change.

Unittest run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b369c7269884
Talos run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b62e64a76ee&selectedJob=21455153 [1]

I consulted with jmaher about the Talos results and he confirmed that it looked fine.

[1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=5511d54a3f172c1d68f98cc55dce4de1d0ba1b51&newProject=try&newRevision=3b62e64a76ee&framework=1&showOnlyImportant=0
Created attachment 8756603 [details] [diff] [review]
Make atoms refcountable off-main-thread. v1
Attachment #8756603 - Flags: review?(nfroyd)
Blocks: 1275766
How are you planning on handling the last release of an atom happening off the main thread?
(In reply to Jonas Sicking (:sicking) from comment #2)
> How are you planning on handling the last release of an atom happening off
> the main thread?

See the patch.
Though now I realize there's a race in there if somebody else grabs the entry in the mean time. Let me see if I can fix that.

Comment 5

2 years ago
Any measurements whether this affects to microbenchmarks?
(I just optimized out some atom getting because it was too slow, but sure, that was less about atom addref/release and more about hashtable usage.)
(In reply to Olli Pettay [:smaug] from comment #5)
> Any measurements whether this affects to microbenchmarks?

Not really.

> (I just optimized out some atom getting because it was too slow, but sure,
> that was less about atom addref/release and more about hashtable usage.)

Yeah, I'm pretty sure the hash table lookup would dominate. And realistically, of the only impact of this was a microbenchmark, we'd probably still want to land it.
Attachment #8756603 - Flags: review?(nfroyd)

Comment 7

2 years ago
Sure, but we do optimize rather heavily even for microbenchmarks in some cases.
Bug 1273481 as an example.

Comment 8

2 years ago
In other words, would be just good to know how much this affects in some microbenchmark.
Not objecting the change or anything.

Comment 9

2 years ago
A silly test which addrefs/releases atoms quite a bit
data:text/html,<script>window.mo = new MutationObserver(function(){}); function test() { for (var i = 0; i < 10000; ++i) { mo.observe(document, { attributes: true, attributeFilter: ["a", "b", "c", "d", "e", "f", "g", "h"]}); } console.log(new Date()); }; for (var i = 0; i < 100; ++i) { setTimeout(test, i * 50); }</script>
(that console.log() is just random debug output to see that something is happening.)
Discussed this a bit with sicking. We decided that it's probably worth just making atoms threadsafe in general, since it would make it easier to write code in Workers.

I have some patches which I'm running through try. They make the microbenchmark in comment 9 about 5% slower on average, which is almost certainly acceptable.
Summary: Give Atoms threadsafe refcounting → Make Atoms threadsafe
See Also: → bug 387445
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b369c7269884
https://treeherder.mozilla.org/#/jobs?repo=try&revision=19e3dff4870e&selectedJob=21481765
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=5511d54a3f17&newProject=try&newRevision=19e3dff4870e&framework=1
Created attachment 8756711 [details] [diff] [review]
Part 1 - Remove MOZ_DUMP_ATOM_LEAKS. v1

I don't think anyone is using this anymore. It would be good to assert that there
are no leaks, but that doesn't pass for me in a local build, and I don't have time
to chase it.
Attachment #8756711 - Flags: review?(nfroyd)
Created attachment 8756712 [details] [diff] [review]
Part 2 - Use an explicit init routine for the atom table. v1
Attachment #8756712 - Flags: review?(nfroyd)
Created attachment 8756713 [details] [diff] [review]
Part 3 - Protect gAtomTable with a lock. v1
Attachment #8756713 - Flags: review?(nfroyd)
Created attachment 8756714 [details] [diff] [review]
Part 4 - Use threadsafe refcounting for atoms and remove main-thread restrictions. v1
Attachment #8756714 - Flags: review?(nfroyd)
Attachment #8756603 - Attachment is obsolete: true
Comment on attachment 8756714 [details] [diff] [review]
Part 4 - Use threadsafe refcounting for atoms and remove main-thread restrictions. v1

Review of attachment 8756714 [details] [diff] [review]:
-----------------------------------------------------------------

Ugh, there's still a race here, between the threadsafe refcount decrement and the acquisition of the lock to destroy. I'll think about this some more.
Attachment #8756714 - Flags: review?(nfroyd)
Comment on attachment 8756711 [details] [diff] [review]
Part 1 - Remove MOZ_DUMP_ATOM_LEAKS. v1

Review of attachment 8756711 [details] [diff] [review]:
-----------------------------------------------------------------

Works for me.
Attachment #8756711 - Flags: review?(nfroyd) → review+
Attachment #8756712 - Flags: review?(nfroyd) → review+
Attachment #8756713 - Flags: review?(nfroyd) → review+
As for the last patch - I talked about it with bz and smaug on IRC, and the current plan is to try a GC mechanism whereby AddRef/Release just twiddle refcounts, and we periodically call into the (locked) atom table to evict entries whose refcount is zero.

We can probably keep a counter on the number of times we've Release()-ed and found refcount zero, and perform a GC when that counter gets high (and on debug-build shutdown). We can probably make it really high and GC pretty infrequently - waiting for 10000 possibly-dead atoms may be fine (with a smaller threshold in debug builds to make sure the code gets exercised).

I may not get to this before PTO though.
Ok, I managed to get this working this afternoon. Overhead on the microbenchmark is about 10-12% (presumably from the atomics usage), which is probably acceptable. I'll post patches.
Created attachment 8757097 [details] [diff] [review]
Part 4 - Use a GC scheme to free unused atoms. v1

Please review carefully.
Attachment #8757097 - Flags: review?(nfroyd)
Attachment #8756714 - Attachment is obsolete: true
Created attachment 8757098 [details] [diff] [review]
Part 5 - Remove main-thread restrictions on atoms. v1
Attachment #8757098 - Flags: review?(nfroyd)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fff250b1689d
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=fff250b1689d
Looks like we also run leak checking in ASAN builds, so need to GC there on shutdown as well.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d8b71f5a8ef
Created attachment 8757288 [details] [diff] [review]
Part 4 - Use a GC scheme to free unused atoms. v2
Attachment #8757288 - Flags: review?(nfroyd)
Attachment #8757097 - Attachment is obsolete: true
Attachment #8757097 - Flags: review?(nfroyd)
Please use NS_FREE_PERMANENT_DATA instead of defined(DEBUG) || defined(MOZ_ASAN). (Currently it is defined to be the same thing, but it conveys the intent better, and will ensure this keeps working in case we add some other testing that cares about leaks.)
(In reply to Andrew McCreight [:mccr8] from comment #27)
> Please use NS_FREE_PERMANENT_DATA instead of defined(DEBUG) ||
> defined(MOZ_ASAN). (Currently it is defined to be the same thing, but it
> conveys the intent better, and will ensure this keeps working in case we add
> some other testing that cares about leaks.)

Ok. And maybe I'll also set the opt threshold to something lower, since it will probably improve hashtable performance to a large number of dead entries. Maybe 500 or 1000? What do you think Nathan?
(In reply to Bobby Holley (PTO through June 13) from comment #28)
> And maybe I'll also set the opt threshold to something lower, since it
> will probably improve hashtable performance to a large number of dead
> entries. Maybe 500 or 1000? What do you think Nathan?

The first sentence is trying to say that a lower threshold would improve hashtable performance, because we wouldn't retain so many dead entries over time and we'd thus have a smaller (or at least a more lightly-loaded) hashtable, correct?  That sounds plausible.
(In reply to Nathan Froyd [:froydnj] from comment #29)
> (In reply to Bobby Holley (PTO through June 13) from comment #28)
> > And maybe I'll also set the opt threshold to something lower, since it
> > will probably improve hashtable performance to a large number of dead
> > entries. Maybe 500 or 1000? What do you think Nathan?
> 
> The first sentence is trying to say that a lower threshold would improve
> hashtable performance, because we wouldn't retain so many dead entries over
> time and we'd thus have a smaller (or at least a more lightly-loaded)
> hashtable, correct?  That sounds plausible.

Correct.
Comment on attachment 8757288 [details] [diff] [review]
Part 4 - Use a GC scheme to free unused atoms. v2

Review of attachment 8757288 [details] [diff] [review]:
-----------------------------------------------------------------

A couple comments, but this looks OK.  I think the microbenchmark slowdown is just because the refcount handling in our mutation observer implementation leaves something to be desired; I think we could make that microbenchmark faster.

::: xpcom/ds/nsAtomTable.cpp
@@ +64,5 @@
>  };
>  
>  //----------------------------------------------------------------------
>  
> +static Atomic<uint32_t> gUnusedAtomCount(0);

I think we can get away with making this Atomic<uint32_t, ReleaseAcquire>, which would make things a little cheaper...at least on x86oids (no memory barrier required on store, and I'm pretty sure it requires fewer barriers on ARM, too).

@@ +71,5 @@
>  {
>  public:
>    DynamicAtom(const nsAString& aString, uint32_t aHash)
>    {
> +    ++gUnusedAtomCount;

I think this deserves a comment...because I'm not completely sure what this is here for.  I think it's there because this can now happen with our GC'ing scheme:

1. Atom A drops to zero refcount; gUnusedAtomCount is incremented.
2. A GC is *not* triggered.
3. Somebody comes along and re-atomizes a string corresponding to A.
4. We find A still in our hashtable, even with a zero refcount.
5. We AddRef A.
6. To be consistent, we now have to decrement gUnusedAtomCount.

And so:

7. To balance out the case where we're creating an atom, we need this increment here.

Is that correct?  If so, OK, that makes sense.  But the increment here just to decrement in AddRef for the initial construction seems like needless work.  How about we do this:

1. Make this constructor private.
2. Change the constructor so that it starts like:

  DynamicAtom(const nsAString&, uint32_t) : mRefCnt(1)

3. Add a public static Create function:

  already_AddRefed<DynamicAtom>
  DynamicAtom::Create(const nsAString& aString, uint32_t aHash)
  {
    // The refcount is appropriately initialized in the constructor.
    return dont_AddRef(new DynamicAtom(aString, aHash));
  }

4. Replace |new DynamicAtom| calls with calls to Create().
5. Put a comment describing the above scenario in AddRef, so that the comment is close to the code that needs describing.

This is admittedly gnarly microoptimization, since atoms aren't created all that often (a non-rigorous stress test of popular websites only created about 4K atoms).

@@ +393,5 @@
>  
> +void
> +DynamicAtom::GCAtomTable()
> +{
> +  gAtomTableLock->AssertCurrentThreadOwns();

Let's move the lock acquisition into this function.  Or make the function take a |const MutexAutoLock&| parameter.

@@ +402,5 @@
> +      if (atom->mRefCnt == 0) {
> +        i.Remove();
> +        delete atom;
> +        MOZ_ASSERT(gUnusedAtomCount > 0);
> +        --gUnusedAtomCount;

In the interest of micro-optimization--as operations on gUnusedAtomCount are relatively expensive--why don't we do something like:

uint32_t removedCount = 0;

for (...) {
  ...
  if (atom->mRefCnt == 0) {
    ...
    removedCount++;
  }
}

MOZ_ASSERT(removedCount <= gUnusedAtomCount);
gUnusedAtomCount -= removedCount;
Attachment #8757288 - Flags: review?(nfroyd) → review+
Attachment #8757098 - Flags: review?(nfroyd) → review+
the mutationobserver case is just something where I knew we are using nsIAtoms so that it gets exposed to the web.

(it is just that some/most real world benchmarks contains that kind of totally silly microbenchmarks. 
bug 1273481 being an example of such: the test is testing the performance of a no-op method call.)
setAttribute performance might be more important than MutationObserver.observe.
That is a case which we have optimized heavily.
Created attachment 8757475 [details]
set/removeAttribute microbenchmark

Comment 35

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bab999902fc
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ec4d49ae88d
https://hg.mozilla.org/integration/mozilla-inbound/rev/10ce0f01ee4a
https://hg.mozilla.org/integration/mozilla-inbound/rev/bec37fdc70f9
https://hg.mozilla.org/integration/mozilla-inbound/rev/de635a6b22cf
(In reply to Nathan Froyd [:froydnj] from comment #31)
> > +static Atomic<uint32_t> gUnusedAtomCount(0);
> 
> I think we can get away with making this Atomic<uint32_t, ReleaseAcquire>,
> which would make things a little cheaper...at least on x86oids (no memory
> barrier required on store, and I'm pretty sure it requires fewer barriers on
> ARM, too).

One other thought here is that this patch is using ThreadSafeAutoRefCnt for DynamicAtom's refcount, which uses sequential consistency not because refcounting really needs it, but because it could break pre-existing code that was depending on the memory barriers for other things.  Perhaps DynamicAtom's reference count could be ReleaseAcquire rather than SequentiallyConsistent.  Or, even more efficient, no memory barrier for AddRef, Release(ordering) semantics for Release(refcount)-to-nonzero, and Acquire semantics for Release(refcount)-to-zero, which IIRC is what's needed for threadsafe reference counting.

Or, better -- Atoms are special, since they're immutable once they're created, and for immutable objects, I actually don't see why reference counting can't just use Relaxed semantics (since the reason for the need for release and acquire semantics, is, as I understand it, the need for the object's destruction on a given thread to take into account all writes to the object and things it references on other threads before the object's last Release(refcounting) on those threads).  Somebody should definitely check this idea, though.

And, additionally, I don't see why gUnusedAtomCount can't use Relaxed semantics -- even without any of those immutability assumptions.  It seems like the key thing that's needed there is that there's a memory barrier from unlocking/locking the AtomTableLock, and when the atom table hands out a new atom, it does the initial AddRef of that atom on the thread on which it's handing it out while holding the atom table lock (which NS_Atomize does), which in turn assures that DynamicAtom::GCAtomTable (which holds the atom table lock) sees that reflected in the reference count.

Not sure if this is worth filing followups for or if it's micro-optimization that won't actually make a difference.

Comment 37

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/5bab999902fc
https://hg.mozilla.org/mozilla-central/rev/2ec4d49ae88d
https://hg.mozilla.org/mozilla-central/rev/10ce0f01ee4a
https://hg.mozilla.org/mozilla-central/rev/bec37fdc70f9
https://hg.mozilla.org/mozilla-central/rev/de635a6b22cf
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox49: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
Blocks: 1277711
Blocks: 1277713
I turned comment 36 into bug 1277709, bug 1277711, and bug 1277713.
If I understand this bug correctly, I think we can now remove the static atoms table. It only existed to allow some atoms (used by the HTML parser) to be looked up off the main thread, but now we can do that for any atom. See bug 529808 comment 2 and onwards.
(In reply to Nicholas Nethercote [:njn] from comment #39)
> If I understand this bug correctly, I think we can now remove the static
> atoms table. It only existed to allow some atoms (used by the HTML parser)
> to be looked up off the main thread, but now we can do that for any atom.
> See bug 529808 comment 2 and onwards.

I think it will also help with bug 1269490.
Blocks: 1269490

Updated

9 months ago
Depends on: 1351303
You need to log in before you can comment on or make changes to this bug.