Closed Bug 979293 Opened 6 years ago Closed 5 years ago

TSan: Off-thread parsing races the main thread accessing the atom table (data race js/HashTable.h:715 isFree)

Categories

(Core :: JavaScript Engine, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: decoder, Assigned: njn)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(3 files, 2 obsolete files)

The attached logfile shows a thread/data race (mozilla-central revision 626d99c084cb) detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause inacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Component: General → JavaScript Engine
Summary: TSan: data race objdir-ff-tsan64opt/js/src/../../dist/include/js/HashTable.h:715 isFree → TSan: data race js/HashTable.h:715 isFree
Method names in the stack suggest off-thread parsing makes read-only accesses to the main thread's atom table, so in some loose theory that might be sufficient to be "safe" here.  If our hash table is thread-safe in the reading sense.  Which it might be, if it used thread-safe operations to update/access hash table state.  But it doesn't, so this is kind of asking for trouble, from appearances here.
Summary: TSan: data race js/HashTable.h:715 isFree → TSan: Off-thread parsing races the main thread accessing the atom table (data race js/HashTable.h:715 isFree)
This race is so common that it prevents mochitests from starting up under TSan due to hundreds of thousands of lines of output.  I appreciate that the function was named differently and used as such, but wouldn't it be nice if it did what was advertised on the tin?
js::HashTable lookup sounds like a read-only operation but it can modify the collision bit in |keyHash|, which is what TSAN is complaining about.

It feels like we want to have a way to freeze a js::HashTable and then it wouldn't do stuff with collision bits and there'd be extra checks to make sure you don't add or anything. We could even introduce a new type (js::FrozenHashTable) and provide a way to transfer the entryStorage from a js::HashTable to the js::FrozenHashTable.
(In reply to Nicholas Nethercote [:njn] from comment #3)
> js::HashTable lookup sounds like a read-only operation but it can modify the
> collision bit in |keyHash|, which is what TSAN is complaining about.
> 
> It feels like we want to have a way to freeze a js::HashTable and then it
> wouldn't do stuff with collision bits and there'd be extra checks to make
> sure you don't add or anything. We could even introduce a new type
> (js::FrozenHashTable) and provide a way to transfer the entryStorage from a
> js::HashTable to the js::FrozenHashTable.

I think TSan would still complain about this; at least in local testing, the stacks I get look something like:

  Read of size 4 at 0x7db000021350 by thread T41:
    #0 isFree /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:722 (libxul.so+0x0000049db986)
    #1 lookup /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:1264 (libxul.so+0x0000049db949)
    #2 readonlyThreadsafeLookup /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:1554 (libxul.so+0x0000049d77c2)
    #3 DefineProperty /home/froydnj/src/gecko-dev.git/js/src/jsapi.cpp:2597 (libxul.so+0x000004e69b8e)
    ...more frames
  Previous write of size 4 at 0x7db000021350 by main thread:
    #0 setCollision /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:729 (libxul.so+0x0000049db9be)
    #1 lookup /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:1257 (libxul.so+0x0000049db919)
    #2 readonlyThreadsafeLookup /opt/build/froydnj/build-tsan/js/src/../../dist/include/js/HashTable.h:1554 (libxul.so+0x0000049daab3)
    #3 getTokenInternal /home/froydnj/src/gecko-dev.git/js/src/frontend/TokenStream.cpp:1245 (libxul.so+0x000004ad732b)
    ...more frames

And TSan would still see that write (assuming we Move()'d into the FrozenHashTable) or the write from a copy (if we didn't Move(), which would be odd, IMHO) as coming before the read.  But having FrozenHashTable would at least make a place where it'd be a lot more convenient to MOZ_TSAN_BLACKLIST things, since it would be easier to explain what we're doing and we wouldn't be silently blacklisting away potential future races with hashtables.
Hmm, yes. Helgrind has annotations that I think can help with this sort of thing, esp. 
ANNOTATE_HAPPENS_BEFORE and ANNOTATE_HAPPENS_AFTER (http://www.valgrind.org/docs/manual/hg-manual.html#hg-manual.client-requests, but /usr/include/valgrind/helgrind.h has better documentation).

It looks like TSAN might support the same annotations, see https://code.google.com/p/data-race-test/wiki/DynamicAnnotations.

Also, the benefit of having js::FrozenHashTable is that we get to use the type system to enforce immutability.
(In reply to Nicholas Nethercote [:njn] from comment #5)
> 
> It looks like TSAN might support the same annotations, see
> https://code.google.com/p/data-race-test/wiki/DynamicAnnotations

The page you are referring to is about TSan v1 (the valgrind based one), which is deprecated. We are using TSan v2 which probably doesn't have such annotations by design (as far as I can tell).
It looks like TSan v2 does have something like ANNOTATE_HAPPENS_BEFORE, since they have tests for it:

http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/test/tsan/annotate_happens_before.cc?view=markup

(Thanks to Christian for pointing this out.)

So maybe we could use something like that.
QA Contact: n.nethercote
One silly thing is that even when HashTable doesn't need to set a collision bit -- i.e.
when doing a lookup that doesn't precede an add -- it does a `|=` with a zero
RHS, i.e. a no-op write. This patch avoids that, and while it doesn't
solve the broader problem I think it's a necessary part of any solution.

froydnj, can you try it and see if it makes TSan complain less? Thank you.
Attachment #8568987 - Flags: feedback?(nfroyd)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment on attachment 8568987 [details] [diff] [review]
Don't write collision bits unnecessarily

With this patch applied, I get zero warnings about hashtable collision races.
Attachment #8568987 - Flags: feedback?(nfroyd) → feedback+
QA Contact: n.nethercote
Comment on attachment 8568987 [details] [diff] [review]
Don't write collision bits unnecessarily

Review of attachment 8568987 [details] [diff] [review]:
-----------------------------------------------------------------

> With this patch applied, I get zero warnings about hashtable collision races.

Great! I was surprised to hear this because we still have the situation where the table is set up (i.e. written to a bunch of times) on one thread and then read from a bunch of times from multiple other threads. My best guess is that those other threads are created after the setup occurs, which would be enough to satisfy TSan.

I will still take a look at using some kind of separate "Frozen" type to make this code clearer and the guarantees stronger, but in the meantime I might as well get review on this part.
Luke, to summarize things...

- JSRuntime::permanentAtoms gets created and filled in at JSRuntime startup, by
  initializeAtoms().
  
- After that, it's read-only, so we let multiple threads read it without
  locking via threadsafeReadonlyLookup().

- *Except* that when you do a no-add-lookup in HashTable it actually does a
  `|= 0` on the keyHash, so it's not truly read-only. TSan complains frequently
  about this.

It's a genuine data race. Whether it would have negative effects in practice is
hard to say; the code is frequently overwriting keyHashes with the same value.
But given the unpredictability of the effects of data races in C++ (see
https://blog.mozilla.org/nnethercote/2015/02/24/fix-your-damned-data-races/)
and the simplicity of the fix, it's worth fixing.

BTW, I checked pldhash.cpp and it doesn't suffer from the same problem, because
SearchTable() only modifies the keyHash field if the |Reason| template
parameter is |ForAdd|. (Nb: pldhash.cpp doesn't have an equivalent to
threadsafeReadonlyLookup(), but if it did, it would be ok.)
Attachment #8569355 - Flags: review?(luke)
Attachment #8568987 - Attachment is obsolete: true
Jan, given that you added threadsafeReadonlyLookup() in bug 844983, you should read through this bug and understand it, as penance :)
Flags: needinfo?(jdemooij)
Comment on attachment 8569355 [details] [diff] [review]
Don't write collision bits in HashTable unnecessarily

Review of attachment 8569355 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.  With inlining (which I think we usually have with this function) this should be free.
Attachment #8569355 - Flags: review?(luke) → review+
This clarifies the two phases -- (a) initialization and (b) read-only use
-- that |permanentAtoms| goes through. It also gives some type-based protection
against potential misuse.
Attachment #8569465 - Flags: review?(bhackett1024)
I tweaked a comment on mccr8's suggestion.
Attachment #8569475 - Flags: review?(bhackett1024)
Attachment #8569465 - Attachment is obsolete: true
Attachment #8569465 - Flags: review?(bhackett1024)
Attachment #8569475 - Flags: review?(bhackett1024) → review+
(In reply to Nicholas Nethercote [:njn] (on vacation until April 30th) from comment #12)
> Jan, given that you added threadsafeReadonlyLookup() in bug 844983, you
> should read through this bug and understand it, as penance :)

Done. Thanks for fixing!
Flags: needinfo?(jdemooij)
You need to log in before you can comment on or make changes to this bug.