Closed Bug 1121669 Opened 5 years ago Closed 3 years ago
TSan: data race parser/html/ns
Html5Stream Parser .cpp:85 get
[cribbing from decoder's script, hopefully he won't mind] The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) . If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. I am not really sure what's going on here...I think this might be due to nsHtml5TimerKungFu and the weirdness surrounding it. CC'ing mccr8 since interesting things are happening during cycle collection.  http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #0) > I am not really sure what's going on here. I'm not sure, either. From the line numbers, it looks a lot like it's complaining about mFlushTimer, but that's supposed to be written to only on the main thread, even though it's *read* on the parser thread.
(In reply to Henri Sivonen (:hsivonen) from comment #1) > (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #0) > > I am not really sure what's going on here. > > I'm not sure, either. From the line numbers, it looks a lot like it's > complaining about mFlushTimer, but that's supposed to be written to only on > the main thread, even though it's *read* on the parser thread. That's not actually true. mFlushTimer is written into on the parser thread at https://dxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#1592 So, indeed, the main thread is null-checking mFlushTimer without a lock and, if it is not null, posting a runnable to the parser thread so that the runnable ends up nulling out mFlushTimer on the parser thread. A data race is detected, because in the case where the mFlushTimer is read on the main thread in the null check and turns out to be null, it has already been set to null on the parser thread, because a normal end of the parse, too, (not just CC) ends up posting the runnable that sets mFlushTimer to null. When I wrote this, the reasoning I had for why this is OK is as follows: mFlushTimer is initialized with a non-null value in the constructor of nsHtml5StreamParser. Thereafter, it is only written to when making it null. This is a one-way operation: if mFlushTimer has become null, it never becomes non-null again. Therefore, the null check read on the main thread sees mFlushTimer as already having been set to null already or as not having yet been set to null yet. If the main thread sees mFlushTimer as null, the view of the memory must be coherent enough to decide not to try to make it null again. If the main thread sees it as non-null, either the main thread sees memory in an incoherent state or there truly has not yet been an attempt on the parser thread to set mFlushTimer to null. Even if the main thread sees memory in an incoherent state, trying to post the runnable that sets mFlushTimer to null a second time is OK, because the runnable itself re-checks mFlushTimer for null on the parser thread. As for the nsHtml5StreamParser itself, it can't get deleted before the runnable runs, because the runnable does a kung fu death grip on the stream parser, so if we end up posting a second nsHtml5TimerKungFu, it means that the runnable traveling in the other direction from the nsHtml5RefPtr on the previous nsHtml5TimerKungFu hasn't run yet and deleted the nsHtml5StreamParser. So I'm pretty convinced that the detected race condition is actually benign. I'm not familiar with the annotations involved here to know how to annotate it as benign, though. (As noted, all this complexity is due to having to call Cancel() on the timer on the parser thread.)
(In reply to Henri Sivonen (:hsivonen) from comment #2) > So I'm pretty convinced that the detected race condition is actually > benign. [..] It's an impressive bit of reasoning. I am not in a position to assess it directly. But I would like to ask: are you assuming that writes done by one CPU core become visible to another core (that is, thread) in the same order? In other words, is the logic correct even in the presence of reordering of stores between cores?
(In reply to Julian Seward [:jseward] from comment #3) > But I would like to ask: are you assuming that writes > done by one CPU core become visible to another core (that is, thread) > in the same order? Yes. > In other words, is the logic correct even in the > presence of reordering of stores between cores? I don't know. Is adding a mutex around mFlushTimer an appropriate fix or is there something better that works when the pointer to which release/acquire memory orderings would need to be tied to is wrapped inside an nsCOMPtr?
I don't have enough information to review this properly, alas. Which fields of nsHtml5StreamParser is mFlushTimerMutex intended to protect? In particular, is it supposed to protect mFlushTimerArmed?
(In reply to Julian Seward [:jseward] from comment #7) > I don't have enough information to review this properly, alas. > Which fields of nsHtml5StreamParser is mFlushTimerMutex intended > to protect? mFlushTimer only. > In particular, is it supposed to protect mFlushTimerArmed? No, it's not supposed to protect it, since mFlushTimerArmed is accessed from the parser thread only (after the creation of the object it's on). I've updated the patch with a comment to this effect and with some additional scopes to make sure that mFlushTimerMutex isn't held unnecessarily long.
Attachment #8829100 - Flags: review?(jseward) → review+
Comment on attachment 8829100 [details] Bug 1121669 - Add a mutex around mFlushTimer to deal with write appearing to other threads in an inconsistent order. . https://reviewboard.mozilla.org/r/106294/#review108542
Can this land now?
Looks like autoland failed. I'll try to land this manually on Feb 14.
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcda57aa0c3b291dcd5de5d7fd048af69dbdff4d Bug 1121669 - Add a mutex around mFlushTimer to deal with write appearing to other threads in an inconsistent order. r=jseward.
You need to log in before you can comment on or make changes to this bug.