Closed
Bug 1121669
Opened 9 years ago
Closed 7 years ago
TSan: data race parser/html/nsHtml5StreamParser.cpp:85 get
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
Tracking | Status | |
---|---|---|
firefox54 | --- | fixed |
People
(Reporter: froydnj, Assigned: hsivonen)
References
(Blocks 1 open bug)
Details
(Whiteboard: [tsan])
Attachments
(2 files)
[cribbing from decoder's script, hopefully he won't mind] The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. I am not really sure what's going on here...I think this might be due to nsHtml5TimerKungFu and the weirdness surrounding it. CC'ing mccr8 since interesting things are happening during cycle collection. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Assignee | ||
Comment 1•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #0) > I am not really sure what's going on here. I'm not sure, either. From the line numbers, it looks a lot like it's complaining about mFlushTimer, but that's supposed to be written to only on the main thread, even though it's *read* on the parser thread.
Assignee | ||
Comment 2•9 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #1) > (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #0) > > I am not really sure what's going on here. > > I'm not sure, either. From the line numbers, it looks a lot like it's > complaining about mFlushTimer, but that's supposed to be written to only on > the main thread, even though it's *read* on the parser thread. That's not actually true. mFlushTimer is written into on the parser thread at https://dxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#1592 So, indeed, the main thread is null-checking mFlushTimer without a lock and, if it is not null, posting a runnable to the parser thread so that the runnable ends up nulling out mFlushTimer on the parser thread. A data race is detected, because in the case where the mFlushTimer is read on the main thread in the null check and turns out to be null, it has already been set to null on the parser thread, because a normal end of the parse, too, (not just CC) ends up posting the runnable that sets mFlushTimer to null. When I wrote this, the reasoning I had for why this is OK is as follows: mFlushTimer is initialized with a non-null value in the constructor of nsHtml5StreamParser. Thereafter, it is only written to when making it null. This is a one-way operation: if mFlushTimer has become null, it never becomes non-null again. Therefore, the null check read on the main thread sees mFlushTimer as already having been set to null already or as not having yet been set to null yet. If the main thread sees mFlushTimer as null, the view of the memory must be coherent enough to decide not to try to make it null again. If the main thread sees it as non-null, either the main thread sees memory in an incoherent state or there truly has not yet been an attempt on the parser thread to set mFlushTimer to null. Even if the main thread sees memory in an incoherent state, trying to post the runnable that sets mFlushTimer to null a second time is OK, because the runnable itself re-checks mFlushTimer for null on the parser thread. As for the nsHtml5StreamParser itself, it can't get deleted before the runnable runs, because the runnable does a kung fu death grip on the stream parser, so if we end up posting a second nsHtml5TimerKungFu, it means that the runnable traveling in the other direction from the nsHtml5RefPtr on the previous nsHtml5TimerKungFu hasn't run yet and deleted the nsHtml5StreamParser. So I'm pretty convinced that the detected race condition is actually benign. I'm not familiar with the annotations involved here to know how to annotate it as benign, though. (As noted, all this complexity is due to having to call Cancel() on the timer on the parser thread.)
Comment 3•9 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #2) > So I'm pretty convinced that the detected race condition is actually > benign. [..] It's an impressive bit of reasoning. I am not in a position to assess it directly. But I would like to ask: are you assuming that writes done by one CPU core become visible to another core (that is, thread) in the same order? In other words, is the logic correct even in the presence of reordering of stores between cores?
Assignee | ||
Comment 4•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f2d2e3df857
Assignee | ||
Comment 5•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #3) > But I would like to ask: are you assuming that writes > done by one CPU core become visible to another core (that is, thread) > in the same order? Yes. > In other words, is the logic correct even in the > presence of reordering of stores between cores? I don't know. Is adding a mutex around mFlushTimer an appropriate fix or is there something better that works when the pointer to which release/acquire memory orderings would need to be tied to is wrapped inside an nsCOMPtr?
Comment hidden (mozreview-request) |
Comment 7•7 years ago
|
||
I don't have enough information to review this properly, alas. Which fields of nsHtml5StreamParser is mFlushTimerMutex intended to protect? In particular, is it supposed to protect mFlushTimerArmed?
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #7) > I don't have enough information to review this properly, alas. > Which fields of nsHtml5StreamParser is mFlushTimerMutex intended > to protect? mFlushTimer only. > In particular, is it supposed to protect mFlushTimerArmed? No, it's not supposed to protect it, since mFlushTimerArmed is accessed from the parser thread only (after the creation of the object it's on). I've updated the patch with a comment to this effect and with some additional scopes to make sure that mFlushTimerMutex isn't held unnecessarily long.
Updated•7 years ago
|
Attachment #8829100 -
Flags: review?(jseward) → review+
Comment 10•7 years ago
|
||
mozreview-review |
Comment on attachment 8829100 [details] Bug 1121669 - Add a mutex around mFlushTimer to deal with write appearing to other threads in an inconsistent order. . https://reviewboard.mozilla.org/r/106294/#review108542
Assignee | ||
Comment 12•7 years ago
|
||
Looks like autoland failed. I'll try to land this manually on Feb 14.
Assignee | ||
Comment 13•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcda57aa0c3b291dcd5de5d7fd048af69dbdff4d Bug 1121669 - Add a mutex around mFlushTimer to deal with write appearing to other threads in an inconsistent order. r=jseward.
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(hsivonen)
Comment 14•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bcda57aa0c3b
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Updated•7 years ago
|
Assignee: nobody → hsivonen
You need to log in
before you can comment on or make changes to this bug.
Description
•