Closed Bug 1136762 Opened 5 years ago Closed 4 years ago

TSan: data race xpcom/io/nsPipe3.cpp:1061 CloseWithStatus

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox37 --- unaffected
firefox38 --- affected
firefox39 --- affected
firefox47 --- fixed

People

(Reporter: froydnj, Assigned: jseward)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files, 1 obsolete file)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

It looks like we're trying to close the input stream of the pipe from two different threads?  And the second close is racing with the first insofar as it's checking a non-synchronized value.

I would be really curious to know why we're sharing the same stream here between two different threads, especially since the main thread use looks like it's coming from JS...maybe this is the HTTP server used for mochitests.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Attached file pipe-race.txt
You can use all the Bugzilla template helpers you want, but it would be *super* useful if the template yelled at you when you forgot to fill out a field.
I won't be able to look at this until next week.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
This is most likely happening due to:

1) Thread 1 is STS writing to nsPipeOutputStream with NS_AsyncCopy() using the auto-close feature.
2) Thread 2 is reading from an nsPipeInputStream and is closing because it hit an exception at the same time.  Probably because the pipe was closed from the writing side.

In this case its the nsPipeInputStream::mInputStatus field thats being accessed in two places.  It might be as easy as marking this Atomic<nsresult>, but I need to look at the race more closely.
Hmm, no.  I need to lock the monitor anywhere I touch the pipe mStatus or input stream's mInputStatus.  I should add a lot of locking assertions around these methods as well.
Ben, is there any possibility to move this along?  Once bug 1141565 lands,
this will be the #1 reported race when running the xpcshell test suite:

 123 xpcom/io/nsPipe3.cpp:1205:7 in CloseWithStatus <--------------
 116 nsprpub/pr/src/io/prlog.c:358:20 in PR_NewLogModule
 116 netwerk/base/nsIOService.cpp:1448:19 in nsIOService::Observe(nsISupp
  43 modules/libpref/Preferences.cpp:1769:38 in mozilla::UintVarChanged(c
  38 ipc/glue/MessagePump.cpp:142:7 in mozilla::ipc::MessagePump::Schedul
  11 js/src/jsscript.cpp:2529:58 in MarkScriptData
   9 tools/profiler/core/platform-linux.cc:250:46 in (anonymous namespace
   9 tools/profiler/core/platform-linux.cc:249:46 in (anonymous namespace
   9 tools/profiler/core/platform.h:350:34 in IsActive
Easy STR: do a TSan-enabled build.  Then:

TSAN_OPTIONS=suppressions=/home/sewardj/MOZ/SUPPS/tsan-empty.txt \
  DISPLAY=:1.0 ./mach xpcshell-test --sequential --log-mach - \
  browser/experiments/test/xpcshell/test_activate.js
Nathan, can you take this?  I'm swamped with service worker stuff still.
Flags: needinfo?(nfroyd)
I will take a look at this next week.
Flags: needinfo?(nfroyd)
Attached patch bug1136762-1.cset (obsolete) — Splinter Review
I audited all uses of mStatus and mInputStatus and found three places
where the monitor should have been held, but wasn't.  I also found a
race on mOriginalInput in nsPipe::Release.  The attached patch fixes
all of these.
Attachment #8715253 - Flags: feedback?(bkelly)
Comment on attachment 8715253 [details] [diff] [review]
bug1136762-1.cset

Review of attachment 8715253 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable.  Thanks!
Attachment #8715253 - Flags: feedback?(bkelly) → feedback+
Attachment #8715253 - Flags: review?(nfroyd)
Assignee: bkelly → jseward
Comment on attachment 8715253 [details] [diff] [review]
bug1136762-1.cset

Review of attachment 8715253 [details] [diff] [review]:
-----------------------------------------------------------------

This needs a few changes first.

::: xpcom/io/nsPipe3.cpp
@@ +525,5 @@
>      return 0;
>    }
> +  // Avoid racing on |mOriginalInput| by only looking at it when
> +  // the refcount is 1, that is, we are the only pointer (hence only
> +  // thread) to access it.

Thank you for this comment.

@@ +1179,5 @@
>  {
>    LOG(("nsPipeInputStream::OnInputException [this=%x reason=%x]\n",
>         this, aReason));
>  
> +  ReentrantMonitorAutoEnter mon(mPipe->mReentrantMonitor);

Why is this necessary?  AFAICT, we only ever call this with the monitor held.  (And so we ought to change it so it takes a ReentrantMonitorAutoEnter&...)

@@ +1490,5 @@
>  
>  nsresult
>  nsPipeInputStream::Status() const
>  {
> +  ReentrantMonitorAutoEnter mon(mPipe->mReentrantMonitor);

Please make a separate Status(const ReentrantMonitorAutoEnter&) method, move the |return NS_FAILED(...)| bit into that method, and use the new method wherever possible.  (Almost every call to Status() can use the new method, AFAICT.)  There's no reason to require unnecessary monitor acquisition for something so trivial as status flags.
Attachment #8715253 - Flags: review?(nfroyd) → review-
Addresses all review comments in comment 11, and also ..

> @@ +1179,5 @@
> >  {
> >    LOG(("nsPipeInputStream::OnInputException [this=%x reason=%x]\n",
> >         this, aReason));
> >  
> > +  ReentrantMonitorAutoEnter mon(mPipe->mReentrantMonitor);
> 
> Why is this necessary?  AFAICT, we only ever call this with the monitor held.  > (And so we ought to change it so it takes a ReentrantMonitorAutoEnter&...)

.. it makes this method (OnInputException) take a 
 const ReentrantMonitorAutoEnter&...)
as "evidence", and also does the same for OnInputReadable.
Attachment #8715253 - Attachment is obsolete: true
Attachment #8718869 - Flags: feedback?(nfroyd)
Comment on attachment 8718869 [details] [diff] [review]
bug1136762-2.cset

Review of attachment 8718869 [details] [diff] [review]:
-----------------------------------------------------------------

WFM, thank you for making the broader fiddly changes.  Consider this an r+.
Attachment #8718869 - Flags: review+
Attachment #8718869 - Flags: feedback?(nfroyd)
Attachment #8718869 - Flags: feedback+
https://hg.mozilla.org/mozilla-central/rev/5271840b178c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
You need to log in before you can comment on or make changes to this bug.