Closed Bug 1121672 Opened 9 years ago Closed 8 years ago

TSan: data race netwerk/cache2/CacheStorageService.cpp:1391 AddStorageEntry

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: froydnj, Assigned: mayhemer)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files, 2 obsolete files)

Attached file netwerk-cache-open-race.txt (obsolete) —
[cribbing from decoder's script, hopefully he won't mind]

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Is this the CacheEntry::mFileStatus member again?
Flags: needinfo?(nfroyd)
(In reply to Honza Bambas (:mayhemer) from comment #1)
> Is this the CacheEntry::mFileStatus member again?

I'm not sure.  The read in the log is claimed to come from a line like:

  if (MOZ_UNLIKELY(entry->IsFileDoomed())) {

which checks CacheEntry::mFileStatus, so I guess we can assume IsFileDoomed was inlined?  The write, however, is claimed to come from:

  mCallback->OnFileOpened(mHandle, rv);

in OpenFileEvent::Run:

http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheFileIOManager.cpp#605

(and that's ignoring the assign_assuming_AddRef "frame" below it).  If I understand the code correctly, I don't think OnFileOpened could have been inlined here, so I'm not sure what's going on.
Flags: needinfo?(nfroyd)
Attached file TSan stack
Here's a better stack.  It looks like TSan is complaining that mHandle is being assigned in CacheFile::OnFileOpened, off the main thread, and then being read on the main thread via CacheFile::IsDoomed.  Mutexes are being held around the read and the write, but they're different mutexes, so TSan is unhappy.

Is there anything that can be done about this?
Attachment #8549147 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
I think there is already a very similar bug on this and it again leads me to the same conclusion.  We could have mHasHandle atomic bool flag set *after* mHandle is set (and so addrefed) and when accessing mHandle do a check on mHasHandle.
Flags: needinfo?(honzab.moz) → needinfo?(nfroyd)
Attached patch v1 (obsolete) — Splinter Review
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8644986 - Flags: review?(michal.novotny)
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #4)
> I think there is already a very similar bug on this and it again leads me to
> the same conclusion.  We could have mHasHandle atomic bool flag set *after*
> mHandle is set (and so addrefed) and when accessing mHandle do a check on
> mHasHandle.

I'd have to check, but I'm fairly sure this isn't going to stop TSan from complaining.  TSan does not--AFAIK--take into account how atomic operations on one memory location affect raciness of another.  (I think this is consistent with how the C++ memory model works as well, but I'd have to read the memory model much more closely than I'm capable of doing so currently.)

I'd much rather fix this in a way that's capable of being understood natively via TSan.
Flags: needinfo?(nfroyd)
Comment on attachment 8644986 [details] [diff] [review]
v1

Review of attachment 8644986 [details] [diff] [review]:
-----------------------------------------------------------------

Simply use CacheFile's lock to protect access to mHandle in CacheFile::IsDoomed(). mHandle is used under the lock at other places except in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the beginning of CacheFile::InitIndexEntry().
Attachment #8644986 - Flags: review?(michal.novotny) → review-
(In reply to Michal Novotny (:michal) from comment #7)
> Comment on attachment 8644986 [details] [diff] [review]
> v1
> 
> Review of attachment 8644986 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Simply use CacheFile's lock to protect access to mHandle in
> CacheFile::IsDoomed(). 

Absolutely not.  Remember the bug we were timing out a certain debug only test?  What you suggest would lead to cross entry of locks.

> mHandle is used under the lock at other places except
> in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab
> the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the
> beginning of CacheFile::InitIndexEntry().

This is too suboptimal.  Putting locks everywhere just blocks everything.

I have to think.
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #8)
> > Simply use CacheFile's lock to protect access to mHandle in
> > CacheFile::IsDoomed(). 
> 
> Absolutely not.  Remember the bug we were timing out a certain debug only
> test?  What you suggest would lead to cross entry of locks.

What bug are you talking about? CacheFile::IsDoomed() is called only from CacheEntry::IsFileDoomed() which is called from CacheStorageService::AddStorageEntry() and CacheStorageService::CacheFileDoomed(). Both grab CacheStorageService's lock so the order would be CacheStorageService::mLock -> CacheFile::mLock. We normally acquire locks in following orders:

  CacheStorageService::mLock -> CacheEntry::mLock
  CacheEntry::mLock -> CacheFile::mLock

So I don't see any problem with this.


> > mHandle is used under the lock at other places except
> > in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab
> > the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the
> > beginning of CacheFile::InitIndexEntry().
> 
> This is too suboptimal.  Putting locks everywhere just blocks everything.

I see _NO_ sense in trying to avoid using lock in a class where all members are already protected with a lock. We just missed to protect few of the accesses.
Attached patch v2Splinter Review
So, I first tried to use similar solution as in bug 1222782.  Then I realized that each access to mHandle-> has to be wrapped and also it goes via an intrinsic, that will probably make things slower than using a simpler solution you have suggested.

Hence, I use the lock in IsDoomed.  However, I am persuaded that we will have problems with that in a certain debug test that will be timing out because of the deadlock detector going crazy.  But I could well be wrong as well, we'll see.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3af92707b6c0
Attachment #8644986 - Attachment is obsolete: true
Attachment #8706987 - Flags: review?(michal.novotny)
Attachment #8706987 - Flags: review?(michal.novotny) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/c5fcb1b6fdb8
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: