TSan: data race netwerk/cache2/CacheStorageService.cpp:1391 AddStorageEntry

RESOLVED FIXED in Firefox 46

Status

()

Core
Networking: Cache
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: froydnj, Assigned: mayhemer)

Tracking

(Blocks: 1 bug)

unspecified
mozilla46
x86_64
Linux
Points:
---

Firefox Tracking Flags

(firefox46 fixed)

Details

(Whiteboard: [tsan])

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

3 years ago
Created attachment 8549147 [details]
netwerk-cache-open-race.txt

[cribbing from decoder's script, hopefully he won't mind]

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
(Assignee)

Comment 1

3 years ago
Is this the CacheEntry::mFileStatus member again?
Flags: needinfo?(nfroyd)
(Reporter)

Comment 2

3 years ago
(In reply to Honza Bambas (:mayhemer) from comment #1)
> Is this the CacheEntry::mFileStatus member again?

I'm not sure.  The read in the log is claimed to come from a line like:

  if (MOZ_UNLIKELY(entry->IsFileDoomed())) {

which checks CacheEntry::mFileStatus, so I guess we can assume IsFileDoomed was inlined?  The write, however, is claimed to come from:

  mCallback->OnFileOpened(mHandle, rv);

in OpenFileEvent::Run:

http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheFileIOManager.cpp#605

(and that's ignoring the assign_assuming_AddRef "frame" below it).  If I understand the code correctly, I don't think OnFileOpened could have been inlined here, so I'm not sure what's going on.
Flags: needinfo?(nfroyd)
(Reporter)

Comment 3

2 years ago
Created attachment 8642574 [details]
TSan stack

Here's a better stack.  It looks like TSan is complaining that mHandle is being assigned in CacheFile::OnFileOpened, off the main thread, and then being read on the main thread via CacheFile::IsDoomed.  Mutexes are being held around the read and the write, but they're different mutexes, so TSan is unhappy.

Is there anything that can be done about this?
Attachment #8549147 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
(Assignee)

Comment 4

2 years ago
I think there is already a very similar bug on this and it again leads me to the same conclusion.  We could have mHasHandle atomic bool flag set *after* mHandle is set (and so addrefed) and when accessing mHandle do a check on mHasHandle.
Flags: needinfo?(honzab.moz) → needinfo?(nfroyd)
(Assignee)

Comment 5

2 years ago
Created attachment 8644986 [details] [diff] [review]
v1
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #8644986 - Flags: review?(michal.novotny)
(Reporter)

Comment 6

2 years ago
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #4)
> I think there is already a very similar bug on this and it again leads me to
> the same conclusion.  We could have mHasHandle atomic bool flag set *after*
> mHandle is set (and so addrefed) and when accessing mHandle do a check on
> mHasHandle.

I'd have to check, but I'm fairly sure this isn't going to stop TSan from complaining.  TSan does not--AFAIK--take into account how atomic operations on one memory location affect raciness of another.  (I think this is consistent with how the C++ memory model works as well, but I'd have to read the memory model much more closely than I'm capable of doing so currently.)

I'd much rather fix this in a way that's capable of being understood natively via TSan.
Flags: needinfo?(nfroyd)
Comment on attachment 8644986 [details] [diff] [review]
v1

Review of attachment 8644986 [details] [diff] [review]:
-----------------------------------------------------------------

Simply use CacheFile's lock to protect access to mHandle in CacheFile::IsDoomed(). mHandle is used under the lock at other places except in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the beginning of CacheFile::InitIndexEntry().
Attachment #8644986 - Flags: review?(michal.novotny) → review-
(Assignee)

Comment 8

2 years ago
(In reply to Michal Novotny (:michal) from comment #7)
> Comment on attachment 8644986 [details] [diff] [review]
> v1
> 
> Review of attachment 8644986 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Simply use CacheFile's lock to protect access to mHandle in
> CacheFile::IsDoomed(). 

Absolutely not.  Remember the bug we were timing out a certain debug only test?  What you suggest would lead to cross entry of locks.

> mHandle is used under the lock at other places except
> in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab
> the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the
> beginning of CacheFile::InitIndexEntry().

This is too suboptimal.  Putting locks everywhere just blocks everything.

I have to think.
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #8)
> > Simply use CacheFile's lock to protect access to mHandle in
> > CacheFile::IsDoomed(). 
> 
> Absolutely not.  Remember the bug we were timing out a certain debug only
> test?  What you suggest would lead to cross entry of locks.

What bug are you talking about? CacheFile::IsDoomed() is called only from CacheEntry::IsFileDoomed() which is called from CacheStorageService::AddStorageEntry() and CacheStorageService::CacheFileDoomed(). Both grab CacheStorageService's lock so the order would be CacheStorageService::mLock -> CacheFile::mLock. We normally acquire locks in following orders:

  CacheStorageService::mLock -> CacheEntry::mLock
  CacheEntry::mLock -> CacheFile::mLock

So I don't see any problem with this.


> > mHandle is used under the lock at other places except
> > in CacheFile::InitIndexEntry() called from CacheFile::OnMetadataRead(). Grab
> > the lock in CacheFile::OnMetadataRead() too and put AssertOwnsLock() at the
> > beginning of CacheFile::InitIndexEntry().
> 
> This is too suboptimal.  Putting locks everywhere just blocks everything.

I see _NO_ sense in trying to avoid using lock in a class where all members are already protected with a lock. We just missed to protect few of the accesses.
(Assignee)

Comment 10

2 years ago
Created attachment 8706987 [details] [diff] [review]
v2

So, I first tried to use similar solution as in bug 1222782.  Then I realized that each access to mHandle-> has to be wrapped and also it goes via an intrinsic, that will probably make things slower than using a simpler solution you have suggested.

Hence, I use the lock in IsDoomed.  However, I am persuaded that we will have problems with that in a certain debug test that will be timing out because of the deadlock detector going crazy.  But I could well be wrong as well, we'll see.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3af92707b6c0
Attachment #8644986 - Attachment is obsolete: true
Attachment #8706987 - Flags: review?(michal.novotny)
Attachment #8706987 - Flags: review?(michal.novotny) → review+
(Assignee)

Updated

2 years ago
Keywords: checkin-needed

Comment 11

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5fcb1b6fdb8
Keywords: checkin-needed
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1203655

Comment 13

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c5fcb1b6fdb8
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox46: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.