Closed Bug 1156974 Opened 9 years ago Closed 9 years ago

TSan: data race netwerk/cache2/CacheFileIOManager.h:51 IsDoomed

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: froydnj, Assigned: froydnj)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-race, csectype-uninitialized, sec-audit, Whiteboard: [tsan][post-critsmash-triage][adv-main43-])

Attachments

(2 files, 1 obsolete file)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

If I understand what these stacks are saying, then we have a malloc on the cache IO thread, and then a subsequent read from that malloc'd memory on the main thread...but without any intervening writes in between.

That seems a bit weird, because for that to happen, we'd have to not run the constructor for CacheFileHandle, all of which initialize the mIsDoomed member.  But then how would the main thread get access to the handle, since it hadn't been fully initialized?

Also seems worth pointing out that the write in TSan's malloc was 8 bytes, whereas the read for mIsDoomed is a mere byte.  And that mIsDoomed sits at the beginning of an 8-byte region.  So this is possibly TSan just being not quite as accurate as it could be, because it doesn't notice the write from the CacheFileHandle constructor?

Filing this as a security bug given the appearance of reading from uninitialized memory, but if this is just TSan being dumb, then it's probably an ordinary data race.  (Which is annoying, because it happens a fair amount.)  WDYT, Andrew?

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(continuation)
I'm not really sure, sorry.  Maybe Honza could take a look.
Flags: needinfo?(continuation) → needinfo?(honzab.moz)
CacheFile::mHandle (which we do a null check on as a barrier in IsDoomed()) is assigned in CacheFile::OnFileOpened under the CacheFile object's lock.  Hence there should be enough memory barriers IMO between the CacheFileHandle ctor and that assignment.  Anyway, I'm no super expert to memory reordering.

Would (instead of the mHandle non-nullness check) an atomic flag "mHasHandle" set after the assignment happened do the job better?
Flags: needinfo?(honzab.moz) → needinfo?(nfroyd)
(In reply to Honza Bambas (:mayhemer) from comment #2)
> CacheFile::mHandle (which we do a null check on as a barrier in IsDoomed())
> is assigned in CacheFile::OnFileOpened under the CacheFile object's lock. 
> Hence there should be enough memory barriers IMO between the CacheFileHandle
> ctor and that assignment.  Anyway, I'm no super expert to memory reordering.
> 
> Would (instead of the mHandle non-nullness check) an atomic flag
> "mHasHandle" set after the assignment happened do the job better?

I don't understand this question.  Are you saying to set the flag mHasHandle in CacheFile::OnFileOpened when mHandle is assigned, and then...what?  How is the assignment to mHandle relevant to the mIsDoomed race?

mIsDoomed is still written on the cache I/O thread and read out on the main thread without any intervening locks...
Flags: needinfo?(nfroyd) → needinfo?(honzab.moz)
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #3)
> (In reply to Honza Bambas (:mayhemer) from comment #2)
> > CacheFile::mHandle (which we do a null check on as a barrier in IsDoomed())
> > is assigned in CacheFile::OnFileOpened under the CacheFile object's lock. 
> > Hence there should be enough memory barriers IMO between the CacheFileHandle
> > ctor and that assignment.  Anyway, I'm no super expert to memory reordering.
> > 
> > Would (instead of the mHandle non-nullness check) an atomic flag
> > "mHasHandle" set after the assignment happened do the job better?
> 
> I don't understand this question.  Are you saying to set the flag mHasHandle
> in CacheFile::OnFileOpened when mHandle is assigned, and then...what?  

Then you know you can dereference mHandle.

> How
> is the assignment to mHandle relevant to the mIsDoomed race?

We would not call mHandle->IsDoomed() (where mIsDoomed member is actually read) when the assignment is not finished.  mHasHandle would protect us.

> 
> mIsDoomed is still written on the cache I/O thread and read out on the main
> thread without any intervening locks...

And?  It's a bool, atomic to read.  But yes, maybe it should be atomic/relaxed or rel-acq.
Flags: needinfo?(honzab.moz)
Flags: needinfo?(nfroyd)
If I understand that proposal correctly, TSan would still report a data race on IsDoomed, because TSan wouldn't see that the atomic mHasHandle is "protecting" IsDoomed.  At that point, I think you might as well either blacklist the access, or add some Atomic<>-ness to mIsDoomed.
Flags: needinfo?(nfroyd)
This patch implements the proposal from comment 5 about simply making mIsDoomed
a relaxed atomic.  The justification here is from comment 2, but it is a little
thin...

I guess Honza isn't reviewing, so pushing this to Michal...
Attachment #8642604 - Flags: review?(michal.novotny)
This patch should work, thanks!
Attachment #8642604 - Flags: review?(michal.novotny) → review+
Group: core-security → network-core-security
Comment on attachment 8659395 [details] [diff] [review]
mark CacheFileHandle::mIsDoomed as a release/acquire Atomic variable

Apparently marking mIsDoomed as a Relaxed atomic meant that clang could still delete the initializer that we were relying upon to prevent the race.  ReleaseAcquire is strong enough to prevent the deletion...at least for now.

The only change from the last patch is s/Relaxed/ReleaseAcquire/, so the review hinges upon whether you think the extra overhead from the stronger ordering is worth it.  I would prefer to do this rather than blacklisting the IsDoomed method.
Attachment #8659395 - Flags: review?(michal.novotny)
Assignee: nobody → nfroyd
Status: NEW → ASSIGNED
Attachment #8659395 - Flags: review?(michal.novotny) → review+
https://hg.mozilla.org/mozilla-central/rev/b4f94fa3627d
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Group: network-core-security → core-security-release
Whiteboard: [tsan] → [tsan][post-critsmash-triage]
Whiteboard: [tsan][post-critsmash-triage] → [tsan][post-critsmash-triage][adv-main43-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: