TSan: data race on global 'mozilla::net::CacheObserver::sDiskCacheCapacity'

RESOLVED FIXED in Firefox 48

Status

()

Core
Networking: Cache
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jseward, Assigned: mayhemer)

Tracking

(Blocks: 1 bug)

Trunk
mozilla48
Points:
---

Firefox Tracking Flags

(firefox47 affected, firefox48 fixed)

Details

(Whiteboard: [necko-would-take])

Attachments

(4 attachments)

(Reporter)

Description

2 years ago
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

Both the main thread and the 'Cache2 I/O' access
CacheObserver::sDiskCacheCapacity without any locking.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is
possible that the race is benign. Even in this case though, we should
try to come up with a fix unless this would cause unacceptable
performance issues. Also note that seemingly benign races can possibly
be harmful (also depending on the compiler and the architecture)
[1][2].

If the bug cannot be fixed, then this bug should be used to either
make a compile-time annotation for blacklisting or add an entry to the
runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
(Reporter)

Comment 1

2 years ago
Created attachment 8720234 [details]
TSan stack trace, tidied up
(Reporter)

Comment 2

2 years ago
Following the landing of bug 1136762, this is now the most frequently
reported race, when running the xpcshell test suite on TSan:

45 modules/libpref/Preferences.cpp:1769:38 in mozilla::UintVarChanged <------
35 ipc/glue/MessagePump.cpp:142:7 in mozilla::ipc::MessagePump::Sched
11 toolkit/components/telemetry/Telemetry.cpp:3318:37 in CanRecordBase
11 toolkit/components/telemetry/Telemetry.cpp:3318:11 in CanRecordBase
10 tools/profiler/core/platform-linux.cc:250:46 in (anonymous namespa
10 tools/profiler/core/platform-linux.cc:249:46 in (anonymous namespa
10 signal handler spoils errno tools/profiler/core/platform-linux.cc:
 9 tools/profiler/core/platform.h:350:34 in IsActive
 7 netwerk/base/nsPACMan.cpp:344:13 in nsPACMan::Shutdown()
(Reporter)

Comment 3

2 years ago
Easy STR: do a TSan-enabled build, then:

TSAN_OPTIONS=suppressions=/home/sewardj/MOZ/SUPPS/tsan-empty.txt \
   DISPLAY=:1.0 ./mach xpcshell-test --sequential --log-mach - \
   devtools/shared/apps/tests/unit/test_webappsActor.js
(Assignee)

Comment 4

2 years ago
This is uncritical at all.  Can I use Atomic as a static?
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
(Reporter)

Comment 5

2 years ago
(In reply to Honza Bambas (:mayhemer) from comment #4)
> This is uncritical at all.  Can I use Atomic as a static?

(per irc discussion) Yes you can use Atomic as a static.  No problem there.
The difficulty though is that the call to mozilla::Preferences::AddUintVarCache
at CacheObserver.cpp:161 now passes an Atomic<int>* instead of an int* to 
AddUintVarCache.  So there is some type fixing-up that needs to happen.
(Assignee)

Comment 6

2 years ago
I think adding Atomic variants of AddPrefCache* methods might be the way.  Others may benefit too.

Updated

2 years ago
Blocks: 929478
Whiteboard: [necko-would-take]
(Reporter)

Comment 7

2 years ago
(In reply to Honza Bambas (:mayhemer) from comment #6)
> I think adding Atomic variants of AddPrefCache* methods might be the way. 
> Others may benefit too.

I'm in two minds about whether to fix it like that.  I looked at it and
at first glance, adding Atomic AddPrefCache* variants seems pretty ugly.

Is there another way to solve this?  The sequence of events is as follows:

[1] main thread creates the Cache IO thread

[2] the IO thread reads CacheObserver::sDiskCacheCapacity, presumably getting
    the initial value, |kDefaultDiskCacheCapacity|

[3] the main thread then processes preferences (or whatever) and writes to
    CacheObserver::sDiskCacheCapacity

On interpretation is that the IO thread is created too early.  But I
assume that the startup sequence is massively complex and full of 
circular dependencies and so there is no possibility of reordering the
above to [1] [3] [2].  Correct?
(Reporter)

Comment 8

2 years ago
Created attachment 8725200 [details]
race-sDiskCacheCapacity.txt

TSan stack traces again, but will full (non-truncated stacks)
and ordered in the [1] [2] [3] sequence mentioned in comment #7.
(Reporter)

Comment 9

2 years ago
Also I guess reordering would not work if it can be that the cached
value can be changed at any point in the run.

So, making a more atomic version of AddUintVarCache might be the way.
But I am unsure whether it is really enough just to change the types
in question from uint32_t to Atomic<uint32_t>.  That might make TSan
stop complaining, but do we need atomicity at coarser granularity to
make this really correct?  I am unsure how to proceed.
(Assignee)

Comment 10

2 years ago
(In reply to Julian Seward [:jseward] from comment #9)
> Also I guess reordering would not work if it can be that the cached
> value can be changed at any point in the run.

I really don't understand why this is such a big problem.  The value itself is an int32_t which (AFAIK) is naturally atomic.  OK, not in C++11.. I heard..

The value at a moment is absolutely unimportant.  This is nothing critical!

> 
> So, making a more atomic version of AddUintVarCache might be the way.
> But I am unsure whether it is really enough just to change the types
> in question from uint32_t to Atomic<uint32_t>.  

I would like to try to have a patch like that.  If you feel like there is some showstopper, please let me know, can save me some cycles.

> That might make TSan
> stop complaining, but do we need atomicity at coarser granularity to
> make this really correct?  I am unsure how to proceed.

I don't understand your concern.

The only thing here is to ensure an atomic access to static pref cache.  And let you know that there is no arithmetic op happening on that static.  Regardless that uint32_t already is atomic to read/write, we need to Atomize<> it to make C++11 compilers and TSan happy.  Relaxed ordering is fully sufficient here.  

What else am I missing?
Flags: needinfo?(jseward)
(Reporter)

Comment 11

2 years ago
Created attachment 8730257 [details] [diff] [review]
Proposed fix
Flags: needinfo?(jseward)
(Reporter)

Updated

2 years ago
Attachment #8730257 - Flags: review?(honzab.moz)
(Assignee)

Comment 12

2 years ago
Comment on attachment 8730257 [details] [diff] [review]
Proposed fix

Review of attachment 8730257 [details] [diff] [review]:
-----------------------------------------------------------------

Excelent!  Thanks.  Exactly what I had in mind.

Although, I would more prefer this be Atomic<Relaxed>.  There is no arithmetic happening on that global, it's only a pref cache and there is absolutely no need for any memory ordering here.  We access this only on the main thread and on the cache io thread.

Maybe make it templatable on the memory ordering arg?

A pref module peer must review this too.
Attachment #8730257 - Flags: review?(honzab.moz) → review+
(Reporter)

Comment 13

2 years ago
Created attachment 8730771 [details] [diff] [review]
bug1248915c13.cset

Now with templatized memory ordering constraints, as per comment 12.
(Reporter)

Updated

2 years ago
Attachment #8730771 - Flags: review?(n.nethercote)
Comment on attachment 8730771 [details] [diff] [review]
bug1248915c13.cset

Review of attachment 8730771 [details] [diff] [review]:
-----------------------------------------------------------------

I'll agree to review this because the libpref/ reviewing situation is so dire, but I refute utterly the notion that I am a libpref/ peer :)

::: modules/libpref/Preferences.cpp
@@ +1787,5 @@
>    gCacheData->AppendElement(data);
>    return RegisterCallback(UintVarChanged, aPref, data);
>  }
>  
> +template <MemoryOrdering order>

Template params are usually start with an upper-case letter. So use |Order| or |Ordering|.

@@ +1790,5 @@
>  
> +template <MemoryOrdering order>
> +static void AtomicUintVarChanged(const char* aPref, void* aClosure)
> +{
> +  CacheData* cache = static_cast<CacheData*>(aClosure);

Use |auto cache = ...| to avoid writing the type twice.

@@ +1798,5 @@
> +
> +template <MemoryOrdering order>
> +// static
> +nsresult
> +Preferences::AddAtomicUintVarCache(Atomic<uint32_t,order>* aCache,

Comma after space.

::: modules/libpref/Preferences.h
@@ +274,5 @@
>    static nsresult AddUintVarCache(uint32_t* aVariable,
>                                    const char* aPref,
>                                    uint32_t aDefault = 0);
> +  template <MemoryOrdering order>
> +  static nsresult AddAtomicUintVarCache(Atomic<uint32_t,order>* aVariable,

User |Order| or |Ordering| again. And comma after space.

::: netwerk/cache2/CacheObserver.cpp
@@ +50,5 @@
>  // Cache of the calculated memory capacity based on the system memory size
>  int32_t CacheObserver::sAutoMemoryCacheCapacity = -1;
>  
>  static uint32_t const kDefaultDiskCacheCapacity = 250 * 1024; // 250 MB
> +Atomic<uint32_t,Relaxed> CacheObserver::sDiskCacheCapacity

Space after comma.

::: netwerk/cache2/CacheObserver.h
@@ +86,5 @@
>    static bool sUseDiskCache;
>    static uint32_t sMetadataMemoryLimit;
>    static int32_t sMemoryCacheCapacity;
>    static int32_t sAutoMemoryCacheCapacity;
> +  static Atomic<uint32_t,Relaxed> sDiskCacheCapacity;

Space after comma.
Attachment #8730771 - Flags: review?(n.nethercote) → review+
(Reporter)

Comment 15

2 years ago
(In reply to Julian Seward [:jseward] from comment #13)
> Created attachment 8730771 [details] [diff] [review]
> bug1248915c13.cset
> Now with templatized memory ordering constraints, as per comment 12.

Honza, could you please carry over your r+ to this new patch?
Flags: needinfo?(honzab.moz)
(Assignee)

Comment 16

2 years ago
No need to.  Just land.
Flags: needinfo?(honzab.moz)

Comment 18

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c74f71e1ba62
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.