Closed Bug 1281292 Opened 8 years ago Closed 3 years ago

TSan: data race on gXPCOMThreadsShutDown

Categories

(Core :: XPCOM, defect, P3)

47 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1607138

People

(Reporter: mozbugz, Unassigned)

References

(Blocks 1 open bug)

Details

As seen in:
https://treeherder.mozilla.org/logviewer.html#?job_id=22651204&repo=try#L3973

Write of size 1 at 0x7f9f2ed90b48 by main thread:
  #0 mozilla::ShutdownXPCOM(nsIServiceManager*)
  xpcom/build/XPCOMInit.cpp:884:27
>    gXPCOMThreadsShutDown = true;
Previous read of size 1 at 0x7f9f2ed90b48 by thread T26:
  #0 nsThread::DispatchInternal(already_AddRefed<nsIRunnable>, unsigned int, nsThread::nsNestedEventTarget*)
  xpcom/threads/nsThread.cpp:686:7
>  if (gXPCOMThreadsShutDown && MAIN_THREAD != mIsMainThread && !aTarget) {

In this case it seems harmless, as ShutdownXPCOM() is the only place that sets this bool once, and nsThread::DispatchInternal() is the only place that reads it.

It would be nice to get concurrency experts to verify that, and at least document it so that it may be more quickly ignored as a potential data race.
Or actually protect it with an Atomic or Mutex.
I see another unprotected static bool in dom/base/ScriptSettings.cpp#30: sScriptSettingsTLSInitialized.
(Found by https://treeherder.mozilla.org/logviewer.html#?job_id=22651204&repo=try#L2562 )

Is this an acceptable usage across our codebase?
If yes, I'd suggest we document it in https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style , and maybe add an annotation for them.
Otherwise we should find more of these and squash them!
(In reply to Gerald Squelart [:gerald] from comment #0)
> In this case it seems harmless, as ShutdownXPCOM() is the only place that
> sets this bool once, and nsThread::DispatchInternal() is the only place that
> reads it.
Isn't that still a data race when read/write to gXPCOMThreadsShutDown can happen at the same time on different threads without synchronization?
There is only one write of the bool (from false==0 to true==1) happening over the whole life of Firefox, so there is no risk of overwriting between different tasks.
And many independent reads, which I believe are safe because they will just get a false or true.

But once again, an actual expert should probably confirm this, or demand rectification!
Even we make gXPCOMThreadsShutDown an atomic, the code still looks racy where nsThread::DispatchInternal() sees gXPCOMThreadsShutDown is false and continue its call flow while gXPCOMThreadsShutDown is set to true on the main thread at the same time.
Blocks: tsan
Maybe Nathan has some ideas.
Flags: needinfo?(nfroyd)
I'm not sure what the exact question is. :)

Changing it to an Atomic would make the TSan complaint go away, but it wouldn't do anything about the race mentioned in comment 4.  Fixing that race would require locking around gXPCOMThreadsShutDown, and that would introduce a very hot lock for not a lot of win.  Although it would fix problems of letting events leak in or around shutdown...

I'd have to think about it some more.
Flags: needinfo?(nfroyd)
Nicolas, could this be a small piece of the "Shutdown issues"[1] puzzle?

[1] https://wiki.mozilla.org/Gecko:Shutdown_issues
Flags: needinfo?(nical.bugzilla)
(In reply to Gerald Squelart [:gerald] from comment #7)
> Nicolas, could this be a small piece of the "Shutdown issues"[1] puzzle?
> 
> [1] https://wiki.mozilla.org/Gecko:Shutdown_issues

Kinda. This looks like the typical scenario where we have a mechanism that expects to never run during/after shutdown. In there a branch has been glued to detect a bad situation and mitigate some instances of it. As comment 4 says, this branch is not going to catch all cases of shutdown races.

I don't know if it would be very efficient to have DispatchInternal gracefully handle being called during shutdown. What should it do ? drop the message ? perhaps this message contains something important like a condvar that something will wait on, or data that would be leaked intermittently. The callers should be design to not ask the threading primitives to be run after or during the shutdown of threads.

I would prefer being good at detecting the race and blaming the caller of the Dispatch functions.
Perhaps, making gXPCOMThreadsShutDown and checking it at the beginning *and* at the end of the function with a fatal assertion (in non-release channels) would be a better way to catch this race.
Flags: needinfo?(nical.bugzilla)
Priority: -- → P3
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.