Closed
Bug 829617
Opened 12 years ago
Closed 11 years ago
data races on gTimestamp in xpcom/threads/HangMonitor.cpp
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: jseward, Assigned: decoder)
References
(Blocks 1 open bug)
Details
(Whiteboard: [data-race] [helgrind])
Attachments
(1 file, 1 obsolete file)
1.58 KB,
patch
|
decoder
:
review+
|
Details | Diff | Splinter Review |
gTimestamp in xpcom/threads/HangMonitor.cpp isn't consistently
protected by any lock. In at least one place it is protected by
gMonitor, but on the whole it is accessed without any locks held.
Per C++11 this means the behaviour is undefined -- races now count as
'undefined behaviour' which means compilers and hardware can do
anything they want if they can detect the absence of protecting locks.
Line 43 has this comment
// Monitor protects gShutdown and gTimeout, but not gTimestamp which rely on
// being atomically set by the processor; synchronization doesn't really matter
// in this use case.
Atomicity is only part of the picture. It is also of concern that
stores done by one processor may become visible to other processors
in a different order than they were performed in. C++11 allows that.
Also, we have
volatile PRIntervalTime gTimestamp = PR_INTERVAL_NO_WAIT;
[...]
case kUIActivity:
if (gTimestamp != PR_INTERVAL_NO_WAIT) {
cumulativeUILagMS += PR_IntervalToMilliseconds(PR_IntervalNow() -
gTimestamp);
}
Since gTimestamp is declared volatile, the compiler must issue a
separate load for each of the two uses. Since it isn't consistently
protected by a lock, it is possible that a different thread changes
the value in between these two uses, leading to a possibly
inconsistent adjustment to cumulativeUILagMS.
Reporter | ||
Comment 1•12 years ago
|
||
Complainage from Helgrind (2 of many)
----------------------------------------------------------------
Lock at 0x4F29D40 was first observed
at 0x4A0A283: pthread_mutex_init (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:429)
by 0x5058AB8: PR_NewLock (nsprpub/pr/src/pthreads/ptsynch.c:147)
by 0x7C7A3BF: mozilla::HangMonitor::Startup() (ff-opt/xpcom/threads/../../dist/include/mozilla/Mutex.h:49)
by 0x7C406B9: NS_InitXPCOM2_P (xpcom/build/nsXPComInit.cpp:481)
by 0x6BA8FE0: ScopedXPCOMStartup::Initialize() (toolkit/xre/nsAppRunner.cpp:1185)
by 0x6BAF2E1: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3886)
by 0x6BAF5A6: XRE_main (toolkit/xre/nsAppRunner.cpp:4093)
by 0x40208E: do_main(int, char**, nsIFile*) (browser/app/nsBrowserApp.cpp:195)
by 0x402195: main (browser/app/nsBrowserApp.cpp:388)
Possible data race during write of size 4 at 0x9510F68 by thread #1
Locks held: none
at 0x7C7A38F: mozilla::HangMonitor::Suspend() (xpcom/threads/HangMonitor.cpp:356)
by 0x78D5851: PollWrapper(_GPollFD*, unsigned int, int) (widget/gtk2/nsAppShell.cpp:34)
by 0x30E4447AF3: ??? (in /usr/lib64/libglib-2.0.so.0.3200.4)
by 0x30E4447C13: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.3200.4)
by 0x78D5807: nsAppShell::ProcessNextNativeEvent(bool) (widget/gtk2/nsAppShell.cpp:135)
by 0x78F1B65: nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:139)
by 0x78F1D47: nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:280)
by 0x7C74D6D: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:600)
by 0x7C3BA00: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:237)
by 0x79DE756: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (ipc/glue/MessagePump.cpp:82)
by 0x7CA2F15: MessageLoop::RunInternal() (ipc/chromium/src/base/message_loop.cc:215)
by 0x7CA2F24: MessageLoop::RunHandler() (ipc/chromium/src/base/message_loop.cc:208)
This conflicts with a previous read of size 4 by thread #5
Locks held: 1, at address 0x4F29D40
at 0x7C7A1E9: mozilla::HangMonitor::ThreadMain(void*) (xpcom/threads/HangMonitor.cpp:173)
by 0x505E156: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:158)
by 0x4A0A126: mythread_wrapper (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:219)
by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so)
by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so)
----------------------------------------------------------------
Lock at 0x4F29D40 was first observed
at 0x4A0A283: pthread_mutex_init (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:429)
by 0x5058AB8: PR_NewLock (nsprpub/pr/src/pthreads/ptsynch.c:147)
by 0x7C7A3BF: mozilla::HangMonitor::Startup() (ff-opt/xpcom/threads/../../dist/include/mozilla/Mutex.h:49)
by 0x7C406B9: NS_InitXPCOM2_P (xpcom/build/nsXPComInit.cpp:481)
by 0x6BA8FE0: ScopedXPCOMStartup::Initialize() (toolkit/xre/nsAppRunner.cpp:1185)
by 0x6BAF2E1: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3886)
by 0x6BAF5A6: XRE_main (toolkit/xre/nsAppRunner.cpp:4093)
by 0x40208E: do_main(int, char**, nsIFile*) (browser/app/nsBrowserApp.cpp:195)
by 0x402195: main (browser/app/nsBrowserApp.cpp:388)
Possible data race during write of size 4 at 0x9510F68 by thread #1
Locks held: none
at 0x7C7A32B: mozilla::HangMonitor::NotifyActivity(mozilla::HangMonitor::ActivityType) (xpcom/threads/HangMonitor.cpp:334)
by 0x78D586B: PollWrapper(_GPollFD*, unsigned int, int) (widget/gtk2/nsAppShell.cpp:36)
by 0x30E4447AF3: ??? (in /usr/lib64/libglib-2.0.so.0.3200.4)
by 0x30E4447C13: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.3200.4)
by 0x78D5807: nsAppShell::ProcessNextNativeEvent(bool) (widget/gtk2/nsAppShell.cpp:135)
by 0x78F1B65: nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:139)
by 0x78F1D47: nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:280)
by 0x7C74D6D: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:600)
by 0x7C3BA00: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:237)
by 0x79DE756: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (ipc/glue/MessagePump.cpp:82)
by 0x7CA2F15: MessageLoop::RunInternal() (ipc/chromium/src/base/message_loop.cc:215)
by 0x7CA2F24: MessageLoop::RunHandler() (ipc/chromium/src/base/message_loop.cc:208)
This conflicts with a previous read of size 4 by thread #5
Locks held: 1, at address 0x4F29D40
at 0x7C7A1E9: mozilla::HangMonitor::ThreadMain(void*) (xpcom/threads/HangMonitor.cpp:173)
by 0x505E156: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:158)
by 0x4A0A126: mythread_wrapper (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:219)
by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so)
by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so)
----------------------------------------------------------------
Reporter | ||
Updated•12 years ago
|
Whiteboard: [data-race] [helgrind]
Comment 2•12 years ago
|
||
I don't understand what it's complaining about. The variable is only ever written on the main thread, and as far as I understand the rules, this means that there is no possibility of any kind of race from reading it on the main thread either.
The hang monitor thread reads the value off the main thread, but it's ok if the read is not perfectly in sync... it would have to be many seconds off for it to affect the behavior of the program.
If there's a way to annotate this behavior, we should do that. Otherwise I think that this is WONTFIX.
Reporter | ||
Comment 3•12 years ago
|
||
> I don't understand what it's complaining about.
It's complaining that is saw the hang monitor thread read gTimestamp
and the main thread later write it, but with no synchronisation event
in between.
I'll annotate it as don't-bother-to-track-this-variable and close as
WONTFIX.
Assignee | ||
Comment 5•11 years ago
|
||
Fwiw, TSan is also still seeing this issue, this is a real race. I can blacklist the Suspend function at compile time to ignore it. Shall I attach a patch here for review?
Blocks: tsan
Flags: needinfo?(benjamin)
Assignee | ||
Comment 6•11 years ago
|
||
This makes access to gTimestamp atomic and fixes both the Suspend and the NotifyActivity TSan races for me.
Talked to bsmedberg and he said looks good, but nfroyd should review.
Assignee: nobody → choller
Status: NEW → ASSIGNED
Attachment #823463 -
Flags: review?(nfroyd)
Flags: needinfo?(benjamin)
Comment 7•11 years ago
|
||
Comment on attachment 823463 [details] [diff] [review]
tsan-hangmonitor.patch
Review of attachment 823463 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with the changes below.
::: xpcom/threads/HangMonitor.cpp
@@ +7,5 @@
> #include "mozilla/Monitor.h"
> #include "mozilla/Preferences.h"
> #include "mozilla/Telemetry.h"
> #include "mozilla/ProcessedStack.h"
> +#include "mozilla/Attributes.h"
This #include is redundant, AFAICS. Please remove.
@@ +56,5 @@
> bool gShutdown;
>
> // The timestamp of the last event notification, or PR_INTERVAL_NO_WAIT if
> // we're currently not processing events.
> +mozilla::Atomic<PRIntervalTime, mozilla::ReleaseAcquire> gTimestamp(PR_INTERVAL_NO_WAIT);
The default for Atomic is what you want, not mozilla::ReleaseAcquire. Please remove it.
You should be able to get away without the mozilla:: here.
Attachment #823463 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 8•11 years ago
|
||
Patch v2. Addressed review comments, carrying over r+ from last review. Thanks Nathan! :)
Attachment #823463 -
Attachment is obsolete: true
Attachment #823669 -
Flags: review+
Assignee | ||
Comment 9•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b39aa24ffe
Setting needinfo for Julian because I don't know if he needs something else to be done here for Helgrind. If not, then we can close this bug after my patch landed.
Flags: needinfo?(jseward)
Whiteboard: [data-race] [helgrind] → [data-race] [helgrind][leave open]
Comment 10•11 years ago
|
||
Reporter | ||
Comment 11•11 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #9)
> Setting needinfo for Julian because I don't know if he needs something else
> to be done here for Helgrind.
I think the fix is OK for Helgrind, so this can be closed now.
Flags: needinfo?(jseward)
Assignee | ||
Updated•11 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [data-race] [helgrind][leave open] → [data-race] [helgrind]
Updated•11 years ago
|
Target Milestone: --- → mozilla28
You need to log in
before you can comment on or make changes to this bug.
Description
•