Closed Bug 829617 Opened 12 years ago Closed 11 years ago

data races on gTimestamp in xpcom/threads/HangMonitor.cpp

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: jseward, Assigned: decoder)

References

(Blocks 1 open bug)

Details

(Whiteboard: [data-race] [helgrind])

Attachments

(1 file, 1 obsolete file)

gTimestamp in xpcom/threads/HangMonitor.cpp isn't consistently protected by any lock. In at least one place it is protected by gMonitor, but on the whole it is accessed without any locks held. Per C++11 this means the behaviour is undefined -- races now count as 'undefined behaviour' which means compilers and hardware can do anything they want if they can detect the absence of protecting locks. Line 43 has this comment // Monitor protects gShutdown and gTimeout, but not gTimestamp which rely on // being atomically set by the processor; synchronization doesn't really matter // in this use case. Atomicity is only part of the picture. It is also of concern that stores done by one processor may become visible to other processors in a different order than they were performed in. C++11 allows that. Also, we have volatile PRIntervalTime gTimestamp = PR_INTERVAL_NO_WAIT; [...] case kUIActivity: if (gTimestamp != PR_INTERVAL_NO_WAIT) { cumulativeUILagMS += PR_IntervalToMilliseconds(PR_IntervalNow() - gTimestamp); } Since gTimestamp is declared volatile, the compiler must issue a separate load for each of the two uses. Since it isn't consistently protected by a lock, it is possible that a different thread changes the value in between these two uses, leading to a possibly inconsistent adjustment to cumulativeUILagMS.
Complainage from Helgrind (2 of many) ---------------------------------------------------------------- Lock at 0x4F29D40 was first observed at 0x4A0A283: pthread_mutex_init (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:429) by 0x5058AB8: PR_NewLock (nsprpub/pr/src/pthreads/ptsynch.c:147) by 0x7C7A3BF: mozilla::HangMonitor::Startup() (ff-opt/xpcom/threads/../../dist/include/mozilla/Mutex.h:49) by 0x7C406B9: NS_InitXPCOM2_P (xpcom/build/nsXPComInit.cpp:481) by 0x6BA8FE0: ScopedXPCOMStartup::Initialize() (toolkit/xre/nsAppRunner.cpp:1185) by 0x6BAF2E1: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3886) by 0x6BAF5A6: XRE_main (toolkit/xre/nsAppRunner.cpp:4093) by 0x40208E: do_main(int, char**, nsIFile*) (browser/app/nsBrowserApp.cpp:195) by 0x402195: main (browser/app/nsBrowserApp.cpp:388) Possible data race during write of size 4 at 0x9510F68 by thread #1 Locks held: none at 0x7C7A38F: mozilla::HangMonitor::Suspend() (xpcom/threads/HangMonitor.cpp:356) by 0x78D5851: PollWrapper(_GPollFD*, unsigned int, int) (widget/gtk2/nsAppShell.cpp:34) by 0x30E4447AF3: ??? (in /usr/lib64/libglib-2.0.so.0.3200.4) by 0x30E4447C13: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.3200.4) by 0x78D5807: nsAppShell::ProcessNextNativeEvent(bool) (widget/gtk2/nsAppShell.cpp:135) by 0x78F1B65: nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:139) by 0x78F1D47: nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:280) by 0x7C74D6D: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:600) by 0x7C3BA00: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:237) by 0x79DE756: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (ipc/glue/MessagePump.cpp:82) by 0x7CA2F15: MessageLoop::RunInternal() (ipc/chromium/src/base/message_loop.cc:215) by 0x7CA2F24: MessageLoop::RunHandler() (ipc/chromium/src/base/message_loop.cc:208) This conflicts with a previous read of size 4 by thread #5 Locks held: 1, at address 0x4F29D40 at 0x7C7A1E9: mozilla::HangMonitor::ThreadMain(void*) (xpcom/threads/HangMonitor.cpp:173) by 0x505E156: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:158) by 0x4A0A126: mythread_wrapper (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:219) by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so) by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so) ---------------------------------------------------------------- Lock at 0x4F29D40 was first observed at 0x4A0A283: pthread_mutex_init (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:429) by 0x5058AB8: PR_NewLock (nsprpub/pr/src/pthreads/ptsynch.c:147) by 0x7C7A3BF: mozilla::HangMonitor::Startup() (ff-opt/xpcom/threads/../../dist/include/mozilla/Mutex.h:49) by 0x7C406B9: NS_InitXPCOM2_P (xpcom/build/nsXPComInit.cpp:481) by 0x6BA8FE0: ScopedXPCOMStartup::Initialize() (toolkit/xre/nsAppRunner.cpp:1185) by 0x6BAF2E1: XREMain::XRE_main(int, char**, nsXREAppData const*) (toolkit/xre/nsAppRunner.cpp:3886) by 0x6BAF5A6: XRE_main (toolkit/xre/nsAppRunner.cpp:4093) by 0x40208E: do_main(int, char**, nsIFile*) (browser/app/nsBrowserApp.cpp:195) by 0x402195: main (browser/app/nsBrowserApp.cpp:388) Possible data race during write of size 4 at 0x9510F68 by thread #1 Locks held: none at 0x7C7A32B: mozilla::HangMonitor::NotifyActivity(mozilla::HangMonitor::ActivityType) (xpcom/threads/HangMonitor.cpp:334) by 0x78D586B: PollWrapper(_GPollFD*, unsigned int, int) (widget/gtk2/nsAppShell.cpp:36) by 0x30E4447AF3: ??? (in /usr/lib64/libglib-2.0.so.0.3200.4) by 0x30E4447C13: g_main_context_iteration (in /usr/lib64/libglib-2.0.so.0.3200.4) by 0x78D5807: nsAppShell::ProcessNextNativeEvent(bool) (widget/gtk2/nsAppShell.cpp:135) by 0x78F1B65: nsBaseAppShell::DoProcessNextNativeEvent(bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:139) by 0x78F1D47: nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal*, bool, unsigned int) (widget/xpwidgets/nsBaseAppShell.cpp:280) by 0x7C74D6D: nsThread::ProcessNextEvent(bool, bool*) (xpcom/threads/nsThread.cpp:600) by 0x7C3BA00: NS_ProcessNextEvent_P(nsIThread*, bool) (ff-opt/xpcom/build/nsThreadUtils.cpp:237) by 0x79DE756: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (ipc/glue/MessagePump.cpp:82) by 0x7CA2F15: MessageLoop::RunInternal() (ipc/chromium/src/base/message_loop.cc:215) by 0x7CA2F24: MessageLoop::RunHandler() (ipc/chromium/src/base/message_loop.cc:208) This conflicts with a previous read of size 4 by thread #5 Locks held: 1, at address 0x4F29D40 at 0x7C7A1E9: mozilla::HangMonitor::ThreadMain(void*) (xpcom/threads/HangMonitor.cpp:173) by 0x505E156: _pt_root (nsprpub/pr/src/pthreads/ptthread.c:158) by 0x4A0A126: mythread_wrapper (/home/sewardj/VgTRUNK/merge/helgrind/hg_intercepts.c:219) by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so) by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so) ----------------------------------------------------------------
Whiteboard: [data-race] [helgrind]
I don't understand what it's complaining about. The variable is only ever written on the main thread, and as far as I understand the rules, this means that there is no possibility of any kind of race from reading it on the main thread either. The hang monitor thread reads the value off the main thread, but it's ok if the read is not perfectly in sync... it would have to be many seconds off for it to affect the behavior of the program. If there's a way to annotate this behavior, we should do that. Otherwise I think that this is WONTFIX.
> I don't understand what it's complaining about. It's complaining that is saw the hang monitor thread read gTimestamp and the main thread later write it, but with no synchronisation event in between. I'll annotate it as don't-bother-to-track-this-variable and close as WONTFIX.
Fwiw, TSan is also still seeing this issue, this is a real race. I can blacklist the Suspend function at compile time to ignore it. Shall I attach a patch here for review?
Blocks: tsan
Flags: needinfo?(benjamin)
Attached patch tsan-hangmonitor.patch (obsolete) — Splinter Review
This makes access to gTimestamp atomic and fixes both the Suspend and the NotifyActivity TSan races for me. Talked to bsmedberg and he said looks good, but nfroyd should review.
Assignee: nobody → choller
Status: NEW → ASSIGNED
Attachment #823463 - Flags: review?(nfroyd)
Flags: needinfo?(benjamin)
Comment on attachment 823463 [details] [diff] [review] tsan-hangmonitor.patch Review of attachment 823463 [details] [diff] [review]: ----------------------------------------------------------------- r=me with the changes below. ::: xpcom/threads/HangMonitor.cpp @@ +7,5 @@ > #include "mozilla/Monitor.h" > #include "mozilla/Preferences.h" > #include "mozilla/Telemetry.h" > #include "mozilla/ProcessedStack.h" > +#include "mozilla/Attributes.h" This #include is redundant, AFAICS. Please remove. @@ +56,5 @@ > bool gShutdown; > > // The timestamp of the last event notification, or PR_INTERVAL_NO_WAIT if > // we're currently not processing events. > +mozilla::Atomic<PRIntervalTime, mozilla::ReleaseAcquire> gTimestamp(PR_INTERVAL_NO_WAIT); The default for Atomic is what you want, not mozilla::ReleaseAcquire. Please remove it. You should be able to get away without the mozilla:: here.
Attachment #823463 - Flags: review?(nfroyd) → review+
Patch v2. Addressed review comments, carrying over r+ from last review. Thanks Nathan! :)
Attachment #823463 - Attachment is obsolete: true
Attachment #823669 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5b39aa24ffe Setting needinfo for Julian because I don't know if he needs something else to be done here for Helgrind. If not, then we can close this bug after my patch landed.
Flags: needinfo?(jseward)
Whiteboard: [data-race] [helgrind] → [data-race] [helgrind][leave open]
(In reply to Christian Holler (:decoder) from comment #9) > Setting needinfo for Julian because I don't know if he needs something else > to be done here for Helgrind. I think the fix is OK for Helgrind, so this can be closed now.
Flags: needinfo?(jseward)
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [data-race] [helgrind][leave open] → [data-race] [helgrind]
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: