Closed
Bug 1222101
Opened 9 years ago
Closed 8 years ago
TSan: data race ipc/glue/MessagePump.cpp:142 ScheduleWork (race on mThread)
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
RESOLVED
FIXED
mozilla48
People
(Reporter: jseward, Assigned: billm)
References
(Blocks 1 open bug)
Details
(Keywords: csectype-race, sec-moderate, Whiteboard: [tsan][post-critsmash-triage][adv-main48+])
Attachments
(2 files, 2 obsolete files)
9.19 KB,
text/plain
|
Details | |
15.10 KB,
patch
|
jld
:
review+
|
Details | Diff | Splinter Review |
The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer). * Specific information about this bug STR: apply test harness patch from bug 1222043m then run: TSAN_OPTIONS=suppressions=/home/sewardj/MOZ/SUPPS/tsan-1.txt DISPLAY=:1.0 ./mach xpcshell-test --sequential --log-mach - dom/base/test/unit_ipc/test_bug553888_wrap.js 2>&1 | tee spew-53-tsan-1 Looks like a race on MessagePump::mThread. * General information about TSan, data races, etc. Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2]. If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong [2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Reporter | ||
Comment 1•9 years ago
|
||
A bit of analysis. What we have is: toolkit/xre/nsEmbedFunctions.cpp:556: a MessageLoop (containing MessagePump?) object is created. In particular, uiMessageLoop.mThread is set to nullptr at this point. 555 // Associate this thread with a UI MessageLoop 556 MessageLoop uiMessageLoop(uiLoopType); 557 { A child thread is created: 574 process = new ContentProcess(parentPID); The child goes off and reads uiMessageLoop.mThread, uncoordinated: ipc/glue/MessagePump.cpp:142 141 // Make sure the event loop wakes up. 142 if (mThread) { 143 mThread->Dispatch(mDoWorkEvent, NS_DISPATCH_NORMAL); Meanwhile, uncoordinated, some time later, the parent writes uiMessageLoop.mThread: 622 // Run the UI event loop on the main thread. 623 uiMessageLoop.MessageLoop::Run(); (leads to "mThread = NS_GetCurrentThread()" at MessagePump.cpp:84 It's unclear to me whether the author(s) intend a race to exist, and think it is OK, or whether they are/were unaware of the existence of the race.
Reporter | ||
Comment 2•9 years ago
|
||
Looking more at MessagePump::ScheduleWork() in ipc/glue/MessagePump.cpp, it looks like this race could cause events (mDoWorkEvent) to mistakenly be delivered to the main thread rather than to the thread that mThread will eventually refer to. This seems bad.
Assignee | ||
Comment 3•9 years ago
|
||
Thanks Julian. This does seems like a very serious bug. I'm going to close it for now. I think we should be able to initialize mThread in the MessagePump constructor. I'll post a patch tomorrow or Monday.
Assignee: nobody → wmccloskey
Group: core-security
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 4•9 years ago
|
||
Actually, I guess this isn't a big deal. mThread is just going to end up being the main thread anyway. We should still fix it, but it's probably not causing anyone any trouble.
Flags: needinfo?(wmccloskey)
Reporter | ||
Comment 5•9 years ago
|
||
That seems to imply that the only possible mThread transition is nullptr --> main_thread. If so, would it be possible to simply bake the main thread ID into the MessagePump when it is created? Anyway, am happy to test a patch as/when available.
Updated•9 years ago
|
Keywords: csectype-race,
sec-moderate
Updated•9 years ago
|
Group: core-security → dom-core-security
Assignee | ||
Comment 6•8 years ago
|
||
Does this fix the problem?
Attachment #8725487 -
Flags: feedback?(jseward)
Reporter | ||
Comment 7•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #6) > Does this fix the problem? Yes, it does. Thanks!
Assignee | ||
Comment 8•8 years ago
|
||
The tricky issue here is that the current thread sometimes isn't available in the MessagePump constructor because XPCOM hasn't been initialized yet. I'm guessing that's why we initialize it in ::Run instead. I was a little concerned that there could be a race here where we call ScheduleWork before the thread manager is initialized. However, ContentChild does that explicitly to avoid that situation: https://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentChild.cpp#666 The Open call on the first channel doesn't happen until after the thread manager is initialized, so we're safe there. There's also a question whether this could be a race for non-main threads. I made some changes to pass the thread in explicitly for non-main threads. I think there might be a small window here where the message loop is exposed outside the thread and mThread hasn't been set yet: https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/thread.cc#164
Attachment #8725487 -
Attachment is obsolete: true
Attachment #8725487 -
Flags: feedback?(jseward)
Attachment #8726357 -
Flags: review?(jld)
Comment 9•8 years ago
|
||
Comment on attachment 8726357 [details] [diff] [review] patch v2 Review of attachment 8726357 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for taking so long. I've been staring at this (and, more importantly, the event loop code) for a while, and I'm not completely convinced I understand everything I should. But, if I understand correctly enough, there are a few cases: * The main thread, in which case mThread is (after this patch) always nullptr and NS_DispatchToMainThread is used (I assume that does something reasonable if called early?). * A non-main thread, but its nsIThread exists when it's started, so mThread is initialized in the constructor and everything is fine. * A non-main thread started before nsIThreads exist. (Is this just the child process's IPC I/O thread? Are there other cases? Does PBackground always wind up in the previous case?) Here mThread is initially nullptr and later changed to the actual thread at the top of MessagePump::Run. Could this be explicitly distinguished from the main-thread case somehow? It would be good if we could release assert or otherwise fail instead of quietly dispatching a runnable to the wrong thread. In any case, it looks like an improvement on the current situation. ::: ipc/glue/MessagePump.cpp @@ +79,5 @@ > +{ > + if (!NS_IsMainThread()) { > + // Don't write to mThread if we don't have to. It can cause a race if > + // ScheduleWork runs on another thread. > + mThread = NS_GetCurrentThread(); If mThread is already set to a non-null value from the constructor, shouldn't this skip the write? (And maybe assert or release-assert that it's the correct thread.) ::: ipc/glue/MessagePump.h @@ +120,5 @@ > NS_DECL_NSITHREADOBSERVER > > public: > + MessagePumpForNonMainUIThreads(nsIThread* aThread) : > + MessagePump(aThread), Does this work? It looks like MessagePumpForNonMainUIThreads doesn't inherit from MessagePump.
Attachment #8726357 -
Flags: review?(jld) → review+
Reporter | ||
Comment 10•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #8) > Created attachment 8726357 [details] [diff] [review] > patch v2 This patch appears to remove the TSan complaints. +1 from me, therefore.
Bill landed something for this in https://hg.mozilla.org/integration/mozilla-inbound/rev/00f8c8fde8ca0e3d2816cf3ddc3f6e26b5070c52 but I backed it out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5dbb8c2b35f4 because something from this push broke builds like https://treeherder.mozilla.org/logviewer.html#?job_id=24870637&repo=mozilla-inbound
Flags: needinfo?(wmccloskey)
Comment 12•8 years ago
|
||
This looks like the problem: https://hg.mozilla.org/integration/mozilla-inbound/rev/00f8c8fde8ca0e3d2816cf3ddc3f6e26b5070c52#l4.12 The fix should be a simple matter of adding "explicit" to those 1-argument constructors that used to be 0-argument.
Flags: needinfo?(wmccloskey) → needinfo?(jld)
Comment 13•8 years ago
|
||
Attachment #8726357 -
Attachment is obsolete: true
Flags: needinfo?(jld)
Attachment #8736540 -
Flags: review+
Comment 14•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7afa9ef11df6 (shown here as two patches, but I squashed them for the patch I just uploaded). I didn't rerun tests (besides building) because the incremental change shouldn't affect anything besides the static-analysis failure.
Keywords: checkin-needed
Keywords: checkin-needed
Comment 16•8 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/c91015c43d55
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Updated•8 years ago
|
Group: dom-core-security → core-security-release
Updated•8 years ago
|
Whiteboard: [tsan] → [tsan][post-critsmash-triage]
Updated•8 years ago
|
Whiteboard: [tsan][post-critsmash-triage] → [tsan][post-critsmash-triage][adv-main48+]
Updated•8 years ago
|
status-firefox-esr45:
--- → wontfix
Updated•7 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•