Intermittent SUMMARY: ThreadSanitizer: data race /builds/worker/workspace/build/src/ipc/chromium/src/third_party/libevent/signal.c:125:16 in evsig_set_base_
Categories
(Core :: IPC, defect, P1)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: jld)
References
(Blocks 1 open bug, )
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])
Attachments
(1 obsolete file)
Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=289453443&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFmdzKfcRPCMCQ1GziyX-g/runs/0/artifacts/public/logs/live_backing.log
[task 2020-02-19T02:47:11.701Z] 02:47:11 INFO - TEST-INFO | Ran 1 Loops
[task 2020-02-19T02:47:11.708Z] 02:47:11 INFO - SimpleTest FINISHED
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | ==================
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | WARNING: ThreadSanitizer: data race (pid=1990)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | Write of size 4 at 0x7f23e939ced8 by thread T2:
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #0 evsig_set_base_ /builds/worker/workspace/build/src/ipc/chromium/src/third_party/libevent/signal.c:125:16 (libxul.so+0x126b7cf)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #1 event_base_loop /builds/worker/workspace/build/src/ipc/chromium/src/third_party/libevent/event.c:1901:3 (libxul.so+0x1264514)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #2 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /builds/worker/workspace/build/src/ipc/chromium/src/base/message_pump_libevent.cc (libxul.so+0x1245ca9)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #3 RunInternal /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:315:10 (libxul.so+0x1242fdc)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #4 RunHandler /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:308:3 (libxul.so+0x1242fdc)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #5 MessageLoop::Run() /builds/worker/workspace/build/src/ipc/chromium/src/base/message_loop.cc:290:3 (libxul.so+0x1242fdc)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #6 base::Thread::ThreadMain() /builds/worker/workspace/build/src/ipc/chromium/src/base/thread.cc:192:16 (libxul.so+0x12535e3)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #7 ThreadFunc(void*) /builds/worker/workspace/build/src/ipc/chromium/src/base/platform_thread_posix.cc:40:13 (libxul.so+0x124d1be)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | Previous read of size 4 at 0x7f23e939ced8 by main thread:
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | [failed to restore the stack]
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | Location is global 'evsig_base_fd' of size 4 at 0x7f23e939ced8 (libxul.so+0x00000a793ed8)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | Thread T2 'Gecko_IOThread' (tid=1998, running) created by main thread at:
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #0 pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:967:3 (firefox+0x571ab)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #1 CreateThread /builds/worker/workspace/build/src/ipc/chromium/src/base/platform_thread_posix.cc:123:14 (libxul.so+0x124a9b7)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #2 PlatformThread::Create(unsigned long, PlatformThread::Delegate*, unsigned long*) /builds/worker/workspace/build/src/ipc/chromium/src/base/platform_thread_posix.cc:134:10 (libxul.so+0x124a9b7)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #3 base::Thread::StartWithOptions(base::Thread::Options const&) /builds/worker/workspace/build/src/ipc/chromium/src/base/thread.cc:97:8 (libxul.so+0x12531c6)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #4 NS_InitXPCOM /builds/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp:316:9 (libxul.so+0xaa2728)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #5 Initialize /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:1264:8 (libxul.so+0x632e2dd)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #6 XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4697:22 (libxul.so+0x632e2dd)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #7 XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/nsAppRunner.cpp:4752:21 (libxul.so+0x632e6b4)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #8 mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/workspace/build/src/toolkit/xre/Bootstrap.cpp:45:12 (libxul.so+0x6337432)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #9 do_main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:217:22 (firefox+0xc8b33)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | #10 main /builds/worker/workspace/build/src/browser/app/nsBrowserApp.cpp:331:16 (firefox+0xc8b33)
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | SUMMARY: ThreadSanitizer: data race /builds/worker/workspace/build/src/ipc/chromium/src/third_party/libevent/signal.c:125:16 in evsig_set_base_
[task 2020-02-19T02:47:20.882Z] 02:47:20 INFO - GECKO(1990) | ==================
[task 2020-02-19T02:47:21.064Z] 02:47:21 INFO - TEST-INFO | Main app process: killed by SIGIOT
[task 2020-02-19T02:47:21.065Z] 02:47:21 INFO - Buffered messages finished
[task 2020-02-19T02:47:21.066Z] 02:47:21 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code -6
Comment 1•6 years ago
|
||
This is reported upstream as https://github.com/libevent/libevent/issues/779. There is also a PR with a fix at https://github.com/libevent/libevent/pull/741, but it has not been accepted (yet).
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•5 years ago
|
||
In the last 7 days there have been 20 occurrences on linux 64 tsan opt.
Recent failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=295049632&repo=autoland&lineNumber=3224
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Updated•5 years ago
|
Comment 12•5 years ago
|
||
There are 26 failures associated to this bug in the last 7 days. These are occurring on linux1804-64-tsan opt builds.
log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=299884943&repo=autoland&lineNumber=2529
[task 2020-04-28T23:18:07.550Z] 23:18:07 INFO - GECKO(2698) | WARNING: ThreadSanitizer: data race (pid=2698)
[task 2020-04-28T23:18:07.551Z] 23:18:07 INFO - GECKO(2698) | Write of size 4 at 0x7f99ff683038 by thread T4:
[task 2020-04-28T23:18:07.551Z] 23:18:07 INFO - GECKO(2698) | #0 evsig_set_base_ /builds/worker/checkouts/gecko/ipc/chromium/src/third_party/libevent/signal.c:125:16 (libxul.so+0x1333c1f)
[task 2020-04-28T23:18:07.551Z] 23:18:07 INFO - GECKO(2698) | #1 event_base_loop /builds/worker/checkouts/gecko/ipc/chromium/src/third_party/libevent/event.c:1901:3 (libxul.so+0x132c994)
[task 2020-04-28T23:18:07.551Z] 23:18:07 INFO - GECKO(2698) | #2 base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_pump_libevent.cc (libxul.so+0x130d609)
[task 2020-04-28T23:18:07.552Z] 23:18:07 INFO - GECKO(2698) | #3 RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:315:10 (libxul.so+0x130a94c)
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | #4 RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:308:3 (libxul.so+0x130a94c)
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | #5 MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:290:3 (libxul.so+0x130a94c)
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | #6 base::Thread::ThreadMain() /builds/worker/checkouts/gecko/ipc/chromium/src/base/thread.cc:192:16 (libxul.so+0x131ace5)
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | #7 ThreadFunc(void*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:40:13 (libxul.so+0x1314afe)
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | Previous read of size 4 at 0x7f99ff683038 by main thread:
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | [failed to restore the stack]
[task 2020-04-28T23:18:07.553Z] 23:18:07 INFO - GECKO(2698) | Location is global 'evsig_base_fd' of size 4 at 0x7f99ff683038 (libxul.so+0x00000ab57038)
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | Thread T4 'Gecko_IOThread' (tid=2708, running) created by main thread at:
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | #0 pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:967:3 (firefox+0x5749b)
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | #1 CreateThread /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:123:14 (libxul.so+0x1312187)
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | #2 PlatformThread::Create(unsigned long, PlatformThread::Delegate*, unsigned long*) /builds/worker/checkouts/gecko/ipc/chromium/src/base/platform_thread_posix.cc:134:10 (libxul.so+0x1312187)
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | #3 base::Thread::StartWithOptions(base::Thread::Options const&) /builds/worker/checkouts/gecko/ipc/chromium/src/base/thread.cc:97:8 (libxul.so+0x131a8e1)
[task 2020-04-28T23:18:07.554Z] 23:18:07 INFO - GECKO(2698) | #4 NS_InitXPCOM /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:317:9 (libxul.so+0xb1114d)
[task 2020-04-28T23:18:07.555Z] 23:18:07 INFO - GECKO(2698) | #5 ScopedXPCOMStartup::Initialize(bool) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:1311:8 (libxul.so+0x638dcfd)
[task 2020-04-28T23:18:07.556Z] 23:18:07 INFO - GECKO(2698) | #6 XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4739:22 (libxul.so+0x63968b3)
[task 2020-04-28T23:18:07.557Z] 23:18:07 INFO - GECKO(2698) | #7 XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4797:21 (libxul.so+0x6396c24)
[task 2020-04-28T23:18:07.557Z] 23:18:07 INFO - GECKO(2698) | #8 mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/Bootstrap.cpp:45:12 (libxul.so+0x63a1762)
[task 2020-04-28T23:18:07.558Z] 23:18:07 INFO - GECKO(2698) | #9 do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:217:22 (firefox+0xc93e3)
[task 2020-04-28T23:18:07.558Z] 23:18:07 INFO - GECKO(2698) | #10 main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:331:16 (firefox+0xc93e3)
[task 2020-04-28T23:18:07.558Z] 23:18:07 INFO - GECKO(2698) | SUMMARY: ThreadSanitizer: data race /builds/worker/checkouts/gecko/ipc/chromium/src/third_party/libevent/signal.c:125:16 in evsig_set_base_
[task 2020-04-28T23:18:07.559Z] 23:18:07 INFO - GECKO(2698) | ==================
[task 2020-04-28T23:18:07.735Z] 23:18:07 INFO - TEST-INFO | Main app process: killed by SIGIOT
[task 2020-04-28T23:18:07.735Z] 23:18:07 INFO - Buffered messages finished
[task 2020-04-28T23:18:07.736Z] 23:18:07 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code -6
[task 2020-04-28T23:18:07.736Z] 23:18:07 INFO - runtests.py | Application ran for: 0:00:46.715597
[task 2020-04-28T23:18:07.736Z] 23:18:07 INFO - zombiecheck | Reading PID log: /tmp/tmpENcMNrpidlog
[task 2020-04-28T23:18:07.736Z] 23:18:07 INFO - ==> process 2698 launched child process 2713
[task 2020-04-28T23:18:07.736Z] 23:18:07 INFO - ==> process 2698 launched child process 2823
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - ==> process 2698 launched child process 2843
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - ==> process 2698 launched child process 2919
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - ==> process 2698 launched child process 2948
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - zombiecheck | Checking for orphan process with PID: 2713
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - zombiecheck | Checking for orphan process with PID: 2919
[task 2020-04-28T23:18:07.737Z] 23:18:07 INFO - zombiecheck | Checking for orphan process with PID: 2843
[task 2020-04-28T23:18:07.738Z] 23:18:07 INFO - zombiecheck | Checking for orphan process with PID: 2948
[task 2020-04-28T23:18:07.738Z] 23:18:07 INFO - zombiecheck | Checking for orphan process with PID: 2823
[task 2020-04-28T23:18:07.738Z] 23:18:07 INFO - Stopping web server
:jld can you take a look?
Assignee | ||
Comment 13•5 years ago
|
||
There's a fundamentally hard problem here, which seems to be why upstream doesn't have a fix: if an async signal handler is using some kind of resource, like a fd or heap memory, it's not enough to unset the signal handler before freeing the resource; it's necessary to ensure that the handler is no longer running on any other thread, and locks can't be used for this.
We may not need to care about that: the fd is owned by the event_base
(the abstraction of the native I/O multiplexer) and doesn't appear to be closed until it's destroyed, which for IPC is during shutdown. (Specifically, after ShutdownXPCOM
deletes the BrowserProcessSubThread
and sends a ThreadQuitTask
to the I/O thread.)
But then there's what we're being yelled at about here: the fd is stored into the global variable, then the signal handler is installed, then — according to TSan — it's possible for another thread on another CPU to run the signal handler before it sees the change to the non-atomic global. For that, a change proposed in one of the GitHub issues should suffice: adding the C11 _Atomic
qualifier.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 15•5 years ago
|
||
I have a patch, and it doesn't appear to break the build.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 29•5 years ago
|
||
Jed, in comment #15 you mentioned you had a patch to fix this, but it's not attached to this bug. What's the state of that?
Assignee | ||
Comment 30•5 years ago
|
||
See comment #13 on the bug for an explanation of what this fixes, what it
doesn't fix, and why the upstream bug report about the larger issue here
has stalled.
Assignee | ||
Comment 31•5 years ago
|
||
(In reply to Simon Giesecke [:sg] [he/him] from comment #29)
Jed, in comment #15 you mentioned you had a patch to fix this, but it's not attached to this bug. What's the state of that?
Thanks for the reminder.
Comment 32•5 years ago
|
||
Comment 33•5 years ago
|
||
bugherder |
Comment 34•5 years ago
|
||
Still shows up after the fix landed on central: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=312141776&repo=mozilla-central&lineNumber=2259
Assignee | ||
Comment 35•5 years ago
|
||
Slightly different: it's complaining about evsig_base
, not evsig_base_fd
.
There's… a lot going on here. If evsig_base_fd
were loaded first in the signal handler, because it's stored last in the code that updates the globals, it would act as a barrier for the accesses to evsig_base
. But those globals are set every time a signal handlers registered and every time we enter the event loop (which ought to be just once, but that's the source of the write that's called out in the TSan report).
So, I could keep trying to outwit TSan, or I could rewrite the code in IPC that handles child process termination on Unix, which is overdue for some cleanup, to not use this feature of libevent.
Comment 36•5 years ago
|
||
(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #35)
Slightly different: it's complaining about
evsig_base
, notevsig_base_fd
.There's… a lot going on here. If
evsig_base_fd
were loaded first in the signal handler, because it's stored last in the code that updates the globals, it would act as a barrier for the accesses toevsig_base
. But those globals are set every time a signal handlers registered and every time we enter the event loop (which ought to be just once, but that's the source of the write that's called out in the TSan report).So, I could keep trying to outwit TSan, or I could rewrite the code in IPC that handles child process termination on Unix, which is overdue for some cleanup, to not use this feature of libevent.
Should we file a different bug or can we use this one?
Assignee | ||
Comment 37•5 years ago
|
||
(In reply to Andreea Pavel [:apavel] from comment #36)
Should we file a different bug or can we use this one?
I'll file another bug and have it block this one.
Comment 38•5 years ago
|
||
Backed out as requested.
Backout link: https://hg.mozilla.org/integration/autoland/rev/a606d54431cffc3d91cc628b3da8f2ec3d975134
Updated•5 years ago
|
![]() |
||
Comment 39•5 years ago
|
||
Backout merged: https://hg.mozilla.org/mozilla-central/rev/a606d54431cf
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 55•3 years ago
|
||
As far as we know the bug is still present, even if TSan runs aren't finding it anymore for some reason, but bug 1658072 will take care of it.
Description
•