Closed Bug 1132908 Opened 9 years ago Closed 3 years ago

TSan: data race tools/profiler/platform-linux.cc:132 paf_prepare

Categories

(Core :: Gecko Profiler, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: froydnj, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files)

Attached file profiler-race.txt
[cribbing from decoder's script, hopefully he won't mind]

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

If I understand the stack traces correctly, it looks like this is just multiple fork() calls writing to the same variable.  (I don't know why we're forking multiple times during a mochitest run, though.)  Seems like making this Atomic<> should be sufficient...

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Summary: TSan: data race → TSan: data race tools/profiler/platform-linux.cc:132 paf_prepare
Flags: needinfo?(bgirard)
If you have the setup to test this patch it would be handy but if not I'll get around to it later.
Flags: needinfo?(nfroyd)
I don't understand what's going on here, but I still see a race even with the patch applied, though with different stacks:

WARNING: ThreadSanitizer: data race (pid=32087)
  Write of size 1 at 0x7ff866f87258 by thread T2:
    #0 paf_prepare /home/froydnj/src/gecko-dev.git/tools/profiler/platform-linux.cc:134 (libxul.so+0x000003e7fbe0)
    #1 __libc_fork /build/glibc-NmptCx/glibc-2.19/posix/../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95 (libc.so.6+0x0000000b75c1)
    #2 LaunchApp /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/process_util_linux.cc:281 (libxul.so+0x000000e1d080)
    #3 PerformAsyncLaunchInternal /home/froydnj/src/gecko-dev.git/ipc/glue/GeckoChildProcessHost.cpp:721 (libxul.so+0x000000e70a5c)
    #4 PerformAsyncLaunch /home/froydnj/src/gecko-dev.git/ipc/glue/GeckoChildProcessHost.cpp:447 (libxul.so+0x000000e6fdb4)
    #5 RunPerformAsyncLaunch /home/froydnj/src/gecko-dev.git/ipc/glue/GeckoChildProcessHost.cpp:485 (libxul.so+0x000000e6f65d)
    #6 DispatchToMethod<mozilla::ipc::GeckoChildProcessHost, bool (mozilla::ipc::GeckoChildProcessHost::*)(std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture), std::vector<std::basic_string<char>, std::allocator<std::basic_string<char> > >, base::ProcessArchitecture> /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/tuple.h:400 (libxul.so+0x000000e72c07)
    #7 RunTask /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/message_loop.cc:361 (libxul.so+0x000000e42f96)
    #8 Run /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/message_pump_libevent.cc:311 (libxul.so+0x000000e1c328)
    #9 RunInternal /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/message_loop.cc:233 (libxul.so+0x000000e420bf)
    #10 ThreadMain /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/thread.cc:170 (libxul.so+0x000000e52097)
    #11 ThreadFunc /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/platform_thread_posix.cc:39 (libxul.so+0x000000e1cd4e)
  Previous write of size 1 at 0x7ff866f87258 by thread T41:
    #0 paf_prepare /home/froydnj/src/gecko-dev.git/tools/profiler/platform-linux.cc:134 (libxul.so+0x000003e7fbe0)
    #1 __libc_fork /build/glibc-NmptCx/glibc-2.19/posix/../nptl/sysdeps/unix/sysv/linux/x86_64/../fork.c:95 (libc.so.6+0x0000000b75c1)
    #2 g_test_get_filename ??:0 (libglib-2.0.so.0+0x000000070926)
  Thread T2 'Gecko_IOThread' (tid=32109, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000002bdd2)
    #1 CreateThread /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/platform_thread_posix.cc:144 (libxul.so+0x000000e1cc4f)
    #2 StartWithOptions /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/thread.cc:92 (libxul.so+0x000000e51e5f)
    #3 NS_InitXPCOM2 /home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:548 (libxul.so+0x000000a077ae)
    #4 Initialize /home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:1406 (libxul.so+0x0000040dede2)
    #5 XRE_main /home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4456 (libxul.so+0x0000040df52c)
    #6 do_main /home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:294 (exe+0x00000005b07c)
    #7 __libc_start_main /build/glibc-NmptCx/glibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021b44)
  Thread T41 'gmain' (tid=32155, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000002bdd2)
    #1 g_mutex_init ??:0 (libglib-2.0.so.0+0x00000008e1bf)
    #2 HandlerExists /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/nsGNOMERegistry.cpp:22 (libxul.so+0x0000016c31a5)
    #3 OSProtocolHandlerExists /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/nsOSHelperAppService.cpp:1154 (libxul.so+0x0000016abb89)
    #4 GetProtocolHandlerInfoFromOS /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/nsOSHelperAppService.cpp:1516 (libxul.so+0x0000016ae6af)
    #5 non-virtual thunk to nsOSHelperAppService::GetProtocolHandlerInfoFromOS(nsACString_internal const&, bool*, nsIHandlerInfo**) /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/nsOSHelperAppService.cpp:1537 (libxul.so+0x0000016ae832)
    #6 NS_InvokeByIndex /home/froydnj/src/gecko-dev.git/xpcom/reflect/xptcall/md/unix/xptcinvoke_x86_64_unix.cpp:174 (libxul.so+0x0000009de942)
    #7 Invoke /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCWrappedNative.cpp:2110 (libxul.so+0x000001482d13)
    #8 XPC_WN_CallMethod /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1144 (libxul.so+0x00000148564f)
    #9 CallJSNative /home/froydnj/src/gecko-dev.git/js/src/jscntxtinlines.h:226 (libxul.so+0x000004cc79e5)
    #10 Invoke /home/froydnj/src/gecko-dev.git/js/src/vm/Interpreter.cpp:554 (libxul.so+0x000004ca9077)
    #11 DoCallFallback /home/froydnj/src/gecko-dev.git/js/src/jit/BaselineIC.cpp:9571 (libxul.so+0x000004f09b97)
    #12 <null> <null>:0 (0x7ff84ef14510)
    #13 EnterBaselineAtBranch /home/froydnj/src/gecko-dev.git/js/src/jit/BaselineJIT.cpp:210 (libxul.so+0x000004f1ddf1)
    #14 Interpret /home/froydnj/src/gecko-dev.git/js/src/vm/Interpreter.cpp:1737 (libxul.so+0x000004cf08ff)
    #15 RunScript /home/froydnj/src/gecko-dev.git/js/src/vm/Interpreter.cpp:448 (libxul.so+0x000004cd80f8)
    #16 Invoke /home/froydnj/src/gecko-dev.git/js/src/vm/Interpreter.cpp:517 (libxul.so+0x000004cc7e3d)
    #17 Invoke /home/froydnj/src/gecko-dev.git/js/src/vm/Interpreter.cpp:554 (libxul.so+0x000004ca9077)
    #18 JS_CallFunctionValue /home/froydnj/src/gecko-dev.git/js/src/jsapi.cpp:4216 (libxul.so+0x0000051c3e0d)
    #19 CallMethod /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCWrappedJSClass.cpp:1210 (libxul.so+0x00000147a60c)
    #20 CallMethod /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCWrappedJS.cpp:532 (libxul.so+0x000001474587)
    #21 PrepareAndDispatch /home/froydnj/src/gecko-dev.git/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:122 (libxul.so+0x0000009df6a5)
    #22 SharedStub /home/froydnj/src/gecko-dev.git/xpcom/reflect/xptcall/md/unix/xptcstubs_x86_64_linux.cpp:0 (libxul.so+0x0000009de9ea)
    #23 GetServiceByContractID /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1561 (libxul.so+0x0000009b8821)
    #24 CallGetService /home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:67 (libxul.so+0x0000009f19a1)
    #25 nsCOMPtr /opt/build/froydnj/build-tsan/uriloader/exthandler/../../dist/include/nsCOMPtr.h:514 (libxul.so+0x0000016bde9a)
    #26 GetTypeFromFile /home/froydnj/src/gecko-dev.git/uriloader/exthandler/nsExternalHelperAppService.cpp:2835 (libxul.so+0x0000016bf05b)
    #27 non-virtual thunk to nsExternalHelperAppService::GetTypeFromFile(nsIFile*, nsACString_internal&) /opt/build/froydnj/build-tsan/uriloader/exthandler/Unified_cpp_uriloader_exthandler0.cpp:2836 (libxul.so+0x0000016bf0b9)
    #28 MakeFileInputStream /home/froydnj/src/gecko-dev.git/netwerk/protocol/file/nsFileChannel.cpp:317 (libxul.so+0x000000becdab)
    #29 OpenContentStream /home/froydnj/src/gecko-dev.git/netwerk/protocol/file/nsFileChannel.cpp:390 (libxul.so+0x000000bed640)
    #30 BeginPumpingData /home/froydnj/src/gecko-dev.git/netwerk/base/nsBaseChannel.cpp:215 (libxul.so+0x000000a7c6d9)
    #31 AsyncOpen /home/froydnj/src/gecko-dev.git/netwerk/base/nsBaseChannel.cpp:630 (libxul.so+0x000000a7f85d)
    #32 non-virtual thunk to nsBaseChannel::AsyncOpen(nsIStreamListener*, nsISupports*) /opt/build/froydnj/build-tsan/netwerk/base/Unified_cpp_netwerk_base1.cpp:650 (libxul.so+0x000000a7f9d9)
    #33 NS_NewStreamLoaderInternal /opt/build/froydnj/build-tsan/dom/xul/../../dist/include/nsNetUtil.h:909 (libxul.so+0x000003315014)
    #34 operator class nsILoadGroup * /opt/build/froydnj/build-tsan/dom/xul/../../dist/include/nsNetUtil.h:931 (libxul.so+0x0000032e5e5e)
    #35 ResumeWalk /home/froydnj/src/gecko-dev.git/dom/xul/XULDocument.cpp:2905 (libxul.so+0x0000032d9c3e)
    #36 OnPrototypeLoadDone /home/froydnj/src/gecko-dev.git/dom/xul/XULDocument.cpp:590 (libxul.so+0x0000032d8fcd)
    #37 DidBuildModel /home/froydnj/src/gecko-dev.git/dom/xul/nsXULContentSink.cpp:240 (libxul.so+0x0000032ee7a5)
    #38 DidBuildModel /home/froydnj/src/gecko-dev.git/parser/htmlparser/nsParser.cpp:902 (libxul.so+0x0000016fd1f7)
    #39 OnStopRequest /home/froydnj/src/gecko-dev.git/parser/htmlparser/nsParser.cpp:1879 (libxul.so+0x0000016ffc3a)
    #40 non-virtual thunk to nsParser::OnStopRequest(nsIRequest*, nsISupports*, nsresult) /opt/build/froydnj/build-tsan/parser/htmlparser/Unified_cpp_parser_htmlparser0.cpp:1894 (libxul.so+0x0000016ffcd1)
    #41 OnStopRequest /home/froydnj/src/gecko-dev.git/uriloader/base/nsURILoader.cpp:323 (libxul.so+0x0000016a389d)
    #42 OnStopRequest /home/froydnj/src/gecko-dev.git/netwerk/base/nsBaseChannel.cpp:771 (libxul.so+0x000000a80696)
    #43 non-virtual thunk to nsBaseChannel::OnStopRequest(nsIRequest*, nsISupports*, nsresult) /opt/build/froydnj/build-tsan/netwerk/base/Unified_cpp_netwerk_base1.cpp:784 (libxul.so+0x000000a8078f)
    #44 OnStateStop /home/froydnj/src/gecko-dev.git/netwerk/base/nsInputStreamPump.cpp:721 (libxul.so+0x000000a9c7f2)
    #45 OnInputStreamReady /home/froydnj/src/gecko-dev.git/netwerk/base/nsInputStreamPump.cpp:440 (libxul.so+0x000000a9ba1a)
    #46 non-virtual thunk to nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) /opt/build/froydnj/build-tsan/netwerk/base/Unified_cpp_netwerk_base1.cpp:502 (libxul.so+0x000000a9c9a9)
    #47 Run /home/froydnj/src/gecko-dev.git/xpcom/io/nsStreamUtils.cpp:90 (libxul.so+0x0000009b314c)
    #48 ProcessNextEvent /home/froydnj/src/gecko-dev.git/xpcom/threads/nsThread.cpp:855 (libxul.so+0x0000009cf229)
    #49 NS_ProcessNextEvent /home/froydnj/src/gecko-dev.git/xpcom/glue/nsThreadUtils.cpp:265 (libxul.so+0x0000009fc761)
    #50 Run /home/froydnj/src/gecko-dev.git/ipc/glue/MessagePump.cpp:99 (libxul.so+0x000000e7faec)
    #51 RunInternal /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/message_loop.cc:233 (libxul.so+0x000000e420bf)
    #52 Run /home/froydnj/src/gecko-dev.git/widget/nsBaseAppShell.cpp:164 (libxul.so+0x00000338cf28)
    #53 Run /home/froydnj/src/gecko-dev.git/toolkit/components/startup/nsAppStartup.cpp:281 (libxul.so+0x000004069999)
    #54 XRE_mainRun /home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4160 (libxul.so+0x0000040de9b7)
    #55 XRE_main /home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4236 (libxul.so+0x0000040dee48)
    #56 XRE_main /home/froydnj/src/gecko-dev.git/toolkit/xre/nsAppRunner.cpp:4456 (libxul.so+0x0000040df52c)
    #57 do_main /home/froydnj/src/gecko-dev.git/browser/app/nsBrowserApp.cpp:294 (exe+0x00000005b07c)
    #58 __libc_start_main /build/glibc-NmptCx/glibc-2.19/csu/libc-start.c:287 (libc.so.6+0x000000021b44)
Flags: needinfo?(nfroyd)
Both write locations are guarded by an auto lock. Not sure what's going wrong here TBH.
I don't know what's happening here. It looks ok to me. I wonder if because we're forking there's something more complicated happening here with the copy-on-write behavior.
Flags: needinfo?(bgirard)
(In reply to Benoit Girard (:BenWa) from comment #5)
> I don't know what's happening here. It looks ok to me. I wonder if because
> we're forking there's something more complicated happening here with the
> copy-on-write behavior.

I wondered about that too, but we'd have to fork from the initial parent, and then fork again from the newly spawned child, as paf_prepare should only be called in the parent.  I think I was running media tests, so maybe that's plausible?

I haven't verified, but it looks like the patch also introduces other weird races because of the newly adding locking.
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #3)
> I don't understand what's going on here, but I still see a race even with
> the patch applied, though with different stacks:
> 
> WARNING: ThreadSanitizer: data race (pid=32087)
>   Write of size 1 at 0x7ff866f87258 by thread T2:
>     #0 paf_prepare
> /home/froydnj/src/gecko-dev.git/tools/profiler/platform-linux.cc:134
> (libxul.so+0x000003e7fbe0)
>     #1 __libc_fork
> /build/glibc-NmptCx/glibc-2.19/posix/../nptl/sysdeps/unix/sysv/linux/x86_64/.
> ./fork.c:95 (libc.so.6+0x0000000b75c1)
>     #2 LaunchApp
> /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/process_util_linux.cc:
> 281 (libxul.so+0x000000e1d080)
[...]
>   Previous write of size 1 at 0x7ff866f87258 by thread T41:
>     #0 paf_prepare
> /home/froydnj/src/gecko-dev.git/tools/profiler/platform-linux.cc:134
> (libxul.so+0x000003e7fbe0)
>     #1 __libc_fork
> /build/glibc-NmptCx/glibc-2.19/posix/../nptl/sysdeps/unix/sysv/linux/x86_64/.
> ./fork.c:95 (libc.so.6+0x0000000b75c1)
>     #2 g_test_get_filename ??:0 (libglib-2.0.so.0+0x000000070926)
[...]
>   Thread T2 'Gecko_IOThread' (tid=32109, running) created by main thread at:
[...]
>   Thread T41 'gmain' (tid=32155, running) created by main thread at:
>     #0 pthread_create ??:0 (exe+0x00000002bdd2)
>     #1 g_mutex_init ??:0 (libglib-2.0.so.0+0x00000008e1bf)
>     #2 HandlerExists
> /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/nsGNOMERegistry.
> cpp:22 (libxul.so+0x0000016c31a5)
>     #3 OSProtocolHandlerExists
> /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/
> nsOSHelperAppService.cpp:1154 (libxul.so+0x0000016abb89)
>     #4 GetProtocolHandlerInfoFromOS
> /home/froydnj/src/gecko-dev.git/uriloader/exthandler/unix/
> nsOSHelperAppService.cpp:1516 (libxul.so+0x0000016ae6af)

So… if we're forking on two threads concurrently, paf_prepare can race on was_paused, is what I think it's saying.  Would it work to change paused_ to an atomic counter holding the number of threads (or other entities) currently pausing the profiler, instead of a boolean?
Ahh yes you're right, even with the lock. But that's a logic error, not a data race. Can TSan detect something that advanced? With my patch all the access to was_paused should be protected.

Presumably fixed long ago

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: