Closed Bug 939790 Opened 11 years ago Closed 9 years ago

TSan: data race xpcom/components/nsComponentManager.h:89 Lock

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 --- fixed

People

(Reporter: decoder, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(2 files)

The attached logfile shows a thread/data race (mozilla-central revision beddd6d4bcdf) detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause inacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Hg blame points to bsmedberg, using that as a first start for needinfo.
Component: General → XPCOM
Flags: needinfo?(benjamin)
The log is incomplete. Basically what we need to do is add runtime asserts to the lock methods that check before locking to make sure that the current thread doesn't own the lock.

This seems like a false-positive, as long as pointer-size reads and writes are atomic. The lock barriers on the current thread should prevent the compiler from reordering reads or writes that are relevant to the current thread.
Flags: needinfo?(benjamin)
Assignee: general → nobody
I saw this today while doing a tsan run.

  Read of size 8 at 0x7d4c0000dba0 by thread T2:
    #0 AssertNotCurrentThreadOwns /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.h:113 (libxul.so+0x0000009a4a62)
    #1 Lock /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.h:91 (libxul.so+0x0000009a1c77)
    #2 CallCreateInstance /home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:135 (libxul.so+0x0000009d8912)
    #3 assign_from_helper /home/froydnj/src/gecko-dev.git/xpcom/glue/nsCOMPtr.cpp:125 (libxul.so+0x0000009d69f3)
    #4 do_CreateInstance /opt/build/froydnj/build-tsan/ipc/glue/../../dist/include/nsCOMPtr.h:621 (libxul.so+0x000000e5e401)
    #5 RunInternal /home/froydnj/src/gecko-dev.git/ipc/chromium/src/base/message_loop.cc:233 (libxul.so+0x000000e2122f)
    #6 ThreadFunc /home/froydnj/src/gecko-dev.git/xpcom/threads/nsThread.cpp:356 (libxul.so+0x0000009b2959)
    #7 _pt_root /home/froydnj/src/gecko-dev.git/nsprpub/pr/src/pthreads/ptthread.c:212 (libnspr4.so+0x00000003e270)

  Previous write of size 8 at 0x7d4c0000dba0 by main thread (mutexes: write M158):
    #0 Lock /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.h:94 (libxul.so+0x00000099d3b4)
    #1 CallGetService /home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:67 (libxul.so+0x0000009d6751)
    #2 nsCOMPtr /opt/build/froydnj/build-tsan/netwerk/base/src/../../../dist/include/nsCOMPtr.h:514 (libxul.so+0x000000aaeeb6)
    #3 nsSocketTransportServiceConstructor /home/froydnj/src/gecko-dev.git/netwerk/build/nsNetModule.cpp:72 (libxul.so+0x000000db1af1)
    #4 CreateInstance /home/froydnj/src/gecko-dev.git/xpcom/glue/GenericFactory.cpp:17 (libxul.so+0x0000009d3ec2)
    #5 CreateInstanceByContractID /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1199 (libxul.so+0x0000009a1f11)
    #6 GetServiceByContractID /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1561 (libxul.so+0x00000099d891)
    #7 CallGetService /home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:67 (libxul.so+0x0000009d6887)
    #8 operator= /opt/build/froydnj/build-tsan/netwerk/base/src/../../../dist/include/nsCOMPtr.h:613 (libxul.so+0x000000a78b37)
    #9 InitializeNetworkLinkService /home/froydnj/src/gecko-dev.git/netwerk/base/src/nsIOService.cpp:291 (libxul.so+0x000000a781da)
    #10 Init /home/froydnj/src/gecko-dev.git/netwerk/base/src/nsIOService.cpp:227 (libxul.so+0x000000a77ce1)
    #11 GetInstance /home/froydnj/src/gecko-dev.git/netwerk/base/src/nsIOService.cpp:304 (libxul.so+0x000000a78fda)
    #12 nsIOServiceConstructor /home/froydnj/src/gecko-dev.git/netwerk/build/nsNetModule.cpp:57 (libxul.so+0x000000db1907)
    #13 CreateInstance /home/froydnj/src/gecko-dev.git/xpcom/glue/GenericFactory.cpp:17 (libxul.so+0x0000009d3ec2)
    #14 CreateInstanceByContractID /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1199 (libxul.so+0x0000009a1f11)
    #15 GetServiceByContractID /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:1561 (libxul.so+0x00000099d891)
    #16 CallGetService /home/froydnj/src/gecko-dev.git/xpcom/glue/nsComponentManagerUtils.cpp:67 (libxul.so+0x0000009d6751)
    #17 nsCOMPtr /home/froydnj/src/gecko-dev.git/xpcom/build/../glue/nsCOMPtr.h:514 (libxul.so+0x0000009ebe03)
    #18 do_GetIOService /opt/build/froydnj/build-tsan/chrome/../dist/include/nsNetUtil.h:97 (libxul.so+0x0000009cc754)
    #19 ResolveURI /home/froydnj/src/gecko-dev.git/chrome/nsChromeRegistryChrome.cpp:738 (libxul.so+0x0000009cc846)
    #20 ManifestLocale /home/froydnj/src/gecko-dev.git/chrome/nsChromeRegistryChrome.cpp:825 (libxul.so+0x0000009cd162)
    #21 ParseManifest /home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:767 (libxul.so+0x0000009a78b5)
    #22 DoRegisterManifest /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:613 (libxul.so+0x00000099f55a)
    #23 ManifestManifest /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:635 (libxul.so+0x00000099f683)
    #24 ParseManifest /home/froydnj/src/gecko-dev.git/xpcom/components/ManifestParser.cpp:776 (libxul.so+0x0000009a769e)
    #25 DoRegisterManifest /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:613 (libxul.so+0x00000099f55a)
    #26 RereadChromeManifests /home/froydnj/src/gecko-dev.git/xpcom/components/nsComponentManager.cpp:821 (libxul.so+0x00000099e6a1)
    #27 NS_InitXPCOM2 /home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:692 (libxul.so+0x0000009edd4e)
    #28 XRE_XPCShellMain /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCShellImpl.cpp:1372 (libxul.so+0x000001434011)
    #29 main /home/froydnj/src/gecko-dev.git/js/xpconnect/shell/xpcshell.cpp:43 (exe+0x00000005a28e)

  Location is heap block of size 408 at 0x7d4c0000dac0 allocated by main thread:
    #0 malloc ??:0 (exe+0x000000028136)
    #1 moz_xmalloc /home/froydnj/src/gecko-dev.git/memory/mozalloc/mozalloc.cpp:52 (libmozalloc.so+0x000000001a28)
    #2 operator new /opt/build/froydnj/build-tsan/xpcom/build/../../dist/include/mozilla/mozalloc.h:209 (libxul.so+0x0000009edbfc)
    #3 XRE_XPCShellMain /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCShellImpl.cpp:1372 (libxul.so+0x000001434011)
    #4 main /home/froydnj/src/gecko-dev.git/js/xpconnect/shell/xpcshell.cpp:43 (exe+0x00000005a28e)

  Mutex M158 created at:
    #0 pthread_mutex_init ??:0 (exe+0x00000002bc55)
    #1 PR_NewLock /home/froydnj/src/gecko-dev.git/nsprpub/pr/src/pthreads/ptsynch.c:145 (libnspr4.so+0x000000032829)
    #2 OffTheBooksMutex /opt/build/froydnj/build-tsan/xpcom/components/../../dist/include/mozilla/Mutex.h:49 (libxul.so+0x00000099deaf)
    #3 NS_InitXPCOM2 /home/froydnj/src/gecko-dev.git/xpcom/build/XPCOMInit.cpp:643 (libxul.so+0x0000009edc07)
    #4 XRE_XPCShellMain /home/froydnj/src/gecko-dev.git/js/xpconnect/src/XPCShellImpl.cpp:1372 (libxul.so+0x000001434011)
    #5 main /home/froydnj/src/gecko-dev.git/js/xpconnect/shell/xpcshell.cpp:43 (exe+0x00000005a28e)

  Thread T2 (tid=10794, running) created by main thread at:
    #0 pthread_create ??:0 (exe+0x00000002b6a2)
    #1 _PR_CreateThread /home/froydnj/src/gecko-dev.git/nsprpub/pr/src/pthreads/ptthread.c:453 (libnspr4.so+0x00000003c054)
    #2 PR_CreateThread /home/froydnj/src/gecko-dev.git/nsprpub/pr/src/pthreads/ptthread.c:544 (libnspr4.so+0x00000003bd7d)
    #3 Init /home/froydnj/src/gecko-dev.git/xpcom/threads/nsThread.cpp:467 (libxul.so+0x0000009b333a)

Our runtime thread (non-)ownership checking is triggering races. >_<

Possible options:

- Make the mOwnerThread variable Atomic<>;
- MOZ_TSAN_BLACKLIST the AssertNotCurrentThreadOwns method
Maybe make it Atomic only in DEBUG builds? Or blacklist...
I certainly do not recommend to blacklist. That last option should be avoided if possible. Making it Atomic in debug builds sounds good to me, if there is a good reason to not make it Atomic in general.
We could probably get away with it being Atomic<T*, Relaxed>, that would at least tell TSan that we know what we're doing with concurrent modifications.
Is it worth giving TSan special knowledge of AssertNotCurrentThreadOwns, since its purpose is to be a threadsafety assertion and that it also TSan's purpose?
This is similar to the solution adopted for bug 1190985, a race in
netwerk's DebugMutexAutoLock.  A relaxed atomic tells tools like TSan
that we're OK with this variable being touched from multiple threads.
That it's only set from within a locked mutex should ensure whatever
memory barriers we need are executed so all threads have a consistent
view of what value it contains.

Getting rid of another |volatile| usage in the codebase is just a bonus.

This is a false positive as suggested in comment 1, I think, but annotating
things this way ensures that TSan understands what we're doing here.  I think
this is preferable to trying to mess with MOZ_TSAN_BLACKLIST (which has its own
set of issues) or runtime blacklisting (which is expensive).
Attachment #8659998 - Flags: review?(benjamin)
Severity: critical → normal
Attachment #8659998 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/ff494eb6487b
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: