Closed Bug 559927 Opened 14 years ago Closed 12 years ago

nsAsyncStreamCopier::mCopierCtx not consistently protected by ::mLock

Categories

(Core :: Networking, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 763013
mozilla1.9.3a5

People

(Reporter: jseward, Assigned: jseward)

Details

Attachments

(1 file)

NOTE, this bug is currently unconfirmed.  See
https://bugzilla.mozilla.org/show_bug.cgi?id=551155#c12
for background and rationale.


HOWTO REPRO: start up Fx on Helgrind and go to http://www.kde.org


netwerk/base/src/nsAsyncStreamCopier.cpp contains 3 references to
nsAsyncStreamCopier::mCopierCtx.  Two of these are protected by
nsAsyncStreamCopier::mLock, but the one in
nsAsyncStreamCopier::AsyncCopy isn't.  This leads to the following
race:

One thread writes it, without holding ::mLock

netwerk/base/src/nsAsyncStreamCopier.cpp
257 NS_IMETHODIMP
258 nsAsyncStreamCopier::AsyncCopy(nsIRequestObserver *observer,
                                   nsISupports *ctx)
259 {
...
284     rv = NS_AsyncCopy(mSource, mSink, mTarget, mMode, mChunkSize,
285                      OnAsyncCopyComplete, this, mCloseSource, mCloseSink,
286                      getter_AddRefs(mCopierCtx));

which calls through to 

xpcom/io/nsStreamUtils.cpp
591 NS_COM nsresult
592 NS_AsyncCopy(nsIInputStream         *source,
593              nsIOutputStream        *sink,
594              nsIEventTarget         *target,
595              nsAsyncCopyMode         mode,
596              PRUint32                chunkSize,
597              nsAsyncCopyCallbackFun  callback,
598              void                   *closure,
599              PRBool                  closeSource,
600              PRBool                  closeSink,
601              nsISupports           **aCopierCtx)
602 {
...
621     if (aCopierCtx) {
622         *aCopierCtx = static_cast<nsISupports*>(
623                       static_cast<nsIRunnable*>(copier));

Another thread later reads it, but holding ::mLock

86 netwerk/base/src/nsAsyncStreamCopier.cpp
87 void
88 nsAsyncStreamCopier::Complete(nsresult status)
89 {
..
94         nsAutoLock lock(mLock);
95         mCopierCtx = nsnull;


Simple fix is to hold mLock around the call to NS_AsyncCopy.  That at
least shuts Helgrind up.
Raw race report:

Possible data race during read of size 8 at 0x1f4f0978 by thread #7
   at 0x583C1F6: nsCOMPtr_base::assign_assuming_AddRef(nsISupports*)
      (nsCOMPtr.h:451)
   by 0x583C02E: nsCOMPtr_base::assign_with_AddRef(nsISupports*)
      (nsCOMPtr.cpp:89)
   by 0x12D09E00: nsCOMPtr<nsISupports>::operator=(nsISupports*)
      (nsCOMPtr.h:975)
   by 0x12D09ACB: nsAsyncStreamCopier::Complete(unsigned int)
      (nsAsyncStreamCopier.cpp:95)
   by 0x12D09B60: nsAsyncStreamCopier::OnAsyncCopyComplete(void*,
                  unsigned int) (nsAsyncStreamCopier.cpp:119)
   by 0x58625EF: nsAStreamCopier::Run() (nsStreamUtils.cpp:401)
   by 0x587BFA3: nsThreadPool::Run() (nsThreadPool.cpp:219)
   by 0x5879C2C: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:527)
   by 0x5841B16: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
   by 0x587A0A1: nsThread::ThreadFunc(void*) (nsThread.cpp:254)
   by 0x61018EA: _pt_root (ptthread.c:230)
   by 0x4C2A687: mythread_wrapper (hg_intercepts.c:221)
   by 0x4E34A03: start_thread (pthread_create.c:300)

 This conflicts with a previous write of size 8 by thread #1
   at 0x5862122: NS_AsyncCopy(nsIInputStream*, nsIOutputStream*,
                 nsIEventTarget*, nsAsyncCopyMode, unsigned int, void
                 (*)(void*, unsigned int), void*, int, int, nsISupports**)
                 (nsStreamUtils.cpp:623)
   by 0x12D098AB: nsAsyncStreamCopier::AsyncCopy(nsIRequestObserver*,
                  nsISupports*) (nsAsyncStreamCopier.cpp:287)
   by 0x5888A14: NS_InvokeByIndex_P (xptcinvoke_x86_64_unix.cpp:208)
   by 0x140BBE8A: XPCWrappedNative::CallMethod(XPCCallContext&,
                  XPCWrappedNative::CallMode) (xpcwrappednative.cpp:2736)
   by 0x140C1AAE: XPC_WN_CallMethod(JSContext*, JSObject*, unsigned int,
                  long*, long*) (xpcwrappednativejsops.cpp:1769)
   by 0x52FA8F1: js_Invoke (jsinterp.cpp:835)
   by 0x52F2DD7: js_Interpret (jsops.cpp:2270)
   by 0x52FA902: js_Invoke (jsinterp.cpp:843)

 Address 0x1f4f0978 is 56 bytes inside a block of size 88 alloc'd
   at 0x4C25B08: malloc (vg_replace_malloc.c:236)
   by 0x5AD1F75: moz_xmalloc (mozalloc.cpp:75)
   by 0x12D08B7D: nsAsyncStreamCopierConstructor(nsISupports*, nsID const&,
                  void**) (mozalloc.h:222)
   by 0x58428A0: nsGenericFactory::CreateInstance(nsISupports*, nsID const&,
                 void**) (nsGenericFactory.cpp:80)
   by 0x5875AF9: nsComponentManagerImpl::CreateInstance(nsID const&,
                 nsISupports*, nsID const&, void**)
                 (nsComponentManager.cpp:1597)
   by 0x140A8700: nsJSCID::CreateInstance(nsISupports**) (xpcjsid.cpp:821)
   by 0x5888A14: NS_InvokeByIndex_P (xptcinvoke_x86_64_unix.cpp:208)
   by 0x140BBE8A: XPCWrappedNative::CallMethod(XPCCallContext&,
                  XPCWrappedNative::CallMode) (xpcwrappednative.cpp:2736)
   by 0x140C1AAE: XPC_WN_CallMethod(JSContext*, JSObject*, unsigned int,
                  long*, long*) (xpcwrappednativejsops.cpp:1769)
   by 0x52FA8F1: js_Invoke (jsinterp.cpp:835)
   by 0x52F2DD7: js_Interpret (jsops.cpp:2270)
   by 0x52FA902: js_Invoke (jsinterp.cpp:843)
Attached patch trivial fixSplinter Review
This does the obvious thing and locks around the call to 
NS_AsyncCopy.  I don't know if this is right, or if the 
scope of the locking should be extended to encompass the
surrounding NS_ADDREF_THIS .. NS_RELEASE_THIS.
Attachment #439621 - Flags: review?(cbiesinger)
The NS_AsyncCopy call happens from AsyncCopy, which is expected to be called only once on any given async copier.  As long as that expectation is met, are we ok?  Or can the worker thread end up running under the Start() call NS_AsyncCopy makes?  If it can, then we can get Complete() before NS_AsyncCopy returns, which would be bad for reasons other than mCopierCtx....
(In reply to comment #3)
> Or can the worker thread end up running under the Start() call
> NS_AsyncCopy makes?  If it can, then we can get Complete() before NS_AsyncCopy
> returns, which would be bad for reasons other than mCopierCtx....

There's nothing stopping the thread from running, right? So that seems entirely possible to me. However, why would that be bad?
Do we deal correctly with a sync Complete() under NS_AsyncCopy?
With the patch here I think we do.
Attachment #439621 - Flags: review?(cbiesinger) → review+
Keywords: checkin-needed
Assignee: nobody → jseward
http://hg.mozilla.org/mozilla-central/rev/cceacc8b7ad5
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
backed out due to a Windows necko test failure
http://hg.mozilla.org/mozilla-central/rev/c3c26239975f

http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1272532738.1272535279.29271.gz

TEST-UNEXPECTED-FAIL | e:\builds\moz2_slave\mozilla-central-win32-debug-unittest-xpcshell\build\xpcshell\tests\test_necko\unit\test_bug470716.js | test failed (with xpcshell return code: -1073741819), see following log:

TEST-PASS | e:/builds/moz2_slave/mozilla-central-win32-debug-unittest-xpcshell/build/xpcshell/tests/test_necko/unit/test_bug470716.js | [anonymous : 42] test12 == test12
TEST-PASS | e:/builds/moz2_slave/mozilla-central-win32-debug-unittest-xpcshell/build/xpcshell/tests/test_necko/unit/test_bug470716.js | [anonymous : 50] true == true
TEST-PASS | e:/builds/moz2_slave/mozilla-central-win32-debug-unittest-xpcshell/build/xpcshell/tests/test_necko/unit/test_bug470716.js | [anonymous : 59] true == true
TEST-INFO | (xpcshell/head.js) | test 1 finished
TEST-INFO | (xpcshell/head.js) | exiting test
TEST-PASS | (xpcshell/head.js) | 60 (+ 0) check(s) passed

*** Potential deadlock between Lock@44d7fb0 and Lock@44f2f80

Current stack:
DumpJSValue[xul +0x5E045]
NS_GetMemoryManager_P[xul +0x104CD2F]
NS_DestroyXPTCallStub_P[xul +0x10612A8]
NS_FormatCodeAddressDetails[xul +0x106E386]
NS_FormatCodeAddressDetails[xul +0x106EAEC]
DumpJSValue[xul +0x15A6CD]
NS_LogCOMPtrRelease_P[xul +0x102F95E]
NS_LogCOMPtrRelease_P[xul +0x102F8E1]
NS_LogCOMPtrRelease_P[xul +0x102FAFE]
NS_LogCOMPtrRelease_P[xul +0x103195B]
NS_LogCOMPtrRelease_P[xul +0x10318A5]
NS_LogCOMPtrRelease_P[xul +0x1032D8B]
NS_LogCOMPtrRelease_P[xul +0x1032DB6]
NS_LogCOMPtrRelease_P[xul +0x1032D25]
DumpJSValue[xul +0x8344B]
DumpJSValue[xul +0x80E26]
JS_AliasElement[mozjs +0x82DE0]
JS_AliasElement[mozjs +0x81927]
JS_AliasElement[mozjs +0x2195F]
xpcshell+4383[xpcshell +0x4383]
xpcshell+11B56[xpcshell +0x11B56]
xpcshell+119AD[xpcshell +0x119AD]
ProcessIdToSessionId[kernel32 +0x2F23B]

Previous stack:
DumpJSValue[xul +0x5E045]
NS_LogCOMPtrRelease_P[xul +0x102F8D1]
NS_LogCOMPtrRelease_P[xul +0x102F8A3]
NS_LogCOMPtrRelease_P[xul +0x102F766]
DumpJSValue[xul +0x160292]
NS_InvokeByIndex_P[xul +0x1026087]
DumpJSValue[xul +0x7D2E8]
DumpJSValue[xul +0x7B4E5]
DumpJSValue[xul +0x7B237]
DumpJSValue[xul +0x907C0]
main+0[(null) +0x0]
JS_AliasElement[mozjs +0x17437C]
JS_AliasElement[mozjs +0x173D77]
JS_AliasElement[mozjs +0x176A39]
JS_AliasElement[mozjs +0x8DF52]
JS_AliasElement[mozjs +0x8A534]
JS_AliasElement[mozjs +0x2AEA9]
JS_AliasElement[mozjs +0x2ACFC]
xpcshell+546F[xpcshell +0x546F]
xpcshell+434F[xpcshell +0x434F]
xpcshell+11B56[xpcshell +0x11B56]
xpcshell+119AD[xpcshell +0x119AD]
ProcessIdToSessionId[kernel32 +0x2F23B]

###!!! ASSERTION: Potential deadlock between Lock@44d7fb0 and Lock@44f2f80: 'Error', file e:/builds/moz2_slave/mozilla-central-win32-debug/build/obj-firefox/xpcom/build/nsAutoLock.cpp, line 318
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Status: ASSIGNED → RESOLVED
Closed: 14 years ago12 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: