Open Bug 1210747 Opened 9 years ago Updated 2 years ago

Assert_NoQueryNeeded may break thread assertion on CC objects

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

Tracking Status
firefox44 --- affected

People

(Reporter: xidorn, Unassigned)

References

Details

The function Assert_NoQueryNeeded() queries the interface of the target object, and assigns it to an nsCOMPtr. This calls AddRef() on the target object.

On the other hand, for cyclic collection classes, there is an assertion that, AddRef() can only be called in its owning thread.

However, it seems to me that passing already_AddRefed<> from and to an nsCOMPtr is supposed to be thread-safe. Given the facts above, this assumption is totally untrue.

I believe this setting causes various intermittent failure, and caused a perma-fail for my push today...
See Also: → 1210749
Assert_NoQueryNeeded is used by nsCOMPtr, correct? And if you're using a nsCOMPtr, you should expect refcounting on that thread. I don't see how this in particular is a valid concern.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #1)
> Assert_NoQueryNeeded is used by nsCOMPtr, correct?

Yes.

> And if you're using a
> nsCOMPtr, you should expect refcounting on that thread.

Yes?

> I don't see how this
> in particular is a valid concern.

But that is broken by at least nsConsoleService (which causes several intermittent failure and my perma-fail). If you search all uses of NS_ReleaseOnMainThread and NS_ProxyRelease, you may find more places where nsCOMPtr is used in the thread which does not own the object.
For reference, the perma-fail mentioned above (with nsConsoleService) is:
https://treeherder.mozilla.org/logviewer.html#?job_id=15086086&repo=mozilla-inbound

("Hit MOZ_CRASH(nsScriptErrorWithStack not thread-safe)", in nsConsoleService::LogMessageWithMode), when a MessageElement gets destructed.  MessageElement holds a nsCOMPtr to a nsIConsoleMessage, which was presumably created on a different thread.
http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsConsoleService.h?rev=0b69076d3bad#82  )
I think we wouldn't need to do this check when going from nsCOMPtr<T> to nsCOMPtr<T>.  Eg if the pointer was okay before, it should still be okay. That seems to be the case for at least console service. Some kind of template programming should be able to elide the check in that case.
I guess this is trickier than that, because it goes nsCOMPtr<T> -> already_AddRefed<T> -> nsCOMPtr<T>, and already_AddRefed does not guarantee canonicity. The ctor nsCOMPtr(const nsCOMPtr<T>& aSmartPtr) already skips the NO_QUERY_NEEDED check, so I think all that is needed to fix this particular issue is to change MessageElement::forget() to take an nsCOMPtr<T>&, and assign directly without the intermediate already_AddRefed.
Depends on: 1207368
I'll fix this particular case in bug 1207368.
Blocks: 1211040
No longer blocks: 1069192
No longer blocks: 1211040
I think the general fix to this would be to add some kind of already_AddRefedCanonical<T> which maintains the same nsCOMPtr invariant that the pointer must be canonical with respect to T, then convert nsCOMPtr::forget() and the like to use it. That's probably not worth the effort. While there are plenty of mainthread nsRefPtrs being dealt with off the main thread, I don't think there are a ton of nsCOMPtrs. This problem can be hacked around for those places by using swap() between nsCOMPtrs instead of forget(), as I did in bug 1207368.
I hit something this morning that looks very much like this bug. Here's the crashing worker thread's call stack:

>	xul.dll!nsScriptErrorWithStack::AddRef() Line 33	C++
 	xul.dll!nsScriptErrorWithStack::QueryInterface(const nsID & aIID, void * * aInstancePtr) Line 40	C++
 	xul.dll!nsQueryInterface::operator()(const nsID & aIID, void * * aAnswer) Line 14	C++
 	xul.dll!nsCOMPtr<nsIConsoleMessage>::assign_from_qi(const nsQueryInterface aQI, const nsID & aIID) Line 1045	C++
 	xul.dll!nsCOMPtr<nsIConsoleMessage>::nsCOMPtr<nsIConsoleMessage>(const nsQueryInterface aQI) Line 482	C++
 	xul.dll!nsCOMPtr<nsIConsoleMessage>::Assert_NoQueryNeeded() Line 400	C++
 	xul.dll!nsCOMPtr<nsIConsoleMessage>::operator=<nsIConsoleMessage>(already_AddRefed<nsIConsoleMessage> && aRhs) Line 575	C++
 	xul.dll!nsConsoleService::LogMessageWithMode(nsIConsoleMessage * aMessage, nsConsoleService::OutputMode aOutputMode) Line 303	C++
 	xul.dll!nsConsoleService::LogMessage(nsIConsoleMessage * aMessage) Line 204	C++
 	xul.dll!nsContentUtils::LogSimpleConsoleError(const nsAString_internal & aErrorText, const char * classification) Line 3389	C++
 	xul.dll!HandshakeCallback(PRFileDesc * fd, void * client_data) Line 1259	C++
 	nss3.dll!ssl_FinishHandshake(sslSocketStr * ss) Line 143	C
 	nss3.dll!ssl3_FinishHandshake(sslSocketStr * ss) Line 10745	C
 	nss3.dll!ssl3_HandleFinished(sslSocketStr * ss, unsigned char * b, unsigned int length, const SSL3Hashes * hashes) Line 10698	C
 	nss3.dll!ssl3_HandleHandshakeMessage(sslSocketStr * ss, unsigned char * b, unsigned int length) Line 10946	C
 	nss3.dll!ssl3_HandleHandshake(sslSocketStr * ss, sslBufferStr * origBuf) Line 11020	C
 	nss3.dll!ssl3_HandleRecord(sslSocketStr * ss, SSL3Ciphertext * cText, sslBufferStr * databuf) Line 11689	C
 	nss3.dll!ssl3_GatherCompleteHandshake(sslSocketStr * ss, int flags) Line 378	C
 	nss3.dll!ssl_GatherRecord1stHandshake(sslSocketStr * ss) Line 1227	C
 	nss3.dll!ssl_Do1stHandshake(sslSocketStr * ss) Line 109	C
 	nss3.dll!ssl_SecureRecv(sslSocketStr * ss, unsigned char * buf, int len, int flags) Line 1211	C
 	nss3.dll!ssl_Recv(PRFileDesc * fd, void * buf, int len, int flags, unsigned int timeout) Line 2102	C
 	xul.dll!PSMRecv(PRFileDesc * fd, void * buf, int amount, int flags, unsigned int timeout) Line 1419	C++
 	nss3.dll!PR_Recv(PRFileDesc * fd, void * buf, int amount, int flags, unsigned int timeout) Line 189	C
 	xul.dll!nsSocketTransport::IsAlive(bool * result) Line 2258	C++
 	xul.dll!mozilla::net::nsHttpConnection::IsAlive() Line 758	C++
 	xul.dll!mozilla::net::nsHttpConnection::CanReuse() Line 695	C++
 	xul.dll!mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream * in) Line 2023	C++
 	xul.dll!nsSocketInputStream::OnSocketReady(nsresult condition) Line 288	C++
 	xul.dll!nsSocketTransport::OnSocketReady(PRFileDesc * fd, short outFlags) Line 1885	C++
 	xul.dll!nsSocketTransportService::DoPollIteration(bool wait, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> * pollDuration) Line 1087	C++
 	xul.dll!nsSocketTransportService::Run() Line 864	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 867	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 277	C++
	xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate * aDelegate) Line 326	C++
 	xul.dll!MessageLoop::RunInternal() Line 235	C++
 	xul.dll!MessageLoop::RunHandler() Line 228	C++
 	xul.dll!MessageLoop::Run() Line 202	C++
 	xul.dll!nsThread::ThreadFunc(void * aArg) Line 362	C++
 	nss3.dll!_PR_NativeRunThread(void * arg) Line 406	C
 	nss3.dll!pr_root(void * arg) Line 91	C
 	msvcr120.dll!_callthreadstartex() Line 376	C
 	msvcr120.dll!_threadstartex(void * ptd) Line 354	C
 	kernel32.dll!0000000077065a4d()	Unknown
 	ntdll.dll!000000007729b831()	Unknown

If you plow through the thread ownership info (very ugly since VS 2015 doesn't support using the segment registers -- in this case gs -- in any inspector windows!) the nsScriptErrorWithStack object turns out to be owned by the main thread.
What version of Firefox are you running? That looks like it is crashing on the same line I fixed in bug 1207368, so it shouldn't be doing an AddRef there any more.
It's 42.0 x64 debug, so looks like no problem.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.