This bug was filed from the Socorro interface and is
report bp-5f2749ad-852f-4233-ae16-dfc0c2120529 .
It was a low volume crash but there's a spike in crashes from 14.0a2/20120519 and 15.0a1/20120522. The m-c regression range is:
It's likely a regression from bug 756551.
More reports at:
Even though its probably the same issue as 756551 (the crash rate is pretty much identical), at least it provides new avenues to explore.
These are my notes on the topic. I had to write them down before quitting for the day, and the bug seems like as good a place as any - rampant speculation follows.
clearly mTransaction::mConnection is null in a transaction that is attached to a spdy stream. The stack traces are clear on that.
A stream only gets ctor'd in one place, and the transaction associated with the stream never changes, and the connection for that transaction is always set to SpdySession::this on the line right before the stream ctor. That nsHttpTransaction::mConnection can be changed by nsHttpTransaction::SetConnection() at any point, however.
So it seems that we have either seen mTransaction::SetConnection(nsnull) after the stream was created or perhaps the transaction has been deleted. Both are possible but the transaction object has accumulated quite a stack by the time of the crash which would seem unlikely if it had been deleted.
There are two places that do explicity SetConnection(nsnull).. one is in connectionmgr::dispatchAbstractTransaction on the Activate() failure of a non-spdy connection. That's probably not relevant.
The other place is in the ProcessNewTransaction() stage of a STICKY transaction. In that case the trans::mConnection is set to null, and if the connection is spdy, nshttpconnection::Activate() is called which calls nshttpconnection::addtransaction() which calls spdysession::addstream() with the new transaction.
this is where we find some of the code from 756551.. session::addstream() double checks to see if the new transaction is already in the session transaction hash and addstreaam now fails if that is true.
Without that check in place its easy to see how the hash gets out of sync (two streams with different ids share the same nshttptransaction object) and how that would lead to those corrupt stream crashes we had been previously seeing.
What happens now is the new stream doesn't get created, but the old one must still exist (we've just seen its in the hash afterall) and when it does get some data and goes to deref nshttptransaction::mconnection the value of mConnection has been reset to null back in ProcessNewTransaction() and thus the crash.
iirc STICKY only happens in two places - websockets (almost certainly not happening here) and auth retry. I will walk through the auth retry cases and see if it leads to this. At first glance, it makes sense that it does.
Not sure if this helps, but IIRC this crash happened for me while I wasn't using the browser; it was idle. Most activity would probably have come from a GMail or GCalendar tab, or an auto-refreshing tab from work that uses HTTP basic authentication.
(In reply to Patrick McManus [:mcmanus] from comment #2)
> So it seems that we have either seen mTransaction::SetConnection(nsnull)
> after the stream was created or perhaps the transaction has been deleted.
> Both are possible but the transaction object has accumulated quite a stack
> by the time of the crash which would seem unlikely if it had been deleted.
There are a couple other ways that nsHttpTransaction::mConnection can become NULL.. 1] via its own close() method - but then we wouldn't see the crash stacks processing content if that were true, and also 2] by its own restart() method - which is very suspicous looking!
when a transaction does a restart() it clears mConnection and resubmits itself to the ConnectionManager without deleting the transaction object or calling CloseTransaction() or anything like that. That will result in the same transaction being submitted to the transaction hash twice and is consistent with this stack crash and the corrupted streams we saw before the integrity checks went in. The more I think about this, the more it seems to be clearly the real culprit.
a] drops its reference to mConnection (spdysession)
b] resubmits itself to the connection manager through InitiateTransaction() which will try to create a new stream with the same transaction. The VerifyStream() code prevents that from happening.
at that moment the stream which contained the transaction that was restarted is not necessarily dead yet (though it will presumably be soon), and the transaction pointer has the potential to be in the session hash twice if verifystream didn't prevent it. It's my theory that this matches up to the UpdateTransportEvents() crashes we were seeing.
If verifystream() does prevent this addition there is still the problem that the transaction has no mConnection field but there is an active spdy stream that might try and dispatch onto it if a correlated spdy frame arrives under the old id... dropping the connection reference isn't going to shutdown a muxxed connection. It's my theory that this matches up to the crashes that are part of this bug.
Even if the transaction ended up in 2 hashes with distinct namespaces (i.e. different sessions, as would happen based on a ping timeout) bad things would happen as 2 streams would still share the same transaction. That doesn't quite match the first crash data though. I'm not positive the path for restart that is getting called.
Created attachment 629219 [details] [diff] [review]
The initiateTransaction call in nsHttpTransaction is indeed pretty sketchy because it creates the same transaction ptr on two streams at the same time.
OTOH that initiate gets posted to the event queue, so the rest of the code on the stack should run before it actually gets added. And that code should include disabling the stream that has been restarted. That's been the case in every instance of this I've been able to walk through.
So I'm working on the assumption that the cleanup doesn't happen immediately at some point in the spdy code.
My #1 suspect here is Session::Close(), wwherein the transactionhash is enumerated and the stream->Close() is done on each one (which could cause the restart), but the stream is not removed from the transaction hash, the id hash, or indeed from the various queues (readyforwrite, urgentforwrite, queuedstreams). The key point is that Session::Close completes having called stream->Close() on every stream but leaves those streams in all of the various indicies.
If we've done Close() we are headed towards dtor of the session eventually, but its not necessarily immediate.
The same thing is done in the dtor, but in the dtor all those things are about to disappear anyhow and that's not necessarily true in Close(). indeed the dtor will re-close all those streams which is a bad thing if the transaction associated with the stream is running on another connection now :)
Comment on attachment 629219 [details] [diff] [review]
Review of attachment 629219 [details] [diff] [review]:
I cannot confirm this fixes the problem, however this seems to be a good code.
Some more thoughts:
- what about disabling AddStream when session is mClosed or even !CanReuse()?
- what about adding !mClosed to RoomForMoreStream or, add && mSpdySession->CanReuse() to nsHttpConnection::CanDirectlyActivate()?
That should definitely prevent adding streams to an already dead session.
on inbound, not m-c. shouldn't have marked fixed.
this patch is in the 0605 builds of both FF16 (nightly) and FF15 (auorora).. To this early point there are no crashes with this signature for those builds.
FF14 remains unpatched at this time.. builds earlier than 0605 don't have the patch either.