Last Comment Bug 756551 - spdy stream index integrity checks
: spdy stream index integrity checks
Status: RESOLVED FIXED
[spdy]
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 15 Branch
: x86_64 Linux
: -- enhancement (vote)
: mozilla15
Assigned To: Patrick McManus [:mcmanus]
:
:
Mentors:
Depends on: 759438
Blocks: 753663
  Show dependency treegraph
 
Reported: 2012-05-18 12:25 PDT by Patrick McManus [:mcmanus]
Modified: 2012-05-29 14:39 PDT (History)
2 users (show)
ryanvm: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 0 (11.82 KB, patch)
2012-05-18 12:33 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
part 2 - rev 0 (1.73 KB, patch)
2012-05-18 15:01 PDT, Patrick McManus [:mcmanus]
honzab.moz: review+
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] 2012-05-18 12:25:38 PDT
Due to what looks like an invalid stream issue in bug 753663, sanity check the session stream hash indicies whenever

* we add something to the index
* we remove something from the index
* we lookup a stream from the index (which happens shortly before the crash of 753663).. and handle the failure at runtime

a verification consists of checking the stream for the "I am deleted sentinel", confirming that a stream looked up by its id points back to itself, and confirming that a stream looked up by its transaction ptr points back to itself.
Comment 1 Patrick McManus [:mcmanus] 2012-05-18 12:33:06 PDT
Created attachment 625195 [details] [diff] [review]
patch 0

I think we should push this and see if it helps. I wish I had something firmer to offer.

it adds sanity checks and some runtime handling for them, and it also reverts the previous CanDirectlyActivate/EverUsedSpdy change as that showed 1 crash anyhow.
Comment 2 Patrick McManus [:mcmanus] 2012-05-18 15:01:51 PDT
Created attachment 625275 [details] [diff] [review]
part 2 - rev 0

The only path I can really see that might end up in a double cleanupStream() would be is conjuction with mNeedsCleanup .. that all looks ok to me, but there is a lot of other stuff on the stack while transaction->WriteSegments() gets called so maybe a return code gets munged in some way I'm not seeing.

this patch makes sure that can't happen and is a good addition to the verifications.
Comment 3 Patrick McManus [:mcmanus] 2012-05-21 04:55:25 PDT
Comment on attachment 625195 [details] [diff] [review]
patch 0

I'm eager to get this 2 patch series into (at least aurora) to see if helps with the beta crash on 75366 while there is still time to get it onto beta. The nightly audience is really too small to quickly validate it, averaging just .5 crashes a day on nightly but around 4 on aurora..

if it doesn't bear fruit it can be removed.
Comment 4 Patrick McManus [:mcmanus] 2012-05-21 04:56:07 PDT
Comment on attachment 625275 [details] [diff] [review]
part 2 - rev 0

same reasoning as comment 3 (this is part 2 of same patch)
Comment 5 Honza Bambas (:mayhemer) 2012-05-21 11:12:53 PDT
Comment on attachment 625195 [details] [diff] [review]
patch 0

Review of attachment 625195 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/SpdySession.cpp
@@ +929,5 @@
>    LOG3(("SpdySession::CleanupStream %p %p 0x%x %X\n",
>          this, aStream, aStream->StreamID(), aResult));
>  
> +  if (!VerifyStream(aStream))
> +    return;

Please make sure you can deduce what call to VerifyStream has failed (callstack) when reading the logs.

@@ +1043,5 @@
> +SpdySession::SetInputFrameDataStream(PRUint32 streamID)
> +{
> +  mInputFrameDataStream = mStreamIDHash.Get(streamID);
> +  if (VerifyStream(mInputFrameDataStream, streamID))
> +    return NS_OK;

Maybe add a log, the more the better.

@@ +1205,5 @@
>    if (!self->mInputFrameDataStream) {
>      LOG3(("SpdySession::HandleRstStream %p lookup streamID for RST Frame "
>            "0x%X failed", self, streamID));
>      return NS_ERROR_ILLEGAL_VALUE;
>    }

Maybe also check for rv of as on other places SetInputFrameDataStream() to distinguish?
Comment 6 Patrick McManus [:mcmanus] 2012-05-21 14:13:20 PDT
thanks honza! now with more logging..

https://hg.mozilla.org/mozilla-central/rev/1157a225434a
https://hg.mozilla.org/mozilla-central/rev/64187d60fae7
Comment 7 Lukas Blakk [:lsblakk] use ?needinfo 2012-05-21 16:44:20 PDT
Comment on attachment 625195 [details] [diff] [review]
patch 0

approved, hope we get the needed additional feedback on aurora.
Comment 8 Patrick McManus [:mcmanus] 2012-05-23 05:47:02 PDT
in nightly >= 052203
   aurora >=  052304
Comment 9 Patrick McManus [:mcmanus] 2012-05-24 07:04:19 PDT
its a little early to draw any conclusions here, but on 0523 aurora had no crashes of interest to this bug that used the 0523 build.. it had 3 total fails (using earlier builds) which is less than the 4.5 or so it had been averaging on a weekday. If you want to squint hard enough that's good news implying partial take up of a working fix. But you've really got to squint.

at least it isn't bad news :)

there are 0 nightly crashes with the signature of interest in the last few days.
Comment 10 Patrick McManus [:mcmanus] 2012-05-25 06:08:22 PDT
on 0524 aurora had no crashes of interest to this bug that used builds >= 0523, and 1 crash that used an older unpatched build. we were averaging around 4.5 - so that looks like a very strong data point.

I talked with alex about this and the patch would not be accepted for beta at this point (the threshold is chempsill-worthy at this stage), so we can get a little more data before declaring its fixed up.
Comment 11 Honza Bambas (:mayhemer) 2012-05-25 06:12:05 PDT
So we are actually masking a true problem in the code with these patches, right?
Comment 12 Patrick McManus [:mcmanus] 2012-05-25 06:19:23 PDT
(In reply to Honza Bambas (:mayhemer) from comment #11)
> So we are actually masking a true problem in the code with these patches,
> right?

probably.

it is possible that the mCleanupStream changes in part 2 are the correct fix to a not well identified problem :).

we can find out if that's the case when we have a fresh nightly (ff16) by making the work verifystream() does happen only in debug builds and see if we still have crash-stats data show up.
Comment 13 Honza Bambas (:mayhemer) 2012-05-25 06:55:34 PDT
(In reply to Patrick McManus [:mcmanus] from comment #12)
> it is possible that the mCleanupStream changes in part 2 are the correct fix
> to a not well identified problem :).

Seems most probable to me as well.

> 
> we can find out if that's the case when we have a fresh nightly (ff16) by
> making the work verifystream() does happen only in debug builds and see if
> we still have crash-stats data show up.

Worth to check for this, soon please!

Thanks for fixing this, btw.
Comment 14 Honza Bambas (:mayhemer) 2012-05-25 06:56:15 PDT
(In reply to Patrick McManus [:mcmanus] from comment #10)
> I talked with alex about this and the patch would not be accepted for beta

So it means to disable SPDY on beta, right?
Comment 15 Patrick McManus [:mcmanus] 2012-05-25 07:03:29 PDT
(In reply to Honza Bambas (:mayhemer) from comment #14)
> (In reply to Patrick McManus [:mcmanus] from comment #10)
> > I talked with alex about this and the patch would not be accepted for beta
> 
> So it means to disable SPDY on beta, right?

nope.. just live with it. While this is the #1 spdy issue, this is actually a pretty low crasher in the grand scheme of things.
Comment 16 Patrick McManus [:mcmanus] 2012-05-27 10:16:52 PDT
(In reply to Patrick McManus [:mcmanus] from comment #10)
> on 0524 aurora had no crashes of interest to this bug that used builds >=
> 0523, 

also true of 0525 and 0526. that's enough.

Note You need to log in before you can comment on or make changes to this bug.