Last Comment Bug 766210 - Delay updating nsPrefetchNode's reference to redirect channel until we know redirect is approved
: Delay updating nsPrefetchNode's reference to redirect channel until we know r...
Status: RESOLVED FIXED
: crash
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla17
Assigned To: Honza Bambas (:mayhemer)
:
Mentors:
http://www.universalstudioshollywood....
Depends on:
Blocks: 532972 598790
  Show dependency treegraph
 
Reported: 2012-06-19 11:16 PDT by Bob Clary [:bc:]
Modified: 2012-08-01 02:54 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
crash report (61.56 KB, text/plain)
2012-06-19 11:16 PDT, Bob Clary [:bc:]
no flags Details
patch 0 (1.06 KB, patch)
2012-06-19 14:39 PDT, Patrick McManus [:mcmanus]
bzbarsky: review-
Details | Diff | Splinter Review
v1 (5.32 KB, patch)
2012-07-11 15:03 PDT, Honza Bambas (:mayhemer)
jduell.mcbugs: review+
honzab.moz: checkin+
Details | Diff | Splinter Review

Description Bob Clary [:bc:] 2012-06-19 11:16:12 PDT
Created attachment 634507 [details]
crash report

1. http://www.universalstudioshollywood.com/attractions/transformers-the-ride-3d/
2. ABORT twice

ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file c:\work\mozilla\builds\beta\mozilla\firefox-debug\dist\include\nsCOMPtr.h, line 809

Thread 0 (crashed)
 0  KERNELBASE.dll + 0x33e2e
    eip = 0x75653e2e   esp = 0x0031ce40   ebp = 0x0031ce44   ebx = 0x00000001
    esi = 0x0913e688   edi = 0x08d35228   eax = 0x00000000   ecx = 0x00000001
    edx = 0x00000000   efl = 0x00200202
    Found by: given as instruction pointer in context
 1  xul.dll!NS_DebugBreak_P [nsDebugImpl.cpp : 369 + 0x4]
    eip = 0x68bc6af1   esp = 0x0031ce4c   ebp = 0x0031d264
    Found by: previous frame's frame pointer
 2  xul.dll!nsCOMPtr<nsIChannel>::operator->() [nsCOMPtr.h : 809 + 0x22]
    eip = 0x676afc4a   esp = 0x0031d26c   ebp = 0x0031d284
    Found by: call frame info
 3  xul.dll!nsPrefetchNode::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsPrefetchService.cpp : 338 + 0xa]
    eip = 0x685878cb   esp = 0x0031d28c   ebp = 0x0031d290
    Found by: call frame info
 4  xul.dll!nsStreamListenerTee::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsStreamListenerTee.cpp : 82 + 0x27]
    eip = 0x676effcb   esp = 0x0031d298   ebp = 0x0031d2bc
    Found by: call frame info
 5  xul.dll!nsHttpChannel::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsHttpChannel.cpp : 4495 + 0x55]
    eip = 0x677ad3e7   esp = 0x0031d2c4   ebp = 0x0031d35c
    Found by: call frame info

Linux, Windows; Beta/14, Aurora/15, Nightly/16 Debug only. Doesn't crash opt builds.

I can crash reliably the first time with a fresh profile, but not reliably after that. Try start the browser or use a fresh profile if you can't reproduce.
Comment 1 Patrick McManus [:mcmanus] 2012-06-19 14:33:29 PDT
#9  0x00007fc903b3acdb in nsPrefetchNode::OnStopRequest (this=0x7fc8def6a3c0, aRequest=0x7fc8ddbfd058, aContext=0x0, aStatus=0) at ../../../spdy/uriloader/prefetch/nsPrefetchService.cpp:306
306	        mChannel->GetContentLength(&mBytesRead);
(gdb) print mChannel
$1 = {
  mRawPtr = 0x0
}

I've never read this code before, but it seems pretty clear to me the caching channel has been canceled and reset to null.

I put in a printf to confirm that and reproduced with it.
Comment 2 Patrick McManus [:mcmanus] 2012-06-19 14:39:21 PDT
Created attachment 634597 [details] [diff] [review]
patch 0

Boris, I'm not very familiar with this code - but the null check appears to be sufficient if the channel has been canceled.
Comment 3 Josh Matthews [:jdm] 2012-06-19 15:17:41 PDT
I have chased this crash many times before from crash stats (bug 598790) and never been able to come up with a situation in which we would have a success status but null channel. Got any theories?
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2012-06-20 10:46:22 PDT
Yeah, that that was my question.  How can we possibly end up with mChannel null but mStatus NS_OK?

Patrick, if you can reproduce this, would you mind adding printfs in nsPrefetchNode::CancelChannel and in the HTTP channel Cancel() code listing the addresses of the two objects and the nsresult value being used?
Comment 5 Patrick McManus [:mcmanus] 2012-06-20 14:00:28 PDT
It involves a redirect. Lovely. The new channel in accepted and the acceptance is posted async to the old channel. The new channel is then canceled. Then the old channel receives the posted onredirectverifycallback(). Then the old channel executes OnStopRequest(NS_OK because it hasn't been canceled). Off the top of my head I would have expected that OnStopRequest to come from the new channel, but maybe I'm wrong on that.

jason is the redir guru.

Here's the slime trail:

nsprefetchnode::AsyncOnChannelRedirect this=0x7f65f3b78e80 mChannel=0x7f65f25dc858 oldChannel=0x7f65f25dc858 newChannel=0x7f65f22c0058

nsprefetchnode::cancelchannel this=0x7f65f3b78e80 mChannel=0x7f65f22c0058 code=804b0002 (binding aborted)

nsHttpChannel::Cancel [this=0x7f65f22c0000 status=804b0002]

nsHttpChannel::OnRedirectVerifyCallback [this=0x7f65f25dc800] result=0 stack=2 mWaitingForRedirectCallback=1

nsHttpChannel::OnStopRequest [this=0x7f65f25dc800 request=0x7f65f384a200 status=0 mCanceled=0 mStatus=0]


I can repro this fairly easily with bob's url and a fresh profile.. about 1 in 5 times.. happens immediately (or not).
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2012-06-20 14:56:57 PDT
> Off the top of my head I would have expected that OnStopRequest to come from the new
> channel

That's how it should work, I would think, yes.
Comment 7 Patrick McManus [:mcmanus] 2012-06-20 17:23:32 PDT
(In reply to Boris Zbarsky (:bz) from comment #6)
> > Off the top of my head I would have expected that OnStopRequest to come from the new
> > channel
> 
> That's how it should work, I would think, yes.

maybe the fact that the new channel was canceled before the OnRedirectVerifyCallback was processed made that verification fail (even though it was sent as OK). Hopefully Jason can shed light.
Comment 8 Jason Duell [:jduell] (needinfo me) 2012-06-25 11:57:09 PDT
I'll take a look at this, but probably not before the end of the month.   

Every time I wade into the redirect code my brain hurts :)
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2012-06-25 22:24:00 PDT
Comment on attachment 634597 [details] [diff] [review]
patch 0

I really don't think this is the right fix...
Comment 10 Honza Bambas (:mayhemer) 2012-06-26 06:23:01 PDT
(In reply to Jason Duell (:jduell) from comment #8)
> Every time I wade into the redirect code my brain hurts :)

I know it well, so I could take a look at this.  But as well, probably not sooner then the next month.
Comment 11 Honza Bambas (:mayhemer) 2012-07-11 12:59:50 PDT
Do I need flash installed to reproduce?
Comment 12 Bob Clary [:bc:] 2012-07-11 13:33:56 PDT
It is hard to say. The abort is so random it is hard to say if it only appears with Flash. I just got inconsistent results with beta and nightly on win7. I'll try again later.
Comment 13 Honza Bambas (:mayhemer) 2012-07-11 14:03:42 PDT
Flash is not needed.  I managed to reproduce by moving code of nsPrefetchNode::CancelChannel to the tail of nsPrefetchNode::AsyncOnChannelRedirect.

Canceling the target (new) channel makes it fail to AsyncOpen.  Then we cannot process the redirect and fall back to processing the redirect request normally.  Hence OnStopRequest comes from the old channel and status is OK.
Comment 14 Honza Bambas (:mayhemer) 2012-07-11 15:03:31 PDT
Created attachment 641213 [details] [diff] [review]
v1

I am using nsIRedirectResultListener to switch mChannel.  That callback is called after the redirect is finally going to happen or not.  When it tells the prefetch node redirect is happening, it switches its mChannel to the target (new) channel.

This way, when CancelChannel is called after AsyncOnChannelRedirect and before the redirect is actually going to happen, the original channel is canceled and therefor will carry the failure state to OnStopRequest correctly.

Bob, could you please test this patch?  Try: https://tbpl.mozilla.org/?tree=Try&rev=4568a6fd9ffa



As I see this patch now, I believe there are other places in the code that changes the "working" channel of a class in AsyncOnChannelRedirect to something that later may not be the true channel the notifications come from.  I think we should check these places and turn them to using OnRedirectResult.  OnRedirectResult signature should change to pass also the channel that is going to make the load as an arg to simplify the code.

There is an extension https://mxr.mozilla.org/addons/search?find=%2F&string=nsIRedirectResultListener that use this method, but only for logging.
Comment 15 Bob Clary [:bc:] 2012-07-12 06:39:11 PDT
I used the try build on windows 7 to load the page many times over night and did not experience the abort.
Comment 16 Jason Duell [:jduell] (needinfo me) 2012-07-12 09:45:55 PDT
Comment on attachment 641213 [details] [diff] [review]
v1

stealing (feel free to chime in too, bz)
Comment 17 Jason Duell [:jduell] (needinfo me) 2012-07-12 15:29:36 PDT
Comment on attachment 641213 [details] [diff] [review]
v1

Review of attachment 641213 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC I think it would also be helpful to revert bug 584863, so we no longer have the semantics that cancelling a redirected-to channel also vetoes the redirect iff AsyncOpen hasn't been called on it yet.   But Honza has convinced me on IRC that the fix here is sufficient, and the bug 584863 reversion too risky to do before the next merge.  I've filed bug 773475 for that. (if it turns out this patch is buggy for any reason, I think that patch will also fix this crash, by itself, FWIW).

  nsPrefetchNode::CancelChannel(nsresult error)
 {
-    mChannel->Cancel(error);
-    mChannel = nsnull;
+    if (mChannel) {
+        mChannel->Cancel(error);
+        mChannel = nsnull;
+    }

While debugging with Honza's suggested extra CancelChannel inserted into AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up.  So I suggest the above addition to be safe, even though it's not clear it would ever happen in practice.

::: uriloader/prefetch/nsPrefetchService.cpp
@@ +335,1 @@
>      return NS_ERROR_NO_INTERFACE;

Is there some reason why nsPrefetchNode isn't using the standard QI stuff here?  I.e why it fails instead of QIs to nsIRequestObserver, etc.

Not important for +r, just curious.
Comment 18 Honza Bambas (:mayhemer) 2012-07-12 15:38:04 PDT
(In reply to Jason Duell (:jduell) from comment #17)

Thanks for the review.

>   nsPrefetchNode::CancelChannel(nsresult error)
>  {
> -    mChannel->Cancel(error);
> -    mChannel = nsnull;
> +    if (mChannel) {
> +        mChannel->Cancel(error);
> +        mChannel = nsnull;
> +    }
> 
> While debugging with Honza's suggested extra CancelChannel inserted into
> AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up.  So
> I suggest the above addition to be safe, even though it's not clear it would
> ever happen in practice.

Hmmm... I think this is not necessary since we have never crashed here, AFAIK - crash stats are dead...
Comment 19 Jason Duell [:jduell] (needinfo me) 2012-07-12 15:47:36 PDT
> I think this is not necessary since we have never crashed here

I don't see the harm in fixing the code now in case it ever happens.  But either way is OK I guess.

Filed bug 773481 for fixing other classes that need the same OnRedirectResult fix we're doing here.
Comment 21 Honza Bambas (:mayhemer) 2012-07-31 12:04:02 PDT
https://tbpl.mozilla.org/?tree=Try&rev=6c65467433ca
Comment 22 Ed Morley [:emorley] 2012-08-01 02:54:11 PDT
https://hg.mozilla.org/mozilla-central/rev/c55c42ea076e

Note You need to log in before you can comment on or make changes to this bug.