Closed Bug 766210 Opened 12 years ago Closed 12 years ago

Delay updating nsPrefetchNode's reference to redirect channel until we know redirect is approved

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla17

People

(Reporter: bc, Assigned: mayhemer)

References

()

Details

(Keywords: crash)

Attachments

(2 files, 1 obsolete file)

Attached file crash report
1. http://www.universalstudioshollywood.com/attractions/transformers-the-ride-3d/
2. ABORT twice

ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file c:\work\mozilla\builds\beta\mozilla\firefox-debug\dist\include\nsCOMPtr.h, line 809

Thread 0 (crashed)
 0  KERNELBASE.dll + 0x33e2e
    eip = 0x75653e2e   esp = 0x0031ce40   ebp = 0x0031ce44   ebx = 0x00000001
    esi = 0x0913e688   edi = 0x08d35228   eax = 0x00000000   ecx = 0x00000001
    edx = 0x00000000   efl = 0x00200202
    Found by: given as instruction pointer in context
 1  xul.dll!NS_DebugBreak_P [nsDebugImpl.cpp : 369 + 0x4]
    eip = 0x68bc6af1   esp = 0x0031ce4c   ebp = 0x0031d264
    Found by: previous frame's frame pointer
 2  xul.dll!nsCOMPtr<nsIChannel>::operator->() [nsCOMPtr.h : 809 + 0x22]
    eip = 0x676afc4a   esp = 0x0031d26c   ebp = 0x0031d284
    Found by: call frame info
 3  xul.dll!nsPrefetchNode::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsPrefetchService.cpp : 338 + 0xa]
    eip = 0x685878cb   esp = 0x0031d28c   ebp = 0x0031d290
    Found by: call frame info
 4  xul.dll!nsStreamListenerTee::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsStreamListenerTee.cpp : 82 + 0x27]
    eip = 0x676effcb   esp = 0x0031d298   ebp = 0x0031d2bc
    Found by: call frame info
 5  xul.dll!nsHttpChannel::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsHttpChannel.cpp : 4495 + 0x55]
    eip = 0x677ad3e7   esp = 0x0031d2c4   ebp = 0x0031d35c
    Found by: call frame info

Linux, Windows; Beta/14, Aurora/15, Nightly/16 Debug only. Doesn't crash opt builds.

I can crash reliably the first time with a fresh profile, but not reliably after that. Try start the browser or use a fresh profile if you can't reproduce.
#9  0x00007fc903b3acdb in nsPrefetchNode::OnStopRequest (this=0x7fc8def6a3c0, aRequest=0x7fc8ddbfd058, aContext=0x0, aStatus=0) at ../../../spdy/uriloader/prefetch/nsPrefetchService.cpp:306
306	        mChannel->GetContentLength(&mBytesRead);
(gdb) print mChannel
$1 = {
  mRawPtr = 0x0
}

I've never read this code before, but it seems pretty clear to me the caching channel has been canceled and reset to null.

I put in a printf to confirm that and reproduced with it.
Attached patch patch 0 (obsolete) — Splinter Review
Boris, I'm not very familiar with this code - but the null check appears to be sufficient if the channel has been canceled.
Attachment #634597 - Flags: review?(bzbarsky)
I have chased this crash many times before from crash stats (bug 598790) and never been able to come up with a situation in which we would have a success status but null channel. Got any theories?
Blocks: 598790
Yeah, that that was my question.  How can we possibly end up with mChannel null but mStatus NS_OK?

Patrick, if you can reproduce this, would you mind adding printfs in nsPrefetchNode::CancelChannel and in the HTTP channel Cancel() code listing the addresses of the two objects and the nsresult value being used?
It involves a redirect. Lovely. The new channel in accepted and the acceptance is posted async to the old channel. The new channel is then canceled. Then the old channel receives the posted onredirectverifycallback(). Then the old channel executes OnStopRequest(NS_OK because it hasn't been canceled). Off the top of my head I would have expected that OnStopRequest to come from the new channel, but maybe I'm wrong on that.

jason is the redir guru.

Here's the slime trail:

nsprefetchnode::AsyncOnChannelRedirect this=0x7f65f3b78e80 mChannel=0x7f65f25dc858 oldChannel=0x7f65f25dc858 newChannel=0x7f65f22c0058

nsprefetchnode::cancelchannel this=0x7f65f3b78e80 mChannel=0x7f65f22c0058 code=804b0002 (binding aborted)

nsHttpChannel::Cancel [this=0x7f65f22c0000 status=804b0002]

nsHttpChannel::OnRedirectVerifyCallback [this=0x7f65f25dc800] result=0 stack=2 mWaitingForRedirectCallback=1

nsHttpChannel::OnStopRequest [this=0x7f65f25dc800 request=0x7f65f384a200 status=0 mCanceled=0 mStatus=0]


I can repro this fairly easily with bob's url and a fresh profile.. about 1 in 5 times.. happens immediately (or not).
> Off the top of my head I would have expected that OnStopRequest to come from the new
> channel

That's how it should work, I would think, yes.
(In reply to Boris Zbarsky (:bz) from comment #6)
> > Off the top of my head I would have expected that OnStopRequest to come from the new
> > channel
> 
> That's how it should work, I would think, yes.

maybe the fact that the new channel was canceled before the OnRedirectVerifyCallback was processed made that verification fail (even though it was sent as OK). Hopefully Jason can shed light.
I'll take a look at this, but probably not before the end of the month.   

Every time I wade into the redirect code my brain hurts :)
Comment on attachment 634597 [details] [diff] [review]
patch 0

I really don't think this is the right fix...
Attachment #634597 - Flags: review?(bzbarsky) → review-
(In reply to Jason Duell (:jduell) from comment #8)
> Every time I wade into the redirect code my brain hurts :)

I know it well, so I could take a look at this.  But as well, probably not sooner then the next month.
Assignee: nobody → honzab.moz
Do I need flash installed to reproduce?
It is hard to say. The abort is so random it is hard to say if it only appears with Flash. I just got inconsistent results with beta and nightly on win7. I'll try again later.
Flash is not needed.  I managed to reproduce by moving code of nsPrefetchNode::CancelChannel to the tail of nsPrefetchNode::AsyncOnChannelRedirect.

Canceling the target (new) channel makes it fail to AsyncOpen.  Then we cannot process the redirect and fall back to processing the redirect request normally.  Hence OnStopRequest comes from the old channel and status is OK.
Attached patch v1Splinter Review
I am using nsIRedirectResultListener to switch mChannel.  That callback is called after the redirect is finally going to happen or not.  When it tells the prefetch node redirect is happening, it switches its mChannel to the target (new) channel.

This way, when CancelChannel is called after AsyncOnChannelRedirect and before the redirect is actually going to happen, the original channel is canceled and therefor will carry the failure state to OnStopRequest correctly.

Bob, could you please test this patch?  Try: https://tbpl.mozilla.org/?tree=Try&rev=4568a6fd9ffa



As I see this patch now, I believe there are other places in the code that changes the "working" channel of a class in AsyncOnChannelRedirect to something that later may not be the true channel the notifications come from.  I think we should check these places and turn them to using OnRedirectResult.  OnRedirectResult signature should change to pass also the channel that is going to make the load as an arg to simplify the code.

There is an extension https://mxr.mozilla.org/addons/search?find=%2F&string=nsIRedirectResultListener that use this method, but only for logging.
Attachment #634597 - Attachment is obsolete: true
Attachment #641213 - Flags: review?(bzbarsky)
Status: NEW → ASSIGNED
I used the try build on windows 7 to load the page many times over night and did not experience the abort.
Comment on attachment 641213 [details] [diff] [review]
v1

stealing (feel free to chime in too, bz)
Attachment #641213 - Flags: review?(bzbarsky) → review?(jduell.mcbugs)
Comment on attachment 641213 [details] [diff] [review]
v1

Review of attachment 641213 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC I think it would also be helpful to revert bug 584863, so we no longer have the semantics that cancelling a redirected-to channel also vetoes the redirect iff AsyncOpen hasn't been called on it yet.   But Honza has convinced me on IRC that the fix here is sufficient, and the bug 584863 reversion too risky to do before the next merge.  I've filed bug 773475 for that. (if it turns out this patch is buggy for any reason, I think that patch will also fix this crash, by itself, FWIW).

  nsPrefetchNode::CancelChannel(nsresult error)
 {
-    mChannel->Cancel(error);
-    mChannel = nsnull;
+    if (mChannel) {
+        mChannel->Cancel(error);
+        mChannel = nsnull;
+    }

While debugging with Honza's suggested extra CancelChannel inserted into AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up.  So I suggest the above addition to be safe, even though it's not clear it would ever happen in practice.

::: uriloader/prefetch/nsPrefetchService.cpp
@@ +335,1 @@
>      return NS_ERROR_NO_INTERFACE;

Is there some reason why nsPrefetchNode isn't using the standard QI stuff here?  I.e why it fails instead of QIs to nsIRequestObserver, etc.

Not important for +r, just curious.
Attachment #641213 - Flags: review?(jduell.mcbugs) → review+
(In reply to Jason Duell (:jduell) from comment #17)

Thanks for the review.

>   nsPrefetchNode::CancelChannel(nsresult error)
>  {
> -    mChannel->Cancel(error);
> -    mChannel = nsnull;
> +    if (mChannel) {
> +        mChannel->Cancel(error);
> +        mChannel = nsnull;
> +    }
> 
> While debugging with Honza's suggested extra CancelChannel inserted into
> AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up.  So
> I suggest the above addition to be safe, even though it's not clear it would
> ever happen in practice.

Hmmm... I think this is not necessary since we have never crashed here, AFAIK - crash stats are dead...
> I think this is not necessary since we have never crashed here

I don't see the harm in fixing the code now in case it ever happens.  But either way is OK I guess.

Filed bug 773481 for fixing other classes that need the same OnRedirectResult fix we're doing here.
Summary: ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0' [@ !nsPrefetchNode::OnStopRequest] → Delay updating nsPrefetchNode's reference to redirect channel until we know redirect is approved
https://hg.mozilla.org/mozilla-central/rev/c55c42ea076e
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: