Bug 766210 - Delay updating nsPrefetchNode's reference to redirect channel until we know redirect is approved
 Summary: Delay updating nsPrefetchNode's reference to redirect channel until we know r...
 Status: RESOLVED FIXED crash Core Components Networking: HTTP (show other bugs) Trunk All All -- critical (vote) mozilla17 Honza Bambas (:mayhemer) 532972 598790 Show dependency tree / graph

Reported: 2012-06-19 11:16 PDT by Bob Clary [:bc:]
Modified: 2012-08-01 02:54 PDT (History)
5 users (show)
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

Attachments
crash report (61.56 KB, text/plain)
2012-06-19 11:16 PDT, Bob Clary [:bc:]
no flags Details
patch 0 (1.06 KB, patch)
2012-06-19 14:39 PDT, Patrick McManus [:mcmanus]
bzbarsky: review-
Details | Diff | Review
v1 (5.32 KB, patch)
2012-07-11 15:03 PDT, Honza Bambas (:mayhemer)
jduell.mcbugs: review+
honzab.moz: checkin+
Details | Diff | Review

 Bob Clary [:bc:] 2012-06-19 11:16:12 PDT Created attachment 634507 [details] crash report 1. http://www.universalstudioshollywood.com/attractions/transformers-the-ride-3d/ 2. ABORT twice ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file c:\work\mozilla\builds\beta\mozilla\firefox-debug\dist\include\nsCOMPtr.h, line 809 Thread 0 (crashed) 0 KERNELBASE.dll + 0x33e2e eip = 0x75653e2e esp = 0x0031ce40 ebp = 0x0031ce44 ebx = 0x00000001 esi = 0x0913e688 edi = 0x08d35228 eax = 0x00000000 ecx = 0x00000001 edx = 0x00000000 efl = 0x00200202 Found by: given as instruction pointer in context 1 xul.dll!NS_DebugBreak_P [nsDebugImpl.cpp : 369 + 0x4] eip = 0x68bc6af1 esp = 0x0031ce4c ebp = 0x0031d264 Found by: previous frame's frame pointer 2 xul.dll!nsCOMPtr::operator->() [nsCOMPtr.h : 809 + 0x22] eip = 0x676afc4a esp = 0x0031d26c ebp = 0x0031d284 Found by: call frame info 3 xul.dll!nsPrefetchNode::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsPrefetchService.cpp : 338 + 0xa] eip = 0x685878cb esp = 0x0031d28c ebp = 0x0031d290 Found by: call frame info 4 xul.dll!nsStreamListenerTee::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsStreamListenerTee.cpp : 82 + 0x27] eip = 0x676effcb esp = 0x0031d298 ebp = 0x0031d2bc Found by: call frame info 5 xul.dll!nsHttpChannel::OnStopRequest(nsIRequest *,nsISupports *,unsigned int) [nsHttpChannel.cpp : 4495 + 0x55] eip = 0x677ad3e7 esp = 0x0031d2c4 ebp = 0x0031d35c Found by: call frame info Linux, Windows; Beta/14, Aurora/15, Nightly/16 Debug only. Doesn't crash opt builds. I can crash reliably the first time with a fresh profile, but not reliably after that. Try start the browser or use a fresh profile if you can't reproduce. Patrick McManus [:mcmanus] 2012-06-19 14:33:29 PDT #9 0x00007fc903b3acdb in nsPrefetchNode::OnStopRequest (this=0x7fc8def6a3c0, aRequest=0x7fc8ddbfd058, aContext=0x0, aStatus=0) at ../../../spdy/uriloader/prefetch/nsPrefetchService.cpp:306 306 mChannel->GetContentLength(&mBytesRead); (gdb) print mChannel \$1 = { mRawPtr = 0x0 } I've never read this code before, but it seems pretty clear to me the caching channel has been canceled and reset to null. I put in a printf to confirm that and reproduced with it. Patrick McManus [:mcmanus] 2012-06-19 14:39:21 PDT Created attachment 634597 [details] [diff] [review] patch 0 Boris, I'm not very familiar with this code - but the null check appears to be sufficient if the channel has been canceled. Josh Matthews [:jdm] 2012-06-19 15:17:41 PDT I have chased this crash many times before from crash stats (bug 598790) and never been able to come up with a situation in which we would have a success status but null channel. Got any theories? Boris Zbarsky [:bz] 2012-06-20 10:46:22 PDT Yeah, that that was my question. How can we possibly end up with mChannel null but mStatus NS_OK? Patrick, if you can reproduce this, would you mind adding printfs in nsPrefetchNode::CancelChannel and in the HTTP channel Cancel() code listing the addresses of the two objects and the nsresult value being used? Patrick McManus [:mcmanus] 2012-06-20 14:00:28 PDT It involves a redirect. Lovely. The new channel in accepted and the acceptance is posted async to the old channel. The new channel is then canceled. Then the old channel receives the posted onredirectverifycallback(). Then the old channel executes OnStopRequest(NS_OK because it hasn't been canceled). Off the top of my head I would have expected that OnStopRequest to come from the new channel, but maybe I'm wrong on that. jason is the redir guru. Here's the slime trail: nsprefetchnode::AsyncOnChannelRedirect this=0x7f65f3b78e80 mChannel=0x7f65f25dc858 oldChannel=0x7f65f25dc858 newChannel=0x7f65f22c0058 nsprefetchnode::cancelchannel this=0x7f65f3b78e80 mChannel=0x7f65f22c0058 code=804b0002 (binding aborted) nsHttpChannel::Cancel [this=0x7f65f22c0000 status=804b0002] nsHttpChannel::OnRedirectVerifyCallback [this=0x7f65f25dc800] result=0 stack=2 mWaitingForRedirectCallback=1 nsHttpChannel::OnStopRequest [this=0x7f65f25dc800 request=0x7f65f384a200 status=0 mCanceled=0 mStatus=0] I can repro this fairly easily with bob's url and a fresh profile.. about 1 in 5 times.. happens immediately (or not). Boris Zbarsky [:bz] 2012-06-20 14:56:57 PDT > Off the top of my head I would have expected that OnStopRequest to come from the new > channel That's how it should work, I would think, yes. Patrick McManus [:mcmanus] 2012-06-20 17:23:32 PDT (In reply to Boris Zbarsky (:bz) from comment #6) > > Off the top of my head I would have expected that OnStopRequest to come from the new > > channel > > That's how it should work, I would think, yes. maybe the fact that the new channel was canceled before the OnRedirectVerifyCallback was processed made that verification fail (even though it was sent as OK). Hopefully Jason can shed light. Jason Duell [:jduell] (needinfo? me) 2012-06-25 11:57:09 PDT I'll take a look at this, but probably not before the end of the month. Every time I wade into the redirect code my brain hurts :) Boris Zbarsky [:bz] 2012-06-25 22:24:00 PDT Comment on attachment 634597 [details] [diff] [review] patch 0 I really don't think this is the right fix... Honza Bambas (:mayhemer) 2012-06-26 06:23:01 PDT (In reply to Jason Duell (:jduell) from comment #8) > Every time I wade into the redirect code my brain hurts :) I know it well, so I could take a look at this. But as well, probably not sooner then the next month. Honza Bambas (:mayhemer) 2012-07-11 12:59:50 PDT Do I need flash installed to reproduce? Bob Clary [:bc:] 2012-07-11 13:33:56 PDT It is hard to say. The abort is so random it is hard to say if it only appears with Flash. I just got inconsistent results with beta and nightly on win7. I'll try again later. Honza Bambas (:mayhemer) 2012-07-11 14:03:42 PDT Flash is not needed. I managed to reproduce by moving code of nsPrefetchNode::CancelChannel to the tail of nsPrefetchNode::AsyncOnChannelRedirect. Canceling the target (new) channel makes it fail to AsyncOpen. Then we cannot process the redirect and fall back to processing the redirect request normally. Hence OnStopRequest comes from the old channel and status is OK. Honza Bambas (:mayhemer) 2012-07-11 15:03:31 PDT Created attachment 641213 [details] [diff] [review] v1 I am using nsIRedirectResultListener to switch mChannel. That callback is called after the redirect is finally going to happen or not. When it tells the prefetch node redirect is happening, it switches its mChannel to the target (new) channel. This way, when CancelChannel is called after AsyncOnChannelRedirect and before the redirect is actually going to happen, the original channel is canceled and therefor will carry the failure state to OnStopRequest correctly. Bob, could you please test this patch? Try: https://tbpl.mozilla.org/?tree=Try&rev=4568a6fd9ffa As I see this patch now, I believe there are other places in the code that changes the "working" channel of a class in AsyncOnChannelRedirect to something that later may not be the true channel the notifications come from. I think we should check these places and turn them to using OnRedirectResult. OnRedirectResult signature should change to pass also the channel that is going to make the load as an arg to simplify the code. There is an extension https://mxr.mozilla.org/addons/search?find=%2F&string=nsIRedirectResultListener that use this method, but only for logging. Bob Clary [:bc:] 2012-07-12 06:39:11 PDT I used the try build on windows 7 to load the page many times over night and did not experience the abort. Jason Duell [:jduell] (needinfo? me) 2012-07-12 09:45:55 PDT Comment on attachment 641213 [details] [diff] [review] v1 stealing (feel free to chime in too, bz) Jason Duell [:jduell] (needinfo? me) 2012-07-12 15:29:36 PDT Comment on attachment 641213 [details] [diff] [review] v1 Review of attachment 641213 [details] [diff] [review]: ----------------------------------------------------------------- As discussed on IRC I think it would also be helpful to revert bug 584863, so we no longer have the semantics that cancelling a redirected-to channel also vetoes the redirect iff AsyncOpen hasn't been called on it yet. But Honza has convinced me on IRC that the fix here is sufficient, and the bug 584863 reversion too risky to do before the next merge. I've filed bug 773475 for that. (if it turns out this patch is buggy for any reason, I think that patch will also fix this crash, by itself, FWIW). nsPrefetchNode::CancelChannel(nsresult error) { - mChannel->Cancel(error); - mChannel = nsnull; + if (mChannel) { + mChannel->Cancel(error); + mChannel = nsnull; + } While debugging with Honza's suggested extra CancelChannel inserted into AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up. So I suggest the above addition to be safe, even though it's not clear it would ever happen in practice. ::: uriloader/prefetch/nsPrefetchService.cpp @@ +335,1 @@ > return NS_ERROR_NO_INTERFACE; Is there some reason why nsPrefetchNode isn't using the standard QI stuff here? I.e why it fails instead of QIs to nsIRequestObserver, etc. Not important for +r, just curious. Honza Bambas (:mayhemer) 2012-07-12 15:38:04 PDT (In reply to Jason Duell (:jduell) from comment #17) Thanks for the review. > nsPrefetchNode::CancelChannel(nsresult error) > { > - mChannel->Cancel(error); > - mChannel = nsnull; > + if (mChannel) { > + mChannel->Cancel(error); > + mChannel = nsnull; > + } > > While debugging with Honza's suggested extra CancelChannel inserted into > AsyncOnChannelRedirect, I wound up hitting cancel twice, which blows up. So > I suggest the above addition to be safe, even though it's not clear it would > ever happen in practice. Hmmm... I think this is not necessary since we have never crashed here, AFAIK - crash stats are dead... Jason Duell [:jduell] (needinfo? me) 2012-07-12 15:47:36 PDT > I think this is not necessary since we have never crashed here I don't see the harm in fixing the code now in case it ever happens. But either way is OK I guess. Filed bug 773481 for fixing other classes that need the same OnRedirectResult fix we're doing here. Honza Bambas (:mayhemer) 2012-07-31 12:03:38 PDT Comment on attachment 641213 [details] [diff] [review] v1 https://hg.mozilla.org/integration/mozilla-inbound/rev/c55c42ea076e Honza Bambas (:mayhemer) 2012-07-31 12:04:02 PDT https://tbpl.mozilla.org/?tree=Try&rev=6c65467433ca Ed Morley [:emorley] 2012-08-01 02:54:11 PDT https://hg.mozilla.org/mozilla-central/rev/c55c42ea076e

 Note You need to log in before you can comment on or make changes to this bug.