Closed Bug 1407186 Opened 3 years ago Closed 3 years ago

Crash in imgRequest::OnStartRequest

Categories

(Core :: DOM: Service Workers, defect, P2, critical)

58 Branch
Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 --- unaffected
firefox58 --- fixed

People

(Reporter: calixte, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [clouseau])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-45f5ec68-a228-40f1-9ed9-cd49e0171010.
=============================================================

There are 5 crashes in nightly 58 with buildid 20171009220104. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1391693.

[1] https://hg.mozilla.org/mozilla-central/rev?node=4ff6aa3fae0bc95db7fd69dc7537781c57f55e20
Flags: needinfo?(bkelly)
Do you have access to the URLs where these crashes are occurring?  Can you email them to me?  I'd like to try to reproduce.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly) → needinfo?(cdenizet)
Done.
Flags: needinfo?(cdenizet)
I'm having trouble reproducing this locally.  Lets see if the fix in bug 1407225 helps here.  They both seem to relate to redirects.
Depends on: 1407225
See Also: → 1407302
If this persists after bug 1407225, then I'd like to see if my further changes in bug 1204254 help.  I believe the crash here is due to a duplicate OnStartRequest being fired and my Cancel/Abort fixes in bug 1204254 may eliminate that.
Looking at this a bit more, we could also try checking for a failed mStatus here:

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#3006

It seems likely these are channels that are getting canceled in the middle of a the new internal redirect to the InterceptedHttpChannel class.  This may just be an existing race that is not normally triggered in nsHttpChannel on other redirects.
Oh, I actually think I see the problem now.

I think we are double-invoking the listener here:

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/InterceptedHttpChannel.cpp#465

And here:

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#3033

When the nsHttpChannel redirects the InterceptedHttpChannel::AsyncOpen() is probably returning an error.  And we are invoking the listener callback from both classes.

I should change InterceptedHttpChannel::AsyncOpen() to either not return an error in this case or to not invoke the listener itself.
Yea, this is what nsHttpChannel::AsyncOpen() does:

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#6141

It returns NS_OK and invokes the listener with the error itself via AsyncAbort().  InterceptedHttpChannel should do the same.

I'll write a patch for this in the morning.
Priority: -- → P2
I think this test shows the problem.  Our previous interception code did a ResetInterception() in this case, but in InterceptedHttpChannel we are returning an error.  We're also mistakenly double-firing listener callbacks.
Comment on attachment 8917377 [details] [diff] [review]
P2 Test that we reset the interception if ChannelIntercepted() returns an error. r=asuth

So, we had a couple problems with InterceptedHttpChannel::AsyncOpen() error handling.  This patch fixes them by:

1. Don't both return an error value and invoke listener callbacks.  This results in both the caller and the InterceptedHttpChannel invoking the listener callbacks which can cause nullptr derefs like this crash.
2. If the initial interception of a channel fails, we need to ResetInterception() to match previous behavior.  We try to be nice and fallback to network for internal problems.
3. If this is a pre-synthesized channel we fire Cancel() and fire listener callbacks with an error code.  We can't fallback to network after the service worker has already seen the FetchEvent.
Attachment #8917377 - Flags: review?(bugmail)
Comment on attachment 8917404 [details] [diff] [review]
P1 Fix InterceptedHttpChannel::AsyncOpen() error handling. r=asuth

This adds a test to verify we ResetInterception() if something unexpected happens during nsIInterceptedChannel::ChannelIntercepted().  Without the P1 patch we fail this test in non-e10s mode.  With the P1 patch we pass this test in non-e10s mode.
Attachment #8917404 - Flags: review?(bugmail)
Comment on attachment 8917404 [details] [diff] [review]
P1 Fix InterceptedHttpChannel::AsyncOpen() error handling. r=asuth

Review of attachment 8917404 [details] [diff] [review]:
-----------------------------------------------------------------

The MakeScopeExit closed over the rv makes for refreshingly concise code!
Attachment #8917404 - Flags: review?(bugmail) → review+
Attachment #8917377 - Flags: review?(bugmail) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f3b5e007cd1
P1 Fix InterceptedHttpChannel::AsyncOpen() error handling. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a282a37fcb1
P2 Test that we reset the interception if ChannelIntercepted() returns an error. r=asuth
https://hg.mozilla.org/mozilla-central/rev/9f3b5e007cd1
https://hg.mozilla.org/mozilla-central/rev/1a282a37fcb1
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
So far zero crash reports in the first buildid with the fix.  So looking pretty good.

By the way, the quick crash bug was super helpful in getting this fixed.  Thanks!
Duplicate of this bug: 1407302
You need to log in before you can comment on or make changes to this bug.