Closed Bug 1318759 Opened 8 years ago Closed 8 years ago

Crash in mozilla::net::HttpChannelParent::OnStartRequest

Categories

(Core :: Networking, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: marcia, Assigned: mayhemer)

References

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-653c5f48-b685-42b9-a491-622112161118.
=============================================================

Seen while looking at trunk crash stats: http://bit.ly/2g4vtkC

Crashes on 53 started with Build 20161114043454. Crash is also present on Mac with a slightly different signature.

Here is the checkins between the 13th and 14th: https://mzl.la/2fNdEn6
Assignee: nobody → honzab.moz
Whiteboard: [necko-active]
This is disproportionately common on Mac, with slightly more Mac crashes (11) than Windows crashes (8) over the past 3 days, making it the #1 browser topcrash in that time period.
The correlations report in:
https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&signature=mozilla%3A%3Anet%3A%3AHttpChannelParent%3A%3AOnStartRequest&date=%3E%3D2016-11-13T07%3A16%3A00.000Z&date=%3C2016-12-01T07%3A16%3A00.000Z#correlations
gives:
(100.0% in signature vs 05.72% overall) Addon "Ghostery" = true
(100.0% in signature vs 08.47% overall) Addon "NoScript Security Suite" = true

It also appears to be more common on Linux than Windows.

(There's no correlation report for the separate Mac signature.)
I suspect it's Ghostery that's relevant more so than NoScript, because bug 1319606 (which appears strongly related to this bug) has a 100% Ghostery correlation but no such NoScript correlation.
Going to work on this now.  I'll use URL from https://bugzilla.mozilla.org/show_bug.cgi?id=1317517#c14 to reproduce.
To reproduce the crash one needs to install both addons:

> (100.0% in signature vs 03.01% overall) Addon "NoScript Security Suite" = true
> (100.0% in signature vs 03.25% overall) Addon "Ghostery" = true

W/o NoScript Security Suite, I only experience bug 1322346 and bug 1322355.
The nasty thing is clearly visible here:

2016-12-07 16:28:59.010000 UTC - [Main Thread]: V/nsHttp HttpChannelParent::RecvRedirect2Verify call OnRedirectVerifyCallback [this=184a4e20 result=805303f4, mRedirectCallback=1b996410]
2016-12-07 16:28:59.010000 UTC - [Main Thread]: D/nsHttp nsHttpChannel::OnRedirectVerifyCallback [this=1ceb1800] result=0 stack=1 mWaitingForRedirectCallback=1


HttpChannelParent::RecvRedirect2Verify shows the redirect have been vetoed but nsHttpChannel::OnRedirectVerifyCallback sees the result as NS_OK (i.e. carry on with the redirect).

mRedirectCallback is not the expected nsHttpChannel, but something JS, provided by the extension.  I found a code like this in NoScript:

RCBDelegate.prototype = {
  QueryInterface: XPCOMUtils.generateQI([Ci.sIAsyncVerifyRedirectCallback]),
  onRedirectVerifyCallback: function(result) {
    try {
      if (result !== 0) ns.log("Overriding failed (" + result + ") redirect callback for " + this.label); // plugin failure is 2147500037
      this.delegator.onRedirectVerifyCallback(0);
    } catch (e) {
      ns.log(e);
    }
  }
}

and

  asyncOnChannelRedirect: function(oldChan, newChan, flags, redirectCallback) {
    this.onChannelRedirect(oldChan, newChan, flags);
    redirectCallback.onRedirectVerifyCallback(0);
  },


If this is added on every channel, then it's actually a security issue because it 1) allows redirects we have vetoed based on any content security policy and check-may-load policy and 2) breaks the internal logic, since when we notify redirect result as a failure, we are in a certain state and count on the redirect to not proceed.  I need to check details here first to report this to NoScript, tho.

Still, I will be really happy to move away from xpcom for extensions, specially in this case!!
Status: NEW → ASSIGNED
(backup, needs even more local testing)
Comment on attachment 8817202 [details] [diff] [review]
v1 (Properly query final class in HttpChannelParent::OnStartRequest)

Jason, please try to review or find someone to do it.  This is not a oneliner, but something I wanted to do a long ago.

I was also thinking about simply returning an error from onstartrequest (aka aborting the channel) when nsHttpChannel is not found, because we are then sending the data to HttpChannelChild, which seems a bit weird, at least.

I don't remember why we were doing the static_cast on aRequest at the first place and not use mChannel and assert aRequest == mChannel...
Attachment #8817202 - Attachment description: wip1 → v1 (Properly query final class in HttpChannelParent::OnStartRequest)
Attachment #8817202 - Flags: review?(jduell.mcbugs)
Comment on attachment 8817202 [details] [diff] [review]
v1 (Properly query final class in HttpChannelParent::OnStartRequest)

Review of attachment 8817202 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpBaseChannel.h
@@ +238,5 @@
>    NS_IMETHOD SetIntegrityMetadata(const nsAString& aIntegrityMetadata) override;
> +  virtual mozilla::net::nsHttpChannel * QueryHttpChannelImpl(void) override
> +  {
> +      return nullptr;
> +  }

Same comment as below about inline + virtual here.

::: netwerk/protocol/http/HttpChannelParent.cpp
@@ +1169,5 @@
> +    // XXX here we should throw mChannel away since it no longer points to
> +    // nsHttpChannel we actually work with.  But, it's so heavilly used all over
> +    // this class it's simply not save and easy thing to do.
> +    // The true solution here is to rethink the HttpChannelParent in relation
> +    // to redirects to different schemes.

So... what happens after this?  We call OnStart/Stop on the child with a dummy requesthead?  Should we mark the channel with an error code?

::: netwerk/protocol/http/nsHttpChannel.h
@@ +149,5 @@
>      NS_IMETHOD ForceIntercepted(uint64_t aInterceptionID) override;
> +    virtual mozilla::net::nsHttpChannel * QueryHttpChannelImpl(void) override
> +    {
> +      return this;
> +    }

So this is a virtual function and thus can't be inlined I assume? Is this going to generate a copy of this in each .cpp file that includes it? (Sorry if my knowledge of C++ compilers is out of date).
Attachment #8817202 - Flags: review?(jduell.mcbugs) → review+
For ref, here's a try run that aborts the stream listener contract (returns NS_ERROR_something from onstart) when aRequest is not nsHttpChannel.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6241cbfc5bb118b73cc2a42d2f6e3b4520c0cd5

If it goes well, I might prefer to land that patch.  (Still need to check locally it fixes the crash and doesn't introduce any other problem)
Honza, do you know why this (and the related signatures) would have started on November 14?
Flags: needinfo?(honzab.moz)
Crash Signature: [@ mozilla::net::HttpChannelParent::OnStartRequest] [@ <name omitted> | mozilla::net::HttpChannelParent::OnStartRequest] → [@ mozilla::net::HttpChannelParent::OnStartRequest] [@ <name omitted> | mozilla::net::HttpChannelParent::OnStartRequest] [@ mozilla::net::nsHttpChannel::GetCacheTokenExpirationTime]
differences from v1:
- review comments addressed
- onstart return NS_ERROR_UNEXPECTED when aRequest != nsHttpChannel

Few simplifications to the patch comes with that.  Rather one more try push:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dae0139bd62f0b04e998311342479f0270c6efb7
Attachment #8817202 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
Attachment #8817616 - Flags: review+
(In reply to Nicholas Nethercote [:njn] from comment #14)
> Honza, do you know why this (and the related signatures) would have started
> on November 14?

Likely a NoScript update.  There was a number of updates to that addon since November 13.  Before that the last update was on August 8, 2016.

Ghostery was update on November 28, but that may not have any effect here.
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/291d703d4964
Properly query final class in HttpChannelParent::OnStartRequest to prevent evil static_cast, r=jduell
Keywords: checkin-needed
Need a day or two to confirm the fix is crashed on Nightly. If it is, this should be uplifted to Aurora. (But it doesn't seem to be affecting Beta or Release.)
https://hg.mozilla.org/mozilla-central/rev/291d703d4964
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(In reply to Nicholas Nethercote [:njn] from comment #18)
> Need a day or two to confirm the fix is crashed on Nightly.

We don't have full numbers yet for the 20161214030231 Nightly... but I don't see any crashes matching the three signatures for this bug, which is a good sign. We should have a clearer idea in 24 hours.
> We don't have full numbers yet for the 20161214030231 Nightly... but I don't
> see any crashes matching the three signatures for this bug, which is a good
> sign. We should have a clearer idea in 24 hours.

Updated numbers look good! No crashes after Nightly 20161213030215.

Honza, please nominate this for Aurora uplift.
Status: RESOLVED → VERIFIED
Flags: needinfo?(honzab.moz)
Comment on attachment 8817616 [details] [diff] [review]
v2 (query method for nsHttpChannel, abort from HttpChannelParent when aRequest != nsHttpChannel)

Approval Request Comment
[Feature/Bug causing the regression]: e10s + some addons recent updates
[User impact if declined]: crash during regular browsing with Ghostery and/or NoScript addons
[Is this code covered by automated tests?]: yes (heavily executed http redirect code)
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no need, except verification of the fix with the mentioned addons
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: I think very small, safe enough for aurora
[Why is the change risky/not risky?]: small risk because: the patch changes the way we get some info from the channel and pass down to child process.  Before the patch we get it from a previously stored channel (mChannel) but now we get it from the channel passed to OnStartRequest (aRequest).  There has been added an assertion that mChannel == aRequest (verified by tests).  But in release builds in the wild an inequality caused by some addon may not be caught and I'm not sure what impact that can have.  But in defense, the patched code is actually more correct than before!
[String changes made/needed]:  none
Flags: needinfo?(honzab.moz)
Attachment #8817616 - Flags: approval-mozilla-aurora?
Comment on attachment 8817616 [details] [diff] [review]
v2 (query method for nsHttpChannel, abort from HttpChannelParent when aRequest != nsHttpChannel)

fix top crash in aurora52
Attachment #8817616 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I just recalled nsQueryObject.  We should use it here.  I'll file a separate bug.
Depends on: 1325655
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: