Crash in [@ mozilla::dom::FetchDriver::OnDataAvailable]

RESOLVED FIXED in Firefox 67

Status

()

defect
P1
critical
RESOLVED FIXED
4 months ago
3 months ago

People

(Reporter: marcia, Assigned: edenchuang)

Tracking

({crash, regression})

Trunk
mozilla68
Unspecified
Android
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox66 wontfix, firefox67 fixed, firefox68 fixed)

Details

(Whiteboard: [necko-triaged], crash signature)

Attachments

(3 attachments)

This bug is for crash report bp-e2a901ca-d898-4dc5-bbe9-6d95f0190326.

Seen while looking at Fennec release crash stats, but visible on other branches as well: https://bit.ly/2TwONXr. This was visible in 65/65.0.1 with over 3K in crashes. #19 overall in the current 66 release.

Comments mentioning scrolling and zooming in on openstreet maps.

(40.26% in signature vs 00.46% overall) useragent_locale = fi

Top 10 frames of crashing thread:

0 libxul.so mozilla::dom::FetchDriver::OnDataAvailable dom/fetch/InternalResponse.h:71
1 libxul.so nsCORSListenerProxy::OnDataAvailable netwerk/protocol/http/nsCORSListenerProxy.cpp:636
2 libxul.so mozilla::net::InterceptedHttpChannel::OnDataAvailable netwerk/protocol/http/InterceptedHttpChannel.cpp:1080
3 libxul.so nsInputStreamPump::OnInputStreamReady netwerk/base/nsInputStreamPump.cpp:554
4 libxul.so nsInputStreamReadyEvent::Run xpcom/io/nsStreamUtils.cpp:91
5 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1180
6 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:482
7 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:88
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:315
9 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

Component: DOM: Core & HTML → DOM: Networking
Blocks: 1290481
Priority: -- → P1
Whiteboard: [necko-triaged]

Tom, can you take a look here?

Flags: needinfo?(shes050117)

Sure, I think Honza's analysis is right. It returns NS_OK unexpectedly for StartRequest() so that OnDataAvailable() is possible to be called with a dangling mResponse.

Assignee: nobody → shes050117
Status: NEW → ASSIGNED
Flags: needinfo?(shes050117)
Attachment #9053967 - Attachment description: Bug 1539208 - Make GeneratePaddingInfo infalliable; → Bug 1539208 - P2 - Use RandomNum while failing to get a random number from RandomNumberGenerator;
Attachment #9053967 - Attachment description: Bug 1539208 - P2 - Use RandomNum while failing to get a random number from RandomNumberGenerator; → Bug 1539208 - P2 - Use RandomNum while failing to get a random number from RandomGenerator;
Pushed by shes050117@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/cec67adbdbb6
P1 - Return an error when the Reponse object is null'd; r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/7b17d77256be
P2 - Use RandomNum while failing to get a random number from RandomGenerator; r=baku
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Pushed by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8b7207f23c2f
P2 - Use RandomNum while failing to get a random number from RandomGenerator;
https://hg.mozilla.org/integration/autoland/rev/107061d96dd2
P1 - Return an error when the Reponse object is null'd;
Flags: needinfo?(shes050117)
Status: REOPENED → RESOLVED
Closed: 4 months ago4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Tom, do you estimate that this would be safe to uplift to 67? (Not a lot of crashes on beta 67 but it wasn't a lot either in 66 beta while it is noticeable on 66 release) Thanks

Flags: needinfo?(shes050117)

This should be safe because it only corrects the behavior by returning a proper error (it returned an NS_OK for the failure case) and tries another way to generate a random number by another API.

I'll send an uplift request to Beta.

Flags: needinfo?(shes050117)

Comment on attachment 9054198 [details]
Bug 1539208 - P1 - Return an error when the Reponse object is null'd;

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1290481
  • User impact if declined: It would crash if it failed to get a random number service because the code returned an NS_OK for that case.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): P1 corrects the returning nsresult so it shouldn't have any risky in theory.
    P2 generates a random number by another API while the original service fails so it also shouldn't have any risky in theory.
  • String changes made/needed:
Attachment #9054198 - Flags: approval-mozilla-beta?
Attachment #9053967 - Flags: approval-mozilla-beta?
No longer blocks: 1290481
Regressed by: 1290481

Comment on attachment 9054198 [details]
Bug 1539208 - P1 - Return an error when the Reponse object is null'd;

Crash fix, low risk patch and the volume on 66 release for this signature is medium. Uplift approved for 67 beta 9, thanks!

Attachment #9054198 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9053967 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

hi, crash reports are continuing to come in from builds containing the patch unfortunately.
as per the user at https://support.mozilla.org/questions/1256474 visiting www.sciencespots.com on fennec reproducibly crashes the browser (which i can confirm on 67.0b9).

Flags: needinfo?(shes050117)

Will look into this. Thanks!

I did a stupid check on [1]. However, in Nightly, the fact we didn't hit this nullptr does prove the crash is not coming from Bug 1290481.

We also have an assertion for mResponse inside the ShouldCheckSRI() [2].

I wonder if the problem is something else.

[1] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/dom/fetch/FetchDriver.cpp#1050
[2] https://searchfox.org/mozilla-beta/rev/22be965751d52a56f4e6920d58283152e0d8bec0/dom/fetch/FetchDriver.cpp#1056

Flags: needinfo?(shes050117)
No longer regressed by: 1290481

Eden says he is recently working on issues on Android and is happy to look into this.

Assignee: shes050117 → echuang

Analyzing the case on an emulator, FetchDriver::OnStartRequest() exits in

https://searchfox.org/mozilla-central/source/dom/fetch/FetchDriver.cpp#814

In the case, FetchDriver::mChannel is nullptr, and mResponse is not created in OnStartRequest. Then when we receive the OnDataAvailable, it accesses a nullptr (mResponse->Type()).

https://searchfox.org/mozilla-central/source/dom/fetch/FetchDriver.cpp#1184

Now there are two possible problems.

  1. mChannel should not be nullptr at the moment.
  2. In theory, if FetchDriver::OnStartRequest() returns fail, FetchDriver::OnDataAvailable should not be called. But it is called in this case.

InterceptedHttpChannel::OnStartReqeust() doesn't propagate the error from its mListener. It always returns NS_OK, even mListener->OnStartRequest() failed.

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/InterceptedHttpChannel.cpp#1023

That's the reason why the nsInputStreamPump's state doesn't change as our expected to STATE_STOP.

Yes, that is a huge bug, good catch!

  1. Propagate the mListener's result in InterceptedHttpChannel::OnStartRequest()
    and ::OnStopRequest().

  2. remove unnecessary assertion in FetchDriver::OnStartRequest().

Keywords: checkin-needed
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Pushed by aiakab@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c94a6d04f305
Propagate the result of mListener in InterceptedHttpChannel::OnStartRequest and ::OnStopRequest r=perry,ttung,kershaw

Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 4 months ago3 months ago
Resolution: --- → FIXED

Pascal, need we uplift the patch to the beta?

Flags: needinfo?(pascalc)

(In reply to Eden Chuang[:edenchuang] from comment #26)

Pascal, need we uplift the patch to the beta?

Could you request an uplift and give your evaluation of the risks? I'll look at it tomorrow, thanks!

Flags: needinfo?(pascalc)

Comment on attachment 9060070 [details]
Propagate the result of mListener in InterceptedHttpChannel::OnStartRequest and ::OnStopRequest

Beta/Release Uplift Approval Request

  • User impact if declined: Crash when using ServiceWorker on Android.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not risky. This patch just propagates the nsIStreamListener::OnStartRequest/OnStopRequest results to nsInputStreamPump to make nsInputStreamPump change the state correctly.
  • String changes made/needed: No
Attachment #9060070 - Flags: approval-mozilla-beta?

Comment on attachment 9060070 [details]
Propagate the result of mListener in InterceptedHttpChannel::OnStartRequest and ::OnStopRequest

Low risk crash fix for Android, uplift approved for 67 beta 15, thanks.

Attachment #9060070 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.