Closed Bug 1106396 Opened 10 years ago Closed 9 years ago

[e10s] crash in mozilla::net::HttpChannelParentListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int)

Categories

(Core :: Networking, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
e10s m7+ ---
firefox38 --- affected

People

(Reporter: jdm, Assigned: dragana)

References

()

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(4 files, 5 obsolete files)

+++ This bug was initially created as a clone of Bug #1097878 +++

>Maybe in "mozilla36" the issue was fixed, but "mozilla37" still has it: >https://crash-stats.mozilla.com/report/index/f6517f6e-8158-48fa-81ec-a4bbf2141130
>
>However, it is not easily reproducible since after the crash and restart FF did >not immediately crash again at the same exact place.
>
>So it looks like the bug now triggers only after you use FF for a while and it >somehow accumulates certain level of craziness and crashes *only then*.
Assignee: nobody → dd.mozilla
Attached patch not important fix (obsolete) — Splinter Review
Here is the latest crash:
https://crash-stats.mozilla.com/report/index/d208f0fe-a3b3-47ba-a07b-478f42141212

It has happened right after I restarted FF Nightly (latest version) after the same crash prior to that (55 seconds runtime), so this time it was *reproducible*.

What I did to crash FF Nightly in 55 seconds?

1. Opened this link: https://www.tokyotosho.info/search.php?terms=Ranma&type=0&size_min=100&size_max=&username=

2. Scrolled down to "[Doki] Ranma ½ - Box 1 (960x720 Hi10P BD FLAC)" text-link (biege "Anime" moniker is on the left from it)

3. Via context menu, selected Open Link in Foreground Tab (though you can click on it, usually it will be the same action).

At this moment FF tries to connect with external pick-up application (torrent application) to forward this link to it for further processing, however, it can not do that, it just crashes at this moment.

Dragana, when your fix will take into effect? Thanks in advance.
Flags: needinfo?(dd.mozilla)
I ran into this (Linux x86_64) with the MP3 links on https://archive.org/details/PR_Gnus_Of_The_World_News_1500_to_1599a — but only with my regular browser profile, which has been around for a while and has an entry for “MP3 audio” in about:preferences#applications (currently set to “Always ask”).  A new profile doesn't have that entry and doesn't crash, even with the same addons and roughly similar configuration otherwise.  There's no UI to add/remove those entries and I'm not sure where in the profile directory they're hiding, so I can't confirm that that's enough to reproduce it.
I can confirm that I have torrent client assigned in "Applications" (from "Preferences").

However, there is something strange about it. There are two records on that: 

1) "BitComet file (application/torrent)" with "Use (default)" selected, and I can *not* choose concrete application to, unlike selected in #2, the selection remains untouched, and no error message appears on why FF can not allow me to choose whatever I selected.

2) "BitComet file (application/x-bittorrent)" with "Use BitComet - a BitTorrent client (default)".

Maybe the issue is with crooked #1 entry, which even can not be manually fixed?

(Off topic: Jed, did you ever see on 64-bit Nightly with e10s any of the three screens that attached to Bug #1111209?)
Flags: needinfo?(jld)
Can you try to do the same thing with logging turned on. Can you use all:5 and send me log per e-mail if you do not want to submit it here. It will help a lot. Thanks.
Flags: needinfo?(dd.mozilla) → needinfo?(zxspectrum3579)
This is also enough:
timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
Dragana, how do I turn logging on? I did it once long time ago for a complicated bug case, but have already forgot. As far as I remember I had to first start special application that would spy on FF in the background and log huge amount of FF's activity in a giant file -- depending on how long the tracking is turned on until it stopped.
Flags: needinfo?(zxspectrum3579)
Dragana, I have tried to send you a letter with 7 MB log file in an attachment. Not sure how well it went, my mail service is not that good. Could you please confirm whether you received it?

If not, then let me know, I will try to upload it here.
Thanks a lot, I have received it.
I have sent another letter with "Child-1" (up to 1 MB size) log file according to your instruction with 32-bit testing build. If you did not receive it, let me know.
Letter with main log file sent; just in case, here is the report from the crash under your test build:

https://crash-stats.mozilla.com/report/index/9647d455-cb09-43c8-81c5-cfe2d2141215
The problem for the crash from comment 2 is noscript addon that do not do redirect properly.
It strts redirect with asycOnChannelRedirect but do not send OnRedirectResult so in e10s the redirect is not finished. No-e10s does not need this reply and it is working.

I am not sure if this is the reason for all crashes. Jed do you have noscript addon active?
Flags: needinfo?(jld)
Flags: needinfo?(jld)
Thanks to Dragana for debugging. 

It looks like NoScript is culprit of at least two bugs within e10s mode, with this one being the second. The first I detected was bug #816019.
Yes, I'm using NoScript.
Flags: needinfo?(jld)
(In reply to Jed Davis [:jld] from comment #15)
> Yes, I'm using NoScript.

This might have been fixed — I can't reproduce it in a new profile with NoScript installed, and I can't reproduce it anymore in my regular profile after updating NoScript.
There is no crash in my case any more, but the link from #comment 2 ("Doki ...") does not get forwarded to external application, a torrent tracker. If I press on the link now, nothing happens at all. This is much better than crash, but, Dragana, since you have my test profile, can you please check why FireFox does not perform file handling action (neither "Always ask", nor forwarding link to torrent applications happens)?
Flags: needinfo?(dd.mozilla)
I did check, but I will write it into bug 1111540 because it is more related to that one.
Flags: needinfo?(dd.mozilla)
Attached patch bug_1106396_add_checks_v1.patch (obsolete) — Splinter Review
This crash is caused by non proper redirect. There are no check that redirect is finished before a connection continues except when we have a divert to parent - diversion is broken and it crashes. In the log that I have seen redirect is not finished and divert to parent started and this cause the crash. 

With this patch we will check if redirect is finished for each redirected connection so I expect that more crashes will happened. 
In this bug crashes have happened only (or mostly I am not sure) because of a broken addon(s) It can be that we discover more strange addons and have more crashes, but adding some check for this can make debugging easier.

Honza can you check this, please.
On call of OnStartRequest it is possible to detect that redirect is not finished and it is possible to finish redirect but that will change the architecture so I would not do that.
Attachment #8535039 - Attachment is obsolete: true
Flags: needinfo?(honzab.moz)
I'm not sure what you are trying to achieve with the patch and also from you comment 19 the cause is not clear.  Can you provide more details?
Flags: needinfo?(honzab.moz) → needinfo?(dd.mozilla)

A redirect is finish when OnRedirectResult is called
http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelParentListener.cpp#166
In this function HttpChannelParentListener sets mNextListener to the new redirected HttpChannelParent.

In the log, that crashes, this function is never called, but OnStartRequest is called from the new nsHttpChannel. Further HttpChannelListener calls OnStartRequest of the old HttpChannelParent (the mChannel in HttpChannelParent is not the same as the calling HttpChannel, mChannel is the old one). This OnStartRequest causes divert-to-parent and suspends a channel, but it suspends the old channel because everything is going through the old HttpChannelParent, so the new channel is not suspending and calls OnDataAvailable which cause crash because diversion to parent is not finished jet.

My patch checks whether a redirect is finished when OnStartRequest is called (It should be).
Necko code handles redirect properly, so this will be just a check if somebody change something (to check if it is working properly). An addon was doing a strange redirect which was causing this crash. Jason told me that OnRedirectResult has been introduces for e10s only so without e10s addons have not had to call OnRedirectResult.


I hope this is explains better.
Flags: needinfo?(dd.mozilla)
Just had Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150116030203 CSet: cac6192956ab crash like bp-926ad431-c139-4c72-b6b6-6353e2150116

ADB Helper 0.7.3
Adblock Plus 2.6.6.3883
Click to Play per-element 0.0.7
Diccionario de Español/México 1.1.3
Firefox OS 2.0 Simulator 2.0.20140918
Flashblock 1.5.18 [DISABLED]
Gecko Profiler 1.10.17
Iconic Firefox Menu 2.0 [DISABLED]
IDS_SS_NAME IDS_SS_VERSION [DISABLED]
Nightly Tester Tools 3.7
Pixlr Grabber 2.1.4
Test Pilot 1.2.3 [DISABLED]

I don't even have NoScript my steps involved clicking a PDF result on a Bing search result set.
Version: unspecified → Trunk
Can you reproduce it?

If you can, can you turn on http logging and upload the log?
Flags: needinfo?(alex_mayorga)
Unfortunately no, I tried a few times =(

IIRC it crashed when clicking the top most result at http://www.bing.com/search?q=Java+Netbackup+7.5+client+system+requirements&pc=MOZI&form=MOZSBR
Flags: needinfo?(alex_mayorga)
Thank you.
I will look at it and try to figure out what could have cause the crash
May I ask you to send me http log going trough the step that cause the crash, even if it does not produce a crash (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging).

Thank you!
Flags: needinfo?(alex_mayorga)
Attached patch add checks (obsolete) — Splinter Review
Attachment #8540723 - Attachment is obsolete: true
Attachment #8551152 - Flags: review?(honzab.moz)
Comment on attachment 8551152 [details] [diff] [review]
add checks

Review of attachment 8551152 [details] [diff] [review]:
-----------------------------------------------------------------

Disclaimer: this can be dangerous, we may have a situation where this won't pass but the codepath is correct.  We'll see.

::: netwerk/protocol/http/HttpChannelParent.cpp
@@ +677,5 @@
>    LOG(("HttpChannelParent::OnStartRequest [this=%p, aRequest=%p]\n",
>         this, aRequest));
>  
> +  MOZ_RELEASE_ASSERT(mChannel == aRequest,
> +      "Cannot call OnStartRequest if a redirect is in progress!");

I'd rather have the message as "request and the active channel are different!"

And the check might rather be (static_cast<nsIRequest>(mChannel.get()) == aRequest) but here I just might not know C++ enough :)
Attachment #8551152 - Flags: review?(honzab.moz) → review+
I will add some more details about the crash that was reproducible (comment 2 and 3):

This crashes were cause by noScript addon version 2.6.6. This is fix in version 2.6.9.

The reason for this crash was that a redirect was not finished properly. It has started but OnRedirectResult has not been called. In HttpChannelParentListener::OnRedirectResult the mNextListener is switch from the old HttpChannelParent to the new one. Since this function is not called the listener is not switch and OnStartRequest of the new channel calls OnStartRequest of the old HttpChannelParent, so in HttpChannelParent::OnStartRequest aRequest parameter is the new nsHttpChannel and HttpChannelParent's variable mChannel is the old nsHttpChannel. The patch "add checks" will check this, and there will be more concrete error. 
In current version HttpChannelParent::OnStartRequest works, but DivertToParent does not work because HttpChannelParent suspends the wrong, the old, nsHttpChannel and that case the new nsHttpChannel to continue and and call OnDataAvailable which cause the crash.



noScript addon version 2.6.6 did the following:
on http-on-modify-request suspends original channel and creates  new one.
in nsHttpHChannel::HandleAsyncAbort, in DoNotifyListener calls HttpChannelParentListener::AsyncOnChannelRedirect
Attached patch add checks v2Splinter Review
Check whether redirect is not in progress when OnStartRequest and OnDataAvailable is called. 
This is more secure variant.
Attachment #8551152 - Attachment is obsolete: true
Attachment #8551793 - Flags: review?(honzab.moz)
It looks like I've hit on a reproducible method to hit this crash (at least I think it's this crash). With E10s enabled, I try to download a new build of Synergy. (You can see the problems I had previously doing this in bug 1082134). Since this appears to be e10s related, marking the summary as such.

Dragana - is this the same crash? If not, I'll open a new bug.

1. Log into the synergy download site, with E10S enabled, windows 8.1, 64bit.
2. Choose to download the windows 64bit build of synergy

= actual =
crash

= expected =
download the file

Every time I download, I hit the breakpoint at: https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelParentListener.cpp#90

Crashes: 
* https://crash-stats.mozilla.com/report/index/0a6eb707-d338-43d7-8984-b65602150120
* https://crash-stats.mozilla.com/report/index/bp-fc6798bf-20d3-43ec-bb95-baf792150120

Call Stack:
> xul.dll!mozilla::net::HttpChannelParentListener::OnDataAvailable(nsIRequest * aRequest, nsISupports * aContext, nsIInputStream * aInputStream, unsigned __int64 aOffset, unsigned int aCount) Line 91 C++
  xul.dll!mozilla::net::nsHttpChannel::OnDataAvailable(nsIRequest * request, nsISupports * ctxt, nsIInputStream * input, unsigned __int64 offset, unsigned int count) Line 5700 C++
  xul.dll!nsInputStreamPump::OnStateTransfer() Line 610 C++
  xul.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream) Line 437  C++
  xul.dll!nsInputStreamReadyEvent::Run() Line 90  C++
  xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 861  C++
  xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 99 C++
  xul.dll!MessageLoop::RunHandler() Line 227  C++
  xul.dll!MessageLoop::Run() Line 201 C++
  xul.dll!nsBaseAppShell::Run() Line 166  C++
  xul.dll!nsAppShell::Run() Line 178  C++
  xul.dll!nsAppStartup::Run() Line 282  C++
  xul.dll!XREMain::XRE_mainRun() Line 4145  C++
  xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4221 C++
  xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4442 C++
  firefox.exe!do_main(int argc, char * * argv, nsIFile * xreDirectory) Line 294 C++
  firefox.exe!NS_internal_main(int argc, char * * argv) Line 669  C++
  firefox.exe!wmain(int argc, wchar_t * * argv) Line 124  C++
  [External Code] 
  [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
Flags: needinfo?(dd.mozilla)
Summary: crash in mozilla::net::HttpChannelParentListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int) → [e10s] crash in mozilla::net::HttpChannelParentListener::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int)
Looks like it.

Can you upload http log?
Flags: needinfo?(dd.mozilla) → needinfo?(ctalbert)
Attachment #8551793 - Flags: review?(honzab.moz) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d11387c52d67
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla38 → ---
Attached file httplog.zip
Here are the logs from my session that re-pro'd the crash. One interesting note is that this is not reproducible with a clean profile without a lot of tabs open. This is my main broswer profile with lots of bookmarks, and perhaps more importantly, about 150 tabs open.

Here are the steps used to get this log:
1. Turned on logging for NSPR_LOG_MODULES=timestamp,nsHttp:5,nsSocketT
ransport:5,nsStreamPump:5,nsHostResolver:5
2. Started up the browser with session store. Let everything settle down
3. Loaded the synergy-project.org/Download page (I was already logged in via the cookie, so no auth flow required)
4. Clicked the download link and immediately crashed.

I have both the main log and the child log in this zip file. 

Hope it's useful.
Flags: needinfo?(ctalbert)
Looks like it's line 16664 in the child log where I open a new tab with the synergy download page and begin the download.  (Synergy-project.org was not in the sessions that were restored so that it wouldn't be confusing in the log).
Thank you, Clint!
Attached patch fix nsHttpChannel::Suspend (obsolete) — Splinter Review
I found the reason for non NoScript crashes.
In case of requests partially served from the cache and partially from the network nsHttpChannel::Suspend() only suspends the network stream but not the cache one :)
Attachment #8553635 - Flags: review?(honzab.moz)
Flags: needinfo?(alex_mayorga)
Comment on attachment 8553635 [details] [diff] [review]
fix nsHttpChannel::Suspend

Review of attachment 8553635 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/nsHttpChannel.cpp
@@ +4678,2 @@
>      if (mCachePump)
>          return mCachePump->Suspend();

good catch!  please just suspend both (if both present) and return NS_FAILED(rv1) ? rv1 : rv2, if you got me.
Attachment #8553635 - Flags: review?(honzab.moz) → feedback+
Attached patch fix suspend and resume (obsolete) — Splinter Review
I have also fix resume. That needs to be fix too.
Attachment #8553635 - Attachment is obsolete: true
Attachment #8555101 - Flags: review?(honzab.moz)
Attachment #8555101 - Attachment is obsolete: true
Attachment #8555101 - Flags: review?(honzab.moz)
Attachment #8555103 - Flags: review?(honzab.moz)
This is currently the #7 topcrash on Firefox 38.0a1 with 461/19222 crashes in the last week.
Comment on attachment 8555103 [details] [diff] [review]
fix suspend and resume

Review of attachment 8555103 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab conditioned by a green try run.
Attachment #8555103 - Flags: review?(honzab.moz) → review+
Attachment #8555103 - Flags: checkin?
checkin- needed only for the patch with checkin flag not for the other reviewed patch.
Keywords: checkin-needed
Clint, Can you please try this build:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dd.mozilla@gmail.com-147dff22c4b5/try-win64/firefox-38.0a1.en-US.win64-x86_64.installer.exe

I think this patch will solve you problem, but if it does not please send me a log made with this build.
Flags: needinfo?(ctalbert)
(In reply to Dragana Damjanovic [:dragana] from comment #49)
> Clint, Can you please try this build:
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dd.mozilla@gmail.
> com-147dff22c4b5/try-win64/firefox-38.0a1.en-US.win64-x86_64.installer.exe
> 
> I think this patch will solve you problem, but if it does not please send me
> a log made with this build.

Dragana,

I tried the new build you linked to and it works fine. Sadly, however, today's currently nightly also works fine for me. So, hopefully the patch is working, but my testing is kind of inconclusive since I couldn't easily repro it in today's nightly. Tried doing a bunch of network based things all at once while downloading, flipping between tabs, watching and seeking videos and reloading other sites. Nothing I did with either build would cause the crash.
Flags: needinfo?(ctalbert)
Attachment #8555103 - Flags: checkin? → checkin+
(In reply to Clint Talbert ( :ctalbert ) from comment #50)
> (In reply to Dragana Damjanovic [:dragana] from comment #49)
> > Clint, Can you please try this build:
> > 
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dd.mozilla@gmail.
> > com-147dff22c4b5/try-win64/firefox-38.0a1.en-US.win64-x86_64.installer.exe
> > 
> > I think this patch will solve you problem, but if it does not please send me
> > a log made with this build.
> 
> Dragana,
> 
> I tried the new build you linked to and it works fine. Sadly, however,
> today's currently nightly also works fine for me. So, hopefully the patch is
> working, but my testing is kind of inconclusive since I couldn't easily
> repro it in today's nightly. Tried doing a bunch of network based things all
> at once while downloading, flipping between tabs, watching and seeking
> videos and reloading other sites. Nothing I did with either build would
> cause the crash.

Thank you for trying.
This patch will be in one of the next nightlies so we will see if the crashes without NoScript disappear.
(There were not so may 2-3 a day :)
https://hg.mozilla.org/mozilla-central/rev/9a1f431a00a9
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: