Closed Bug 739023 Opened 8 years ago Closed 8 years ago

crash in nsHttpChannel::OnDataAvailable

Categories

(Core :: Networking: HTTP, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox13 --- unaffected
firefox14 + fixed

People

(Reporter: scoobidiver, Assigned: mcmanus)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: [native-crash])

Crash Data

Attachments

(1 file, 1 obsolete file)

It first appeared in 14.0a1/20120324 and is currently #2 top crasher in 14.0a1 over the last day.
The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab2ff3b5611f&tochange=df1f94b2bdee

Signature 	nsHttpChannel::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) More Reports Search
UUID	9f2a61bb-5e01-45e9-874a-3f2962120325
Date Processed	2012-03-25 06:31:45
Uptime	1401
Last Crash	27.1 minutes before submission
Install Age	15.6 hours since version was first installed.
Install Time	2012-03-24 14:52:24
Product	Firefox
Version	14.0a1
Build ID	20120324031100
Release Channel	nightly
OS	Windows NT
OS Version	6.1.7601 Service Pack 1
Build Architecture	x86
Build Architecture Info	GenuineIntel family 6 model 37 stepping 2
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x1c
App Notes 	
AdapterVendorID: 0x1002, AdapterDeviceID: 0x68c1, AdapterSubsysID: 395117aa, AdapterDriverVersion: 8.772.0.0
Has dual GPUs. GPU #2: AdapterVendorID2: 0x8086, AdapterDeviceID2: 0x0046, AdapterSubsysID2: 395117aa, AdapterDriverVersion2: 8.772.0.0D2D? D2D+ DWrite? DWrite+ D3D10 Layers? D3D10 Layers+ 
EMCheckCompatibility	True	
Total Virtual Memory	4294836224
Available Virtual Memory	3541532672
System Memory Use Percentage	28
Available Page File	3622756352
Available Physical Memory	2904408064

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsHttpChannel::OnDataAvailable 	netwerk/protocol/http/nsHttpChannel.cpp:4596
1 	xul.dll 	nsInputStreamPump::OnStateTransfer 	netwerk/base/src/nsInputStreamPump.cpp:514
2 	xul.dll 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:402
3 	xul.dll 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:114
4 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:657
5 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:110
6 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:201
7 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:175
8 	xul.dll 	nsBaseAppShell::Run 	widget/xpwidgets/nsBaseAppShell.cpp:189
9 	xul.dll 	nsAppShell::Run 	widget/windows/nsAppShell.cpp:267
10 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:295
11 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3703
...

More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsHttpChannel%3A%3AOnDataAvailable%28nsIRequest*%2C+nsISupports*%2C+nsIInputStream*%2C+unsigned+int%2C+unsigned+int%29
Whiteboard: [native-crash]
I just wanted to note in the bug that this is at the top of my coding list, I'm just out of the office this week and it is not clear if I'll have time to tackle it until next monday.
Assignee: nobody → mcmanus
I get this crash on some pages only if i set network.http.pipelining.aggressive true.
(In reply to SpeciesX from comment #2)
> I get this crash on some pages only if i set
> network.http.pipelining.aggressive true.

if you have a url (or a set) where it seems to show up that would be helpful.
Keywords: needURLs
Crashes happens randomly and in my environment, firefox crashes even if network.http.pipelining.aggressive is set off.

As a tendency, once a crash happened with a site, it repeats 2 or 3 times with same site. But once opened correctly, no more crashes...
(In reply to Patrick McManus [:mcmanus] from comment #3)
> if you have a url (or a set) where it seems to show up that would be helpful.

These crash with 100% reliability over here on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120327 Firefox/14.0a1 ID:20120327031149

Anything *.google.com 
https://support.xobni.com/entries/20151838

To add insult to injury, break pad is not catching these reliably.

After crashing all morning today eventually bug 739720 was captured, not sure if it is related.
Blocks: 739762
mresponsehead in nshttpchannel::ondataavailable() is pretty clearly null from those reports. Presumably from TakeResponseHead(). (I can't reproduce, but there are lots of reports :))

that of course should be impossible.
I didn't check the report more deeply but couldn't mResponseHead rather be invalid pointer?  Do you think this is related to the restart logic?  Maybe we could try to land a small patch to disable some of these functionality bits to identify the crash.  It is quit frequent, so we should figure out quickly ;)

I can take care of it if you agree, but I also have a busy week...
(In reply to Honza Bambas (:mayhemer) from comment #7)
> I didn't check the report more deeply but couldn't mResponseHead rather be
> invalid pointer? 

all the crashes are on deref of 0x1c or 0x20 (dep on 32 or 64 bit build), so It seems very likely to be a member off the null object. Some of the other reports look like invalid ptrs :)
(by other reports I meant differnet bugzilla entries - that was unclear)
honza,

given that its hard to really look into this this week - would you considering r? and pushing this for me? We could get data on the crashrate..

diff --git a/netwerk/protocol/http/nsHttpTransaction.cpp b/netwerk/protocol/http                                                                                        /nsHttpTransaction.cpp
--- a/netwerk/protocol/http/nsHttpTransaction.cpp
+++ b/netwerk/protocol/http/nsHttpTransaction.cpp
@@ -835,16 +835,18 @@ nsHttpTransaction::PipelinePosition()
 // nsHttpTransaction <private>
 //-----------------------------------------------------------------------------

 nsresult
 nsHttpTransaction::RestartInProgress()
 {
     NS_ASSERTION(PR_GetCurrentThread() == gSocketThread, "wrong thread");

+    return NS_ERROR_FAILURE;
+
     // Lock RestartInProgress() and TakeResponseHead() against main thread
     MutexAutoLock lock(*nsHttp::GetLock());

     // don't try and restart 0.9
     if (mHaveAllHeaders && !mRestartInProgressVerifier.IsSetup())
         return NS_ERROR_NET_RESET;

     LOG(("Will restart transaction %p and skip first %lld bytes, "
Don't know if this helps - my FF14 crashes when pipelining is enabled and Avira Webguard is running, which AFAIK acts as a proxy. Disabling pipelining or Webguard makes the browser stable again.

(When I first had the problem, I found no difference between running with or withoug Webguard, during the last days this has changed to what I describe above)
(In reply to Bernd Wolff from comment #11)
> Don't know if this helps - my FF14 crashes when pipelining is enabled and
> Avira Webguard is running, which AFAIK acts as a proxy. Disabling pipelining
> or Webguard makes the browser stable again.
> 
> (When I first had the problem, I found no difference between running with or
> withoug Webguard, during the last days this has changed to what I describe
> above)


Bernd, that's a helpful lead - thanks! I won't be able to track it down in earnest until next week but until then let me ask you is that Avira "antivirus" or Avira "Internet Security"? I don't see anything named webguard on their website (anymore?)
I have Avira Antivirus Premium 2012, Webguard is one of the components (the others being Mailguard and Realtime protection). IIRC "Internet Security" adds a firewall.
How to disable pipelining?

All boolean values containing "pipelining" in about:config are false here and the crashes persist on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120328 Firefox/14.0a1 ID:20120328031211
http.network.pipelining was true in my config, after setting it to false the crashes stopped.

My current config values found when filtering "http.pipe" are:

network.http.pipelining.read-timeout;10000
network.http.pipelining.maxsize;300000
network.http.pipelining.max-optimistic-requests;4
network.http.pipelining.maxrequests;50
network.http.pipelining;false
network.http.pipelining.aggressive;false
network.http.pipelining.ssl;false

Broadening the filter to use just "pipe", I find

network.http.proxy.pipelining;true

maxrequests is marked as "user set", but I cannot remember when or why I should have changed it.
(In reply to alex_mayorga from comment #14)
> How to disable pipelining?
> 
> All boolean values containing "pipelining" in about:config are false here
> and the crashes persist on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0)
> Gecko/20120328 Firefox/14.0a1 ID:20120328031211

Then you probably have it off - its not clear that you need to have it on for this to happen. (though that is true for some users)

are you running any firewall software like avira webguard? (comment 13)

can you paste in any non default preferences you have set (about:support will tell you this)
(In reply to Patrick McManus [:mcmanus] from comment #16)
> are you running any firewall software like avira webguard? (comment 13)

No but I'm behind a corporate proxy (Websense IIRC)

> can you paste in any non default preferences you have set (about:support
> will tell you this)

accessibility.typeaheadfind.flashBar 0
browser.cache.disk.capacity 1048576
browser.cache.disk.smart_size.first_run false
browser.cache.disk.smart_size_cached_value 1048576
browser.places.smartBookmarksVersion 3
browser.startup.homepage_override.buildID 20120328031211
browser.startup.homepage_override.mstone 14.0a1
extensions.lastAppVersion 14.0a1
gfx.direct3d.prefer_10_1 true
network.cookie.prefsMigrated true
network.negotiate-auth.delegation-uris .work.net,.work.com
network.negotiate-auth.trusted-uris .work.net,.work.com
places.history.expiration.transient_current_max_pages 104254
privacy.sanitize.migrateFx3Prefs true
security.warn_viewing_mixed false
Everything for "pipe" in about:config has "default" values:

network.http.pipelining;false
network.http.pipelining.aggressive;false
network.http.pipelining.max-optimistic-requests;4
network.http.pipelining.maxrequests;32
network.http.pipelining.maxsize;300000
network.http.pipelining.read-timeout;10000
network.http.pipelining.ssl;false
network.http.proxy.pipelining;false
The crashes caused by https://support.xobni.com/entries/20151838 are now being caught by breakpad on Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:14.0) Gecko/20120329 Firefox/14.0a1 ID:20120329031156 \o/

bp-417dc8c5-3ef9-401f-83d3-701c02120329
bp-96bfabbc-e0a4-4ffa-b06a-89a192120329

Seems like it's bug 739024.
(In reply to Patrick McManus [:mcmanus] from comment #10)
> honza,
> 
> given that its hard to really look into this this week - would you
> considering r? and pushing this for me? We could get data on the crashrate..


Landed investigation patch:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4df2d9a68d6e

Please don't close the bug.
Whiteboard: [native-crash] → [native-crash] [DON'T CLOSE]
Some URLs as requested, these are for nsHttpChannel::OnDataAvailable:

107 \N
     79 about:blank
     10 http://www.repubblica.it/
     10 http://www.boerse.bz/
     10 http://0day.kiev.ua/
      8 http://www.harveynorman.com.au/
      7 http://www.t-online.de/
      6 http://www.112brabantnieuws.nl/
      5 http://www.facebook.com/
      5 https://www.facebook.com/
      5 http://forums.mozillazine.org/viewtopic.php?f=23&t=2448905&p=11857895
      5 http://boards.4chan.org/b/
      5 about:newaddon?id=uriloader@pdf.js
      4 http://www.tagesschau.de/
      4 http://www.penny-arcade.com/
      4 http://www.momanda.de/videos
      4 http://www.facebook.com/
      4 http://de.mg40.mail.yahoo.com/
      4 http://boerse.bz/
      4 about:newtab
      3 http://www.wetteronline.de/
      3 http://www.toyota.de/cars/coming_soon/index.tmex
      3 http://www.repubblica.it/index.html?refresh_ce
      3 http://www.getprice.com.au/D-Link-DHP-501AV-Powerline-AV-500-Network-Adapter-Starter-Kit-Gpnc_53--59619858.htm
      3 http://www.geekstogo.com/forum/topic/297804-my-antivirus-doesnt-work/page__st__45
      3 http://www.ebay.de/itm/Sennheiser-IE-8-Ear-Canal-Phones-Earphone-IE8-New-/290629761152?pt=Kopfh%C3%B6rer&hash=item43aae23880
      3 http://www.divxtotal.com/
      3 http://www.computerbild.de/
      3 http://dimonvideo.ru/articles.html
      3 http://boards.4chan.org/g/
      3 http://bay152.mail.live.com/
      2 http://zhan.renren.com/eblaster?checked=true
      2 http://www.youtube.com/show/minecraft?s=1
      2 http://www.youtube.com/
      2 http://www.unitymedia.de/entertainment/index.html?salesid_nrw=43180&salesid_hessen=43180&cid=UM-04019
      2 http://www.sportschau.de/sp/
      2 http://www.sportmediaset.mediaset.it/
      2 http://www.pixmania.com/televisori/itit3_8_pm.html
      2 http://www.nero.com/rus/store.html?NeroSID=78c09c7881514307b41870fb43a5e922
      2 http://www.mobafire.com/league-of-legends/build/dutchbrothers-volibear-build-155239
      2 http://www.inoprosport.ru/football/
      2 http://www.huyandex.com/
      2 http://www.gamekeyshop.net/
      2 http://www.footytube.com/index.php
      2 http://www.flickr.com/photos/toyota-europe/sets/
      2 http://www.ekonja-verlag.com/
      2 http://www.dobreprogramy.pl/
      2 http://www.bvb.de/?Z%1B%E7%F4%9D
      2 http://www.bom.gov.au/nsw/
      2 http://www.avclub.com/tvclub/tvshow/30-rock,23/
      2 http://www.abnehmen-aktuell.de/
      2 http://www5.407etr.com/tolls/rate-chart-2012.html
      2 http://www.123video.nl/members/home.asp
      2 http://well.ca/
Keywords: needURLs
https://hg.mozilla.org/mozilla-central/rev/4df2d9a68d6e
Whiteboard: [native-crash] [DON'T CLOSE] → [native-crash] [leave open]
Target Milestone: --- → mozilla14
I'm getting this sporadically in SeaMonkey while loading my homepage (200+ tabs). It takes some time to happen, and doesn't happen every time: I've had it twice so far (Sm 2.11a1 [Gecko 14.0a1] on linux-x86_64).
bp-ab98c71f-cc59-47f0-9b27-e54942120326 build ID: 20120326003030
bp-37bf9bc1-7bfe-4d4c-b7c9-2d34f2120331 build ID: 20120330003035
Version: 14 Branch → Trunk
I can repro this with avira webguard, pipelining on, and a build from 3/26

https://crash-stats.mozilla.com/report/index/bp-115d7e92-c6e4-457e-9b49-8aa722120403
https://crash-stats.mozilla.com/report/index/bp-a1084763-00c7-4430-b6c1-18fc42120403
https://crash-stats.mozilla.com/report/index/bp-4f3fa72a-1174-4fb8-952b-a45c12120403

disable pipelines and the crashes go away. proxy and ssl pipeline settings are not relevant.

url is not consistent but redhat.com and ibm.com reproduce after a few seconds of clicking.

crash stats at https://crash-stats.mozilla.com/report/list?signature=nsHttpChannel%3A%3AOnDataAvailable%28nsIRequest*%2C+nsISupports*%2C+nsIInputStream*%2C+unsigned+int%2C+unsigned+int%29 show this signature basically going away starting with the 3/31 nightly. The previous versions crashed 169, 135, 36, and 134 times - but 3/31 and 4/1 have just 1 crash a piece (and there are occasional instances of this crash from before the pipeline code).

The patch in comment 20 that disabled restart-in-progress first appreared in the 3/31 nightly, so we can feel pretty confident the issue is in RIP (and that patch was worth doing for the data - thanks honza!)

I've further confirmed by trying to repro with the 3/30 nightly (positive) and the 3/31 nightly (negative). All of this is on a windows 7 VM with avira webguard.
Attached patch patch 0 (obsolete) — Splinter Review
This turns out to be pretty simple.

TakeResponseHeaders() was returning null if mHaveAllHeaders was false. In the case where we had done a restart mHaveAllHeaders is also reset until the restarted transaction has its headers arrive, even though if it is true at the time of reset the existing headers are set aside. Therefore in TakeResponseHeaders() any set-aside headers can always be returned without fear of them being incomplete. My testing here didn't show that race condition (i.e. I verified the setaside headers being returned from TakeResponseHeaders() but they mHaveAllHeaders happened to have migrated back to true at the time)

The patch is just fyi for the moment while try verifies it and gives me real windows builds I can use to verify the fix. (I was able to simulate it and get the right stack trace on linux by injecting errors). [hmm.. it seems try is closed for the night :(]

this patch also re-enables restartinprogress()
Attached patch patch 1Splinter Review
Update this to only enable restart-in-progress for 200/GET transactions. That's clearly its intended use case and I'm not 100% comfortable with other permutations right now.

(the main fix in this patch is necessary for the race condition with 200/GET, so that limitation is not meant as a workaround - just a further refinement.)
Attachment #612070 - Attachment is obsolete: true
Whiteboard: [native-crash] [leave open] → [native-crash]
Attachment #612194 - Flags: review?(honzab.moz)
Worth noting that even though this was an implementation bug, this use case is a real success story for the overall patch set.

Avira is terminating the pipelines during some large responses and the piepline code is adding a blacklist (so the problem does not recur), is rescheduling aborted parts of the pipeline, and fixing in place the image that was the large response. (well its fixing it in place with this patch applied :)). The avira "proxy" is really an lsp so this is not explicitly detectable otherwise.
Comment on attachment 612194 [details] [diff] [review]
patch 1

Review of attachment 612194 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: netwerk/protocol/http/nsHttpTransaction.cpp
@@ +373,5 @@
> +    if (mForTakeResponseHead) {
> +        head = mForTakeResponseHead;
> +        mForTakeResponseHead = nsnull;
> +        return head;
> +    }

Agrhh... yes, now I can see it.  Hell, I had to catch this during the review!  Good work finding it.

@@ +1621,5 @@
>      if (mSetup)
>          return;
>  
> +    // If mSetup does not transition to true then this verifier cannot
> +    // be used for restartInProgress

Maybe: If mSetup does not transition to true restart in progress is later forbidden.
Attachment #612194 - Flags: review?(honzab.moz) → review+
http://hg.mozilla.org/mozilla-central/rev/f2c92cf02061
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.