Closed Bug 1163900 Opened 9 years ago Closed 9 years ago

crash in mozilla::net::nsHttpChannelCacheKey::GetData(unsigned int*, nsACString_internal&)

Categories

(Core :: Networking, defect)

Unspecified
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
e10s m7+ ---
firefox39 --- unaffected
firefox40 + fixed
firefox41 + fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: Usul, Assigned: michal)

References

Details

(4 keywords)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-94469f8f-1862-48fd-a895-23d562150512.
=============================================================
0 	libxul.so 	mozilla::net::nsHttpChannelCacheKey::GetData(unsigned int*, nsACString_internal&) 	netwerk/protocol/http/nsHttpChannel.cpp
1 	libxul.so 	mozilla::net::HttpChannelChild::ContinueAsyncOpen() 	netwerk/protocol/http/HttpChannelChild.cpp
2 	libxul.so 	mozilla::net::HttpChannelChild::AsyncOpen(nsIStreamListener*, nsISupports*) 	netwerk/protocol/http/HttpChannelChild.cpp
3 	libxul.so 	nsURILoader::OpenURI(nsIChannel*, unsigned int, nsIInterfaceRequestor*) 	uriloader/base/nsURILoader.cpp
4 	libxul.so 	nsDocShell::DoChannelLoad(nsIChannel*, nsIURILoader*, bool) 	docshell/base/nsDocShell.cpp
5 	libxul.so 	nsDocShell::DoURILoad(nsIURI*, nsIURI*, bool, unsigned int, nsISupports*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, bool, nsIDocShell**, nsIRequest**, bool, bool, bool, nsAString_internal const&, nsIURI*, unsigned int) 	docshell/base/nsDocShell.cpp
6 	libxul.so 	nsDocShell::InternalLoad(nsIURI*, nsIURI*, unsigned int, nsISupports*, unsigned int, char16_t const*, char const*, nsAString_internal const&, nsIInputStream*, nsIInputStream*, unsigned int, nsISHEntry*, bool, nsAString_internal const&, nsIDocShell*, nsIURI*, nsIDocShell**, nsIRequest**) 	docshell/base/nsDocShell.cpp
7 	libxul.so 	nsDocShell::LoadHistoryEntry(nsISHEntry*, unsigned int) 	docshell/base/nsDocShell.cpp
8 	libxul.so 	nsDocShell::LoadURI(nsIURI*, nsIDocShellLoadInfo*, unsigned int, bool) 	docshell/base/nsDocShell.cpp
9 	libxul.so 	nsSHistory::InitiateLoad(nsISHEntry*, nsIDocShell*, long) 	docshell/shistory/src/nsSHistory.cpp
10 	libxul.so 	nsSHistory::LoadEntry(int, long, unsigned int) 	docshell/shistory/src/nsSHistory.cpp
11 	libxul.so 	nsSHistory::ReloadCurrentEntry() 	docshell/shistory/src/nsSHistory.cpp
12 	libxul.so 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/unix/xptcinvoke_x86_64_unix.cpp
13 	libxul.so 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp
14 	libxul.so 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp
15 	libxul.so 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/jscntxtinlines.h
16 	libxul.so 	Interpret 	js/src/vm/Interpreter.cpp
17 	libxul.so 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
18 	libxul.so 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
19 	libxul.so 	js::CallOrConstructBoundFunction(JSContext*, unsigned int, JS::Value*) 	js/src/jsfun.cpp
20 	libxul.so 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/jscntxtinlines.h
21 	libxul.so 	js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
22 	libxul.so 	js::DirectProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const 	js/src/proxy/DirectProxyHandler.cpp
23 	libxul.so 	js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const 	js/src/proxy/CrossCompartmentWrapper.cpp
24 	libxul.so 	js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/Proxy.cpp
25 	libxul.so 	js::proxy_Call(JSContext*, unsigned int, JS::Value*) 	js/src/proxy/Proxy.cpp
26 	libxul.so 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/jscntxtinlines.h
27 	libxul.so 	Interpret 	js/src/vm/Interpreter.cpp
28 	libxul.so 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp
29 	libxul.so 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
30 	libxul.so 	js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
31 	libxul.so 	JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) 	js/src/jsapi.cpp
32 	libxul.so 	nsFrameMessageManager::ReceiveMessage(nsISupports*, nsIFrameLoader*, bool, nsAString_internal const&, bool, mozilla::dom::StructuredCloneData const*, mozilla::jsipc::CpowHolder*, nsIPrincipal*, nsTArray<nsString>*) 	dom/base/nsFrameMessageManager.cpp
33 	libxul.so 	nsFrameMessageManager::ReceiveMessage(nsISupports*, nsIFrameLoader*, nsAString_internal const&, bool, mozilla::dom::StructuredCloneData const*, mozilla::jsipc::CpowHolder*, nsIPrincipal*, nsTArray<nsString>*) 	dom/base/nsFrameMessageManager.cpp
34 	libxul.so 	mozilla::dom::TabChild::RecvAsyncMessage(nsString const&, mozilla::dom::ClonedMessageData const&, nsTArray<mozilla::jsipc::CpowEntry>&&, IPC::Principal const&) 	dom/ipc/TabChild.cpp
35 	libxul.so 	mozilla::dom::PBrowserChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PBrowserChild.cpp
36 	libxul.so 	mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) 	obj-firefox/ipc/ipdl/PContentChild.cpp
37 	libxul.so 	mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp
38 	libxul.so 	mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) 	ipc/glue/MessageChannel.cpp
39 	libxul.so 	mozilla::ipc::MessageChannel::OnMaybeDequeueOne() 	ipc/glue/MessageChannel.cpp
40 	libxul.so 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc
41 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run() 	ipc/glue/MessagePump.cpp
42 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp
43 	libxul.so 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp
44 	libxul.so 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp
45 	libxul.so 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
46 	libxul.so 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp
47 	libxul.so 	XRE_RunAppShell 	toolkit/xre/nsEmbedFunctions.cpp
48 	libxul.so 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
49 	libxul.so 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp
50 	libxul.so 	_fini
How long for bugzilla <-> crashstats sync ? I'm crashing a lot on that and not seeing the bug in crash stats.
I got this today after a session restore, and e1os enabled.
The tab crashed.  And it kept on crashing when i tried to restore that tab. I have dom.ipc.processCount set to 20.
(In reply to Ludovic Hirlimann [:Usul] from comment #1)
> How long for bugzilla <-> crashstats sync ? I'm crashing a lot on that and
> not seeing the bug in crash stats.

Search *usually* should get it within a few minutes (when processing is done), and when you call a crash up from its ID, it even goes into a priority processing queue, topcrash reports and signature summaries are only generated in the US night and for full UTC days.
That said, we have a strange issue with the crash processors right now and there is a long backlog for processing. The team is working on that.
http://java.com/en/download/installed8.jsp?detect=jre might or might not trigger this crash...

Report ID 	Date Submitted
bp-9d9c4981-1bb1-476c-8b0a-5d7cc2150512
	12/05/2015	03:41 p.m.
bp-28309a59-76fd-49bf-a1b5-c4d692150512
	12/05/2015	03:39 p.m.
bp-2fa493f1-317a-4d23-a1bd-779dc2150512
	12/05/2015	03:19 p.m.
bp-ce4d3528-decc-4c80-896f-626572150512
	12/05/2015	03:19 p.m.
bp-e574be74-a1b8-4c71-9a76-052022150512
	12/05/2015	03:16 p.m.
It seems to be related to POSTs after restarting for me.

Here's the STR I used:

1. Enable e10s
2. Set Firefox to show windows and tabs from last time
3. Go to http://www.pangloss.com/seidel/Poem/
4. Click "Groove" button to submit a POST
5. Restart Firefox

After restart, the tab will crash.
Michal, do you think bug 1156493 is responsible here?
Flags: needinfo?(michal.novotny)
I got what I believe is this crash. I had several tabs open and restarted for an update. On restart all tabs were crashed, and restoring all resulted in all tabs crashing again. I selectively restored tabs until I discovered that it was the New Tab Page that was crashing things.
Everybody who is experiencing this bug - do you have dom.ipc.processCount > 1? Or is there anybody here who has it set to 1, and seeing this crash?
Flags: needinfo?(ludovic)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #8)
> Everybody who is experiencing this bug - do you have dom.ipc.processCount >
> 1? Or is there anybody here who has it set to 1, and seeing this crash?

my about:config says dom.ipc.processCount default integer 1 and I experienced this crash.
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #8)
> Everybody who is experiencing this bug - do you have dom.ipc.processCount >
> 1? Or is there anybody here who has it set to 1, and seeing this crash?

I have it set to 1 and see the crash.
Alright, cool - thanks.
Flags: needinfo?(ludovic)
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #6)
> Michal, do you think bug 1156493 is responsible here?

It seems so.
Assignee: nobody → michal.novotny
Flags: needinfo?(michal.novotny)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #8)
> Everybody who is experiencing this bug - do you have dom.ipc.processCount >
> 1? Or is there anybody here who has it set to 1, and seeing this crash?

It's at one right now - but I had to restart in safe-mode at some point - that cleared things.
I have dom.ipc.processCount == 1.
Attached patch fixSplinter Review
In this case, mCacheKey in HttpChannelChild is not nsHttpChannelCacheKey because it is set by SessionHistory code which serializes only the integer part of the cacheKey and deserializes it to nsSupportsPRUint32Impl. I decided to remove nsHttpChannelCacheKey completely and use nsSupportsPRUint32Impl because the string from the key is not used anywhere, it's not even checked in nsHttpChannel::SetCacheKey().
Attachment #8605235 - Flags: review?(jduell.mcbugs)
[Tracking Requested - why for this release]:

As this was caused by the patch for bug 1156493 which landed before the uplift, I think this needs to track.
This makes e10s unusable for me.
tracking-e10s: --- → ?
We should probably get this reviewed and landed ASAP, or back out the offending bug. I agree with mccr8 - this is making e10s particularly unstable.

jduell - do you think you'd be able to get to a review soonish? Or should we back out bug 1156493?
Flags: needinfo?(jduell.mcbugs)
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #20)
> We should probably get this reviewed and landed ASAP, or back out the
> offending bug. I agree with mccr8 - this is making e10s particularly
> unstable.
> 
> jduell - do you think you'd be able to get to a review soonish? Or should we
> back out bug 1156493?

I'm hitting this pretty often with Dev Edition 40 builds - let's stabilize this ASAP before the bulk of dev edition users update on Friday.
Alright - if we don't hear from jduell abou tthis within 24 hours, I will back out bug 1156493 from mozilla-central and mozilla-aurora.
Flags: needinfo?(mconley)
(In reply to Jeff Griffiths (:canuckistani) from comment #21)
> I'm hitting this pretty often with Dev Edition 40 builds - let's stabilize
> this ASAP before the bulk of dev edition users update on Friday.

I agree. From my POV, QE sign-off for turning on updates to Dev Edition 40 is gated on this bug being fixed, either through a patch or backout.
Comment on attachment 8605235 [details] [diff] [review]
fix

Review of attachment 8605235 [details] [diff] [review]:
-----------------------------------------------------------------

This patch is confusing me (but then again I don't know this code as well as I should).  Michal, so you're no longer calling GenerateCacheKey in GetCacheKey, i.e you're now using only the uint32 postKey in the cache.key.  I've got some questions about whether that's OK:

1) in the existing string version of cacheKey we wind up using different keys for anonymous channels (i.e. we prepend "anon&" to the key):

  https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp?from=nsHttpChannel.cpp#3503

Is it OK to use just the postID for anonymous channels, or will that be a possible information leak?

2) We also used to attach the URI itself to the cacheKey.  Why did we do that, and why is it OK to not do it any more?

3) Assuming #1 and #2 are OK, should we be getting rid of GenerateCacheKey() entirely?  Right now your patch leaves it around, and it's used to create the chain of cacheKeys that we use to detect infinite loops in cached redirects:
 
 https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp?from=nsHttpChannel.cpp#3178

Does that redirect loop detection require the string fields (like the uri) to work, or can it switch to uint32 too?  

4) So are we confident that in e10s the docshell will be setting only integer cacheKeys?  We'll accept any nsISupports it passes in, and it looks like with this patch we'll just QI to SupportsUInt32 and grab the postId out of it.  Is that OK?

The actual patch here to convert to uint32-only cacheKeys looks fine, so if there's nothing to worry about with my concerns, Michal, feel free to mark this +r and land it.
Flags: needinfo?(jduell.mcbugs) → needinfo?(michal.novotny)
Michal, Is it true that backing out 1156493 only impacts view source on e10s?

If that's the case (or if there are other similarly minor impacts) I think we should just back it out of 40 and fix it on 41.
AFAIK viewsource and restoring session won't work correctly in case the post ID is not zero.
+1 on backout and 41 fix.
Flags: needinfo?(michal.novotny)
(In reply to Jason Duell [:jduell] (needinfo? me) from comment #25)
> 1) in the existing string version of cacheKey we wind up using different
> keys for anonymous channels (i.e. we prepend "anon&" to the key):
>  
> https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpChannel.cpp?from=nsHttpChannel.cpp#3503
> 
> Is it OK to use just the postID for anonymous channels, or will that be a
> possible information leak?
> 
> 2) We also used to attach the URI itself to the cacheKey.  Why did we do
> that, and why is it OK to not do it any more?
> 
> 3) Assuming #1 and #2 are OK, should we be getting rid of GenerateCacheKey()
> entirely?  Right now your patch leaves it around, and it's used to create
> the chain of cacheKeys that we use to detect infinite loops in cached
> redirects:
>  
>  https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpChannel.cpp?from=nsHttpChannel.cpp#3178
> 
> Does that redirect loop detection require the string fields (like the uri)
> to work, or can it switch to uint32 too?  

With this patch, we still use the cacheKey internally as before. The only difference is that the cacheKey returned by nsICacheInfoChannel::GetCacheKey() contains only the post ID. The cacheKey string in it was never used. I.e. you cannot (and it never worked) get cacheKey from channel with URI foo.com, pass it to a channel with URI bar.com and think that it will load the foo.com page from the cache. The cacheKey was always used only to specify the post ID. So there should be no change in the functionality.


> 4) So are we confident that in e10s the docshell will be setting only
> integer cacheKeys?  We'll accept any nsISupports it passes in, and it looks
> like with this patch we'll just QI to SupportsUInt32 and grab the postId out
> of it.  Is that OK?

Docshell should always set the cacheKey that it received from the http channel. The only exceptions that I'm aware of now is the session history which needs to serialize/deserialize the cacheKey.


I think we should land it, but if you think we should do it in the next release I'm OK with it. I just think that the patch won't change.
(In reply to Michal Novotny (:michal) from comment #27)
> AFAIK viewsource and restoring session won't work correctly in case the post
> ID is not zero.

The two bugs that this was helping to fix were bug 1025146 and bug 1128050, which were slated to be fixed for 40 because we had assumed e10s was going to be enabled by default.

It sounds like Michal's patch has some risk, so I've talked to jimm - we're OK to back out bug 1156493 on 40, and try this on 41. That will cause bug 1025146 and bug 1128050 to be re-opened for 40, but we think that's an acceptable trade-off considering how awful these crashes are.

I am unaware of any session restoring bug that bug 1156493 fixes.

Note that neither bug 1025146 or bug 1128050 need to be backed out along with bug 1156493 - they should continue to operate properly for non-e10s (and, in fact, the devtools team needs bug 1025146 to stay on 40 so they can try to get bug 1067325 fixed for the spring release).

TL;DR - e10s team is cool with bug 1156493 getting backed out on 40, and trying this fix on 41.
Flags: needinfo?(mconley)
Comment on attachment 8605876 [details] [diff] [review]
Back out 371cbdcc9562 (bug 1156493) for causing frequent content process crashes (bug 1163900). a=?

Approval Request Comment
[Feature/regressing bug #]:

Bug 1156493.

[User impact if declined]:

e10s users on 40 will sometimes hit the network when viewing source or saving a page to disk via Save Page As (see bug 1025146 and bug 1128050).

[Describe test coverage new/current, TreeHerder]:

This is a backout, so we're going back to old, battle-hardened code.

[Risks and why]: 

Pretty low-risk, considering this is a backout to something that's likely been shipping for a while. The View Source and Save Page As bugs are annoying, but considering that e10s is opt-in on 40, it's not as bad as it would have been if e10s had been on by default.

[String/UUID change made/needed]:

nsICacheInfoChannel.idl
nsICachingChannel.idl
Attachment #8605876 - Attachment description: Back out 371cbdcc9562 () for causing frequent content process crashes (bug 1163900). a=? → Back out 371cbdcc9562 (bug 1156493) for causing frequent content process crashes (bug 1163900). a=?
Attachment #8605876 - Flags: approval-mozilla-aurora?
Comment on attachment 8605235 [details] [diff] [review]
fix

marking r+ as per comment #25
Attachment #8605235 - Flags: review?(jduell.mcbugs) → review+
Comment on attachment 8605876 [details] [diff] [review]
Back out 371cbdcc9562 (bug 1156493) for causing frequent content process crashes (bug 1163900). a=?

Approved for uplift to aurora (40) to back out some changes causing a bad crash.
Attachment #8605876 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Liz Henry (:lizzard) from comment #35)
> Comment on attachment 8605876 [details] [diff] [review]
> Back out 371cbdcc9562 (bug 1156493) for causing frequent content process
> crashes (bug 1163900). a=?
> 
> Approved for uplift to aurora (40) to back out some changes causing a bad
> crash.

Great, thanks lizzard. I'll land the backout once Aurora opens up.
https://hg.mozilla.org/mozilla-central/rev/50c7134b4408
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Just to distinguish what landed where, let's set 40 to disabled, we backed out the cause and didn't actually fix the issue. Thanks for doing the backout that quickly!
Adding a tracking flag for firefox41 and firefox40. While it seems e10s are disabled by default on firefox40 so most users will not his this issue. However, if they turn it on they will run into this crash.
To be clear, the backout made it so that 40 is immune to the crash.
41 has been made immune by the fix patch in this bug.
I can confirm that my crashes have disappeared in trunk builds. Thank you for the fix.
(In reply to Mike Conley (:mconley) - PTO until June 2nd from comment #42)
> 41 has been made immune by the fix patch in this bug.

Has it?

(In reply to Nicholas Nethercote [:njn] from comment #44)
> I can confirm that my crashes have disappeared in trunk builds. Thank you
> for the fix.

bp-6bf09bde-3c54-44c8-a996-61aa92150524
	23/05/2015	10:10 p.m.

¡Hola Michal!

Could you please take a look at my crash above and tell me if I need to file a new bug?

¡Gracias!
Flags: needinfo?(michal.novotny)
(In reply to alex_mayorga from comment #45)
> bp-6bf09bde-3c54-44c8-a996-61aa92150524
> 	23/05/2015	10:10 p.m.

The build ID in your crash report is 20150513030209, which is one day before the fix for 41 landed, as noted in comment 39.

Updating to a newer Nightly should fix the issue for you.
Flags: needinfo?(michal.novotny)
You need to log in before you can comment on or make changes to this bug.