Closed Bug 1142109 Opened 9 years ago Closed 9 years ago

crash in WaitForSingleObjectEx | base::MessagePumpForIO::ScheduleWork() after logging in to newrelic.com

Categories

(Core :: DOM: Content Processes, defect)

All
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla39
Tracking Status
e10s m5+ ---
firefox39 --- verified

People

(Reporter: stephend, Assigned: billm)

References

()

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-504780e3-a1ca-4187-9893-004882150311.
=============================================================

Build ID: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:39.0) Gecko/20100101 Firefox/39.0

xref: bug 1116884, bug 1123463, bug 1124524, bug 1141755 (sorry if this is a dupe, but wanted to get this repro-case filed).

STR:

1. Log in to http://newrelic.com/
2. While it's loading (slowly), do something in the Firefox UI, such as Tools | Web Developer | Network
3. Wait

Actual Results:

Crashes here:

Frame 	Module 	Signature 	Source
0 	ntdll.dll 	ZwWaitForSingleObject 	
1 	kernelbase.dll 	WaitForSingleObjectEx 	
2 	xul.dll 	base::MessagePumpForIO::ScheduleWork() 	ipc/chromium/src/base/message_pump_win.cc
3 	xul.dll 	MessageLoop::PostTask_Helper(tracked_objects::Location const&, Task*, int, bool) 	ipc/chromium/src/base/message_loop.cc
4 	xul.dll 	mozilla::ipc::MessageChannel::WaitForSyncNotify() 	ipc/glue/WindowsMessageLoop.cpp
5 	xul.dll 	mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) 	ipc/glue/MessageChannel.cpp
6 	xul.dll 	mozilla::net::PCookieServiceChild::SendGetCookieString(mozilla::ipc::URIParams const&, bool const&, bool const&, IPC::SerializedLoadContext const&, nsCString*) 	obj-firefox/ipc/ipdl/PCookieServiceChild.cpp
7 	xul.dll 	mozilla::net::CookieServiceChild::GetCookieStringInternal(nsIURI*, nsIChannel*, char**, bool) 	netwerk/cookie/CookieServiceChild.cpp
8 	xul.dll 	mozilla::net::CookieServiceChild::GetCookieString(nsIURI*, nsIChannel*, char**) 	netwerk/cookie/CookieServiceChild.cpp
9 	xul.dll 	nsHTMLDocument::GetCookie(nsAString_internal&, mozilla::ErrorResult&) 	dom/html/nsHTMLDocument.cpp
10 	xul.dll 	mozilla::dom::HTMLDocumentBinding::get_cookie 	obj-firefox/dom/bindings/HTMLDocumentBinding.cpp
11 	xul.dll 	mozilla::dom::GenericBindingGetter(JSContext*, unsigned int, JS::Value*) 	dom/bindings/BindingUtils.cpp
12 	xul.dll 	js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp
13 	xul.dll 	js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
14 	xul.dll 	js::InvokeGetterOrSetter(JSContext*, JSObject*, JS::Value, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) 	js/src/vm/Interpreter.cpp
15 	xul.dll 	CallGetter 	js/src/vm/NativeObject.cpp
16 	xul.dll 	js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/vm/NativeObject.cpp
17 	xul.dll 	JS_ForwardGetPropertyTo(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Handle<JSObject*>, JS::MutableHandle<JS::Value>) 	js/src/jsapi.cpp
18 	xul.dll 	mozilla::dom::GetPropertyOnPrototype(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, bool*, JS::MutableHandle<JS::Value>) 	dom/bindings/BindingUtils.cpp
19 	xul.dll 	mozilla::dom::HTMLDocumentBinding::DOMProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	obj-firefox/dom/bindings/HTMLDocumentBinding.cpp
20 	xul.dll 	js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Proxy.cpp
21 	xul.dll 	js::proxy_GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Proxy.cpp
22 	xul.dll 	js::jit::ComputeGetPropResult 	js/src/jit/BaselineIC.cpp
23 	xul.dll 	js::jit::DoGetPropFallback 	js/src/jit/BaselineIC.cpp
24 		@0x6a6623
Screencast demonstrating the crash: http://screencast.com/t/qxwOAiEe
I had a lot of add-ons enabled, and couldn't readily crash in Safe Mode, so I started disabling, until I was only left with https://addons.mozilla.org/en-US/firefox/addon/bugzillajs/, which led to this crash:

https://crash-stats.mozilla.com/report/index/2f53a9c6-3062-4c0b-8e4a-d34962150311
Basically a dupe of bug 1128457, with a good site / str.
Depends on: 1128457
I wasn't able to reproduce this, but I'll try to find Stephen in the office and figure this out. It would be great to understand this bug more.
Not an IPC bug, in any case.
Component: IPC → Networking
Assignee: nobody → wmccloskey
Component: Networking → DOM: Content Processes
I'm pretty sure the reason this flared up is that bug 1113369 had a bunch of fallout. You can see the list of dependencies there. A lot of sites started showing the slow script warning and that triggered CPOW timeouts.

Despite that, we shouldn't be crashing. Originally I worried that something was broken in the IPC code since all the messages we were hanging on were urgent priority. However, that seems to be a coincidence. The code is pretty much working as designed. Here's what's happening:

1. Script runs slowly and we send a slow script notification to the parent process via the hang monitor.
2. The hang monitor times out all CPOWs at that time. Assume it times out a CPOW message M.
3. Assume the slow script stops, either naturally or from the user selecting "Stop Script". If we immediately return to the event loop in the child, we'll respond to M and everything will return to normal.
4. However, if we don't return to the event loop right away and instead we send a GetCookieString message, then it will arrive in the parent before the response to M. The parent will return an error for this message since it rejects all messages until the M reply.
5. The child is killed by the parent because of the error.

There are two ways we could improve this:
A. We don't need to kill the child in step 5. Timeouts are harmless. This happens just because our default behavior is to kill.
B. Somehow it would be good if we didn't need to reject the GetCookieString message in step 4.

I like approach B because it allows us to return a successful result for GetCookieString. The easiest way to do this is to have the code that sends a sync message first see if there are any CPOWs to be processed before it even sends its message. That way we'll return the response to the CPOW message M before we send GetCookieString, so the parent will no longer be in its timeout state. I don't think there's any harm in doing it this way.
Attachment #8576677 - Flags: review?(dvander)
Attached patch Fix IPDL testsSplinter Review
In running the IPDL tests, I found failures unrelated to this bug. The main problem was that in bug 1049879 I flip-flopped about whether we should resolve races in favor of the parent or the child. Originally I changed behavior so races were resolved in favor of the child and I fixed the test to assert that. Then I changed my mind but I never changed the test back. This patch changes the test back.

I also that I failed to fix TestHangs.cpp for bug 1110938. I changed the behavior there so we don't automatically close the channel on timeout. So I had to change the test to close on its own when we decide to timeout.
Attachment #8576682 - Flags: review?(dvander)
Comment on attachment 8576677 [details] [diff] [review]
Process incoming urgent messages before sending

Review of attachment 8576677 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable.
Attachment #8576677 - Flags: review?(dvander) → review+
Attachment #8576682 - Flags: review?(dvander) → review+
Status: NEW → ASSIGNED
QA Whiteboard: [triaged]
https://hg.mozilla.org/mozilla-central/rev/ec2acf860187
https://hg.mozilla.org/mozilla-central/rev/da4cb8b98517
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Verified FIXED for me on nightly, using: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:39.0) Gecko/20100101 Firefox/39.0

(Do I just mark it verified since it's fixed on trunk?  Sorry, it's been a while!)
Thanks Stephen, I'll mark this verified for you.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: