This bug was filed from the Socorro interface and is report bp-504780e3-a1ca-4187-9893-004882150311. ============================================================= Build ID: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:39.0) Gecko/20100101 Firefox/39.0 xref: bug 1116884, bug 1123463, bug 1124524, bug 1141755 (sorry if this is a dupe, but wanted to get this repro-case filed). STR: 1. Log in to http://newrelic.com/ 2. While it's loading (slowly), do something in the Firefox UI, such as Tools | Web Developer | Network 3. Wait Actual Results: Crashes here: Frame Module Signature Source 0 ntdll.dll ZwWaitForSingleObject 1 kernelbase.dll WaitForSingleObjectEx 2 xul.dll base::MessagePumpForIO::ScheduleWork() ipc/chromium/src/base/message_pump_win.cc 3 xul.dll MessageLoop::PostTask_Helper(tracked_objects::Location const&, Task*, int, bool) ipc/chromium/src/base/message_loop.cc 4 xul.dll mozilla::ipc::MessageChannel::WaitForSyncNotify() ipc/glue/WindowsMessageLoop.cpp 5 xul.dll mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) ipc/glue/MessageChannel.cpp 6 xul.dll mozilla::net::PCookieServiceChild::SendGetCookieString(mozilla::ipc::URIParams const&, bool const&, bool const&, IPC::SerializedLoadContext const&, nsCString*) obj-firefox/ipc/ipdl/PCookieServiceChild.cpp 7 xul.dll mozilla::net::CookieServiceChild::GetCookieStringInternal(nsIURI*, nsIChannel*, char**, bool) netwerk/cookie/CookieServiceChild.cpp 8 xul.dll mozilla::net::CookieServiceChild::GetCookieString(nsIURI*, nsIChannel*, char**) netwerk/cookie/CookieServiceChild.cpp 9 xul.dll nsHTMLDocument::GetCookie(nsAString_internal&, mozilla::ErrorResult&) dom/html/nsHTMLDocument.cpp 10 xul.dll mozilla::dom::HTMLDocumentBinding::get_cookie obj-firefox/dom/bindings/HTMLDocumentBinding.cpp 11 xul.dll mozilla::dom::GenericBindingGetter(JSContext*, unsigned int, JS::Value*) dom/bindings/BindingUtils.cpp 12 xul.dll js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) js/src/vm/Interpreter.cpp 13 xul.dll js::Invoke(JSContext*, JS::Value const&, JS::Value const&, unsigned int, JS::Value const*, JS::MutableHandle<JS::Value>) js/src/vm/Interpreter.cpp 14 xul.dll js::InvokeGetterOrSetter(JSContext*, JSObject*, JS::Value, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) js/src/vm/Interpreter.cpp 15 xul.dll CallGetter js/src/vm/NativeObject.cpp 16 xul.dll js::NativeGetProperty(JSContext*, JS::Handle<js::NativeObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) js/src/vm/NativeObject.cpp 17 xul.dll JS_ForwardGetPropertyTo(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::Handle<JSObject*>, JS::MutableHandle<JS::Value>) js/src/jsapi.cpp 18 xul.dll mozilla::dom::GetPropertyOnPrototype(JSContext*, JS::Handle<JSObject*>, JS::Handle<jsid>, bool*, JS::MutableHandle<JS::Value>) dom/bindings/BindingUtils.cpp 19 xul.dll mozilla::dom::HTMLDocumentBinding::DOMProxyHandler::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) obj-firefox/dom/bindings/HTMLDocumentBinding.cpp 20 xul.dll js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) js/src/proxy/Proxy.cpp 21 xul.dll js::proxy_GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) js/src/proxy/Proxy.cpp 22 xul.dll js::jit::ComputeGetPropResult js/src/jit/BaselineIC.cpp 23 xul.dll js::jit::DoGetPropFallback js/src/jit/BaselineIC.cpp 24 @0x6a6623
Screencast demonstrating the crash: http://screencast.com/t/qxwOAiEe
I had a lot of add-ons enabled, and couldn't readily crash in Safe Mode, so I started disabling, until I was only left with https://addons.mozilla.org/en-US/firefox/addon/bugzillajs/, which led to this crash: https://crash-stats.mozilla.com/report/index/2f53a9c6-3062-4c0b-8e4a-d34962150311
I wasn't able to reproduce this, but I'll try to find Stephen in the office and figure this out. It would be great to understand this bug more.
Not an IPC bug, in any case.
Created attachment 8576677 [details] [diff] [review] Process incoming urgent messages before sending I'm pretty sure the reason this flared up is that bug 1113369 had a bunch of fallout. You can see the list of dependencies there. A lot of sites started showing the slow script warning and that triggered CPOW timeouts. Despite that, we shouldn't be crashing. Originally I worried that something was broken in the IPC code since all the messages we were hanging on were urgent priority. However, that seems to be a coincidence. The code is pretty much working as designed. Here's what's happening: 1. Script runs slowly and we send a slow script notification to the parent process via the hang monitor. 2. The hang monitor times out all CPOWs at that time. Assume it times out a CPOW message M. 3. Assume the slow script stops, either naturally or from the user selecting "Stop Script". If we immediately return to the event loop in the child, we'll respond to M and everything will return to normal. 4. However, if we don't return to the event loop right away and instead we send a GetCookieString message, then it will arrive in the parent before the response to M. The parent will return an error for this message since it rejects all messages until the M reply. 5. The child is killed by the parent because of the error. There are two ways we could improve this: A. We don't need to kill the child in step 5. Timeouts are harmless. This happens just because our default behavior is to kill. B. Somehow it would be good if we didn't need to reject the GetCookieString message in step 4. I like approach B because it allows us to return a successful result for GetCookieString. The easiest way to do this is to have the code that sends a sync message first see if there are any CPOWs to be processed before it even sends its message. That way we'll return the response to the CPOW message M before we send GetCookieString, so the parent will no longer be in its timeout state. I don't think there's any harm in doing it this way.
Created attachment 8576682 [details] [diff] [review] Fix IPDL tests In running the IPDL tests, I found failures unrelated to this bug. The main problem was that in bug 1049879 I flip-flopped about whether we should resolve races in favor of the parent or the child. Originally I changed behavior so races were resolved in favor of the child and I fixed the test to assert that. Then I changed my mind but I never changed the test back. This patch changes the test back. I also that I failed to fix TestHangs.cpp for bug 1110938. I changed the behavior there so we don't automatically close the channel on timeout. So I had to change the test to close on its own when we decide to timeout.
Comment on attachment 8576677 [details] [diff] [review] Process incoming urgent messages before sending Review of attachment 8576677 [details] [diff] [review]: ----------------------------------------------------------------- Seems reasonable.
Verified FIXED for me on nightly, using: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:39.0) Gecko/20100101 Firefox/39.0 (Do I just mark it verified since it's fixed on trunk? Sorry, it's been a while!)
Thanks Stephen, I'll mark this verified for you.