Closed Bug 1587686 Opened 6 years ago Closed 6 years ago

Intermittent PID 4676 | Assertion failure: XRE_IsParentProcess(), at z:/build/build/src/toolkit/components/windowwatcher/nsWindowWatcher.cpp:703

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr68 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kershaw)

References

Details

(Keywords: assertion, intermittent-failure, testcase-wanted, Whiteboard: [necko-triaged])

Crash Data

Attachments

(1 file)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=270581674&repo=mozilla-inbound
Full log: https://queue.taskcluster.net/v1/task/Wzyu9dkMTb68LEJkFb5CDg/runs/0/artifacts/public/logs/live_backing.log


[task 2019-10-09T23:41:39.324Z] 23:41:39 INFO - PID 4676 | Assertion failure: XRE_IsParentProcess(), at z:/build/build/src/toolkit/components/windowwatcher/nsWindowWatcher.cpp:703
[task 2019-10-09T23:41:40.231Z] 23:41:40 INFO - Browser not responding, setting status to CRASH
[task 2019-10-09T23:41:40.235Z] 23:41:40 INFO - mozcrash Copy/paste: Z:\task_1570658318\build\win32-minidump_stackwalk.exe c:\users\task_1570658318\appdata\local\temp\tmpm3hjye.mozrunner\minidumps\ccf65631-17bf-4f43-8b16-50e256332f85.dmp Z:\task_1570658318\build\symbols
[task 2019-10-09T23:42:01.779Z] 23:42:01 INFO - mozcrash Saved minidump as Z:\task_1570658318\build\blobber_upload_dir\ccf65631-17bf-4f43-8b16-50e256332f85.dmp
[task 2019-10-09T23:42:01.782Z] 23:42:01 INFO - mozcrash Saved app info as Z:\task_1570658318\build\blobber_upload_dir\ccf65631-17bf-4f43-8b16-50e256332f85.extra
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - PROCESS-CRASH | /service-workers/service-worker/clients-get.https.html | application crashed [@ nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy *,char const *,char const *,char const *,bool,bool,bool,nsIArray *,bool,bool,bool,nsDocShellLoadState *,mozilla::dom::BrowsingContext * *)]
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - Crash dump filename: c:\users\task_1570658318\appdata\local\temp\tmpm3hjye.mozrunner\minidumps\ccf65631-17bf-4f43-8b16-50e256332f85.dmp
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - Operating system: Windows NT
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - 6.1.7601 Service Pack 1
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - CPU: x86
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - GenuineIntel family 6 model 63 stepping 2
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO - 8 CPUs
[task 2019-10-09T23:42:01.874Z] 23:42:01 INFO -
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - GPU: UNKNOWN
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO -
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Crash address: 0x5a9c4b9b
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Assertion: Unknown assertion type 0x00000000
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Process uptime: 2 seconds
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO -
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Thread 0 (crashed)
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - 0 xul.dll!nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy *,char const *,char const *,char const *,bool,bool,bool,nsIArray *,bool,bool,bool,nsDocShellLoadState *,mozilla::dom::BrowsingContext * *) [nsWindowWatcher.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 703 + 0x0]
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - eip = 0x5a9c4b9b esp = 0x002acd80 ebp = 0x002acf7c ebx = 0x08cd0100
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - esi = 0x002acd80 edi = 0x000006ce eax = 0x6a55a89c ecx = 0x000002bf
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - edx = 0x00000049 efl = 0x00000212
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Found by: given as instruction pointer in context
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - 1 xul.dll!nsWindowWatcher::OpenWindow(mozIDOMWindowProxy *,char const *,char const *,char const *,nsISupports *,mozIDOMWindowProxy * *) [nsWindowWatcher.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 292 + 0x1f]
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - eip = 0x5a9c1a3a esp = 0x002acf84 ebp = 0x002acfdc ebx = 0x0c5f7460
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - esi = 0x0c56ff20 edi = 0x08cd50f8
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.875Z] 23:42:01 INFO - 2 xul.dll!NS_InvokeByIndex + 0x27
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - eip = 0x5c7242b7 esp = 0x002acfe4 ebp = 0x002ad018 ebx = 0x00000005
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - esi = 0x002ad058 edi = 0x00000005
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - 3 xul.dll!CallMethodHelper::Call() [XPCWrappedNative.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 1183 + 0x16]
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - eip = 0x567dd9a6 esp = 0x002ad020 ebp = 0x002ad040
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - Found by: previous frame's frame pointer
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - 4 xul.dll!XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 1149 + 0x5]
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - eip = 0x567dd786 esp = 0x002ad048 ebp = 0x002ad170 ebx = 0x002ad058
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - esi = 0x002ad058 edi = 0x002ad048
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - 5 xul.dll!XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [XPCWrappedNativeJSOps.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 946 + 0x9]
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - eip = 0x567def86 esp = 0x002ad178 ebp = 0x002ad20c ebx = 0x002ad180
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - esi = 0x0c5f2f1c edi = 0x0c5f2f00
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - 6 xul.dll!CallJSNative(JSContext ,bool ()(JSContext *,unsigned int,JS::Value *),js::CallReason,JS::CallArgs const &) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 457 + 0x7]
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - eip = 0x5ab50aa0 esp = 0x002ad214 ebp = 0x002ad268 ebx = 0x0660c800
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - esi = 0x567dec60 edi = 0x002ad4c0
[task 2019-10-09T23:42:01.876Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - 7 xul.dll!js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct,js::CallReason) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 549 + 0x9]
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - eip = 0x5ab50422 esp = 0x002ad270 ebp = 0x002ad2d0 ebx = 0x00000000
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - esi = 0x0660c800 edi = 0x002ad4c0
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - 8 xul.dll!static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const, js::CallReason) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 618 + 0xc]
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - eip = 0x5ab51363 esp = 0x002ad2d8 ebp = 0x002ad314 ebx = 0x002ad2f0
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - esi = 0x0660c800 edi = 0x002ad4c0
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - 9 xul.dll!static bool Interpret(struct JSContext *, class js::RunState & const) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 3111 + 0x15]
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - eip = 0x5ab49a97 esp = 0x002ad31c ebp = 0x002ad5a4 ebx = 0x00000000
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - esi = 0xffffff8c edi = 0x00000000
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - 10 xul.dll!js::RunScript(JSContext *,js::RunState &) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 424 + 0x9]
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - eip = 0x5ab40a38 esp = 0x002ad5ac ebp = 0x002ad5d0 ebx = 0x080a0790
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - esi = 0x0660c800 edi = 0x002ad5f0
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - 11 xul.dll!js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct,js::CallReason) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 590 + 0xb]
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - eip = 0x5ab50437 esp = 0x002ad5d8 ebp = 0x002ad630 ebx = 0x00000000
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - esi = 0x0660c800 edi = 0x002ad6c0
[task 2019-10-09T23:42:01.877Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - 12 xul.dll!static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const, js::CallReason) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 618 + 0xc]
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - eip = 0x5ab51363 esp = 0x002ad638 ebp = 0x002ad678 ebx = 0x002ad650
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - esi = 0x0660c800 edi = 0x002ad6c0
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - 13 xul.dll!js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>,js::CallReason) [Interpreter.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 635 + 0x6]
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - eip = 0x5ab51481 esp = 0x002ad680 ebp = 0x002ad690 ebx = 0x002ad858
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - esi = 0x002ad6c0 edi = 0x002ad878
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - 14 xul.dll!JS_CallFunctionValue(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::HandleValueArray const &,JS::MutableHandle<JS::Value>) [jsapi.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 2659 + 0x12]
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - eip = 0x5afbec88 esp = 0x002ad698 ebp = 0x002ad7c4 ebx = 0x002ad858
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - esi = 0x002ad740
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - 15 xul.dll!nsXPCWrappedJS::CallMethod(unsigned short,nsXPTMethodInfo const *,nsXPTCMiniVariant *) [XPCWrappedJSClass.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 956 + 0x14]
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - eip = 0x567d6dd9 esp = 0x002ad7cc ebp = 0x002ada78 ebx = 0x00000003
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - esi = 0x80004005 edi = 0x002ad90c
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - 16 xul.dll!static nsresult PrepareAndDispatch(class nsXPTCStubBase *, unsigned int, unsigned int *, unsigned int *) [xptcstubs.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 88 + 0xf]
[task 2019-10-09T23:42:01.878Z] 23:42:01 INFO - eip = 0x55abc41c esp = 0x002ada80 ebp = 0x002adb4c ebx = 0x0c56f320
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - esi = 0x002adab0 edi = 0x0000000c
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - 17 xul.dll!static void SharedStub() [xptcstubs.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 110 + 0x16]
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - eip = 0x55abb166 esp = 0x002adb54 ebp = 0x002adb68 ebx = 0x002adba4
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - esi = 0x0c5d6a40 edi = 0x002adbd0
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - 18 xul.dll!nsDocumentOpenInfo::DispatchContent(nsIRequest *,nsISupports *) [nsURILoader.cpp:cd1f46e85c51f6fac89a5add35537581db9a6e80 : 477 + 0xf]
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - eip = 0x569dcbcc esp = 0x002adb70 ebp = 0x002adb68
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO - Found by: call frame info
[task 2019-10-09T23:42:01.879Z] 23:42:01 INFO -

entering test: /service-workers/service-worker/clients-get.https.html

FYI, This is still reproducible on Bughunter. I've resubmitted 75 urls where this has happened in the past on both Linux and Windows and was able to reproduce on 44 of them. http://russnov.ru/vecher-s-vladimirom-solovevym-03-07-2019/ is an example.

Priority: P5 → --

Bob, I tried that link from comment 3 in a debug browser built from revision a7ca5ad33f3d on Linux and I am not getting an assertion...

Any idea what I should do to be able to reproduce the assertion? Is it happening for you reliably on that site?

Flags: needinfo?(bob)

I was using a Fedora and a build from yesterday this morning when I picked that url and it was pretty reproducible just by loading the url and perhaps refreshing once or twice but I can't get it or the others top reproducible urls to fire the assert again. I've resubmitted the 75 urls again to see there is another that I can confirm still reproduces.

I've also gone back to the entire history and tested the full 140+ urls where this has occurred in the past. Bughunter can reliably reproduce it but I am unable to do so locally. The two most frequently reproduced urls are http://aops..uscg.mil/ and http://.google.hu/ which are invalid urls and only load when I use the bughunter proxy.

The next most frequently reproduced urls are

http://russnov.ru/vecher-s-vladimirom-solovevym-03-07-2019/
http://tadslgbtmusiccharts.blogspot.com/2019/09/chart-week-38-week-of-22nd-september.html#comment-form
http://tix-iyo-tiraab.blogspot.com/2018/01/ifb-compilation-197080s_15.html
http://evasbackparty.de/choco-fresh-torte/
https://achigawacell.blogspot.com/2018/03/cara-flash-samsung-galaxy-s4-gt-i9500.html

I wonder if the proxy isn't serving me stale content and the current versions of the sites no longer have whatever advertisement caused the assertion.

cknowles: Can we make the proxy always serve fresh content? What do you think is up with the malformed urls like http://aops..uscg.mil/ and http://.google.hu/ ? When I load them with an opt build, the proxy returns a 400 Bad Request error with X-Squid-Error
ERR_INVALID_URL 0

bz: Perhaps you can proxy through a squid instance somewhere and try one of the invalid urls http://aops..uscg.mil/ and http://.google.hu/ ?

Flags: needinfo?(bob) → needinfo?(cknowles)

Sadly, we already do this:
cache deny all
Is on the config for the proxy.

Per https://wiki.squid-cache.org/SquidFaq/ConfiguringSquid#Can_I_make_Squid_proxy_only.2C_without_caching_anything.3F and other squid docs, that's the correct approach for the version we're on.

Looking at the logs, I don't see any cache hits.

I think the issue is that the squid box is trying to resolve the DNS name - and rightly failing due to the malformation. Of course, I can't even open those URLs on a test VM I have.

Let me know if I can look at anything else for you.

Flags: needinfo?(cknowles)

I'm not quite sure where to find a random squid instance...

I did a quick dnf install and just ran squid from the command line. Weirdly I get a different assertion failure than the one I got with the bughunter proxy when loading the invalid urls.

Assertion failure: cache->GetWrapperMaybeDead() == obj || (js::RuntimeIsBeingDestroyed() && !cache->GetWrapperMaybeDead()), at /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/dom/BindingUtils.h:1378

I'll keep looking for reproducible urls and will report back if I find them.

bz: Just reproduced twice in a row on Ubuntu 19.10 Nightly with https://www.baiscopelk.com/
Get it while it's hot.

Assertion failure: XRE_IsParentProcess(), at /builds/worker/workspace/build/src/toolkit/components/windowwatcher/nsWindowWatcher.cpp:700
#01: nsWindowWatcher::OpenWindow(mozIDOMWindowProxy*, char const*, char const*, char const*, nsISupports*, mozIDOMWindowProxy**) (/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
#02: NS_InvokeByIndex (/mozilla/builds/nightly/mozilla/firefox-debug/dist/bin/libxul.so)
Flags: needinfo?(bzbarsky)

I tried that URL a few times on Linux (under rr) now, starting 5 minutes after comment 12, and haven't been able to reproduce. :(

Flags: needinfo?(bzbarsky)

I can still reliably reproduce on the ubuntu 19.10 vm but I can run rr on the vm so I can't check it there. I can't reproduce on my local laptop running fedora 30 at all. What distro are you using?

cknowles got the vm configured temporarily to run rr and I was able to get the assertion. If there is a file or directory I can share with you or if you want to hit me on irc and drive me like the bot I am, let me know.

Thank you! I'm on Fedora 30 as well.

If you can rr pack the trace directory involved and then zip it up and get it to me, that would be really helpful. Also the version of rr that was used to record.

Ok, Bob and I stepped through this in rr and we think we know what's going on:

  1. An iframe is created in a web page (from the parser). It starts a load of its src attribute.
  2. That load comes back as a redirect and necko starts its various redirect processing. In particular, it calls HttpChannelChild::Redirect1Begin which calls HttpChannelChild::SetupRedirect which creates the new channel and calls SetupReplacementChannel to set up its loadgroup, etc. After this the redirection process goes async.
  3. While that async work is going on, the iframe element is removed from the web page. That tears down the docshell, which calls Cancel on the docshell's loadgroup. This cancels the pre-redirect channel, but not the post-redirect one, since the latter is not in the loadgroup yet! Docshell teardown nulls out the mDocShell member of the nsDSURIContentListener.
  4. The async redirect work finishes up and necko calls into HttpChannelChild::CompleteRedirectSetup which adds the post-redirect channel to the loadgroup. There are comments in there about how "No need to check for cancel", but those really don't seem right to me... At this point the pre-redirect channel has NS_BINDING_ABORTED as its status, as expected.
  5. The post-redirect channel sends an OnStartRequest to its listener, and its status is NOT a failure status at this point, contrary to the comments in CompleteRedirectSetup. So the URILoader tries to handle the load, the nsDSURIContentListener doesn't want to handle it (because it has a null mDocShell, per item 3), and we end up trying to hand it off to the helper app service, triggering the assert.

I would have thought that HttpChannelChild::Cancel would call Cancel on mRedirectChannelChild if it's non-null... Or that something would prevent the redirect from completing successfully if Cancel is called on the pre-redirect channel before the post-redirect channel is added to the loadgroup, at least.

Honza, could you take a look? I think you know this stuff best... If it would be useful, Bob still has the live rr replay and you could poke around in there.

Component: Window Management → Networking: HTTP
Flags: needinfo?(honzab.moz)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #17)

Ok, Bob and I stepped through this in rr and we think we know what's going on:

  1. An iframe is created in a web page (from the parser). It starts a load of its src attribute.
  2. That load comes back as a redirect and necko starts its various redirect processing. In particular, it calls HttpChannelChild::Redirect1Begin which calls HttpChannelChild::SetupRedirect which creates the new channel and calls SetupReplacementChannel to set up its loadgroup, etc. After this the redirection process goes async.
  3. While that async work is going on, the iframe element is removed from the web page. That tears down the docshell, which calls Cancel on the docshell's loadgroup. This cancels the pre-redirect channel, but not the post-redirect one, since the latter is not in the loadgroup yet! Docshell teardown nulls out the mDocShell member of the nsDSURIContentListener.
  4. The async redirect work finishes up and necko calls into HttpChannelChild::CompleteRedirectSetup

The cancellation of the pre-redirect channel on the child process happens between the async redirect decision for the post-redirect child channel has been made and Redirect3Complete message has been received back on the child (what happens after the post-redirect nsHttpChannel on the parent process has been succesfully asyncopened in reaction to accepting the redirect also on the child).

which adds the post-redirect channel to the loadgroup. There are comments in there about how "No need to check for cancel", but those really don't seem right to me... At this point the pre-redirect channel has NS_BINDING_ABORTED as its status, as expected.

You are right! Very good catch. The comment is here: https://searchfox.org/mozilla-central/rev/088e2cf29c59d733d57af43903eb0267dbf72e2a/netwerk/protocol/http/HttpChannelChild.cpp#2249-2254 and should be removed or altered to reflect the changes I suggest below.

  1. The post-redirect channel sends an OnStartRequest to its listener, and its status is NOT a failure status at this point, contrary to the comments in CompleteRedirectSetup. So the URILoader tries to handle the load, the nsDSURIContentListener doesn't want to handle it (because it has a null mDocShell, per item 3), and we end up trying to hand it off to the helper app service, triggering the assert.

I would have thought that HttpChannelChild::Cancel would call Cancel on mRedirectChannelChild if it's non-null... Or that something would prevent the redirect from completing successfully if Cancel is called on the pre-redirect channel before the post-redirect channel is added to the loadgroup, at least.

Honza, could you take a look? I think you know this stuff best... If it would be useful, Bob still has the live rr replay and you could poke around in there.

I believe the right fix here is to simply throw the post-redirect child channel away (it's not in the load group, doesn't refer the listener and has never been asyncopened) when the pre-redirect child channel is found cancelled in HttpChannelChild::Redirect3Complete; more or less copy what happens here for succeeded == false: https://searchfox.org/mozilla-central/rev/088e2cf29c59d733d57af43903eb0267dbf72e2a/netwerk/protocol/http/HttpChannelParent.cpp#2145, never send the event.

I don't think this is a security issue.

Flags: needinfo?(honzab.moz)
Priority: -- → P2
Whiteboard: [necko-triaged]

Honza: Do you still need the rr recording?

Flags: needinfo?(honzab.moz)

(In reply to Bob Clary [:bc:] from comment #19)

Honza: Do you still need the rr recording?

I believe you can delete it. It's possible to write a test for the scenario, so we will be able to verify the cause and the fix.

And thanks!

Flags: needinfo?(honzab.moz)
Group: network-core-security
Keywords: testcase-wanted
Crash Signature: [@ nsWindowWatcher::OpenWindowInternal(mozIDOMWindowProxy *,char const *,char const *,char const *,bool,bool,bool,nsIArray *,bool,bool,bool,nsDocShellLoadState *,mozilla::dom::BrowsingContext * *)]

I don't think this is a security issue.

Are you sure? This is violating assertions in the WindowWatcher; do you understand what the implications of that are? I know I don't...

Flags: needinfo?(honzab.moz)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #22)

I don't think this is a security issue.

Are you sure? This is violating assertions in the WindowWatcher; do you understand what the implications of that are? I know I don't...

Thanks for stepping in, Boris. I was mostly reading only the findings from your comment 17 and not much the rest of the bug. Thinking about it more, I am not able to assess if this is a security bug or not, to be honest. The assertion was added in bug 1261842, which was not a security bug and thus a comment for the assertion could have been added, but was not. Hard to say why this is expected only on the parent process and what the implications of violating it are.

This has been prioritized as a P2 in Necko. IMO, if there is a security bug then it's more likely in the response handling (doc nav) than in Necko itself. The necko bug mostly just uncovers that the paths above necko are vulnerable.

I think this should move back to window management component, and try to find a fix in ww, and we should also file a new bug for the necko issue to fix what you have found in comment 17.

What do you think?

Flags: needinfo?(honzab.moz) → needinfo?(bzbarsky)

IMO, if there is a security bug then it's more likely in the response handling

I mean, necko is violating its contract, which makes it really hard to reason about anything in the higher layers. There are lots of places where we assume that once we cancel a channel it is in fact canceled!

I think we just need to fix this on the necko side; it's really not clear to me what the higher layers could do here without breaking all sorts of other cases where we really do want to process a response from necko.

Flags: needinfo?(bzbarsky)

Got it. If this can only trigger because of necko contract violation and there is nothing else for the consumer but keeping some additional information about the channel cancellation (which is actually pretty redundant), then this is solely a necko bug, yes. Please feel free to nominate as a P1 so we can find an assignee and fix for this release.

Right, on the consumer side what this looks like is that we cancel a channel and then later we still get an OnStartRequest with a non-error status. How do I nominate as P1?

Ah, I mean to simply change the priority to P1. Done and ni? Nhi to find an assignee.

Flags: needinfo?(nhnguyen)
Priority: P2 → P1
Assignee: nobody → kershaw
Flags: needinfo?(nhnguyen)

(In reply to Honza Bambas (:mayhemer) from comment #20)

(In reply to Bob Clary [:bc:] from comment #19)

Honza: Do you still need the rr recording?

I believe you can delete it. It's possible to write a test for the scenario, so we will be able to verify the cause and the fix.

It looks like there is no easy way to write a test for this. To reproduce this, the pre-redirect channel has to be canceled after SendRedirect2Verify.

(In reply to Kershaw Chang [:kershaw] from comment #29)

(In reply to Honza Bambas (:mayhemer) from comment #20)

(In reply to Bob Clary [:bc:] from comment #19)

Honza: Do you still need the rr recording?

I believe you can delete it. It's possible to write a test for the scenario, so we will be able to verify the cause and the fix.

It looks like there is no easy way to write a test for this. To reproduce this, the pre-redirect channel has to be canceled after SendRedirect2Verify.

OK, if this is too hard, just leave it. Although, I was thinking about having a custom redirect veto handler added that would sync somehow between the two processes. It's easy now to do messaging in xpcshell tests for e10s. But that was just a thought and maybe it's really not that easy.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4e86786de847 Cancel post-redirect channel if pre-redirect channel is already canceled r=mayhemer
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
See Also: → 1599498
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: