Open Bug 1818569 Opened 1 year ago Updated 6 months ago

Hit MOZ_CRASH(E10SUtils.getRemoteTypeForWorkerPrincipal did throw: workerType=shared, principal=ftp, preferredRemoteType=webIsolated, processRemoteType=webIsolated, errorName=NS_ERROR_UNEXPECTED, errorLocation=resource://gre/modules/E10SUtils.sys.mjs:157)

Categories

(Core :: DOM: Workers, defect, P3)

x86_64
Linux
defect

People

(Reporter: jkratzer, Assigned: aiunusov)

References

(Blocks 1 open bug)

Details

(Keywords: testcase, Whiteboard: [bugmon:bisected,confirmed])

Attachments

(1 file)

Testcase found while fuzzing mozilla-central rev 16f49fd3a5dc (built with: --enable-address-sanitizer --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 16f49fd3a5dc --asan --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html
Hit MOZ_CRASH(E10SUtils.getRemoteTypeForWorkerPrincipal did throw: workerType=shared, principal=ftp, preferredRemoteType=webIsolated, processRemoteType=webIsolated, errorName=NS_ERROR_UNEXPECTED, errorLocation=resource://gre/modules/E10SUtils.sys.mjs:157)

    =================================================================
    ==1011446==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f1d34f6d1c6 bp 0x7ffda6a22090 sp 0x7ffda6a21500 T0)
    ==1011446==The signal is caused by a WRITE memory access.
    ==1011446==Hint: address points to the zero page.
        #0 0x7f1d34f6d1c6 in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3
        #1 0x7f1d34f6d1c6 in mozilla::dom::RemoteWorkerManager::GetRemoteType(nsCOMPtr<nsIPrincipal> const&, mozilla::dom::WorkerKind) /dom/workers/remoteworkers/RemoteWorkerManager.cpp:234:5
        #2 0x7f1d34fbfbbe in mozilla::dom::SharedWorker::Constructor(mozilla::dom::GlobalObject const&, nsTSubstring<char16_t> const&, mozilla::dom::StringOrWorkerOptions const&, mozilla::ErrorResult&) /dom/workers/sharedworkers/SharedWorker.cpp:228:21
        #3 0x7f1d30fc8af9 in mozilla::dom::SharedWorker_Binding::_constructor(JSContext*, unsigned int, JS::Value*) /builds/worker/workspace/obj-build/dom/bindings/SharedWorkerBinding.cpp:612:58
        #4 0x7f1d3accd8bc in CallJSNative /js/src/vm/Interpreter.cpp:459:13
        #5 0x7f1d3accd8bc in CallJSNativeConstructor /js/src/vm/Interpreter.cpp:475:8
        #6 0x7f1d3accd8bc in InternalConstruct(JSContext*, js::AnyConstructArgs const&, js::CallReason) /js/src/vm/Interpreter.cpp:694:10
        #7 0x7f1d3bb985f6 in js::jit::DoCallFallback(JSContext*, js::jit::BaselineFrame*, js::jit::ICFallbackStub*, unsigned int, JS::Value*, JS::MutableHandle<JS::Value>) /js/src/jit/BaselineIC.cpp:1570:10
        #8 0x3fffbaab7da8  (<unknown module>)
    
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:261:3 in MOZ_Crash
    ==1011446==ABORTING
Attached file Testcase

Verified bug as reproducible on mozilla-central 20230223172038-8abe8c3a6233.
Unable to bisect testcase (Testcase reproduces on start build!):

Start: c875dbd49223e460b596f01cc6564c6fb97d59c4 (20220225104705)
End: 16f49fd3a5dc65e1275c9d38e51e5fa62d0c3af7 (20230223151926)
BuildFlags: BuildFlags(asan=True, tsan=False, debug=False, fuzzing=True, coverage=False, valgrind=False, no_opt=False, fuzzilli=False, nyx=False)

Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]
Crash Signature: [@ mozilla::dom::RemoteWorkerManager::GetRemoteType ]

Unfortunately the test case seems to be timing sensitive (for (let i = 0; i < 11; i++) { }), so it does not reproduce for me.

In any case this is a diagnostic assert (though it reads MOZ_CRASH it is surrounded by #ifdef) that contains some information added in bug 1663512:

E10SUtils.getRemoteTypeForWorkerPrincipal did throw:
workerType=shared,
principal=ftp,
preferredRemoteType=webIsolated,
processRemoteType=webIsolated,
errorName=NS_ERROR_UNEXPECTED,
errorLocation=resource://gre/modules/E10SUtils.sys.mjs:157

Luca, does that help diagnose the problem?

Flags: needinfo?(lgreco)
See Also: → 1663512

(In reply to Jens Stutte [:jstutte] from comment #4)

Sure thing. I didn't expect it to be timing sensitive; at least locally it wasn't, and I managed to hit it consistently. As you already mentioned, the crash is only hit because of a diagnostic crash that we trigger on purpose, and that is only enabled in builds where diagnostic assertions are enabled. It reads as MOZ_CRASH because we needed to include more details in that diagnostic crash and had to opt for that implementation strategy at the time. That is unfortunate, because it makes it less immediately clear that this is a crash we trigger on purpose on non-release channels.

On a release channel the SharedWorker constructor call is aborted instead (Uncaught DOMException: The operation was aborted), because the diagnostic crash is not enabled on those builds.

For comparison, on Chrome the same test page also hits a DOMException, but one that explicitly mentions that the failure is due to the http://127.0.0.1:8080 origin not being accessible to a script coming from ftp://:pass@127.0.0.1%24/ (this also seems to be raised right away, without the network being hit).
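From a page's point of view, the release-channel behavior amounts to the SharedWorker constructor throwing synchronously. A minimal sketch of how a caller could observe that is below; `trySpawnSharedWorker`, its return shape, and the `"unavailable"` label are hypothetical names introduced only for illustration, and the constructor is passed in as a parameter so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: wraps a SharedWorker-like constructor and reports
// the name of whatever exception the constructor throws.
function trySpawnSharedWorker(scriptUrl, WorkerCtor = globalThis.SharedWorker) {
  if (typeof WorkerCtor !== "function") {
    // No SharedWorker in this environment; "unavailable" is a placeholder label.
    return { worker: null, errorName: "unavailable" };
  }
  try {
    return { worker: new WorkerCtor(scriptUrl), errorName: null };
  } catch (e) {
    // On a release build this would be expected to be "AbortError"
    // for the scenario discussed in this bug.
    return { worker: null, errorName: e.name };
  }
}
```

In a browser, `trySpawnSharedWorker("worker.js")` would use the real constructor; the returned `errorName` lets the page distinguish an aborted construction from a successful one.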

In our case, the AbortError hit on release (and the diagnostic crash hit in non-release builds) is triggered (currently on purpose) from here in the validateWebRemoteType function in E10SUtils.sys.mjs (as also included in the diagnostic crash log details):

This means that at the moment we throw that DOMException (or, in builds with diagnostic assertions, trigger the diagnostic crash) for a SharedWorker that is being spawned from any of the schemes listed in kSafeScheme, as well as from any of the web+... schemes (custom ones registered by webapps). The kSafeScheme list in particular should contain schemes that are allowed to be handled by an external application, a website, or an extension (because the ones listed in kSafeScheme can be registered as protocol handlers by webapps and extensions even without the web+... prefix).
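The scheme check described above can be sketched roughly as follows. This is not the code from E10SUtils.sys.mjs: the function names are hypothetical, and `SAFE_SCHEMES_SKETCH` is a made-up illustrative subset rather than the real kSafeScheme list (only `ftp` is known from the crash details in this bug).

```javascript
// Illustrative subset only; NOT the real kSafeScheme list.
const SAFE_SCHEMES_SKETCH = new Set(["ftp", "mailto", "irc"]);

// True for schemes that a webapp or extension could register a handler for:
// either a custom "web+..." scheme or one of the listed safe schemes.
function isHandlerRegistrableScheme(scheme) {
  const s = scheme.toLowerCase();
  return s.startsWith("web+") || SAFE_SCHEMES_SKETCH.has(s);
}

// Hypothetical stand-in for the validation step: throws an error named
// NS_ERROR_UNEXPECTED for the schemes above, otherwise returns the scheme.
function checkWorkerPrincipalScheme(scheme) {
  if (isHandlerRegistrableScheme(scheme)) {
    const err = new Error(`Unexpected worker principal scheme: ${scheme}`);
    err.name = "NS_ERROR_UNEXPECTED";
    throw err;
  }
  return scheme;
}
```

With this sketch, a principal scheme of `ftp` or `web+example` throws, while `https` passes through unchanged.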

Flags: needinfo?(lgreco)

Hi Andrew,
I'd like to get your perspective on what I described in comment 5. In particular, given the comparison with what happens on Chrome under the same scenario, do we have enough information to decide whether we want to change the current behavior in this kind of scenario?

e.g. given that we know a couple of ways to hit this particular Cr.NS_ERROR_UNEXPECTED, it may make sense to look into taking it out of the diagnostic crash and just aborting on non-release channels as well?
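The idea can be sketched as below, under stated assumptions: the real logic lives in C++ (RemoteWorkerManager::GetRemoteType), so this JavaScript is only a model of the control flow. `resolveRemoteTypeOrError`, the `diagnosticBuild` option, and the decision to keep crashing on other error names are all hypothetical; rethrowing stands in for the diagnostic crash.

```javascript
// Model of the proposed behavior: the known NS_ERROR_UNEXPECTED case always
// becomes an ordinary error result, even when diagnostics are enabled.
function resolveRemoteTypeOrError(lookup, { diagnosticBuild = false } = {}) {
  try {
    return { ok: true, remoteType: lookup() };
  } catch (e) {
    if (e.name === "NS_ERROR_UNEXPECTED") {
      // Known, reachable-from-content failure: abort instead of crashing.
      return { ok: false, errorName: "AbortError" };
    }
    if (diagnosticBuild) {
      // Assumption: unknown failures would still be surfaced loudly.
      throw e;
    }
    return { ok: false, errorName: "AbortError" };
  }
}
```

The design choice illustrated here is that failures known to be reachable from web content are reported to the caller, while the louder path is kept only for failures that are not yet understood.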

Flags: needinfo?(bugmail)

Very much agreed that we should stop inducing diagnostic-assert crashes for these cases and instead just generate errors.

Spec-wise, there doesn't seem to be a good way to address this earlier in the data pipeline. There are, I guess, still some outstanding discussions about improving SharedWorker error handling and the nuances of where the settings objects are created and when the fetches happen, at or linked from https://github.com/whatwg/html/issues/5362 and https://github.com/whatwg/html/issues/5323, but I don't think that's something we need to address here.

Flags: needinfo?(bugmail)

Since bug 1663512 is p3/s3, so is this.

Severity: -- → S3
Priority: -- → P3

Bugmon was unable to reproduce this issue.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon
Keywords: bugmon

A change to the Taskcluster build definitions over the weekend caused Bugmon to fail when reproducing issues. This issue has been corrected. Re-enabling bugmon.

Unable to reproduce bug 1818569 using build mozilla-central 20230223151926-16f49fd3a5dc. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon

Interesting: bugmon does not seem to be able to reproduce this, while we still see some sparse instances on 118 beta. I assume we should just do what comment 7 suggests. Artur, would you mind taking this?

Flags: needinfo?(aiunusov)

(I assume this would also fix bug 1663512.)

Assignee: nobody → aiunusov
Flags: needinfo?(aiunusov)