Closed Bug 1756172 Opened 2 years ago Closed 2 years ago

Crash [@ IsOnWorkerThread]

Categories

(Core :: DOM: Service Workers, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- unaffected
firefox99 --- wontfix
firefox100 --- verified

People

(Reporter: jkratzer, Assigned: jstutte)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression, sec-other, testcase, Whiteboard: [bugmon:bisected,confirmed][adv-main100+r])

Crash Data

Attachments

(3 files, 2 obsolete files)

Testcase found while fuzzing mozilla-central rev 2b187d932ca1 (built with: --enable-debug --enable-fuzzing).

Testcase can be reproduced using the following commands:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch --build 2b187d932ca1 --debug --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.zip
[@ IsOnWorkerThread]

    ==25029==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000480 (pc 0x7f416630ef4e bp 0x7f4159ae9280 sp 0x7f4159ae9270 T25043)
    ==25029==The signal is caused by a READ memory access.
    ==25029==Hint: address points to the zero page.
        #0 0x7f416630ef4e in IsOnWorkerThread /dom/workers/WorkerPrivate.cpp:5347:3
        #1 0x7f416630ef4e in AssertIsOnWorkerThread /dom/workers/WorkerPrivate.cpp:5355:3
        #2 0x7f416630ef4e in mozilla::dom::WorkerGlobalScopeBase::GetGlobalJSObjectPreserveColor() const /dom/workers/WorkerScope.cpp:259:19
        #3 0x7f4166521780 in HasJSGlobal /dom/base/nsIGlobalObject.h:123:37
        #4 0x7f4166521780 in mozilla::dom::ScriptSettingsStackEntry::ScriptSettingsStackEntry(nsIGlobalObject*, mozilla::dom::ScriptSettingsStackEntry::Type) /dom/script/ScriptSettings.cpp:171:3
        #5 0x7f41664f998a in mozilla::dom::AutoJSAPI::AutoJSAPI(nsIGlobalObject*, bool, mozilla::dom::ScriptSettingsStackEntry::Type) /dom/script/ScriptSettings.cpp:423:7
        #6 0x7f41664f96d6 in mozilla::dom::AutoEntryScript::AutoEntryScript(nsIGlobalObject*, char const*, bool) /dom/script/AutoEntryScript.cpp:66:7
        #7 0x7f41618f925e in void mozilla::dom::Promise::MaybeSomething<JS::Handle<JS::Value> const&>(JS::Handle<JS::Value> const&, void (mozilla::dom::Promise::*)(JSContext*, JS::Handle<JS::Value>)) /builds/worker/workspace/obj-build/dist/include/mozilla/dom/Promise.h:326:21
        #8 0x7f416631fea6 in InvokeMethod<(lambda at /dom/workers/WorkerScope.cpp:1129:14), void ((lambda at /dom/workers/WorkerScope.cpp:1129:14)::*)(const mozilla::MozPromise<bool, nsresult, true>::ResolveOrRejectValue &) const, mozilla::MozPromise<bool, nsresult, true>::ResolveOrRejectValue> /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:630:12
        #9 0x7f416631fea6 in InvokeCallbackMethod<false, (lambda at /dom/workers/WorkerScope.cpp:1129:14), void ((lambda at /dom/workers/WorkerScope.cpp:1129:14)::*)(const mozilla::MozPromise<bool, nsresult, true>::ResolveOrRejectValue &) const, mozilla::MozPromise<bool, nsresult, true>::ResolveOrRejectValue, RefPtr<mozilla::MozPromise<bool, nsresult, true>::Private> > /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:661:5
        #10 0x7f416631fea6 in mozilla::MozPromise<bool, nsresult, true>::ThenValue<mozilla::dom::ServiceWorkerGlobalScope::SkipWaiting(mozilla::ErrorResult&)::$_3>::DoResolveOrRejectInternal(mozilla::MozPromise<bool, nsresult, true>::ResolveOrRejectValue&) /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:914:7
        #11 0x7f4162438602 in mozilla::MozPromise<bool, nsresult, true>::ThenValueBase::ResolveOrRejectRunnable::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:487:21
        #12 0x7f41619cdb3e in OnDiscard /xpcom/threads/nsThreadUtils.cpp:94:9
        #13 0x7f41619cdb3e in non-virtual thunk to mozilla::CancelableRunnable::OnDiscard() /xpcom/threads/nsThreadUtils.cpp
        #14 0x7f4166319f07 in mozilla::dom::(anonymous namespace)::ExternalRunnableWrapper::Cancel() /dom/workers/WorkerPrivate.cpp:219:13
        #15 0x7f416630bb5f in mozilla::dom::WorkerRunnable::Run() /dom/workers/WorkerRunnable.cpp:247:5
        #16 0x7f41619c10a9 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1165:16
        #17 0x7f41619bd574 in NS_ProcessPendingEvents(nsIThread*, unsigned int) /xpcom/threads/nsThreadUtils.cpp:432:19
        #18 0x7f41662fea22 in mozilla::dom::WorkerPrivate::ClearMainEventQueue(mozilla::dom::WorkerPrivate::WorkerRanOrNot) /dom/workers/WorkerPrivate.cpp:3750:5
        #19 0x7f41662da980 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /dom/workers/RuntimeService.cpp:2193:23
        #20 0x7f41619c10a9 in nsThread::ProcessNextEvent(bool, bool*) /xpcom/threads/nsThread.cpp:1165:16
        #21 0x7f41619c84fa in NS_ProcessNextEvent(nsIThread*, bool) /xpcom/threads/nsThreadUtils.cpp:467:10
        #22 0x7f4162475dcb in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /ipc/glue/MessagePump.cpp:300:20
        #23 0x7f4162391da7 in MessageLoop::RunInternal() /ipc/chromium/src/base/message_loop.cc:331:10
        #24 0x7f4162391cb2 in RunHandler /ipc/chromium/src/base/message_loop.cc:324:3
        #25 0x7f4162391cb2 in MessageLoop::Run() /ipc/chromium/src/base/message_loop.cc:306:3
        #26 0x7f41619bc7de in nsThread::ThreadFunc(void*) /xpcom/threads/nsThread.cpp:387:10
        #27 0x7f41760afa57 in _pt_root /nsprpub/pr/src/pthreads/ptthread.c:201:5
        #28 0x7f4176e2b608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
        #29 0x7f41769f3292 in __clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
    
    UndefinedBehaviorSanitizer can not provide additional info.
    SUMMARY: UndefinedBehaviorSanitizer: SEGV /dom/workers/WorkerPrivate.cpp:5347:3 in IsOnWorkerThread
    ==25029==ABORTING
Attached file Testcase
Component: DOM: Core & HTML → DOM: Service Workers
Flags: needinfo?(echuang)

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220218090822-6938c3b26d14.
The bug appears to have been introduced in the following build range:

Start: 85fb6ffa2887b21084c20349e999b1ec46db1993 (20220216131952)
End: c0922a11dfc958b24a5febb219dae8e556beab4c (20220216170132)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=85fb6ffa2887b21084c20349e999b1ec46db1993&tochange=c0922a11dfc958b24a5febb219dae8e556beab4c

Keywords: regression
Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]

This is actually not very concerning, we just assert here (so there is no potential UAF in old code pre-bug 1752120 in release here). But we should probably be careful with too much information on how to reproduce invariant violations like this, and the regressing bug is hidden, too.

Group: core-security
Regressed by: 1752120
Has Regression Range: --- → yes
Group: core-security → dom-core-security
Keywords: sec-other
Severity: -- → S3
Priority: -- → P3

Hi Jason, re-reading comment 0 I see: SEGV on unknown address 0x000000000480 which is not a plain nullptr so this is not the assert triggering but we want to read some member from a WorkerPrivate nullptr, IIUC. As the stack alone does not really reveal where that comes from, would it be possible to have a pernosco session here? Thanks!

Flags: needinfo?(echuang) → needinfo?(jkratzer)

You can find a pernosco session for this bug here.

Flags: needinfo?(jkratzer)

Thanks a lot!

I put some notes. So IIUC we should unroot the global only after having done mWorkerPrivate->ClearMainEventQueue(WorkerPrivate::WorkerRan); ?

Flags: needinfo?(bugmail)
Assignee: nobody → jstutte
Status: NEW → ASSIGNED

(In reply to Jens Stutte [:jstutte] from comment #7)

I put some notes. So IIUC we should unroot the global only after having done mWorkerPrivate->ClearMainEventQueue(WorkerPrivate::WorkerRan); ?

This is one of those cases where the thing that's crashing is indicating a potentially concerning lifetime problem but maybe we also want the thing that's catching this by virtue of crashing to assert under fuzzing but silently not crash or do bad things in production.

In terms of the things of what's catching this: In the pernosco trace, we don't actually care about the not-taken call to WorkerPrivateSaysForbidScript because the global was already marked as dying (nsIGlobalObject::StartDying) previously during the transitions to Canceling and Killing. So if we ever got to CallbackObject::CallSetup::CallSetup's invocation of IsScriptForbidden we'd say it's not okay. But we're not getting there because it's the call to nsIGlobalObject::HasJSGlobal during AutoEntryScript construction for the Promise that's crashing earlier. (Edited) Perhaps we could have the subclassed WorkerGlobalScopeBase::GetGlobalJSObjectPreserveColor that HasJSGlobal calls assert hard and return null when mWorkerPrivate is null. I'm not immediately clear on whether there will be runtime bails from this somewhere or just a change in where the crash is. :smaug NI-ing you to see if you have any thoughts about HasJSGlobal's behavior unrooting; pre-edit I was wondering if we should return false for HasJSGlobal when mDying, but that sounds too dramatic given that the HasJSGlobal was in an assert, not a runtime path. Note that I have not yet translated the Peers meeting about doing better about poisoning globals/JSContexts and part of that discussion did involve the behavior of AutoEntryScript/etc., so that might potentially be a relevant place to interact with the global (for new AES's).

In terms of the lifetime problem, the problem is that ServiceWorkerGlobalScope::SkipWaiting has a lambda that holds a promise that's basically entirely unaware of the lifecycle stuff going on. It's (through a helper wrapper) calling RemoteWorkerChild::MaybeSendSetServiceWorkerSkipWaitingFlag that involves bouncing to the worker launcher thread (bypassing the main thread). Arguably this scenario could be simpler if we were sending the SkipWaiting request over a PBackground path from the worker, which is something we'd discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1672493#c1. Specifically, async MozPromise/callbacks from the IPC actor will automatically reject when the actor is torn down, which is a fairly straightforward lifecycle step.

Note that skipWaiting is explicitly NewObject which means that we can't just have a dedicated holder slot for skipWaiting. The simplest fix for this might just be to acquire and hold a StrongWorkerRef for the duration of the skipwaiting promise since the request has a finite/bounded lifetime (which is likely why we were cavalier with the promise's ownership).

Flags: needinfo?(bugmail) → needinfo?(bugs)

Trying to translate this into a more local reasoning about the lifetime-relation between WorkerPrivate and WorkerGlobalScopeBase:

The "thing that's crashing" here is an access to the mWorkerPrivate of WorkerGlobalScopeBase after it has already been nulled out. I might be simplifying things, but IMHO we have only two options here regarding the lifetime-relation between WorkerPrivate and WorkerGlobalScopeBase derived objects:

  1. We expect WorkerGlobalScopeBase derived objects to never outlive their mWorkerPrivate.
    This seems to be kind of the current state, only that evidently there are violations to this. If this is what we want, we should just continue to enforce this (for example by adding more asserts before any access to mWorkerPrivate) and adjust any violations (actually we already catch them now by having nulled out the mWorkerPrivate). Apparently one violation can be caused by the mWorkerPrivate->ClearMainEventQueue(WorkerPrivate::WorkerRan); done after we nulled out the global's mWorkerPrivate and thus keeping this pointer alive until the latest possible moment in the WorkerPrivate's live seems advisable to me, as proposed by the patch here.

  2. We expect the global's mWorkerPrivate to possibly go away at any time, introducing a further (most definite) way of telling the global "this worker is dead".
    This would imply to null-check mWorkerPrivate before accessing it at runtime in all places and to act accordingly.

IIUC the state of the worker is already well modeled by its internal state machine, so until the global can access its mWorkerPrivate it knows the worker's state and acts accordingly and having mWorkerPrivate set to nullptr does not add new information. It just guards us from accessing a potentially dead object. And until today we kind of expected that 1. is true, so I would expect to be there only few violations we can find and should handle.

This is one of those cases where the thing that's crashing is indicating a potentially concerning lifetime problem but maybe we also want the thing that's catching this by virtue of crashing to assert under fuzzing but silently not crash or do bad things in production.

Not sure if I second this. I'd rather prefer to first have a clear decision between 1. and 2. and then adjust the code accordingly. To crash only while fuzzing reads like "we want 1. but we are lazy to find all violations". Now that can be the right call if we expect those violations to be very hard to find.

In terms of the things of what's catching this: In the pernosco trace, we don't actually care about the not-taken call to WorkerPrivateSaysForbidScript because the global was already marked as dying (nsIGlobalObject::StartDying) previously during the transitions to Canceling and Killing. So if we ever got to CallbackObject::CallSetup::CallSetup's invocation of IsScriptForbidden we'd say it's not okay. But we're not getting there because it's the call to nsIGlobalObject::HasJSGlobal during AutoEntryScript construction for the Promise that's crashing earlier.

Should I read this as "on the way to check if we should actually execute script here we died earlier due to the missing mWorkerPrivate" ? Such a check should definitely just fail but not crash in the state we are in here, IIUC. I assume (but I did not check) that if the mWorkerPrivate was still set in this situation, everything would have just been fine. And the latest possible access to it seems to be triggered by the final ClearMainEventQueue, so ensuring that we still can access our-self during that clear still seems advisable to me. It might not solve other violations, but we can be kind of sure that the worker thread will not try to use the global and its mWorkerPrivate anymore only after that point, and that any other access must come from a different thread then.

Long story short, IMHO we should:

  • decide that we want 1.
  • repair this specific violation, potentially with attachment 9268485 [details]
  • enforce the invariant from 1. further (adding some asserts), though this is kind of optional and mostly documents our assumption everywhere (setting mWorkerPrivate to nullptr technically is sufficient to catch them).

repair this specific violation

More precisely actually this specific violation can be seen just as a false-positive as we null out the mWorkerPrivate too early. D138442 was wrong in claiming "Null out the mWorkerPrivate on WorkerGlobalScopeBase when a worker ends." as the worker really ceases to process events only later.

Situation 1 is the only case that makes sense given that the event target should go away when the worker shuts down and the WorkerPrivate goes away. How can something legally be referencing a global which has a same-thread-usage invariant when the thread no longer exists?

Any event loop processing can reveal bugs. ClearMainEventQueue is not special in this specific case; it's generally plausible that we could see the exact same crash scenario on the bare worker thread after the primary runnable has completed. (And we could probably shift the reproduction to behave like this if we removed the call to ClearMainEventQueue.) https://bugzilla.mozilla.org/show_bug.cgi?id=1743671#c19 is some recent discussion in this space about us potentially tightening up the lack of meaningful invariants around our worker event targets.

My concern about crashing is primarily to avoid crashing on the release channel. It's very easy for sites to change their behavior (ex: using our new-ish implementations of things) and uncover longstanding correctness issues that we haven't addressed. While it would be nice to have the crash reports from the field, it feels like the wrong trade-off if the net result might be to cause non-stop crashing of content processes for that origin. It's a situation where I'd much rather have the discussed ability to effectively trigger a crash report without actually taking down the browser process.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #12)

Any event loop processing can reveal bugs. ClearMainEventQueue is not special in this specific case; it's generally plausible that we could see the exact same crash scenario on the bare worker thread after the primary runnable has completed.

Ah, that was probably what I was missing. Thanks! Still I think we should move unrooting after ClearMainEventQueue, simply because it is a well known and expected case.

My concern about crashing is primarily to avoid crashing on the release channel. It's very easy for sites to change their behavior (ex: using our new-ish implementations of things) and uncover longstanding correctness issues that we haven't addressed. While it would be nice to have the crash reports from the field, it feels like the wrong trade-off if the net result might be to cause non-stop crashing of content processes for that origin. It's a situation where I'd much rather have the discussed ability to effectively trigger a crash report without actually taking down the browser process.

That would mean to go through all accesses of mWorkerPrivate and add a MOZ_ASSERT and an if (!mWorkerPrivate) bail out;, I assume. If you know this to be such sensible that we cannot just rely on tests and fuzzers, then we should probably do so.

(In reply to Jens Stutte [:jstutte] from comment #13)

That would mean to go through all accesses of mWorkerPrivate and add a MOZ_ASSERT and an if (!mWorkerPrivate) bail out;, I assume. If you know this to be such sensible that we cannot just rely on tests and fuzzers, then we should probably do so.

I guess my specific proposal would be:

  • We shouldn't be adding new MOZ_CRASH() calls related to this. We should favor MOZ_ASSERT or MOZ_DIAGNOSTIC_ASSERT followed by an attempt to bail and not crash.
  • We should make AutoEntryScript not crash when mWorkerPrivate is null, but it definitely should at least MOZ_ASSERT with not having the global. But it would be nice if in production the case did not crash out of an abundance of caution.

In terms of transforming all existing mWorkerPrivate calls... I don't know? I think AutoEntryScript should cover most of the scenarios I would imagine are still lingering right now. I think it would probably be most beneficial right now to:

  • audit the cases like this specific scenario where the WorkerPrivate is involved in creating promises that involve runnables involving RemoteWorkerChild and fix those.
  • try and help improve the event target situation as discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1743671#c19 so that we could potentially assert when we're seeing runnables dispatched late, but this is a more complex issue

"We expect WorkerGlobalScopeBase derived objects to never outlive their mWorkerPrivate" sounds reasonable, but it means also that we assume there aren't ever any leaks. But leaks do occur, unfortunately, and we shouldn't crash because of them.

The stack trace in comment AutoEntryScript::AutoEntryScript is about the generic issue we've had with AutoEntryScript forever that it doesn't have Init() method similarly to AutoJSAPI. Whoever calls AutoJSAPI::Init() is supposed to check the return value, and that returns false for example if the
global doesn't have JS object anymore.
I commented about this also in https://phabricator.services.mozilla.com/D141719

Flags: needinfo?(bugs)

(In reply to Olli Pettay [:smaug] from comment #16)

"We expect WorkerGlobalScopeBase derived objects to never outlive their mWorkerPrivate" sounds reasonable, but it means also that we assume there aren't ever any leaks. But leaks do occur, unfortunately, and we shouldn't crash because of them.

That kind of brings us back to the options from comment 10. I understood Andrew is saying "this is an invariant we should enforce" and it is also what our code assumed before (risking UAF on the mWorkerPrivate in case of leaks which then we wanted to mitigate in bug 1752120 by nulling them out). So the first patch of this stack basically restores the previous behavior before bug 1752120 (through some re-ordering of calls) but keeping the UAF-safety and adding the asserts to enforce the (presumed) invariant, which might turn out difficult to achieve.

If we really cannot prevent from leaks and must accept them as a normal and expected state of operations, we will never be able to guarantee the invariant of option 1. That would automatically mean, we loosen the strict coupling of WorkerPrivate's lifecycle with WorkerGlobalScopeBase derived objects and need to always check mWorkerPrivate before using it, bringing us straight to option 2.

The stack trace in comment AutoEntryScript::AutoEntryScript is about the generic issue we've had with AutoEntryScript forever that it doesn't have Init() method similarly to AutoJSAPI. Whoever calls AutoJSAPI::Init() is supposed to check the return value, and that returns false for example if the global doesn't have JS object anymore.
I commented about this also in https://phabricator.services.mozilla.com/D141719

Well, this is one of the probably many consequences of 2. - we must add a meaningful and robust error handling if we always need to check mWorkerPrivate before using - if it is not there we need to act accordingly without crashing subsequently. And if we can repair this long standing issue together, even better.

I'd then propose to land https://bugzilla.mozilla.org/attachment.cgi?id=9268485 minus the sentinel asserts asap in order to repair the regression from bug 1752120 which makes us crash much more often than in the past before it hits release and then add additional patches to harden incrementally all accesses to mWorkerPrivate. But I'd want to hear also asuth' opinion on the (non-)possibility to enforce the invariant from option 1, of course.

Flags: needinfo?(bugmail)

Just adding some pieces of my (patchwork) understanding here:

IIUC, WorkerGlobalScopeBase asserts in most of its functions (AFAICS all functions that access mWorkerPrivate) that we are on the worker thread. So a better wording of the invariant we want to enforce could be:

"We do not expect anyone to use worker-thread-only functions on the global scope after we exit from WorkerThreadPrimaryRunnable::Run".

I assume this describes quite well the assumed state of things before we tried to improve them here - we just trusted that nobody would ever want to use mWorkerPrivate afterwards.

This is mainly achieved by emptying the event queue(s) of the worker thread and killing all pending timers before leaving. After that point we would not expect other events that want to access the global scopes' worker-thread-only functions to come in on the worker thread before it is completely torn down. If they do, it is probably an error and should be corrected.

We enforce this invariant already by setting mWorkerPrivate to nullptr, but we did it just too early in bug 1752120. Obviously this is a very hard enforcement, as we will always crash now. That could be ok if we expect this to happen rarely enough to not matter. In fact such use is equivalent to a UAF on the code pre bug 1752120 and we did not see those happen very often (but they are obviously harder to detect than a nullptr crash). If instead we want to soften the consequences, we need to treat all accesses to m̀WorkerPrivate` well just like scenario 2.

Now the main driver for all these changes was to audit the uses of WorkerPrivate* by the introduction of CheckedUnsafePtr<WorkerPrivate>. This resulted in the detection of the dangling WorkerPrivate* of WorkerGlobalScopeBase in some cases. This diagnostic assert is actually the equivalent of the sentinel asserts we wanted to introduce here - just triggered later on destruction of the global scope object and detecting a leak. It might thus give us a hint (a lower boundary actually, as it is checked later) on how often this can be expected to happen and if it is worth to try to enforce the stronger invariant: "We expect WorkerGlobalScopeBase derived objects to never outlive their mWorkerPrivate".

All this seems to tell me, that:

I'd then propose to land https://bugzilla.mozilla.org/attachment.cgi?id=9268485 minus the sentinel asserts asap in order to repair the regression from bug 1752120 which makes us crash much more often than in the past before it hits release and then add additional patches to harden incrementally all accesses to mWorkerPrivate.

is the right path forward here, unless we want to rewind all the CheckedUnsafePtr patches and start something completely different - which sounds bad to me as they actually helped to detect real problems (like here) and are also a step towards our ability to transform them to real smart pointers (of whatever flavor, strong or weak) one day.

The scope of this bug here should just be "repair the regression from bug 1752120". All the rest can be done in normal, non-secure bugs, I think, as we should then be on the safe side from the UAF point of view. And if this fix here really causes many crashes, we will have the right drive to finally improve things.

(I assume, Eden is already following this discussion, too, but in case - fyi)

Flags: needinfo?(echuang)

(asuth and Eden are well involved, so clearing their ni?s)

Jason, would it be possible to get a check if this test case crashes on an optimized non-debug build (simulation as release) ? At least at this specific line the crash might be just debug only. Thank you!

Flags: needinfo?(jkratzer)
Flags: needinfo?(echuang)
Flags: needinfo?(bugmail)

:jstutte, it does not crash for me using an optimized build of mozilla-central rev 607fa621b6f2.

Flags: needinfo?(jkratzer)
Attachment #9268485 - Attachment description: Bug 1756172: Unroot the globals only after ClearMainEventQueue ran. r?#dom-worker-reviewers → Bug 1756172: Unroot the globals only after last worker event ran. r?#dom-worker-reviewers
Attachment #9268485 - Attachment description: Bug 1756172: Unroot the globals only after last worker event ran. r?#dom-worker-reviewers → Bug 1756172: Unroot the global scopes only after the last worker event ran. r?#dom-worker-reviewers
Attachment #9269388 - Attachment description: Bug 1756172, Make also DebuggerScope support weak referencese, r=jstutte → Bug 1756172, Make also DebuggerScope support weak references, r=jstutte
Keywords: leave-open
Keywords: leave-open
Attachment #9269332 - Attachment is obsolete: true
Attachment #9268824 - Attachment is obsolete: true
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch

Bugmon Analysis
Bug appears to be fixed on mozilla-central 20220325214737-2b624fdb002e but BugMon was unable to find a usable build for 2b187d932ca1.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

Andrew, wdyt? We are already in late beta, IIUC, and release seems not affected by these specific crashes. So just riding the trains might be ok here?

Flags: needinfo?(jstutte) → needinfo?(bugmail)

Yes, this likely should just ride the trains given that this landed after the last beta in the cycle and CheckedUnsafePtr is behaving like the pre-CheckedUnsafePtr transition in release.

That said, wacky crashes are possible if there are changes in what the web is doing, so if we see a spike in crashes in workers on release at any point during worker shutdown, this would then want a dot release. But it's not worth the risk to try and uplift it at this very late date.

Flags: needinfo?(bugmail)
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:bisected,confirmed][adv-main100+r]
Regressions: 1769878
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: