Open Bug 1435343 Opened 7 years ago Updated 6 days ago

Crash in [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]. Shutdown problem in workers.

Categories

(Core :: DOM: Workers, defect, P3)

defect

Tracking

()

ASSIGNED
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr78 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 - wontfix
firefox82 --- wontfix
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 --- fix-optional

People

(Reporter: mccr8, Assigned: jstutte)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(4 keywords, Whiteboard: [DWS_NEXT][stockwell unknown][tbird topcrash],qa-not-actionable)

Crash Data

Attachments

(1 obsolete file)

This bug was filed from the Socorro interface and is about
report bp-5a9b14a4-456a-4502-ae80-f61c10180202.
=============================================================

Top 8 frames of crashing thread:

0 mozglue.dll MOZ_CrashOOL mfbt/Assertions.cpp:33
1 xul.dll mozilla::dom::workerinternals::RuntimeService::CrashIfHanging dom/workers/RuntimeService.cpp:2014
2 xul.dll mozilla::`anonymous namespace'::RunWatchdog toolkit/components/terminator/nsTerminator.cpp:162
3 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397
4 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:137
5 ucrtbase.dll __crt_stdio_output::crop_zeroes 
6 kernel32.dll BaseThreadInitThunk 
7 ntdll.dll RtlUserThreadStart 

=============================================================

Number 12 Windows top crash on the 2/1 build, from a number of different installations.
baku, any ideas?
Flags: needinfo?(amarchesini)
This is a shutdown problem in workers. It's my top priority for this week.
Flags: needinfo?(amarchesini)
In this particular crash, the bug is not in workers. This is the crash message:

Workers Hanging - 0|A:3|S:0|Q:0-BC:0|...
                  ^
                  this 0 means that RuntimeService has not received the xpcom-shutdown notification yet.

This is happening because netwerk/cache2/CacheFileIOManager.cpp:4156 is blocking the main-thread doing some I/O.

jduell, can this operation be done on an I/O thread instead of the main one?
Flags: needinfo?(jduell.mcbugs)
Just to be clear, the worker shutdown happens when xpcom-shutdown is received, but the crash report shows that this operation has not started yet.

https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#1996

mShuttingDown is set here: https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#1852

by:

https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2611-2614
Assignee: nobody → amarchesini
Priority: -- → P1
Hoping Michal or Honza can answer comment #3
Flags: needinfo?(jduell.mcbugs) → needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
There are other related crashes. Here are a few:

https://crash-stats.mozilla.com/report/index/b924b1db-a013-4b0a-a137-98f910180206#allthreads
here we crash because the main thread is blocked by mozilla::net::nsSocketTransportService::ShutdownThread(), which spins the event loop and never returns.

https://crash-stats.mozilla.com/report/index/7cd7e0ae-bba3-43ef-ae44-0ba890180206#allthreads
here netwerk/cache2/CacheFileIOManager.cpp:583 calls: mozilla::net::ShutdownEvent::PostAndWait()

https://crash-stats.mozilla.com/report/index/68ae56fb-016f-4ac1-be5c-d0acd0180206#allthreads
blocks main-thread with mozilla::net::CacheFileIOManager::SyncRemoveDir(nsIFile*, char const*)

https://crash-stats.mozilla.com/report/index/d8397a6e-923f-494b-a6f6-48f350180206#allthreads
maybe unrelated, but still necko: netwerk/protocol/http/nsHttpHandler.cpp:2766 spins the event loop and it doesn't return.

https://crash-stats.mozilla.com/report/index/83089e4b-77e7-4920-ae56-16d960180206#allthreads and
https://crash-stats.mozilla.com/report/index/a910bdd8-a605-4444-a2a6-d4d4f0180206#allthreads and
https://crash-stats.mozilla.com/report/index/73db45ac-e05c-4441-8a8f-1b5b00180206#allthreads
mozilla::net::nsHttpConnectionMgr::Shutdown() spins the event loop.

I recently landed a patch that starts the worker shutdown in xpcom-will-shutdown. This will improve the situation, but spinning the event loop while the xpcom-shutdown notification is being delivered definitely blocks other components from receiving the same notification.
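To make the blocking effect concrete, here is a minimal simulation (plain C++, not Gecko code; all names and the budget value are invented for illustration): observers of a shutdown notification are invoked synchronously, one after the other, so a single observer that spins the event loop on a condition that never becomes true exhausts the watchdog budget before later observers, like the worker RuntimeService, ever receive the notification.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy model: xpcom-shutdown observers run synchronously in sequence.
// mSpinBudget stands in for the terminator's timeout; once it is gone,
// the watchdog would crash the process and remaining observers never run.
struct ShutdownSim {
  std::vector<std::function<void()>> mObservers;
  int mSpinBudget = 1000;

  void Notify() {
    for (auto& obs : mObservers) {
      obs();
      if (mSpinBudget <= 0) return;  // watchdog fired; stop notifying
    }
  }
};

// Returns whether the second (worker-like) observer got the notification.
bool RunSim(bool aFirstObserverHangs) {
  ShutdownSim sim;
  bool workersNotified = false;
  sim.mObservers.push_back([&] {
    // Stands in for e.g. nsHttpConnectionMgr::Shutdown spinning the loop
    // on a condition that never becomes true.
    while (aFirstObserverHangs && sim.mSpinBudget > 0) --sim.mSpinBudget;
  });
  sim.mObservers.push_back([&] { workersNotified = true; });
  sim.Notify();
  return workersNotified;
}
```

This is exactly the shape of the crash reports above: the worker code is not at fault; it simply never got its turn.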
Depends on: 1435958
Depends on: 1435960
Depends on: 1435961
I'm filing separate bugs for each component blocking the main-thread on shutdown. Canceling the NIs here.
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Depends on: 1435962
Depends on: 1435963
Depends on: 1435964
Depends on: 1435966
I also found:
https://treeherder.mozilla.org/logviewer.html#?job_id=160567328&repo=autoland&lineNumber=48584-48596

Which is related to bug 1411908. It also looks like it could be related to this problem. Andrea, could you please check?
Flags: needinfo?(amarchesini)
> Which is related to bug 1411908. It also looks like it could be related to
> this problem. Andrea, could you please check?

You are right. This is related to bug 1435958. QuotaManager is blocking the main-thread.
Flags: needinfo?(amarchesini)
(In reply to Henrik Skupin (:whimboo) from comment #9)
> https://treeherder.mozilla.org/logviewer.
> html#?repo=autoland&job_id=160035538&lineNumber=60357

I assume a new bug needs to be filed for this case, which is mozilla::dom::workerinternals::RuntimeService::Cleanup
See Also: → 1434189
Firefox 60.0a1 Crash Report [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging ]
ID: a5168015-8edb-4e09-ab5d-ada430180211

Date Processed 	2018-02-11 04:55:59
Uptime 	9,984 seconds (2 hours, 46 minutes and 24 seconds)
Last Crash 	611,637 seconds before submission (1 week, 1 hour and 53 minutes)
Install Age 	400,199 seconds since version was first installed (4 days, 15 hours and 9 minutes)
Install Time 	2018-02-06 10:22:39

Release Channel 	nightly
Version 	60.0a1
Build ID 	20180205220102
OS 	Windows 7

MOZ_CRASH Reason 	Workers Hanging - 0|A:3|S:0|Q:0-BC:0|WorkerHolderToken|PerformanceStorageWorkerHolder-BC:0|WorkerHolderToken|PerformanceStorageWorkerHolder-BC:0|WorkerHolderToken|PerformanceStorageWorkerHolder

Total Virtual Memory 	8,796,092,891,136 bytes (8.8 TB)
Available Virtual Memory 	8,793,084,715,008 bytes (8.79 TB)
Available Page File 	5,065,678,848 bytes (5.07 GB)
Available Physical Memory 	2,159,378,432 bytes (2.16 GB)

Crashing Thread (62), Name: Shutdown Hang Terminator
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	MOZ_CrashOOL 	mfbt/Assertions.cpp:33
1 	xul.dll 	mozilla::dom::workerinternals::RuntimeService::CrashIfHanging() 	dom/workers/RuntimeService.cpp:2014
2 	xul.dll 	mozilla::`anonymous namespace'::RunWatchdog 	toolkit/components/terminator/nsTerminator.cpp:162
3 	nss3.dll 	PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
4 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:137
5 	ucrtbase.dll 	__crt_stdio_output::crop_zeroes(char*, __crt_locale_pointers* const) 	
6 		@0x1400a2 	
7 	ntdll.dll 	RtlUserThreadStart
(In reply to Trevor Skywalker from comment #12)
> Firefox 60.0a1 Crash Report [@
> mozilla::dom::workerinternals::RuntimeService::CrashIfHanging ]
> ID: a5168015-8edb-4e09-ab5d-ada430180211

Here is another example of something blocking shutdown:

MOZ_CRASH Reason 	Workers Hanging - 0|A:3|S:0|Q:0-BC:0...
                                          ^

0 means that the xpcom-shutdown notification has not been received by RuntimeService yet.
The main thread seems busy doing some JS stuff.
We should see a decrease in these crash reports because of bug 1437575.
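For reference, the leading fields of the "Workers Hanging" status message can be read mechanically. The sketch below is a hypothetical decoder: the field meanings (shutdown-notified flag, then A/S/Q counts) are my reading of the comments in this bug, not the actual RuntimeService format, and the struct and function names are invented.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical decoder for the leading fields of a "Workers Hanging"
// status string such as "0|A:3|S:0|Q:0". Per the comments above, the
// first digit tells whether RuntimeService already received
// xpcom-shutdown; A/S/Q appear to be worker counts.
struct WorkerHangStatus {
  bool shutdownNotified;  // leading 0/1
  int active;             // A:
  int suspended;          // S:
  int queued;             // Q:
};

bool ParseWorkerHangStatus(const std::string& aMsg, WorkerHangStatus* aOut) {
  int notified = 0;
  if (std::sscanf(aMsg.c_str(), "%d|A:%d|S:%d|Q:%d", &notified,
                  &aOut->active, &aOut->suspended, &aOut->queued) != 4) {
    return false;
  }
  aOut->shutdownNotified = notified != 0;
  return true;
}
```

So "0|A:3|S:0|Q:0" reads as: shutdown not yet notified, three active workers, none suspended, none queued.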
Depends on: 1437575
(In reply to Andrea Marchesini [:baku] from comment #15)
> We should see a decrease in these crash reports because of bug 1437575.

Out of interest, could you explain why improved logging (as you mentioned on this other bug in the initial comment) makes it so that we do not see that many crashes anymore?
Flags: needinfo?(amarchesini)
In bug 1437575 I introduced a new crash message that is shown when the shutdown steps are not yet completed after the internal timeout. If this happens, it means that a component is blocking the main thread.

Because of this new crash message, we are not going to see the mozilla::dom::workerinternals::RuntimeService::CrashIfHanging signature except when the hang really happens because of workers.
Flags: needinfo?(amarchesini)
Ok, so that just changes the crash message and wouldn't reduce the total number of crash reports; it's just that this specific crash, as covered by this bug, won't happen that often anymore.

Thanks, and I will keep an eye out for it.
With the latest changes, these crash reports dropped. Can we reduce the priority to P2, maybe?
Flags: needinfo?(afarre)
Flags: needinfo?(afarre)
Priority: P1 → P2
Depends on: 1445020
No longer depends on: 1435958
Depends on: 1356853
Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread ] [@ shutdownhang | mozilla::net::ShutdownEvent::PostAndWait] [@ shutdownhang | mozilla::Spi…
FF44-58
[@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup]
Showing results from 7 days ago - 8,354 Results

FF59
[@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] 
Showing results from 7 days ago - 2,318 Results

FF60/61
[@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]
Showing results from 7 days ago - 221 Results 

-------------------------------

Top Crashers for Firefox 52.7.3esr
17  0.67% 	0.09% 	shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup	804 	804 	0 	0 	648 	0 	2015-10-31

Top Crashers for Firefox 58.0.2
24 	0.56% 	0.15% 	shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup	59 	59 	0 	0 	59 	0 	2015-10-31

Top Crashers for Firefox 59.0.2
8 	1.19% 	-0.06% 	mozilla::dom::workers::RuntimeService::CrashIfHanging	1933 	1827 	97 	9 	1951 	0 	2017-11-16 

Top Crashers for Firefox 60.0b
17 	0.58% 	-0.09% 	mozilla::dom::workerinternals::RuntimeService::CrashIfHanging	205 	186 	17 	2 	199 	0 	2018-02-01

Top Crashers for Firefox 61.0a1
52 	0.22% 	-0.07% 	mozilla::dom::workerinternals::RuntimeService::CrashIfHanging	15 	15 	0 	0 	15 	0 	2018-02-01
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ shutdownhang | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] → mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ shutdownhang | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHangin…
OS: Windows 10 → All
Hardware: Unspecified → All
Summary: Crash in mozilla::dom::workerinternals::RuntimeService::CrashIfHanging → Crash in [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]. Shutdown problem in workers.
Version: unspecified → 44 Branch
Trevor, what about this bug suggests it's a service worker problem? Just trying to figure out why you attached it to bug 1328631.
Flags: needinfo?(skywalker333)
(In reply to Ben Kelly [:bkelly] from comment #23)
> Trevor, what about this bug suggests it's a service worker problem? Just
> trying to figure out why you attached it to bug 1328631.

Sorry, it's not related to service workers.
Blocks: 988872
No longer blocks: ServiceWorkers-stability
Flags: needinfo?(skywalker333)
See Also: → 705178
Crash Signature: mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ shutdownhang | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHangin… → mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown ] [@ shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown] [@ shutdownhang | mozilla::net::nsHttpConnectionMgr::Shutdown ] …
update on status of this?
Flags: needinfo?(amarchesini)
Sort of. I'm waiting for a try-push result for bug 1434618. If it doesn't break any test, moving the shutdown of workers to xpcom-will-shutdown should improve this crash.
Flags: needinfo?(amarchesini)
Still the #3 top crash on beta 62.0b16. It's been in the top 10 for many releases so I'm marking it fix-optional for 62.
Andrew, while Baku is away can you give this a look and see if anything immediately apparent jumps out at you?
Flags: needinfo?(mdaly) → needinfo?(bugmail)
Per Marion, not tracking for 63.
Quick update, more to come as I investigate this over the next few days intermixed with other work:
- I'm taking over the bug since :baku is now primarily working on privacy engineering and tracking protection.
- The special crash reporting that tries to help us identify what's going on with the workers is reporting dispatch errors. This wants to be investigated and fixed.
- Manual sampling of the crash reports involving mozilla::net::nsHttpConnectionMgr::Shutdown suggests that many of them don't actually have anything to do with worker shutdown.  However, some do seem to have workers around, so I wanted to script grabbing some tallies to provide directly actionable info to the necko team instead of just sweeping the dirt under a bunch of other, smaller rugs.
Assignee: amarchesini → bugmail
Status: NEW → ASSIGNED
removing NI and moving to DWS_NEXT
Flags: needinfo?(bugmail)
Whiteboard: DWS_NEXT
Assignee: bugmail → nobody
Status: ASSIGNED → NEW
Crash Signature: ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] → ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()]
Crash Signature: ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] → ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] [@…
Whiteboard: DWS_NEXT[stockwell unknown][topcrash-thunderbird] → [DWS_NEXT][stockwell unknown][tbird topcrash]

I want to understand better what all these signatures are about, so NI to myself.

Flags: needinfo?(jstutte)

[@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging ] is the number one crash signature for the April 15 Linux Nightlies. I don't know if somebody was just having a really bad day or what.

Crash Signature: ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] … → ] [@ mozilla::net::CacheFileIOManager::SyncRemoveDir ] [@ mozilla::dom::workers::RuntimeService::CrashIfHanging ] [@ shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]

Digging a bit into the signatures, I see three main buckets of signatures:

  1. Crashes at arbitrary places caused by MOZ_CRASH("Shutdown hanging before starting."); or by MOZ_CRASH("Shutdown too long, probably frozen, causing a crash.");

    1.1 shutdownhang | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread
    Happens mostly on 68.x but has some occurrences also on 75.0.

    1.2 shutdownhang | mozilla::net::ShutdownEvent::PostAndWait
    Here 75.0 and 68.x are dominant, but the total volume is one order of magnitude lower than 1.1.

    1.3 shutdownhang | mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown
    Here we have only versions up to 68.x, mostly ESR. Volume is about half of 1.1.

    1.4 shutdownhang | mozilla::net::nsHttpConnectionMgr::Shutdown
    Here 75 and 52.9.0esr dominate the ranks. I assume this to be just a variant of 1.3.

    1.5 shutdownhang | mozilla::dom::workers::RuntimeService::Cleanup
    Happens only on versions up to 52.9.0esr and can be safely ignored.

    None of these crashes seem really worker related to me (or I am overlooking something not evident); at least we do not know what caused the hang.

  2. Crashes with worker specific "Workers Hanging ..." MOZ_CRASH messages

    2.1. mozilla::dom::workerinternals::RuntimeService::CrashIfHanging
    (This signature has been added twice, it seems.)
    Here we have a collection of many different (but similar) MOZ_CRASH reasons. I assume they reflect the evolution of those messages through the different versions (as we can see versions back to 60.2.0esr here).
    The 4 top scorers (making together more than 60%) are:

1 	Workers Hanging - 1|A:1|S:0|Q:0-BC:1|WorkerDebuggeeRunnable::mSender 	                 393 	21.30 %
2 	Workers Hanging - 1|A:1|S:0|Q:0-BC:0Dispatch Error 	                                 283 	15.34 %
3 	Workers Hanging - 1|A:3|S:0|Q:0-BC:0Dispatch Error-BC:0Dispatch Error-BC:0Dispatch Error 282 	15.28 %
4 	Workers Hanging - 1|A:1|S:0|Q:0-BC:1|IDBOpenDBRequest 	                                 165 	 8.94 %
5 	Workers Hanging - 1|A:2|S:0|Q:0-BC:0Dispatch Error-BC:0Dispatch Error 	                 141 	 7.64 %

These are the cases to care (most) about in this bug, I think. It would be interesting to relate the different messages to the versions we see in order to narrow down similar causes.

    2.2 mozilla::dom::workers::RuntimeService::CrashIfHanging
    This signature happens only on very old, unsupported versions and can be safely ignored.

  3. Signatures without any MOZ_CRASH message at all

    3.1 mozilla::net::CacheFileIOManager::SyncRemoveDir
    Has a very low but recent volume. I think this signature deserves a bug on its own.

    3.2 shutdownhang | static bool mozilla::SpinEventLoopUntil<T> | mozilla::net::nsHttpConnectionMgr::Shutdown
    Has no occurrences at all in our data and can be removed from the signatures.

:mccr8, am I reading the crash data correctly, and can we adjust the signatures relevant for this bug a bit?

(edit: it seems I am unable to format this well - hope it works anyway)

Flags: needinfo?(jstutte) → needinfo?(continuation)

Looking into case 2.1. for "Dispatch Error" messages:

The "Dispatch Error" message is constructed here if and only if the Dispatch() returns false.

The first opportunity to fail is the call to PreDispatch(mWorkerPrivate), which is a virtual function with many overrides.

Most implementations of that function just return true (some of them calling AssertIsOnMainThread(), some not), but there are four that do more:

EventRunnable::PreDispatch
Despite its length, it always returns true (if it does not crash). So probably not relevant here.

WorkerDebuggeeRunnable::PreDispatch
Has special behavior in case of ParentThreadUnchangedBusyCount, which can lead to false responses. This smells, as the busy count might be involved in determining pending workers? Interestingly, WorkerDebuggeeRunnable has its own shutdown hang messages, too.

WorkerRunnable::PreDispatch
Here we have special behavior in case of WorkerThreadModifyBusyCount, which returns the result of aWorkerPrivate->ModifyBusyCount(true);. Again, this smells.

NotifyRunnable::PreDispatch
Here we always return the result of aWorkerPrivate->ModifyBusyCount(true);. This smells even more than the previous case?

Andrew, it might be just a gut feeling (ignoring the details of that code), but my impression is that a PreDispatch returning false might provoke a shutdown hang by not manipulating the busy count (BC) correctly (in some cases)?
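The suspected failure mode can be illustrated with a toy model (not the actual WorkerRunnable code; all names are invented for illustration): if the PreDispatch step bumps a busy count and a later dispatch failure does not roll it back, the worker looks permanently busy at shutdown even though no runnable will ever run on it.

```cpp
#include <cassert>

// Toy worker whose busy count gates shutdown: a non-zero count would
// make shutdown consider the worker "pending".
struct FakeWorker {
  int mBusyCount = 0;
  bool mAcceptsRunnables = true;

  bool ModifyBusyCount(bool aIncrease) {
    mBusyCount += aIncrease ? 1 : -1;
    return true;
  }
};

// Models the dispatch path: PreDispatch bumps the count, then the actual
// dispatch may fail. aRollbackOnFailure models the behavior one would
// expect on failure.
bool DispatchWithBusyCount(FakeWorker& aWorker, bool aRollbackOnFailure) {
  if (!aWorker.ModifyBusyCount(true)) return false;  // "PreDispatch"
  if (!aWorker.mAcceptsRunnables) {                  // "Dispatch" fails
    if (aRollbackOnFailure) aWorker.ModifyBusyCount(false);
    return false;
  }
  return true;
}
```

Without the rollback, the leaked count is exactly the kind of state that could make CrashIfHanging report a busy worker that is in fact doing nothing.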

Flags: needinfo?(bugmail)

BTW, I made a sheet with the "Workers Hanging ..." messages for 75.0. Note the very long messages for some WorkerDebuggeeRunnable::mSender which are caused by many WorkerRefs in WorkerPrivate. This looks suspicious, too.

I don't know anything about worker shutdown, but your analysis sounds reasonable to me. It looks like baku fixed a bunch of issues back in 2018 when this was first filed, so it would make sense that some of the signatures might not be happening in recent versions. I'm not sure why he added the HTTP connection manager signatures to this bug.

Flags: needinfo?(continuation)
Blocks: 1633342
No longer blocks: 1633342

Removed all signatures but case 2.1 from this bug, created bug 1633342 to collect the other (probably net related) signatures (and dropped the signatures with cases for unsupported versions only).

Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ shutdownhang | nsThread::Shutdown | mozilla::net::nsSocketTransportService::ShutdownThread ] [@ shutdownhang | mozilla::net::ShutdownEvent::PostAndWait] [@ shutdownhang | mozilla::Spi… → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]
Blocks: 1633342

Moved the single dependencies to bug 1633342. Still I am not sure, if all these dependencies are real.

No longer blocks: 1633342
Depends on: 1633342
No longer depends on: 1356853, 1435961, 1435962, 1445020, 1594572
Blocks: 1633342
No longer depends on: 1633342

Not sure why bugzilla switched those dependencies.

No longer blocks: 1633342
Depends on: 1633342
See Also: → 1633469

Expanding on comment 71, the relevant Runnable is a CrashIfHangingRunnable, whose PreDispatch always returns true, so that can't be the source of the failure. Going one step further into the relevant DispatchInternal, WorkerPrivate::DispatchControlRunnable is getting called. This function fails only if the worker's status is Dead, so that seems to be the source of the problem. This failure is possible because there's a race condition between a worker's status turning Dead and its removal from RuntimeService::mDomainMap. If we consider the simple case with a single worker, its removal happens here, and gets scheduled from here.
Long story short, it looks like the worker isn't really hanging; rather, the main thread never runs the Runnable that removes the record of the worker's existence. If we take a look at one of the relevant reports, we can see that the DOM Worker thread is indeed idle. It might be possible to reduce the number of reports that falsely attribute the hang to workers by removing the worker's entry from RuntimeService::mDomainMap at the same time its status changes to Dead; I have to look into it.
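A toy model of this window (illustrative names only, not the real RuntimeService types): the status flips to Dead immediately, but the map entry only disappears once the cleanup runnable runs on the main thread; until then, a CrashIfHanging-style scan still finds the worker, fails to dispatch to it, and reports a Dispatch Error even though nothing is hanging.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class WorkerStatus { Running, Dead };

// Stand-in for the domain map that CrashIfHanging walks.
struct Registry {
  std::map<std::string, WorkerStatus> mDomainMap;

  // Mirrors the described behavior: dispatching a control runnable fails
  // exactly when the worker's status is already Dead.
  bool DispatchControlRunnable(const std::string& aKey) {
    auto it = mDomainMap.find(aKey);
    return it != mDomainMap.end() && it->second != WorkerStatus::Dead;
  }

  // A CrashIfHanging-style scan: count workers we fail to reach.
  int CountDispatchErrors() {
    int errors = 0;
    for (auto& entry : mDomainMap) {
      if (!DispatchControlRunnable(entry.first)) ++errors;
    }
    return errors;
  }
};
```

The window between marking the worker Dead and erasing its map entry is where the spurious "Dispatch Error" reports come from; erasing at the same time the status changes would close it.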

Assignee: nobody → ytausky
Depends on: 1636147
Flags: needinfo?(bugmail)

Should we then expect that, with the fix from bug 1636147, all the crashes with Dispatch Error in the message (around 70%) go away, leaving only the ones with WorkerDebuggeeRunnable::mSender? That would be a great reduction!

Flags: needinfo?(ytausky)

Yes, that's the idea. Those messages indicate that the main thread is hanging, not the workers.

Flags: needinfo?(ytausky)

(In reply to Jens Stutte [:jstutte] from comment #72)

BTW, I made a sheet with the "Workers Hanging ..." messages for 75.0. Note the very long messages for some WorkerDebuggeeRunnable::mSender which are caused by many WorkerRefs in WorkerPrivate. This looks suspicious, too.

(In reply to Jens Stutte [:jstutte] from comment #78)

Should we then expect that, with the fix from bug 1636147, all the crashes with Dispatch Error in the message (around 70%) go away, leaving only the ones with WorkerDebuggeeRunnable::mSender? That would be a great reduction!

It seems that the cases remaining so far on 78 all carry WorkerDebuggeeRunnable::mSender messages, which did not go away as predicted by Yaron, and for which we do not yet have a clear understanding of what is causing them.

Assignee: ytausky → perry
See Also: → 1566718
Assignee: perry → nobody
See Also: → 1660950
Depends on: 1664386

Looking at the first new crashes coming in from beta 83 with enhanced reporting as of bug 1664386 I see:

Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender

It may be too early to say this definitively, but it seems that the original assumption that chrome workers are blocking us is false (if we can trust the result of IsChromeWorker()).

As of bug 1664386 comment 1, this means that:

a) the (single) shutdown timeout has been reached by the RunWatchdog (active only in the parent process)
b) the shutdown steps were completed (sShutdownNotified == true)
c) there is a worker associated with some domain which is still able to receive runnables (and to respond!)
d) the blocking worker is not (necessarily) a chrome worker

Asuth, Yaron, are we aware of any non-chrome worker that may run in the parent process?

Flags: needinfo?(ytausky)
Flags: needinfo?(bugmail)

I crashed with Thunderbird 90.0b3 on Mac during shutdown - not password related
bp-3ca8e432-e5b3-4c41-b9e3-2ba500210701
0 XUL mozilla::dom::workerinternals::RuntimeService::CrashIfHanging() dom/workers/RuntimeService.cpp:1708 context
1 XUL mozilla::(anonymous namespace)::RunWatchdog(void*) toolkit/components/terminator/nsTerminator.cpp:230 scan
2 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:201 scan
3 libsystem_pthread.dylib _pthread_start scan
4 libsystem_pthread.dylib thread_start scan

Whiteboard: [DWS_NEXT][stockwell unknown][tbird topcrash] → [DWS_NEXT][stockwell unknown][tbird topcrash],qa-not-actionable
Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()]

The variant with () is not happening anymore.

Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]

FWIW, the () now get removed on crash stats, so they'll never show up in signatures.

Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()]

I filed bugbug issues on it reverting a change and on it generating junk signatures.
https://github.com/mozilla/bugbug/issues/2540
https://github.com/mozilla/bugbug/issues/2541

Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()]

IIRC the bot is just grabbing the signatures from the duplicate bugs.

I adjusted the signature in the other bug accordingly, too.

Crash Signature: [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging] [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging()] → [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging]

In this case it's not a bugbug-based change, so I moved the issues to the relman-auto-nag repository.

(In reply to Julien Cristau [:jcristau] from comment #93)

IIRC the bot is just grabbing the signatures from the duplicate bugs.

Yes, exactly. Let's discuss it in the issues.

I removed the () from the dupes, and filed a bug on the TreeHerder intermittent filer, as I think that's actually where the () in signatures are from. Sorry for my confusion! I forgot about the duplicate bug signature thing.

FWIW, in the most frequent case we still see all variations of repetition depth for:

Workers Hanging - 1|A:1|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender
Flags: needinfo?(bugmail)
Flags: needinfo?(ytausky) → needinfo?(jstutte)

Adjusting severity/priority based on the frequency.

Severity: critical → S3
Priority: P2 → P3

The severity field for this bug is set to S3. However, the bug has the topcrash keyword.
:jstutte, could you consider increasing the severity of this top-crash bug? If the crash isn't "top" anymore, could you drop the topcrash keyword?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

This seems to be a top crash only for Thunderbird, where it spiked up from version 91.7.

I am not sure what this would mean for our severity/priority here.

Flags: needinfo?(jstutte)
Keywords: topcrash

So looking a bit at some crashes, we (still) mostly seem to have a problem with the WorkerDebuggeeRunnable here.

Looking at https://searchfox.org/mozilla-central/rev/da6a85e615827d353e5ca0e05770d8d346b761a9/dom/workers/WorkerPrivate.h#1245 I am wondering if we just never get a chance to execute the debuggee runnable, since this is a throttled event queue that targets the main thread (which is probably busy all the time during shutdown).

I am wondering if it is really a good idea to hold a ThreadSafeWorkerRef here (which is a StrongWorkerRef) and if it wouldn't be better to downgrade this to a WeakWorkerRef?
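The suspected starvation can be sketched with a toy model (not the real ThrottledEventQueue; the scheduling rule here is deliberately simplified to "the throttled queue only gets a slot when the target thread is otherwise idle"): a main thread that keeps feeding itself events during shutdown never drains the debuggee runnables.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Toy model of a throttled queue targeting a busy main thread.
struct ThrottledQueueSim {
  std::deque<std::function<void()>> mMainQueue;  // main-thread events
  std::deque<std::function<void()>> mThrottled;  // debuggee runnables

  // Run aTicks main-thread iterations; the throttled queue only gets a
  // slot when the main queue is empty.
  void Run(int aTicks) {
    for (int i = 0; i < aTicks; ++i) {
      if (!mMainQueue.empty()) {
        auto task = mMainQueue.front();
        mMainQueue.pop_front();
        task();
      } else if (!mThrottled.empty()) {
        auto task = mThrottled.front();
        mThrottled.pop_front();
        task();
      }
    }
  }
};
```

In this model a main thread that re-posts work to itself starves the throttled runnable indefinitely, which matches the picture of mSender's runnable never executing before the watchdog fires.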

Flags: needinfo?(echuang)

(In reply to Jens Stutte [:jstutte] from comment #100)

This seems to be a top crash only for Thunderbird, where it spiked up from version 91.7.

Indeed it is #3 crash for Thunderbird. I had marked up bug 1435961 for this.

The spike is false - it is the result of Thunderbird not having crash reporting on crash-stats from Nov 2021 to April 2022.

I am not sure what this would mean for our severity/priority here.

I am wondering if it is really a good idea to carry away a ThreadSafeWorkerRef here (which is a StrongWorkerRef) and if it wouldn't be better to downgrade this to a WeakWorkerRef ?

Any idea if this would help Thunderbird crashes?

Flags: needinfo?(jstutte)

The sole purpose of this mSender worker ref seems to be to keep the worker alive, as there is no real use of that variable. Downgrading it to a weak worker ref would be equivalent to removing it, at this point.

What is not clear to me is why we think we need to keep the worker alive. I assume we want to be sure the worker is still alive when we execute the runnable on the main thread (if we are on the worker thread, we are surely alive).

Looking at the sub-classes of WorkerDebuggeeRunnable it seems:

Instead:

I assume if we move to a weak worker ref, those runnables should check the worker ref before doing anything with the WorkerPrivate* ?

(In reply to Wayne Mery (:wsmwk) from comment #102)

Any idea if this would help Thunderbird crashes?

Well, the huge difference in numbers here seems to indicate that Thunderbird's main thread loop is too busy to ever let the RefPtr<ThrottledEventQueue> mMainThreadDebuggeeEventTarget event queue execute its events on the main thread. So apparently there is an issue on Thunderbird's side with being too busy on the main thread with whatever it is doing during worker shutdown.

But if we ensure that the worker can go away without harm before our WorkerDebuggeeRunnable ever executes on the main thread, that could definitely help both Firefox and Thunderbird. But I need some more expertise from Eden here to make sure this is not going to break other things.

Flags: needinfo?(jstutte)
Assignee: nobody → jstutte
Attachment #9276716 - Attachment description: WIP: Bug 1435343: Use a weak worker reference for WorkerDebuggeeRunnable. → Bug 1435343: Use a weak worker reference for WorkerDebuggeeRunnable. r?#dom-worker-reviewers
Status: NEW → ASSIGNED

(In reply to Jens Stutte [:jstutte] from comment #104)

(In reply to Wayne Mery (:wsmwk) from comment #102)

Any idea if this would help Thunderbird crashes?

Well, the huge difference in numbers here seems to indicate that Thunderbird's main thread loop is too busy to ever let the RefPtr<ThrottledEventQueue> mMainThreadDebuggeeEventTarget event queue execute its events on the main thread. So apparently there is an issue on Thunderbird's side with being too busy on the main thread with whatever it is doing during worker shutdown.

But if we ensure that the worker can go away without harm before our WorkerDebuggeeRunnable ever executes on the main thread, that could definitely help, both Firefox & Thunderbird.

To be clear: the patch here would no longer let those runnables block the worker shutdown. Still, on Thunderbird something frequently seems to prevent those runnables from ever being executed in time. Whatever this is, it might still cause a different flavor of hang after this patch lands. I would thus not be too optimistic that this patch fixes those hangs, but it might help us get better diagnostics.

Canceling the ni? as I asked for review on the patch.

Flags: needinfo?(echuang)

(In reply to Jens Stutte [:jstutte] from comment #103)

The sole purpose of this mSender worker ref seems to be to keep the worker alive, as there is no real use of that variable. Downgrading it to a weak worker ref would be equivalent to removing it, at this point.

What is not clear to me is why we think we need to keep the worker alive. I assume we want to be sure the worker is still alive when we execute the runnable on the main thread (if we are on the worker thread, we are surely alive).

Looking at the sub-classes of WorkerDebuggeeRunnable it seems:

Instead:

I assume if we move to a weak worker ref, those runnables should check the worker ref before doing anything with the WorkerPrivate* ?

So I fear things are a bit more complicated, at least for the MessageEventRunnable. If a worker dispatches this kind of event, we must make sure it arrives somewhere. It seems as if keeping the worker alive was kind of a trick to ensure this. Actually, I think the MessageEventRunnable should extract all needed information from the worker on dispatch, in order to be able to deliver the event even after the worker has ended.

Flags: needinfo?(echuang)
Depends on: 1769913

Comment on attachment 9276716 [details]
Bug 1435343: Use a weak worker reference for WorkerDebuggeeRunnable. r?#dom-worker-reviewers

Revision D146447 was moved to bug 1769913. Setting attachment 9276716 [details] to obsolete.

Attachment #9276716 - Attachment is obsolete: true

Moved investigation to bug 1769913.

Flags: needinfo?(echuang)

(In reply to Jens Stutte [:jstutte] from comment #107)

(In reply to Jens Stutte [:jstutte] from comment #103)

The sole purpose of this mSender worker ref seems to be to keep the worker alive, as there is no real use of that variable. Downgrading it to a weak worker ref would be equivalent to removing it, at this point.

What is not clear to me is why we think we need to keep the worker alive. I assume we want to be sure the worker is still alive when we execute the runnable on the main thread (if we are on the worker thread, we are surely alive).

Looking at the sub-classes of WorkerDebuggeeRunnable it seems:

Instead:

I assume if we move to a weak worker ref, those runnables should check the worker ref before doing anything with the WorkerPrivate* ?

So I fear things are a bit more complicated, at least for the MessageEventRunnable. If a worker dispatches this kind of event, we must make sure it arrives somewhere. It seems as if keeping the worker alive was kind of a trick to ensure this. Actually, I think the MessageEventRunnable should extract all needed information from the worker on dispatch, in order to be able to deliver the event even after the worker has ended.

Maybe having some sort of interim "check-in" thread between those two would work. Just thinking out loud here. It seems there are a lot of things happening, and they need to happen in a more organized manner overall.

(In reply to Worcester12345 from comment #110)

Maybe having some sort of interim "check-in" thread between those two would work. Just thinking out loud here. It seems there are a lot of things happening, and they need to happen in a more organized manner overall.

Not sure I get the idea here.

(In reply to Intermittent Failures Robot from comment #113)

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1435343&startday=2022-09-05&endday=2022-09-11&tree=all

The things I see here do not really seem to be related to the original bug.

TB 102.3.2 (32-bit) on Windows 10 64-bit crashed after first closing the TB main window and then sending a composed mail from an open mail-editor window. (Master password active.)

bp-7e21d996-0b93-4f56-8904-742c00221013
Thunderbird 102.3.2 Crash Report [@ mozilla::dom::workerinternals::RuntimeService::CrashIfHanging ]

MOZ_CRASH Reason (Sanitized)

Workers Hanging - 1|A:6|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender

Crashing Thread (40), Name: Shutdown Hang Terminator
Frame 	Module 	Signature 	Source 	Trust
0 	xul.dll 	mozilla::dom::workerinternals::RuntimeService::CrashIfHanging() 	dom/workers/RuntimeService.cpp:1603 	context
1 	xul.dll 	mozilla::`anonymous namespace'::RunWatchdog(void*) 	toolkit/components/terminator/nsTerminator.cpp:232 	cfi
2 	nss3.dll 	_PR_NativeRunThread(void*) 	nsprpub/pr/src/threads/combined/pruthr.c:399 	cfi
3 	nss3.dll 	pr_root(void*) 	nsprpub/pr/src/md/windows/w95thred.c:139 	cfi
4 	ucrtbase.dll 	thread_start<unsigned int (__stdcall*)(void*), 1> 		cfi
5 	kernel32.dll 	BaseThreadInitThunk 		cfi
6 	mozglue.dll 	patched_BaseThreadInitThunk(int, void*, void*) 	toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp:572 	cfi
7 	ntdll.dll 	__RtlUserThreadStart 		cfi
8 	ntdll.dll 	_RtlUserThreadStart 		cfi

The remaining intermittent failure instances seem unrelated to workers. See bug 1805147, for example.

See Also: → 1811136
Depends on: 1823391

Is https://crash-stats.mozilla.org/report/index/6ec67d36-0c11-4739-a2e4-6025b0230322 an example** of this bug, where the Thunderbird user was locked out by something password related, perhaps too many failed password attempts?

**Crash reason is listed as
Workers Hanging - 1|A:2|S:0|Q:0-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender-BC:1IsChromeWorker(false)|WorkerDebuggeeRunnable::mSender|WorkerDebuggeeRunnable::mSender

Also https://crash-stats.mozilla.org/report/index/fe8b8258-7db3-4c48-9244-d9e3c0230323

Or should we be filing these as Thunderbird bugs?

Flags: needinfo?(jstutte)

(In reply to Wayne Mery (:wsmwk) from comment #137)

Is https://crash-stats.mozilla.org/report/index/6ec67d36-0c11-4739-a2e4-6025b0230322 an example** of this bug, where the Thunderbird user was locked out by something password related, perhaps too many failed password attempts?

Also https://crash-stats.mozilla.org/report/index/fe8b8258-7db3-4c48-9244-d9e3c0230323

Or should we be filing these as Thunderbird bugs?

Meta for a sec: This bug exists primarily to track the Firefox shutdown hangs attributed to workers by way of the associated crash signature. These hangs are potentially attributable to:

  1. Bugs in the core worker implementation. (Frequently technical debt related.)
  2. Bugs in Web APIs exposed on workers.
  3. Bugs in system JS code using workers, potentially involving a failure to pay attention to shutdown phases.
  4. Bugs in system code related to shutdown phases, such as failing to generate appropriate shutdown phases or tear down content globals, etc.

For the specific crashes you identify above, it's very likely that bug 1800659 will address the (type 1) problem. Unfortunately, that's only going to be landing in v116 and it's unlikely we'll be able to uplift[1], which is very unfortunate for Thunderbird's model of building against ESR (and the Firefox ESR itself).

Because of Thunderbird building against ESR and because I think it's potentially difficult to distinguish type 3 and type 4 problems that are specific to Thunderbird until after performing a potentially detailed investigation, it's likely appropriate to file distinct Thunderbird bugs which can be marked as depending on platform bugs as appropriate.

For TB built against m-c trunk/beta/release the calculus changes a bit because type 1 and type 2 problems are more likely to be timely. However, I think it could still make sense to file distinct TB bugs because type 3 and type 4 factors are still potentially so significant and in the event we get users commenting on the bug, the potential for confusion goes up. Also, this simplifies bug prioritization since the product impacts may vary.

For the filed TB bugs where the TB team would like input from the workers team the best practices would probably be:

  • Try and make sure the bug is shovel ready by including:
    • If available, the Workers Hanging string (as you've done above, thank you!). This should be available in crash reports (as protected data that is okay to report in bugs because we explicitly do not include any origin data, although anyone propagating the information should of course confirm there is nothing potentially identifying before pasting), and in debug builds where MOZ_ReportCrash does a printf.
    • If stdout (or wherever MOZ_LOG output would go if enabled, which may be MOZ_LOG_FILE) is available, any of the worker state information added in https://phabricator.services.mozilla.com/D173430, which is automatically emitted under the MOZ_LOG category "WorkerShutdownDump" (which gets temporarily force-enabled). The output looks like https://bugzilla.mozilla.org/show_bug.cgi?id=1805613#c83
    • Links to any Thunderbird documentation about:
      • Its shutdown phases for content and system/app logic
      • Its use of workers for system/app logic. This should also include mention of any subsystems that might have previously been used in Firefox and m-c but which are no longer used (or maybe even present) in Firefox/m-c but have been forked into TB, etc.
  • Any context about extensions indicated by the crash-stats which might use the TB extension experiments mechanism that allows extensions to do all the legacy add-on stuff that Firefox is able to assume is no longer possible. My concern here would be add-ons that are creating workers and are not aware of shutdown phases as opposed to things the add-on would be doing in the worker since XPConnect is not exposed to workers and "ctypes" usage should show up in crash stacks (if active at the time of the crash).
  • Ask about the bug in the Workers & Storage chat.mozilla.org channel, pasting the bug link there. The rationale is that there really isn't 1 right person to answer questions about the factors that might contribute to shutdown hangs, especially as type 2 issues will be something that the core worker peers won't necessarily be directly aware of.
    • That said, it might make sense for the TB team to designate a dev as the "worker liaison"/similar and so anyone triaging TB crashes/bugs in this space could needinfo the relevant TB dev.

1: It's likely that the bug 1800659 fixes will be a massive improvement for these specific hangs and would be appropriate for uplift on its own, but it also:

  • Represents a major shift in worker behavior that potentially will result in a number of fixes in other components and this would increase uplift risk because those fixes may result in their own cascade of fixes which could intertwine with new functionality, etc.
  • Is expected to be followed-up by a number of other worker technical debt paydown refactorings for which it's also not clear we could uplift. And arguably it would be better to leave ESR in the pre-bug 1800659 state that has been the equilibrium state for a long time rather than having ESR have a temporary intermediate equilibrium that might only exist in Fx116 or maybe not exist in any shipped Firefox if we land more refactorings in 116 that we definitely don't want to risk uplifting (I have a few of these...).
Flags: needinfo?(jstutte)
Blocks: 1843744
Depends on: 1800659

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #138)

...
1: It's likely that the bug 1800659 fixes will be a massive improvement for these specific hangs and would be appropriate for uplift on its own, but it also:

  • Represents a major shift in worker behavior that potentially will result in a number of fixes in other components and this would increase uplift risk because those fixes may result in their own cascade of fixes which could intertwine with new functionality, etc.
  • Is expected to be followed-up by a number of other worker technical debt paydown refactorings for which it's also not clear we could uplift. And arguably it would be better to leave ESR in the pre-bug 1800659 state that has been the equilibrium state for a long time rather than having ESR have a temporary intermediate equilibrium that might only exist in Fx116 or maybe not exist in any shipped Firefox if we land more refactorings in 116 that we definitely don't want to risk uplifting (I have a few of these...).

Thanks for that info. So indeed bug 1800659 is on version 116 and not backported to esr115.

Removing the regression keyword: this bug is 6 years old, and the reasons for hangs are manifold and might have changed over time anyway.

Keywords: regression

(In reply to Wayne Mery (:wsmwk) from comment #141)

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #138)

...
1: It's likely that the bug 1800659 fixes will be a massive improvement for these specific hangs and would be appropriate for uplift on its own, ...

Thanks for that info. So indeed bug 1800659 is on version 116 and not backported to esr115.

From the Firefox numbers I cannot really see any improvement here for >=116.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta

For more information, please visit BugBot documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash