Service workers delaying shutdown
Categories
(Core :: DOM: Service Workers, defect, P2)
Tracking
Version | Tracking | Status
---|---|---
firefox-esr78 | --- | affected |
firefox85 | --- | wontfix |
firefox86 | --- | wontfix |
firefox89 | --- | wontfix |
firefox90 | --- | wontfix |
firefox96 | --- | wontfix |
firefox97 | --- | wontfix |
firefox98 | --- | wontfix |
firefox104 | --- | wontfix |
firefox105 | --- | wontfix |
firefox109 | --- | wontfix |
firefox110 | --- | wontfix |
firefox111 | --- | wontfix |
firefox126 | --- | wontfix |
firefox127 | --- | fix-optional |
firefox128 | --- | affected |
People
(Reporter: ytausky, Assigned: asuth)
References
(Depends on 1 open bug)
Details
Crash Data
After the nasty deadlock in bug 1588152 was resolved, it turns out that there are still reports of service workers delaying shutdown. The top 3 async shutdown timeout reasons, accounting for 60% of the crashes, are:
- {"shutdownStates":"parent process IPDL background thread, ","pendingPromises":1,"acceptingPromises":false}
- {"shutdownStates":"","pendingPromises":0,"acceptingPromises":true}
- {"shutdownStates":"content process main thread, ","pendingPromises":1,"acceptingPromises":false}
Comment 1•4 years ago
(In reply to Yaron Tausky [:ytausky] from comment #0)
- {"shutdownStates":"content process main thread, ","pendingPromises":1,"acceptingPromises":false}
I just encountered a crash today that falls in this category: https://crash-stats.mozilla.org/report/index/bac8b5b8-29da-487f-a42d-71cf40201104#tab-metadata
I have semi-reliable steps to reproduce. When I do startup profiling on Windows (start with MOZ_PROFILER_STARTUP=1 in the environment), sometimes when I attempt to collect the profile the profiler tab never finishes loading: it remains stuck with the animated loading indicator and the profiler.firefox.com domain name as the tab title. The profiler website uses a service worker. When a profiler tab fails to load like this, Firefox shutdown never completes cleanly.
On a slow test laptop, I reproduce quite consistently when doing a cold startup profile. On my development machine I could reproduce about half the time with my local build (so I might be able to test things there).
Comment 2•4 years ago
I woke my laptop and immediately updated Nightly; after ~2-3 minutes of the updater running in the background I got this crash:
https://crash-stats.mozilla.org/report/index/0c366038-3f4a-42d8-bdbe-181e70201116
Comment 3•4 years ago
Hello!
Build ID 20201211213049 crashed here:
https://crash-stats.mozilla.org/report/index/7bb232a4-9947-4e39-acdf-a71050201216#tab-details
Updated flags per the crash data available.
Thanks!
Alex
Comment 4•4 years ago
Hello y'all!
Build ID 20210429214231 crashed here:
https://crash-stats.mozilla.org/report/index/56c30794-a886-45dd-9af5-b00c20210430#tab-details
Updated flags per the crash data available.
Thanks!
Alex
Comment 5•3 years ago
Hello y'all!
Build ID 20220129091708 crashed here:
https://crash-stats.mozilla.org/report/index/56c30794-a886-45dd-9af5-b00c20210430#tab-details
Updated flags per the crash data available.
Thanks!
Alex
Comment 6•3 years ago
This signature started to get crash reports again at the end of last week (10-20 crash reports per Nightly). Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=720d5125a9b4aa6750806ed9b51fbd0811da10c4&tochange=59134b451eec1454be5f2e489176b4b59f1ddb5a
Yulia, could bug 1742438 be related?
Comment 7•3 years ago
Yes, possibly. Hopefully I have a fix for this regression already.
Comment 8•3 years ago
The bug is marked as tracked for firefox104 (nightly). However, the bug still has low severity.
:jstutte, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 9•3 years ago
(In reply to Yulia Startsev [:yulia] from comment #7)
Yes, possibly. Hopefully I have a fix for this regression already.
Yulia, could you please link the bug with the fix here?
Comment 10•2 years ago
(discussed with jens offline)
I took a look, and this appears to still be happening. I believe this is due to the cache promise cleanup potentially sometimes being late. It is similar to the points Yaron made above a few years ago, but now that this cleanup is done independently of the ScriptLoader shutdown it may be lagging or failing. I'll investigate.
Comment 11•2 years ago
I think I found the underlying issue. Cancellation was changed into "informing" rather than triggering a kill of the worker. However, it is possible for the worker to be killed without going through cancellation, which means we don't do cleanup. I am looking into how to make this always go through the cancellation path / do cleanup.
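To make the hazard concrete, here is a schematic sketch with invented names (not the actual worker or ScriptLoader code): if the cleanup that resolves the shutdown blocker's promise is only wired to the cancellation path, any teardown route that kills the worker directly leaves that promise pending forever.

```cpp
// Schematic sketch of "kill bypasses cancellation cleanup"; invented names.
#include <functional>
#include <iostream>

struct Worker {
  std::function<void()> onCleanup;  // e.g. resolves the shutdown blocker promise
  bool alive = true;

  // Cancellation path: informs the worker and runs the cleanup hook.
  void Cancel() {
    if (alive && onCleanup) {
      onCleanup();
    }
    alive = false;
  }

  // Direct kill path: tears the worker down without running cleanup.
  void Kill() { alive = false; }
};

int main() {
  int pendingPromises = 0;

  Worker killed;
  ++pendingPromises;  // the shutdown blocker now waits on this worker
  killed.onCleanup = [&] { --pendingPromises; };
  killed.Kill();  // bug: cleanup never runs, promise stays pending

  Worker cancelled;
  ++pendingPromises;
  cancelled.onCleanup = [&] { --pendingPromises; };
  cancelled.Cancel();  // cleanup runs, promise resolved

  // One promise is still pending: the worker that was killed directly,
  // which is the shape of the AsyncShutdownTimeout crashes above.
  std::cout << "pendingPromises at shutdown: " << pendingPromises << std::endl;
  return 0;
}
```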
Comment 12•2 years ago
(removing myself as assignee as Yulia seems to be on top of it)
Comment 13•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:yulia, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 14•2 years ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 15•2 years ago
I am still tracking this bug, as there is a bit more work to do regarding the cancellation path. Will update once I am finished.
Comment 16•2 years ago
I've reviewed the crashes in the last week, and they do not appear to be related to the ScriptLoader shutdown any more; they appear to be hanging in the JS interpreter, which looks unrelated.
Comment 17•2 years ago
Indeed we are back to normal here. Unassigning Yulia, but we keep the bug open for further monitoring.
Comment 18•2 years ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 desktop browser crashes on nightly
:jmarshall, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 19•2 years ago
:jstutte, this has started spiking in recent nightly builds.
Could someone take a look? Is it related to Bug 1812490?
Comment 20•2 years ago
The spike seemed to start with the landing of bug 1811195, which has a remote-worker-related part. Then we landed the fix for bug 1812490 and the numbers went down a lot, but they still seem to be significantly higher than before. My assumption would be that one of the two conditions we changed in bug 1811195 is no longer triggered since bug 1812490 landed, but the other is?
:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?
Comment 21•2 years ago
(In reply to Jens Stutte [:jstutte] from comment #20)
:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?
From here that looks very likely.
Comment 22•2 years ago
(In reply to Jens Stutte [:jstutte] from comment #20)
:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?
Yeah, I think this would probably break the state machine and so removing it again in https://phabricator.services.mozilla.com/D168264 makes sense.
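For anyone following along, here is a rough, purely hypothetical sketch of why an early bail-out in an actor allocation can strand the shutdown state machine (the names below are invented; this is not the actual IPDL code or the patch in D168264): if the parent refuses to allocate the actor once shutdown has begun, whoever was waiting for that actor's teardown handshake never gets it, so its shutdown promise stays pending.

```cpp
// Hypothetical sketch: refusing to allocate an actor during shutdown can
// strand the state machine that expected to drive the actor's teardown.
#include <iostream>
#include <memory>

struct RemoteWorkerControllerActor {
  bool shutdownPromiseResolved = false;
  // Normally resolved as part of the actor's destruction handshake.
  void ActorDestroyed() { shutdownPromiseResolved = true; }
};

std::unique_ptr<RemoteWorkerControllerActor> AllocController(bool aShuttingDown,
                                                             bool aBailOutOnShutdown) {
  if (aBailOutOnShutdown && aShuttingDown) {
    // The "bail out": no actor is created, so ActorDestroyed() never fires
    // and the peer keeps waiting.
    return nullptr;
  }
  return std::make_unique<RemoteWorkerControllerActor>();
}

int main() {
  const bool shuttingDown = true;

  // With the bail-out: allocation fails and nothing ever resolves the
  // controller's shutdown promise, so pendingPromises stays > 0.
  auto withBailOut = AllocController(shuttingDown, /*aBailOutOnShutdown=*/true);
  std::cout << "with bail-out, actor allocated: " << (withBailOut != nullptr) << "\n";

  // Without the bail-out: the actor is created and the ordinary destruction
  // path resolves the promise during shutdown.
  auto withoutBailOut = AllocController(shuttingDown, /*aBailOutOnShutdown=*/false);
  if (withoutBailOut) {
    withoutBailOut->ActorDestroyed();
    std::cout << "without bail-out, promise resolved: "
              << withoutBailOut->shutdownPromiseResolved << "\n";
  }
  return 0;
}
```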
Comment 23•2 years ago
This is a reminder regarding comment #8!
The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
Comment 24•2 years ago
(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #23)
This is a reminder regarding comment #8!
The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.
We do not yet see data from after bug 1813559 (which is tracked, too) landed; the hope would be that we can untrack this bug again once it has.
Comment 25•2 years ago
This is a reminder regarding comment #8!
The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.
Comment 26•2 years ago
Removing tracking for 111, as the crash volume dropped after Bug 1813559 landed.
Comment 27•2 years ago
I can reproduce this bug by opening all my 40+ bookmarks and closing the browser before all pages are loaded. Sometimes it happens and sometimes it doesn’t, but most of the time I get the same crash.
STR:
1 - Open Firefox Nightly 102.0a1
2 - Right-click bookmark bar -> Open all bookmarks
3 - Close the browser right after opening all the bookmarks without waiting for them to finish loading
Results: Firefox closes most of the processes, but a few particular processes remain open for a while until I get the crash notification.
Expected results: Firefox should close all the tabs.
Using Windows 11
Crash report: https://crash-stats.mozilla.org/report/index/205110cc-31e3-4c5c-ab0d-2df8a0230221
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment 32•2 years ago
Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria (similar to https://github.com/mozilla/bugbug/issues/351)?
Comment 33•2 years ago
(In reply to Jens Stutte [:jstutte] from comment #32)
Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria
Thank you for reporting this! The bot accounts for the release day and includes the previous version in the first week after the release date. However, it seems that contradictions in the versions on the release day caused this. I filed an issue to follow up on this: https://github.com/mozilla/bugbot/issues/2100
(similar to https://github.com/mozilla/bugbug/issues/351) ?
I guess you mean https://github.com/mozilla/bugbug/issues/3516
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment 37•1 year ago
(In reply to Suhaib Mujahid [:suhaib] from comment #33)
(In reply to Jens Stutte [:jstutte] from comment #32)
Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria
Thank you for reporting this! The bot account for the release day and includes the previous version in the first week after the release date. However, it seems that contradictions in the versions on the release day caused this. I filed an issue to follow up on this: https://github.com/mozilla/bugbot/issues/2100
It seems it is happening again.
Comment 38•1 year ago
:jstutte I will prioritize working on #2100 to prevent this from happening again.
Comment hidden (off-topic)
Comment 40•1 year ago
"Perf" key word?
Comment 41•1 year ago
This crash may fall into the same category:
https://crash-stats.mozilla.org/report/index/3e31cb79-abdb-46e7-83da-0f8680230819#tab-bugzilla
I see Signature: [@ AsyncShutdownTimeout | profile-change-teardown | ServiceWorkerShutdownBlocker: shutting down Service Workers ]
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment hidden (off-topic)
Comment 45•8 months ago
This seems to consistently be a topcrash in Nightly.