Crash in AsyncShutdownTimeout | Places Clients shutdown | sanitize.js: Sanitize on shutdown
Categories
(Toolkit :: Data Sanitization, defect, P1)
Tracking
()
People
(Reporter: philipp, Assigned: baku)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(5 files)
This bug is for crash report bp-3344acb0-afe3-4040-b8ff-2d9eb0190131.
AsyncShutdownTimeout
{
"phase":"Places Clients shutdown",
"conditions":[
{
"name":"sanitize.js: Sanitize on shutdown",
"state":{
"progress":{
"isShutdown":true
}
},
"filename":"resource:///modules/Sanitizer.jsm",
"lineNumber":141,
"stack":[
"resource:///modules/Sanitizer.jsm:onStartup:141",
"jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557",
"jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743",
"resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177",
"resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"
]
}
]
}
crash reports on shutdown in [@ AsyncShutdownTimeout | Places Clients shutdown | sanitize.js: Sanitize on shutdown] are spiking up in firefox 65 (in 64.0.2 they accounted for 0.9% of browser crashes, currently in 65.0 they are 4.9% of browser crashes).
a number of user comments are reporting they consistently get these crashes when they close the browser after the update to firefox 65.
we already had seen a spike of those reports during the 65.0a1 nightly cycle, originally caused by bug 1505071 - bug 1507171 has taken care of most of the regressing crashes but perhaps there are still some rarer cases that aren't covered yet and might lead to these AsyncShutdownTimeouts...
Comment 1•5 years ago
|
||
Hm, I'm not very experienced with debugging AsyncShutdownTimeouts, mak, do you know how we can look into this?
Updated•5 years ago
|
Comment 2•5 years ago
|
||
This the #1 overall top crash on 65 at the moment.
Comment 3•5 years ago
|
||
Unfortunately a lot of the Async shutdown timeout metadata are missing important information, did we change something recently around this code?
This is what I'd expect to see:
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true,"cache":"cleared","cookies":"cleared","history":"blocking","formdata":"blocking","downloads":"blocking","sessions":"blocking"}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":141,"stack":["resource:///modules/Sanitizer.jsm:onStartup:141","jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557","jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743","resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177","resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"]}]}
As you can see it is clearly showing a few cleaners blocking us, it shows our progress.
This is what I see in most reports:
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":141,"stack":["resource:///modules/Sanitizer.jsm:onStartup:141","jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557","jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743","resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177","resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"]}]}
And this is not useful, because it doesn't report any debugging info. There's no progress indication, like we stopped earlier.
The first thing would be to check if some recent changes have touched this code and eventually made the reported metadata less useful.
Reporter | ||
Comment 4•5 years ago
|
||
affected users are reporting that they can fix the problem by manually clearing "offline website data" once:
http://forums.mozillazine.org/viewtopic.php?p=14822808
https://support.mozilla.org/en-US/questions/1249036
Comment 5•5 years ago
|
||
It could make sense, 65 made lots of changes to clearing site data, iirc.
Comment 6•5 years ago
|
||
So, this code, and all the following code, seems to completely skip any async shtudown progress marking:
https://searchfox.org/mozilla-central/rev/152993fa346c8fd9296e4cd6622234a664f53341/browser/modules/Sanitizer.jsm#685
We should be investigating that code, and also add progress metadata to the progress object as this code runs
Updated•5 years ago
|
Comment 8•5 years ago
|
||
FYI, I'm planning a 65.0.1 release likely early next week. I'd be happy to take a patch which improves these reports in it.
Updated•5 years ago
|
Comment 9•5 years ago
|
||
Following up in email as the crash volume is high and I'd like to get a patch into 66.
Assignee | ||
Comment 10•5 years ago
|
||
Philipp and I have been discussed this bug a lot. Philipp is in contact with a volunteer who is able to reproduce this crash consistently. The volunteer (sorry, I don't know the name) shared their profile folder with us and I'm debugging the crash locally. Unfortunately I'm not able to reproduce it, yet.
I'm taking this bug.
Assignee | ||
Comment 11•5 years ago
|
||
Assignee | ||
Comment 12•5 years ago
|
||
I just submitted a diagnostic patch to have more log. Let's keep this bug open.
Comment 13•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b1ac22673987 Diagnostic messages on Sanitizer.jsm, r=mak
Comment 14•5 years ago
|
||
bugherder |
Comment 15•5 years ago
|
||
Looking at my crash log submissions, I see this has been happening since 11 Oct 2018. It happens every time I shut down.I have disabled all plugins and it still happens.Please email me if I can upload any files that might help you.
Assignee | ||
Comment 16•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Comment 17•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/25397a6f8c4f Optimize the comparison of principals in Sanitizer.jsm, r=johannh
Comment 18•5 years ago
|
||
bugherder |
Comment 19•5 years ago
|
||
There are 9 crashes in nightly 20190221215439 so after the patch landed.
Comment 20•5 years ago
|
||
All the crashes report this progress:
{"isShutdown":true,"advancement":"get-principals","step":"principals-quota-manager"}
Reporter | ||
Comment 21•5 years ago
|
||
unfortunately it looks like the volume of crash reports got worse as well...
Assignee | ||
Comment 22•5 years ago
|
||
Assignee | ||
Comment 23•5 years ago
|
||
I add some additional debugging messages. I suspect it's QM that doesn't resolve the callback.
With my previous patch getAllPrincipals() is always executed and that could be the reason of the increasing of crashes.
I'm going to submit a patch to optimize this as well.
unfortunately it looks like the volume of crash reports got worse as well...
Assignee | ||
Comment 24•5 years ago
|
||
Let's keep this bug open. I'm going to land some debugging code.
Assignee | ||
Comment 25•5 years ago
|
||
Depends on D20976
Comment 26•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/81cc693e6417 Introduce additional debugging messages to Sanitizer.jsm, r=mak
Comment 27•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Comment 28•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b7ea35510cf1 Sanitizer.jsm should retrieve the list of nsIPrincipal objects with site data only when needed, r=johannh
Comment 29•5 years ago
|
||
bugherder |
Assignee | ||
Comment 30•5 years ago
|
||
I can confirm that the problem is QM not executing the getUsage() callback during the shutdown:
https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/browser/modules/Sanitizer.jsm#752-753
Comment 31•5 years ago
|
||
Too late to fix this in 65 at this point with 66 shipping in less than 3 weeks.
Comment 34•5 years ago
|
||
I don't think that will land in time for the 66 release but I'll keep tracking this in case it is or in case it ends up being upliftable for a dot release.
Comment 35•5 years ago
|
||
baku/Tom, bug 1529301 is fixed but we're still seeing crashes in recent nightly builds. Can you take another look?
Comment 36•5 years ago
|
||
Looking into the tens recent reports, the steps in the metadata of them are:
- principals-quota-manager
- promises:162
- promises:1
- principals-quota-manager
- sanitized
- sanitized
- promises:12
- sanitized
- sanitized
- promises:1
Looks like only two of them are crashing at getting usage/origin. Perhaps, we could get the duration of each step to know which one consumes more than expected?
Assignee | ||
Comment 37•5 years ago
|
||
It seems that we are crashing here:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#905
I'm saying this because the 'promises:X' state is set after the populating of the promises list here:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#851
And at that point the cleaning has already started, but not completed yet. The same happens for 'sanitized' step, which comes from:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#910
in scenarios where we have probably more promises to resolve (more cleaning to be done).
When we sanitize site data, we use these flags (expanding CLEAR_DOM_STORAGES):
CLEAR_APPCACHE | CLEAR_DOM_QUOTA | CLEAR_DOM_PUSH_NOTIFICATIONS | CLEAR_REPORTS | CLEAR_COOKIES
Maybe one of these cleaning takes too much time.
To be sure about this, I would like to land a patch that sets an extra value in 'progress' data.
Assignee | ||
Comment 38•5 years ago
|
||
Updated•5 years ago
|
Comment 39•5 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a1d82278bf83 Add more debugging message to Sanitizer.jsm, r=mak
Comment 40•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Assignee | ||
Comment 41•5 years ago
|
||
I spent quite some time reading the crash reports with the latest debugging messages.
I can confirm my assumption: 90% of the crashes happens because the cleanup takes too long.
We can try to optimize it more, but:
- we have crash-reports coming from profiles with more than 1000 origins to clean up. Many of them are between 70 to 100 origins which are still a lot for slow machines.
- The cleanup involves a lot of I/O in a moment, the shutdown, where many other components are performing actions.
Probably, it's time to have a different approach. I suggest to introduce a cleanup dialog, to be shown after the closing of firefox. In this dialog we can have a progress bar, and maybe a list of operations that the browser is performing (cleaning X, cleaning Y,...)
This dialog would be shown only when needed: to the users who want to delete data on shutdown, or/and to the users who have set a custom session-only cookie permission for some origins. This dialog could be also used by other features such as 'forget about site' and 'clear data history'.
I have already discussed this proposal with johannh. I will organize a meeting to discuss it with more people.
Comment 42•5 years ago
|
||
I don't think we'll be able to do anything for 66, but I like the cleanup dialog idea!
Comment 43•5 years ago
|
||
[Tracking Requested - why for this release]:
we should probably track the improvement of cleanup dialog for 68.
Updated•5 years ago
|
Comment 44•5 years ago
|
||
(In reply to Neha Kochar [:neha] from comment #43)
[Tracking Requested - why for this release]:
we should probably track the improvement of cleanup dialog for 68.
That'd presumably be in a different bug.
Comment 45•5 years ago
|
||
Also, to be clear, we do not have the resources to come up with a cleanup dialog based solution in 68. We'll have a roadmap and resourcing discussion for data sanitization during the Whistler All-Hands. If you're interested in that I'm happy to invite you, Neha. :)
Updated•5 years ago
|
Updated•5 years ago
|
Comment 46•5 years ago
•
|
||
I'm looking at the idea of the cleanup dialog (not as an implementer, but as the most likely reviewer), but it sounds like punching holes in a safety net, so I'm not a huge fan.
Description
•