Crash in AsyncShutdownTimeout | Places Clients shutdown | sanitize.js: Sanitize on shutdown

RESOLVED FIXED in mozilla67

Status

()

defect
P1
critical
RESOLVED FIXED
4 months ago
14 days ago

People

(Reporter: philipp, Assigned: baku)

Tracking

({crash, regression})

65 Branch
mozilla67
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 wontfix, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68- affected)

Details

(crash signature)

Attachments

(5 attachments)

(Reporter)

Description

4 months ago

This bug is for crash report bp-3344acb0-afe3-4040-b8ff-2d9eb0190131.

AsyncShutdownTimeout 	

{

    "phase":"Places Clients shutdown",
    "conditions":[
        {
            "name":"sanitize.js: Sanitize on shutdown",
            "state":{
                "progress":{
                    "isShutdown":true
                }
            },
            "filename":"resource:///modules/Sanitizer.jsm",
            "lineNumber":141,
            "stack":[
                "resource:///modules/Sanitizer.jsm:onStartup:141",
                "jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557",
                "jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743",
                "resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177",
                "resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"
            ]
        }
    ]

}

crash reports on shutdown in [@ AsyncShutdownTimeout | Places Clients shutdown | sanitize.js: Sanitize on shutdown] are spiking up in firefox 65 (in 64.0.2 they accounted for 0.9% of browser crashes, currently in 65.0 they are 4.9% of browser crashes).

a number of user comments are reporting they consistently get these crashes when they close the browser after the update to firefox 65.

we already had seen a spike of those reports during the 65.0a1 nightly cycle, originally caused by bug 1505071 - bug 1507171 has taken care of most of the regressing crashes but perhaps there are still some rarer cases that aren't covered yet and might lead to these AsyncShutdownTimeouts...

Hm, I'm not very experienced with debugging AsyncShutdownTimeouts, mak, do you know how we can look into this?

Flags: needinfo?(mak77)

This the #1 overall top crash on 65 at the moment.

Unfortunately a lot of the Async shutdown timeout metadata are missing important information, did we change something recently around this code?

This is what I'd expect to see:

{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true,"cache":"cleared","cookies":"cleared","history":"blocking","formdata":"blocking","downloads":"blocking","sessions":"blocking"}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":141,"stack":["resource:///modules/Sanitizer.jsm:onStartup:141","jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557","jar:file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743","resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177","resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"]}]}

As you can see it is clearly showing a few cleaners blocking us, it shows our progress.

This is what I see in most reports:

{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":141,"stack":["resource:///modules/Sanitizer.jsm:onStartup:141","jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG__onWindowsRestored:1557","jar:file:///C:/Program%20Files/Mozilla%20Firefox/browser/omni.ja!/components/nsBrowserGlue.js:BG_observe:743","resource:///modules/sessionstore/SessionStore.jsm:initializeWindow:1177","resource:///modules/sessionstore/SessionStore.jsm:onBeforeBrowserWindowShown/<:1329"]}]}

And this is not useful, because it doesn't report any debugging info. There's no progress indication, like we stopped earlier.

The first thing would be to check if some recent changes have touched this code and eventually made the reported metadata less useful.

(Reporter)

Comment 4

3 months ago

affected users are reporting that they can fix the problem by manually clearing "offline website data" once:
http://forums.mozillazine.org/viewtopic.php?p=14822808
https://support.mozilla.org/en-US/questions/1249036

It could make sense, 65 made lots of changes to clearing site data, iirc.

Flags: needinfo?(mak77)

So, this code, and all the following code, seems to completely skip any async shtudown progress marking:
https://searchfox.org/mozilla-central/rev/152993fa346c8fd9296e4cd6622234a664f53341/browser/modules/Sanitizer.jsm#685
We should be investigating that code, and also add progress metadata to the progress object as this code runs

Flags: needinfo?(amarchesini)

If I should guess a bug, I'd probably say bug 1505071

Blocks: 1505071

FYI, I'm planning a 65.0.1 release likely early next week. I'd be happy to take a patch which improves these reports in it.

Following up in email as the crash volume is high and I'd like to get a patch into 66.

(Assignee)

Comment 10

3 months ago

Philipp and I have been discussed this bug a lot. Philipp is in contact with a volunteer who is able to reproduce this crash consistently. The volunteer (sorry, I don't know the name) shared their profile folder with us and I'm debugging the crash locally. Unfortunately I'm not able to reproduce it, yet.

I'm taking this bug.

Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
(Assignee)

Comment 12

3 months ago

I just submitted a diagnostic patch to have more log. Let's keep this bug open.

Keywords: leave-open

Comment 13

3 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b1ac22673987
Diagnostic messages on Sanitizer.jsm, r=mak

Comment 15

3 months ago

Looking at my crash log submissions, I see this has been happening since 11 Oct 2018. It happens every time I shut down.I have disabled all plugins and it still happens.Please email me if I can upload any files that might help you.

(Assignee)

Updated

3 months ago
Keywords: leave-open

Comment 17

3 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/25397a6f8c4f
Optimize the comparison of principals in Sanitizer.jsm, r=johannh

Comment 18

3 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

There are 9 crashes in nightly 20190221215439 so after the patch landed.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

All the crashes report this progress:
{"isShutdown":true,"advancement":"get-principals","step":"principals-quota-manager"}

(Reporter)

Comment 21

3 months ago

unfortunately it looks like the volume of crash reports got worse as well...

(Assignee)

Comment 23

3 months ago

I add some additional debugging messages. I suspect it's QM that doesn't resolve the callback.
With my previous patch getAllPrincipals() is always executed and that could be the reason of the increasing of crashes.
I'm going to submit a patch to optimize this as well.

unfortunately it looks like the volume of crash reports got worse as well...

(Assignee)

Comment 24

3 months ago

Let's keep this bug open. I'm going to land some debugging code.

Keywords: leave-open

Comment 26

3 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/81cc693e6417
Introduce additional debugging messages to Sanitizer.jsm, r=mak
Priority: -- → P1

Comment 28

3 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b7ea35510cf1
Sanitizer.jsm should retrieve the list of nsIPrincipal objects with site data only when needed, r=johannh
(Assignee)

Comment 30

3 months ago

I can confirm that the problem is QM not executing the getUsage() callback during the shutdown:
https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/browser/modules/Sanitizer.jsm#752-753

Depends on: 1529301

Too late to fix this in 65 at this point with 66 shipping in less than 3 weeks.

:baku - any progress on this one?

Flags: needinfo?(amarchesini)
(Assignee)

Comment 33

2 months ago

Bug 1529301 is fixing this issue.

Flags: needinfo?(amarchesini)

I don't think that will land in time for the 66 release but I'll keep tracking this in case it is or in case it ends up being upliftable for a dot release.

baku/Tom, bug 1529301 is fixed but we're still seeing crashes in recent nightly builds. Can you take another look?

Flags: needinfo?(shes050117)
Flags: needinfo?(amarchesini)

Looking into the tens recent reports, the steps in the metadata of them are:

  • principals-quota-manager
  • promises:162
  • promises:1
  • principals-quota-manager
  • sanitized
  • sanitized
  • promises:12
  • sanitized
  • sanitized
  • promises:1

Looks like only two of them are crashing at getting usage/origin. Perhaps, we could get the duration of each step to know which one consumes more than expected?

(Assignee)

Comment 37

2 months ago

It seems that we are crashing here:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#905

I'm saying this because the 'promises:X' state is set after the populating of the promises list here:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#851

And at that point the cleaning has already started, but not completed yet. The same happens for 'sanitized' step, which comes from:
https://searchfox.org/mozilla-central/rev/ddd1679c0534f7ddf36cafddd17b710c4fefe3c4/browser/modules/Sanitizer.jsm#910

in scenarios where we have probably more promises to resolve (more cleaning to be done).

When we sanitize site data, we use these flags (expanding CLEAR_DOM_STORAGES):
CLEAR_APPCACHE | CLEAR_DOM_QUOTA | CLEAR_DOM_PUSH_NOTIFICATIONS | CLEAR_REPORTS | CLEAR_COOKIES

Maybe one of these cleaning takes too much time.

To be sure about this, I would like to land a patch that sets an extra value in 'progress' data.

Flags: needinfo?(amarchesini)
Flags: needinfo?(shes050117)

Comment 39

2 months ago
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a1d82278bf83
Add more debugging message to Sanitizer.jsm, r=mak
(Assignee)

Comment 41

a month ago

I spent quite some time reading the crash reports with the latest debugging messages.
I can confirm my assumption: 90% of the crashes happens because the cleanup takes too long.

We can try to optimize it more, but:

  1. we have crash-reports coming from profiles with more than 1000 origins to clean up. Many of them are between 70 to 100 origins which are still a lot for slow machines.
  2. The cleanup involves a lot of I/O in a moment, the shutdown, where many other components are performing actions.

Probably, it's time to have a different approach. I suggest to introduce a cleanup dialog, to be shown after the closing of firefox. In this dialog we can have a progress bar, and maybe a list of operations that the browser is performing (cleaning X, cleaning Y,...)

This dialog would be shown only when needed: to the users who want to delete data on shutdown, or/and to the users who have set a custom session-only cookie permission for some origins. This dialog could be also used by other features such as 'forget about site' and 'clear data history'.

I have already discussed this proposal with johannh. I will organize a meeting to discuss it with more people.

I don't think we'll be able to do anything for 66, but I like the cleanup dialog idea!

[Tracking Requested - why for this release]:
we should probably track the improvement of cleanup dialog for 68.

Status: REOPENED → RESOLVED
Last Resolved: 3 months ago25 days ago
Resolution: --- → FIXED

(In reply to Neha Kochar [:neha] from comment #43)

[Tracking Requested - why for this release]:
we should probably track the improvement of cleanup dialog for 68.

That'd presumably be in a different bug.

Also, to be clear, we do not have the resources to come up with a cleanup dialog based solution in 68. We'll have a roadmap and resourcing discussion for data sanitization during the Whistler All-Hands. If you're interested in that I'm happy to invite you, Neha. :)

You need to log in before you can comment on or make changes to this bug.