Open Bug 1617559 Opened 4 years ago Updated 1 year ago

Crash Report [@ AsyncShutdownTimeout | profile-change-teardown | Places Clients shutdown ] - trying to startup Places late during profile-change-teardown

Categories

(Toolkit :: Async Tooling, defect, P3)

x86
Windows
defect

Tracking

()

People

(Reporter: mak, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

While looking at these crashes aggregations by Async Shutdown Timeout, I noticed quite a few crashes having PageThumbs.jsm in the blockers stack.
Almost a 50% of the crash reports point to this:

{"phase":"profile-change-teardown","conditions":[{"name":"Places Clients shutdown","state":{"progress":{},"Barrier":{}},"filename":"z:/task_1581871820/build/src/toolkit/components/places/Database.cpp","lineNumber":482,"stack":["resource://gre/modules/nsAsyncShutdown.jsm:addBlocker:162","resource://gre/modules/PlacesUtils.jsm:null:1915","resource://gre/modules/XPCOMUtils.jsm:get:129","resource://gre/modules/PageThumbs.jsm:PageThumbs_init:116","resource:///modules/BrowserGlue.jsm:BG__onFirstWindowLoaded:1568","resource:///modules/BrowserGlue.jsm:BG_observe:810","chrome://browser/content/browser.js:_delayedStartup:2086"]}]}

I suspect this code is trying to startup Places late, during profile-change-teardown, Places adds a profile-change-teardown blocker at this point that won't ever be able to remove because it's being added too late.
I wonder if we could use nsIAppStartup GetShuttingDown to avoid initing Places this late.

I also suspect that this check is involved:
https://searchfox.org/mozilla-central/rev/a37fc61f172b432e7ae0b6b4c4a12cac2a787a0f/toolkit/components/asyncshutdown/AsyncShutdown.jsm#693
Once a phase begins, it may be late to add new blockers, but we check this._waitForMe that is only cleared here https://searchfox.org/mozilla-central/rev/a37fc61f172b432e7ae0b6b4c4a12cac2a787a0f/toolkit/components/asyncshutdown/AsyncShutdown.jsm#897 once the phase is complete. Maybe we should check _isStarted instead?

It may be worth experimenting if changing those causes problems, because they could reduce the false positives (but real crashes) we see here.

Component: New Tab Page → Async Tooling
Product: Firefox → Toolkit

David, do you remember anything about this, and why we check _waitForMe instead of _isStarted? Do we actually want to allow consumers to add new blockers while we are waiting for existing blockers? May doing that be the cause for unremovable blockers (And thus finally a crash)?

Flags: needinfo?(dteller)
No longer depends on: 1394571

Apologies for the slow response, too much multi-tasking.

So, I think it should be simple to let the phase contain a boolean to inform clients that it's now too late to register blockers. Would this be sufficient to solve the problem?

Flags: needinfo?(dteller) → needinfo?(mak)

We already throw when it's too late to register blockers, but only when we move to the next phase. The problem I'm seeing is that when for example profile-change-teardown fires, async shutdown still allows consumers to register profile-change-teardown blockers.
Thus I was wondering whether we could make addBlocker throw as soon as a phase begins and not when we move to the next phase. That means trying to register a profile-change-teardown blocker after profile-change-teardown fired will throw.

To do the check we currently use _waitForMe (see comment 0), changing that check at line 693 with a check on _isStarted would implement my suggestion (we throw for new blockers of a phase as soon as that phase started), but I don't know if the original behavior was expected or even wanted.

Flags: needinfo?(mak) → needinfo?(dteller)

If my memory serves, the idea was that as long as a phase is not finished, we can still register blockers – and they should be handled correctly. This would let us handle annoying cases in which A's shutdown triggers B's shutdown, B's shutdown triggers C's shutdown, C's shutdown needs to block A's shutdown and doesn't know that B is already blocking it.

Of course, there are certainly other ways to handle the same situation. I think that what you suggest would work, it would just need a little refactoring of A to ensure that it registers the blockers earlier. Also, the situation is now simpler since add-ons don't have access to AsyncShutdown anymore.

In other words, from the top of my mind, I have no objection with your change, but as usual, testing needed :)

Flags: needinfo?(dteller) → needinfo?(mak)
Flags: needinfo?(mak)
Summary: Crash Report [@ AsyncShutdownTimeout | profile-change-teardown | Places Clients shutdown ] → Crash Report [@ AsyncShutdownTimeout | profile-change-teardown | Places Clients shutdown ] - trying to startup Places late during profile-change-teardown
Flags: needinfo?(gijskruitbosch+bugs)
Depends on: 1674005

I think Bug 1674005 pretty much solved my concerns here, PromiseSet is indeed doing what David pointed out, though it was not considering promises added too late and promises not being resolved because of their global going away. With that addressed it should be possible to add things also during the same phase.

We should be monitor async shutdown crashes in the next weeks/months to see the effect.

Depends on: 1677688

(In reply to Marco Bonardo [:mak] from comment #5)

We should be monitor async shutdown crashes in the next weeks/months to see the effect.

I guess this hasn't quite been fixed because there are still cases on 84 beta and 85 nightly... do we have any intuition about what is still missing here?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(mak)

I am waiting for some reports with Bug 1677688 in, but I don't see any yet.

Flags: needinfo?(mak)

The first report says we have invoked Wait on mBarrier, but we didn't receive Done from it.
Unfortunately I still don't see any state from mBarrier, looking into that.

I think these crashes may effectively be caused by SyncedBookmarksMirror. Though, I think I found why we get incomplete crash metadata from Places: an async shutdown barrier is supposed to return a propertybag from GetState, but effectively it instead returns an array of property bags, Places Shutdown fails to report it since it expects a property bag.
https://searchfox.org/mozilla-central/rev/8d722de75886d6bffc116772a1db8854e34ee6a7/toolkit/components/asyncshutdown/nsAsyncShutdown.jsm#203
Here _moduleBarrier.state is an array, built here:
https://searchfox.org/mozilla-central/rev/8d722de75886d6bffc116772a1db8854e34ee6a7/toolkit/components/asyncshutdown/AsyncShutdown.jsm#836-848
These structures are too complex anyway, I wonder why Async Shutdown went with property bags than just a string (JSON stringified), anyway, my plan for now would be to convert the array of property bags into a property bag like blockername => json string of its state. That should report more useful metadata.

Depends on: 1683280
Severity: critical → S2

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
You need to log in before you can comment on or make changes to this bug.