1579620 - Crash in [@ AsyncShutdownTimeout | Places Clients shutdown | SyncedBookmarksMirror: finalize]

Reporter

Description

•

5 years ago

This bug is for crash report bp-ca2e3b49-5280-4cdc-8ca4-7ad5e0190907.

crash reports with this signature are starting to show up cross-platform in the firefox 70 cycle. the first nightly this happened from was 70.a1 build 20190807095705.

the async shutdown timeout field generally looks similar to this in the reports:

{

    "phase":"Places Clients shutdown",
    "conditions":[
        {
            "name":"SyncedBookmarksMirror: finalize",
            "state":{
                "steps":[
                    {
                        "step":"finalize",
                        "at":1567842792535
                    }
                ]
            },
            "filename":"resource://gre/modules/SyncedBookmarksMirror.jsm",
            "lineNumber":280,
            "stack":[
                "resource://gre/modules/SyncedBookmarksMirror.jsm:SyncedBookmarksMirror:280",
                "resource://gre/modules/SyncedBookmarksMirror.jsm:open:357"
            ]
        }
    ]

}

Lina Butler [:lina]

Comment 1

•

5 years ago

Thanks for filing this! There are a few things that can cause a shutdown hang:

Failing to run PRAGMA optimize or detach from the mirror database.
A large merge. See bp-517875df-31b7-4108-8d1f-b64200190907 ({"step":"fetchRemoteTree","at":1567826800504,"took":25221,"counts":[{"name":"items","count":468552}]}), bp-87e5eeb4-915b-442d-b9f5-55a5b0190906 ({"step":"fetchRemoteTree","at":1567794381310,"took":5659,"counts":[{"name":"items","count":235805}]}), and bp-8947f7e6-6280-4c1f-93e1-1315a0190905 ({"step":"fetchRemoteTree","at":1567678575700,"took":19300,"counts":[{"name":"items","count":1016274}]}). These are hanging during application because there are too many bookmarks—468k, 235k, and 1m+ bookmarks, respectively—well above our soft limit.
A large number of tombstones. See bp-ad46895e-eb33-47ba-89b2-5d0480190905 ({"step":"merge","at":1567665986347,"took":970,"counts":[{"name":"items","count":2522},{"name":"deletes","count":1641866}]). The tree is pretty small, but that's a lot of tombstones to insert into a temp table—and then drop on the floor, since we won't store them.
Anything that's blocking the Places async thread, like Sanitizer.jsm or a Places operation. These will also show up in the shutdown hang report, though I can't find an example now. Since we're in the queue behind it, there isn't much we can do.
Database corruption. Nothing we can do here; Firefox will recreate the databases on the next startup.

We can fix (1) by optimizing when we apply, instead of waiting 'til shutdown. We can even get away with not calling finalize at all, since closing the main Places connection at shutdown will automatically detach the mirror database and drop all temp tables. However, if the main connection is stuck, we'll still see a shutdown hang, as in bug 1395542, bug 1426941, bug 1258350, and bug 1326309—but implicating Places instead of SyncedBookmarksMirror. Still, it's worth doing.

I don't think we should push on (2) just yet. Those numbers are well above our soft limit for bookmarks. https://phabricator.services.mozilla.com/D44769 should make this a bit better by adding more interrupt points, but, if SQLite is busy writing those changes to disk, there isn't much we can do. We can fix bug 1556459 to allow interrupting the main Places connections, but I think that's riskier than a shutdown hang.

We should fix (3). Inserting tombstones into a TEMP table for items that don't exist locally, and that we won't upload (and that we'll process one by one thanks to our trigger), is pretty wasteful.

(4) and (5) are out of our hands, unfortunately.

Let's keep this bug open for tracking, and fix the specific issues in dependent bugs.

Severity: critical → normal

Priority: -- → P3

Lina Butler [:lina]

Updated

•

5 years ago

Blocks: 1579727

Lina Butler [:lina]

Updated

•

5 years ago

Blocks: 1579728

Lina Butler [:lina]

Comment 2

•

5 years ago

Elaborating a bit:

We can fix (1) by optimizing when we apply, instead of waiting 'til shutdown.

I considered removing the shutdown blocker entirely, and using PlacesUtils.registerShutdownFunction. In theory, now that we use the main connection, and don't run any async tasks on shutdown, we can do this.

However, that would mean losing progress reports in shutdown hang crash annotations. Long-running merges blocked on a SQLite operation would still hang at shutdown—we can't interrupt read-write connections—except the signature would be something like AsyncShutdownTimeout | profile-change-teardown | Places Clients shutdown, and wouldn't mention SyncedBookmarksMirror at all. There would be no way to distinguish the mirror from other Places consumers.

The progress reports have been super valuable for optimizing the mirror. Bug 1567238, bug 1576810, and bug 1575757 (and bug 1561467, with the initial numbers) would have been much harder without them. With that in mind, I think we should still fix bug 1579727, but keep the blocker around, and accept that we'll still see shutdown hangs mentioning SyncedBookmarksMirror.

Lina Butler [:lina]

Comment 3

•

5 years ago

Actually, thinking more about this, (1) is wrong. We're hanging trying to run PRAGMA optimize, but that might be because we're blocked on another statement running on the async thread, not necessarily because the PRAGMA itself takes a long time to complete.

Consider:

The merge finishes. Merging and application happens in a single runnable, so we know no one else is using the connection, and we block the async thread until we're done.
We start fetching info to notify observers.
Another caller wants to use the main Places connection, either during normal operation, or at shutdown. This could be the address bar, Bookmarks.jsm, the library view, maintenance, Sanitizer.jsm clearing data at shutdown...
The other caller executes a statement that takes a long time. At this point, the async thread is blocked.
The mirror can't finish notifying observers, because it can't run any more statements until statement (4) finishes.
The user closes the browser. The mirror's shutdown blocker tries to run PRAGMA optimize, but that's blocked on (5), which is blocked on (4).

Basically, after the merge is done, other callers can use the Places connection, interleaving their own reads and writes. This isn't a consistency issue, since we already committed the merge transaction, and we're only reading from in-memory TEMP tables...but it does mean the mirror can be blocked on someone else's statements.

I think this can also happen even if the mirror isn't in the middle of a merge. In that case, it still can't run PRAGMA optimize or DETACH because the async thread is busy. This also means the hangs we see for notifyObservers and fetchLocalChangeRecords (bug 1561467, comment 0) might be misleading, because they're also counting the time they're waiting for other statements to run, not just their own.

However, fixing bug 1579727 by removing the shutdown blocker without waiting for PRAGMA optimize raises its own issues. If we did that, the mirror wouldn't report shutdown hangs at all (same issue as comment 2), since the blocker isn't waiting on pending merges to finish.

We could fix that by adding an AsyncShutdown.Barrier that waits for merges to finish before lifting the blocker, but now we're back to waiting on statements executed by other callers. That's going to be an issue as long as we're sharing the main Places connection with other callers—and using our own connection raises the specter of SQLITE_BUSY again. We've basically added a lot of complexity to get...what we have today.

TL;DR: let's close bug 1579727 as WONTFIX, and keep an eye on SyncedBookmarksMirror shutdown crashes. I suspect we're going to see many false positives, but the progress reports we get from actual hung merges are very useful, and bug 1579727 would regress that. There are already several bugs open for Places-related AsyncShutdown hangs, mentioned in comment 1. It's a bit flippant to say one more won't hurt, but the trade-off doesn't feel worth it.

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

5 years ago

status-firefox70: affected → fix-optional

Lina Butler [:lina]

Comment 4

•

5 years ago

Anything that's blocking the Places async thread, like Sanitizer.jsm or a Places operation. These will also show up in the shutdown hang report, though I can't find an example now. Since we're in the queue behind it, there isn't much we can do.

See bp-d59e26a3-22bc-4fe1-958a-dff200190925 (bug 1524200) for an example.

Pascal Chevrel:pascalc (PTO until April 26)

Comment 5

•

5 years ago

Lina, is there something actionable and upliftable to 71 to fix this crash or can I mark it as wontfix for 71? Thanks

Flags: needinfo?(lina)

Lina Butler [:lina]

Comment 6

•

5 years ago

As it is, I'm not sure this bug is actionable at all, in 71 or later.

We can "fix" the shutdown hang to have a different signature (like bug 1395542 or bug 1326309), but that would paper over the problem, and make it harder to track hung merges, instead of general Places connection hangs. Bug 1556459 might help, but also unclear.

status-firefox70: fix-optional → wontfix

status-firefox71: affected → wontfix

Flags: needinfo?(lina)

janet [:janet]

Updated

•

3 years ago

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → WONTFIX

Bugzilla

Quick Search

Crash in [@ AsyncShutdownTimeout | Places Clients shutdown | SyncedBookmarksMirror: finalize]

Categories

(Firefox :: Sync, defect, P3)

Tracking

()

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Security

(public)

User Story

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Updated