Closed Bug 1472963 Opened 6 years ago Closed 6 years ago

Using bookmark mirror causes multi second hangs when operating with bookmarks

Categories

(Firefox :: Sync, defect)

63 Branch
x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 63
Tracking Status
firefox63 --- fixed

People

(Reporter: Tobias.Marty, Assigned: lina)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0
Build ID: 20180702220046

Steps to reproduce:

I've been using the new bookmark mirror for a few weeks until I had to disable it because of a few issues. What I noticed during the whole time was that its performance is often times very bad. 

After setting a few new bookmarks it would take multiple seconds until the confirmation dialog for new bookmarks would show. During that time no other operations with bookmarks were possible. But you could create duplicates, if you pressed the new bookmark button multiple times during the time the confirmation dialog wouldn't show. 

Also operations like draging and dropping multiple bookmarks to other folders in the bookmark manager window would cause hangs of the whole browser. 

This didn't seem to correlate strongly with the speed of the machines where Firefox was running - it was always slow.

After disabling services.sync.engine.bookmarks.buffer everything is running fine again.
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
Blocks: 1433177
(In reply to TMart from comment #0)
> I've been using the new bookmark mirror for a few weeks until I had to
> disable it because of a few issues. What I noticed during the whole time was
> that its performance is often times very bad. 

Did you notice those perf problems recently (like in the last few days) or a few weeks ago?
I'm asking because we had a known performance problem that was fixed about 20 days ago and was not strictly related to the mirror.
Component: Places → Sync
Product: Toolkit → Firefox
Definitely after. I was affected by https://bugzilla.mozilla.org/show_bug.cgi?id=1467537 if that's that you mean. But those issues are gone - the hangs when using bookmarks not.
Component: Sync → Places
Product: Firefox → Toolkit
Sorry, I accidently changed the status, that was not my intention -.-
Component: Places → Sync
Product: Toolkit → Firefox
Thanks! Trace and debug logs include timestamps, which would give us a better idea of where the bottlenecks are. Also, do you see any statements mentioning `places.sqlite` in the "Slow SQL Statements" tab in about:telemetry?

If you have more cycles to help us out, capturing a performance profile (https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler) while Sync is running (and trace logging is disabled) would be awesome...but even uploading your logs and checking Slow SQL would be enough to point us in the right direction.
Flags: needinfo?(Tobias.Marty)
I can reproduce this very easily on my desktop. I just have to set a few bookmarks and it will go nuts... It hangs for over two minutes. The machine has an i7 6700k, 16 GB RAM and an SSD, so performance shouldn't be an issue. 

There are some slow SQL statements for places.sqlite. What would be the best way to share these? 

By far the longest is 
UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id) WHERE frecency = -1 /* places.sqlite */ 
with two hits and 137256 ms! I thought this was the same problem as in https://bugzilla.mozilla.org/show_bug.cgi?id=1467537 and supposed to be fixed now? 

I also capturead a profile. I started recording, started to set two bookmarks. I set a third one and noticed it hanging again. I tried to set two other bookmarks, and stopped the recording once those three bookmarks were set, too. The recording is 140s, that's how long it took until the hang was over: https://perfht.ml/2KuYtCb
Which corresponds with the running time of above mentioned SQL statement in places.sqlite. 

I have about 2,000 bookmarks right now, is that already an edge case for Sync (meaning just too many)?
Flags: needinfo?(Tobias.Marty)
Attached file sync_log
Permanent link to correct log files: https://1drv.ms/u/s!ArY6nA0dfc6ugtstaZEvgtyKr19Ang
Thanks for posting the logs!

(In reply to TMart from comment #5)
> I can reproduce this very easily on my desktop. I just have to set a few
> bookmarks and it will go nuts... It hangs for over two minutes. The machine
> has an i7 6700k, 16 GB RAM and an SSD, so performance shouldn't be an issue.
>
> I have about 2,000 bookmarks right now, is that already an edge case for
> Sync (meaning just too many)?

Yikes. :-( We should definitely be able to handle that, my laptop is way less powerful (i7-4650u, 8 GB RAM), and I have ~3500 bookmarks.

> There are some slow SQL statements for places.sqlite. What would be the best
> way to share these?

Either a screenshot, or copy-pasting the table into a text file, is fine. In particular, anything mentioning `items`, `structure`, `itemsToMerge`, or `structureToMerge`, is interesting, as is `BEGIN ...`.

I'm not sure that's the bottleneck, though. Looking at your logs, applying the merged tree took 442ms in the worst case, and usually takes under 300ms. Other worst case times I see are:

* 331ms to build the local tree.
* 284ms to build the local tree.
* 16ms to record observer notifications.
* 59ms to stage outgoing items.
* 226ms to build records for those outgoing items.
* 8ms to clean up.

I also don't see any red flags notifying observers (when we also recalculate frecencies; this usually takes under 50ms in your logs, with one outlier at 2.3s), and, from bug 1472965, comment 5, you only have 44 URLs with missing frecencies, anyway.

Building the merged tree is another story. This stands out:

> 1530636332270 Sync.Engine.Bookmarks.Mirror  TRACE Remote parent unchanged; keeping local child "JyfZ69ESHP1L (Bookmark; Age = 6480619.571s)" in local parent "unfiled_____ (Folder; Age = 1.102s; Unmerged)"
> 1530636423829 Sync.Engine.Bookmarks.Mirror  TRACE Item JyfZ69ESHP1L exists locally as "JyfZ69ESHP1L (Bookmark; Age = 6480619.571s)" and remotely as "JyfZ69ESHP1L (Bookmark; Age = 8649139.900s)"; merging
> [...]
> 1530636424421 Sync.Engine.Bookmarks.Mirror  TRACE Building complete merged tree took 92566.712ms

So about 92 *seconds* pass between these two log messages: https://searchfox.org/mozilla-central/rev/403038737ba75af3842ba6b43b6e2fb47eb06609/toolkit/components/places/SyncedBookmarksMirror.jsm#3371-3373,3790-3795 None of the other logs show a delay for `JyfZ69ESHP1L`.

The next longest merge I see took about 17 seconds, with two pauses between the same log calls, but for different items:

> 1530636234659 Sync.Engine.Bookmarks.Mirror  TRACE Remote parent unchanged; keeping remote child "LeurfU8pnWZa (Bookmark; Age = 8649042.390s)" in "unfiled_____ (Folder; Age = 1.497s; Unmerged)"
> 1530636243451 Sync.Engine.Bookmarks.Mirror  TRACE Merging remote child "06F1fo4gO7FR (Bookmark; Age = 8649042.390s)" of "unfiled_____ (Folder; Age = 1.260s)" into "unfiled_____ (Value: Local; Structure: Local)"
> [...]
> 1530636243457 Sync.Engine.Bookmarks.Mirror  TRACE Remote parent unchanged; keeping remote child "LZ1ZS3VPc-UX (Bookmark; Age = 8649042.390s)" in "unfiled_____ (Folder; Age = 1.497s; Unmerged)"
> 1530636250314 Sync.Engine.Bookmarks.Mirror  TRACE Merging remote child "e36hth_WqCvM (Bookmark; Age = 8649042.390s)" of "unfiled_____ (Folder; Age = 1.260s)" into "unfiled_____ (Value: Local; Structure: Local)"
> [...]
> 1530636250877 Sync.Engine.Bookmarks.Mirror  TRACE Building complete merged tree took 16514.557ms

The only thing I can think of is the `maybeYield` call, where briefly yield to the event loop (https://searchfox.org/mozilla-central/rev/403038737ba75af3842ba6b43b6e2fb47eb06609/services/common/async.js#99) at the start of `mergeNode()`. (Flushing log messages to disk is another idea, but unlikely, or I'd expect to see it more frequently).

It doesn't look like there's anything special about these items; they're all siblings in `unfiled`. All the other merges take under 2s, and this is with trace logging, which adds a fair bit of overhead as it is.

I'll look into this more tomorrow, but I suspect the `maybeYield` calls aren't helping us here. And, to make things worse...

(In reply to TMart from comment #0)
> After setting a few new bookmarks it would take multiple seconds until the
> confirmation dialog for new bookmarks would show. During that time no other
> operations with bookmarks were possible. But you could create duplicates, if
> you pressed the new bookmark button multiple times during the time the
> confirmation dialog wouldn't show. 
> 
> Also operations like draging and dropping multiple bookmarks to other
> folders in the bookmark manager window would cause hangs of the whole
> browser. 

Because we're yielding inside the merge transaction, the main connection can't write to the database, so the UI glitches and hangs.
(In reply to TMart from comment #5)
> I also capturead a profile. I started recording, started to set two
> bookmarks. I set a third one and noticed it hanging again. I tried to set
> two other bookmarks, and stopped the recording once those three bookmarks
> were set, too. The recording is 140s, that's how long it took until the hang
> was over: https://perfht.ml/2KuYtCb
> Which corresponds with the running time of above mentioned SQL statement in
> places.sqlite. 

Hmm, I see a lot of generator calls in that profile. I wonder if changing these to vanilla loops, and being more judicious with our `yieldingIterator` use, will help.
I also wonder if we can build the merged tree outside of the transaction. Most folks probably won't change their bookmarks mid-sync [1]...but, even if they do (for example, dragging bookmarks around in the Library, or starring a bookmark and changing its folder), we'll need to sync again, anyway, and the intermediate syncs are just getting in the way.

How about this:

1. Before building the merged tree, store the number of writes executed on the main connection.
2. Build the merged tree.
3. Start the merge transaction.
4. Check if the current number of writes matches what we saved in step 1. If they match, we can safely apply the merged tree. If not, the database changed underneath us, and we should abort the merge, and retry on the next sync.

How does that sound to you, Mak? We could expose the counter as an attribute on `nsINavBookmarksService`, and update it via `AFTER {INSERT, UPDATE, DELETE}` triggers, like we do for the last inserted ID in https://searchfox.org/mozilla-central/rev/6ef785903fee6c0b16a1eab79d722373d940fd78/toolkit/components/places/nsPlacesTriggers.h#222.

We could also try using the `TOTAL_CHANGES()` SQL function for this, but the docs caution:

> If a separate thread makes changes on the same database connection while sqlite3_total_changes() is running then the value returned is unpredictable and not meaningful. 

So I don't know how it'll interact with the mix of sync and async statements that we run on the main Places connection. Alternatively, we could have mozStorage call `sqlite3_stmt_readonly` on each statement it executes, and expose a `totalChanges` attribute on `mozIStorageAsyncConnection`...but, like `TOTAL_CHANGES()`, it would miss changes from other cloned Places connections, while the triggers would see them. It would also count changes to other tables, while we're only really interested in `moz_bookmarks`.

[1]: Citation needed.
Flags: needinfo?(mak77)
Other ideas:

1. Expose a helper for the main Places connection to interrupt statements on the mirror connection. That sounds dangerous, and likely to cause data loss if misused by other consumers. (The mirror is a bit special in that aborting the merge transaction is fine, since we already downloaded the bookmarks from the server).

2. Remove the calls to `maybeYield` from the merger entirely. This will cause jank for large trees (bug 1433806).

3. Move the merger into a worker. We can remove the `maybeYield` calls, but then we still have the same problem as now: the transaction has to be active while we're waiting on the worker. There's also serialization overhead.

4. Move the entire mirror into a worker, and use ctypes to access Sqlite. No. :-P (And we're still waiting on the merger while the transaction is active, so this is "3 but worse").
I just created a new profile for Nightly, enabled the mirror and synced my profile. I didn't notice main thread hangs when draging and dropping bookmarks in the bookmark manager. There was a hang whe setting every third bookmark, but not nearly as bad as in the old profile (maybe about 2 to 3 seconds). 

Also I'm syncing the profile with a laptop with an i5 6200u where the issues aren't nearly as bad. 

Here's a file with telemetry data with very slow SQL statements from the old profile: https://1drv.ms/t/s!ArY6nA0dfc6ugtsx5XD06XUT8I789w
Another test on mentioned slower laptop: No long hangs even when creating 10 new bookmarks in a row. No slow SQL statements longer than 2 seconds. This really puzzles me, as it is essentialy the same bookmark data. 

Would sync logs from the laptop be of use here?
(In reply to Lina Cambridge (she/her) [:lina, :kitcambridge] from comment #10)
> I also wonder if we can build the merged tree outside of the transaction.

It's a possibility, reducing transactions duration is always a good thing.

> 1. Before building the merged tree, store the number of writes executed on
> the main connection.
> 2. Build the merged tree.
> 3. Start the merge transaction.
> 4. Check if the current number of writes matches what we saved in step 1. If
> they match, we can safely apply the merged tree. If not, the database
> changed underneath us, and we should abort the merge, and retry on the next
> sync.
> 
> How does that sound to you, Mak? We could expose the counter as an attribute
> on `nsINavBookmarksService`, and update it via `AFTER {INSERT, UPDATE,
> DELETE}` triggers

Off-hand it should work, I'm not thrilled of adding more triggers (overhead) to bookmarks, but I doubt we can instrument every code doing writes to bookmarks (they're quite spreaded around).
And, as you said, we can't rely on sqlite counts because of the inter-thread use.
Sync doesn't sync keywords right?
I wonder how we'll deal with tags in the future, I suppose when tags change we'll still increase the sync change counter. Actually, could we make the UPDATE trigger only activate on syncChangeCounter changes? Are there other changes we must take care of?
Flags: needinfo?(mak77)
(In reply to TMart from comment #13)
> Another test on mentioned slower laptop: No long hangs even when creating 10
> new bookmarks in a row. No slow SQL statements longer than 2 seconds. This
> really puzzles me, as it is essentialy the same bookmark data.

A fresh profile has fewer rows in `moz_places`, which might explain part of that...though, you're still seeing some hangs on your other machine, so I think there's something else going on.

It's possible that the merger yielding to the main thread causes a deadlock. The mirror starts a transaction on the storage thread, but the merger runs on the main thread. When it yields, some other Places code might try to run a synchronous statement, also on the main thread...but it can't, because we're holding our transaction open on a storage thread.
  
> Would sync logs from the laptop be of use here?

I don't think so, I have a pretty good idea of what's happening based on the logs from your other machine. Thanks for posting them, and for checking!
Assignee: nobody → kit
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This patch:

* Exposes a global Sync change counter on `nsINavBookmarksService`.
  This is similar to SQLite's `total_changes()`, but just for changes
  to bookmarks that affect Sync, and accounts for changes from multiple
  threads and connections.
* Adds a SQL function to bump the global Sync change counter whenever
  `moz_bookmarks` changes.
* Extends the `moz_bookmarks` triggers to call this function.
* Moves merging outside the transaction in the bookmarks mirror, and
  checks that the counters match before applying.
* Changes the Sync scheduler to always schedule follow-up syncs, even
  when there's an error.
(In reply to Marco Bonardo [::mak] from comment #14)
> Sync doesn't sync keywords right?

It does, but Places bumps the change counter for all bookmarks with the keyword URL, so we can reuse the triggers that are already there.

> I wonder how we'll deal with tags in the future, I suppose when tags change
> we'll still increase the sync change counter.

Yep, that's what we do today. I think, once tags move to their own table, they'll work the same as keywords for bumping the change counter.
Blocks: 1475104
Comment on attachment 8990358 [details]
Add a `totalSyncChanges` counter and use it in the bookmarks mirror.

Marco Bonardo [::mak] has approved the revision.

https://phabricator.services.mozilla.com/D2004
Attachment #8990358 - Flags: review+
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e2c0d7d48415
Add a `totalSyncChanges` counter and use it in the bookmarks mirror. r=mak
https://hg.mozilla.org/mozilla-central/rev/e2c0d7d48415
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 63
I just tested Nightly Build ID 20180714102053 and the issues seem to be gone entirely for me. Everything is as fast and responsive as without bookmark mirror now.
I am reopening this as now again I have very long hangs when operating with bookmarks. 

This time telemetry shows this SQL statement: 
UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id) WHERE id IN ( SELECT id FROM moz_places WHERE frecency < 0 ORDER BY frecency ASC LIMIT :limit ) /* places.sqlite */ 

It easily takes over 6000 ms. 

Should I again upload sync diagnostic files? In a new ticket or this one?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Please can you file a new bug, as we've already landed something here. It'll be easier for us to track the changes and patches with a new bug.
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Filed bug 1485480. :-)
See Also: → 1485480
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: