Closed Bug 1482608 Opened 6 years ago Closed 6 years ago

Ship a Rust bookmark merger on Desktop

Categories

(Firefox :: Sync, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
Firefox 68
Tracking Status
firefox68 --- fixed

People

(Reporter: lina, Assigned: lina)

References

(Blocks 2 open bugs)

Details

Attachments

(6 files, 8 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
Currently, bookmark syncing is implemented almost entirely in https://searchfox.org/mozilla-central/rev/088938b1495aeac586bff94234c9382a4b8ca6da/toolkit/components/places/SyncedBookmarksMirror.jsm. There are two main parts: a merger, which takes two bookmark trees and recursively builds up a merged tree, and the mirror, which handles storing incoming records and updating Places. We're exploring porting the merger to Rust (https://github.com/linacambridge/vellum), so that we can use it on mobile, and I'm curious if we can then port the Rust merger back to Desktop. Sync is written in JS, so we'll still need some JS logic to store and inflate records...but I think we can handle merging and updating Places in Rust. I wrote up some thoughts in https://github.com/linacambridge/vellum/issues/7, and started hacking on a prototype. So far, I have a stub `mozISyncedBookmarkMirror` component building, but haven't wired any of it up yet. Nika, I'm CCing you in case you have tips or insights, or think this harebrained idea won't work at all. :-) I'd love to get some of your time and chat more about this after next week, too. Thanks! Small snags I've run into: * `Array<...>` params aren't supported yet, so we need to use the old `array` syntax for passing weak uploads to `apply`. * `Promise` return types (for `open` and `apply`) aren't supported, and I didn't find examples for how we'd construct one in Rust. That's OK, though; we can just use a callback interface.
Omg, it actually works! 🎉🦀🎉 We now have a cloned Places connection set up in Rust, and exposed to JS via an XPCOM component!
Things that were a bit tricky, for future reference: * Figuring out the right mix of `&` and `*` to convert an `nsString` to an `nsAString`, and vice versa. https://searchfox.org/mozilla-central/rev/dc28b8bddfbb7bbc89de5d0fd6448589aa6a2991/servo/support/gecko/nsstring/src/lib.rs#5-113 is super thorough, and it would be wonderful if it also had an example or two. * Not calling `AsyncClose`, which caused mozStorage to assert on shutdown. https://wompwompwomp.com/ * Is there a way to use the `?` operator together with `nsresult`? (Converting a `Result` into an `nsresult`, and calling an XPCOM method and having it propagate results that aren't `NS_OK`?)
Oops, I accidentally a mozStorage Rust binding. >.> But, it works! We can now initialize a mirror database from Rust. \o/ This doesn't implement a complete mozStorage wrapper, just enough for the mirror to work. Unfortunately, the callback API makes things a *lot* more cumbersome, so I'm considering an entirely different approach. The `Tasks` struct was my way of porting async.js-style `each` to Rust, but it's annoying to use because of all the boxing and explicit type names required everywhere. I copy-pasted a lot of boilerplate to get this working, but I'm not at all happy with the result. There's also a lot of ping-ponging between the main thread and async storage thread for every `db.once` (and `statement.execute`) call, errors get dropped on the floor without any way to handle them (a method can return a `Result`, but also passes one to its callback...and either can fail?), and we lose the ability to use conveniences like `?` because we're now passing those results around. It's the Node callback problem with inversion of control all over again. Boo. (It's also quite likely that I don't know what I'm doing, since I'm still just learning Rust ^_^). Instead, what if we cloned the connection synchronously, then scheduled a runnable and executed all our mirror statements on the async thread? That gives us a more natural way to call mozStorage; we can use results as they're intended, instead of passing them to callbacks. It also lets us handle errors in one place (when we dispatch the runnable back to the main thread and call the mirror callback), and reduces ping-ponging between the main and async threads. CCing Asuth because I think he'll have much more informed opinions about all this. :-)
(In reply to Lina Cambridge (she/her) [:lina] from comment #4) > Instead, what if we cloned the connection synchronously, then scheduled a > runnable and executed all our mirror statements on the async thread? That > gives us a more natural way to call mozStorage; we can use results as > they're intended, instead of passing them to callbacks. It also lets us > handle errors in one place (when we dispatch the runnable back to the main > thread and call the mirror callback), and reduces ping-ponging between the > main and async threads. Sounds like a great idea to me. If your logic or parts of it can run off the main thread and use the synchronous API, that's definitely better all around in terms of memory allocation, latency, etc. One possibility I'd raise is that if you can avoid using a mozStorage async connection and entirely use the sync API on your own thread whose life-cycle you manage, I think that's preferable. If you do have code that needs the async API, then dispatching your logic to the storage async thread words (and I think Places does that already). We may need to fix bug 1121282 to allow ownership transfers, however. Also, very cool to see mozStorage exposed via rust!
(In reply to Andrew Sutherland [:asuth] from comment #5) > Sounds like a great idea to me. If your logic or parts of it can run off > the main thread and use the synchronous API, that's definitely better all > around in terms of memory allocation, latency, etc. Perfect, thanks, Andrew! I just pushed a new revision with synchronous bindings, and the API is much, much cleaner! \o/ I do have some questions about thread safety: we dispatch a runnable to the async thread, store the result in a `Cell`, and re-dispatch back to the main thread to call the callback. So we end up accessing the `Cell` from two different threads, but it's not clear to me how much that matters, since (IIUC) `nsIEventTarget` serializes runnables, and the accesses won't ever be concurrent. (I also discovered that the implementation of `nsIVariant` for `NS_VARIANT_CONTRACTID` is cycle-collected; trying to use that from the async thread was a fun surprise! >.< But it was pretty easy to add a helper for creating a non-CC variant, and then everything worked smoothly). > One possibility I'd raise is that if you can avoid using a mozStorage async > connection and entirely use the sync API on your own thread whose life-cycle > you manage, I think that's preferable. We could definitely do that in the future. For now, we still need to expose the mozStorage connection to JS, so we might as well use the storage thread that we get for free. > Also, very cool to see mozStorage exposed via rust! Thanks! This was super fun to write, thanks to Nika's xpcomrs bindings!
(In reply to Lina Cambridge (she/her) [:lina] from comment #6) > I do have some questions about thread safety: we dispatch a runnable to the > async thread, store the result in a `Cell`, and re-dispatch back to the main > thread to call the callback. So we end up accessing the `Cell` from two > different threads, but it's not clear to me how much that matters, since > (IIUC) `nsIEventTarget` serializes runnables, and the accesses won't ever be > concurrent. When you dispatch a runnable to another thread, it's possible for the dispatched runnable to start executing before the caller resumes executing. So you should be done touching the cell before that happens. Also, this can have the unexpected side-effect that your ref-count can end up hitting zero on the async thread when you'd expect it to hit zero on the main thread you re-dispatched the runnable to instead. > (I also discovered that the implementation of `nsIVariant` for > `NS_VARIANT_CONTRACTID` is cycle-collected; trying to use that from the > async thread was a fun surprise! >.< But it was pretty easy to add a helper > for creating a non-CC variant, and then everything worked smoothly). Yes, this is why mozStorage grew its own variant impl at https://searchfox.org/mozilla-central/source/storage/Variant.h
(In reply to Andrew Sutherland [:asuth] from comment #7) > When you dispatch a runnable to another thread, it's possible for the > dispatched runnable to start executing before the caller resumes executing. > So you should be done touching the cell before that happens. Also, this can > have the unexpected side-effect that your ref-count can end up hitting zero > on the async thread when you'd expect it to hit zero on the main thread you > re-dispatched the runnable to instead. That makes sense, thanks for the great explanation! I think I'm OK, then...`Dispatch` is called at the very end, after updating the cell's contents, both when initially dispatching the runnable to the async thread, and in `Run` when dispatching back to the main thread. So, IIUC, even though we access the cell from multiple threads, those accesses are serialized. > Yes, this is why mozStorage grew its own variant impl at > https://searchfox.org/mozilla-central/source/storage/Variant.h I was wondering about that. :-) Should we expose the mozStorage variant to Rust, instead of using the non-CC `nsVariant`?
Sooo...it works all the way now! 🦀📚✨ We can write new records into the mirror from JS, use Vellum to merge and stash outgoing items in Rust, and inflate records for those items back in JS. Merging and application both happen on the async thread, which is a nice win because merging is currently the most time-consuming part of a bookmark sync, and entirely CPU-bound. I'm really sorry about the massive commit; the bulk of the diff is from vendoring Vellum and its dependencies. 😅 If we decide to go ahead with this, I'll work on splitting the patch up into more digestible chunks. There are some things we'll definitely want to figure out, though: * Error handling. For now, I'm just forwarding `nsresult`s. This is expedient, but doesn't give us a lot of context if something goes wrong. Maybe we can use Failure for richer errors, and pass them up to the `mozISyncedBookmarksMirrorCallback`s... * Logging. Trying to integrate the Rust mirror into the existing Sync logging infrastructure, built around `Log.jsm`, is probably a non-starter. We could emit logging events, or just write log files from Rust directly into `weave/logs`, so that About Sync can find them. * Cloning off the main thread. This one is a bit tricky. `mozISyncedBookmarksMirrorFactory.open` is called from the main thread, so we want to use `AsyncClone` to clone the Places connection...except `AsyncClone` returns a `mozIStorageAsyncConnection`. We could just lift that restriction, or add a `noscript` method to create a synchronous clone. Bug 1121282 would probably solve this, too, if we could clone the connection on the Places async thread, then pass it back to the main thread. * Database corruption recovery. Fairly straightforward, but has the same caveat as OMT cloning...we want to delete the corrupt database on a background thread, to avoid main thread I/O. * Thread safety. The merge runs without asserting now, so I think I got all the dodgy calls...but it'd be good to double-check this, and make sure we're not introducing data races. * Move the storage wrapper into mozStorage?
Assignee: nobody → lina
Status: NEW → ASSIGNED
Oh, and telemetry. We can fire a callback for telemetry events.
(In reply to Lina Cambridge (she/her) [:lina] from comment #8) > > Yes, this is why mozStorage grew its own variant impl at > > https://searchfox.org/mozilla-central/source/storage/Variant.h > > I was wondering about that. :-) Should we expose the mozStorage variant to > Rust, instead of using the non-CC `nsVariant`? That makes sense to me. The mozStorage Variant can't hold VTYPE_INTERFACE/friends which makes the fact that it's not cycle-collected less (or even not? :) dangerous. Also, it's the only variant involved that's NS_DECL_THREADSAFE_ISUPPORTS versus nsVariant's NS_DECL_ISUPPORTS.
(In reply to Lina Cambridge (she/her) [:lina] from comment #9) > Sooo...it works all the way now! 🦀📚✨ Wow! > * Error handling. For now, I'm just forwarding `nsresult`s. This is > expedient, but doesn't give us a lot of context if something goes wrong. > Maybe we can use Failure for richer errors, and pass them up to the > `mozISyncedBookmarksMirrorCallback`s... How do you see richer errors being used, other then ending up in logs? > * Logging. Trying to integrate the Rust mirror into the existing Sync > logging infrastructure, built around `Log.jsm`, is probably a non-starter. > We could emit logging events, or just write log files from Rust directly > into `weave/logs`, so that About Sync can find them. Writing your own log files seems reasonable in the first instance. Doing something better than Log.jsm might make sense in the longer term anyway, and I could certainly imagine this also leveraging rust. > * Database corruption recovery. Fairly straightforward, but has the same > caveat as OMT cloning...we want to delete the corrupt database on a > background thread, to avoid main thread I/O. I'd be inclined to say that given we expect this to be a very rare case, we could probably live with main-thread IO here if it makes life significantly easier. > Oh, and telemetry. We can fire a callback for telemetry events. Yeah, we certainly need this! While it's getting out of date now, https://github.com/mhammond/application-services/tree/telem is possibly still useful - https://github.com/mhammond/application-services/blob/telem/sync15-adapter/src/telemetry.rs is designed to be a little flexible in terms of how telemetry is "submitted" - so maybe that submission process could do the callback thing? Or not! If you think it makes sense, I'd be happy to use this as excuse to push on that a little more (it's kinda stalled until the application-services work gets closer to telemetry being more than a theoretical consideration) I'm still not sure how hard we should push on actually landing this though - I expect some of these issues may form a long-tail, and the existing JS merger is in a kinda "half enabled" state IIUC. I guess the good news is that it seems unlikely this effort will get stale any time soon, so we can kinda play it by ear. Amazing work though!
MozReview-Commit-ID: D4Dgo5ceQ6Q
MozReview-Commit-ID: Lrf61fXzJgJ Depends on D4909
Attached file Bug 1482608 - Vendor Dogear. (obsolete) —
MozReview-Commit-ID: 8UMB6jnZQWS Depends on D4910
Attachment #8999340 - Attachment is obsolete: true
OK...I split out the mozStorage bindings into a separate patch, and moved them to `storage/rust`. They're not complete, but, if we're interested in taking this further, we could push on getting these ready first. It's a fairly standalone chunk of work, and could be useful as we look at moving more of syncing and storage into Rust. Also, the merger is now vendored from crates.io. (In reply to Mark Hammond [:markh] from comment #12) > How do you see richer errors being used, other then ending up in logs? Logging and debugging are the only two, really. Right now, just about everything reports `Error applying mirror to Places: NS_ERROR_FAILURE`, and no backtrace. 😶 > Writing your own log files seems reasonable in the first instance. Doing > something better than Log.jsm might make sense in the longer term anyway, > and I could certainly imagine this also leveraging rust. Cool, that makes sense. We use the `log` crate in Dogear and elsewhere, and I see we have something that looks like a Log.jsm-style formatter for it already in https://searchfox.org/mozilla-central/rev/c3fef66a5b211ea8038c1c132706d02db408093a/testing/geckodriver/src/logging.rs. > I'd be inclined to say that given we expect this to be a very rare case, we > could probably live with main-thread IO here if it makes life significantly > easier. Yeah. "Database disk image malformed" is our most common error now, but that might also be because the database is inaccessible (Firefox profile on an NFS mount?), not corrupt. However, this patch also clones the Places connection synchronously, because connections cloned with `asyncClone` only implement the async interface. Andrew suggested in comment 5 that we could manage our own thread, and interact with the cloned mirror connection entirely from it. That might be easiest, though we still need a way for the JS side to write incoming and read outgoing records. Alternatively, it sounds like, if we fixed bug 1121282 to allow ownership transfers between threads, we could clone the Places connection synchronously on a new thread (or on *its* async thread), replace the mirror database if it's corrupt, then pass the cloned connection to the main thread...which I think is similar to what https://searchfox.org/mozilla-central/rev/c3fef66a5b211ea8038c1c132706d02db408093a/storage/mozStorageConnection.cpp#1493 does now. But I'm out of my depth here; Andrew and Mak, do you have suggestions? 😊 > Yeah, we certainly need this! While it's getting out of date now, > https://github.com/mhammond/application-services/tree/telem is possibly > still useful - > https://github.com/mhammond/application-services/blob/telem/sync15-adapter/ > src/telemetry.rs is designed to be a little flexible in terms of how > telemetry is "submitted" - so maybe that submission process could do the > callback thing? Or not! If you think it makes sense, I'd be happy to use > this as excuse to push on that a little more (it's kinda stalled until the > application-services work gets closer to telemetry being more than a > theoretical consideration) Oh, thanks for the link, I knew you had that branch up somewhere! Since the mirror just emits event telemetry, and it's pretty well contained in a `recordTelemetryEvent` callback today, I think we could do the same for the XPCOM version. We definitely want to think about telemetry collection in our Rust libraries more broadly, though. > I'm still not sure how hard we should push on actually landing this though - > I expect some of these issues may form a long-tail, and the existing JS > merger is in a kinda "half enabled" state IIUC. Yep. 😄 While it would be valuable to have a single implementation (or at least 2 😉) of bookmark sync on all our platforms, our immediate goal is to ship something that works for mobile, so it doesn't seem particularly worthwhile to push much more on this now. As you say, we haven't shipped the JS version, let alone the Rust one, and there are some rabbit holes here. We can still use Desktop's corpus of tests to help us develop a library for mobile, now that we know it _can_ work...and work on landing it later if we think it's useful.
Priority: -- → P3

Whew, time flies! ⏰🦋 A few things changed in the last few months, to where I think it's feasible to push on this:

  • Dogear (the Rust merger) has tests for the merge algorithm, but no integration tests that apply the merged tree to Places.
  • We're planning to use Dogear for bookmark syncing on mobile.
  • We're working on a Rust Places port that mostly uses the same schema, meaning we can also reuse many of Desktop's existing tests.
  • Dogear has changed significantly to fix up diverging structure (https://github.com/linacambridge/dogear/pull/19), and backporting those changes to the JS merger is impractical.
  • Myk added thread-safe task runnables over in bug 1490496. 🎉

The trickiest part of this patch was creating and managing the mirror DB connection from Rust...so what if we kept all that in JS? We can still fetch the local and remote trees, build our merged tree, and populate mergeStates (https://searchfox.org/mozilla-central/rev/c07aaf12f13037c1f5a343d31f8291549e57373f/toolkit/components/places/SyncedBookmarksMirror.jsm#535-604,1342-1356,1374-1387) in Rust (and off the main thread!), then call back into JS to record telemetry, apply the tree, stage items for upload, and so on.

With this approach, the Rust code wouldn't clone the Places DB connection or initialize the schema; it'd just take a handle to the connection that the JS code has already set up.

This would mean two transactions per merge: one in Rust to insert everything into mergeStates, and one in JS to apply. I'm not too worried about that, though...most syncs won't make any changes at all, so we can skip merging entirely; the old bookmarks engine runs several transactions per item; and we still have our totalSyncChanges check to ensure we can safely merge outside of a transaction.

Blocks: 1433177
Depends on: 1490496
Priority: P3 → P2
Summary: Explore shipping a Rust version of the bookmark merger in Desktop → Ship a Rust bookmark merger on Desktop
Attachment #9006171 - Attachment is obsolete: true
Attachment #9006172 - Attachment is obsolete: true
Attachment #9006173 - Attachment is obsolete: true

Depends on D18008

Storing a handle to the storage connection in mozISyncedBookmarkMerger.db leaks strings. 😮 I instrumented nsStringBuffer and nsStringStats to record allocations and releases, and see something like this for a test run:

0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite
0:03.03 pid:35555 LEAKED: /var/folders/v7/bzb2q9rn56d2rykmwdxgs7z80000gn/T/firefox/xpcshellprofile/places.sqlite
0:03.03 pid:35555 LEAKED: places.sqlite

Nulling out merger in SyncedBookmarksMirror#finalize (where we also call this.db.close()) fixes that, so I guess this has to do with the Rust merger keeping a reference to the connection alive after we've closed it from JS?

Priority: P2 → P1

Depends on D18009

Depends on D18764

Depends on: 1527452
Attachment #9039979 - Attachment is obsolete: true
Attachment #9039980 - Attachment is obsolete: true
Attachment #9041646 - Attachment is obsolete: true
Attachment #9041648 - Attachment is obsolete: true

This commit wraps just enough of the mozStorage API to support the
bookmarks mirror. It's not complete: for example, there's no way
to open, clone, or close a connection, because the mirror handles
that from JS. The wrapper also omits shutdown blocking and retrying on
SQLITE_BUSY.

This commit also changes asyncClone to propagate the parent's
mAsyncOnly flag, so that asynchronously cloned sync connections
can still QI to mozIStorageConnection.

Finally, we expose an OpenedConnection::storageConnection getter
in Sqlite.jsm, for JS code to access the underlying connection.

This commit adds ThreadPtr{Handle, Holder} to wrap an XpCom object
with thread-safe refcounting. These are analagous to
nsMainThreadPtr{Handle, Holder}, but can hold references to
objects from any thread, not just the main thread.

ThreadPtrHolder is similar to ThreadBoundRefPtr. However, it's
not possible to clone a ThreadBoundRefPtr, so it can't be shared
among tasks. This is fine for objects that are only used once, like
callbacks. However, ThreadBoundRefPtr doesn't work well for loggers
or event emitters, which might need to be called multiple times on
the owning thread.

Unlike a ThreadBoundRefPtr, it's allowed and expected to
clone and drop a ThreadPtrHolder on other threads. Internally,
the holder keeps an atomic refcount, and releases the wrapped object
on the owning thread once the count reaches zero.

This commit also changes TaskRunnable to support dispatching from
threads other than the main thread.

Depends on D20073

This commit introduces a Rust XPCOM component,
mozISyncedBookmarksMerger, that wraps the Dogear crate for
merging and applying synced bookmarks.

How this works: SyncedBookmarksMirror.jsm manages opening
the connection, initializing the schema, and writing incoming
items into the mirror database. The new mozISyncedBookmarksMerger
holds a handle to the same connection. When JS code calls
mozISyncedBookmarksMerger::apply, the merger builds local and
remote trees, produces a merged tree, applies the tree back to Places,
and stages outgoing items for upload in a temp table, all on the
storage thread. It then calls back in to JS, which inflates Sync
records for outgoing items, notifies Places observers, and cleans up.

Since Dogear has a more robust merging algorithm that attempts to fix
up invalid trees, test_bookmark_kinds.js and
test_bookmark_corruption.js intentionally fail. This is fixed in the
next commit, which changes the merger to handle invalid structure.

Depends on D20075

Depends on D20077

Depends on: 1528589
Blocks: 1530145
Depends on: 1530506
Depends on: 1530438
Attachment #9044443 - Attachment description: Bug 1482608 - Add owning thread pointer holders for Rust code. r?nika!,myk → Bug 1482608 - Add owning thread pointer holders for Rust code. r=nika
Attachment #9044444 - Attachment description: Bug 1482608 - Convert null pointers passed to `xpcom_method`s into `Option`s. r?myk! → Bug 1482608 - Convert null pointers passed to `xpcom_method`s into `Option`s. r=nika
Depends on: 1530467

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:lina, could you have a look please?

Flags: needinfo?(lina)
Flags: needinfo?(lina)
Pushed by kcambridge@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/cfd44c936a9b Add basic Rust bindings for mozStorage. r=nika,asuth,mak https://hg.mozilla.org/integration/autoland/rev/e2042d55b76e Add owning thread pointer holders for Rust code. r=nika,myk https://hg.mozilla.org/integration/autoland/rev/6e5efb9dbc99 Convert null pointers passed to `xpcom_method`s into `Option`s. r=myk,nika https://hg.mozilla.org/integration/autoland/rev/4782957f96a4 Port the synced bookmarks merger to Rust. r=nika,mak,markh,tcsc https://hg.mozilla.org/integration/autoland/rev/e7282f4449c8 Fix up inconsistent bookmarks at sync time. r=markh,tcsc https://hg.mozilla.org/integration/autoland/rev/4a692c812a3f Remove the JS bookmark merger. r=markh
Depends on: 1539104
Depends on: 1539691
Depends on: 1540894
No longer depends on: 1539104
Regressions: 1539104
Regressions: 1542528
No longer depends on: 1539691
Regressions: 1539691

Saw this in "These Weeks in Firefox: Issue 56" - https://blog.nightly.mozilla.org/2019/04/05/these-weeks-in-firefox-issue-56/ and looked for the services.sync.bookmarks.buffer.enabled pref in about:config.

Don't see it in build id 20190412093400 - do I need to add a new boolean pref for it?

Aaaah, sorry, thanks for catching that, and for testing! I just updated the post; the correct pref name is services.sync.engine.bookmarks.buffer.

Depends on: 1546035
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: