Closed Bug 655722 Opened 13 years ago Closed 8 years ago

Building bookmarks _guidMap is extremely inefficient

Categories

(Firefox :: Sync, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox48 --- fixed

People

(Reporter: mak, Assigned: tcsc)

References

Details

(Keywords: perf, Whiteboard: [sync:bookmarks])

Attachments

(2 files)

the lazyMap uses getAllIds, then goes through each id and reads a bunch of data with separate apis from each one. Each call causes a separate query to the database.
Since getAllIDs uses a historyResult to get ids, each node has already 90% of the information needed (more, like the guid, could be added where needed), and this means that using rather a getAllNodes or building the lazymap in getAllIds could reduce queries by 90%.
If the lazyMap is built on startup this would be a decent Ts win.
Summary: building bookmarks _lazyMap is extremely unefficient → building bookmarks _lazyMap is extremely inefficient
(In reply to comment #0)
> the lazyMap uses getAllIds, then goes through each id and reads a bunch of
> data with separate apis from each one. Each call causes a separate query to
> the database.
> Since getAllIDs uses a historyResult to get ids, each node has already 90%
> of the information needed (more, like the guid, could be added where
> needed), and this means that using rather a getAllNodes or building the
> lazymap in getAllIds could reduce queries by 90%.

Oh yes, lazyMap is a clusterfuck. I've been wanting to rewrite that sucker for a while, possibly just using SQL for now until we can get a proper async API. That way we'd just have 1 query for everything, it'd be async and we wouldn't be blocked on extending the bookmark query API to return GUIDs (which is a still good idea.)

> If the lazyMap is built on startup this would be a decent Ts win.

lazyMap is not built on startup. It's built at the beginning of each sync. (Sync does nothing, absolutely nothing, at start up.)
Whiteboard: [Ts]
when does the first Sync happen after startup?
(In reply to comment #1)
> Oh yes, lazyMap is a clusterfuck. I've been wanting to rewrite that sucker
> for a while, possibly just using SQL for now

Then getAllIds would be the hog. I think you can easily merge those 2 paths and just use the getAllIds result.
(In reply to comment #2)
> when does the first Sync happen after startup?

Five seconds after the user becomes idle, starting ten seconds after startup.
(In reply to comment #3)
> (In reply to comment #1)
> > Oh yes, lazyMap is a clusterfuck. I've been wanting to rewrite that sucker
> > for a while, possibly just using SQL for now
> 
> Then getAllIds would be the hog. I think you can easily merge those 2 paths
> and just use the getAllIds result.

Sure, lazyMap and getAllIDs can probably use the same underlying async query! Do note that apart from the one instance in lazyMap, getAllIDs is only called on your very first sync ever.
(In reply to comment #5)
> (In reply to comment #3)
> Sure, lazyMap and getAllIDs can probably use the same underlying async

My point is mostly that they don't run that query twice to get ids, and then to get other stuff. One loop should be enough for everything.

> query! Do note that apart from the one instance in lazyMap, getAllIDs is
> only called on your very first sync ever.

This doesn't satisfy my user with 8 thousands bookmarks :)
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #3)
> > Sure, lazyMap and getAllIDs can probably use the same underlying async
> 
> My point is mostly that they don't run that query twice to get ids, and then
> to get other stuff. One loop should be enough for everything.

Sorry, yes, I didn't mean to contradict that point. If you access lazyMap and then call getAllIDs (or viceversa), you should not have to make additional queries.
I'd like to pretty much eliminate the need to do this, if we can.
Depends on: 814801
OS: Windows 7 → All
Hardware: x86 → All
Summary: building bookmarks _lazyMap is extremely inefficient → Building bookmarks _guidMap is extremely inefficient
Whiteboard: [sync:bookmarks]
via bug id=1008592 comment 16; if there's low-fruit here, I'm in!

(In reply to Marco Bonardo [::mak] from comment #0)
> the lazyMap uses getAllIds, then goes through each id and reads a bunch of
> data with separate apis from each one. Each call causes a separate query to
> the database.
> Since getAllIDs uses a historyResult to get ids, each node has already 90%
> of the information needed (more, like the guid, could be added where
> needed),

Yep, I understand the above - just not how to get to the 100%

> and this means that using rather a getAllNodes or building the
> lazymap in getAllIds could reduce queries by 90%.

But I don't understand this bit - I can't see a getAllNodes and a more efficient way to enumerate isn't obvious to me - can you please elaborate?
Flags: needinfo?(mak77)
Flags: firefox-backlog+
Priority: -- → P2
(In reply to Mark Hammond [:markh] from comment #9)
> Yep, I understand the above - just not how to get to the 100%

From when I filed this bug, we added guids to the nodes, so it's likely we are already there or very close.

> > and this means that using rather a getAllNodes or building the
> > lazymap in getAllIds could reduce queries by 90%.
> 
> But I don't understand this bit - I can't see a getAllNodes and a more
> efficient way to enumerate isn't obvious to me - can you please elaborate?

I was suggesting to create a getAllNodes and have both getAllIds and buildGuidMap use it.

The main point is that the call to getAllIds that happens at the beginning of _buildguidMaps could already build a 99% guidMap, and _buildGUIDMap could just have to go through it and add the tiny amount of missing information.
These calls in _buildGUIDMap would become pointless and could be removed:
* idForGUID
* getItemType
* getBookmarkURI
* getItemTitle
* getItemIndex
* getFolderIdForItem

These calls may still be needed:
* queryId; but it could be fetched only for bookmarks having a place: url, that is very few.
* getItemTitle(parent); but since we walk a hierarchy, we could have it for free!

now, if you multiply an average of 5 calls per 1000 bookmarks, it's 5000 calls (I/O operations) we could avoid, on each sync.
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #10)
> I was suggesting to create a getAllNodes and have both getAllIds and
> buildGuidMap use it.

Mak, I was just talking with Thom about this and I realized I made some assumptions that might not actually be true! This new getAllNodes() function would be inside Places itself (ie, probably in PlacesUtils.bookmarks), and could pull a few tricks (eg, hitting the db directly) to efficiently build a tree, which Sync's bookmarks.js would then leverage. And in that case, it would actually be a promise-based function (ie, likely called promiseAllNodes()). Is that correct?
Flags: needinfo?(mak77)
(In reply to Mark Hammond [:markh] from comment #11)
> Mak, I was just talking with Thom about this and I realized I made some
> assumptions that might not actually be true! This new getAllNodes() function
> would be inside Places itself (ie, probably in PlacesUtils.bookmarks), and
> could pull a few tricks (eg, hitting the db directly) to efficiently build a
> tree, which Sync's bookmarks.js would then leverage.

No, AFAICT what I suggest in comment 10 can be done today in Sync, without any special internal db access and synchronously (at least for now).
the current getAllIds has already access to all the data needed to build the guidmap.
Flags: needinfo?(mak77)
fwiw, if we'd like to go async, we could also use promiseBookmarksTree... but as I said we could do even with the current approach that is using a nsNavHistoryResult, since from what I got it's risky to make Sync asynchronous.
(In reply to Marco Bonardo [::mak] from comment #13)
> since from what I got it's risky to make Sync
> asynchronous.

FTR, making Sync fully async is risky, but still a goal. For this work I think we should be using async functions, but Sync will just "spin" waiting for the promise to resolve. That's a relatively low-risk way of using async places functions without needing to make *everything* async. We already do this for a number of FxA functions related to authentication.
Assignee: nobody → tchiovoloni
Status: NEW → ASSIGNED
Comment on attachment 8738330 [details]
MozReview Request: Bug 655722 - Fix small issues in the PlacesUtils.promiseBookmarksTree documentation string r?mak

https://reviewboard.mozilla.org/r/44457/#review41717
Attachment #8738330 - Flags: review?(mak77) → review+
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

https://reviewboard.mozilla.org/r/44451/#review41719

::: services/sync/modules/engines/bookmarks.js:278
(Diff revision 1)
>    _buildGUIDMap: function _buildGUIDMap() {
>      let guidMap = {};
> -    for (let guid in this._store.getAllIDs()) {
> -      // Figure out with which key to store the mapping.
> -      let key;
> -      let id = this._store.idForGUID(guid);
> +    let tree = Async.promiseSpinningly(PlacesUtils.promiseBookmarksTree("", {
> +      includeItemIds: true
> +    }));
> +    function* walkBookmarksTree(tree, parent=null) {

There is one problem here.

GetAllIds returns only descendants of menu,toolbar,unfiled.
On the other side promiseBookmarksTree returns all descendants of the main root.
That is not just menu,toolbar,unfiled, indeed it also includes tags, the Library left folder, mobile bookmarks and eventually add-on created roots.
We need somehow to filter our subtrees that don't have menu,toolbar,unfiled as ancestor.
Attachment #8738320 - Flags: review?(mak77)
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44451/diff/1-2/
Attachment #8738320 - Flags: review?(mak77)
Comment on attachment 8738330 [details]
MozReview Request: Bug 655722 - Fix small issues in the PlacesUtils.promiseBookmarksTree documentation string r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44457/diff/1-2/
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44451/diff/2-3/
Comment on attachment 8738330 [details]
MozReview Request: Bug 655722 - Fix small issues in the PlacesUtils.promiseBookmarksTree documentation string r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44457/diff/2-3/
Driveby, haven't looked at the patches: don't forget about the mobile root. Make sure you test with an Android device. Do not land this change without having done so.
I just tested syncing between fennec and desktop several times, and there were no issues.
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

https://reviewboard.mozilla.org/r/44451/#review42979

To me it looks good. It would be nice if a Sync peer could also have a look at it, since I may be missing some internal detail

::: services/sync/modules/engines/bookmarks.js:313
(Diff revision 3)
> +      guid = kSpecialIds.specialGUIDForId(id) || guid;
> +      let key;
> +      switch (placeType) {
> +        case PlacesUtils.TYPE_X_MOZ_PLACE:
> +          // Bookmark
> +          let query = node.annos && node.annos.find(({name}) =>

just for a minor gain you may check if node.uri startsWith "place:" before even spending time on the find() call. Minor thing btw.
Attachment #8738320 - Flags: review?(mak77) → review+
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44451/diff/3-4/
Comment on attachment 8738330 [details]
MozReview Request: Bug 655722 - Fix small issues in the PlacesUtils.promiseBookmarksTree documentation string r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44457/diff/3-4/
Attachment #8738320 - Flags: review?(markh)
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

https://reviewboard.mozilla.org/r/44451/#review43217

Awesome - it's nice to see a generator that's not yielding promises for a change :) The code is very elegant.

FWIW, I tested this locally against both my dev and real profiles, by instrumenting the code to write the result of buildGUIDMap to JSON and the results are identical - which is nice :)

(I guess there's also a risk that in pathological cases we may die due to hitting a max recursion limit, but that should be high enough that I'd expect we should treat that more as bookmark corruption than as a bug here - corruption is the only way I could see us hitting this in practice).

::: services/common/async.js:216
(Diff revision 4)
> +  promiseSpinningly(promise) {
> +    let cb = Async.makeSpinningCallback();
> +    promise.then(result =>  {
> +      cb(null, result);
> +    }, err => {
> +      cb(err);

I know I pasted you this code, but I now recall some obscure cases where a promise was rejected with undefined (ie, a simple "reject()" with no args) which will end up with this treated as success - so I think we should change this line to |cb(err || "promise rejected without an explicit error");| or similar - it's probably something this change could never hit, but future callers of this may.

::: services/sync/modules/engines/bookmarks.js:295
(Diff revision 4)
> +    }
> +
> +    function* walkBookmarksRoots(tree, rootGUIDs) {
> +      for (let guid of rootGUIDs) {
> +        let id = kSpecialIds.specialIdForGUID(guid, false);
> +        let bookmarkRoot = tree.children.find(child => child.id === id);

I wonder if it is worth checking |id| here (ie, something like |let bookmarkRoot = id !== null ? tree.children.find(child => child.id === id) : null;|

just to make it clear that id is expected to be null in the special case of the mobile root? I understand that the end result should be the same (assuming there's no child with an ID of null), but it seems to make the intent a little clearer.

I'll leave this as your call though as I can't think of a case where how it is written would actually fail and it does make that expression slightly uglier.
Attachment #8738320 - Flags: review?(markh) → review+
Comment on attachment 8738320 [details]
MozReview Request: Bug 655722 - Rewrite _buildGUIDMap in the sync bookmark engine to use PlacesUtils.promiseBookmarksTree r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44451/diff/4-5/
Comment on attachment 8738330 [details]
MozReview Request: Bug 655722 - Fix small issues in the PlacesUtils.promiseBookmarksTree documentation string r?mak

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/44457/diff/4-5/
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/bdc6a9d57ec4
https://hg.mozilla.org/mozilla-central/rev/63f11876b0b8
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
See Also: → 1323333
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.