Closed Bug 1344282 Opened 7 years ago Closed 7 years ago

Add an insertTree API to Bookmarks.jsm to allow batch-inserting a set of bookmarks

Categories

(Toolkit :: Places, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox53 + verified
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(2 files)

Part of my hypothesis for the slowness in importing bookmarks in bug 1332225 et al. is that we insert items 1 at a time. In order to do better, we need a batch-insertion API (which doesn't exist right now). This bug is about creating such an API.
Note to self: one puzzling thing here is that there's the following sequence of events for single inserts, and therefore now also for my WIP insertTree stuff:

1) we insert the item
2) we fetch the item id (from callsite: https://dxr.mozilla.org/mozilla-central/rev/77d5a39a4677ed8e32a7ed46561c962d807fa7b1/toolkit/components/places/Bookmarks.jsm#201) here:

https://dxr.mozilla.org/mozilla-central/rev/77d5a39a4677ed8e32a7ed46561c962d807fa7b1/toolkit/components/places/PlacesUtils.jsm#2457-2475

which calls updateCache.
3) then we notify here:

https://dxr.mozilla.org/mozilla-central/rev/77d5a39a4677ed8e32a7ed46561c962d807fa7b1/toolkit/components/places/Bookmarks.jsm#206-210

which calls updateCache a *second* time, for the same item, from here:

https://dxr.mozilla.org/mozilla-central/rev/77d5a39a4677ed8e32a7ed46561c962d807fa7b1/toolkit/components/places/PlacesUtils.jsm#2530-2534

If there's other onItemAdded callers that don't go through promiseItemId, I guess we need to keep this... but it's a bit odd.
Component: Bookmarks & History → Places
Product: Firefox → Toolkit
Blocks: 1344759
(In reply to :Gijs from comment #1)
> which calls updateCache a *second* time, for the same item, from here:

The reason is that it's a cheap cache update, vs maybe having to I/O roundtrip a few minutes later.
Ideally none of this would be needed, and clearly we have a project about it whose resources were pulled: bug 1071511.
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121524

didn't look at the test yet, but wanted to publish the review earlier.

::: toolkit/components/places/Bookmarks.jsm:229
(Diff revision 3)
>        delete item.source;
>        return Object.assign({}, item);
>      }.bind(this));
>    },
>  
> + insertTree: Task.async(function*(infos, parentGuid) {

Would it be problematic to make this take a real tree, and not an array?

{
  guid: // the parent Guid
  children: [
    // your current infos contents
  ]
  
I'm mostly worried about 2 things:
1. it's named insertTree, but it inserts an array
2. insert() doesn't take a parentGuid argument, it's instead a property of the input bookmark object

My fear is that it may become confusing to remember these 2 different behaviors, and one may have everytime to go back looking at the documentation.

On the other side, I admit having to create an incomplete "root" bookmark object with just the guid is not so clean as well.

Any/Other ideas?

::: toolkit/components/places/Bookmarks.jsm:229
(Diff revision 3)
>        delete item.source;
>        return Object.assign({}, item);
>      }.bind(this));
>    },
>  
> + insertTree: Task.async(function*(infos, parentGuid) {

Needs a javadoc, see the other methods.
This API atm is only documented here, better to make it clear. it will also be easier to import the javadoc to mdn.

::: toolkit/components/places/Bookmarks.jsm:234
(Diff revision 3)
> + insertTree: Task.async(function*(infos, parentGuid) {
> +    if (parentGuid == this.rootGuid) {
> +      throw new Error("Can't insert into the root.");
> +    }
> +
> +    let parent = yield fetchBookmark({ guid: parentGuid });

The APIs in Bookmarks.jsm are expected to throw synchronously for invalid input, because if one forgets to .catch() or yield and puts wrong input, it should still be able to easily notice the problem.

That means validation should happen before any yielding (included checking if the parent is valid).

::: toolkit/components/places/Bookmarks.jsm:248
(Diff revision 3)
> +
> +    if (!Array.isArray(infos) || !infos.length) {
> +      throw new Error("infos should be an array with more than 1 item.");
> +    }
> +
> +    // First get a supply of guids:

Please extend the comment, it's unclear why you need those and then immediately you start counting input, the 2 things look barely related.
The comment should state you need guids for insertions because... and thus need to know how many guids to generate.

Btw, with my suggestion on exposing a makeGuid API this double looping becomes pointless, just generate the gui in the main loop building the array.

::: toolkit/components/places/Bookmarks.jsm:272
(Diff revision 3)
> +    let urlsThatMightNeedPlaces = new Set();
> +    // Set at the start of the loop to ensure we use the same time for all
> +    // items.
> +    let fallbackLastAdded = new Date();
> +
> +    let source = null;

comment here about how this will be used?

::: toolkit/components/places/Bookmarks.jsm:278
(Diff revision 3)
> +
> +    const {TYPE_BOOKMARK, TYPE_FOLDER, SOURCES} = this;
> +
> +    // We'll need to generate parent ids after we finish validating all the
> +    // objects. To do this, assign temporary IDs based on this array, and
> +    // then assign the real parentGuid afterwards.

Maybe (likely) I'm missing something, when visiting the tree you should do a Depth-first search, thus you will visit a parent before its children, and you will add the parent to the array before its children.
Thus sounds like you may just generate a guid for the parent, and use it for its children, recursively?

Or the comment needs to be clearer.

::: toolkit/components/places/Bookmarks.jsm:280
(Diff revision 3)
> +
> +    // We'll need to generate parent ids after we finish validating all the
> +    // objects. To do this, assign temporary IDs based on this array, and
> +    // then assign the real parentGuid afterwards.
> +    function appendInsertionInfoForInfoArray(infos, indexToUse, parentGuid) {
> +      let lastAddedForParent = new Date(0);

comment please (off-hand looks like it's used to ensure children have a last-added >= than their parent?)

::: toolkit/components/places/Bookmarks.jsm:282
(Diff revision 3)
> +    // objects. To do this, assign temporary IDs based on this array, and
> +    // then assign the real parentGuid afterwards.
> +    function appendInsertionInfoForInfoArray(infos, indexToUse, parentGuid) {
> +      let lastAddedForParent = new Date(0);
> +      for (let info of infos) {
> +        // Add a guid if none is present:

nit: some comments end with ":", some with ".", I don't see a coherence.

::: toolkit/components/places/Bookmarks.jsm:290
(Diff revision 3)
> +        }
> +        // Set the correct parent guid:
> +        info.parentGuid = parentGuid;
> +        // Ensure to use the same date for dateAdded and lastModified, even if
> +        // dateAdded may be imposed by the caller.
> +        let time = (info && info.dateAdded) || fallbackLastAdded;

why? couldn't you just not pass a lastModified?
The validator should already set lastModified to dateAdded, so in the worst case you should only set dateAdded first.

::: toolkit/components/places/Bookmarks.jsm:294
(Diff revision 3)
> +        // dateAdded may be imposed by the caller.
> +        let time = (info && info.dateAdded) || fallbackLastAdded;
> +        // Enforce sources are the same for all bookmarks:
> +        let sourceValidator = {defaultValue: SOURCES.DEFAULT };
> +        if (source !== null) {
> +          sourceValidator = {defaultValue: source, validIf: b => b.source == source};

Alternatively, if we move to a tree-like input, the source could just be enforced by the main root, so:

{
  guid: your_parentGuid,
  source: your_source,
  children: []
}

Similar to parentGuid of single nodes, the global source would override any other, that would totally be ignored. If the global source is omitted, it will be DEFAULT.
Then just do info.source = main_source_value

::: toolkit/components/places/Bookmarks.jsm:310
(Diff revision 3)
> +                       , validIf: b => !b.lastModified ||
> +                                        b.dateAdded <= b.lastModified }
> +          , lastModified: { defaultValue: time,
> +                            validIf: b => (!b.dateAdded && b.lastModified >= time) ||
> +                                          (b.dateAdded && b.lastModified >= b.dateAdded) }
> +          , index: { forbidden: true }

it may be simpler for consumers, if we'd just remove/ignore the index property.
While I'm usually very happy to throw early and often, it's possible the bookmark objects entering this method come from other fetch methods, and they will have an index.
we should just make sure to tell in the javadoc which properties of each child will be ignored (parentGuid, index, source), since they will be extrapolated from the tree. I want these APIs to be safe to use, but not too annoying to combine.

In the next comment I'm going to suggest a "replaceWith:" annotation, instead of "forbidden".

::: toolkit/components/places/Bookmarks.jsm:323
(Diff revision 3)
> +        // entry in moz_places for it.
> +        if (insertInfo.type == Bookmarks.TYPE_BOOKMARK) {
> +          urlsThatMightNeedPlaces.add(insertInfo.url.href);
> +        }
> +        // Set an index and add this item to the list of items to insert.
> +        insertInfo.index = indexToUse++;

Probably you could combine this into the defaultValue: of validateBookmarkObject
Something like "index: { replaceWith: indexToUse++ }"?

::: toolkit/components/places/Bookmarks.jsm:326
(Diff revision 3)
> +        }
> +        // Set an index and add this item to the list of items to insert.
> +        insertInfo.index = indexToUse++;
> +        insertInfos.push(insertInfo);
> +        // Process any children.
> +        if (info.children) {

from the point where you generated insertInfo, you should use that, since it's validated and fixed (dates could have been fixed for example)

::: toolkit/components/places/Bookmarks.jsm:337
(Diff revision 3)
> +          if (childrenLastAdded > lastAddedForParent) {
> +            lastAddedForParent = childrenLastAdded;
> +          }
> +        }
> +        // Clear off children property as it's no longer needed.
> +        delete info.children;

You can do this on insertInfo, since it's a copy, but please don't modify the caller input, the calling point may want to reuse it and we'd create subtle bugs.

::: toolkit/components/places/Bookmarks.jsm:341
(Diff revision 3)
> +        // Clear off children property as it's no longer needed.
> +        delete info.children;
> +
> +        // Ensure we track what time to update the parent to.
> +        if (info.dateAdded > lastAddedForParent) {
> +          lastAddedForParent = info.dateAdded;

insertInfo

::: toolkit/components/places/Bookmarks.jsm:351
(Diff revision 3)
> +    let lastAddedForParent = appendInsertionInfoForInfoArray(infos, parent._childCount, parentGuid);
> +
> +    let urlObjectsToCreate = [];
> +    for (let u of urlsThatMightNeedPlaces) {
> +      urlObjectsToCreate.push(new URL(u));
> +    }

Why do you store the href in urlsThatMightNeedPlaces, and then here you wrap it back into URL and store them in urlObjectsToCreate? Couldn't you just store URL objects in urlsThatMightNeedPlaces?

Alternatively, since now you have an array of insertInfos, wouldn't it be simple to just use .filter.map to extract the urls in insertBookmarkTree without all this bookkeeping?

::: toolkit/components/places/Bookmarks.jsm:1222
(Diff revision 3)
> +      yield maybeInsertManyPlaces(db, urls);
> +
> +      let syncChangeDelta =
> +        PlacesSyncUtils.bookmarks.determineSyncChangeDelta(source);
> +      let syncStatus =
> +        PlacesSyncUtils.bookmarks.determineInitialSyncStatus(source);

Would be nice to get feedback from :kit about the sync delta part, just in case.

::: toolkit/components/places/Bookmarks.jsm:1244
(Diff revision 3)
> +      yield setAncestorsLastModified(db, parent.guid, lastAddedForParent,
> +                                     syncChangeDelta);
> +    });
> +
> +    // We don't wait for the frecency calculation.
> +    updateFrecency(db, urls).then(null, Cu.reportError);

sounds like .catch

::: toolkit/components/places/Bookmarks.jsm:1729
(Diff revision 3)
>   *        the array of URLs to update.
>   */
>  var updateFrecency = Task.async(function* (db, urls) {
> +  // FIXME avoid individual frecency updates and send ManyFrecencies instead.
> +  // Also note this should be using parameters, and then should use batches.
> +  //

I'm not a big fan of FIXME comments, either fix it, or file a bug and here comment as
// TODO (Bug 123456): do this and that

::: toolkit/components/places/Bookmarks.jsm:1739
(Diff revision 3)
>         CALCULATE_FRECENCY(id), url, guid, hidden, last_visit_date
>       ) WHERE url_hash IN ( ${urls.map(url => `hash("${url.href}")`).join(", ")} )
>      `);
>  
> +  // FIXME why do these 2 statements use different things to stringify the href, and which is faster?
> +  // FIXME can we reuse the 'in' part? Looks like it.

They were probably written at different times, I assume the manual " insertion is faster than stringify.
Again, please remove FIXME comments before next review

::: toolkit/components/places/Bookmarks.jsm:1955
(Diff revision 3)
> +let maybeInsertManyPlaces = Task.async(function* _maybeInsertManyPlaces(db, urls) {
> +  let guids = yield makeGuids(db, urls.length);
> +  yield db.executeCached(
> +    `INSERT OR IGNORE INTO moz_places (url, url_hash, rev_host, hidden, frecency, guid) VALUES
> +     (:url, hash(:url), :rev_host, 0, :frecency,
> +     IFNULL((SELECT guid FROM moz_places WHERE url_hash = hash(:url) AND url = :url), :maybeguid))`,

Sounds like this could just use GENERATE_GUID() in the query as the second IFNULL argument.

::: toolkit/components/places/Bookmarks.jsm:2056
(Diff revision 3)
>        start_index: startIndex,
>        item_type: Bookmarks.TYPE_SEPARATOR
>      });
>  }
> +
> +let makeGuids = Task.async(function* (db, numberOfGuids) {

Just expose a makeGuid API from nsINavHistory, you can just reuse GenerateGUID() from Helpers.cpp

I've no problem about that, it's backwards compatible for uplifts, and it's synchronous. Moreover you won't need anymore to chunk and batch, just call it in a loop.
Attachment #8844996 - Flags: review?(mak77)
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121614

::: toolkit/components/places/Bookmarks.jsm:234
(Diff revision 3)
> + insertTree: Task.async(function*(infos, parentGuid) {
> +    if (parentGuid == this.rootGuid) {
> +      throw new Error("Can't insert into the root.");
> +    }
> +
> +    let parent = yield fetchBookmark({ guid: parentGuid });

I don't think a sync way of checking a guid exists is available right now. `insert()` does a similar async check for whether the parent exists.

::: toolkit/components/places/Bookmarks.jsm:290
(Diff revision 3)
> +        }
> +        // Set the correct parent guid:
> +        info.parentGuid = parentGuid;
> +        // Ensure to use the same date for dateAdded and lastModified, even if
> +        // dateAdded may be imposed by the caller.
> +        let time = (info && info.dateAdded) || fallbackLastAdded;

I think I'm just echoing the logic in `insert()` that updates the parent of where you insert based on the `dateAdded`.

::: toolkit/components/places/Bookmarks.jsm:351
(Diff revision 3)
> +    let lastAddedForParent = appendInsertionInfoForInfoArray(infos, parent._childCount, parentGuid);
> +
> +    let urlObjectsToCreate = [];
> +    for (let u of urlsThatMightNeedPlaces) {
> +      urlObjectsToCreate.push(new URL(u));
> +    }

Because a Set does not know about URL identities, and just uses object identities. So after:

```js
let urls = new Set();
urls.add(new URL("http://google.com/"));
urls.add(new URL("http://google.com/"));
```

`urls.size` will be 2, not 1. The "point" is that we do de-duplicate the collection.

::: toolkit/components/places/Bookmarks.jsm:1244
(Diff revision 3)
> +      yield setAncestorsLastModified(db, parent.guid, lastAddedForParent,
> +                                     syncChangeDelta);
> +    });
> +
> +    // We don't wait for the frecency calculation.
> +    updateFrecency(db, urls).then(null, Cu.reportError);

Yes, though this too was copied from `insert()`...

::: toolkit/components/places/Bookmarks.jsm:1955
(Diff revision 3)
> +let maybeInsertManyPlaces = Task.async(function* _maybeInsertManyPlaces(db, urls) {
> +  let guids = yield makeGuids(db, urls.length);
> +  yield db.executeCached(
> +    `INSERT OR IGNORE INTO moz_places (url, url_hash, rev_host, hidden, frecency, guid) VALUES
> +     (:url, hash(:url), :rev_host, 0, :frecency,
> +     IFNULL((SELECT guid FROM moz_places WHERE url_hash = hash(:url) AND url = :url), :maybeguid))`,

It could, but AIUI that calls back into our C++ implementation from inside sqlite with custom functions... My assumption is that pre-filling the guids will be (significantly) faster.
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121524

> Maybe (likely) I'm missing something, when visiting the tree you should do a Depth-first search, thus you will visit a parent before its children, and you will add the parent to the array before its children.
> Thus sounds like you may just generate a guid for the parent, and use it for its children, recursively?
> 
> Or the comment needs to be clearer.

Yeah, the comment is just wrong, I forgot to fix it after a refactor.

> why? couldn't you just not pass a lastModified?
> The validator should already set lastModified to dateAdded, so in the worst case you should only set dateAdded first.

This comment is also copied from the insert() path... I'm happy to remove it both there and here if you prefer? I've left it for now.
(In reply to Marco Bonardo [::mak] from comment #6)
> ::: toolkit/components/places/Bookmarks.jsm:234
> (Diff revision 3)
> > + insertTree: Task.async(function*(infos, parentGuid) {
> > +    if (parentGuid == this.rootGuid) {
> > +      throw new Error("Can't insert into the root.");
> > +    }
> > +
> > +    let parent = yield fetchBookmark({ guid: parentGuid });
> 
> The APIs in Bookmarks.jsm are expected to throw synchronously for invalid
> input, because if one forgets to .catch() or yield and puts wrong input, it
> should still be able to easily notice the problem.
> 
> That means validation should happen before any yielding (included checking
> if the parent is valid).

This is slightly awkward because as noted, some things require the parent (and its child count, to determine the indices. I've worked around this in the upcoming patch, but up to you if you want to leave it like this.

TBH, I'm not convinced that just throwing for 'most' cases (and still reject for some) rather than rejecting all cases is really useful, but you're the module owner. :-)
Blocks: 1320534
(In reply to :Gijs from comment #9)
> This is slightly awkward because as noted, some things require the parent
> (and its child count, to determine the indices. I've worked around this in
> the upcoming patch, but up to you if you want to leave it like this.

It requires a parent guid value, but it doesn't require that guid to be valid.

> TBH, I'm not convinced that just throwing for 'most' cases (and still reject
> for some) rather than rejecting all cases is really useful, but you're the
> module owner. :-)

It helped finding bugs already. Note that now that we are moving most code to promises, the likely of forgetting to handle a promise is increasing exponentially. Forgetting to handle a reject in some cases where you are passing invalid input to an API may cause you to lose a long time trying to figure out what happens, when a direct exception could tell you immediately instead. This was a deliberate design decision for this new API, and so far I think it made sense.

(In reply to :Gijs from comment #7)
> It could, but AIUI that calls back into our C++ implementation from inside
> sqlite with custom functions... My assumption is that pre-filling the guids
> will be (significantly) faster.

It's unlikely, the cpp code is very optimized and it's basically just a function call. while generating guids on the JS side goes through XPCOM, and it's on the main-thread.
Comment on attachment 8846887 [details]
Bug 1344282 - prep: add a makeGuid method to nsINavHistory,

https://reviewboard.mozilla.org/r/119876/#review121948

::: toolkit/components/places/nsINavHistoryService.idl:1449
(Diff revision 1)
>     * Clear all TRANSITION_EMBED visits.
>     */
>    void clearEmbedVisits();
> +
> +  /**
> +   * Generate a guid.

Please specify that the value can be used for any Places purposes, including history and bookmarks.

Also specify that it will return null in case of errors.

::: toolkit/components/places/nsNavHistory.cpp:3729
(Diff revision 1)
>  }
>  
> +NS_IMETHODIMP
> +nsNavHistory::MakeGuid(nsACString& aGuid) {
> +  nsAutoCString guid;
> +  GenerateGUID(guid);

potentially this can fail, it's unlikely but it can happen.

Let's handle a failure through a MOZ_ASSERT (so we catch eventual failures in debug tests) and aGuid.setIsVoid(true).

Note: it's possible that changing the GenerateGUID signature from nsCString& to nsACString& you can avoid the temp autostring and directly pass aGuid down to it.
Attachment #8846887 - Flags: review?(mak77)
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121956

We're almost there!

::: toolkit/components/places/Bookmarks.jsm:234
(Diff revision 4)
> +
> +  /**
> +   * Inserts a bookmark-tree into the existing bookmarks tree.
> +   *
> +   * All specified bookmarks will be inserted - there's no merge/update
> +   * facilitation at this time.

This could be clarified, imo, as:
"All the specified folders and bookmarks will be inserted as new, even if duplicates. There's no merge support at this time."

::: toolkit/components/places/Bookmarks.jsm:248
(Diff revision 4)
> +   * }
> +   *
> +   * Children will be appended to any existing children of the parent
> +   * that is specified. The source specified on the root of the tree
> +   * will be used for all the items inserted. Any indices or custom guids
> +   * set on children will be ignored and overwritten.

I have some doubts regarding the guid, if it's a valid guid I don't see a reason to overwrite it. And doesn't looks like you do. Preserving passed-in guids may be critical for reusing this with import/export code.
I suppose you meant that parentGuid will be ignored and overwritten, not _any_ guid.

::: toolkit/components/places/Bookmarks.jsm:260
(Diff revision 4)
> +   * @rejects if it's not possible to create the requested bookmark.
> +   * @throws if the arguments are invalid.
> +   */
> +  insertTree(tree) {
> +    if (!tree) {
> +      throw new Error("Should be provided a tree.");

Should also check that tree is an object, and the comment should be changed to "a valid tree object"

::: toolkit/components/places/Bookmarks.jsm:301
(Diff revision 4)
> +    // Reuse the 'source' property for all the entries.
> +    let source = tree.source || SOURCES.DEFAULT;
> +
> +    // We'll validate the supplied tree depth-first, and assign guids on
> +    // a FIFO basis, and then reuse those guids for the children of any
> +    // folders that get inserted this way.

I honestly find the last part more confusing than the code. The code is doing what any sane person would expect, use a folder guid for its children, I don't think there's need for a comment trying to explain that.

::: toolkit/components/places/Bookmarks.jsm:345
(Diff revision 4)
> +        }
> +        insertInfos.push(insertInfo);
> +        // Process any children. We have to use info.children here rather than
> +        // insertInfo.children because validateBookmarkObject doesn't copy over
> +        // the children ref, as the default bookmark validators object doesn't
> +        // know about children.

I actually thought adding a children validator to the object passed to validateBookmarkObject was also copying it to the parsed object. But likely it doesn't because it's not part of BOOKMARK_VALIDATORS.
Maybe we should consider it a bug. Not a big deal if you don't want to lose time with it, the workaround will do for now.

::: toolkit/components/places/Bookmarks.jsm:364
(Diff revision 4)
> +          lastAddedForParent = insertInfo.dateAdded;
> +        }
> +      }
> +      return lastAddedForParent;
> +    }
> +    // Unfortunately, we want to validate synchronously, but we can't know the

I'd remove the "unfortunately", sounds like it's not done on purpose (Yes, I know you don't like this!).

::: toolkit/components/places/Bookmarks.jsm:367
(Diff revision 4)
> +      return lastAddedForParent;
> +    }
> +    // Unfortunately, we want to validate synchronously, but we can't know the
> +    // index at which we're inserting into the parent synchronously. As
> +    // a result, we have to update the index properties later for items
> +    // inserted into the root We just use 0 here, and increment with the child

missing a period?

::: toolkit/components/places/Bookmarks.jsm:373
(Diff revision 4)
> +    // count later.
> +    let lastAddedForParent = appendInsertionInfoForInfoArray(tree.children, 0, tree.guid);
> +
> +    let urlObjectsToCreate = [];
> +    for (let u of urlsThatMightNeedPlaces) {
> +      urlObjectsToCreate.push(new URL(u));

The only reason to bring a URL object around seems to be invoking updateFrecency.
But honestly, also in spite of the deduplication thing you said, we should make updateFrecency take an array of hrefs.
Then here you can just keep your Set of hrefs and we save some minor cpu time.
The changes to do that should be trivial.

::: toolkit/components/places/Bookmarks.jsm:391
(Diff revision 4)
> +      // Now update the indices of root items
> +      for (let insertInfo of insertInfos) {
> +        if (insertInfo.parentGuid == tree.guid) {
> +          insertInfo.index += parent._childCount;
> +        }
> +      }

You know what... I just figured out all the root indices handling we may do is unsafe.
Suppose that you are doing this giant insert, and something else in the meanwhile inserts a bookmark exactly in your "root" folder. You are duplicating an index now since you pre-generated them.
For roots we may need special handling so that the index we set is surely "good", that means for the root items we may have to pass NULL as index, and use an IFNULL(:passed_in_index, SUBQUERY_fetching_max_position) to generate a valid insertion index.

::: toolkit/components/places/Bookmarks.jsm:395
(Diff revision 4)
> +        }
> +      }
> +      yield insertBookmarkTree(insertInfos, source, parent, urlObjectsToCreate, lastAddedForParent);
> +      // We need the itemIds to notify, though once the switch to guids is
> +      // complete we may stop using them.
> +      let itemIdMap = yield PlacesUtils.promiseManyItemIds(insertInfos.map(info => info.guid));

I have an half idea on how we could improve this, removing any needed I/O. May be worth a follow-up if you measure these SELECT queries as being expensive for your batch insert.
We have AFTER_INSERT and AFTER_DELETE triggers in moz_bookmarks, they currently update a static int containing the last inserted id, but that's accessible only from cpp.
I was wondering about the possibility to extend that, keeping a TEMP TABLE associating guids to ids. Basically i'm thinking to replace the guids Map we have in PlacesUtils with a TEMP TABLE. The triggers could keep the TEMP TABLE updated, querying from it would not cause any I/O.
I'd not do this here, but it could be worth a try elsewhere.

::: toolkit/components/places/Bookmarks.jsm:1255
(Diff revision 4)
>    }));
>  }
>  
> +function insertBookmarkTree(items, source, parent, urls, lastAddedForParent) {
> +  return PlacesUtils.withConnectionWrapper("Bookmarks.jsm: insertBookmarkTree",
> +    Task.async(function*(db) {

I'm fine if you want to leave this on the previous line, even if we go over 80 chars. I don't care that much about going over that limit when it makes sense.

::: toolkit/components/places/Bookmarks.jsm:1266
(Diff revision 4)
> +        PlacesSyncUtils.bookmarks.determineSyncChangeDelta(source);
> +      let syncStatus =
> +        PlacesSyncUtils.bookmarks.determineInitialSyncStatus(source);
> +
> +      items = items.map(item => ({
> +        url: item.hasOwnProperty("url") ? item.url.href : "nonexistent",

why not just using null here, instead of a fake string?
Then in the query you could try using a select case 

(CASE WHEN :url ISNULL THEN NULL ELSE (SELECT ...) END)

This would also prevent a pointless index lookup.

::: toolkit/components/places/PlacesUtils.jsm:2495
(Diff revision 4)
> +      yield PlacesUtils.withConnectionWrapper("GuidHelper.getItemId",
> +                                              Task.async(function* (db) {
> +        while (uncachedGuids.length) {
> +          let chunk = uncachedGuids.splice(0, 100);
> +          let rows = yield db.executeCached(
> +            `SELECT b.id, b.guid from moz_bookmarks b WHERE 

trailing space

::: toolkit/components/places/tests/bookmarks/test_bookmarks_insertTree.js:2
(Diff revision 4)
> +add_task(function* invalid_input_rejects() {
> +  yield Assert.throws(() => PlacesUtils.bookmarks.insertTree(), /Should be provided a tree./);

test passing a string and null

::: toolkit/components/places/tests/bookmarks/test_bookmarks_insertTree.js:6
(Diff revision 4)
> +add_task(function* invalid_input_rejects() {
> +  yield Assert.throws(() => PlacesUtils.bookmarks.insertTree(), /Should be provided a tree./);
> +  yield Assert.throws(() => PlacesUtils.bookmarks.insertTree({guid: PlacesUtils.bookmarks.unfiledGuid, children: []}),
> +                /Should have a non-zero number of children to insert./);
> +  yield Assert.throws(() => PlacesUtils.bookmarks.insertTree({guid: PlacesUtils.bookmarks.unfiledGuid}),
> +                /Should have a non-zero number of children to insert./);

nit: please indent properly (applies to multiple lines)
Attachment #8844996 - Flags: review?(mak77)
Comment on attachment 8846887 [details]
Bug 1344282 - prep: add a makeGuid method to nsINavHistory,

https://reviewboard.mozilla.org/r/119876/#review122070
Attachment #8846887 - Flags: review?(mak77) → review+
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review122084

::: toolkit/components/places/Bookmarks.jsm:280
(Diff revision 5)
> +    if (tree.hasOwnProperty("source") &&
> +        !Object.values(this.SOURCES).includes(tree.source)) {
> +      throw new Error("Can't use source value " + tree.source);
> +    }
> +
> +    if (!PlacesUtils.isValidGuid(tree.guid)) {

just for coherency, it may be better to check this before comparing guid with rootGuid and tagsGuid.

::: toolkit/components/places/Bookmarks.jsm:394
(Diff revision 5)
> +      }
> +
> +      // Now update the indices of root items
> +      for (let insertInfo of insertInfos) {
> +        if (insertInfo.parentGuid == tree.guid) {
> +          insertInfo.index += parent._childCount;

this is no more needed since now the index is null, right?

::: toolkit/components/places/Bookmarks.jsm:1773
(Diff revision 5)
>   * Updates frecency for a list of URLs.
>   *
>   * @param db
>   *        the Sqlite.jsm connection handle.
>   * @param urls
> - *        the array of URLs to update.
> + *        the array of URL hrefs (strings!) to update.

nope, it's still URL objects (could be that you edited this, and then later discovered the reversed host problem and undid the change)

::: toolkit/components/places/tests/bookmarks/test_bookmarks_insertTree.js:316
(Diff revision 5)
> +    }
> +    Assert.equal(bm.parentGuid, PlacesUtils.bookmarks.unfiledGuid);
> +  }
> +  Assert.equal(obsInvoked, bms.length);
> +  Assert.equal(obsInvoked, 6);
> +});

afaict, none of the tests is veryfing that adding multiple entries both in the root and in its children creates bookmarks with contiguous index values (especially for the root case)
I only see a couple checks for a single insert.
It would be nice to have a test for that.
Attachment #8844996 - Flags: review?(mak77) → review+
Depends on: 1347230
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121956

> I have an half idea on how we could improve this, removing any needed I/O. May be worth a follow-up if you measure these SELECT queries as being expensive for your batch insert.
> We have AFTER_INSERT and AFTER_DELETE triggers in moz_bookmarks, they currently update a static int containing the last inserted id, but that's accessible only from cpp.
> I was wondering about the possibility to extend that, keeping a TEMP TABLE associating guids to ids. Basically i'm thinking to replace the guids Map we have in PlacesUtils with a TEMP TABLE. The triggers could keep the TEMP TABLE updated, querying from it would not cause any I/O.
> I'd not do this here, but it could be worth a try elsewhere.

Let's file a followup for this.

> why not just using null here, instead of a fake string?
> Then in the query you could try using a select case 
> 
> (CASE WHEN :url ISNULL THEN NULL ELSE (SELECT ...) END)
> 
> This would also prevent a pointless index lookup.

Note that this was copied, so I should probably file a followup for this.
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121524

> Would be nice to get feedback from :kit about the sync delta part, just in case.

Checked with Kit on IRC, he said it looked OK. :-)
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

https://reviewboard.mozilla.org/r/118240/#review121956

> I actually thought adding a children validator to the object passed to validateBookmarkObject was also copying it to the parsed object. But likely it doesn't because it's not part of BOOKMARK_VALIDATORS.
> Maybe we should consider it a bug. Not a big deal if you don't want to lose time with it, the workaround will do for now.

Do you want me to file a followup for this?
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6d4a03c0a481
prep: add a makeGuid method to nsINavHistory, r=mak
https://hg.mozilla.org/integration/autoland/rev/1be875eb29ae
add an insertTree API to Bookmarks.jsm, r=mak
Depends on: 1347281
(In reply to :Gijs from comment #21)
> > I actually thought adding a children validator to the object passed to validateBookmarkObject was also copying it to the parsed object. But likely it doesn't because it's not part of BOOKMARK_VALIDATORS.
> > Maybe we should consider it a bug. Not a big deal if you don't want to lose time with it, the workaround will do for now.
> 
> Do you want me to file a followup for this?

it's not critical and I don't think we'll have resources to spend on it.
https://hg.mozilla.org/mozilla-central/rev/6d4a03c0a481
https://hg.mozilla.org/mozilla-central/rev/1be875eb29ae
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Blocks: 1347530
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

Uplift request for both csets:

Approval Request Comment
[Feature/Bug causing the regression]: importing many bookmarks
[User impact if declined]: can't uplift bug 1344759, so imports continue to be very slow in some cases
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: no, but it's exhaustively tested via automated tests
[Needs manual test from QE? If yes, steps to reproduce]:  n/a
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: no
[Why is the change risky/not risky?]: strictly additive change to the bookmarks.jsm API, shouldn't affect any existing code
[String changes made/needed]: nope
Attachment #8844996 - Flags: approval-mozilla-beta?
Attachment #8844996 - Flags: approval-mozilla-aurora?
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

insertTree API is a new API and this is not baked enough in nightly/aurora. Considering that this new API can help improve the importing performance, we can take this in Aurora54 to bake more. However, given we are in late beta and this is a new introduced API and not baked enough, I prefer not to take this in Beta53 and let it ride the train on 54. Aurora54+ and Beta53-.
Attachment #8844996 - Flags: approval-mozilla-beta?
Attachment #8844996 - Flags: approval-mozilla-beta-
Attachment #8844996 - Flags: approval-mozilla-aurora?
Attachment #8844996 - Flags: approval-mozilla-aurora+
(In reply to Gerry Chang [:gchang] from comment #27)
> Comment on attachment 8844996 [details]
> Bug 1344282 - add an insertTree API to Bookmarks.jsm,
> 
> insertTree API is a new API and this is not baked enough in nightly/aurora.
> Considering that this new API can help improve the importing performance, we
> can take this in Aurora54 to bake more. However, given we are in late beta
> and this is a new introduced API and not baked enough, I prefer not to take
> this in Beta53 and let it ride the train on 54. Aurora54+ and Beta53-.

The gains here are significant based on preliminary telemetry checks. Before, the 95 percentile of imports from Chrome was at 100 seconds (more than 1.5 *minutes*). With this and bug 1344759, that's dropped to around the 5 second mark. That's a 95% improvement. In the world of performance improvements, that's crazy - we normally have to spend inordinate amounts of time for gains of 2 or 5%.

There's also pretty thorough automated test coverage of both this API *and* the consumers, so I don't think a week's worth of baking is "not baked enough".

Finally, I dispute that we're in late beta. The 'early beta' define was removed just yesterday. There's still 2.5 weeks until the rc goes to build, and 53 has been in beta for 2.5 weeks at this point. So we're at the halfway mark of beta builds, and before the halfway mark of the 53 beta cycle.

Please reconsider the decision for beta you've made here and in bug 1344759.
Flags: needinfo?(gchang)
Redirect ni to Liz as the primary owner of Beta53.
Flags: needinfo?(gchang) → needinfo?(lhenry)
This sounds like a great improvement for Chrome migrations. We can see from telemetry data that 60% of new users on beta try to import data.  Many of the users doing this from Chrome hang for over a minute. So this seems like a good priority to fix as quickly as we can.    

Since this is not a web visible API I'm less worried about it being a new API that would need more review. 

My main concern here is that this is something we'd want to do manual testing for, and we just did some migration testing for beta 6. We have 4 more builds before the RC which gives some time to find problems depending on QE's workload. 

Bogdan, Cornel, can you fit in migration testing to cover this and work with Gijs to make sure you are covering a good range of scenarios here?  The changes here wouldn't affect just Chrome but may also affect migrations from IE, Edge, and Safari.  (We also still need to land the change in the automigration wizard from bug 1343080 and test that).
Flags: needinfo?(lhenry)
Flags: needinfo?(cornel.ionce)
Flags: needinfo?(bogdan.maris)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #31)
> Bogdan, Cornel, can you fit in migration testing to cover this and work with
> Gijs to make sure you are covering a good range of scenarios here?  The
> changes here wouldn't affect just Chrome but may also affect migrations from
> IE, Edge, and Safari.  (We also still need to land the change in the
> automigration wizard from bug 1343080 and test that).

FWIW, I landed the backout of bug 1343080 on beta a few minutes ago. So automigration should be turned off for b7 and it should be possible to do testing there. I'm sorry this didn't happen in time for b6. :-(
Comment on attachment 8844996 [details]
Bug 1344282 - add an insertTree API to Bookmarks.jsm,

Let's try this, the gain here seems worth the risk and extra testing. Should be in beta 7.
Attachment #8844996 - Flags: approval-mozilla-beta- → approval-mozilla-beta+
Attachment #8846887 - Flags: approval-mozilla-beta+
Flags: qe-verify+
We reached out to Gijs on this matter and we came to an agreement upon what would be the right thing to test here. 
We will run those tests as soon as possible (depending on our workload) so I'm leaving the needinfo here so we come back with a report as soon as testing comes to an end.
We finished verifying bookmarks migration in different scenarios across different platforms (Windows 10 64bit, Windows 7 64bit and macOS 10.12.2) using Firefox 53 beta 8. One real issue was found during testing (bug 1353041) but Gijs was on point and it's already fixed. Marking as verified fixed on Fx 53.
Please see the following etherpad for more details and let me know if you have any questions: https://public.etherpad-mozilla.org/p/bookmarks-migration.
Flags: qe-verify+
Flags: needinfo?(cornel.ionce)
Flags: needinfo?(bogdan.maris)
Depends on: 1393904
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: