audit Places API usage in bookmarks/history sync engines

RESOLVED FIXED in 1.3b1

Status

()

RESOLVED FIXED
9 years ago
6 months ago

People

(Reporter: mconnor, Assigned: Mardak)

Tracking

(Blocks: 1 bug)

unspecified
1.3b1
x86
macOS
Points:
---
Dependency tree / graph
Bug Flags:
blocking-weave1.3 +

Firefox Tracking Flags

(Not tracked)

Details

Ok, will nag you in case i need hints on Weave code.
Status: NEW → ASSIGNED
http://mxr.mozilla.org/labs-central/source/weave/source/modules/engines/bookmarks.js

let kSpecialIds = {};
[["menu", "bookmarksMenuFolder"],
["places", "placesRoot"],
["tags", "tagsFolder"],
["toolbar", "toolbarFolder"],
["unfiled", "unfiledBookmarksFolder"],

so, unless you need it for some special think, i'm unsure what you need placesRoot for...
You must pay attention to avoid walking it, since it also contains roots we don't want to save and restore, like the Library left pane folder
also in observers you should probably skip changes to any root that is not wanted (the left pane folder for example)... this is not easy, though, till we implement getting all ancestors of them :(

  function GUIDForId(placeId) {
    for (let [guid, id] in Iterator(kSpecialIds))
      if (placeId == id)
        return guid;
    return Svc.Bookmark.getItemGUID(placeId);
  }

never use placeid name unless it's a place id, bookmarks have an itemId, an what is returned by addVisit is a visitId.
place id is never exposed by the API.


  _sync: Utils.batchSync("Bookmark", SyncEngine),

  _syncStartup: function _syncStart() {
    SyncEngine.prototype._syncStartup.call(this);

    // Lazily create a mapping of folder titles and separator positions to GUID
    this.__defineGetter__("_lazyMap", function() {

i'm not completely sure about the creation of this lazyMap object, looks like it does a bunch of database reads to create unique keys to bookmarks
and is only used to search for duplicates, that are even allowed in bookmarks.
From this looks like i cannot have 2 bookmarks to the same uri and titles in a folder, the API allows that.
if it's using uri and title because id could change, well separators are using position that can change the same way.
Acting on this to make it cheaper would be good, we are evaluating the possibility to have a result able to return a single node from bookmarks, it would query just once and get
all needed information (title, uri, position, parent, ...), if this can't be rewritten to do less querying it should depend on that.
i filed bug 555433 about this.

  applyIncoming: function BStore_applyIncoming(record) {

    // Preprocess the record before doing the normal apply
    switch (record.type) {
      case "query": {
        // Convert the query uri if necessary
        if (record.bmkUri == null || record.folderName == null)
          break;

        // Tag something so that the tag exists
        let tag = record.folderName;
        let dummyURI = Utils.makeURI("about:weave#BStore_preprocess");
        this._ts.tagURI(dummyURI, [tag]);

        // Look for the id of the tag (that might have just been added)
        let tags = this._getNode(this._bms.tagsFolder);

when it needs to access root ids, the code should always use PlacesUtils shortcuts, this way can avoid crossing xpcom.
PlacesUtils.tagsFolderId should be available
That getNode method is a getFolderNode, btw, again we could make results API generic so that result can query a single node regardless the type.

        if (!(tags instanceof Ci.nsINavHistoryQueryResultNode))
          break;

        tags.containerOpen = true;
        for (let i = 0; i < tags.childCount; i++) {
          let child = tags.getChild(i);
          // Found the tag, so fix up the query to use the right id
          if (child.title == tag) {
            this._log.debug("query folder: " + tag + " = " + child.itemId);
            record.bmkUri = record.bmkUri.replace(/([:&]folder=)\d+/, "$1" +
              child.itemId);
            break;
          }
        }

ARGH! Close containers! this is the root container of the result, not closing it means having a useless result auto updating in background for minutes, till it's collected.

  /**
   * Move an item and all of its followers to a new position until reaching an
   * item that shouldn't be moved
   */
  _moveItemChain: function BStore__moveItemChain(itemId, insertPos, stopId) {
    let parentId = Svc.Bookmark.getFolderIdForItem(itemId);

if i read this correctly callers of this method already knew the parentId, would be nice to pass it in optionally and query for it only if it's not passed in
Thus we can save 1 access to the db most of the times.


  remove: function BStore_remove(record) {

    switch (type) {
    case this._bms.TYPE_BOOKMARK:
      this._log.debug("  -> removing bookmark " + record.id);
      this._ts.untagURI(this._bms.getBookmarkURI(itemId), null);
      this._bms.removeItem(itemId);

actually removeItem will also untagURI, su unless there is an internal reason to do so, looks like a useless call

      break;
    case this._bms.TYPE_FOLDER:
      this._log.debug("  -> removing folder " + record.id);
      Svc.Bookmark.removeItem(itemId);
      break;
    case this._bms.TYPE_SEPARATOR:
      this._log.debug("  -> removing separator " + record.id);
      this._bms.removeItem(itemId);
      break;
    default:
      this._log.error("remove: Unknown item type: " + type);
      break;
    }
  
    
To me, looks like this method should just call removeItem for all the record types.

  // Create a record starting from the weave id (places guid)
  createRecord: function createRecord(guid) {
    let placeId = idForGUID(guid);
    
ditto: this not a place id.

  get _frecencyStm() {
    this._log.trace("Creating SQL statement: _frecencyStm");
    let stm = Svc.History.DBConnection.createStatement(
      "SELECT frecency " +
      "FROM moz_places_view " +
      "WHERE url = :url");

views don't have indices, so this is most likely slow, never select from a view unless you need all records.
use UNION ALL and LIMIT 1 instead

  _getParentGUIDForId: function BStore__getParentGUIDForId(itemId) {

      this._log.debug("Found orphan bookmark, reparenting to unfiled");
      parentid = this._bms.unfiledBookmarksFolder;

ditto, use PlacesUtils shortcuts

  _getChildren: function BStore_getChildren(guid, items) {
    let node = guid; // the recursion case
    if (typeof(node) == "string") // callers will give us the guid as the first arg
      node = this._getNode(idForGUID(guid));

    if (node.type == node.RESULT_TYPE_FOLDER &&
        !this._ls.isLivemark(node.itemId)) {
      node.QueryInterface(Ci.nsINavHistoryQueryResultNode);
not sure why you QI to a queryresultnode, rather than to a generic containerresultnode
it is a folder moreover

      node.containerOpen = true;

      // Remember all the children GUIDs and recursively get more
      for (var i = 0; i < node.childCount; i++) {
        let child = node.getChild(i);
        items[GUIDForId(child.itemId)] = true;
        this._getChildren(child, items);
      }
      
doh! close container please.

BookmarksTracker.prototype = {

  /**
   * Add a bookmark (places) id to be uploaded and bump up the sync score
   *
   * @param itemId

hey this is correct, unfortunately the "(places)" above is non sense :)

  /**
   * Add the successor id for the item that follows the given item
   */
  _addSuccessor: function BMT__addSuccessor(itemId) {
    let parentId = Svc.Bookmark.getFolderIdForItem(itemId);
    let itemPos = Svc.Bookmark.getItemIndex(itemId);

the observer already had this information, why asking for it again instead of passing it along?
pass in all the info we have, make params optional and gather them if they are undefined

  _ignore: function BMT__ignore(itemId, folder) {
    // Ignore unconditionally if the engine tells us to
    if (this.ignoreAll)
      return true;

    // Ensure that the mobile bookmarks query is correct in the UI
    this._ensureMobileQuery();

    // Make sure to remove items that have the exclude annotation
    if (Svc.Annos.itemHasAnnotation(itemId, "places/excludeFromBackup")) {
      
put anno names in const above all the code, for easy check and changes


  onItemChanged: function BMT_onItemChanged(itemId, property, isAnno, value) {
    if (this._ignore(itemId))
      return;

    // ignore annotations except for the ones that we sync
    let annos = ["bookmarkProperties/description",
      "bookmarkProperties/loadInSidebar", "bookmarks/staticTitle",
      "livemark/feedURI", "livemark/siteURI", "microsummary/generatorURI"];
    if (isAnno && annos.indexOf(property) == -1)
      return;

    // Ignore favicon changes to avoid unnecessary churn
    if (property == "favicon")
      return;

would not be better to make these simple checks before this._ignore that to me looks like much more expensive?

  onItemMoved: function BMT_onItemMoved(itemId, oldParent, oldIndex, newParent, newIndex) {
    if (this._ignore(itemId))
      return;

    this._log.trace("onItemMoved: " + itemId);
    this._addId(itemId);
    this._addSuccessor(itemId);

    // Get the thing that's now at the old place
    let oldSucc = Svc.Bookmark.getIdForItemAt(oldParent, oldIndex);
    if (oldSucc != -1)
      this._addId(oldSucc);

    // Remove any position annotations now that the user moved the item
    Svc.Annos.removeItemAnnotation(itemId, PARENT_ANNO);
    Svc.Annos.removeItemAnnotation(itemId, PREDECESSOR_ANNO);

better doing this before? btw, how does this handle if an item moves in the same folder? should it remove position annotations for all items involved?
http://mxr.mozilla.org/labs-central/source/weave/source/modules/engines/history.js

  get _visitStm() {
    this._log.trace("Creating SQL statement: _visitStm");
    let stm = this._db.createStatement(
      "SELECT visit_type type, visit_date date " +
      "FROM moz_historyvisits_view " +
      "WHERE place_id = (" +
        "SELECT id " +
        "FROM moz_places_view " +
        "WHERE url = :url) " +
      "ORDER BY date DESC LIMIT 10");

views don't have indices, thus the internal query is slow
use UNION ALL and LIMIT 1 instead
Also, this is selecting only type and date, while i don't care much about session (that atm is unused in UI) i care much about from_visit, since it's used to filter redirects.
from what i see this system does not track visit ids, that can be fine, but from_visit relies on those ids...
if you really want to skip from_visit, you should ignore all visits that will redirect to something other.
you could maybe replace your internal visit id reference to an "index" on uri and visit_date. we should ensure that visit_date is unique for each visit (it's impossible to visit 2 uris at the same microsecond, even if this somehow depends on OS timers resolution).

  get _urlStm() {
    this._log.trace("Creating SQL statement: _urlStm");
    let stm = this._db.createStatement(
      "SELECT url, title, frecency " +
      "FROM moz_places_view " +
      "WHERE id = (" +
        "SELECT place_id " +
        "FROM moz_annos " +
        "WHERE content = :guid AND anno_attribute_id = (" +
          "SELECT id " +
          "FROM moz_anno_attributes " +
          "WHERE name = '" + GUID_ANNO + "'))");

ditto, this will do a linear search per no indices

  getAllIDs: function HistStore_getAllIDs() {

    let root = this._hsvc.executeQuery(query, options).root;
    root.QueryInterface(Ci.nsINavHistoryQueryResultNode);
    root.containerOpen = true;

    let items = {};
    for (let i = 0; i < root.childCount; i++) {
      let item = root.getChild(i);
      let guid = GUIDForUri(item.uri, true);
      items[guid] = item.uri;
    }
    
Close containers!
so this saves visits for the last 1000 visited uris, right?
would probably be better to change this to async storage and increase the limit a bit, my history can hold 130 000 unique pages. depending on server capabilities obviously...

 
  update: function HistStore_update(record) {

    // Add visits if there's no local visit with the same date
    for each (let {date, type} in record.visits)
      if (curvisits.every(function(cur) cur.date != date))
        Svc.History.addVisit(uri, date, null, type, type == 5 || type == 6, 0);

hm, yep this loses the referrer information
OT: fwiw, i think i've never seen any Mozilla coding style allowing loops without braces...

    this._hsvc.setPageTitle(uri, record.title);
  },

I hope all of these changes use runInBatchMode, since it's a bunch of changes... the same is valid for bookmarks, when doing multiple changes (more than a couple) they should be done in a batch


  wipe: function HistStore_wipe() {
    this._hsvc.removeAllPages();
  }

note that removeAllPages atm is async, pages are removed immediately, but any other stuff will disappear later. there is a "places-expiration-finished" observer topic when done, if needed.


  onBeforeDeleteURI: function onBeforeDeleteURI(uri) {
    if (this.ignoreAll)
      return;
    this._log.trace("onBeforeDeleteURI: " + uri.spec);
    if (this.addChangedID(GUIDForUri(uri, true)))
      this._upScore();
  },
  onDeleteURI: ...

I have sadly to note that during batches we don't always fire "delete" notifications, there is bug 530782 about that :(
there is nothing to comment about the record file, it's not really Places related.
(Assignee)

Comment 5

9 years ago
(In reply to comment #2)
> http://mxr.mozilla.org/labs-central/source/weave/source/modules/engines/bookmarks.js
Thanks for the detailed comments!

> i'm unsure what you need placesRoot for...
Pretty sure we only really use it to make sure we don't try processing it if we happen to get a notification for this id.

> never use placeid name unless it's a place id
Yeah, I believe most instances were named placeId just so that it's easier to differentiate between the bookmark itemId and weave guid.

> i'm not completely sure about the creation of this lazyMap object, looks like
> it does a bunch of database reads to create unique keys to bookmarks
> and is only used to search for duplicates, that are even allowed in bookmarks.
Yes, the lazyMap looks for duplicates on new items, but this doesn't mean the engine doesn't allow duplicates. If one machine already has duplicates, each item will have its own GUID. But yes according to the findDupe logic, duplicate bookmarks in the same folder will be pruned. This is mostly to get rid of the "new profile" bookmarks.

> when it needs to access root ids, the code should always use PlacesUtils
> shortcuts, this way can avoid crossing xpcom.
Weave also has its own local cache of ids, so some instances are just old. You might have noticed some things use _bookmark and others use Svc.Bookmark. ;)

> ARGH! Close containers! this is the root container of the result, not closing
> it means having a useless result auto updating in background for minutes, till
> it's collected.
Good to know!

> if i read this correctly callers of this method already knew the parentId,
> would be nice to pass it in optionally and query for it only if it's not 
Seems like a reasonable optimization.

> actually removeItem will also untagURI, su unless there is an internal reason
> to do so, looks like a useless call
Okay. Seems like it was added in bug 486481#c23

> To me, looks like this method should just call removeItem for all the record
> types.
Could do that now that we don't have removeFolder. Probably will still need to check if it's a valid type before doing anything though.

> views don't have indices, so this is most likely slow, never select from a view
> unless you need all records.
> use UNION ALL and LIMIT 1 instead
The individual tables have the right indices, so that's why it'll be faster? I suppose ideally we shouldn't query directly anyway.

> not sure why you QI to a queryresultnode, rather than to a generic
> containerresultnode
Dunno.. This is some old code here from 2008.

> would not be better to make these simple checks before this._ignore that to me
> looks like much more expensive?
Eh.. eh.. maybe. ignoreAll is a fast boolean check, but if not it'll do the expensive annotation lookup stuff. I suppose in the common case ignoreAll isn't true, so doing the string comparisons for properties should be faster, yeah.

> how does this handle if an item moves in the
> same folder? should it remove position annotations for all items involved?
The removal annotation is if the user explicitly moves it somewhere. The annotations are normally removed after the bookmark is correctly placed (when the predecessor/parent are synced).
Thanks a lot for the super detailed audit. Way more intense than the others, but I'm not complaining!

Since the audit is done, I'm closing this. We'll still need to file bugs to followup with all the points.
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(In reply to comment #5)
> > views don't have indices, so this is most likely slow, never select from a view
> > unless you need all records.
> > use UNION ALL and LIMIT 1 instead
> The individual tables have the right indices, so that's why it'll be faster? I
> suppose ideally we shouldn't query directly anyway.

yes, individual tables have indices, but views don't, sqlite just does not yet support them.
(Reporter)

Comment 8

9 years ago
Please don't close this until the bugs are filed.
Assignee: mak77 → edilee
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Updated

9 years ago
Flags: blocking-weave1.3+
(Assignee)

Updated

9 years ago
Blocks: 559154
(Assignee)

Updated

9 years ago
Blocks: 559155
(Assignee)

Updated

9 years ago
Blocks: 559156
(Assignee)

Updated

9 years ago
Blocks: 559158
(Assignee)

Updated

9 years ago
Blocks: 559159
(Assignee)

Updated

9 years ago
Blocks: 559160
(Assignee)

Updated

9 years ago
Blocks: 559161
(Assignee)

Updated

9 years ago
Blocks: 559163
(Assignee)

Updated

9 years ago
Blocks: 559165
(Assignee)

Updated

9 years ago
Blocks: 559166
(Assignee)

Updated

9 years ago
Blocks: 559167
(Assignee)

Updated

9 years ago
Blocks: 559175
(Assignee)

Updated

9 years ago
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

9 years ago
Target Milestone: 1.3 → 1.3b1
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.