Optimize Bookmarks.jsm's updateFrecency method to be more efficient when passed many URLs

RESOLVED FIXED in Firefox 55

Status

()

P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Gijs, Assigned: Gijs)

Tracking

(Blocks: 1 bug)

unspecified
mozilla55
Points:
---

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(1 attachment)

(Assignee)

Description

2 years ago
The code right now is:

  // We just use the hashes, since updating a few additional urls won't hurt.
  yield db.execute(
    `UPDATE moz_places
     SET frecency = NOTIFY_FRECENCY(
       CALCULATE_FRECENCY(id), url, guid, hidden, last_visit_date
     ) WHERE url_hash IN ( ${urls.map(url => `hash("${url.href}")`).join(", ")} )
    `);

  yield db.execute(
    `UPDATE moz_places
     SET hidden = 0
     WHERE url_hash IN ( ${urls.map(url => `hash(${JSON.stringify(url.href)})`).join(", ")} )
       AND frecency <> 0
    `);


There's a few things to note here:
1) when updating multiple URLs, sending a single manyfrecencieschanged notification rather than many individual onfrecencychanged notifications is probably more efficient. This would mean not using the NOTIFY_FRECENCY sql function.
2) the "url_hash IN (...)" subquery can be reused between these 2 queries. Maybe a reused temp table would make this even faster (but I don't know SQL very well, so I don't know if there's a way of then writing to those rows in the original table)
3) the two SQL queries use different ways of including the URL - hardcoding the double quotes is probably faster than using JSON.stringify.
4) I expect the doubly-nested (`${foo + `something ${bar} else`}`) template string here isn't making things better.

Finally, I suspect in many cases the items for which we run this are bookmarks (this is Bookmarks.jsm, after all), so the frecency of a given place is guaranteed to not be null when it's inserted (or shortly thereafter, when the actual bookmarks are inserted). It feels like we should be able to do better than running extra queries afterwards to set the hidden column to 0. I also wonder if that query would be faster if it also checked for hidden != 0 (before the presumably more expensive IN () query). I don't know how typical my places DB is, but it has ~50,000 places, of which only ~700 have hidden=1.
(In reply to :Gijs from comment #0)
> Finally, I suspect in many cases the items for which we run this are
> bookmarks (this is Bookmarks.jsm, after all), so the frecency of a given
> place is guaranteed to not be null when it's inserted (or shortly
> thereafter, when the actual bookmarks are inserted). It feels like we should
> be able to do better than running extra queries afterwards to set the hidden
> column to 0.

Good point, if it's a bookmark it's guaranteed to have hidden = 0, UNLESS it's a query (place: schema), so we could probably set hidden and frecency in the same query, something like hidden = (url_hash BETWEEN hash("place", "prefix_lo") AND hash("place", "prefix_hi")).

> I also wonder if that query would be faster if it also checked
> for hidden != 0 (before the presumably more expensive IN () query). I don't
> know how typical my places DB is, but it has ~50,000 places, of which only
> ~700 have hidden=1.

Unlikely, we don't have an index on hidden, and in general it's not a good idea querying on columns that look like booleans or have one value that is largely less frequent than the other one (for btree engines). Regardless any perf testing should be done with 150k places to cover the most cases in the wild.

Updated

2 years ago
Priority: -- → P3
(Assignee)

Comment 2

2 years ago
Per discussion in bug 1338800, seems like picking this up would make sense.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
(Assignee)

Comment 3

2 years ago
(In reply to :Gijs from comment #2)
> Per discussion in bug 1338800, seems like picking this up would make sense.

So, I tried this, but I have to admit the impact seems pretty limited (~5% savings when tested with 210 bookmarks as in the unit-test for the Chrome migrator). I'll attach the patch either way. The savings are quite possibly higher in practice in a "real" browser depending on what things are open in the browser, as the consumers of the additional frecency notifications would constitute additional savings in practice, I expect.
Comment hidden (mozreview-request)

Comment 5

2 years ago
mozreview-review
Comment on attachment 8851609 [details]
Bug 1346979 - optimize Bookmarks.jsm's updateFrecency,

https://reviewboard.mozilla.org/r/123888/#review128440

::: toolkit/components/places/Bookmarks.jsm:1786
(Diff revision 1)
>   * @param db
>   *        the Sqlite.jsm connection handle.
>   * @param urls
>   *        the array of URLs to update.
> + * @param collapseNotifications
> + *        optional, whether we can send just one onManyFrecenciesChanged

nit: we usually use the "@param [optional] name" notation

::: toolkit/components/places/Bookmarks.jsm:1800
(Diff revision 1)
> +                     ", url, guid, hidden, last_visit_date)";
> +  }
> +  // We just use the hashes, since updating a few additional urls won't hurt.
>    yield db.execute(
>      `UPDATE moz_places
> -     SET hidden = 0
> +     SET 

trailing space...

But I actually prefer if you keep it onelined
SET hidden =....
    frecency =

don't worry too much about exact 80 chars limit, we have some tolerance :)

::: toolkit/components/places/nsINavHistoryService.idl:1440
(Diff revision 1)
>                        in nsISupports aClosure);
>  
> +  /**
> +   * Used by JS APIs to inform consumers many frecencies changed in one go.
> +   */
> +  void notifyManyFrecenciesChanged();

Rather than exposing one entry per notification, I prefer if you use history.getObservers(), see for example http://searchfox.org/mozilla-central/rev/990055a4902952e038cc350c9ffe1ac426d1c943/toolkit/components/places/History.jsm#654
Attachment #8851609 - Flags: review?(mak77)
Comment hidden (mozreview-request)
(Assignee)

Comment 7

2 years ago
(In reply to Marco Bonardo [::mak] from comment #5)
> Rather than exposing one entry per notification, I prefer if you use
> history.getObservers(), see for example
> http://searchfox.org/mozilla-central/rev/
> 990055a4902952e038cc350c9ffe1ac426d1c943/toolkit/components/places/History.
> jsm#654

Whoa, somehow I'd completely missed that. Much simpler. Updated, thanks!

Comment 8

2 years ago
mozreview-review
Comment on attachment 8851609 [details]
Bug 1346979 - optimize Bookmarks.jsm's updateFrecency,

https://reviewboard.mozilla.org/r/123888/#review129522
Attachment #8851609 - Flags: review?(mak77) → review+

Comment 9

2 years ago
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4b283abc6b68
optimize Bookmarks.jsm's updateFrecency, r=mak

Comment 10

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/4b283abc6b68
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.