Closed Bug 1435354 Opened 4 years ago Closed 4 years ago

Add `PlacesUtils.keywords.invalidateCachedKeywords`

Categories

(Toolkit :: Places, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: lina, Assigned: lina)

References

Details

Attachments

(2 files, 1 obsolete file)

The mirror currently fires "keyword" `onItemChanged` notifications to update the keywords cache. This notification is on the chopping block in 1434607, so let's instead add an API for the mirror invalidate the entire cache when synced keywords change.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review223472

::: toolkit/components/places/PlacesUtils.jsm:2128
(Diff revision 1)
>  // Once the old API will be gone, we can remove this and stop observing.
>  var gIgnoreKeywordNotifications = false;
>  
> -XPCOMUtils.defineLazyGetter(this, "gKeywordsCachePromise", () =>
> -  PlacesUtils.withConnectionWrapper("PlacesUtils: gKeywordsCachePromise",
> +// Lazily initializes and returns the keywords cache.
> +var gKeywordChangeObserverPromise = Promise.resolve();
> +function promiseKeywordsCache() {

We could refactor this further, and only update the cache in `fetch`, `insert`, and `remove` after we update the database...but that seems like a yak for another day.

::: toolkit/components/places/PlacesUtils.jsm:2264
(Diff revision 1)
> -        },
> +  },
>  
> -        async _onUrlChanged(guid, url, oldUrl, source) {
> +  async _onUrlChanged(guid, url, oldUrl, source) {
> -          // Check if the old url is associated with keywords.
> +    // Check if the old url is associated with keywords.
> -          let entries = [];
> +    let entries = [];
> -          await PlacesUtils.keywords.fetch({ url: oldUrl }, e => entries.push(e));
> +    await PlacesUtils.keywords.fetch({ url: oldUrl }, e => entries.push(e));

While I'm here, could we use `keywordsForHref(oldUrl)` instead of going through `fetch`?
Priority: -- → P1
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review223642

There is a problem, that is actually due to lack of documentation.
The reason "keywords" is a lazy getter, is so that as soon as someone takes a reference, the cache gets initialized and we start observing. Off-hand it doesn't seem necessary since you'd take a reference to call an async API BUT...
We actually do that on purpose as soon as the db is opened, because the bookmarking API can make changes at any time, and we must be notified of bookmark removals and uri changes, at least until we can completely remove keywords from bookmarks:
https://searchfox.org/mozilla-central/rev/e06af9c36a73a27864302cd2f829e6200dee8541/toolkit/components/places/PlacesCategoriesStarter.js#46
So, unfortunately we still need a way to start observing on startup.

::: toolkit/components/places/PlacesUtils.jsm:2264
(Diff revision 1)
> -        },
> +  },
>  
> -        async _onUrlChanged(guid, url, oldUrl, source) {
> +  async _onUrlChanged(guid, url, oldUrl, source) {
> -          // Check if the old url is associated with keywords.
> +    // Check if the old url is associated with keywords.
> -          let entries = [];
> +    let entries = [];
> -          await PlacesUtils.keywords.fetch({ url: oldUrl }, e => entries.push(e));
> +    await PlacesUtils.keywords.fetch({ url: oldUrl }, e => entries.push(e));

I think so, probably the code is just a bit overzealous atm.
Attachment #8948085 - Flags: review?(mak77) → review-
Comment on attachment 8948086 [details]
Bug 1435354 - Invalidate cached keywords when applying synced bookmarks.

https://reviewboard.mozilla.org/r/217714/#review223648

::: toolkit/components/places/PlacesUtils.jsm:2122
(Diff revision 1)
>        await notifyKeywordChange(url.href, "", source);
>      });
> +  },
> +
> +  /**
> +   * Clears the keywords cache and stops listening for keyword changes.

I suppose we'll then restart listening at the next API call... unfortunately that's not sufficient sinc bookmark changes may happen in the meanwhile and must "mirror" those in moz_keywords :(
Attachment #8948086 - Flags: review?(mak77) → review-
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review223786

::: toolkit/components/places/PlacesUtils.jsm:2206
(Diff revision 2)
>              this._onUrlChanged(guid, val, oldVal, source).catch(Cu.reportError);
>            }
>          },
>  
>          _onKeywordChanged(guid, keyword, href) {
> +          gKeywordsCacheReadyPromise = gKeywordsCacheReadyPromise.then(_ => {

This is only really necessary for the legacy API...we could just call `promiseKeywordsCache` here, as we do in `onItemRemoved`, but then `fetch` might return stale keywords because it won't wait for `_onKeywordChanged` to update the cache, and https://searchfox.org/mozilla-central/rev/f80722d4f3bfb722c5ec53880c4a7efb71285676/toolkit/components/places/tests/legacy/test_bookmarks.js#322-323 fails. Since we'll be removing the `"keyword"` change notification entirely in bug 1434607, this might be OK for now.
So...this is getting tricky, and there are two cases I'm worried about apart from `_onKeywordChanged`:

* In `_onUrlChanged`, we call `fetch` to fetch all keywords from the cache, then iterate over the entries and use the public `remove` and `insert` APIs to remove them. It's possible that we might invalidate the cache between the time we call `fetch` and `remove`/`insert`, in which case we'll be working with out-of-date entries.

* In `onItemRemoved`, we fetch keywords first, then fetch the bookmark, then remove those keywords. If we invalidate the cache or change `moz_keywords` between fetching and removing those keywords, we might end up working with stale data. This might be OK; in the worst case, if the user makes a change as we're syncing, they'll need to redo the change. But I am worried about the cache falling out of sync with the database.

Unlike livemarks, I don't think we can easily use a promise queue to serialize these calls, because the observers use the public `PlacesUtils.keywords.{insert, remove}` methods, which call `promiseKeywordsCache` again. To make a queue work without deadlocking, I think we'd need to call `promiseKeywordsCache` in the observer, then factor out the internals of `insert` and `remove` into functions that take the cache as arguments? I started down that path on Friday, but it's a pretty substantial change.

Or maybe it's enough of an edge case where we don't need to worry too much about it.

What do you think?
(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #9)
> So...this is getting tricky, and there are two cases I'm worried about apart
> from `_onKeywordChanged`

That's going away anyway in bug 1434607.

> What do you think?

True, we have some edge-cases that are not well handled.
Actually I was hoping to completely remove this observer, since it has a non-negligible cost for a small benefit. Very few users use keywords, and at a maximum just a few of them, but we pay to notify each bookmark change.
The legacy bookmarking API is only used by tests and some code that cannot add keywords (left pane folder and tags), what about we add direct methods in PlacesUtils.keywords to update the cache, and call them directly from Bookmarks.jsm? Not having add-ons that can touch those makes that a possibility, and then we'd not need to init the cache until needed. In the end we only care about removed bookmarks and changed uris, it shouldn't be a lot of calls.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

(In reply to Marco Bonardo [::mak] from comment #10)
> what about we add direct methods
> in PlacesUtils.keywords to update the cache, and call them directly from
> Bookmarks.jsm? Not having add-ons that can touch those makes that a
> possibility, and then we'd not need to init the cache until needed.

This sounds like a better plan to me.
Attachment #8948085 - Flags: review?(mak77) → review-
Attachment #8948086 - Flags: review?(mak77)
Blocks: 1313188
Attachment #8948086 - Attachment is obsolete: true
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

What do you think of this approach, Mak?

This removes the keywords cache observer entirely, in favor of `reassignCachedKeywordsToURL` and `removeCachedKeywordsFromURLs` that update the cache. The methods in Bookmarks.jsm take care of updating `moz_keywords`. (Alternatively, I could use the public `insert`, `fetch`, and `remove` APIs).

The tests will need fixing, of course.
Attachment #8948085 - Flags: review?(mak77) → feedback?(mak77)
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review224494

This doesn't seem to handle eraseEverything.

The idea seems to work, I'd like to see the keywords code moved into PlacesUtils.keywords, and likely we need a way to reduce DB roundtrips.

::: toolkit/components/places/Bookmarks.jsm:2532
(Diff revision 3)
>    return list.map(JSON.stringify).join();
>  }
> +
> +async function reassignKeywords(db, oldURL, newURL) {
> +  await db.executeCached(`
> +    UPDATE moz_keywords SET

I'd prefer to keep SQL queries to moz_keywords into PlacesUtils.keywords.
Is there a strong reason to separate query operations and cache updates/notifications?

::: toolkit/components/places/Bookmarks.jsm:2548
(Diff revision 3)
> +  let rows = await db.executeCached(`
> +    SELECT k.id, h.url FROM moz_keywords k
> +    JOIN moz_places h ON h.id = k.place_id
> +    LEFT JOIN moz_bookmarks b ON b.fk = h.id
> +    WHERE b.id ISNULL`);
> +

I'm a bit worried by the perf of this, because at a minimum we have one additional read for each removal (we could avoid the write if ids.length is zero...)
I fear we may have to rely on the keywords cache to quickly tell if the uri has a keyword, and then we'd read only to build the cache.

::: toolkit/components/places/PlacesUtils.jsm:1997
(Diff revision 3)
>        entries.forEach(safeOnResult);
>        return entries.length ? entries[0] : null;
>      });
>    },
>  
> +  async removeCachedKeywordsFromURLs(urls) {

s/from/for/?
Attachment #8948085 - Flags: feedback?(mak77)
No longer blocks: 1313188
Rebased atop bug 1434607 to pick up the test changes, but I think these should land in the same push.
No longer blocks: 1434607
Depends on: 1434607
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review224494

> I'd prefer to keep SQL queries to moz_keywords into PlacesUtils.keywords.
> Is there a strong reason to separate query operations and cache updates/notifications?

I was thinking we could do them in the same transaction, since we have one, but there's no strong need. The shutdown blocker is probably more important, anyway, so we shouldn't be interrupted trying to reassign keywords.

> I'm a bit worried by the perf of this, because at a minimum we have one additional read for each removal (we could avoid the write if ids.length is zero...)
> I fear we may have to rely on the keywords cache to quickly tell if the uri has a keyword, and then we'd read only to build the cache.

Makes sense. Checking the cache first would match how the observer behaves today, too.
(In reply to Kit Cambridge (they/them) [:kitcambridge] from comment #17)
> Rebased atop bug 1434607 to pick up the test changes, but I think these
> should land in the same push.

ah hum, the problem is that bug 1434607 depends on bug 1313188 (because otherwise a test fails) that I wanted to fix after this one, because they were touching related code but this patch is larger, so we're deadlocking now :)

I guess I should try to make Bug 1313188 landable, so then I can land bug 1434607, and finally this.
(In reply to Marco Bonardo [::mak] from comment #19)
> ah hum, the problem is that bug 1434607 depends on bug 1313188 (because
> otherwise a test fails)

Oh, I see. Is that `test_unsetKeyword`, by chance? I papered over the failure by calling `PlacesUtils.keywords.remove("keyword")` in this patch, but it sounds like landing bug 1313188 first is the better way.

Since I removed the observer, all tests that call `Keywords.fetch` after `setKeywordForBookmark` fail, so I thought it would be easier to just remove `setKeywordForBookmark` at the same time. Sorry for creating a deadlock. :-P
let's first land bug 1313188 and rebase this on top of it.
Blocks: 1434607
No longer depends on: 1434607
Depends on: 1313188
Ok, the proper order of patches is bug 1313188, then this one, then bug 1434607 and finally bug 1433307.
I'm landing bug 1313188 now, if everything goes well, please rebase the patches on top of that.
PS: you can steal parts from the follower bugs, if necessary (like removing a test rather than fixing it)
(In reply to Marco Bonardo [::mak] from comment #26)
> PS: you can steal parts from the follower bugs, if necessary (like removing
> a test rather than fixing it)

Thanks! Indeed, I cribbed most of the test changes from bug 1434607.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review227180

There are a few things to check yet, but it's almost there

I think you can also remove this now: https://searchfox.org/mozilla-central/rev/5536f71c3833018c4f4e2c73f37eae635aab63ff/toolkit/components/places/PlacesCategoriesStarter.js#44 since its only scope was to add the bookmark observer

::: toolkit/components/places/Bookmarks.jsm:769
(Diff revision 6)
>              }
>            }
>          }
>          if (updateInfo.hasOwnProperty("url")) {
> +          await PlacesUtils.keywords.reassign(item.url, updatedItem.url,
> +            updatedItem.source);

nit: please align arguments

::: toolkit/components/places/PlacesUtils.jsm:2229
(Diff revision 6)
> -            if (!bookmark) {
> -              for (let keyword of keywords) {
> -                await PlacesUtils.keywords.remove({ keyword, source });
> +                      WHERE url_hash = hash(:newURL) AND
> +                            url = :newURL)
> +        WHERE place_id = (SELECT id FROM moz_places
> +                          WHERE url_hash = hash(:oldURL) AND
> +                                url = :oldURL)`,
> +        { newURL: newURL.href, oldURL: oldURL.href });

what if the new url already has a keyword? wouldn't we hit the constraint here?
Maybe worth adding a test?

::: toolkit/components/places/PlacesUtils.jsm:2285
(Diff revision 6)
> -            this._onUrlChanged(guid, val, oldVal, source).catch(Cu.reportError);
> +          let rows = await db.execute(`
> +            SELECT k.id, k.keyword FROM moz_keywords k
> +            JOIN moz_places h ON h.id = k.place_id
> +            LEFT JOIN moz_bookmarks b ON b.fk = h.id
> +            WHERE b.id ISNULL AND
> +                  k.keyword IN (${"?,".repeat(chunk.length - 1) + "?"})`, chunk);

alternatively, I suspect you could just check if foreign_count == number_of_keywords. we increase fc for each bookmark and each keyword.

::: toolkit/components/places/PlacesUtils.jsm:2357
(Diff revision 6)
> -      for (let row of rows) {
> +        for (let row of rows) {
> -        let keyword = row.getResultByName("keyword");
> +          let keyword = row.getResultByName("keyword");
> -        try {
> +          try {
> -          let entry = { keyword,
> +            let entry = { keyword,
> -                        url: new URL(row.getResultByName("url")),
> +                          url: new URL(row.getResultByName("url")),
> -                        postData: row.getResultByName("post_data") || null };
> +                          postData: row.getResultByName("post_data") };

the || null removal from here could be a merge mistake, I think I just added it in bug 1313188.

We write an empty string to the db, but some API consumers expect us to return a null postData, so the cache has null postData entries.
Attachment #8948085 - Flags: review?(mak77) → review-
Comment on attachment 8949607 [details]
Bug 1435354 - Invalidate cached keywords when applying synced bookmarks.

https://reviewboard.mozilla.org/r/218974/#review227190
Attachment #8949607 - Flags: review?(mak77) → review+
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review227180

> what if the new url already has a keyword? wouldn't we hit the constraint here?
> Maybe worth adding a test?

Added a test that covers all four scenarios: source URL with keywords to destination URL without keywords (move), source without to destination with (remove all keywords from destination), source with to destination with (remove all keywords from destination, then move), and source without to destination without (do nothing).

> alternatively, I suspect you could just check if foreign_count == number_of_keywords. we increase fc for each bookmark and each keyword.

Done, thanks! I also opted not to bind the URLs, and just use them to query the cache. `EXPLAIN QUERY PLAN` tells me that, either way, we: 1) scan `moz_keywords` using `moz_keywords_placepostdata_uniqueindex`, 2) search `moz_places` using `id`, 3) use a temp B-tree for the `GROUP BY`...so I suspect adding an `IN` list with the URLs and hashes isn't necessary.

> the || null removal from here could be a merge mistake, I think I just added it in bug 1313188.
> 
> We write an empty string to the db, but some API consumers expect us to return a null postData, so the cache has null postData entries.

Good catch, I goofed this while rebasing. :-/
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review227744

::: services/sync/tests/unit/test_bookmark_tracker.js
(Diff revision 8)
>      _("Clean up.");
>      await cleanup();
>    }
>  });
>  
> -add_task(async function test_onItemKeywordChanged() {

With this test in place, we'll get constraint errors, because `setKeywordForBookmark` no longer updates the cache.

::: toolkit/components/places/tests/unit/test_sync_utils.js
(Diff revision 8)
>      let updatedItem2 = await PlacesSyncUtils.bookmarks.update({
>        recordId: item2.recordId,
>        keyword: "test5",
>      });
>      equal(updatedItem2.keyword, "test5", "New update succeeds");
> -    await PlacesSyncUtils.bookmarks.update({

This is a behavior change; we'll now move `"test5"` to the URL, and remove `"test4"`.

However, I think it's all right, because passing a subset of properties isn't a supported way to call `PlacesSyncUtils.bookmarks.update`. When Sync runs, it'll always pass the complete bookmark record. An absence of a property indicates we should remove it, not leave the old value in place. `PSU.b.update` tries to be fancy and mimic how `P.b.update` works.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review227804

::: toolkit/components/places/PlacesUtils.jsm:2184
(Diff revision 8)
>  
> -XPCOMUtils.defineLazyGetter(this, "gKeywordsCachePromise", () =>
> -  PlacesUtils.withConnectionWrapper("PlacesUtils: gKeywordsCachePromise",
> +  /**
> +   * Moves all (keyword, POST data) pairs from one URL to another, and fires
> +   * observer notifications for all affected bookmarks. All existing keywords
> +   * for the destination URL will be removed, even if the source URL didn't
> +   * have any keywords.

Is there a specific reason for the last part, apart from code simplicity?
This means if I change url of a bookmark that has no keywords, to a url of a bookmark that has keywords, the keywords get lost. That sounds unexpected.
Couldn't we just bailout if the old url has no keywords?
Or make the behavior optional if Sync needs it for the mirror.

::: toolkit/components/places/toolkitplaces.manifest
(Diff revision 8)
>  
>  # PlacesCategoriesStarter.js
>  component {803938d5-e26d-4453-bf46-ad4b26e41114} PlacesCategoriesStarter.js
>  contract @mozilla.org/places/categoriesStarter;1 {803938d5-e26d-4453-bf46-ad4b26e41114}
>  category idle-daily PlacesCategoriesStarter @mozilla.org/places/categoriesStarter;1
> -category places-init-complete PlacesCategoriesStarter @mozilla.org/places/categoriesStarter;1

I was trying to understand if not instantiating the categories starter with places is ok.
What's left is:
1. a shutdown observer, that is used to remove its own observers, and tell PlacesDBUtils we are shutting down. The latter could be bogus for external consumers of PlacesDBUtils, but it's an edge case.
2. gather-telemetry, is fired after idle-daily, that means the service will be listening for it already
3. idle-daily, is still registered as a category

So, I think we are ok here.
It may be worth in the future to refactor the PlacesDBUtils shutdown listener into PlacesDBUtils itself, and maybe rename this component to PlacesIdleListener. Bug 1127907 makes unclear whether we could also handle telemetry differently.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review227804

> Is there a specific reason for the last part, apart from code simplicity?
> This means if I change url of a bookmark that has no keywords, to a url of a bookmark that has keywords, the keywords get lost. That sounds unexpected.
> Couldn't we just bailout if the old url has no keywords?
> Or make the behavior optional if Sync needs it for the mirror.

That makes sense, and it makes the code slightly simpler...I just didn't think this through. Changing the URL of an existing bookmark to a URL that has keywords means you now have two bookmarks with the same URL, so, of course, they should share the existing keywords for that URL.
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review228210

Thank you, I appreciate a lot this work, it will make simpler to remove the old APIs in the follow-ups.

::: toolkit/components/places/PlacesCategoriesStarter.js:43
(Diff revision 10)
>          PlacesDBUtils.telemetry();
>          break;
>        case PlacesUtils.TOPIC_INIT_COMPLETE:
> -        // Init keywords so it starts listening to bookmarks notifications.
> -        PlacesUtils.keywords;
>          break;

I think you misinterpreted my comment, or I wasn't clear. We CAN remove the init-complete category from PlacesCategoriesStarter.
Attachment #8948085 - Flags: review?(mak77) → review+
Comment on attachment 8948085 [details]
Bug 1435354 - Remove the keywords cache observer and implement cache invalidation.

https://reviewboard.mozilla.org/r/217712/#review228210

> I think you misinterpreted my comment, or I wasn't clear. We CAN remove the init-complete category from PlacesCategoriesStarter.

Ah, sorry. Re-removed. :-)
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a460a637d352
Remove the keywords cache observer and implement cache invalidation. r=mak
https://hg.mozilla.org/integration/autoland/rev/5257a444bcdb
Invalidate cached keywords when applying synced bookmarks. r=mak
https://hg.mozilla.org/mozilla-central/rev/a460a637d352
https://hg.mozilla.org/mozilla-central/rev/5257a444bcdb
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Depends on: 1468104
You need to log in before you can comment on or make changes to this bug.