Open Bug 1351897 Opened 7 years ago Updated 2 years ago

[meta] Design and implement a better bookmarks API

Categories

(WebExtensions :: General, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: rnewman, Unassigned)

References

Details

(Keywords: meta, Whiteboard: triaged)

The bookmarks API documented here[0] is lacking in a few respects.

Quickly skimming the docs:

1. No support for batching or transactions. This leads to behavior like this[1], where N changes result in NxM SQL writes. Not only does this allow intermediate states to potentially persist on disk, or be interrupted (leaving us with just 'D' in ACID!), but it's also much worse for performance and has tragic side-effects like triggering a sync after the first change, causing repeated syncs and inconsistency to spread to other devices.

Consider an add-on that tried to sort a folder containing 100 children. 100 individual calls to `bookmarks.move`, each firing onChildrenReordered and marking two bookmarks as modified from the perspective of Firefox Sync. Yikes!

2. onChanged is not documented to fire when a record's GUID is changed. Record GUIDs can change.

3. The tree node format is only able to represent folders and bookmarks, and bookmarks consist only of a title and a URL. It can't represent separators; even if we ignore the full range of types in Places, that's a bit of a shortcoming. It can't represent keywords, tags, icons, or descriptions.

4. There's no way to do something that Snooze Tabs is trying to do: store data in a way that doesn't collide in Sync. A way to create a per-device folder, or a flag to indicate that a folder isn't synced, would be two ways to approach this.

5. There's no event representing a bulk change. Consumers are expected to watch for onImportBegan, unregister their change listeners, re-register in onImportEnded, and then… figure out what changed. The two choices given to consumers -- get lots of notifications or have to do manual work -- both suck.

6. There's no event representing large-scale changes other than imports. Sync won't just create items, it'll modify, move, and delete them. A WebExtension that follows the documented advice and unregisters for onCreated will be _absolutely hammered_ if it's listening for other changes during a sync.

7. It's impossible to express safe operations, because there are no compare-and-set primitives. For example, an add-on might want to move a bookmark from one folder to another.

Simply calling `bookmarks.move(bm, {parentId: dest})` is incorrect: the bookmark might have been moved elsewhere in the meantime (perhaps a sync is in progress). The author might have an inkling of danger, and write

```
bookmarks.get(bm).then(items => {
  if (items[0].parentId === source) {
    bookmarks.move(bm, {parentId: dest});
  }
}, onErr)
```

… but I'm not convinced that even this is safe (hooray async code!), it's expensive (two queries!), and I have no expectation that real-world add-on authors will be this cautious.


I could go on, but I suspect you get the picture.

You couldn't build Sync with this API. You couldn't even correctly and performantly build backup, automatic filing extensions, deduping, or a range of other useful extensions. Just about all you could safely build are extensions that display your bookmarks and (unsafely) allow you to mutate one-by-one.

In short: the WebExtensions bookmarks API is insufficient to build non-toy examples, and should be marked as dangerous and replaced.

[0] <https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/bookmarks>
[1] <https://github.com/bwinton/SnoozeTabs/pull/288#discussion_r108796498>
Depends on: 1351915
Thanks for writing this up, Richard. I'm not so sure we need to throw out `chrome.bookmarks` entirely: many of these issues affect the Places APIs, too, and there's work we can do to make the existing `chrome.bookmarks` methods safer to use. We're constrained by Chrome compatibility as well, so I suspect the API will need to exist in some form. But I'll let the Web Extensions team speak to that, and focus on Places. I know you're very familiar with these pain points; this is more context for folks following along.

* (1) and (7) are also shortcomings in Bookmarks.jsm. There was talk of adding an async transaction manager to fix this (bug 979280), but that's a complex piece of work that will take time. Additionally, apart from `reorder` and the recently-added `insertTree` (bug 1344282), Bookmarks.jsm doesn't provide batch APIs, let alone CAS. We'll want to design and add these before revamping `chrome.bookmarks`.

* I filed bug 1351915 for (2). AFAIK, Sync is the only Places consumer that changes GUIDs, and it's an oversight that we don't do this.

* I think the lack of keywords, tags, and descriptions is a blessing in disguise, because their implementations in Places are long overdue for overhauls. Keywords are associated with URLs in the database, but with bookmarks in the UI. This creates a number of interesting mismatches; bug 1345417, comment 9 has the gory details. Tags are likewise associated with URLs, but stored as bookmarks, which is incredibly inefficient (bug 424160). Descriptions are stored as annos. Neither tags nor annos have public async APIs, so `PlacesSyncUtils` hits the DB directly for reads, and uses the (blocking, main-thread I/O) API for writes. I'd really like to see these cleaned up before we expose them via `chrome.bookmarks`.

* We'd like to move away from having non-syncable items in Places entirely (bug 1309930). Their existence has caused no end of trouble for Sync, and the queries are just confusing. This suggests bookmarks aren't the right answer to (4), but we don't have a better one ATM.

* (5) and (6) are limitations of Places observer notifications, which are also due for an overhaul. See bug 1340498. (FWIW, Places consumers suffer from the same problems. There's a concept of a "batch", but it's limited, and not supported by the async API). Once this is complete, we should be able to surface these events to `chrome.bookmarks`.
(In reply to Kit Cambridge [:kitcambridge] from comment #1)
> I'm not so sure we need to throw out `chrome.bookmarks` entirely…

I don't think we can; my aim in filing this bug is that we:

- Document its shortcomings
- Offer appropriate guidance in the documentation
- Provide a better replacement API.

> too, and there's work we can do to make the existing `chrome.bookmarks`
> methods safer to use.

And we should, whenever the spec and WE idioms allow us to.


> * (1) and (7) are also shortcomings in Bookmarks.jsm. There was talk of
> adding an async transaction manager to fix this (bug 979280), but that's a
> complex piece of work that will take time.

We used to use PlacesUtils.bookmarks.runInBatchMode in order to accrete writes within a single transaction.

On iOS, and to an extent on Android, we carefully expose primitives of varying coarseness to encourage work to be done transactionally. There are, sadly, more moving parts on desktop.


> * I filed bug 1351915 for (2). AFAIK, Sync is the only Places consumer that
> changes GUIDs, and it's an oversight that we don't do this.

I noted in that bug: the unfortunate thing about specs is that we can't _necessarily_ just add missing stuff. Who knows what bookmarks-using WebExtensions will do if they get `onChanged("oldguid", {id: "newguid"})`!

> I'd really like to see these
> cleaned up before we expose them via `chrome.bookmarks`.

Absolutely. Though the cart and the horse can trade places here: in a WebExtensions-only world, we can use the design of a modern API to inform the internals.

> * We'd like to move away from having non-syncable items in Places entirely
> (bug 1309930). Their existence has caused no end of trouble for Sync, and
> the queries are just confusing. This suggests bookmarks aren't the right
> answer to (4), but we don't have a better one ATM.

Agreed, though again this is an excellent opportunity to look at what add-on authors are trying to do, and to come up with some coherent way of enabling it. (Perhaps that's a "device-specific ID" API?)
Mak is likely to have thoughts on the above.
(In reply to Richard Newman [:rnewman] from comment #0)
> 1. No support for batching or transactions. This leads to behavior like
> this[1], where N changes result in NxM SQL writes.

For this we will provide utils in Bookrmarks.jsm and it will be task of the consumer to figure out when to use a batching API and when not.
The WebExtension API can't be changed for compatibility reasons, but it can be extended with similar batching APIs.
There has not been a design effort there, indeed I think many of those APIs are poorly and quickly designed. the primary effort went towards cross-browser compatibility.

Regarding the notifications traffic, I don't think the old batching solution is a solution, it created more damage than benefits, and we still have to create and dispatch a bunch of runnables, I'd rather move towards a new batched notifications system (bug 1340498) where a consumer can register a listener for a list of events and it will get all of them at once for each API call, so it can better decide how to batch them. So an insertTree will only send a single event including an array of changes, then it can demux them as it prefers. This can also be built in parallel to the old system and notifications can be converted one by one. It would be an incremental work.

> 2. onChanged is not documented to fire when a record's GUID is changed.
> Record GUIDs can change.

Sync is the only thing that can change GUIDs and Sync code (or Sync code in Places) should be responsibile to notify such changes. But yes, there should be a notification.

> 3. The tree node format is only able to represent folders and bookmarks, and
> bookmarks consist only of a title and a URL. It can't represent separators;
> even if we ignore the full range of types in Places, that's a bit of a
> shortcoming. It can't represent keywords, tags, icons, or descriptions.

Keywords are moving to the search service, dunno when but that's the plan. A keyword will just be a custom search engine.
Tags, what Kit said. It's likely they will retain their own separate API, and they will work as an alternative bookmarking system (in the UI they will likely still be associated to bookmarks, but in the API that would not be mandatory).
Icons is not something easy to sync, in the new hi-res favicons world, each page can have multiple icons, one per resolution (16, 32, 48, 64, ...) also with different icon urls, and the same icon can be shared by multiple pages. Plus there are root domain icons that are valid for all the pages in that domain that don't define their icon. It's simpler to figure out a privacy strategy to refetch icons from a page url.
Descriptions... dunno. We are not using them for anything, we may be using them in the future for indexing, but it's also true often they are abused for SEO reasons.

> 7. It's impossible to express safe operations, because there are no
> compare-and-set primitives. For example, an add-on might want to move a
> bookmark from one folder to another.
> 
> Simply calling `bookmarks.move(bm, {parentId: dest})` is incorrect: the
> bookmark might have been moved elsewhere in the meantime (perhaps a sync is
> in progress). The author might have an inkling of danger, and write

Yes, this is also not trivial to fix, unless we force all the API calls to be serialized, that is something I'm trying to avoid because it would severely penalize perf.
For what you are suggesting the only safe way would be to do the check in SQL in the same query doing the change, but in some cases the change requires multiple queries so that's not feasible. In addition there should also be a safe way to collect info for the following notification and return value, because if the query doesn't execute due to changes, we shouldn't notify the change. All of this should still happen from the same SQL command or there'd always be the risk that something else sneaks in the middle and changes things. I evaluated various possibilities through triggers and temp tables, all of them are expensive to implement and a mess to maintain.
For now we accept the rare possibility of race conditions, the queries do the best to avoid breaking coherence, and in some rare cases notifications may be "wrong" (things like a page is notified as removed, but was just re-added by something else after being removed). It's not critical enough for the cost it would have to fix it, imo.

(In reply to Kit Cambridge [:kitcambridge] from comment #1)
> There was talk of
> adding an async transaction manager to fix this (bug 979280), but that's a
> complex piece of work that will take time.

Actually, the async transaction manager has nothing to do with batching or database transactions. It's a name clash. The async transaction manager is just an async UNDO/REDO transaction manager, that we need to convert the UI calls to an async bookmarking API.
There won't be an async batching system because it's just too scary, if someone somehow breaks an async batch, we end up with a long transaction that will destroy any WAL journal benefits.

> Neither tags nor annos have public async
> APIs, so `PlacesSyncUtils` hits the DB directly for reads, and uses the
> (blocking, main-thread I/O) API for writes.

Both are sort of specced, but no resources to work on them. Makes sense since not even bookmarks conversion is complete, there's no reason to work on lower priorities.

> I'd really like to see these
> cleaned up before we expose them via `chrome.bookmarks`.

They won't be exposed there, tags will likely have their own chrome.tagging. API, annotations dunno but I doubt in their current shape they'll ever be a thing.

> * We'd like to move away from having non-syncable items in Places entirely
> (bug 1309930).

Right, that's what we should do, and then freeze roots so that only us can create them.
Note that for icons, the page-icon: protocol can be used to show the cached icon for a page. There are only 2 problems left to solve:
1. if the icon is not cached, there is no pass-through. It's possible to do that but it requires privacy approval and evaluation.
2. it's not exposed to content, but could be, with a security evaluation.
Flags: needinfo?(amckay)
Priority: -- → P5
Whiteboard: triaged
Whilst we try not to change APIs, a webextension API is much easier to change than a web one. Bookmarks aren't in the core of the APIs https://lists.w3.org/Archives/Public/public-browserext/2016May/0000.html and as such there isn't a "spec" - we can change things. 

The problem is mostly one of use, when we change an API that Chrome extensions rely upon we break those add-ons and we have to communicate it properly. But please don't think the API's are set in stone. If we can improve upon them, let's do so.

In the past there was no clear API for bookmarks and history from what I can tell, extensions had to talk to the appropriate .jsm's and figure out what to do. Perhaps they all got it correct but I suspect they didn't. I'm glad we've got an API so in the future we'll know what add-ons can and can't do and improve those extension points.

I'd be happy to lead conversations what to do with this once our blockers for Firefox 57 are cleared out, I bet most of the other people on this thread are busy with Firefox 57 too.
Flags: needinfo?(amckay)
Keywords: meta
Priority: P5 → P3
Product: Toolkit → WebExtensions
Bulk move of bugs per https://bugzilla.mozilla.org/show_bug.cgi?id=1483958
Component: Untriaged → General
Severity: normal → S3
Summary: Design and implement a better bookmarks API → [meta] Design and implement a better bookmarks API
You need to log in before you can comment on or make changes to this bug.