Closed Bug 1364768 Opened 7 years ago Closed 7 years ago

Migrate away from IndexedDB for WebExtension startup cache

Categories

(WebExtensions :: General, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla56

People

(Reporter: kmag, Assigned: kmag)

References

(Blocks 1 open bug)

Details

(Whiteboard: [triaged])

Attachments

(8 files)

IndexedDB works well for reducing main thread CPU, but it takes upwards of 500ms to initialize at app startup, which leads to extensions being initialized after the first window is shown, and therefore adding jank shortly after startup.

I think that switching to a lz4-compressed structured clone blob of the entire cache should solve these problems, without reintroducing the main thread JSON parsing overhead that the switch to IndexedDB was meant to address.
If reading/writing an lz4-compressed structured clone blob of the entire cache is faster than JSON parsing, should we consider using this same approach for the blocklist, and for the search service's cache? If so, could the encodeBlob/decodeBlob methods live in a more generic interface than amIAddonManagerStartup?

I'm not sure if the code here is meant to read the file from disk synchronously or asynchronously. The new NetUtil.readInputStream method looks like it's doing main thread IO (eg the .available() call), but the promiseFileContents function looks like it's attempting to do this asynchronously. Why can't OS.File be used here?
(In reply to Florian Quèze [:florian] [:flo] from comment #8)
> If reading/writing an lz4-compressed structured clone blob of the entire
> cache is faster than JSON parsing, should we consider using this same
> approach for the blocklist, and for the search service's cache?

In this case, it's considerably faster. 1-4ms vs. tens of milliseconds.

For the blocklist, I was hoping to migrate to IndexedDB, and query the
database rather than keeping the JSON data in memory (bug 1345334).

It may make sense for other things, though. Session store, in particular,
comes to mind.

> If so, could the encodeBlob/decodeBlob methods live in a more generic
> interface than amIAddonManagerStartup?

If we want to use it elsewhere, then yes, they should probably be moved, and
we should probably add some helpers to handle the IO.

> I'm not sure if the code here is meant to read the file from disk
> synchronously or asynchronously. The new NetUtil.readInputStream method
> looks like it's doing main thread IO (eg the .available() call), but the
> promiseFileContents function looks like it's attempting to do this
> asynchronously.

The actual IO should happen off-thread. The data gets buffered into a pipe,
and the readInputStream method reads that buffer synchronously, but actual
disk IO is asynchronous.

> Why can't OS.File be used here?

OS.File added about 300ms delay to the load time, which wasn't much better
than IndexedDB.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #9)

> > I'm not sure if the code here is meant to read the file from disk
> > synchronously or asynchronously. The new NetUtil.readInputStream method
> > looks like it's doing main thread IO (eg the .available() call), but the
> > promiseFileContents function looks like it's attempting to do this
> > asynchronously.
> 
> The actual IO should happen off-thread. The data gets buffered into a pipe,
> and the readInputStream method reads that buffer synchronously, but actual
> disk IO is asynchronous.

Is this something you are doing in your patches, or something you expect NetUtil to do for you?

Reading bug 925838 and bug 801599 doesn't give me great confidence in NetUtil.async* methods being actually main thread IO free.

Neither does http://searchfox.org/mozilla-central/rev/484d2b7f51b7aed035147bbb4a565061659d9278/netwerk/base/NetUtil.jsm#87

> > Why can't OS.File be used here?
> 
> OS.File added about 300ms delay to the load time, which wasn't much better
> than IndexedDB.

Do we know what's causing this delay and if it's fixable? I think OS.File has other early-startup consumers.
(In reply to Florian Quèze [:florian] [:flo] from comment #10)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #9)
> 
> > > I'm not sure if the code here is meant to read the file from disk
> > > synchronously or asynchronously. The new NetUtil.readInputStream method
> > > looks like it's doing main thread IO (eg the .available() call), but the
> > > promiseFileContents function looks like it's attempting to do this
> > > asynchronously.
> > 
> > The actual IO should happen off-thread. The data gets buffered into a pipe,
> > and the readInputStream method reads that buffer synchronously, but actual
> > disk IO is asynchronous.
> 
> Is this something you are doing in your patches, or something you expect
> NetUtil to do for you?

It's something that I believe NetUtil already does. If not, that should be fixable. All of the interfaces involved are threadsafe, so the IO operations should be retargetable to a background thread.

> > OS.File added about 300ms delay to the load time, which wasn't much better
> > than IndexedDB.
> 
> Do we know what's causing this delay and if it's fixable? I think OS.File
> has other early-startup consumers.

I don't know, no, and I don't have time to investigate just now. If you have ideas, I'd be happy to hear them.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #11)

> > Do we know what's causing this delay and if it's fixable? I think OS.File
> > has other early-startup consumers.
> 
> I don't know, no, and I don't have time to investigate just now. If you have
> ideas, I'd be happy to hear them.

I don't have ideas without observing this delay myself. If you have a startup profile of it, that would be nice to share, and I would needinfo Yoric on it to see if he has suggestions.
(In reply to Florian Quèze [:florian] [:flo] from comment #12)
> I don't have ideas without observing this delay myself. If you have a
> startup profile of it, that would be nice to share, and I would needinfo
> Yoric on it to see if he has suggestions.

I'm probably going to switch to fetch(), which definitely does do its IO off-thread, if the performance looks good enough.
Hm. Fetch has the same issues as OS.File. Fetch winds up taking about 75-100ms longer than asyncRead, and OS.File.read takes about 40ms longer. But I suppose the 40ms from OS.File may be acceptable for avoiding main thread IO. It may just be a matter of some extra trips through the event loop.
OS.File relies on chrome workers that are slow to start, I think that's pretty well known.  Years ago we started a short lived effort to get rid of code on the startup path that uses OS.File for this reason which unfortunately never happened.  See bug 986145 for example.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #14)
> Hm. Fetch has the same issues as OS.File. Fetch winds up taking about
> 75-100ms longer than asyncRead, and OS.File.read takes about 40ms longer.
> But I suppose the 40ms from OS.File may be acceptable for avoiding main
> thread IO. It may just be a matter of some extra trips through the event
> loop.

What URLs are you fetch()ing?  fetch() wasn't really ever optimized that much (at all?) but it shouldn't have the same issues as OS.File.  I could totally believe it has its own host of issues though!
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #15)
> OS.File relies on chrome workers that are slow to start, I think that's
> pretty well known.  Years ago we started a short lived effort to get rid of
> code on the startup path that uses OS.File for this reason which
> unfortunately never happened.  See bug 986145 for example.

For simple read operations, it has a native fast path now. As long as you don't do anything fancy like compression. I confirmed that we're using the fast path in my case.

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #16)
> What URLs are you fetch()ing?  fetch() wasn't really ever optimized that
> much (at all?) but it shouldn't have the same issues as OS.File.  I could
> totally believe it has its own host of issues though!

It's a local file in the profile. I'm just fetching it and returning an ArrayBuffer. From the quick looks I did thought the code prior to testing, it seems like it should be fairly efficient, but it probably does some extra trips through the event loop relative to the asyncFetch version.
> fetch() wasn't really ever optimized that much (at all?) but it shouldn't have the same issues as OS.File.  I could totally believe it has its own host of issues though!

Can we do this? `fetch` feels like an ideal candidate to unify around and replace the plethora of solutions we have right now (XHR, OS.File, NetUtils etc.) with each having it's own flock of problems.
It does help that `fetch` is also a WebAPI.

I'd like to use `fetch` for all l10n startup resource loading if possible, but I can't afford the performance hit, so I'm stuck with XHR (which, from my testing is the fastest of the methods I tried on `resource://` protocol).
Priority: -- → P1
Whiteboard: [triaged]
Comment on attachment 8867568 [details]
Bug 1364768: Part 1 - Add NetUtil.readInputStream helper.

https://reviewboard.mozilla.org/r/139106/#review142830

::: netwerk/base/NetUtil.jsm:454
(Diff revision 1)
>                                             Components.stack.caller, e.data);
>          }
>      },
>  
>      /**
> +     * Reads aCount bytes from aInputStream into a string.

nit: this returns an ArrayBuffer, not a string (there's another comment that mentions string on line 461 as well)
Attachment #8867568 - Flags: review?(aswan) → review+
Comment on attachment 8867571 [details]
Bug 1364768: Part 4 - Switch to a compressed, binary flat file for startup cache.

https://reviewboard.mozilla.org/r/139112/#review142832

::: toolkit/components/extensions/ExtensionUtils.jsm:142
(Diff revision 1)
>      if (topic === "startupcache-invalidate") {
> -      this.dbPromise = this.reallyOpen(true).catch(e => {});
> +      this._data = new Map();
> +      this._dataPromise = Promise.resolve(this._data);
>      }

When exactly does this get called?  More to the point, does the on-disk data need to get wiped here?  From a quick glance it looks like it did in the old IndexedDB implementation...

::: toolkit/components/extensions/ExtensionUtils.jsm:150
(Diff revision 1)
> +  void StartupCache.dataPromise;
> +
> -Services.obs.addObserver(StartupCache, "startupcache-invalidate");
> +  Services.obs.addObserver(StartupCache, "startupcache-invalidate");

nit: can you move these lines plus the creation of the CacheStore instances below into a StartupCache.init() method and just call it here?  It would be more idiomatic and simpler for readers trying to understand this code.
Comment on attachment 8867572 [details]
Bug 1364768: Part 5 - Store the list of available locales in the startup cache.

https://reviewboard.mozilla.org/r/139114/#review142838
Attachment #8867572 - Flags: review?(aswan) → review+
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review142842

::: commit-message-96ee5:4
(Diff revision 1)
> +milliseconds, largely from the overhead of initializing OS.File. We can avoid
> +that somewhat by using the stream APIs to read the files, and beginning the

Then why not just switch JSONFile over to the stream APIs?
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #18)
> > fetch() wasn't really ever optimized that much (at all?) but it shouldn't have the same issues as OS.File.  I could totally believe it has its own host of issues though!
> 
> Can we do this? `fetch` feels like an ideal candidate to unify around and
> replace the plethora of solutions we have right now (XHR, OS.File, NetUtils
> etc.) with each having it's own flock of problems.
> It does help that `fetch` is also a WebAPI.
> 
> I'd like to use `fetch` for all l10n startup resource loading if possible,
> but I can't afford the performance hit, so I'm stuck with XHR (which, from
> my testing is the fastest of the methods I tried on `resource://` protocol).

Unfortunately, I strongly suspect that XHR winds up doing its file IO on the main thread, too. I confirmed that fetch(), at least, retargets it to the socket transport service thread pool. So even if it is a bit slower, it's probably better overall for perf.
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review142842

> Then why not just switch JSONFile over to the stream APIs?

Like I said, I initially switched to just using the stream APIs to load the DB, but it winds up being much slower overall, and adds additional IO and JSON parsing overhead. The only way we can get the data in a reasonable time is to initialize the load very early, like we do for the startup cache. That winds up being pretty ugly, and still adds a lot of extra overhead that we don't need.
Comment on attachment 8867571 [details]
Bug 1364768: Part 4 - Switch to a compressed, binary flat file for startup cache.

https://reviewboard.mozilla.org/r/139112/#review142832

> When exactly does this get called?  More to the point, does the on-disk data need to get wiped here?  From a quick glance it looks like it did in the old IndexedDB implementation...

At the moment, probably just when an add-on is installed or uninstalled. We probably do need to trigger the save timer, though, yes. I was thinking that it would be saved at shutdown anyway, but I don't think that's actually the case.
Comment on attachment 8867571 [details]
Bug 1364768: Part 4 - Switch to a compressed, binary flat file for startup cache.

https://reviewboard.mozilla.org/r/139112/#review143138

r=me with the previously raised issues addressed
Attachment #8867571 - Flags: review?(aswan) → review+
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review143158

::: toolkit/components/extensions/ExtensionPermissions.jsm:128
(Diff revision 1)
>          origins.splice(i, 1);
>        }
>      }
>  
>      if (removed.permissions.length > 0 || removed.origins.length > 0) {
> -      prefs.saveSoon();
> +      this.saveSoon(extension);

missing underscore.  which suggests that this code path isn't actually tested? :(
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review143158

> missing underscore.  which suggests that this code path isn't actually tested? :(

I may not have run the right set of tests after this change. I'll run coverage tests later today.
Comment on attachment 8867574 [details]
Bug 1364768: Part 7 - Synchronously load the initial set of API scripts at startup.

https://reviewboard.mozilla.org/r/139118/#review143170

::: commit-message-129da:3
(Diff revision 1)
> +Bug 1364768: Part 7 - Synchronously load the initial set of API scripts at startup. r?aswan
> +
> +Loading these scripts synchronously has some advantages, but it has the

this should be asynchronously?
Comment on attachment 8867570 [details]
Bug 1364768: Part 3 - Add helpers to serialize and deserialize compressed structured clone blobs.

I'd like someone more experienced with the JS API to take a look at this as well.
Attachment #8867570 - Flags: review?(wmccloskey)
Comment on attachment 8867570 [details]
Bug 1364768: Part 3 - Add helpers to serialize and deserialize compressed structured clone blobs.

https://reviewboard.mozilla.org/r/139110/#review143686

The JSAPI usage seems fine to me. I'd be a little worried if you ever have lots of data passing through here. I think we could reduce the number of copies, and it might also be nice to avoid storing the data contiguously. For the latter, you would probably need a different JS interface though.
Attachment #8867570 - Flags: review?(wmccloskey) → review+
Comment on attachment 8867570 [details]
Bug 1364768: Part 3 - Add helpers to serialize and deserialize compressed structured clone blobs.

https://reviewboard.mozilla.org/r/139110/#review143686

This is only used for data that we need all at once at startup. For data that we need to read and update granularly, we can still use IndexedDB, which handles the structured clone stuff for us.
Comment on attachment 8867569 [details]
Bug 1364768: Part 2 - Add AsyncShutdown finalizer support to DeferredSave.

https://reviewboard.mozilla.org/r/139108/#review144192
Attachment #8867569 - Flags: review?(rhelmer) → review+
Comment on attachment 8867570 [details]
Bug 1364768: Part 3 - Add helpers to serialize and deserialize compressed structured clone blobs.

https://reviewboard.mozilla.org/r/139110/#review144200

::: toolkit/mozapps/extensions/AddonManagerStartup.cpp:147
(Diff revision 1)
>      return Err(NS_ERROR_UNEXPECTED);
>    }
>  
> -  auto size = LittleEndian::readUint32(lz4.get() + magic.Length());
> +  auto data = lz4.BeginReading() + magic.Length();
> +  auto size = LittleEndian::readUint32(data);
> +  data += 4;

4 is sprinkled around several times, why not define it?
Attachment #8867570 - Flags: review?(rhelmer) → review+
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review147170

r=me with the outstanding issues addressed

::: toolkit/components/extensions/ExtensionPermissions.jsm:137
(Diff revision 1)
>  
>    async removeAll(extension) {
> -    await lazyInit();
> -    delete prefs.data[extension.id];
> +    let perms = await this._getCached(extension);
> +
> +    if (perms.permission.length || perms.origins.length) {
> +      Object.assign(perms, emptyPermissions());

Why?  This is called when an extension is uninstalled, and I think it would be good to avoid leaving even small remnants around for extensions that have been uninstalled...
Attachment #8867573 - Flags: review?(aswan) → review+
Comment on attachment 8867574 [details]
Bug 1364768: Part 7 - Synchronously load the initial set of API scripts at startup.

https://reviewboard.mozilla.org/r/139118/#review147174

Please correct the commit message but otherwise looks good.
Attachment #8867574 - Flags: review?(aswan) → review+
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review147170

> Why?  This is called when an extension is uninstalled, and I think it would be good to avoid leaving even small remnants around for extensions that have been uninstalled...

We clear all of the cache data for an extension after it's uninstalled, so the data gets removed from the cache anyway.
Comment on attachment 8872197 [details]
Bug 1364768: Part 8 - Wait for delayed startup before loading most background pages.

https://reviewboard.mozilla.org/r/143656/#review147596

r=me with the content script timing issue answered/addressed.

::: toolkit/components/extensions/ExtensionParent.jsm:1026
(Diff revision 1)
> +        // Create a new promise to avoid keeping the browser window's
> +        // compartment alive.
> +        _delayedStartupPromise = Promise.resolve(win.delayedStartupPromise);
> +      }
> +      if (!_delayedStartupPromise) {
> +        _delayedStartupPromise = promiseObserved("browser-delayed-startup-finished").then(() => {});

Is `then` to avoid keeping the reference to the window (`subject`), similar to the comment above?  If so, this can be done in one place and/or made clearer with comments.

::: toolkit/components/extensions/ext-backgroundPage.js:62
(Diff revision 1)
> +    // Unless the extension needs to be able to modify and block web
> +    // requests, delay loading the background page until after the
> +    // first browser window is visible and interactive.

Don't we need to skip the delay for extensions with content scripts?  I couldn't find anything in `delayedStartup()` that guarantees it will finish before  `document_start`, considering your new promise from the other bug: 

"the browser makes every effort to make sure that the script runs no later than this point in the load cycle."
Attachment #8872197 - Flags: review?(tomica) → review+
Comment on attachment 8872197 [details]
Bug 1364768: Part 8 - Wait for delayed startup before loading most background pages.

https://reviewboard.mozilla.org/r/143656/#review147596

> Is `then` to avoid keeping the reference to the window (`subject`), similar to the comment above?  If so, this can be done in one place and/or made clearer with comments.

Yes, it's avoiding holding the reference to the subject. But it's different from the one above. In that case, we're avoiding holding a reference to the Promise, which belongs to the browser window, but has no resolution value. In this case, the promise belongs to the ExtensionUtils.jsm global, but holds a reference to a browser window as its resolution value.

> Don't we need to skip the delay for extensions with content scripts?  I couldn't find anything in `delayedStartup()` that guarantees it will finish before  `document_start`, considering your new promise from the other bug: 
> 
> "the browser makes every effort to make sure that the script runs no later than this point in the load cycle."

In theory, yes, but we currently can't do that, for the sake of Screenshots, which does have document_start content scripts, and needs a delayed load for its background script.

But at the moment, this isn't any different from our current behavior, since there's such a long delay before the point in the startup process where we load content scripts, so I'd rather not change it just now.

My plan for the start of the 56 cycle is to begin tracking web requests and content loads before extension startup is finished, and only block for matching documents/requests, and only when necessary.
Comment on attachment 8867573 [details]
Bug 1364768: Part 6 - Use startup cache for initial extension permission data.

https://reviewboard.mozilla.org/r/139116/#review147170

> We clear all of the cache data for an extension after it's uninstalled, so the data gets removed from the cache anyway.

It gets cleared from the cache but not from `permissions.json`
Comment on attachment 8872197 [details]
Bug 1364768: Part 8 - Wait for delayed startup before loading most background pages.

https://reviewboard.mozilla.org/r/143656/#review148088

::: commit-message-4f033:3
(Diff revision 1)
> +We shouldn't actually need to load background pages until fairly late, except
> +when blocking web request listeners are used.

Hm, I'm not so sure about this.  An extension that uses non-blocking webRequest might still want to have a chance to see everything at startup.  For that matter, any extensions that use events might want to see events that happen during browser startup.

I don't think we actually handle this correctly today though so that's probably out of scope for this bug...
Comment on attachment 8872197 [details]
Bug 1364768: Part 8 - Wait for delayed startup before loading most background pages.

https://reviewboard.mozilla.org/r/143656/#review148090

I'm fine deferring (har har) the discussion about exactly when we delay background page creation.
Attachment #8872197 - Flags: review?(aswan) → review+
Is this patch going to land for 55?

Screenshots is seeing (sometimes fatal) IndexedDB errors that we were hoping would be resolved when this lands, see https://github.com/mozilla-services/screenshots/issues/2911
Flags: needinfo?(kmaglione+bmo)
No. I was initially hoping to get this landed before the freeze, but after rebasing there were some new test failures that I didn't have time to resolve.

However, the fatal startup errors were likely fixed by the patches for bug 1368152.
Flags: needinfo?(kmaglione+bmo)
> However, the fatal startup errors were likely fixed by the patches for bug 1368152.

Good to know! That's what I was after anyway :-)
Is it related to #1357773?
Keywords: leave-open
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee7cc8ce4a22ece5d5e3c41051081b8f554984c8
Bug 1364768: Part 1 - Add NetUtil.readInputStream helper. r=aswan

https://hg.mozilla.org/integration/mozilla-inbound/rev/a52d8db889321df1f9e0576801bc6629b85b0260
Bug 1364768: Part 2 - Add AsyncShutdown finalizer support to DeferredSave. r=rhelmer

https://hg.mozilla.org/integration/mozilla-inbound/rev/889cdd7588986fa4b271f6287b4cdb50be1a8e77
Bug 1364768: Part 3 - Add helpers to serialize and deserialize compressed structured clone blobs. r=rhelmer

https://hg.mozilla.org/integration/mozilla-inbound/rev/46aae63fae9175b8bc6ab0775e3c2d845ef7fc54
Bug 1364768: Part 4 - Switch to a compressed, binary flat file for startup cache. r=aswan

https://hg.mozilla.org/integration/mozilla-inbound/rev/5637dcecb9801f5ba82ef3e4bc2444c708c05985
Bug 1364768: Part 5 - Store the list of available locales in the startup cache. r=aswan

https://hg.mozilla.org/integration/mozilla-inbound/rev/9dd2156225d334fc9c164017b7c5068273bfb7ca
Bug 1364768: Part 6 - Use startup cache for initial extension permission data. r=aswan
I'll file a follow-up for the last part.
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
See Also: → 1388266
Product: Toolkit → WebExtensions
Blocks: 1631499
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: