Closed Bug 1454378 Opened 6 years ago Closed 6 years ago

Loading the blocklist is janky

Categories

(Toolkit :: Blocklist Implementation, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: florian, Assigned: Gijs)

References

Details

Attachments

(5 files)

Despite all the work that went into reducing the size of the blocklist, loading it (off of an idle callback at the end of startup) is still janky.

Here is a profile of it on a very slow netbook, captured on yesterday's nightly: https://perfht.ml/2J1fEGh

The straight forward fix here would be to make _loadBlocklistFromXML do its work in chunks. This may be wontfix if we are confident we'll switch this to indexedDB and completely away from the xml files soon. I'm filing this mostly to put the profile link somewhere for future reference.
I'm working on bug 1257565 so we should just pursue that for 62.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
The patches in that trypush:
- convert the last few sync callsites so all of them are async
- switch to using XHR instead of OS.File + mainthread manual DOMParser invocation. Hopefully this means we do the XML parsing off-mainthread, and/or are more sensible about how to chunk it / stream it / whatever. Gecko solves these problems for a living so we probably can't do better by hacking something up in JS.
- reduce the processing time for the XML by roughly 30% on my machine, through various minor performance improvements that include:
-- switching from 'childNodes' to 'children' and avoiding all the node type checks
-- caching things off appinfo
-- not requesting the same attribute first with hasAttribute then with getAttribute
-- avoiding accessing the same DOM things repeatedly for the same node
-- doing some inlining
-- etc. etc.
I don't think it's become much more ugly (though TBH the code wasn't that pretty in the first place).

Then I decided that was still not enough so I added chunking to the processing. I expect the chunking to break some tests but try is going to be the simplest way of finding out which ones...
Assignee: nobody → gijskruitbosch+bugs
Status: REOPENED → ASSIGNED
Comment on attachment 8985696 [details]
Bug 1454378 - remove last sync XML blocklist reading code,

https://reviewboard.mozilla.org/r/251244/#review258460

Nice! :-)

::: toolkit/mozapps/extensions/Blocklist.jsm:837
(Diff revision 1)
> -    let text = await OS.File.read(path, { encoding: "utf-8" });
> +    let xmlDoc = await new Promise((resolve, reject) => {
> +      let request = new XMLHttpRequest();
> +      request.open("GET", Services.io.newFileURI(file).spec, true);
> +      request.overrideMimeType("text/xml");
> +      request.addEventListener("error", function(err) {
> +        reject(err);

Can this be simplified to request.addEventListener("error", reject); ?

::: toolkit/mozapps/extensions/Blocklist.jsm:849
(Diff revision 1)
> +          return;
> +        }
> +        let doc = request.responseXML;
> +        if (doc.documentElement.namespaceURI != XMLURI_BLOCKLIST) {
> +          LOG("Blocklist::_loadBlocklistFromString: aborting due to incorrect " +
> +              "XML Namespace.\r\nExpected: " + XMLURI_BLOCKLIST + "\r\n" +

You are just moving this, but what are the two \r characters for?

::: toolkit/mozapps/extensions/test/xpcshell/test_blocklist_gfx.js:61
(Diff revision 1)
>    " </gfxBlacklistEntry>" +
>    "</gfxItems>" +
>    "</blocklist>";
>    const blocklist = getBlocklist();
> -  blocklist._loadBlocklistFromString(input);
> +
> +  blocklist._loadBlocklistFromXML(gParser.parseFromString(input, "text/xml"));

We have this 5 times in the test file, how about a getBlocklistFromString helper that would call getBlocklist and then do the blocklist._loadBlocklistFromXML(gParser.parseFromString(..., "text/xml")) dance?
Attachment #8985696 - Flags: review?(florian) → review+
Comment on attachment 8985697 [details]
Bug 1454378 - switch to for...of loops and .children instead of manual .item() calls,

https://reviewboard.mozilla.org/r/251246/#review258494

Looks good to me!

::: toolkit/mozapps/extensions/Blocklist.jsm:911
(Diff revision 1)
>    },
>  
> -  _processItemNodes(itemNodes, itemName, handler) {
> +  _processItemNodes(items, itemName, handler) {
>      var result = [];
> -    for (var i = 0; i < itemNodes.length; ++i) {
> -      var blocklistElement = itemNodes.item(i);
> +    for (let item of items) {
> +      if (item.localName != itemName)

nit: It's tempting to reverse the test to simplify by removing the 'continue;' line.

::: toolkit/mozapps/extensions/Blocklist.jsm:949
(Diff revision 1)
> -      if (childElement.nodeType != childElement.ELEMENT_NODE)
> -        continue;
>        if (childElement.localName === "prefs") {
> -        let prefElements = childElement.childNodes;
> -        for (let i = 0; i < prefElements.length; i++) {
> -          let prefElement = prefElements.item(i);
> +        let prefElements = childElement.children;
> +        for (let prefElement of prefElements) {
> +          if (prefElement.localName !== "pref")

same here

::: toolkit/mozapps/extensions/Blocklist.jsm:980
(Diff revision 1)
>        blockID: null,
>        infoURL: null,
>      };
>      var hasMatch = false;
> -    for (var x = 0; x < matchNodes.length; ++x) {
> -      var matchElement = matchNodes.item(x);
> +    for (let childElement of children) {
> +      if (childElement.localName == "match") {

nit: should we put the localName in a local variable to avoid calling the getter up to 3 times?

::: toolkit/mozapps/extensions/Blocklist.jsm:1531
(Diff revision 1)
>    getBlocklistAppVersions(targetAppElement) {
>      var appVersions = [ ];
>  
>      if (targetAppElement) {
> -      for (var i = 0; i < targetAppElement.childNodes.length; ++i) {
> -        var versionRangeElement = targetAppElement.childNodes.item(i);
> +      for (let versionRangeElement of targetAppElement.children) {
> +        if (versionRangeElement.localName != "versionRange")

nit: this can also be reversed to remove the continue.
Attachment #8985697 - Flags: review?(florian) → review+
Comment on attachment 8985698 [details]
Bug 1454378 - make gfx blocklist processing faster,

https://reviewboard.mozilla.org/r/251248/#review258500

::: toolkit/mozapps/extensions/Blocklist.jsm:1203
(Diff revision 1)
> +      "feature", "featureStatus", "hardware", "manufacturer", "model", "os", "osversion",
> +      "product", "vendor", "versionRange",
> +    ];
>      // Notify `GfxInfoBase`, by passing a string serialization.
>      // This way we avoid spreading XML structure logics there.
> -    const payload = this._gfxEntries.map((r) => {
> +    let payload = "";

Are you sure that reimplementing Array.join() by hand is faster than pushing items to an array and calling join at the end?

I'm guessing it's the Object.keys(r).sort().filter((k) => !/id|last_modified/.test(k)) and the 2 map calls that you found slow and wanted to speed up.
Attachment #8985698 - Flags: review?(florian)
(In reply to Florian Quèze [:florian] from comment #12)
> Comment on attachment 8985698 [details]
> Bug 1454378 - make gfx blocklist processing faster,
> 
> https://reviewboard.mozilla.org/r/251248/#review258500
> 
> ::: toolkit/mozapps/extensions/Blocklist.jsm:1203
> (Diff revision 1)
> > +      "feature", "featureStatus", "hardware", "manufacturer", "model", "os", "osversion",
> > +      "product", "vendor", "versionRange",
> > +    ];
> >      // Notify `GfxInfoBase`, by passing a string serialization.
> >      // This way we avoid spreading XML structure logics there.
> > -    const payload = this._gfxEntries.map((r) => {
> > +    let payload = "";
> 
> Are you sure that reimplementing Array.join() by hand is faster than pushing
> items to an array and calling join at the end?

Well, the microbenchmarks I found seem to suggest that is the case, yes: https://jsperf.com/join-concat/2 (when run on my mbp, the array code is 20% slower). And when I ran this code (well, the entirety of the blocklist processing) in a loop and profiled it, the Array map/sort/filter/regex/join stuff showed up, which is why I simplified it, reducing the number of times we iterate over the list.

> I'm guessing it's the Object.keys(r).sort().filter((k) =>
> !/id|last_modified/.test(k)) and the 2 map calls that you found slow and
> wanted to speed up.

Sure, the sort/filter and double map was the slowest, but if we don't keep the map(), using an array for the string stuff didn't seem very natural, either.

Really, the whole thing is a bit stupid because we basically transform this stuff 3 times:

1) from xml to some JS objects
2) from JS objects to strings
3) from strings to C++ objects

I would have refactored this some more, but (a) some of this will go away when we move to kinto, and (b) from talking to the gfx team at allhands, we mostly migrate blocklist items into a compiled-in list once we land them - we just use the blocklist service version to do on-the-fly updates, so the list should be reasonably small most of the time. Based on that, I'm tempted to talk to them about making this list smaller, but IIRC from looking at profiles, this code takes 10% or less of the time that the add-on processing code takes, so optimizing it further might not be all that valuable anyway.


In any case, you didn't specify a clear action for me to take here - did you want me to re-introduce the arrays for the string concatenation? Or something else?
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] from comment #11)
> Comment on attachment 8985697 [details]

> ::: toolkit/mozapps/extensions/Blocklist.jsm:980
> (Diff revision 1)
> >        blockID: null,
> >        infoURL: null,
> >      };
> >      var hasMatch = false;
> > -    for (var x = 0; x < matchNodes.length; ++x) {
> > -      var matchElement = matchNodes.item(x);
> > +    for (let childElement of children) {
> > +      if (childElement.localName == "match") {
> 
> nit: should we put the localName in a local variable to avoid calling the
> getter up to 3 times?

Now I see you are moving this to a switch() in another patch, which is even nicer. So ignore this comment.
Comment on attachment 8985699 [details]
Bug 1454378 - cache and inline things, avoid duplicate attribute or property requests, to make blocklist faster,

https://reviewboard.mozilla.org/r/251250/#review258508

::: toolkit/mozapps/extensions/Blocklist.jsm:532
(Diff revision 1)
>        return null;
>      // Returns true if the params object passes the constraints set by entry.
>      // (For every non-null property in entry, the same key must exist in
>      // params and value must be the same)
>      function checkEntry(entry, params) {
> -      for (let [key, value] of entry) {
> +      for (let [key, value] of Object.entries(entry)) {

Could you please explain this change?

::: toolkit/mozapps/extensions/Blocklist.jsm:608
(Diff revision 1)
>      if (pingCountVersion < 1)
>        pingCountVersion = 1;
>      if (pingCountTotal < 1)
>        pingCountTotal = 1;
>  
> -    dsURI = dsURI.replace(/%APP_ID%/g, gApp.ID);
> +    dsURI = dsURI.replace(/%APP_ID%/g, gAppID);

These chained .replace(/%NAME%/g, ...) calls can be pretty slow, especially if it's in a hot code path. See bug 1356593 for an example of a faster implementation that uses a single regexp and gets the values lazily.

::: toolkit/mozapps/extensions/Blocklist.jsm:927
(Diff revision 1)
>        // Atleast one of EXTENSION_BLOCK_FILTERS must get added to attributes
>      };
>  
>      // Any filter starting with '/' is interpreted as a regex. So if an attribute
>      // starts with a '/' it must be checked via a regex.
>      function regExpCheck(attr) {

Why are you inlining parseRegExp but not regExpCheck (which is also called only once)?

::: toolkit/mozapps/extensions/Blocklist.jsm:950
(Diff revision 1)
>      for (let childElement of children) {
> -      if (childElement.localName === "prefs") {
> +      let localName = childElement.localName;
> +      if (localName == "prefs" && childElement.hasChildNodes) {
>          let prefElements = childElement.children;
>          for (let prefElement of prefElements) {
>            if (prefElement.localName !== "pref")

nit: change !== to != while you are at it, or maybe even to == and remove the 'continue;'?

::: toolkit/mozapps/extensions/Blocklist.jsm:1574
(Diff revision 1)
>     *        A JS object with the following properties:
>     *          "minVersion"  The minimum version in a version range (default = null).
>     *          "maxVersion"  The maximum version in a version range (default = null).
>     */
>    getBlocklistVersionRange(versionRangeElement) {
> -    var minVersion = null;
> +    // getAttribute returns null if the attribute is not present.

Isn't it returning "" for non-existent attributes?
Attachment #8985699 - Flags: review?(florian)
(In reply to Florian Quèze [:florian] from comment #15)
> Comment on attachment 8985699 [details]
> Bug 1454378 - cache and inline things, avoid duplicate attribute or property
> requests, to make blocklist faster,
> 
> https://reviewboard.mozilla.org/r/251250/#review258508
> 
> ::: toolkit/mozapps/extensions/Blocklist.jsm:532
> (Diff revision 1)
> >        return null;
> >      // Returns true if the params object passes the constraints set by entry.
> >      // (For every non-null property in entry, the same key must exist in
> >      // params and value must be the same)
> >      function checkEntry(entry, params) {
> > -      for (let [key, value] of entry) {
> > +      for (let [key, value] of Object.entries(entry)) {
> 
> Could you please explain this change?

I switched the `entry` thing to be a plain JS object instead of a Map (elsewhere in this cset), because creating a Map instance is slow and was showing up in profiles. To use `for [k,v] of` with a plain JS object, you need Object.entries. Or were you trying to ask a different question?

> ::: toolkit/mozapps/extensions/Blocklist.jsm:927
> (Diff revision 1)
> >        // Atleast one of EXTENSION_BLOCK_FILTERS must get added to attributes
> >      };
> >  
> >      // Any filter starting with '/' is interpreted as a regex. So if an attribute
> >      // starts with a '/' it must be checked via a regex.
> >      function regExpCheck(attr) {
> 
> Why are you inlining parseRegExp but not regExpCheck (which is also called
> only once)?

I didn't think to, I guess we could.

> ::: toolkit/mozapps/extensions/Blocklist.jsm:1574
> (Diff revision 1)
> >     *        A JS object with the following properties:
> >     *          "minVersion"  The minimum version in a version range (default = null).
> >     *          "maxVersion"  The maximum version in a version range (default = null).
> >     */
> >    getBlocklistVersionRange(versionRangeElement) {
> > -    var minVersion = null;
> > +    // getAttribute returns null if the attribute is not present.
> 
> Isn't it returning "" for non-existent attributes?

No, that only happens in XUL.
(In reply to :Gijs (he/him) from comment #13)
> (In reply to Florian Quèze [:florian] from comment #12)
> > Comment on attachment 8985698 [details]
> > Bug 1454378 - make gfx blocklist processing faster,
> > 
> > https://reviewboard.mozilla.org/r/251248/#review258500
> > 
> > ::: toolkit/mozapps/extensions/Blocklist.jsm:1203
> > (Diff revision 1)
> > > +      "feature", "featureStatus", "hardware", "manufacturer", "model", "os", "osversion",
> > > +      "product", "vendor", "versionRange",
> > > +    ];
> > >      // Notify `GfxInfoBase`, by passing a string serialization.
> > >      // This way we avoid spreading XML structure logics there.
> > > -    const payload = this._gfxEntries.map((r) => {
> > > +    let payload = "";
> > 
> > Are you sure that reimplementing Array.join() by hand is faster than pushing
> > items to an array and calling join at the end?
> 
> Well, the microbenchmarks I found seem to suggest that is the case, yes:
> https://jsperf.com/join-concat/2 (when run on my mbp, the array code is 20%
> slower).

I ran this out of curiosity on my mbp too and we seem to have pretty different results, for me the concat version is 20% slower: https://screenshots.firefox.com/SPnSfkUeFfjjlvIZ/jsperf.com

Anyway, in bug 1355874 comment 8, nbp convinced me that I don't want to use this kind of micro benchmarks to make decisions.

> if we don't keep
> the map(), using an array for the string stuff didn't seem very natural,
> either.

The code in your current patch looks like this:
  let entryLine = "";
  for (...) {
    if (entryLine) {
      entryLine += "\t";
    }
    entryLine += value;
  }
  payload += entryLine;

I think this looks nicer:
  let entryLines = [];
  for (...) {
    entryLine.push(value);
  }
  payload += entryLines.join("\t");

It's shorter especially as you use the result of the join call only once, so you don't need to save it in a temporary variable.

> In any case, you didn't specify a clear action for me to take here - did you
> want mea to re-introduce the arrays for the string concatenation? Or
> something else?

I would prefer joining arrays yes, but I wanted to be sure there wasn't a solid perf-related reason for implementing the joins by hand that I had missed.
Flags: needinfo?(florian)
(In reply to :Gijs (he/him) from comment #16)

> > > -      for (let [key, value] of entry) {
> > > +      for (let [key, value] of Object.entries(entry)) {
> > 
> > Could you please explain this change?
> 
> I switched the `entry` thing to be a plain JS object instead of a Map
> (elsewhere in this cset), because creating a Map instance is slow and was
> showing up in profiles. To use `for [k,v] of` with a plain JS object, you
> need Object.entries. Or were you trying to ask a different question?

Thanks, no other question here :-). I just couldn't figure this out quickly enough and figured I would just ask.

> > > +    // getAttribute returns null if the attribute is not present.
> > 
> > Isn't it returning "" for non-existent attributes?
> 
> No, that only happens in XUL.

TIL :-)
Comment on attachment 8985700 [details]
Bug 1454378 - chunk blocklist processing so it doesn't hang the main thread continuously,

https://reviewboard.mozilla.org/r/251252/#review258526

::: toolkit/mozapps/extensions/Blocklist.jsm:912
(Diff revision 1)
>        if (item.localName != itemName)
>          continue;
>  
>        handler(item, result);
> +      if (Date.now() - now > MAX_SLICE) {
> +        await new Promise(ChromeUtils.idleDispatch);

Is there any chance of using the .timeRemaining() method of the parameter given to the idleDispatch callback instead of hardcoding 100ms and calling Date.now()? See https://searchfox.org/mozilla-central/rev/d0a41d2e7770fc00df7844d5f840067cc35ba26f/browser/components/sessionstore/content/content-sessionStore.js#846 for an example.
Attachment #8985700 - Flags: review?(florian)
Comment on attachment 8985696 [details]
Bug 1454378 - remove last sync XML blocklist reading code,

https://reviewboard.mozilla.org/r/251244/#review258460

> You are just moving this, but what are the two \r characters for?

"\r\n" is a windows-style line-end. I don't think it matters much here, but I've modified to `\n` instead.
Comment on attachment 8985696 [details]
Bug 1454378 - remove last sync XML blocklist reading code,

https://reviewboard.mozilla.org/r/251244/#review259452

::: toolkit/mozapps/extensions/Blocklist.jsm:375
(Diff revision 2)
> -          this._loadBlocklist();
> -          this._blocklistUpdated(null, null);
> +          // This is a bit messy. Especially in tests, but in principle also by users toggling
> +          // this preference repeatedly, plugin loads could race with each other if we don't
> +          // enforce that they are applied sequentially.
> +          // So we only update once the previous `_blocklistUpdated` call finishes running.
> +          let lastUpdate = this._lastUpdate;
> +          let newUpdate = this._lastUpdate = (async () => {
> +            await lastUpdate;
> +            this._clear();
> +            await this.loadBlocklistAsync();
> +            await this._blocklistUpdated(null, null);
> +            if (newUpdate == this._lastUpdate) {
> +              delete this._lastUpdate;
> +            }
> +          })().catch(Cu.reportError);

Note that I changed this a bit, so re-requesting review because of that.
Attachment #8985696 - Flags: review+ → review?(florian)
Comment on attachment 8985696 [details]
Bug 1454378 - remove last sync XML blocklist reading code,

https://reviewboard.mozilla.org/r/251244/#review259642

::: toolkit/mozapps/extensions/Blocklist.jsm:379
(Diff revisions 1 - 2)
>            gBlocklistEnabled = Services.prefs.getBoolPref(PREF_BLOCKLIST_ENABLED, true);
>            // This is a bit messy. Especially in tests, but in principle also by users toggling
>            // this preference repeatedly, plugin loads could race with each other if we don't
>            // enforce that they are applied sequentially.
>            // So we only update once the previous `_blocklistUpdated` call finishes running.
> -          (this._lastUpdate || Promise.resolve()).then(() => {
> +          let lastUpdate = this._lastUpdate;

I think we need a '|| undefined' at the end of this line to avoid a "ReferenceError: reference to undefined property" strict warning.
Attachment #8985696 - Flags: review?(florian) → review+
Comment on attachment 8985698 [details]
Bug 1454378 - make gfx blocklist processing faster,

https://reviewboard.mozilla.org/r/251248/#review259660
Attachment #8985698 - Flags: review?(florian) → review+
Comment on attachment 8985699 [details]
Bug 1454378 - cache and inline things, avoid duplicate attribute or property requests, to make blocklist faster,

https://reviewboard.mozilla.org/r/251250/#review259674

::: toolkit/mozapps/extensions/Blocklist.jsm:625
(Diff revision 2)
> +      PING_COUNT: pingCountVersion,
> +      TOTAL_PING_COUNT: pingCountTotal,
> +      DAYS_SINCE_LAST_PING: daysSinceLastPing,
> +    };
> +    dsURI = dsURI.replace(/%([A-Z_]+)%/g, function(fullMatch, name) {
> +      Cu.reportError("Got replacement call: " + name);

Please remove this leftover debug line.
Attachment #8985699 - Flags: review?(florian) → review+
Comment on attachment 8985700 [details]
Bug 1454378 - chunk blocklist processing so it doesn't hang the main thread continuously,

https://reviewboard.mozilla.org/r/251252/#review259688

Thanks!
Attachment #8985700 - Flags: review?(florian) → review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/6b8476e1423a
remove last sync XML blocklist reading code, r=florian
https://hg.mozilla.org/integration/autoland/rev/118b2b50e94a
switch to for...of loops and .children instead of manual .item() calls, r=florian
https://hg.mozilla.org/integration/autoland/rev/48f065cc3a34
make gfx blocklist processing faster, r=florian
https://hg.mozilla.org/integration/autoland/rev/155961fbe92c
cache and inline things, avoid duplicate attribute or property requests, to make blocklist faster, r=florian
https://hg.mozilla.org/integration/autoland/rev/1c235a552c32
chunk blocklist processing so it doesn't hang the main thread continuously, r=florian
Blocks: 1472163
Hi, I'm new to the Mozilla projects but would like to fix this bug. Feel free to assign me.
Component: Blocklist Policy Requests → Blocklist Implementation
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: