Open Bug 1938506 Opened 28 days ago Updated 3 days ago

0.1% installer size (Windows) regression on Tue December 17 2024

Categories

(Toolkit :: Blocklist Implementation, defect, P1)

Tracking

ASSIGNED
Tracking Status
firefox-esr128 --- fix-optional
firefox133 --- wontfix
firefox134 --- fix-optional
firefox135 --- fix-optional

People

(Reporter: intermittent-bug-filer, Assigned: robwu)

Details

(Keywords: perf-alert, regression, Whiteboard: [addons-jira])

Perfherder has detected a build_metrics performance regression from push 1938560610a1d0ab8716d0a337e5f4dcafc61e9f. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
0.10% installer size windows2012-64 101,929,773.50 -> 102,028,444.25

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

You can run all of these tests on try with ./mach try perf --alert 43157

The following documentation link provides more information about this command.

For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to afinder@mozilla.com.

Flags: needinfo?(afinder)

(In reply to Treeherder Bug Filer from comment #0)

Perfherder has detected a build_metrics performance regression from push 1938560610a1d0ab8716d0a337e5f4dcafc61e9f.

That changeset was a test-only change, so it couldn't have caused this unless something went really wrong.

The other changeset in that landing was 1033685b7f6dcca657f68af25a71a1f6cf2c67f5, which in its regular updates also has the new binary bloomfilter attachments. So this is probably a regression from bug 1922308.

Flags: needinfo?(standard8)
Regressed by: 1922308

This is not a regression of bug 1922308.

The main relevant change from that bug is softblocks-addons-mlbf.bin.meta.json, whose size changed from 4009 to 9413.

There is another change:

file: addons-mlbf.bin.meta.json
old size: 845025
new size: 931343

These increases are not minor. The same commit shows that the number of removed JSON entries (that get compressed and included in the bloomfilter) is relatively small. This increase in data size implies that either many additional items are being included (= add-ons being blocked) or there is a bug in the bloomfilter generation.

I'll raise this within my team.

No longer regressed by: 1922308

(In reply to Rob Wu [:robwu] from comment #3)

Hi Rob!

Any updates on this regression? It looks like it has propagated downstream to mozilla-beta.

Thanks!

Flags: needinfo?(rob)

This is not a Firefox-side regression. The addons-mlbf.bin file is fetched from Remote Settings by periodic_file_updates.sh. This script runs on all branches, so the binary size increase will show up on Nightly, Beta, DevEd, Release, ESR128 and ESR115. Only Firefox desktop is affected; the mobile browsers are not, because we do not package these Remote Settings dumps there out of size concerns.

The addons-mlbf.bin is a multi-layered bloom filter (aka cascade filter) that compactly represents the set of blocked add-ons among the full set of signed add-ons (at the time of bloom filter generation).

This remote settings attachment is updated by AMO, with generation logic at:

To verify the bloomfilter generation, the following inputs are needed:

  • The include and exclude values, i.e. the set of all add-on versions, partitioned by hard blocked (=include) and not hard blocked (=exclude). All entries should be formatted as ${addon.guid}:${addon.version}.
  • The salt used in the bloom filter. I can extract it from the start of the addons-mlbf.bin file (see filtercascade file format). The history of past dumps can be found in the git log of addons-mlbf.bin. A human-readable overview of size changes is visible in the addons-mlbf.bin.meta.json file that is included with each update (git log of addons-mlbf.bin.meta.json).
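As an illustration of the first input, here is a minimal Python sketch of how the include and exclude sets could be assembled in the `${addon.guid}:${addon.version}` format. The function names, the tuple layout and the sample GUIDs are hypothetical and are not taken from addons-server; sets are used so that repeated entries collapse into one.

```python
from typing import Iterable, Set, Tuple


def mlbf_key(guid: str, version: str) -> str:
    """Format one add-on version as guid:version."""
    return f"{guid}:{version}"


def partition(
    versions: Iterable[Tuple[str, str, bool]],
) -> Tuple[Set[str], Set[str]]:
    """Split (guid, version, hard_blocked) rows into include/exclude sets."""
    include: Set[str] = set()
    exclude: Set[str] = set()
    for guid, version, hard_blocked in versions:
        target = include if hard_blocked else exclude
        target.add(mlbf_key(guid, version))
    return include, exclude


# Hypothetical sample data; the last row repeats the first one.
sample = [
    ("example@addon.invalid", "1.0", True),
    ("example@addon.invalid", "1.1", False),
    ("other@addon.invalid", "2.0", False),
    ("example@addon.invalid", "1.0", True),
]
include, exclude = partition(sample)
print(len(include), len(exclude))
```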

Although I don't know the inputs at the time of the generation, I am able to parse the old and new bloom filters to run an analysis:

                                          new                     old                     older                   oldest
date                                      17 Dec 2024             7 Nov 2024              26 Aug 2024             1 Jun 2020
addons-mlbf.bin.meta.json                 new meta.json           old meta.json           older meta.json         oldest meta.json
addons-mlbf.bin                           new addons-mlbf.bin     old addons-mlbf.bin     older addons-mlbf.bin   oldest addons-mlbf.bin
size of addons-mlbf.bin                   931343 bytes            845025 bytes            841024 bytes            787677 bytes
size of first layer                       608252 bytes            379564 bytes            374473 bytes            318293 bytes
size of second layer                      66032 bytes             142117 bytes            142526 bytes            145398 bytes
size of third layer                       105102 bytes            105040 bytes            105327 bytes            103894 bytes
size of fourth layer                      31003 bytes             66531 bytes             66732 bytes             67962 bytes
number of hash functions at first layer   3                       2                       2                       2
number of hash functions at other layers  1                       1                       1                       1
number of layers                          24                      25                      24                      24
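To make the changes between the "old" and "new" columns easier to compare, here is a small Python sketch that computes the byte difference and growth factor per row. The numbers are copied from the table above; the dictionary layout is only for illustration.

```python
# (old, new) sizes in bytes, copied from the "old" and "new" columns.
sizes = {
    "addons-mlbf.bin": (845025, 931343),
    "first layer": (379564, 608252),
    "second layer": (142117, 66032),
    "third layer": (105040, 105102),
    "fourth layer": (66531, 31003),
}

# Map each row to (difference in bytes, new/old ratio).
changes = {
    name: (new - old, new / old) for name, (old, new) in sizes.items()
}

for name, (diff, ratio) in changes.items():
    print(f"{name:18} {diff:+8d} bytes  x{ratio:.2f}")
```

The first layer accounts for the growth, while the second and fourth layers actually shrank.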

The interesting observation is that in the latest bloom filter the first layer is about 1.6 times as large (608252 vs. 379564 bytes) and uses 3 hash functions instead of 2.
These sizes are directly derived from the input, with the source code at filtercascade's calc_n_hashes and calc_size functions. Although the size is dependent on the number of elements (i.e. from the "include" set) and the falsePositiveRate, the calc_n_hashes function takes only one input:

number of hash functions at first layer = ceil( log2(1 / falsePositiveRate) )

From this, my conclusion is that the falsePositiveRate dropped so much that the result of the computation went from 2 to 3. That implies that falsePositiveRate went below 2 ^ -2.5 = 0.1767766952966369. EDIT: -2, not -2.5 - see comment 7 below.
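A quick way to check where the result flips is to evaluate the quoted formula directly. The sketch below is a Python transcription of the formula as written in this comment, not a copy of filtercascade's calc_n_hashes. It shows that, because of the ceil, the switch from 2 to 3 happens as soon as the rate drops below 0.25, which matches the EDIT note above.

```python
import math


def n_hashes(false_positive_rate: float) -> int:
    """ceil(log2(1 / falsePositiveRate)), per the formula quoted above."""
    return math.ceil(math.log2(1 / false_positive_rate))


# 0.5 is the rate used at the other layers; 0.176777 is close to 2^-2.5.
for fpr in (0.5, 0.3, 0.25, 0.2, 0.176777):
    print(fpr, n_hashes(fpr))
```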

The falsePositiveRate is computed by set_crlite_error_rates, as:

  • falsePositiveRate at first layer = include_len / (sqrt(2) * exclude_len)
  • falsePositiveRate at other layers = 0.5 (implying number of hash functions at every other layer is 1, because log2(1 / 0.5) = 1)
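The two rules above can be written down as a short Python sketch. This only transcribes the formulas from this comment; it is not the set_crlite_error_rates source, and the counts in the example are made up.

```python
import math

# Fixed rate for every layer after the first, as stated above.
OTHER_LAYERS_FPR = 0.5


def first_layer_fpr(include_len: int, exclude_len: int) -> float:
    """include_len / (sqrt(2) * exclude_len), per the first bullet."""
    return include_len / (math.sqrt(2) * exclude_len)


# Hypothetical counts, not production numbers.
print(first_layer_fpr(100_000, 250_000))
print(first_layer_fpr(100_000, 400_000))
```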

Since we know that the falsePositiveRate crossed 2 ^ -2.5, we can therefore compute the threshold ratio include_len / exclude_len:

include_len / exclude_len
  = sqrt(2) * falsePositiveRate
  = sqrt(2) * 2^-2.5
  = 0.25    (or lower in the new bloom filter)

Translated back to the original application, this means that for every blocked add-on (=include), there are 4 or more non-blocked add-ons (=exclude). Or equivalently, the relative number of blocked add-ons among all add-ons dropped below 20%. The most likely explanation is that the total number of add-ons increased past this threshold. For reference, in 2020 the number was closer to 33%, according to this comment in addons#7492.

Given this analysis, it looks like a way to immediately improve the space usage is to fix falsePositiveRate above 2^-2.5, e.g. to 0.176777, so that number of hash functions at first layer is fixed at 2. That can be achieved by removing set_crlite_error_rates(include_len=error_rates[0], exclude_len=error_rates[1]) in favor of cascade.set_error_rates([0.176777, 0.5]) at https://github.com/mozilla/addons-server/blob/0f718e347cde838085c9f8b2f5eec8fb45f125b4/src/olympia/blocklist/mlbf.py#L55-L57

The above magic number is specific to the current numbers on production. Given the benefits of smaller sizes, it may be worth trying to generate different bloom filters with different parameters, and taking the smallest result out of the attempts. I'll file a follow-up task for the addons-server repo.
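The idea of trying several parameters and keeping the smallest output could look roughly like the Python sketch below. Everything here is hypothetical: `build` stands in for whatever function would generate and serialize a filter for a given first-layer rate, and `fake_build` with its sizes is a stub so the example runs on its own.

```python
from typing import Callable, Iterable, Tuple


def smallest_filter(
    first_layer_fprs: Iterable[float],
    build: Callable[[float], bytes],
) -> Tuple[float, bytes]:
    """Build one filter per candidate rate and return the smallest one."""
    candidates = [(fpr, build(fpr)) for fpr in first_layer_fprs]
    return min(candidates, key=lambda pair: len(pair[1]))


def fake_build(fpr: float) -> bytes:
    """Stub builder returning a blob of a made-up size for each rate."""
    fake_sizes = {0.5: 900, 0.25: 850, 0.125: 930}
    return bytes(fake_sizes[fpr])


best_fpr, best_blob = smallest_filter([0.5, 0.25, 0.125], fake_build)
print(best_fpr, len(best_blob))
```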

Component: Performance → Blocklist Implementation
Flags: needinfo?(rob)
Product: Testing → Toolkit
Whiteboard: [addons-jira]

I filed https://github.com/mozilla/addons/issues/15261

I'll keep this bug open until a new MLBF has been generated and published.

Assignee: nobody → rob
Status: NEW → ASSIGNED
Severity: -- → S4
Priority: -- → P1

(In reply to Rob Wu [:robwu] from comment #5)

number of hash functions at first layer = ceil( log2(1 / falsePositiveRate) )

From this, my conclusion is that the falsePositiveRate dropped so much that the result of the computation went from 2 to 3. That implies that falsePositiveRate went below 2 ^ -2.5 = 0.1767766952966369.

Correction: since the number is rounded up instead of rounded to the nearest integer, the correct conclusion is that falsePositiveRate went below 2 ^ -2 (i.e. 0.25). The ratio therefore dropped below the following threshold, causing the number of hash functions at the first layer to grow from 2 to 3:

include_len / exclude_len
  = sqrt(2) * falsePositiveRate
  = sqrt(2) * 0.25
  = 0.353553390593    (or lower in the new bloom filter)
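The corrected threshold can be checked numerically. The Python sketch below combines the two formulas quoted in this bug; the function name and the sample counts are illustrative only. The final line translates the threshold into a share of blocked versions among all versions, a figure derived here that is not stated in the bug.

```python
import math


def n_hashes_first_layer(include_len: int, exclude_len: int) -> int:
    """Hash count at the first layer for a given include/exclude split."""
    fpr = include_len / (math.sqrt(2) * exclude_len)
    return math.ceil(math.log2(1 / fpr))


# Ratio include_len / exclude_len at which falsePositiveRate equals 0.25.
threshold_ratio = math.sqrt(2) * 0.25

# Illustrative counts just above and just below the threshold.
print(n_hashes_first_layer(36, 100))
print(n_hashes_first_layer(35, 100))

# Derived here: include / (include + exclude) at the threshold.
blocked_share = threshold_ratio / (1 + threshold_ratio)
print(round(threshold_ratio, 6), round(blocked_share, 4))
```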

Given this analysis, it looks like a way to immediately improve the space usage is to fix falsePositiveRate at 0.25, so that the number of hash functions at the first layer is fixed at 2. That can be achieved by removing set_crlite_error_rates(include_len=error_rates[0], exclude_len=error_rates[1]) in favor of cascade.set_error_rates([0.25, 0.5]) at https://github.com/mozilla/addons-server/blob/0f718e347cde838085c9f8b2f5eec8fb45f125b4/src/olympia/blocklist/mlbf.py#L55-L57

I have identified the cause of the unexpectedly increased file size, elaborated at https://github.com/mozilla/addons/issues/15261#issuecomment-2584947155 . In short, the generation process duplicated entries. While these do not affect the logical outcome of the data represented by the MLBF, they did result in a larger file size.

The next step here is to fix the error in the generation on the addons-server side.

If for some reason we want to fast-track a reduced file size ASAP, it is possible to replace the existing addons-mlbf.bin (and addons-mlbf.bin.meta.json) file with an equivalent file that has the optimal file size (931343 -> 847859).
