Bug 1938506 Comment 5 Edit History
This is not a Firefox-side regression. The `addons-mlbf.bin` file is fetched from Remote Settings by [`periodic_file_updates.sh`](https://searchfox.org/mozilla-central/rev/bb2e8205177734a33b458abecfe5e47949cdf98a/taskcluster/docker/periodic-updates/scripts/periodic_file_updates.sh#434-447). This script runs on all branches, so the binary size increase will show up on Nightly, Beta, DevEd, Release, ESR128 and ESR115. Only Firefox desktop is affected; the mobile browsers are not, because we do not package these Remote Settings dumps there out of size concerns.

The `addons-mlbf.bin` file is a multi-layered bloom filter (aka cascade filter) that compactly represents the set of blocked add-ons among the full set of signed add-ons (at the time of bloom filter generation). This Remote Settings attachment is updated by AMO, with generation logic at:

- `generate_mlbf`: https://github.com/mozilla/addons-server/blob/ffd993045cef73fb7635e4a94ac95e1a3a9073ec/src/olympia/blocklist/mlbf.py#L41-L74
- `generate_and_write_filter`: https://github.com/mozilla/addons-server/blob/ffd993045cef73fb7635e4a94ac95e1a3a9073ec/src/olympia/blocklist/mlbf.py#L234-L280
- `filtercascade` library: https://github.com/mozilla/filter-cascade/blob/b5469a2d0efaf150e3ce4c04b0d21b4ba2306cf7/filtercascade/__init__.py

To verify the bloom filter generation, the following inputs are needed:

- The `include` and `exclude` values, i.e. the set of all add-on versions, partitioned by hard blocked (=include) and not hard blocked (=exclude). All entries should be formatted as `${addon.guid}:${addon.version}`.
- The salt used in the bloom filter. I can extract it from the start of the `addons-mlbf.bin` file (see [`filtercascade` file format](https://github.com/mozilla/filter-cascade/blob/b5469a2d0efaf150e3ce4c04b0d21b4ba2306cf7/filtercascade/__init__.py#L429-L440)).
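As an illustration of the first input, the partitioning into `include` and `exclude` could look roughly like the sketch below. The `AddonVersion` record, the `partition` helper and the sample GUIDs are hypothetical and only serve the example; the real data comes from AMO and is not reproduced here.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class AddonVersion(NamedTuple):
    """Hypothetical record for one signed add-on version."""
    guid: str
    version: str
    hard_blocked: bool


def partition(versions: Iterable[AddonVersion]) -> Tuple[Set[str], Set[str]]:
    """Split versions into (include, exclude) sets of "guid:version" keys."""
    include: Set[str] = set()
    exclude: Set[str] = set()
    for item in versions:
        key = f"{item.guid}:{item.version}"
        # Hard-blocked versions go to "include", everything else to "exclude".
        (include if item.hard_blocked else exclude).add(key)
    return include, exclude


# Made-up sample data, not real add-ons.
sample = [
    AddonVersion("blocked@example.com", "1.0", True),
    AddonVersion("fine@example.com", "1.0", False),
    AddonVersion("fine@example.com", "2.0", False),
]
include, exclude = partition(sample)
print(len(include), len(exclude))
```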
The history of past dumps can be found in [the git log of `addons-mlbf.bin`](https://github.com/mozilla/gecko-dev/commits/HEAD/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin). A human-readable overview of size changes is visible in the `addons-mlbf.bin.meta.json` file that is included with each update ([git log of `addons-mlbf.bin.meta.json`](https://github.com/mozilla/gecko-dev/commits/HEAD/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin.meta.json)).

Although I don't know the inputs at the time of the generation, I am able to parse the old and new bloom filters to run an analysis:

| | new | old | older | oldest |
| - | - | - | - | - |
| date | 17 dec 2024 | 7 nov 2024 | 26 aug 2024 | 1 jun 2020 |
| addons-mlbf.bin.meta.json | [new meta.json](https://raw.githubusercontent.com/mozilla/gecko-dev/458a470e336a2a61e4d103e8717cf541c65a89d6/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin.meta.json) | [old meta.json](https://raw.githubusercontent.com/mozilla/gecko-dev/b1cd7e191ac57f8dfe31806be2539e51843c03eb/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin.meta.json) | [older meta.json](https://raw.githubusercontent.com/mozilla/gecko-dev/22f25e8e728a63acf1b4ba1e8ff2c227c86c826b/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin.meta.json) | [oldest meta.json](https://raw.githubusercontent.com/mozilla/gecko-dev/12faf950153701076eac10d4e1fe077b6cbd6758/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin.meta.json) |
| addons-mlbf.bin | [new addons-mlbf.bin](https://raw.githubusercontent.com/mozilla/gecko-dev/458a470e336a2a61e4d103e8717cf541c65a89d6/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin) | [old addons-mlbf.bin](https://raw.githubusercontent.com/mozilla/gecko-dev/b1cd7e191ac57f8dfe31806be2539e51843c03eb/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin) | [older addons-mlbf.bin](https://raw.githubusercontent.com/mozilla/gecko-dev/22f25e8e728a63acf1b4ba1e8ff2c227c86c826b/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin) | [oldest addons-mlbf.bin](https://raw.githubusercontent.com/mozilla/gecko-dev/12faf950153701076eac10d4e1fe077b6cbd6758/services/settings/dumps/blocklists/addons-bloomfilters/addons-mlbf.bin) |
| size of addons-mlbf.bin | **931343 bytes** | 845025 bytes | 841024 bytes | 787677 bytes |
| size of first layer | **608252 bytes** | 379564 bytes | 374473 bytes | 318293 bytes |
| size of second layer | 66032 bytes | 142117 bytes | 142526 bytes | 145398 bytes |
| size of third layer | 105102 bytes | 105040 bytes | 105327 bytes | 103894 bytes |
| size of fourth layer | 31003 bytes | 66531 bytes | 66732 bytes | 67962 bytes |
| number of hash functions at first layer | 3 | 2 | 2 | 2 |
| number of hash functions at other layers | 1 | 1 | 1 | 1 |
| number of layers | 24 | 25 | 24 | 24 |

The interesting observation is that the first layer of the latest bloom filter is about 60% larger than before (608252 vs. 379564 bytes) and uses 3 hash functions instead of 2. These sizes are directly derived from the input, with the source code at [filtercascade's `calc_n_hashes` and `calc_size` functions](https://github.com/mozilla/filter-cascade/blob/b5469a2d0efaf150e3ce4c04b0d21b4ba2306cf7/filtercascade/__init__.py#L149-L164). Although the size depends on both the number of elements (i.e. the size of the "include" set) and the `falsePositiveRate`, the `calc_n_hashes` function takes only one input:

`number of hash functions at first layer = ceil( log2(1 / falsePositiveRate) )`

From this, my conclusion is that the `falsePositiveRate` dropped so much that the result of the computation went from 2 to 3. That implies that `falsePositiveRate` went below `2 ^ -2 = 0.25`, because the `ceil` only yields 3 once `log2(1 / falsePositiveRate)` exceeds 2.

EDIT: this originally said `2 ^ -2.5 = 0.1767766952966369`. The exponent is -2, not -2.5 - see comment 7 below.
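To see where the hash count changes, the formula can be evaluated for a few rates. This is a minimal sketch that assumes the `ceil( log2(1 / falsePositiveRate) )` formula exactly as quoted above; `n_hashes` is a hypothetical helper name for this example, not the filtercascade API.

```python
import math


def n_hashes(false_positive_rate: float) -> int:
    """Number of hash functions, per the formula quoted above."""
    return math.ceil(math.log2(1.0 / false_positive_rate))


# Sample rates on both sides of 2^-2 = 0.25, plus the 2^-2.5 value.
for rate in (0.5, 0.3, 0.25, 0.2, 0.1767766952966369):
    print(f"falsePositiveRate={rate}: {n_hashes(rate)} hash function(s)")
```

Under this formula a rate of exactly `0.25` still gives 2 hash functions, and any rate below it gives 3 or more, which is consistent with the `2 ^ -2` threshold from the EDIT note.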
The `falsePositiveRate` is computed by [`set_crlite_error_rates`](https://github.com/mozilla/filter-cascade/blob/b5469a2d0efaf150e3ce4c04b0d21b4ba2306cf7/filtercascade/__init__.py#L244-L249), as:

- `falsePositiveRate at first layer = include_len / (sqrt(2) * exclude_len)`
- `falsePositiveRate at other layers = 0.5` (implying that the number of hash functions at every other layer is 1, because `log2(1 / 0.5) = 1`)

Since we know that the `falsePositiveRate` crossed `2 ^ -2` (see the EDIT above), we can compute the threshold ratio `include_len / exclude_len`:

```
include_len / exclude_len = sqrt(2) * falsePositiveRate
                          = sqrt(2) * 2^-2
                          = 2^-1.5
                          ≈ 0.354 (or lower in the new bloom filter)
```

Translated back to the original application, this means that for every blocked add-on (=include), there are about 2.83 or more non-blocked add-ons (=exclude). Or equivalently, the share of blocked add-ons among all add-ons dropped below about 26%. The most likely explanation is that the total number of add-ons increased past this threshold. For reference, in 2020 the number was closer to 33%, according to [this comment in addons#7492](https://github.com/mozilla/addons/issues/7492#issuecomment-2094266217).

Given this analysis, it looks like a way to immediately improve the space usage is to pin `falsePositiveRate` at `2^-2` or above, e.g. to `0.25`, so that `number of hash functions at first layer` is fixed at 2. That can be achieved by replacing `set_crlite_error_rates(include_len=error_rates[0], exclude_len=error_rates[1])` with `cascade.set_error_rates([0.25, 0.5])` at https://github.com/mozilla/addons-server/blob/0f718e347cde838085c9f8b2f5eec8fb45f125b4/src/olympia/blocklist/mlbf.py#L55-L57

The above magic number is specific to the current numbers on production. Given the benefits of smaller sizes, it may be worth generating several bloom filters with different parameters and taking the smallest result. I'll file a follow-up task for the addons-server repo.
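The relationship between the include/exclude ratio and the first-layer hash count can be sketched as follows. This assumes the two formulas quoted in this comment (`include_len / (sqrt(2) * exclude_len)` and `ceil( log2(1 / falsePositiveRate) )`) and the corrected `2 ^ -2` threshold from the EDIT note; the helper names and the counts are made up for illustration and are not production numbers.

```python
import math


def first_layer_rate(include_len: int, exclude_len: int) -> float:
    """First-layer falsePositiveRate, per the formula quoted above."""
    return include_len / (math.sqrt(2) * exclude_len)


def n_hashes(false_positive_rate: float) -> int:
    """Number of hash functions, per the formula quoted above."""
    return math.ceil(math.log2(1.0 / false_positive_rate))


# Ratio include_len / exclude_len at which the rate equals 2^-2.
threshold_ratio = math.sqrt(2) * 2 ** -2
print(f"threshold ratio: {threshold_ratio:.4f}")

# Hypothetical (include_len, exclude_len) pairs, not real AMO counts.
for include_len, exclude_len in ((100_000, 200_000), (100_000, 300_000)):
    rate = first_layer_rate(include_len, exclude_len)
    share = include_len / (include_len + exclude_len)
    print(
        f"include={include_len} exclude={exclude_len} "
        f"blocked share={share:.1%} rate={rate:.4f} hashes={n_hashes(rate)}"
    )
```

With these made-up counts, a blocked share of one third stays at 2 hash functions, while a blocked share of one quarter already needs 3.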