Refine MakePrefixSet to reduce memory usage in SafeBrowsing

RESOLVED FIXED in Firefox 68

Status

Type: enhancement
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Opened: 3 months ago
Last modified: 5 days ago

People

(Reporter: dimi, Assigned: dimi)

Tracking

Version: unspecified
Target Milestone: mozilla68
Points: ---

Firefox Tracking Flags

(firefox68 fixed)

Details

Attachments

(3 attachments, 2 obsolete attachments)

Assignee

Description

3 months ago

The goal of this bug is to reduce the memory usage and the number of memory reallocations in MakePrefixSet[1].

[1] https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#99

Assignee

Updated

3 months ago
Type: defect → enhancement
Assignee

Comment 1

3 months ago

The checksum-calculating code was added to find the root cause of a crash
bug during update (Bug 1362761). Since the algorithm will be updated in
this series of patches, we don't need to keep it.

Assignee

Comment 2

3 months ago

MakePrefixSet[1] reduces the memory needed to store prefix sets by transforming
4-byte prefixes into 2-byte deltas. The algorithm uses the following data structures:

  1. mIndexPrefix (nsTArray<uint32_t>): prefixes that are not stored in delta form;
    we use these prefixes to restore the other prefixes from the deltas.
  2. mIndexDelta (nsTArray<nsTArray<uint16_t>>): a two-dimensional nsTArray in which
    each element is an nsTArray object containing all the deltas for one prefix
    (the prefix with the same index in mIndexPrefix).
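
To make the encoding concrete, here is a minimal sketch of the idea, using
std::vector in place of nsTArray and omitting details of the real implementation
(such as the cap on the number of deltas per index prefix, DELTAS_LIMIT); the
function name is hypothetical:

  #include <cstdint>
  #include <vector>

  // Sketch only: aPrefixes must be sorted in ascending order.
  void MakePrefixSetSketch(const std::vector<uint32_t>& aPrefixes,
                           std::vector<uint32_t>& aIndexPrefixes,
                           std::vector<std::vector<uint16_t>>& aIndexDeltas) {
    uint32_t previous = 0;
    for (uint32_t prefix : aPrefixes) {
      // Start a new index prefix for the first entry, or whenever the
      // difference to the previous prefix no longer fits in 16 bits.
      if (aIndexPrefixes.empty() || prefix - previous > UINT16_MAX) {
        aIndexPrefixes.push_back(prefix);
        aIndexDeltas.emplace_back();
      } else {
        // Otherwise store a 2-byte delta instead of a 4-byte prefix.
        aIndexDeltas.back().push_back(static_cast<uint16_t>(prefix - previous));
      }
      previous = prefix;
    }
  }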

The total size to store the prefix set includes:

  1. Number of index prefixes * 4 bytes
  2. Number of index deltas * size of an nsTArray (4 bytes on 32-bit platforms, 8 bytes on 64-bit platforms)
  3. Size of all the delta arrays (the expected size of a single delta array is 2 bytes * the number of deltas for its prefix)

But the problem is that an nsTArray<uint16_t> storing deltas may consume more memory
than expected. For example, if there are only 2 deltas in an nsTArray<uint16_t>,
the real memory used by the array is not just 4 bytes; it allocates 16 bytes (perhaps
because of memory alignment or preallocation; I am not sure, as I haven't checked the
memory allocation code yet. I used "ShallowSizeOfExcludingThis" to calculate the size).

So whether we benefit from this algorithm depends on whether the total memory
saved (2 bytes per delta) is larger than the memory we waste, and this is highly
related to the number of prefixes in a table.
Right now, four SafeBrowsing tables use 4-byte prefix sets: phishing, malware,
unwanted, and badinurl. The prefix counts for each table at this point are:
phishing (1.5M), malware (40K), unwanted (110K), and badinurl (27K).

And the memory usage (64-bit platform):

  Table Name        Without delta (prefix only)   With delta   Saved
  Phishing (1.5M)   6M                            3.2M         2.8M
  Malware (40K)     163K                          437K         -270K
  Unwanted (110K)   446K                          658K         -210K
  Badinurl (27K)    110K                          320K         -210K

This algorithm only improves the memory usage of the phishing table.

This patch refines the algorithm by replacing the two-dimensional nsTArray with
a single flat delta array (see the sketch below). The benefits are:

  1. It reduces the per-array waste; only 3 nsTArrays remain after the refinement.
  2. The "number of index deltas * size of an nsTArray" term is replaced with a uint32_t index array, which saves 4 bytes per index prefix on 64-bit platforms.
  3. I suspect some SafeBrowsing crashes[2] during an update are because we manipulate too many small nsTArrays.
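
A minimal sketch of the refined layout (the field and function names here are
hypothetical, and std::vector again stands in for nsTArray):

  #include <cstdint>
  #include <vector>

  // One flat delta array shared by all index prefixes, plus the start
  // offset of each group's deltas inside that flat array.
  struct FlatPrefixSet {
    std::vector<uint32_t> mIndexPrefixes;  // first prefix of each group
    std::vector<uint32_t> mIndexStarts;    // where each group's deltas begin
    std::vector<uint16_t> mDeltas;         // all deltas, back to back
  };

  // Reconstruct all prefixes of group aIndex from the flat arrays.
  std::vector<uint32_t> GroupPrefixes(const FlatPrefixSet& aSet, size_t aIndex) {
    uint32_t current = aSet.mIndexPrefixes[aIndex];
    std::vector<uint32_t> prefixes{current};
    size_t end = (aIndex + 1 < aSet.mIndexStarts.size())
                     ? aSet.mIndexStarts[aIndex + 1]
                     : aSet.mDeltas.size();
    for (size_t i = aSet.mIndexStarts[aIndex]; i < end; i++) {
      current += aSet.mDeltas[i];  // the linear scan mentioned below
      prefixes.push_back(current);
    }
    return prefixes;
  }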

The memory usage (64-bit platform):

  Table Name        Prefix only   With delta   Refined algorithm   Saved (vs. prefix only)
  Phishing (1.5M)   6M            3.2M         3.04M               3M
  Malware (40K)     163K          437K         221K                -60K
  Unwanted (110K)   446K          658K         348K                100K
  Badinurl (27K)    110K          320K         168K                -50K

Overall, if there are not enough prefixes, the delta algorithm isn't really useful,
and we also lose a little search performance because we linearly scan for the
prefix through the delta array[3]. A future improvement could be to decide whether
to use the delta algorithm or to store only prefixes, based on the number of prefixes in the table.

Note that we don't need to change the version number of .pset or .vlpset,
because the data structure in the file is exactly the same.

[1] https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#99
[2] Bug 1362761
[3] https://searchfox.org/mozilla-central/rev/6db0a6a56653355fcbb25c4fa79c6e7ffc6f88e9/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#

Assignee

Comment 3

3 months ago

We cannot calculate the capacity of each array without running the algorithm. Previously,
we didn't call SetCapacity in |MakePrefixSet|, but that may trigger too many memory
reallocations.

In this patch, we clear the data of the previous array but retain its
storage. Since the total number of prefixes shouldn't vary too much between
updates, this approach should save a lot of memory reallocation overhead.
(A sketch follows below.)

The only scenario that cannot benefit from this is when no SafeBrowsing data
exists yet, which should normally only be the first run of Firefox.
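
A minimal sketch of the reuse idea, using std::vector (whose clear() keeps the
existing allocation) in place of nsTArray, which offers ClearAndRetainStorage()
for the same purpose; the names are hypothetical:

  #include <cstdint>
  #include <vector>

  void RebuildPrefixes(std::vector<uint32_t>& aStorage,
                       const std::vector<uint32_t>& aNewPrefixes) {
    aStorage.clear();  // size -> 0, but the allocation is kept
    // Prefix counts are similar between updates, so the appends below
    // mostly reuse the retained buffer instead of reallocating.
    aStorage.insert(aStorage.end(), aNewPrefixes.begin(), aNewPrefixes.end());
  }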

Assignee

Comment 4

3 months ago

I hope this patch can fix Bug 1362761 (or at least reduce the crash rate), which we have spent a lot of time investigating without finding a solution :(

Comment 5

3 months ago

> I suspect some SafeBrowsing crashes[2] during an update are because we manipulate too many small nsTArrays.

> This approach should save a lot of memory reallocation overhead.

"Memory allocation overhead" is not relevant given the frequency of SafeBrowsing updates, and by itself "manipulating too many small nsTArrays" is meaningless - there's nothing against doing so.

This smells like making some random changes because we're unable to find the root cause of a bug. Maybe it's OOM related, sure, but then that's an even bigger reason to get good data, because the second patch will increase your OOM problems.

Comment 6

3 months ago

> there's nothing against doing so.

The code is generally good at checking for fallible allocations (and if it isn't, it should be!), so if they were a problem you'd see construction failures being detected and erroring out with OOM. That's why I have my doubts about these patches changing much.

If I saw it correctly, we can save a lot of memory by finding a way to deal with the byte-order inversion in place.

Assignee

Comment 7

3 months ago

The problem I am trying to fix is not really an OOM problem; I want to reduce the number of reallocations in MakePrefixSet. The reason this looks like an OOM fix is that, while working on it, I found that in most cases our delta algorithm actually increases memory usage (except for the phishing table), so I decided to rework it (I wasn't aware of the OOM issue you mentioned while implementing this).

(In reply to Gian-Carlo Pascutto [:gcp] from comment #5)

> > I suspect some SafeBrowsing crashes[2] during an update are because we manipulate too many small nsTArrays.
> >
> > This approach should save a lot of memory reallocation overhead.
>
> "Memory allocation overhead" is not relevant given the frequency of SafeBrowsing updates, and by itself "manipulating too many small nsTArrays" is meaningless - there's nothing against doing so.
>
> This smells like making some random changes because we're unable to find the root cause of a bug. Maybe it's OOM related, sure, but then that's an even bigger reason to get good data, because the second patch will increase your OOM problems.

You are right, we couldn't really find the root cause; we made a lot of guesses in Bug 1362761, but they didn't work :(
I don't think OOM is what causes the update crashes. I think it is because we don't preset the capacity: appending 1.5M records (the phishing table, for example) to different nsTArrays may trigger a lot of memory reallocations, and perhaps the underlying memory management goes wrong while doing these reallocations (some crash signatures are platform dependent, so I guess this is more likely a lower-level problem).

The overall idea here is to spend some time analyzing the whole update flow to see if there is anything we can improve, not to randomly change things that don't help anything.

Assignee

Comment 8

3 months ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #6)

> > there's nothing against doing so.
>
> The code is generally good at checking for fallible allocations (and if it isn't, it should be!), so if they were a problem you'd see construction failures being detected and erroring out with OOM. That's why I have my doubts about these patches changing much.

I don't think OOM is what causes those crashes, but I do believe that minimizing the number of times we allocate/reallocate memory can help.

> If I saw it correctly, we can save a lot of memory by finding a way to deal with the byte-order inversion in place.

Yes, I have some other patches that fix problems I saw in the update flow; this is one of them.
I didn't submit it for review because I'm focusing on changes to MakePrefixSet here.

Assignee

Comment 9

3 months ago

Hi gcp,
Thank you for your comments; they are really helpful!
I'll post another approach later and see if you have any suggestions, thanks!

Assignee

Updated

3 months ago
See Also: → 1543319
Attachment #9056825 - Attachment is obsolete: true
Attachment #9056824 - Attachment is obsolete: true
Assignee

Comment 10

2 months ago

The goal of this patch is to reduce the number of memory reallocations during
|MakePrefixSet|[1].

Here are the numbers of nsTArray memory reallocations that occur during |MakePrefixSet|
(measured on my local machine):
googpub-phish-proto: 58k times
goog-malware-proto: 9k times
goog-unwanted-proto: 25k times
goog-badbinurl-proto: 6k times

This patch improves performance as follows (a sketch of the size-based dispatch
appears after part 2):

  1. For tables with fewer than 128*1024 prefixes (malware, unwanted, badinurl):

Store the prefixes directly, without dividing the allocation into smaller chunks.
Because the maximum size needed to store all the prefixes of these tables in a
single array is less than 512K, we can avoid Bug 1046038.

This simplifies the generation of the internal prefix-set data structures, and
total memory usage is also reduced:
goog-malware-proto : 437K -> 163K
goog-unwanted-proto : 658K -> 446K
goog-badbinurl-proto: 320K -> 110K

The single largest contiguous memory allocation grows to:
goog-malware-proto : 86K -> 163K
goog-unwanted-proto : 86K -> 446K
goog-badbinurl-proto: 77K -> 110K

A further improvement for this part: for tables with fewer prefixes, we could
use a one-dimensional delta array to reduce the size of the single contiguous
memory allocation.

  2. For tables with more prefixes:

According to experiments, when there are more than 400k prefixes the delta
arrays have a very high chance of being full, so in the case of the phishing
table we can estimate the capacity accurately before applying the delta algorithm.

The shortcoming of this part is that when the number of prefixes is between
130k and 400k, the capacity estimation is not accurate.
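
For illustration, a hypothetical sketch of the size-based dispatch described
above, with the 128*1024 threshold from part 1 (the helper functions are
placeholders, not actual APIs):

  #include <cstdint>
  #include <vector>

  // Placeholder helpers for the two storage strategies.
  void StorePlainPrefixes(const std::vector<uint32_t>& aPrefixes);
  void StoreDeltaEncodedPrefixes(const std::vector<uint32_t>& aPrefixes);

  constexpr size_t kMaxArraySize = 128 * 1024;

  void ConstructPrefixSet(const std::vector<uint32_t>& aPrefixes) {
    if (aPrefixes.size() < kMaxArraySize) {
      // Small table: one plain, sorted array of 4-byte prefixes.
      // Worst case is 128 * 1024 * 4 bytes = 512K in a single allocation.
      StorePlainPrefixes(aPrefixes);
    } else {
      // Large table: the delta arrays are almost always full, so the
      // capacity can be estimated up front before delta-encoding.
      StoreDeltaEncodedPrefixes(aPrefixes);
    }
  }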

[1] https://searchfox.org/mozilla-central/rev/b2015fdd464f598d645342614593d4ebda922d95/toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp#99

Assignee

Comment 11

2 months ago

hi gcp,
Comment 10 is another version of this refinement. I haven't added test cases, run try, etc., so I didn't ask you to "review" it.
Would you mind taking a very quick look and letting me know what you think in general? Thanks!

Flags: needinfo?(gpascutto)

Comment 12

2 months ago

Part (1) seems like a very good idea to me. There should be no real fragmentation problems with 512K chunks, and you're reducing the real memory usage, making everything simpler, etc.

I'm not clear what your plan for (2) is, or how to use that knowledge to improve things. You're going to pre-allocate everything on the assumption you'll have minimal empty cells, and then only do final swaps to compact?

Flags: needinfo?(gpascutto)
Assignee

Comment 13

2 months ago

(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)

> I'm not clear what your plan for (2) is, or how to use that knowledge to improve things. You're going to pre-allocate everything on the assumption you'll have minimal empty cells, and then only do final swaps to compact?

Yes.
Right now we have 3 kinds of arrays (mIndexPrefixes, mIndexDeltas, and the arrays inside mIndexDeltas), and we don't SetCapacity for any of them, because we don't know the final size of each at the beginning of the algorithm.

So first, we can SetCapacity for the arrays in mIndexDeltas to DELTAS_LIMIT (120), because DELTAS_LIMIT is the maximum size of mIndexDeltas[i].
And by assuming the size of every mIndexDeltas[i] is DELTAS_LIMIT, we can derive the sizes of mIndexPrefixes and mIndexDeltas (see the sketch below).

In the case of the phishing table (1.5M entries), I think this assumption is almost always true, so there should be no resizing while running the prefix-set generation algorithm.

If a table has fewer entries, we would always pre-allocate 120 elements for mIndexDeltas[i] and then compact to the real size after switching to the next array.
We would also pre-allocate a smaller size for mIndexDeltas and mIndexPrefixes, so we would still need to reallocate them while running the algorithm.
In general, I'd say this still triggers fewer reallocations than not calling SetCapacity at all. But I am not sure if there are any other concerns.
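
A minimal sketch of this pre-allocation plan (DELTAS_LIMIT as above; std::vector's
reserve() plays the role of nsTArray's SetCapacity(), and the function name is
hypothetical):

  #include <cstdint>
  #include <vector>

  constexpr size_t DELTAS_LIMIT = 120;  // max deltas per mIndexDeltas[i]

  void PreallocatePrefixSet(size_t aPrefixCount,
                            std::vector<uint32_t>& aIndexPrefixes,
                            std::vector<std::vector<uint16_t>>& aIndexDeltas) {
    // Assuming every delta array fills up, each index prefix covers one
    // prefix plus DELTAS_LIMIT deltas, which bounds the number of groups.
    size_t groups = aPrefixCount / (DELTAS_LIMIT + 1) + 1;
    aIndexPrefixes.reserve(groups);
    aIndexDeltas.reserve(groups);
    // Each inner array is then reserved to DELTAS_LIMIT when created and
    // compacted (e.g. shrink_to_fit) once its group is complete.
  }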

Attachment #9062939 - Attachment description: Bug 1542744 - Refine MakePrefixSet. → Bug 1542744 - P2. Improve performance of MakePrefixSet by using different algorithm according to the number of prefixes.
Assignee

Comment 14

2 months ago

This patch does the following:

  1. Runs the same prefixset tests with:
  • browser.safebrowsing.prefixset.max_array_size = 0
  • browser.safebrowsing.prefixset.max_array_size = UINT32_MAX

This makes sure both methods of storing the prefix set are exercised by the existing test cases.

  2. Refines the gtest with a test fixture.
  3. Adds TinySet and LargeSet test cases.
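
For illustration, a hypothetical gtest value-parameterized fixture along these
lines (the pref name comes from this comment; everything else, including the
macro spelling, depends on the gtest version in the tree):

  #include <cstdint>
  #include "gtest/gtest.h"

  // Run every prefix-set test twice: once per pref value, so both the
  // plain-array and the delta-encoded storage paths are covered.
  class UrlClassifierPrefixSetTest : public ::testing::TestWithParam<uint32_t> {
   protected:
    void SetUp() override {
      // e.g. set browser.safebrowsing.prefixset.max_array_size to GetParam()
    }
  };

  TEST_P(UrlClassifierPrefixSetTest, TinySet) { /* ... */ }
  TEST_P(UrlClassifierPrefixSetTest, LargeSet) { /* ... */ }

  INSTANTIATE_TEST_SUITE_P(BothStorageModes, UrlClassifierPrefixSetTest,
                           ::testing::Values(0u, UINT32_MAX));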

Comment 15

Last month
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aedbe6cdd06f
P1. Remove calculating checksum for mIndexDelta array. r=gcp
https://hg.mozilla.org/integration/autoland/rev/c51b622bb1fe
P2. Improve performance of MakePrefixSet by using different algorithm according to the number of prefixes. r=gcp
https://hg.mozilla.org/integration/autoland/rev/f94b6f3a7fff
P3. Run the same prefixset testcases for different configuration. r=gcp

Comment 17

Last month
Backout by shindli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/836dbcf0ffca
Backed out 3 changesets for causing perma mochitest failures in /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/StaticPrefList CLOSED TREE

There were also xpcshell failures on toolkit/components/url-classifier/tests/unit/test_partial.js
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=246491121&repo=autoland&lineNumber=2810

Comment 19

Last month
Pushed by dlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/afbe5300acca
P1. Remove calculating checksum for mIndexDelta array. r=gcp
https://hg.mozilla.org/integration/autoland/rev/e5c6dee921ba
P2. Improve performance of MakePrefixSet by using different algorithm according to the number of prefixes. r=gcp
https://hg.mozilla.org/integration/autoland/rev/76339e786c7c
P3. Run the same prefixset testcases for different configuration. r=gcp

Comment 20

Last month
bugherder
Status: ASSIGNED → RESOLVED
Closed: Last month
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee

Updated

Last month
Flags: needinfo?(dlee)
Blocks: 1510345
Assignee

Updated

7 days ago
See Also: → 1362761