Closed Bug 730247 Opened 13 years ago Closed 13 years ago

Improve SafeBrowsing store SubPrefix compression

Tracking

()

Status:

RESOLVED FIXED

Milestone:

Firefox 17

People

(Reporter: gcp, Assigned: gcp)

Details

Attachments

(2 files)

Patch 1. Split data into byte slices to improve compression. 13 years ago Gian-Carlo Pascutto [:gcp] 9.31 KB, patch	dcamp : review+	Details \| Diff \| Splinter Review
Patch. Backout 13 years ago Gian-Carlo Pascutto [:gcp] 9.28 KB, patch	akeybl : approval-mozilla-aurora+ mfinkle : approval-mozilla-central+	Details \| Diff \| Splinter Review

Gian-Carlo Pascutto [:gcp]

Assignee

Description

•

13 years ago

I've been keeping an eye on the growth of the SafeBrowsing database to be sure everything is working OK, and one thing that gets to me is the size of the malware files. When originally working on this thing, I thought that AddPrefixes and AddChunks were the important thing, and we'd get everything down to about 2.4M. And this is right, for the phishing list. The malware list is another matter: 2012-02-24 05:55:40.431215 UTC - 1657325312[7f546462b9c0]: Table goog-malware-shavar now has: 2012-02-24 05:55:40.431234 UTC - 1657325312[7f546462b9c0]: 12846 add chunks 2012-02-24 05:55:40.431239 UTC - 1657325312[7f546462b9c0]: 386464 add prefixes 2012-02-24 05:55:40.431242 UTC - 1657325312[7f546462b9c0]: 1 add completions 2012-02-24 05:55:40.431254 UTC - 1657325312[7f546462b9c0]: 10562 sub chunks 2012-02-24 05:55:40.431257 UTC - 1657325312[7f546462b9c0]: 226033 sub prefixes 2012-02-24 05:55:40.431259 UTC - 1657325312[7f546462b9c0]: 1 sub completions The amount of sub prefixes is over half the total amount. We currently use zlib (DEFLATE) on the chunk numbers (of which sub prefixes have 2), and store the sub prefixes uncompressed (mostly because PrefixSet doesn't have stream serialization support, and I didn't anticipate there being so many in the first place). Because the chunk data only spans a small amount of numbers (+-13000, see above), I did a quick hack to store them delta-encoded. To my surprise, this *worsened* the compression. After looking at it a bit more, I realized 2 issues: a) Because DEFLATE requires matches with length at least 3 to compress well, it will only be useful for consecutive numbers that aren't more than 256 apart. b) The data is stored in prefix order, which means the chunk numbers are purely random. This means (a) will almost always fail. This in turn lead to the following comment in HashStore::WriteSubPrefixes: // chunk-ordered prefixes are not compressible This was true at some point, but before the coded landed, I actually rewrote it to have everything work on the data in prefix order, and hence avoid a whole bunch of re-sorting that was otherwise necessary. But this does mean that the prefixes should be compressible now. And so should the chunk numbers, if we're a bit smarter. I kept the DEFLATE backend because we already have it and it does nice things such as huffman coding for us, but added a very simply preprocessing step that splits all 32-bit values into 4 8-bit slices. Because the MSB of our chunkdata is almost all identical, and for example the prefixes, which are sorted, will also have long runs of identical values, compressibility gains a lot. It also avoids the DEFLATE-needs-a-length-3-match problem entirely. And it's fast. Win! These are the test results on the current database. ShortSlice was a try with splitting the 32-bit data into 2 x 16-bit values, which makes it behave very close to PrefixSet. https://docs.google.com/spreadsheet/ccc?key=0AiObroK0ZHF-dGFmZWhKLVdSWnhYcGk1VDZaZjg3X1E#gid=0 This reduces our SafeBrowsing store from about 5.5M to 4.5M.

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 1

•

13 years ago

Attached patch Patch 1. Split data into byte slices to improve compression. — Details — Splinter Review

Assignee: nobody → gpascutto

Attachment #600351 - Flags: review?(dcamp)

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 2

•

13 years ago

https://tbpl.mozilla.org/?tree=Try&rev=457ee3012f48

Dave Camp (:dcamp)

Comment 3

•

13 years ago

Comment on attachment 600351 [details] [diff] [review] Patch 1. Split data into byte slices to improve compression. Neat.

Attachment #600351 - Flags: review?(dcamp) → review+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 4

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a01cf079ee0b

Marco Bonardo [:mak]

Comment 5

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/a01cf079ee0b

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Target Milestone: --- → Firefox 13

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 6

•

13 years ago

Attached patch Patch. Backout — Details — Splinter Review

[Approval Request Comment] Backout due to bug 744993: a01cf079ee0b Bug 730247 1a6d008acb4f Bug 729928 f8bf3795b851 Bug 729640 35bf0d62cc30 Bug 726002 a010dcf1a973 Bug 726002 e9291f227d63 Bug 725597 db52b4916cde Bug 673470 173f90d397a8 Bug 673470

Attachment #616613 - Flags: approval-mozilla-central?

Attachment #616613 - Flags: approval-mozilla-aurora?

Mark Finkle (:mfinkle) (use needinfo?)

Updated

•

13 years ago

Attachment #616613 - Flags: approval-mozilla-central? → approval-mozilla-central+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 7

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/ad824b9f5c38

(no longer active)

Comment 8

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/ad824b9f5c38

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Alex Keybl [:akeybl]

Updated

•

13 years ago

Attachment #616613 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-

Alex Keybl [:akeybl]

Comment 9

•

13 years ago

Comment on attachment 616613 [details] [diff] [review] Patch. Backout Sorry - thought this bug was reopened because the backout was backed out. My mistake. Approved for Aurora 13 (or Beta 13 if the merge occurs before we land).

Attachment #616613 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 10

•

13 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/bcf80cb89d21

Gian-Carlo Pascutto [:gcp]

Assignee

Comment 11

•

13 years ago

Relanding after fixes in bug 673470 to fix bug 744993. https://hg.mozilla.org/integration/mozilla-inbound/rev/c716b0e67d08

Ed Morley [:emorley]

Comment 12

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/c716b0e67d08

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Target Milestone: Firefox 13 → Firefox 17

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: Firefox → Toolkit

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Improve SafeBrowsing store SubPrefix compression

Categories

(Toolkit :: Safe Browsing, enhancement)

Tracking

()

People

(Reporter: gcp, Assigned: gcp)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Updated

Comment 9

Comment 10

Comment 11

Comment 12

Updated

Attachment

General

Description

File Name

Content Type