Closed Bug 1135022 Opened 5 years ago Closed 4 years ago

Optimize ChunkSet to efficiently store ranges

Categories

(Toolkit :: Safe Browsing, enhancement, P5)

enhancement

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: gcp, Unassigned, Mentored)

References

(Depends on 1 open bug)

Details

(Whiteboard: [lang=c++][good first bug])

Attachments

(18 files, 14 obsolete files)

2.62 KB, patch (mmc: review+)
3.52 KB, patch
6.28 KB, patch
5.32 KB, patch
9.24 KB, patch (gcp: review-)
5.71 KB, patch
7.12 KB, patch
5.25 KB, patch (francois: review+)
11.89 KB, patch
2.09 KB, patch
1.41 KB, patch
1.17 KB, patch
1.58 KB, patch
1.57 KB, patch (gcp: review+)
1.66 KB, patch (gcp: review+)
10.85 KB, patch (gcp: review+)
58 bytes, text/x-review-board-request (gcp: review+)
58 bytes, text/x-review-board-request (francois: review+)
SafeBrowsing uses ChunkSet to store the chunk numbers that it has processed or will process in a SafeBrowsing update. It's a simple interface implementing a set of integers:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/ChunkSet.h

The current implementation uses a sorted array of numbers as storage. This works fine for typical data, which looks something like 10001, 10003, 10100-10190, 10191, but it's wasteful for ranges. The example above requires storing 94 values (1+1+91+1) in the current implementation. If we stored everything as a range (start and stop value) instead, it would require only 8 values (i.e. 10001-10001, 10003-10003, 10100-10190, 10191-10191). Most of the time the data consists of a single range, so this optimization offers potentially nice memory savings.

The main difficulty in this bug is implementing Set/Unset/Merge correctly.
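To make the idea concrete, here is a minimal sketch (plain C++ with std::pair standing in for the Mozilla container types; ToRanges is a hypothetical helper for illustration, not part of any patch here) of how a sorted chunk list collapses into ranges:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a range is an inclusive [start, end] pair.
using ChunkRange = std::pair<uint32_t, uint32_t>;

// Collapse a sorted list of chunk numbers into sorted, non-overlapping
// ranges. Consecutive numbers extend the last range; gaps start a new one.
std::vector<ChunkRange> ToRanges(const std::vector<uint32_t>& aChunks) {
  std::vector<ChunkRange> ranges;
  for (uint32_t chunk : aChunks) {
    if (!ranges.empty() && chunk == ranges.back().second + 1) {
      ranges.back().second = chunk;  // extend the previous range
    } else {
      ranges.emplace_back(chunk, chunk);  // start a new single-value range
    }
  }
  return ranges;
}
```

Note that a sketch like this, which also merges adjacent values, would collapse the example data above into just three ranges, since 10191 directly follows 10100-10190.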
Whiteboard: [lang=c++][good first bug]
Small patch to remove some unused parts from the API. This should
simplify fixing this bug.
Attachment #8567071 - Flags: review?(mmc)
Attachment #8567071 - Flags: review?(mmc) → review+
Keywords: leave-open
I've had a look at ChunkSet.h and would like to have a go at this. In principle I should just be able to change mChunks to be FallibleTArray< nsTArray<uint32_t> >, so that the inner array stores the range start and stop values.
Hi, I would like to work on this bug.

Can someone give me some more information?
I don't think I have enough time to work on this, so go right ahead.

Currently ChunkSet stores chunk numbers individually. The basic idea, as far as I understand it, is to instead store chunk ranges. So, for example, if you had chunks 1, 2, 3, 4, 5 stored in ChunkSet, you could reduce memory usage by storing 1, 6 and noting that this specifies a range from 1 to 5. This could be implemented by switching uint32_t in mChunks for something like a std::pair<uint32_t, uint32_t>. The first value in the pair would specify the start, and the second would specify the last plus one.

I haven't thought about this in great detail, but I have some ideas on how ChunkSet would need to be modified to make this work:

* ChunkSet::Set will need some algorithm to determine whether aChunk is adjacent to an existing range, and to expand the range to accommodate the new value. If there's no adjacent range, a new one will have to be added.
* ChunkSet::Has will need to be modified so that the binary search checks whether the specified value is within one of the ranges. I don't know whether the existing binary search algorithms allow this; it may be that they do.
* There will also need to be some mechanism to make sure that the ranges do not overlap, or that, when they do, they're combined into a single range.
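A rough sketch of those three points (std::vector and std::lower_bound in place of Mozilla's FallibleTArray and fallible allocation; the ChunkRange struct and method names are assumptions for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive [start, end] range.
struct ChunkRange { uint32_t start, end; };

class ChunkSet {
 public:
  // Binary-search for the first range whose end >= aChunk, then check
  // whether aChunk also lies at or after that range's start.
  bool Has(uint32_t aChunk) const {
    auto it = std::lower_bound(
        mRanges.begin(), mRanges.end(), aChunk,
        [](const ChunkRange& r, uint32_t c) { return r.end < c; });
    return it != mRanges.end() && aChunk >= it->start;
  }

  // Extend an adjacent range, merge two ranges that become adjacent,
  // or insert a fresh single-value range at the right position.
  void Set(uint32_t aChunk) {
    if (Has(aChunk)) {
      return;
    }
    auto it = std::lower_bound(
        mRanges.begin(), mRanges.end(), aChunk,
        [](const ChunkRange& r, uint32_t c) { return r.end < c; });
    bool touchesPrev =
        it != mRanges.begin() && std::prev(it)->end + 1 == aChunk;
    bool touchesNext = it != mRanges.end() && it->start == aChunk + 1;
    if (touchesPrev && touchesNext) {
      std::prev(it)->end = it->end;  // aChunk bridges two ranges: merge them
      mRanges.erase(it);
    } else if (touchesPrev) {
      std::prev(it)->end = aChunk;
    } else if (touchesNext) {
      it->start = aChunk;
    } else {
      mRanges.insert(it, {aChunk, aChunk});
    }
  }

  size_t RangeCount() const { return mRanges.size(); }

 private:
  std::vector<ChunkRange> mRanges;  // sorted, non-overlapping
};
```

Keeping the ranges sorted and non-overlapping as an invariant is what lets Has stay a single binary search.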
This algorithm in ChunkSet::Set determines whether aChunk is adjacent to an existing range and expands the range to accommodate the new value. If there's no adjacent range, a new one is added.
Gian-Carlo, can you let me know if I'm going on the right track (attachment 8601775 [details] [diff] [review])? If so, I would like to assign this bug.
Hi Ryan,

If I may make a couple of comments at this stage.

* It'd generally be better to use a const int or constexpr int in place of a #define for STARTED and STOPED (STOPPED?), since this would facilitate debugging.
* On lines 76 and 91 of your patch, you can use else statements since the conditions in the previous if statements on lines 70 and 85 are simply the opposite of the statements on lines 76 and 91.
* Similarly, on line 83 you could probably use an else if statement, since the values you're comparing to are not going to be the same.
* It may be more performant to use something like a std::pair, which has a fixed number of elements, instead of what you've defined as nsT on line 136. Then you could refer to the first and second elements like newmChunks[x].first and newmChunks[x].second.
Matt, you should be able to use the review option on the patch to attach your comments to it directly, if you want to. You can comment directly in the code and things like that.

Ryan, you don't need to have a bug assigned to work on it; just update your patches in the bug every now and then (or comment on your progress) and it's obvious who's working on it. Assigning doesn't actually do anything, and sometimes the person working on a bug has no time to finish the patch, and the assignment then scares away anyone else who wants to pick it up. We mostly use it to tell people who work on Firefox full-time "please fix this first".

I'll comment more in the patch.
I didn't know you could do this. I've added my comments.
Comment on attachment 8601775 [details] [diff] [review]
Create existing range in ChunkSet::Set

Review of attachment 8601775 [details] [diff] [review]:
-----------------------------------------------------------------

These might be useful to know:
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style
https://developer.mozilla.org/en-US/docs/Mozilla/C++_Portability_Guide

I'd say, keep on your efforts, see if you can fix the issues identified and keep iterating till you get a solution.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +11,5 @@
> +class Comparator
> +{
> + private:
> +	 typedef nsTArray<uint32_t> nsT;
> +	

You have a mix of indentations here, as well as a bunch of whitespace lying around. Check whether your editor can show you leading and/or trailing whitespace, it will help create cleaner patches.

nsT isn't really a descriptive name. I see you also already declared this type in the header too, so you probably will want to reuse that definition instead of (re)declaring a similar-but-not-identical one here.

@@ +57,5 @@
>  nsresult
>  ChunkSet::Set(uint32_t aChunk)
>  {
> +  #define STARTED 0
> +  #define STOPED 1

These should probably be constants in the class.

Alternatively, given that you just want to store two numbers and not really an array, maybe it makes more sense to just have the class have two member variables named start and stop.

There was a suggestion to use std::pair, but STL isn't generally usable inside Firefox. There is mozilla/Pair.h though: https://dxr.mozilla.org/mozilla-central/source/mfbt/Pair.h

@@ +68,5 @@
>      }
> +
> +    if(aChunk == (newmChunks[x][STOPED]+ 1))
> +    {
> +      if((x+1 == newmChunks.Length()) ||(newmChunks[x+1][STARTED] - 1 !=aChunk))

You're going to be handling a bunch of different cases. It might be good to comment what each piece of code is supposed to be doing. This will make it more obvious if cases are missed.

@@ +92,5 @@
> +		
> +       if((x!=0)&&(newmChunks[x-1][STOPED]+1 == aChunk))
> +       {
> +         newmChunks[x][STARTED] = newmChunks[x-1][STARTED];
> +	 newmChunks.RemoveElementsAt(x-1,1);	

This could get tricky later on because you're effectively shifting the array around underneath yourself. i.e. newmChunks[x] now points to another element.

@@ +100,5 @@
> +}
> +
> +	size_t g = newmChunks.Length();
> +
> +	newmChunks.AppendElements(nsT());

Note that it's a fallible array, which means this can fail if there is no more memory. This happens for example on mobile devices with very little RAM. You should check the return codes.

You might want to try to have it working first before handling these exceptional cases, though. Just putting it here for future reference.

::: toolkit/components/url-classifier/ChunkSet.h
@@ +40,5 @@
>    }
>  
>  private:
> +	typedef nsTArray<uint32_t> nsT;
> +	FallibleTArray<nsT> newmChunks;

FWIW, the "m" in mChunks stands for "member variable". So for consistency this should be mNewChunks.
I will work on updating the code per your inputs.
Gian-Carlo,

Per your review, I'm not sure what you mean by "reuse that definition instead of (re)declaring a similar-but-not-identical one here". Perhaps an example would suffice to show what you mean.
I *think* what Gian-Carlo means is that you declare the typedef in the header inside the ChunkSet class, but you then go on to use it again in Comparator. To improve maintainability you could make the typedef in ChunkSet public, then use it in Comparator like so:

bool Equals(const ChunkSet::nsT& a1, const ChunkSet::nsT& a2) const

This way, if the definition of nsT in ChunkSet ever changes, this is automatically propagated through to Comparator.
Yes, both ChunkSet.cpp and ChunkSet.h contain an identical definition:

typedef nsTArray<uint32_t> nsT;

Not only does that duplicate the code, you're also introducing the risk of bugs if you change it in one place and forget the other.

You should define it in the .h and then use that definition in the .cpp file.
Matt and Gian-Carlo, I've made updates based on your comments.
Just putting a note here that https://bugzilla.mozilla.org/show_bug.cgi?id=1163445 might add some code which overlaps in functionality and is potentially reusable.
Gian-Carlo,

I'm a bit confused on how ChunkSet::Remove should be implemented. Should Remove be removing a Pair?
Flags: needinfo?(gpascutto)
No, it takes a ChunkSet as an argument. So it's any arbitrary list of integers, too.
Flags: needinfo?(gpascutto)
Comment on attachment 8609517 [details] [diff] [review]
ChunkSet::Has modified so that the binary search checks whether the specified value is within one of the ranges.

Review of attachment 8609517 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like it's coming along nicely.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +12,5 @@
> +{
> + public:
> +   bool Equals(const ChunkSet::ChunkRange& a1, const ChunkSet::ChunkRange& b1) const
> +   {
> +     return a1.first() == b1.first();  

nit: spurious trailing whitespace. You have this problem throughout your patch, so check your editor settings.

Does it make sense to make the comparison = when the extent is possibly different? i.e. this makes [0, 1] == [0, 2].

The fact that you have to provide the Equals here is IMHO a bug in nsTArray, which I'll file.

@@ +51,5 @@
>  
>  nsresult
>  ChunkSet::Set(uint32_t aChunk)
>  {
> +  for(size_t x = 0; x< mNewChunks.Length(); x++)

nit: x < y please use consistent spacing (see coding style guideline)

@@ +58,5 @@
> +    if(aChunk >= mNewChunks[x].first() && aChunk <=mNewChunks[x].second())
> +    {
> +      return NS_OK;
> +    }
> +    // Increment the second value of the Pair by one to determine if aChunk should replace the second value of the Pair or 

General remark: "Increment the second value of the Pair by one" doesn't need to be documented, because it literally says the same as the line of code below. So it's as if you were repeating the code twice.

What it's really doing is something like "check if aChunk is immediately behind the end of the current ChunkRange" so document that instead. Explain what the code tries to achieve instead of what it's doing.

@@ +62,5 @@
> +    // Increment the second value of the Pair by one to determine if aChunk should replace the second value of the Pair or 
> +    // the second value of the next index replaces the second value of the current index 
> +    if(aChunk == (mNewChunks[x].second()+ 1))
> +    {
> +      //Replace the second value of the Pair with aChunk 

"Extend the range to include aChunk if we can't merge with the next range"

Maybe "x" should be renamed to something that makes it more obvious what it is.

@@ +68,5 @@
> +      {
> +        mNewChunks[x]=MakePair(mNewChunks[x].first(),aChunk);
> +        return NS_OK;
> +      }
> +      //Replace the second value of the Pair with the second value of the next index. Then delete the Pair in the next index. 

"Merge this range and the next one"

@@ +77,5 @@
> +        return NS_OK;
> +      }
> +    }
> +    // Decrement the first value of the range by one to determine if aChunk should replace the first value of the Pair or
> +    // the first value of the previous index replaces the first value of the current index

Same remark about documenting what it does vs. what it's trying to achieve.
Attachment #8609517 - Flags: feedback?(gpascutto) → feedback+
Depends on: 1168345
Comment on attachment 8625521 [details] [diff] [review]
ChunkSet::Remove modified so that it removes any arbitrary list of integers

Review of attachment 8625521 [details] [diff] [review]:
-----------------------------------------------------------------

There's a number of bugs here, and a bunch of stylistic issues that were already identified in the previous review. That said, nice progress.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +12,5 @@
> +{
> +public:
> +  bool Equals(const ChunkSet::ChunkRange& a1, const ChunkSet::ChunkRange& b1) const
> +  {
> +    return a1.first() == b1.first();  

nit: trailing whitespace

@@ +51,5 @@
>  
>  nsresult
>  ChunkSet::Set(uint32_t aChunk)
>  {
> +  for(size_t range = 0; range < mNewChunks.Length(); range++)

It's a single number, not a range, isn't it?

@@ +53,5 @@
>  ChunkSet::Set(uint32_t aChunk)
>  {
> +  for(size_t range = 0; range < mNewChunks.Length(); range++)
> +  {
> +    // Check the first and second value of the Pair to determine whether aChunk exist between the two values(inclusive).  

nit: trailing whitespace

Again: check your editor settings to make it visible, you have this problem throughout the patch.

@@ +136,3 @@
>  
> +  for(const ChunkRange *iter = dupIter; iter != end; iter++) 
> +  {

I would move the code below into a subroutine that operates on a ChunkRange parameter. That would reduce the amount of loop nesting and show that we break the problem of removing a ChunkSet down into removing a series of ranges.

It'll also fix some bugs, see below.

@@ +137,5 @@
> +  for(const ChunkRange *iter = dupIter; iter != end; iter++) 
> +  {
> +      if(Has(iter->first()) && Has(iter->second()))
> +      {         
> +        for(size_t range = 0; range < mNewChunks.Length(); range++)

Note that all 4 cases in the containing if contain this exact loop. So it can be hoisted outside the ifs.

@@ +139,5 @@
> +      if(Has(iter->first()) && Has(iter->second()))
> +      {         
> +        for(size_t range = 0; range < mNewChunks.Length(); range++)
> +        {
> +          if(iter->first() >= mNewChunks[range].first() && iter->second() <= mNewChunks[range].second())

There is no else here.

What about this case:

mNewChunks: [[2 10], [15 20]]
aOther: [3 18]

This will arrive here, but be rejected by the above if (because it straddles two ranges, and the check above only checks one range). There's no "else" so this wouldn't get handled, even though our mNewChunks should end up being modified to:

==> mNewChunks: [2 2], [19 20]

The number of straddled ranges can be anything, e.g.:

mNewChunks: [[2 10], [15 20], [25 30]]
aOther: [4 27]

==> mNewChunks: [2 3], [28 30]
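The straddling cases can be handled uniformly by rebuilding the list, keeping only what each stored range contributes outside the removed span. A sketch of that (plain C++; RemoveRange and the ChunkRange struct are hypothetical names, not code from the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical inclusive [start, end] range.
struct ChunkRange { uint32_t start, end; };

// Remove [aStart, aEnd] from sorted, non-overlapping ranges. Each stored
// range is either kept whole, trimmed on one side, split in two, or
// dropped entirely, so spans straddling any number of ranges just work.
void RemoveRange(std::vector<ChunkRange>& aRanges,
                 uint32_t aStart, uint32_t aEnd) {
  std::vector<ChunkRange> result;
  for (const ChunkRange& r : aRanges) {
    if (r.end < aStart || r.start > aEnd) {
      result.push_back(r);  // no overlap: keep unchanged
      continue;
    }
    if (r.start < aStart) {
      result.push_back({r.start, aStart - 1});  // keep the left remainder
    }
    if (r.end > aEnd) {
      result.push_back({aEnd + 1, r.end});  // keep the right remainder
    }
  }
  aRanges = std::move(result);
}
```

On the worked examples: removing [3, 18] from [[2, 10], [15, 20]] leaves [[2, 2], [19, 20]], and removing [4, 27] from [[2, 10], [15, 20], [25, 30]] leaves [[2, 3], [28, 30]].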

@@ +148,5 @@
> +              if(iter->second() < mNewChunks[range].second())
> +              {
> +                Comparator sortPair;
> +                mNewChunks[range] = MakePair(mNewChunks[range].first(), iter->first() - 1); 
> +                mNewChunks[mNewChunks.Length()] = MakePair(iter->second() + 1, mNewChunks[range].second());

This doesn't work because you're writing past the end of mNewChunks, you need to call AppendElement first.

@@ +177,5 @@
> +            }
> +          } 
> +        }
> +      }
> +      // Check to see if the second element in aOther appears in mNewChunk       

Same comment as before: document what it's trying to achieve. In this situation, the end point of a range in aOther overlaps with an existing range.

@@ +181,5 @@
> +      // Check to see if the second element in aOther appears in mNewChunk       
> +      else if (Has(iter->second()))
> +      {
> +        for (size_t range = 0; range < mNewChunks.Length(); range++)
> +        { // Shrink the range by reducing the first value of mNewChunk 

First value? Why is it operating on second() then? And increasing it, to boot.

I think this illustrates that making a custom class or struct with "begin" and "end" fields may be more clear than trying to use Pair everywhere.

What is it trying to achieve? "Shrink the range by setting the new start just past the removed range and keeping the end"

@@ +185,5 @@
> +        { // Shrink the range by reducing the first value of mNewChunk 
> +          if (iter->second() < mNewChunks[range].second())
> +          {
> +            mNewChunks[range] = MakePair(iter->second() + 1, mNewChunks[range].second());
> +            return NS_OK;

So we're bailing out of the function here. What about all the other ChunkRanges in aOther? Won't they get skipped? You have this bug in a few places, splitting up the work in multiple functions will help with this.
Attachment #8625521 - Flags: review?(gpascutto) → review-
Is this still open? If so, I'd like to work on it. Thanks!
The last update was 2 months ago so I presume Ryan has had no time to continue working on it. Go ahead.
Alright, thanks! This is my first bug, and I'm new to C++ (though I've done C for a semester). Would you be able to chat over IRC or email? Thanks.
Yes, you should be able to see my email by hovering over my name in this bug, and I'm "gcp" on Mozilla's IRC server. Feel free to nag me.
Alright, thanks. Any idea when you'll be on IRC? I can never seem to catch you. Thanks.
I'm going to give this a shot. 

Gian-Carlo, what's the best approach for changing the read and write functions? As it is right now, we read in or write out elements one by one, and it looks like WriteTArray and ReadTArray can't handle ranges.
I'm not really sure how to unit test this. But here is my progress so far.
Attachment #8657649 - Flags: feedback?
Comment on attachment 8657649 [details] [diff] [review]
Attempt at implementing ranges

Review of attachment 8657649 [details] [diff] [review]:
-----------------------------------------------------------------

Code quality looks decent, but I need to think more about whether this works. You're also missing part of your changes, I think you forgot to add at least the changes to the .h file?

You are right about this being a pain to unit test. I'll see if I can fix that. When the original implementation was written, unit testing C++ code like this was a big problem (which is why there's none right now!), but I believe it should be easier now. Meanwhile, you can run tests like:

./mach xpcshell-test toolkit/components/url-classifier/

Which tests the functionality of the components using this code.
Attachment #8657649 - Flags: feedback? → feedback+
Sounds good. I'll try to implement Read and Write in the meantime and give those tests a try.
I implemented Read() and Write(). I'm not sure what I have is the best way to do this though. Also, I ran into an error running the tests. (I'm running Windows 8.1)

python mach xpcshell-test toolkit/components/url-classifier/

Error running mach:

    ['xpcshell-test', 'toolkit/components/url-classifier/']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.

You should consider filing a bug for this issue.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

WindowsError: [Error 193] %1 is not a valid Win32 application
Attachment #8658462 - Flags: feedback?
Comment on attachment 8658462 [details] [diff] [review]
ChunkSet with Read() and Write()

Review of attachment 8658462 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for not responding earlier. If you set feedback? and don't fill in the Requestee field, then no one gets notified that this needs a response :-/

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch -> Getting Reviews

I'll look at the situation with unit tests now. You seem to be progressing well but I have a bunch of remarks. Also, can you combine your patches? This patch didn't contain a cleaned up version of the last patch.

The Windows thing with mach I'll also check.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +7,5 @@
>  
>  namespace mozilla {
>  namespace safebrowsing {
>  
> +#define BUFFER_SIZE 10;

Move this to the .h file, it's just duplicated here.

@@ +16,5 @@
>    aChunkStr.Truncate();
>    uint32_t i = 0;
>    while (i < mChunkStarts.Length()) {
>  
> +    if (i != 0) {

No need to clean up the existing indentation here, it just adds noise to the patch.

@@ +72,3 @@
>      } else {
> +      if (!mChunkStarts.InsertElementSorted(start, fallible))
> +        return NS_ERROR_OUT_OF_MEMORY

bracing (code style), also you're missing semicolons so I doubt this compiles.

@@ +146,5 @@
> +  void *buffer = strmBuf.Elements();
> +  if (!strmBuf.SetLength(BUFFER_SIZE, fallible))
> +    return NS_ERROR_OUT_OF_MEMORY;
> +
> +  uint32_t i; // index of range

Move inside the for.

@@ +147,5 @@
> +  if (!strmBuf.SetLength(BUFFER_SIZE, fallible))
> +    return NS_ERROR_OUT_OF_MEMORY;
> +
> +  uint32_t i; // index of range
> +  uint32_t wr = 0; // current buffer index to write to

Make the var name more understandable.

@@ +154,5 @@
> +    while (cur <= mChunkEnds[i]) {
> +      while (wr < BUFFER_SiZE) {
> +        // check if single number range
> +        if (mChunkStarts[i] == mChunkEnds[i]) {
> +          buffer[wr++] = cur++;

Arithmetic on something defined as a void*?

@@ +158,5 @@
> +          buffer[wr++] = cur++;
> +          break;
> +        } else {
> +          if (cur <= mChunkEnds[i])
> +            buffer[wr++] = cur++;

You've got the two sides of the if statement here running the same code, so this can be simplified.

@@ +164,5 @@
> +            break;
> +        }
> +      }
> +      uint32_t written;
> +      if (wr == BUFFER_SIZE) {

What if the number of entries doesn't match up with BUFFER_SIZE? i.e. near the end of a ChunkSet, when there's only a few entries remaining?

@@ +165,5 @@
> +        }
> +      }
> +      uint32_t written;
> +      if (wr == BUFFER_SIZE) {
> +        nsresult rv = aOut->Write(reinterpret_cast<char*>(strmBuf.Elements()),

Didn't you define "buffer" for this?

::: toolkit/components/url-classifier/ChunkSet.h
@@ +12,5 @@
>  
>  namespace mozilla {
>  namespace safebrowsing {
>  
> +#define BUFFER_SIZE 10;

IO_BUFFER_SIZE? Make it a class const instead of a define? Also you can probably make it a lot bigger to cut down on IO, say 128 or so.

@@ +39,5 @@
>  
>  private:
> +  FallibleTArray<uint32_t> mChunkStarts;
> +  FallibleTArray<uint32_t> mChunkEnds;
> +  FallibleTArray<uint32_t> strmBuf;

mStreamBuffer, but this probably doesn't need to be a member var at all, and not sure if it needs to be an nsTArray subclass rather than a normal array either, given that it's small and can live on the stack.
Attachment #8658462 - Flags: feedback? → feedback+
(In reply to jay from comment #36)
> Also, I ran into an error running the tests. (I'm running
> Windows 8.1)
> 
> python mach xpcshell-test toolkit/components/url-classifier/

If you're running from the start-shell-msvcxxxx.bat shells from MozillaBuild (which you should be if you're following the Windows build instructions), then you don't need to add the "python", you can just run "mach".
Depends on: 1229051
Comment on attachment 8694120 [details] [diff] [review]
Add C++ test coverage for UrlClassifier ChunkSets

Review of attachment 8694120 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/tests/gtest/TestChunkSet.cpp
@@ +55,5 @@
> +
> +TEST(UrlClassifierChunkSet, Merge)
> +{
> +  static int testVals[] = {2, 1, 5, 6, 8, 7, 14, 10, 12, 13};
> +  static int mergeVals[] = {3, 4, 9, 16, 20};

Maybe it would be good to add say "14" to mergeVals to test that Merge does the right thing when the value already exists in the original set?

Of course, if you do that, you'll have to subtract one in the length test later on.
Attachment #8694120 - Flags: review?(francois) → review+
(In reply to jay from comment #33)
> Created attachment 8657649 [details] [diff] [review]
> Attempt at implementing ranges
> 
> I'm not really sure how to unit test this. But here is my progress so far.

I've just added basic unit tests for ChunkSet. You can now do:

mach gtest UrlClassifierChunkSet.*

to test the old code, or your replacement for it. (code is in mozilla-inbound but should hit mozilla-central soon)
Attachment #8707993 - Flags: feedback?(gpascutto)
Gian-Carlo, let me know if the Chunk class can be implemented in a better way. For now, this is just a prototype. Also, can you provide some more information on how Read and Write should work?
Flags: needinfo?(gpascutto)
Comment on attachment 8707993 [details] [diff] [review]
Optimize ChunkSet to efficiently store ranges

Review of attachment 8707993 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like a pretty clean solution. I've got a number of small remarks wrt C++ style, and I think the code likely doesn't work correctly in some edge cases, so for that reason f-. Let me know if I'm wrong here.

Does this pass the unit tests?

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +56,5 @@
> +    start = aChunk;
> +    end = aChunk;
> +  }
> +
> +  Chunk chunk = Chunk(start, end);

Chunk chunk(start, end);

@@ +66,5 @@
>    return NS_OK;
>  }
>  
> +int32_t
> +ChunkSet::BinaryRangeOf(uint32_t aChunk) const

Maybe call it SearchContainingRange or something? We don't care about it being a binary search, that's an implementation detail, and "RangeOf" doesn't really explain what it does.

@@ +68,5 @@
>  
> +int32_t
> +ChunkSet::BinaryRangeOf(uint32_t aChunk) const
> +{
> +  size_t low = 1;

I would expect this to be 0. If mChunks has one element, we'll break out of the loop immediately because 1 <= 0 so we abort without even looking if there was a match.

@@ +69,5 @@
> +int32_t
> +ChunkSet::BinaryRangeOf(uint32_t aChunk) const
> +{
> +  size_t low = 1;
> +  size_t high = mChunks.Length() - 1;

That looks like it will hurt if mChunks is empty (gets to be SIZE_MAX).

@@ +94,5 @@
>  
>  nsresult
>  ChunkSet::Merge(const ChunkSet& aOther)
>  {
> +  const Chunk* dupIter = aOther.mChunks.Elements();

Let's make that begin() for consistency with the line below.

@@ +112,5 @@
>  
>  nsresult
>  ChunkSet::Remove(const ChunkSet& aOther)
>  {
> +  Chunk* addIter = mChunks.Elements();

begin() and end() for consistency with the above code.

@@ +119,5 @@
> +  for (Chunk* iter = addIter; iter != endIter; iter++) {
> +    for (uint32_t chunk = iter->getStart(); chunk <= iter->getEnd(); chunk++) {
> +      if (aOther.Has(chunk)) {
> +        if (iter->getStart() == iter->getEnd()) {
> +          mChunks.RemoveElement(*iter);

This invalidates the iterators so now the outer loop will break.

@@ +129,5 @@
> +            iter->setEnd(chunk - 1);
> +          } else {
> +            iter->setEnd(chunk - 1);
> +            Chunk c = Chunk(chunk + 1, tmp);
> +            if (!mChunks.InsertElementSorted(c, fallible)) {

Same issue with the outer iterator being invalid now.

::: toolkit/components/url-classifier/ChunkSet.h
@@ +8,5 @@
>  
>  #include "Entries.h"
>  #include "nsString.h"
>  #include "nsTArray.h"
> +#include "mozilla/Pair.h"

It looks like you didn't end up using this.

@@ +16,5 @@
>  
>  /**
> + * Chunks are stored as ranges in order to optimize storage and efficiency.
> + * Consecutive chunks, i.e. [4, 5, 6], are compressed into a single range [4-6],
> + * while individual chunks, i.e. [1, 6, 8], are compressed into [1-1, 6-6, 8-8].

Very small nitpick: "compressed into" is probably not the right wording for individual chunks, as it's actually expanding them.

@@ +21,5 @@
> + */
> +class Chunk
> +{
> +public:
> +  Chunk() {}

Do you actually need this?

@@ +25,5 @@
> +  Chunk() {}
> +
> +  Chunk(uint32_t start, uint32_t end)
> +  {
> +    mStart = start;

You can use member initializer lists here.

@@ +29,5 @@
> +    mStart = start;
> +    mEnd = end;
> +  }
> +
> +  Chunk(const Chunk& aChunk)

Doesn't the compiler generate a default copy constructor for you here? It would do the right thing, I believe.

@@ +35,5 @@
> +    mStart = aChunk.getStart();
> +    mEnd = aChunk.getEnd();
> +  }
> +
> +  ~Chunk() {}

If it doesn't do anything you don't need it. The compiler will generate a default.

@@ +53,5 @@
> +  {
> +    return c1.mStart < c2.mStart && c1.mEnd < c2.mEnd;
> +  }
> +
> +  friend bool operator!=(const Chunk& c1, const Chunk& c2)

Do you actually need all of these?
Attachment #8707993 - Flags: feedback?(gpascutto) → feedback-
(In reply to Wilmer Paulino [:wpaulino] from comment #45)
> Gian-Carlo, let me know if the Chunk class can be implemented in a better
> way. For now, this is just a prototype. Also, can you provide some more
> information on how Read and Write should work?

They have to read/write the contents of the ChunkSet to a stream so it can be persisted. The complication is that we would prefer the format they're written in to be backwards compatible, so this means writing it out as a stream of integers (all the stored chunks), not as ranges (the optimized format we're creating in this bug). This is needed because we want to be able to read the data the users already have on their disks, and dealing with different format databases will make this a lot more complex so we don't want that for now.


I reviewed the "ChunkSet with Read() and Write()" patch that's attached to this bug, and it seemed like it would work bar the remarks I made in the review. You can likely adapt that to your Chunk classes.
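In other words, Write() has to expand the in-memory ranges back into the flat stream of individual chunk numbers that the old format stores. A sketch of that expansion (Flatten is a hypothetical helper; the real code would push these uint32_t values through nsIOutputStream in buffered blocks):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive [start, end] range.
struct ChunkRange { uint32_t start, end; };

// Expand compressed ranges back into the flat list of chunk numbers the
// pre-range on-disk format expects. (Assumes end < UINT32_MAX so the
// inner loop terminates.)
std::vector<uint32_t> Flatten(const std::vector<ChunkRange>& aRanges) {
  std::vector<uint32_t> flat;
  for (const ChunkRange& r : aRanges) {
    for (uint32_t chunk = r.start; chunk <= r.end; ++chunk) {
      flat.push_back(chunk);
    }
  }
  return flat;
}
```

Read() is the mirror image: it consumes a flat stream of integers and rebuilds the compressed ranges in memory.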
Flags: needinfo?(gpascutto)
Hey Gian-Carlo, I've made some changes for the tests. They now test the ranges rather than the individual chunks. All tests pass with this version. Let me know of any style issues, naming conventions, etc. Also, are Read and Write working correctly? Is there a way to test those?
Attachment #8709493 - Flags: feedback?(gpascutto)
Attachment #8707993 - Attachment is obsolete: true
Comment on attachment 8709493 [details] [diff] [review]
Optimize ChunkSet to efficiently store ranges v2

Review of attachment 8709493 [details] [diff] [review]:
-----------------------------------------------------------------

For testing the serialization code, see a previous comment: https://bugzilla.mozilla.org/show_bug.cgi?id=1135022#c34

This is looking pretty good! Just some remarks regarding the tests that you changed. You can likely ask for review on your next revision/update.

I have one "extra". If we are reading in a large amount of data (when starting the browser), we expect most of the chunks read in to be simply one higher than the last one read in (i.e. 111, 112, 113, 114, ...). Currently the code needs two binary searches for each chunk. Can we optimize the common case? Simply checking whether the chunk is one higher than the end of the last range and fast-tracking that case should work.

::: toolkit/components/url-classifier/ChunkSet.h
@@ +59,5 @@
>  
>    nsresult Serialize(nsACString& aStr);
>    nsresult Set(uint32_t aChunk);
> +  nsresult Remove(uint32_t aChunk);
> +  int32_t SearchContainingRange(uint32_t aChunk) const;

These should be private. In fact any internal functions that aren't used outside ChunkSet should be.

@@ +64,3 @@
>    bool Has(uint32_t chunk) const;
>    nsresult Merge(const ChunkSet& aOther);
> +  uint32_t Length() const { return mChunkRanges.Length(); }

This returns the number of ChunkRanges, not the number of Chunks, so it's not what the callers expect. This is also why you needed to delete several tests just to get it to pass.

Investigating the callers, most only care whether it's empty or not (good), but the others are in the serialization code, so they will break.

Note that, as explained in one of my previous comments in this bug, serializing in the more efficient format would unfortunately force all users to update or re-download their anti-phishing and malware databases, which we don't want.

So you'll have to code this to give a "compatible" answer.

::: toolkit/components/url-classifier/tests/gtest/TestChunkSet.cpp
@@ +162,5 @@
>    chunkSet.Serialize(mergeResult);
>  
>    printf("mergeResult: %s\n", mergeResult.get());
>  
> +  nsAutoCString expected(NS_LITERAL_CSTRING("1-10,12-14,16-16,20-20"));

You can't simply change this as the output of this is fed to a server running this protocol: https://developers.google.com/safe-browsing/developers_guide_v2?hl=en

Strictly speaking, I think this more verbose notation is also allowed by the spec, but I doubt any existing client sends it, so it's unnecessary risk. It also makes the messages to the server a lot bigger if there are many noncontiguous chunks. Note that the examples in the spec use the notation that was originally in this test.

So, you should fix your code to pass this test, not change the test to make your code pass.
Attachment #8709493 - Flags: feedback?(gpascutto) → feedback+
Can I contribute to this bug?
I am a beginner, so can you tell me how I should approach this bug?
What do I need to install?
(In reply to kshitija from comment #50)
> Can I contribute to this bug?
> I am a beginner, so can you tell me how I should approach this bug?
> What do I need to install?

You need to be able to fetch the Firefox source, build it and run the result. What you need to install depends on what platform you are on. There are already several approaches discussed in this bug, and proposed patches, so read up above. I think Wilmer Paulino was very close to having a finished patch but he seems to have run out of time before being able to put the finishing touches on it.

Start there:
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Introduction
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Istructions
(In reply to Gian-Carlo Pascutto [:gcp] from comment #51)
> I think Wilmer Paulino was very close to having a finished
> patch but he seems to have run out of time before being able to put the
> finishing touches on it.

Unfortunately, I couldn't figure out the rest of Read/Write before my school semester started so I ran out of time. I have spring break coming up and will probably give it a try again.
(In reply to Gian-Carlo Pascutto [:gcp] from comment #51)
> Start there:
> https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Introduction
> https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/
> Build_Istructions


The second link you provide gives an error
From where should I fetch the Firefox source? I am on ubuntu 14.03 so what should I install?
Just wanted to hear whether you approve of the implementation I'm going for, before I clean up what I have and take on Read & Write.
The code passes all of the associated tests in UrlClassifierChunkSet.
Attachment #8740641 - Flags: feedback?(gpascutto)
Comment on attachment 8740641 [details] [diff] [review]
Implement ranges, missing Read and Write

Review of attachment 8740641 [details] [diff] [review]:
-----------------------------------------------------------------

Generally this looks excellent! I have some remarks to make it more readable, some points where I think you can simplify, and there are one or two points where I'm not sure it will handle all cases correctly.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +80,2 @@
>  
> +  for (size_t i = 0; i < mRanges.Length() - 1;) {

I wouldn't use a for loop in this case.

@@ +109,5 @@
> +  ChunkRange lastRange = mRanges.LastElement();
> +  for (size_t i = 0; i < aOther.mRanges.Length(); i++) {
> +    ChunkRange range = aOther.mRanges[i];
> +
> +    if (lastRange.End() < range.Begin()) {

Any reason this is special cased but the opposite case (at the start) isn't?

@@ +113,5 @@
> +    if (lastRange.End() < range.Begin()) {
> +      break;
> +    }
> +
> +    nsTArray<size_t> tmp = IntersectingRanges(range, 0, mRanges.Length());

"tmp" isn't a very descriptive name.

@@ +114,5 @@
> +      break;
> +    }
> +
> +    nsTArray<size_t> tmp = IntersectingRanges(range, 0, mRanges.Length());
> +    for (size_t j = 0; mRanges.Length() != 0 && j < tmp.Length(); j++) {

I'd move "mRanges.Length() != 0" to a separate check at the end of the loop.

@@ +116,5 @@
> +
> +    nsTArray<size_t> tmp = IntersectingRanges(range, 0, mRanges.Length());
> +    for (size_t j = 0; mRanges.Length() != 0 && j < tmp.Length(); j++) {
> +      ChunkSet diff;
> +      if (mRanges[tmp[j]].Remove(range, diff) != NS_OK) {

Pull tmp[j] out into a var and give it a more descriptive name. Same for "diff", it isn't all that obvious what it is doing. It's really "SplitRemainder" or something.

@@ +122,5 @@
> +      }
> +
> +      mRanges.RemoveElementAt(tmp[j]);
> +      for (size_t k = 0; k < diff.mRanges.Length(); k++) {
> +        if (!mRanges.InsertElementSorted(diff.mRanges[k], fallible)) {

Would this not mess up the indexes that the "tmp" array contains and which also point into mRanges? If not, add a comment why this works. It's not obvious to me.

At worst I think you can run this "add split ranges back" pass outside the loop over "tmp"?

@@ +146,5 @@
> +  nsTArray<size_t> tmp;
> +  size_t idx;
> +  if (BinarySearchIf(mRanges, aMin, aMax,
> +                     ChunkRange::IntersectionComparator(aRange), &idx)) {
> +    tmp.AppendElements(IntersectingRanges(aRange, 0, idx));

Are there pathological cases where you'll end up recursing like crazy? Something like:

[1 - 20] [25 - 30] [40 - 50] [60 - 70] ...... [ 10000 - 10010]
  [2                       -                            10010]

Which I could imagine happening if, for example, the first chunk consists of test entries and we want to clear the rest of the database.

If yes then maybe consider rewriting this as a non-recursive function.

@@ +157,5 @@
> +
> +bool
> +ChunkSet::HasSubrange(const ChunkRange& aRange) const {
> +  const ChunkRange* end = mRanges.Elements() + mRanges.Length();
> +  for (const ChunkRange* i = mRanges.Elements(); i != end; i++) {

If I see "i" I generally think of a counter. If we're using short names "it" is more typical for something iterator/pointer like.

@@ +160,5 @@
> +  const ChunkRange* end = mRanges.Elements() + mRanges.Length();
> +  for (const ChunkRange* i = mRanges.Elements(); i != end; i++) {
> +    if ((*i).Begin() <= aRange.Begin() && aRange.End() <= (*i).End()) {
> +      return true;
> +    } else if (aRange.Intersects(*i)) {

Doesn't this cover the above case too?

@@ +176,5 @@
> +
> +nsresult
> +ChunkSet::ChunkRange::Remove(const ChunkRange& aRange, ChunkSet& aSetDiff) const
> +{
> +  if (mBegin < aRange.mBegin && aRange.mEnd < mEnd) {

I think you can cover this case by falling through to the two below and making them "if" instead of "else if".

@@ +201,5 @@
> +  return NS_OK;
> +}
> +
> +inline bool
> +ChunkSet::ChunkRange::FoldL(const ChunkRange& aRange)

FoldLeft?

::: toolkit/components/url-classifier/ChunkSet.h
@@ +96,1 @@
>    FallibleTArray<uint32_t> mChunks;

This can go now.
Attachment #8740641 - Flags: feedback?(gpascutto) → feedback+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #57)
> @@ +176,5 @@
> > +
> > +nsresult
> > +ChunkSet::ChunkRange::Remove(const ChunkRange& aRange, ChunkSet& aSetDiff) const
> > +{
> > +  if (mBegin < aRange.mBegin && aRange.mEnd < mEnd) {
> 
> I think you can cover this case by falling through to the two below and
> making them "if" instead of "else if".

I'm working on a patch to address your points.
I'm not sure I follow your point about falling through to the two if statements below. What I gathered is that you asked me to remove the first case and have the two other conditions checked unconditionally, and that this would also cover the first case. I'm not sure I understood you correctly, but assuming I did: consider ([1-10] - [2-9]). In this example, neither of the "two remaining conditions" is met, and thus the range is not split at all.
Attachment #8740641 - Attachment is obsolete: true
Attachment #8741954 - Flags: feedback?(gpascutto)
(In reply to bd339 from comment #58)
> I'm not sure if I follow your point about falling through to the two if
> statements below. What I gathered, is that you asked me to remove the first
> case and make the two other conditions be checked unconditionally. And thus,
> you are saying that will also cover the first case? I'm not sure if I
> understood you correct, but assuming I did... consider ([1-10] - [2,9]). In
> this example none of the "two remaining conditions" are met and thus the
> range is not split at all.

I reread the code and you're right, so ignore my comment there.
Comment on attachment 8741954 [details] [diff] [review]
Implementation of ranges, 2nd revision (still missing Read & Write)

Review of attachment 8741954 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, I only have tiny suggestions. 

Also, maybe it's worth looking at whether some of the repeated if (a.begin < b.begin() && a.end()...) etc. conditions can be replaced by factored-out Contains(), Precedes() functions.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +84,5 @@
> +  if (oldLen < mRanges.Length()) {
> +    Range* range = mRanges.begin();
> +    while (range != mRanges.end() - 1) {
> +      if (range->FoldLeft(*(range + 1))) {
> +        mRanges.RemoveElementAt(range - mRanges.begin() + 1);

If you look, all three preceding lines effectively operate on "range + 1". So maybe this loop can be simplified.

@@ +158,2 @@
>        return true;
> +    } else if (!(aSubrange.Begin() > range->End() ||

This condition looks superfluous: if we don't hit the "else" we will fall through to return false at the end anyway.
Attachment #8741954 - Flags: feedback?(gpascutto) → feedback+
(In reply to Gian-Carlo Pascutto [:gcp] from comment #61)
> @@ +158,2 @@
> >        return true;
> > +    } else if (!(aSubrange.Begin() > range->End() ||
> 
> This condition looks superfluous: if we don't hit the "else" we will fall
> through to return false at the end anyway.

You are right that the code would be correct without it; however, the check allows breaking out of the loop early, perhaps very early.
It is there because if the "subrange" is not a direct subrange but "merely" intersects the current range, I know it can never be a subrange of any of the following ranges, because the implementation guarantees that there are no overlaps and that every range is as "wide" as possible.
Comments and readability improvements.
Attachment #8741954 - Attachment is obsolete: true
Attachment #8743521 - Flags: feedback?(gpascutto)
Attachment #8743521 - Flags: feedback?(gpascutto) → feedback+
Attached patch Implement Write (obsolete) — Splinter Review
I'm unsure whether a "buffer size" as proposed in comment 36 is preferable, considering that 'chunks' lives on the stack and is infallible.

I assume it is not too "IO-wasteful" to immediately write single-chunk ranges to the output stream, because the point of this bug is that we don't have many single-chunk ranges.
Attachment #8743521 - Attachment is obsolete: true
Attachment #8744615 - Flags: feedback?(gpascutto)
Attached patch Simplify Write (obsolete) — Splinter Review
Comment 64 still applies.
Attachment #8744615 - Attachment is obsolete: true
Attachment #8744615 - Flags: feedback?(gpascutto)
Attachment #8744617 - Flags: feedback?(gpascutto)
Attached patch Implement Read (obsolete) — Splinter Review
Might want to limit how many chunks are read at once?
Attachment #8744619 - Flags: feedback?(gpascutto)
(In reply to bd339 from comment #64)
> I assume it is not too "IO-wasteful" to immediately write "single-chunk"
> ranges to the output stream, because the point of this bug is that we don't
> have many "single-chunk" ranges.

*cough*

https://bugzilla.mozilla.org/show_bug.cgi?id=1150334
https://bugzilla.mozilla.org/show_bug.cgi?id=1211090
(In reply to Gian-Carlo Pascutto [:gcp] from comment #67)
> *cough*
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=1150334
> https://bugzilla.mozilla.org/show_bug.cgi?id=1211090

Oh, well, you can't be right every time :)
Considering this, what are your recommendations?
Attachment #8744617 - Flags: feedback?(gpascutto) → feedback+
Comment on attachment 8744619 [details] [diff] [review]
Implement Read

Review of attachment 8744619 [details] [diff] [review]:
-----------------------------------------------------------------

r- due to non-fallible memory allocation.

The style thing is a minor suggestion, if you don't want to change it it's ok for me too.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +171,5 @@
>  
>  nsresult
>  ChunkSet::Read(nsIInputStream* aIn, uint32_t aNumElements)
>  {
> +  nsTArray<uint32_t> chunks(aNumElements);

Reading it in at once is OK, but it must be a fallible array.

If not, a corrupted file (say the header claims it has 2^31 chunk elements) would crash the browser. And this data is read right after startup, so the situation would be completely unrecoverable and even persist after a reinstall, oops!

(I think you're actually safe here, as the relevant file is protected with an MD5 checksum. But the general principle applies: large arrays must be fallible, and certainly arrays whose contents come from outside the browser.)

@@ +173,5 @@
>  ChunkSet::Read(nsIInputStream* aIn, uint32_t aNumElements)
>  {
> +  nsTArray<uint32_t> chunks(aNumElements);
> +  nsresult readRes = ReadTArray(aIn, &chunks, aNumElements);
> +  NS_ENSURE_SUCCESS(readRes, readRes);

nit: this style of error handling is deprecated: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Error_handling

Maybe it's better not to introduce it in new/rewritten code.
Attachment #8744619 - Flags: feedback?(gpascutto) → feedback-
(In reply to bd339 from comment #68)
> (In reply to Gian-Carlo Pascutto [:gcp] from comment #67)
> > *cough*
> > 
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1150334
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1211090
> 
> Oh, well, you can't be right every time :)
> Considering this, what are your recommendations?

You can move the buffer outside the loop and just flush it when it has X elements (and once at the end).

That said this isn't all that important - the impact is much less than for those bugs - I just wanted to point out that funnily enough both of your erroneous assumptions were actual problems we had and fixed :-)
Attached patch make Read fallible (obsolete) — Splinter Review
Attachment #8744619 - Attachment is obsolete: true
Attachment #8746526 - Flags: feedback?(gpascutto)
Attached patch make Write buffer its output (obsolete) — Splinter Review
Attachment #8744617 - Attachment is obsolete: true
Attachment #8746548 - Flags: feedback?(gpascutto)
Attached patch update error handling (obsolete) — Splinter Review
Use the error handling style referenced in comment 69.
Comment on attachment 8746526 [details] [diff] [review]
make Read fallible

Review of attachment 8746526 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +173,5 @@
>  ChunkSet::Read(nsIInputStream* aIn, uint32_t aNumElements)
>  {
> +  FallibleTArray<uint32_t> chunks;
> +
> +  if (ReadTArray(aIn, &chunks, aNumElements) != NS_OK) {

I'd use NS_FAILED() and return the actual error instead of masking and replacing it. i.e.

rv = ReadTArray(...)
if (NS_FAILED(rv)) {
  return rv;
}

@@ +180,3 @@
>  
>    for (const uint32_t* chunk = chunks.begin(); chunk != chunks.end(); chunk++) {
> +    if (Set(*chunk) != NS_OK) {

Same remark.
Attachment #8746526 - Flags: feedback?(gpascutto) → feedback-
Attachment #8746548 - Flags: feedback?(gpascutto) → feedback+
Attachment #8746554 - Flags: review+
Attached patch Implement ranges, 1st revision (obsolete) — Splinter Review
All of my patches towards this bug so far, combined into one.
Attachment #8746526 - Attachment is obsolete: true
Attachment #8746548 - Attachment is obsolete: true
Attachment #8746554 - Attachment is obsolete: true
Attachment #8746682 - Flags: review?(gpascutto)
Attached patch Implement ranges, 2nd revision (obsolete) — Splinter Review
No code changes, just submitting in the correct format (with commit message).
Attachment #8746682 - Attachment is obsolete: true
Attachment #8746682 - Flags: review?(gpascutto)
Attachment #8747516 - Flags: review?(gpascutto)
Priority: -- → P5
Comment on attachment 8747516 [details] [diff] [review]
Implement ranges, 2nd revision

Review of attachment 8747516 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Let me test this a bit and if I don't see any issues on my profile I'll land it.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +59,5 @@
>  bool
>  ChunkSet::Has(uint32_t aChunk) const
>  {
> +  Range tmp(aChunk, aChunk);
> +  for (const Range* range = mRanges.begin(); range != mRanges.end(); range++) {

The ranges are sorted, aren't they? This could be a binary search then.
Attachment #8747516 - Flags: review?(gpascutto) → review+
Comment on attachment 8747516 [details] [diff] [review]
Implement ranges, 2nd revision

Review of attachment 8747516 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ChunkSet.cpp

@@ +189,5 @@
> +nsresult
> +ChunkSet::Read(nsIInputStream* aIn, uint32_t aNumElements)
> +{
> +  FallibleTArray<uint32_t> chunks;
> +  nsresult rv = ReadTArray(aIn, &chunks, aNumElements);

Reading the entire chunk array in at once means our maximum memory usage stays unchanged compared to the previous implementation. It's actually worse because both arrays are now "alive" at the same time.

Given that one point of this bug was to reduce memory usage, I think you'll want to read this in piece by piece, similar to the write function.
Comment on attachment 8747516 [details] [diff] [review]
Implement ranges, 2nd revision

I believe this is buggy in some cases, see attached patch with a diff for the unit tests.
Attachment #8747516 - Flags: review+ → review-
Attachment #8751939 - Flags: feedback?(gpascutto)
Attachment #8752247 - Flags: feedback?(gpascutto)
I believe the bug exposed by the unit tests in attachment 8750856 [details] [diff] [review] was caused by Merge not behaving correctly when merging ranges that contain existing ranges as subranges.
Attachment #8752317 - Flags: feedback?(gpascutto)
Attachment #8751939 - Flags: feedback?(gpascutto) → feedback+
Attachment #8752247 - Flags: feedback?(gpascutto) → feedback+
Comment on attachment 8752317 [details] [diff] [review]
fix subranging in Merge 1st revision

Review of attachment 8752317 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +82,5 @@
>    }
>  
>    if (oldLen < mRanges.Length()) {
>      for (size_t i = 1; i < mRanges.Length(); i++) {
> +      while (mRanges[i - 1].Contains(mRanges[i])) {

Can this be simplified to (mRanges[i - 1].Contains(mRanges[i]) || mRanges[i - 1].FoldLeft(mRanges[i]))?

I think an argument can also be made that FoldLeft can/should handle the Contains case.

In any case this duplication of code in the two loops is not good.
Attachment #8752317 - Flags: feedback?(gpascutto) → feedback-
:gcp was indeed correct that the looping was unnecessary, and the same fix can be applied by letting FoldLeft check for subranging. However, notice that the ordering of the if-statements matters: the Contains check has to come before the condition that checks whether aRange "overlaps to the right". Should this behaviour be documented in a comment?
Attachment #8752317 - Attachment is obsolete: true
Attachment #8753468 - Flags: feedback?(gpascutto)
Attached patch simplify RemoveSplinter Review
The condition that the chunk set has at least one range, checked before the loop and at the end of each iteration, can be replaced by a single check at the start of the loop.
Attached patch make Set use HasSplinter Review
Since Has was updated to use binary search in attachment 8752247 [details] [diff] [review], Set ought to use that.
Also removed the use of a redundant local variable in Has.
Attachment #8753468 - Flags: feedback?(gpascutto) → feedback+
Comment on attachment 8753478 [details] [diff] [review]
simplify Remove

Review of attachment 8753478 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +113,5 @@
>    for (const Range* removalRange = aOther.mRanges.begin();
>         removalRange != aOther.mRanges.end(); removalRange++) {
>  
> +    if (mRanges.Length() == 0) {
> +      return NS_OK;

In what circumstances could this code ever be reached? Given that it's right after the !=end check in the for loop, I can't see any?
Comment on attachment 8753478 [details] [diff] [review]
simplify Remove

Review of attachment 8753478 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +113,5 @@
>    for (const Range* removalRange = aOther.mRanges.begin();
>         removalRange != aOther.mRanges.end(); removalRange++) {
>  
> +    if (mRanges.Length() == 0) {
> +      return NS_OK;

Nevermind, it's checking the other chunkset, oops.
Attachment #8753478 - Flags: review+
Attachment #8753482 - Flags: review+
Can you fold all your patches together and ask for review on the total?
Flags: needinfo?(bd339)
(In reply to Gian-Carlo Pascutto [:gcp] from comment #90)
> Can you fold all your patches together and ask for review on the total?

Certainly. I will do so later today. That patch will also have the comment about needed optimization in ChunkSet.h removed.
Flags: needinfo?(bd339)
Attached patch Implement ranges, 3rd revision (obsolete) — Splinter Review
This does not include the extended unit tests that exposed a bug in my Merge implementation. Should they also become a permanent addition?
Attachment #8747516 - Attachment is obsolete: true
Attachment #8753981 - Flags: review?(gpascutto)
(In reply to bd339 from comment #92)
> Created attachment 8753981 [details] [diff] [review]
> Implement ranges, 3rd revision
> 
> This does not include the extended unit tests which showed a bug in my Merge
> implementation. I think they should also be a permanent addition?

Yes, but I'll do that for you. I don't seem to have a good way to run a real update with and without your changes (because I can't guarantee I get the same data from the server) so I'll probably extend them even more to hit all corner cases I see in your solution.
Removes the ChunkSet constructor and destructor; both were empty, and the compiler will generate the defaults anyway.
Attachment #8753981 - Attachment is obsolete: true
Attachment #8753981 - Flags: review?(gpascutto)
Attachment #8755551 - Flags: review?(gpascutto)
Attachment #8755551 - Flags: review?(gpascutto) → review+
Comment on attachment 8755804 [details]
MozReview Request: Bug 1135022 - Optimize ChunkSet by storing ranges instead of numbers. r=gcp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54822/diff/1-2/
Comment on attachment 8755805 [details]
MozReview Request: Bug 1135022 - Extend UrlClassifier ChunkSet test with stress test. r?francois

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/54824/diff/1-2/
Comment on attachment 8755804 [details]
MozReview Request: Bug 1135022 - Optimize ChunkSet by storing ranges instead of numbers. r=gcp

https://reviewboard.mozilla.org/r/54822/#review51478
Attachment #8755804 - Flags: review?(gpascutto) → review+
Comment on attachment 8755805 [details]
MozReview Request: Bug 1135022 - Extend UrlClassifier ChunkSet test with stress test. r?francois

https://reviewboard.mozilla.org/r/54824/#review52014

With this new else clause, the test looks all good to me.

::: toolkit/components/url-classifier/tests/gtest/TestChunkSet.cpp:212
(Diff revision 2)
> +  // Unmerge
> +  chunkSet.Remove(origSet);
> +  for (int it = 0; it < TEST_RANGE; ++it) {
> +    if (origSet.Has(it)) {
> +      ASSERT_FALSE(chunkSet.Has(it));
> +    }

Should there be an else clause here to check that the entries that got added after `chunkSet.Merge(mergeSet);` are still there?

    } else if (mergeSet.Has(it)) {
      ASSERT_TRUE(chunkSet.Has(it));
    }

unless of course they were in both, but the `else` should take care of that.
Attachment #8755805 - Flags: review?(francois) → review+
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/5e6c3b69c77f
https://hg.mozilla.org/mozilla-central/rev/c79735102a2c
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in before you can comment on or make changes to this bug.