Closed Bug 673470 Opened 13 years ago Closed 12 years ago

Replace the sqlite safeb store with a flat file

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 17

People

(Reporter: dcamp, Assigned: gcp)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(25 files, 46 obsolete files)

1.55 KB, text/plain
Details
4.11 KB, application/x-javascript
Details
335.03 KB, patch
gcp
: review+
Details | Diff | Splinter Review
8.31 KB, patch
gcp
: review+
Details | Diff | Splinter Review
3.06 KB, patch
gcp
: review+
Details | Diff | Splinter Review
2.74 KB, patch
gcp
: review+
Details | Diff | Splinter Review
5.73 KB, patch
gcp
: review+
Details | Diff | Splinter Review
1.32 KB, patch
gcp
: review+
Details | Diff | Splinter Review
4.27 KB, patch
gcp
: review+
Details | Diff | Splinter Review
9.31 KB, patch
gcp
: review+
Details | Diff | Splinter Review
1.47 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
1.30 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
1.49 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
1.52 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
25.73 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
3.77 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
7.62 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
2.36 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
5.79 KB, patch
dcamp
: review+
justin.lebar+bug
: feedback+
Details | Diff | Splinter Review
2.86 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
1.17 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
1.47 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
18.27 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
2.98 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
19.61 KB, patch
dcamp
: review+
Details | Diff | Splinter Review
Attached file barely-limping WIP (obsolete) —
I spent a couple of days gutting nsUrlClassifierDBService.cpp.  The attached patch starts to replace the SQLite database with store files similar to Chrome's Safe Browsing implementation.

Very similar - HashStore.cpp in particular is mostly a port of Chrome code, and retains their copyright header as a result.  Other parts are heavily influenced.  licensing@ would need to take a look at this before it lands.

ProtocolParser.[h,cpp] is used to build a series of TableUpdate objects that are applied by the HashStore.  LookupCache maintains a (currently stupid) lookup cache.

Some behavior improvements:
* Updates are a lot quicker with this code than with the SQLite database - a complete load of the db takes about 30 seconds with this code (mostly network-bound; I haven't done tests against local copies of the protocol data yet) vs. 1m30s with the old code (mostly SQLite-bound).  Haven't done real measurements yet, just wall-clock stuff.
* Updates take a lot less memory.  Again, no solid measurements yet, but glancing at an xpcshell instance running an entire empty-to-full update, this shaved 10-ish MB off the process.  This is with the very stupid lookup cache (see below), and there are probably other quick wins too.
* The backing storage is completely dumped out of memory after an update, leaving only the lookup cache implementation (similar to the work in bug 669410).
* Backing store is a lot smaller, and is not subject to the whims of sqlite fragmentation.
* Ignores host keys entirely (in the store and in the lookup cache).  I suspect the wins in simplicity and storage here outweigh the benefits of the hostkey cache, but we can revisit.

Things that aren't finished:
* Gethash result caching isn't finished yet.
* My fast-lookup cache is dumb; it should use the prefix tree implementation from bug 669410.
* Haven't run tests lately.
* There are quite a few XXX comments that will need to be dealt with.  In particular, ActiveTables() is completely stupid.
* Probably a lot of things are stupid; this was all done hastily.
* Haven't ported checksumming of storage files when they are read; that's important.
* All lookups are sent to the worker thread.  It should be pretty easy to send NO lookups to the worker thread, with proper locking around the LookupCache.
* Finally, the documentation suuucks.

Generally, things are a bit of a mess.  I hope to fix a few of these in future updates (particularly the documentation), but I have some other responsibilities and impending paternity leave.  I wanted to dump this here in case someone wants to pick it up (or pull individual pieces out of it for other work).
Attached file Chromium license
License applicable to the parts from Chrome.
I've been using this script to exercise the update code.  Set TESTPROF=[path to a directory] and run under xpcshell to start an update.  It ignores backoff and just keeps asking Google for the next update until it gets an empty update.
The script doesn't clear the profile or db on each run; you'll need to manually delete the store files to start from scratch.
Assignee: nobody → gpascutto
Attachment #547744 - Attachment is obsolete: true
Attached patch Patch v2. Exorcise SQLite (obsolete) — Splinter Review
Alright, here is my work in progress. Memory usage is down to 1.9M, and disk usage down to 5.5M for SafeBrowsing. There are a zillion issues still to deal with, but the code works and should make it usable on Mobile. These patches need to be applied on top of the v3 patches in bug 669410.

Issues fixed compared to the first WIP:
- Fixed URL parsing so updates using a MAC work.
- Fixed the LookupCache code so it doesn't give a false positive and hence hashcomplete to Google for every single lookup :-)
- Changed the order of processing an update so expires (adddel/subdel) are handled first. This makes the canonical testcase of "ad:1\na:1:32:xxx" not delete itself.
- Rewrote the LookupCache to only store completions and delegate to PrefixSet for everything else.
- Changed the HashStore to deinterleave (yuck!) the prefixes/chunk-ids, and zip the chunk-ids before storing to disk. They have a lot of redundancy, so this is a massive saving (see the sketch after this list). The prefixes should be encoded as a PrefixSet, but the code for that needs some extra helpers to be usable here.
- Implemented caching of successful completions, by executing an update that simulates an add. This required some tweaking to the update-handling code to not skip already known chunks, and avoid duplicating hashes within a chunk. An update rebuilds the entire LookupCache, so the old GetHash cache is gone now.
- Slightly better handling of corrupted stores.
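
To make the deinterleaving idea above concrete, here is a minimal sketch (the AddPrefix struct and helper names are hypothetical, not the actual HashStore code) that splits the interleaved entries into two parallel arrays and deflates the chunk-id array with zlib before it goes to disk:

#include <cstdint>
#include <vector>
#include <zlib.h>

struct AddPrefix {          // stand-in for the real entry type
  uint32_t prefix;
  uint32_t addChunk;
};

// Split interleaved entries into two parallel arrays so each can be stored
// (and compressed) in the representation that suits it best.
void Deinterleave(const std::vector<AddPrefix>& entries,
                  std::vector<uint32_t>* prefixes,
                  std::vector<uint32_t>* chunks) {
  for (const AddPrefix& e : entries) {
    prefixes->push_back(e.prefix);
    chunks->push_back(e.addChunk);
  }
}

// Chunk ids are highly repetitive, so plain zlib compression shrinks them a lot.
// (Return-value checking omitted for brevity.)
std::vector<unsigned char> CompressChunks(const std::vector<uint32_t>& chunks) {
  uLongf destLen = compressBound(chunks.size() * sizeof(uint32_t));
  std::vector<unsigned char> out(destLen);
  compress(out.data(), &destLen,
           reinterpret_cast<const Bytef*>(chunks.data()),
           chunks.size() * sizeof(uint32_t));
  out.resize(destLen);
  return out;
}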

Known issues remaining:
- Hundreds of XXX in the code
- All of what Dave said minus what is fixed in the above paragraph
- HashStore should save prefixes in prefix order (instead of chunk) when going to disk, and use PrefixSet for compressing them. Chunks can stay with zlib but will likely lose some compression from not being in chunk order. Should still be better than the other way around.
- mTableUpdates is duplicated in DBService(Worker) and ProtocolParser, I think.
- TableFreshness seems disabled. We probably need to audit the entire code wrt the protocol spec.
- Hashcomplete misses aren't cached. This is important without hostkeys, see the explanation in bug 669407.
- Hashcompletion noise entries are disabled.
- We need to get all the testcases to work again, minus the wrong ones.
- Did I mention there's still lots of XXX in the code?

The very good news is that this splits up a 4000+ line C++ file, with 10 or so different classes handling everything, into smaller, much more manageable pieces. But that's not my responsibility - blame :dcamp instead!
Blocks: 669407
Please put the full Chrome license header at the top of the file. Other than that, this is fine. You'll need a small patch to about:license adding the name of the file (whatever it ends up as) to the list of files covered by the Chrome license copy which is already present there.

Gerv
>Please put the full Chrome license header at the top of the file. Other than 
>that, this is fine.

What is the "full license header"? Is it this:

// Copyright (c) 2011 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

Should there be a regular Mozilla copyright header, too? (I expect to heavily edit, expand, and perhaps mostly rewrite this file)
I traced an issue in the new code that was causing the database sizes to differ from the old code. It turns out that all of the code keeps prefixes sorted by add-chunk, so routines like KnockoutSubs and removing matching add prefixes by subs work correctly.

Unfortunately, when taking an update containing subs, adding these new subs to the list of subs has to go by sub-chunk, and the code assumes the entries are sorted. But they're actually sorted by add-chunk, not sub-chunk. This was causing sub updates to get lost, leaving our database too large and still containing entries that should have been removed.

The attached patch fixes this by switching to an alternate sort order before processing sub chunks, and re-sorting back to add-chunk order afterwards.

Later on, we probably want to keep all lists in prefix order and modify the code to work from that assumption. This would remove the need to sort and re-sort continuously, and would integrate better with PrefixSet, which needs prefix order.
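
For illustration, the fix amounts to something like the following. The SubPrefix struct and the comparators are stand-ins for the real entry types, so treat this as a sketch of the idea rather than the patch itself:

#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

struct SubPrefix {          // stand-in for the real entry type
  uint32_t addChunk;
  uint32_t subChunk;
  uint32_t prefix;
};

static bool ByAddChunk(const SubPrefix& a, const SubPrefix& b) {
  return a.addChunk < b.addChunk;
}
static bool BySubChunk(const SubPrefix& a, const SubPrefix& b) {
  return a.subChunk < b.subChunk;
}

// Entries normally live in add-chunk order so knocking out matching add
// prefixes works, but merging newly received subs assumes sub-chunk order.
void ApplySubUpdate(std::vector<SubPrefix>& existing,
                    std::vector<SubPrefix>& incoming) {
  // Temporarily switch to sub-chunk order so the sorted-merge assumption holds.
  std::sort(existing.begin(), existing.end(), BySubChunk);
  std::sort(incoming.begin(), incoming.end(), BySubChunk);
  std::vector<SubPrefix> merged;
  std::merge(existing.begin(), existing.end(),
             incoming.begin(), incoming.end(),
             std::back_inserter(merged), BySubChunk);
  // Restore add-chunk order for the rest of the update processing.
  std::sort(merged.begin(), merged.end(), ByAddChunk);
  existing.swap(merged);
}
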
Attached patch Patch 7 Add Copyright headers (obsolete) — Splinter Review
Add copyright headers to the new source files.
Attachment #552424 - Attachment is obsolete: true
Attachment #552425 - Attachment is obsolete: true
Attachment #552426 - Attachment is obsolete: true
Attachment #552427 - Attachment is obsolete: true
Attachment #552428 - Attachment is obsolete: true
Attachment #553462 - Attachment is obsolete: true
Attachment #553463 - Attachment is obsolete: true
Attachment #561512 - Flags: review?(dcamp)
Attachment #561513 - Flags: review?(dcamp)
Attachment #561515 - Flags: review?(dcamp)
Attachment #561514 - Flags: feedback?(tony)
Attachment #561514 - Flags: feedback?(dcamp)
Attachment #561513 - Flags: review?(dcamp)
Attachment #561513 - Flags: feedback?(tony)
Attachment #561513 - Flags: feedback?(dcamp)
Attachment #561512 - Flags: review?(dcamp)
Attachment #561512 - Flags: feedback?(tony)
Attachment #561512 - Flags: feedback?(dcamp)
Attachment #561515 - Flags: review?(dcamp)
Attachment #561515 - Flags: feedback?(tony)
Attachment #561515 - Flags: feedback?(dcamp)
- Fixed an issue in the original protocol parser code that caused SubChunks to have swapped prefixes vs chunk numbers.
- Reworked the sort order to be prefix based and updated the code. This avoids resorting for subs on each update.
- Removed the Add prefixes from the hash store and have the Classifier fetch & forward them from the PrefixSet.

Total size of the (current) store is now 5.3M. 

I'm pondering whether the PrefixSet interaction should go directly into HashStore, so that Classifier doesn't have to know about it and forward the data as it does now. LookupCache would then only serve as a completion cache. The HashStore would need to be ordered so that the PrefixSet goes to disk first and we only load that on startup. Due to the latter, it might not actually be that much cleaner than what is here now. But it would allow compressing SubPrefixes for some small extra savings.

There are still many XXXs and remarks made above to address, but the functionality is now pretty much complete (save for caching Completion misses), so I'm throwing it out there for a first feedback round.
OS: Mac OS X → All
Hardware: x86 → All
Version: unspecified → Trunk
Whiteboard: [MemShrink]
Attachment #561512 - Attachment is obsolete: true
Attachment #561513 - Attachment is obsolete: true
Attachment #561514 - Attachment is obsolete: true
Attachment #561515 - Attachment is obsolete: true
Attachment #561512 - Flags: feedback?(tony)
Attachment #561512 - Flags: feedback?(dcamp)
Attachment #561513 - Flags: feedback?(tony)
Attachment #561513 - Flags: feedback?(dcamp)
Attachment #561514 - Flags: feedback?(tony)
Attachment #561514 - Flags: feedback?(dcamp)
Attachment #561515 - Flags: feedback?(tony)
Attachment #561515 - Flags: feedback?(dcamp)
Attachment #564583 - Flags: review?
Attachment #564583 - Flags: feedback?(tony)
Attachment #564584 - Flags: review?(dcamp)
Attachment #564584 - Flags: feedback?(tony)
Attachment #564581 - Flags: review?(dcamp)
Attachment #564581 - Flags: feedback?(tony)
Attachment #564583 - Flags: review? → review?(dcamp)
Whiteboard: [MemShrink] → [MemShrink:P2]
I forgot that I had "temporarily" commented out some tests. This uncomments them, fixes the bugs, and adds back telemetry.
Attachment #564584 - Attachment is obsolete: true
Attachment #564584 - Flags: review?(dcamp)
Attachment #564584 - Flags: feedback?(tony)
Attachment #564765 - Flags: review?(dcamp)
Attachment #564765 - Flags: feedback?(tony)
What remains to be done:

- Noise. That is a Mozilla extension, so I'd rather have this reviewed first and add it separately afterwards.
- There are a number of XXXs that I believe aren't critical, such as hashing the store to check for corruption. The SQLite solution didn't do this either, and we do checksum part of the data through zlib CRCs.
- I left some XXX's from Dave where I'm not sure what he was referring to, or where I wasn't sure of the best way to clean things up.
- A number of tests are commented out because I believe they're wrong.
- We don't use host caches or fragment caches, and all lookups are in the background thread. We should profile to see which, if any, are worthwhile.
Fix build issues in optimized mode and on win32. Fix mochitest-5 failures.

https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=ad7f27d16993
Attachment #564765 - Attachment is obsolete: true
Attachment #564765 - Flags: review?(dcamp)
Attachment #564765 - Flags: feedback?(tony)
Comment on attachment 564966 [details] [diff] [review]
Patch 3. Use PrefixSet code in new store

This should be the last update for a while, until review comments come in.
Attachment #564966 - Flags: review?(dcamp)
Attachment #564966 - Flags: feedback?(tony)
Blocks: 383031
Blocks: 441481
Is the old urlclassifier3.sqlite supposed to stick around after an upgrade?

I'm running 2011-10-16's nightly and have both that and urlclassifier.pset.
None of the patches have been reviewed yet, so they are not in a Nightly.

Even if it did, I'm not clear if we should clean up the old database (yet). See for example bug 499362.
Oh. Where did %localappdata%\Mozilla\Firefox\Profiles\xxxxxxx.default\urlclassifier.pset come from then?
It's from bug 669410.
Reviewers, what's the ETA on getting to this patch? If it's going to be a while, is there anyone you can suggest for a feedback-r pass first?
Comment on attachment 564581 [details] [diff] [review]
Patch 1. Add the Safebrowsing store sources

Review of attachment 564581 [details] [diff] [review]:
-----------------------------------------------------------------

This is a lot of code; I wasn't able to do a very thorough review of it.  At a high level, it seems OK.

For new on-disk file formats, it would be good to document what the format is supposed to be.  This makes it easier to verify whether a file is correct vs. whether there's a bug in the code.  You don't want to be like the Mork file format :)  I'm not sure if the best place for this type of documentation is a web page or a comment.

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +40,5 @@
> +
> +nsresult
> +ChunkSet::Unset(uint32 chunk)
> +{
> +  mChunks.RemoveElementSorted(chunk);

If there are duplicates, will they all be removed?  Maybe Set() and Merge() should check for duplicates?

::: toolkit/components/url-classifier/ChunkSet.h
@@ +9,5 @@
> +namespace mozilla {
> +namespace safebrowsing {
> +
> +/**
> + * This is an awful implementation.

Nit: Can you remove this comment or be specific about why it's awful (maybe with suggestions on how to improve it)?

::: toolkit/components/url-classifier/Classifier.cpp
@@ +21,5 @@
> +}
> +
> +Classifier::~Classifier()
> +{
> +}

Nit: Should this call DropStores()?

::: toolkit/components/url-classifier/Classifier.h
@@ +28,5 @@
> +  /**
> +   * Get the list of active tables and their chunks in a format
> +   * suitable for an update request.
> +   */
> +  void TableRequest(nsACString &result);

Nit: In some files you use pointers for out params and in others you pass by ref.  Would be nice to be consistent.

::: toolkit/components/url-classifier/Entries.h
@@ +115,5 @@
> +  uint32 addChunk;
> +  union {
> +    Prefix prefix;
> +    Completion complete;
> +  } hash;

What does this get initialized to?  Where are Prefix and Completion defined?

@@ +117,5 @@
> +    Prefix prefix;
> +    Completion complete;
> +  } hash;
> +
> +  AddComplete() : addChunk(0) {};

Nit: ; is not needed

::: toolkit/components/url-classifier/HashStore.cpp
@@ +394,5 @@
> +  rv = storeFile->AppendNative(mTableName + NS_LITERAL_CSTRING(".sbstore"));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  nsCOMPtr<nsIOutputStream> out;
> +  rv = NS_NewSafeLocalFileOutputStream(getter_AddRefs(out), storeFile,

Is this safe against power outage during writes?  Would it be safer to write to a temp file and then rename the temp file over the old file?

::: toolkit/components/url-classifier/LookupCache.h
@@ +33,5 @@
> +
> +  // True if we have a complete match for this hash in the table.
> +  bool mComplete;
> +
> +  // True if this is a noise entry. XXX: explain what that is

Nit: Can you do this XXX now?

@@ +46,5 @@
> +};
> +
> +typedef nsTArray<LookupResult> LookupResultArray;
> +
> +// XXX: Inconsistent

Can you elaborate on this more?

::: toolkit/components/url-classifier/ProtocolParser.h
@@ +6,5 @@
> +
> +namespace mozilla {
> +namespace safebrowsing {
> +
> +struct ForwardedUpdate {

Nit: Can we declare this in the class?
Attachment #564581 - Flags: feedback?(tony) → feedback+
> I'm not sure if the best place for this type of documentation is on a
> web page or in a comment.

If it's in a web page, there should be a comment giving the link to the web page.
(In reply to Tony Chang (Google) from comment #31)

Thanks for your time on this Tony!

> For new on disk file formats, it would be good to document what the format
> is supposed to be.  This makes it easier to verify a file correct vs whether
> there's a bug in the code.  You don't want to be like the mork file format
> :)  I'm not sure if the best place for this type of documentation is on a
> web page or in a comment.

My preference would be for it to be documented in a comment, as close to the relevant code as is possible. Web pages for things like this tend to age poorly.
dcamp, tony:  you have review/feedback requests here that are over 5 weeks old.  Could you do them this week, please?
I started looking at this, but it would be a lot easier as one big patch (I'm realizing a lot of problems in the first patch are solved in the second and third patches).  The second patch isn't applying cleanly at all for me.  Gian-Carlo, could I please get one big megapatch that applies against m-c?
Attachment #574884 - Flags: review?(dcamp)
(In reply to Tony Chang (Google) from comment #31)

> > +  nsCOMPtr<nsIOutputStream> out;
> > +  rv = NS_NewSafeLocalFileOutputStream(getter_AddRefs(out), storeFile,
> 
> Is this safe against power outage during writes?  Would it be safer to write
> to a temp file and then rename the temp file over the old file?

SafeLocalFileOutputStream takes care of that; it renames the temp file over the original after a successful close.
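
For reference, the usage pattern looks roughly like this. A hedged sketch: error handling is trimmed, and the function name, buffer parameters, and open flags are assumptions rather than code copied from the patch.

#include "nsCOMPtr.h"
#include "nsIFile.h"
#include "nsISafeOutputStream.h"
#include "nsNetUtil.h"          // NS_NewSafeLocalFileOutputStream
#include "prio.h"               // PR_WRONLY etc.

// Sketch: write a buffer "atomically" via the safe output stream. The data
// only replaces storeFile after Finish() succeeds; otherwise the temp file
// is discarded and the old contents survive.
static nsresult WriteStoreAtomically(nsIFile* storeFile,
                                     const char* data, uint32_t length)
{
  nsCOMPtr<nsIOutputStream> out;
  nsresult rv = NS_NewSafeLocalFileOutputStream(getter_AddRefs(out), storeFile,
                                                PR_WRONLY | PR_TRUNCATE | PR_CREATE_FILE);
  NS_ENSURE_SUCCESS(rv, rv);

  uint32_t written;
  rv = out->Write(data, length, &written);
  NS_ENSURE_SUCCESS(rv, rv);

  // Without the QI + Finish(), the temporary file is never renamed over
  // the original.
  nsCOMPtr<nsISafeOutputStream> safe = do_QueryInterface(out);
  return safe ? safe->Finish() : NS_ERROR_UNEXPECTED;
}
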
Comment on attachment 574884 [details] [diff] [review]
Patch 4. Folded version of above patches, rebased

Review of attachment 574884 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good - Tony's comments need to be addressed in addition to the ones I'm adding below:

This patch hasn't reimplemented gethash noise; I think we decided in the security review that we needed to keep that?

::: toolkit/components/url-classifier/ChunkSet.cpp
@@ +72,5 @@
> +
> +nsresult
> +ChunkSet::Set(uint32 chunk)
> +{
> +  mChunks.InsertElementSorted(chunk);

I think this needs to check for duplicates.
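
Something along these lines would do it; a sketch against a plain sorted vector rather than the real nsTArray-based ChunkSet:

#include <algorithm>
#include <cstdint>
#include <vector>

// Insert a chunk number into a sorted list, skipping duplicates.
void SetChunk(std::vector<uint32_t>& chunks, uint32_t chunk) {
  auto it = std::lower_bound(chunks.begin(), chunks.end(), chunk);
  if (it == chunks.end() || *it != chunk) {
    chunks.insert(it, chunk);   // keeps the list sorted and duplicate-free
  }
}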

::: toolkit/components/url-classifier/ChunkSet.h
@@ +49,5 @@
> +namespace safebrowsing {
> +
> +/**
> + * This is an awful implementation.
> + */

To elaborate a bit here, this implementation just stores chunks as an array of integers, which is wasteful if you have a lot of contiguous chunks.  Should file a followup to improve that.
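
One possible followup shape, purely illustrative: store contiguous runs as inclusive [first, last] ranges instead of individual integers.

#include <cstdint>
#include <vector>

// Chunks 1..5000 take one entry instead of 5000 uint32 slots.
struct ChunkRange {
  uint32_t first;
  uint32_t last;
};

bool HasChunk(const std::vector<ChunkRange>& ranges, uint32_t chunk) {
  for (const ChunkRange& r : ranges) {
    if (chunk >= r.first && chunk <= r.last) {
      return true;
    }
  }
  // A real implementation would keep the ranges sorted and binary-search them.
  return false;
}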

::: toolkit/components/url-classifier/Classifier.cpp
@@ +70,5 @@
> +}
> +
> +nsresult
> +Classifier::InitKey()
> +{

Can you document what this key is for?

@@ +438,5 @@
> +
> +nsresult
> +Classifier::ApplyTableUpdates(nsTArray<TableUpdate*> &updates,
> +                              const nsACString &table)
> +{

Should document that this consumes/deletes the updates that it applies.

@@ +531,5 @@
> +
> +  // At this point the store is updated and written out to disk, but
> +  // the data is still in memory.  Build our quick-lookup table here.
> +  rv = prefixSet->Build(store->AddPrefixes(), store->AddCompletes());
> +  // XXXnewstore: Deal with failure correctly here.

Does this still need addressing?  If Build() fails what happens (and what should happen?)

@@ +534,5 @@
> +  rv = prefixSet->Build(store->AddPrefixes(), store->AddCompletes());
> +  // XXXnewstore: Deal with failure correctly here.
> +  NS_ENSURE_SUCCESS(rv, rv);
> +#if defined(DEBUG) && defined(PR_LOGGING)
> +  prefixSet->Dump();

Probably want to wrap this with if (LOG_ENABLED())

::: toolkit/components/url-classifier/HashStore.cpp
@@ +1,5 @@
> +//* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
> +// Based on chrome stuff:
> +// Copyright (c) 2010 The Chromium Authors. All rights reserved.
> +// Use of this source code is governed by a BSD-style license that can be
> +// found in the LICENSE file.

I think Comment 11 is still waiting on an answer?

@@ +27,5 @@
> +
> +const uint32 STORE_MAGIC = 0x1231af3b;
> +const uint32 CURRENT_VERSION = 1;
> +
> +Timer::Timer(const char *name) : mName(name) {

Can probably dump this little wall clock test I was doing, particularly if we have good telemetry probes.

@@ +275,5 @@
> +  }
> +
> +  nsCOMPtr<nsISeekableStream> seekable = do_QueryInterface(mInputStream);
> +
> +  // XXX: md5sum

I think we need to do some sort of integrity check here before landing.

::: toolkit/components/url-classifier/LookupCache.cpp
@@ +138,5 @@
> +void
> +LookupCache::Dump()
> +{
> +  if (!LOG_ENABLED())
> +    return;

Oh, I see Dump() takes care of the LOG_ENABLED(), ignore that other comment.

@@ +199,5 @@
> +                                       -1, -1, 0);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  UpdateHeader();
> +  // XXX: md5sum

Same here, integrity checking is important.

@@ +266,5 @@
> +                                  &buffer,
> +                                  sizeof(Header));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  // XXX: Sanity check.

Should do this (at least a magic/version check).

::: toolkit/components/url-classifier/LookupCache.h
@@ +54,5 @@
> +
> +#define MAX_HOST_COMPONENTS 5
> +#define MAX_PATH_COMPONENTS 4
> +
> +// XXX: Pretty sure we can clean this up quite a bit.

This comment should either be removed if it's no longer true, or a followup should be filed.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +454,5 @@
>  void
>  nsUrlClassifierDBServiceWorker::ResetStream()
>  {
> +  LOG(("ResetStream"));
> +  //XXXnewstore: Audit this

This comment should be removed.

@@ +693,5 @@
>  nsUrlClassifierDBServiceWorker::CancelUpdate()
>  {
> +  LOG(("nsUrlClassifierDBServiceWorker::CancelUpdate"));
> +
> +  // XXXnewstore: fix this.

It looks like CancelUpdate() works now, this XXX can be removed.

@@ +703,4 @@
>      mUpdateObserver->UpdateError(mUpdateStatus);
> +    // XXX: See no point in removing TableFreshness here
> +    // it's not as if its info becomes invalid?
> +    // This does break tests which check this specifically.

iirc we were asked to do this.  If they block a valid site and need to unblock it and updates are failing, they'll be blocking legitimate sites longer (45 minutes rather than the usual 30 minute update latency).

@@ +996,5 @@
> +    // thread.
> +    mDBService->CacheCompletions(mCacheResults.forget());
> +  } else if (mResults->Length() > 0) {
> +    // XXX: this doesn't work with our noise - noise might generate
> +    // hits even if the main hash doesn't.

This will need to be dealt with (once noise is added back).

::: toolkit/components/url-classifier/tests/unit/test_addsub.js
@@ +481,5 @@
>      testSubPartiallyMatches2,
>      testSubsDifferentChunks,
>      testSubsDifferentChunksSameHostId,
>      testExpireLists,
> +    // Chrome's implementation doesn't get these right, we won't either.

Should probably just remove these tests rather than commenting them out.

::: toolkit/components/url-classifier/tests/unit/test_partial.js
@@ +818,5 @@
>        testMixedSizesDifferentDomains,
>        testInvalidHashSize,
> +      // XXX: Changed how caching works with unexpected tables
> +      // testWrongTable,
> +      // XXX: No idea why the current behavior would be wrong

Does this need further investigation?
Attachment #574884 - Flags: review?(dcamp)
Attachment #574884 - Flags: review-
Attachment #574884 - Flags: feedback+
(In reply to Gian-Carlo Pascutto (:gcp) from comment #11)
> >Please put the full Chrome license header at the top of the file. Other than 
> >that, this is fine.
> 
> What is the "full license header"? Is it this:
> 
> // Copyright (c) 2011 The Chromium Authors. All rights reserved.
> // Use of this source code is governed by a BSD-style license that can be
> // found in the LICENSE file.

The full license is what is found in LICENSE, which is here:
http://git.chromium.org/gitweb/?p=chromium/chromium.git;a=blob;f=LICENSE;h=a6e0dcfbc54451334e1ad2ee859aa2bffaa08692;hb=HEAD
Attachment #564581 - Flags: review?(dcamp)
Attachment #564583 - Flags: review?(dcamp)
Attachment #564966 - Flags: review?(dcamp)
>Does this need further investigation?

testWrongChunk does the following test: it adds a URL to the database, then lets the classifier fire a completion for that URL, and lets the completion server return a hit but with a different chunk number than the database knows. The old test checks that this result is not cached. I see no reason for this behavior in the spec (on the contrary...), but it was likely a limitation in the old database that probably had to "fill in" the completion in the same database row as the original prefix, or something like that. As far as I can tell we should behave correctly in that we'll report owning the chunk back to the server, and it can be expired or subbed by the server as needed. So I think this test is probably wrong.

testWrongTable does something similar, but instead of the wrong chunk it lets the completer reply with a different table. I kept this one, fixed a small bug in my code (a wrong table shouldn't "confirm" a prefix, i.e. if a user has some lists disabled, a completion hit in a non-checked list when probing a checked one shouldn't block the page) but changed the test somewhat too (the completer result will now always be cached and this changes the behavior of the second probe). 

Anyway, I *think* we're fine here.
Yeah, OK, agreed.  We should just remove those tests then.
Attachment #574884 - Attachment is obsolete: true
Attachment #578004 - Flags: feedback?(tony)
Attachment #578005 - Flags: review?(dcamp)
Incorporated review comments in the 4th patch. Some comments:

>For new on disk file formats, it would be good to document what the format is 
>supposed to be.  This makes it easier to verify a file correct vs whether there's 
>a bug in the code.  You don't want to be like the mork file format :)  I'm not 
>sure if the best place for this type of documentation is on a web page or in a 
>comment.

I added comments about the expected file format inside LookupCache and HashStore. I didn't document the PrefixSet, as it's in practice defined by the actual data structure, so this made less sense.

>To elaborate a bit here, this implementation just stores chunks as an array of 
>integers, which is wasteful if you have a lot of contiguous chunks.  Should file a 
>followup to improve that.

We're talking about 50-100 KB of RAM here, only used during updates, so I don't think we need to care about it.

>Nit: In some files you use pointers for out params and in others you pass by ref.  
>Would be nice to be consistent.

There's a discussion on this issue going on right now at:
news://news.mozilla.org/mozilla.dev.platform
As far as I understand, out parameters should be pointers, unless they're strings(?).

There is also some inconsistency in parameter naming, i.e. my code mostly uses aParam but dcamp's code doesn't (Dave, is your code simply reflecting a more modern style in Mozilla code, or should it be "fixed"?). And some files use "type * varname", some "type* varname".

I cleaned *some* of this but there's much inconsistency left. Maybe I should just clean all that up mechanically in a last patch.

>What does this get initialized to?  Where are Prefix and Completion defined?

Not sure about the question here. They're defined right above.

>>Is this safe against power outage during writes?  Would it be safer to write to a temp 
>>file and then rename the temp file over the old file?
>SafeLocalFileOutputStream takes care of that, renames the temp file after a successful close.

I seem to have run into a problem with this and it's causing xpcshell test failures on Windows, so I had to remove the usages. Will file follow-up bugs. Code is XXX'ed.

>> +// XXX: Inconsistent
>Can you elaborate on this more?

This was dcamp's comment, and given that he didn't say anything about it, I just removed it.

>This patch hasn't reimplemented gethash noise, I think we decided in the security review that 
>we needed to keep that?

Implemented this. I ran into an issue. The old noise code would randomly take some hashes from right before and after the hash that hit. I figured that this still allows you to detect the true URL being queried if you can intercept enough different users, because the true URL is simply the one being probed most.
A solution I wanted to implement is to fix the prefixes being probed, for example, by rounding the index to the nearest multiple of 4 (instead of making it random) and then requesting those 4 URLs, causing all users to request the same noise URLs, and making the larger scale analysis much more difficult.

Unfortunately, we encode the prefixes with a per-user key to avoid them from colliding on the same sites. But this also means that the prefixes have a different order for every user, so this technique cannot work.

So we can't do anything more than add some random prefixes. You can't say for sure which URL a user requested, but you can still make an educated guess if you can observe enough users.

>Does this still need addressing?  If Build() fails what happens (and what should happen?)

Build() can't really fail, unless it OOM's or whatever, at which point recovery is a mess anyway :-P

>I think we need to do some sort of integrity check here before landing.

Too late :-/

https://bugzilla.mozilla.org/show_bug.cgi?id=702217
https://bugzilla.mozilla.org/show_bug.cgi?id=706049

I figured the following:
- LookupCache needs to protect its header. Corruption in the actual data can't cause crashes and will only cause visible problems on a SHA-256 collision. So it doesn't need further checks.
- HashStore needs MD5s. Unfortunately, integrity checking on an nsIOutputStream on the fly is quite difficult, and I think I'd have needed to rework a lot of code for that. So instead I wrote the code to checksum the file right after writing, and then append the checksum to the file (see the sketch after this list). This doesn't prevent the case where the file gets corrupted between the data being in memory and written out to disk and the checksum being calculated, but given how modern OSes work, that seems quite unlikely anyway.
- PrefixSet only sanity checks its header, though it will detect a lot of data corruption because it crosschecks its data with the checksumed HashStore. Right now it can cause a crash if it tries to load corrupted data. This should be fixed for bug 706049 independent of this bug, so that isn't included in the patches here.
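
To make the HashStore bullet concrete, here is a minimal sketch of computing an MD5 over the store data with nsICryptoHash and appending it to the output stream. The function name and the aData/aLength parameters are illustrative (the actual patch checksums the file it just wrote), and error handling is trimmed:

#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"   // do_CreateInstance
#include "nsICryptoHash.h"
#include "nsIOutputStream.h"
#include "nsString.h"

static nsresult AppendMD5(nsIOutputStream* aOut,
                          const uint8_t* aData, uint32_t aLength)
{
  nsCOMPtr<nsICryptoHash> hash =
    do_CreateInstance("@mozilla.org/security/hash;1");
  NS_ENSURE_TRUE(hash, NS_ERROR_FAILURE);

  nsresult rv = hash->Init(nsICryptoHash::MD5);
  NS_ENSURE_SUCCESS(rv, rv);
  rv = hash->Update(aData, aLength);
  NS_ENSURE_SUCCESS(rv, rv);

  nsCString digest;
  rv = hash->Finish(false, digest);   // false = raw 16-byte digest, not base64
  NS_ENSURE_SUCCESS(rv, rv);

  uint32_t written;
  return aOut->Write(digest.get(), digest.Length(), &written);
}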

I added Telemetry for the completions cached as requested in the security review. 
More stats on completions requested/failed are a bit more complicated, I didn't see
how to do that with current Telemetry. Can file follow up bugs if its deemed important.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #44)

> >To elaborate a bit here, this implementation just stores chunks as an array of 
> >integers, which is wasteful if you have a lot of contiguous chunks.  Should file a 
> >followup to improve that.
> 
> We're talking about 50-100k RAM here, only used during updates, so I think
> we don't need to care about it.

Yeah, it's not a big deal.

> There's a discussion on this issue going on right now at:
> news://news.mozilla.org/mozilla.dev.platform
> As far as I understand, out parameters should be pointers, unless they're
> strings(?).

Yeah, should follow the recommendations in that thread re: out pointers.

> There is also some inconsistency in parameters, i.e. my code was mostly
> aParam but dcamp's code doesn't use that (Dave, is your code simply
> reflecting what is a more modern style in Mozilla code, or should it be
> "fixed"). And some files use "type * varname", some "type* varname".

No, it's not reflecting a more modern style.  Feel free to fix up new code to aStyle.

> >>Is this safe against power outage during writes?  Would it be safer to write to a temp 
> >>file and then rename the temp file over the old file?
> >SafeLocalFileOutputStream takes care of that, renames the temp file after a successful close.
> 
> I seem to have run into a problem with this and it's causing xpcshell test
> failures on Windows, so I had to remove the usages. Will file follow-up
> bugs. Code is XXX'ed.

I bet it needs some help from the dummy directory provider in the xpcshell tests.  I'll take a look at the SafeLocalFileOutputStream impl and figure out what the tests need to do differently...

> >This patch hasn't reimplemented gethash noise, I think we decided in the security review that 
> >we needed to keep that?
> 
> Implemented this. I ran into an issue. The old noise code would randomly
> take some hashes from right before and after the hash that hit. I figured
> that this still allows you to detect the true URL being queried if you can
> intercept enough of different users, because the true URL is simply the one
> being probed most. 
> A solution I wanted to implement is to fix the prefixes being probed, for
> example, by rounding the index to the nearest multiple of 4 (instead of
> making it random) and then requesting those 4 URLs, causing all users to
> request the same noise URLs, and making the larger scale analysis much more
> difficult.
> 
> Unfortunately, we encode the prefixes with a per-user key to avoid them from
> colliding on the same sites. But this also means that the prefixes have a
> different order for every user, so this technique cannot work.
> 
> So we can't do anything more than add some random prefixes. You can't say
> for sure which URL a user requested, but you can still make an educated
> guess if you can observe enough users.

Yeah, that's a pretty general problem with the noise strategy.  Even with an ideal noise solution, they can still correlate against chrome users to make an educated guess.  Let me think about this for a bit.

> I figured the following:
> - LookupCache needs to protect its header. Corruption in the actual data
> can't cause crashes and will only cause visible problems on a SHA-256

> collision. So it doesn't need further checks.
> - HashStore needs MD5s. Unfortunately, integrity checking on an
> nsIOutputStream on the fly is quite difficult, and I think I'd have needed
> to rework a lot of code for that. So instead I wrote the code to checksum
> the file right after writing, and then append the checksum to the file. This
> doesn't prevent the case where the file gets corrupted between the data
> being in memory and written out to disk, and the checksum being calculated,
> but due to how modern OS's work, that seems quite unlikely anyway.

That's a fairly hefty chunk of extra IO.  How much work would it be to just add a ChecksumWrite(data, length, outputstream, digest) helper?

> - PrefixSet only sanity checks its header, though it will detect a lot of
> data corruption because it crosschecks its data with the checksumed
> HashStore. Right now it can cause a crash if it tries to load corrupted
> data. This should be fixed for bug 706049 independent of this bug, so that
> isn't included in the patches here.

It doesn't crosscheck during lookup, though, right?  It's regenerated with each database update, so maybe that's not as important...
Comment on attachment 578005 [details] [diff] [review]
Patch 5. Folded version of all patches

Review of attachment 578005 [details] [diff] [review]:
-----------------------------------------------------------------

A few quick comments, in addition to the ones above.  Looking good.

r-'ing for the following stuff:

* Fixing SafeLocalFileOutputStream would be nice if possible, let's see if there's a quick fix.
* Same with checksumming as we write the store, rather than as a separate pass.  If that's troublesome we can do it as a followup, but if we decide to do that I need to do one more review pass on that code.
* gcp and I discussed some changes to the noise implementation on IRC, so I didn't review this implementation [will post a summary in a followup comment].
* InitKey() question below.

::: toolkit/components/url-classifier/Classifier.cpp
@@ +79,5 @@
> + * https://bugzilla.mozilla.org/show_bug.cgi?id=669407#c10
> + */
> +nsresult
> +Classifier::InitKey()
> +{

If this file is corrupt/missing, we need to throw away the prefixset, right?

::: toolkit/components/url-classifier/ProtocolParser.h
@@ +53,5 @@
> +public:
> +  struct ForwardedUpdate {
> +    nsCString table;
> +    nsCString url;
> +  nsCString mac;

Tiny nit: indentation is off here.
Attachment #578005 - Flags: review?(dcamp) → review-
The noise change we discussed was:

gcp wanted to use a multiple-of-4 sliding window for choosing noise entries.  He decided that wouldn't work as well as hoped, because each user has an individual hash that mixes up the hash order.  So he fell back to random numbers.  But now he's thinking that the multiple-of-4 sliding window would work as well as random numbers.

I'm not really comfortable r+ing a change to the character of the noise without getting a bit of feedback from the security team.  But given the limited usefulness of the noise to begin with, I don't think we should worry too much about it.
Attachment #578004 - Attachment is obsolete: true
Attachment #578005 - Attachment is obsolete: true
Attachment #578004 - Flags: feedback?(tony)
Attachment #579307 - Flags: review?(dcamp)
Attached patch Patch 5. Fix all style issues (obsolete) — Splinter Review
Attachment #579310 - Flags: review?(dcamp)
Attachment #579311 - Flags: review?(dcamp)
Attachment #579312 - Flags: review?(dcamp)
Attachment #579316 - Flags: review?(dcamp)
>No, it's not reflecting a more modern style.  Feel free to fix up new code to 
>aStyle.

I reworked all the affected code to be consistent there, also with in/out parameters (references in, pointers for inout, strings always by reference but const if not modified) as per the style discussion in the newsgroups.

>I bet it needs some help from the dummy directory provider in the xpcshell tests.  
>I'll take a look at the Safe LocalFileOutputStream impl and figure out what the 
>tests need to do differently...

FWIW, the directory seems to exist (I already checked this by logging it and verifying that it was created). I don't know what the problem with that code is.

>That's a fairly hefty chunk of extra IO.  How much work would it be to just add a 
>ChecksumWrite(data, length, outputstream, digest) helper?

There's no simple way to construct such a helper that I can see (that doesn't cause the extra IO, otherwise it's exactly what the first patch did).

It's possible to create a wrapper class that checksums output streams on the fly, which I just did with nsCheckSummingOutputStream. I'm not sure if this is very clean, because it directly extends nsFileOutputStream, so it must find the nsFileStream headers through a ../../... Maybe we can put this nsCheckSummingOutputStream right into the netwerk/... tree and avoid that? Is this even a real issue?

But on input, it's necessary to first read in the file entirely before you can verify the checksum, and you want to do this before you pass the data on. So you must go over the file twice. The cleanest solution here looked like simply wrapping the input in an nsBufferedStream with a 6 MB buffer, which uses more memory but avoids the duplicate IO altogether.
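
For what it's worth, the on-the-fly output variant boils down to feeding every written buffer into an nsICryptoHash before forwarding it. The class below is a simplified, hypothetical sketch that wraps an arbitrary nsIOutputStream, rather than deriving from nsFileOutputStream the way the actual nsCheckSummingOutputStream does; constructor error handling is trimmed.

#include "nsCOMPtr.h"
#include "nsComponentManagerUtils.h"
#include "nsICryptoHash.h"
#include "nsIOutputStream.h"
#include "nsString.h"

// Every byte that passes through Write() is fed to the digest as well,
// so no separate checksumming pass over the data is needed.
class CheckSummingWriter {
public:
  explicit CheckSummingWriter(nsIOutputStream* aInner) : mInner(aInner) {
    mHash = do_CreateInstance("@mozilla.org/security/hash;1");
    mHash->Init(nsICryptoHash::MD5);
  }

  nsresult Write(const char* aBuf, uint32_t aCount, uint32_t* aWritten) {
    nsresult rv = mHash->Update(reinterpret_cast<const uint8_t*>(aBuf), aCount);
    NS_ENSURE_SUCCESS(rv, rv);
    return mInner->Write(aBuf, aCount, aWritten);
  }

  // Append the digest at the end, as the store format expects.
  nsresult Finish() {
    nsCString digest;
    nsresult rv = mHash->Finish(false, digest);
    NS_ENSURE_SUCCESS(rv, rv);
    uint32_t written;
    return mInner->Write(digest.get(), digest.Length(), &written);
  }

private:
  nsCOMPtr<nsIOutputStream> mInner;
  nsCOMPtr<nsICryptoHash> mHash;
};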

>It doesn't crosscheck during lookup though, right?  It's regenerated with each 
>database update though, so maybe that's not as important..

The corruption can't exist for more than 45 minutes (at which point all databases are considered stale), typically can't exist more than 30 minutes (update happens), and is safe because the nsPrefixSet code was audited for bug 706049 to not misbehave even if the underlying data is completely bogus.

>gcp wanted to use a multiple-of-4 sliding window for choosing noise entries.  He decided that wouldn't 
>work as well as hoped, because each user has an individual hash that mixes up the hash order.  So he fell 
>back to random numbers.  But now he's thinking that the multiple-of-4 sliding window would work as well as 
>random numbers.

To elaborate on comment 44, I concluded that we can do no better than send random entries, because there is no concept of "neighboring entries" with each user having randomized prefixes (due to rehashing with a specific per-user key), so the order between each user differs.

However, we had problems with bug 706740, which means that we want to avoid generating random numbers in the nsUrlClassifierDBServiceWorker thread. So I implemented the multiple-of-4 window anyway, in the knowledge that this is equivalent to random picks to an outside attacker/observer, but avoids the need to call the random number generator.
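
Concretely, the window selection is just index arithmetic; a hedged sketch (the array and function names are hypothetical):

#include <cstdint>
#include <vector>

// Given the index of the matching prefix in the sorted prefix array, return
// the 4-aligned window containing it. This avoids calling the RNG on the
// worker thread; because prefix order differs per user (per-client key),
// this looks equivalent to random picks to an outside observer.
std::vector<uint32_t> NoiseWindow(const std::vector<uint32_t>& prefixes,
                                  size_t hitIndex) {
  size_t start = hitIndex & ~static_cast<size_t>(3);   // round down to a multiple of 4
  if (start + 4 > prefixes.size()) {                   // clamp at the end of the array
    start = prefixes.size() >= 4 ? prefixes.size() - 4 : 0;
  }
  std::vector<uint32_t> window;
  for (size_t i = start; i < prefixes.size() && i < start + 4; ++i) {
    window.push_back(prefixes[i]);
  }
  return window;
}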

> InitKey() question below.

Good catch, fixed. I also found and fixed a case where the LookupCache detects corruption and resets - it needs to tell this upwards to Classifier, which needs to pass it down to HashStore. This is necessary because the data sits split between LookupCache and HashStore, and having a HashStore without LookupCache causes problems.

I added Telemetry for the number of Prefixes, which seems to expose a bug in LINEAR Histograms :-P

I think I have everything from the review, aside from the nsISafeOutputStream thing.

One thing to investigate in the future: the old nsUrlClassifierPrefixSet had code to handle lookups on the main thread and wait until the PrefixSet gets loaded (in the worker thread) if necessary. This complicated the code, and given that the main thread lookups are entirely gone, we might be able to axe that if we don't plan to reinstate "probe on the main thread". (There is Telemetry included to track the performance of the actual lookup in the worker vs the observed delay in the main thread, including thread ping-pong.)
If we want to export the checksumming output stream for xpcom consumers, it probably belongs in netwerk or something rather than url-classifier.  Rather than block this bug on a new exposed API, maybe it would be best to not export a contract ID.  You can create it manually with operator new rather than do_CreateInstance.

You can file a followup to export this functionality somewhere more appropriate.
Blocks: 568893
Attachment #564581 - Attachment is obsolete: true
Attachment #564583 - Attachment is obsolete: true
Attachment #564966 - Attachment is obsolete: true
Attachment #564583 - Flags: feedback?(tony)
Attachment #564966 - Flags: feedback?(tony)
Attachment #579307 - Attachment is obsolete: true
Attachment #579310 - Attachment is obsolete: true
Attachment #579311 - Attachment is obsolete: true
Attachment #579312 - Attachment is obsolete: true
Attachment #579316 - Attachment is obsolete: true
Attachment #592678 - Attachment is obsolete: true
Attachment #592679 - Attachment is obsolete: true
Attachment #592680 - Attachment is obsolete: true
Attachment #579307 - Flags: review?(dcamp)
Attachment #579310 - Flags: review?(dcamp)
Attachment #579311 - Flags: review?(dcamp)
Attachment #579312 - Flags: review?(dcamp)
Attachment #579316 - Flags: review?(dcamp)
Attachment #592691 - Flags: review?(dcamp)
Attachment #592692 - Flags: review?(dcamp)
Attachment #592690 - Attachment is obsolete: true
Attachment #593174 - Flags: review?(dcamp)
Attachment #593175 - Flags: review?(dcamp)
Attachment #592692 - Attachment is obsolete: true
Attachment #592692 - Flags: review?(dcamp)
Attachment #593177 - Flags: review?(dcamp)
I think I'm done for now.

Changes
1) Unbitrot the patches to the changes on m-c (PrefixSet OOM, Memory Reporter).
2) Fixes so the nsISafeOutputStream usage works on Windows.
3) NS_GetSpecialDirectory should run on the main thread, because it causes an enumeration of Directory reporters.
4) I had some occasional orange based on freeing callbacks on the wrong thread. Fixed this with an NS_ReleaseProxy. As far as I can tell this has *always* been broken.

Try run: https://tbpl.mozilla.org/?tree=Try&rev=e75426a42c3e
(The other orange is telemetry, not due to this patch.)
Attachment #592691 - Attachment is obsolete: true
Attachment #592691 - Flags: review?(dcamp)
Comment on attachment 593174 [details] [diff] [review]
Patch 7. Add nsCheckSummedOutputStream and use it.

Review of attachment 593174 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC, can you file a followup to implement this as a stream that wraps another stream, rather than deriving from the file stream please?
Attachment #593174 - Flags: review?(dcamp) → review+
Attachment #593175 - Flags: review?(dcamp) → review+
(In reply to Gian-Carlo Pascutto (:gcp) from comment #54)

> But on input, it's necessary to first read in the file entirely before you
> can verify the checksum, and you want to do this before you pass the data
> on. So you must go over the file twice. The cleanest solution here looked
> like simply wrapping the input into a nsBufferedStream with a 6Mb buffer,
> which uses more memory but avoids the duplicate IO altogether. 

If we added a ChecksummingInputStream along the lines of the ChecksummingOutputStream, and added a bailout after reading in all the data the first time, couldn't that work?

As it stands this patch should significantly reduce IO, so I wouldn't hold off landing over that.
Comment on attachment 593177 [details] [diff] [review]
Patch. Amalgamation of all the above

This is a *giant* patch.

With this most recent version I've been through this patch a few times, and I'm pretty confident that it's a solid improvement over our current implementation.   Let's file followups for the checksumming questions I had and get it landed early in the 12 cycle.

Thanks for your hard work on this, Gian-Carlo.
Attachment #593177 - Flags: review?(dcamp) → review+
(In reply to Dave Camp (:dcamp) from comment #74)
> Let's file followups for the checksumming questions I had
> and get it landed early in the 12 cycle.

(* early in the 13 cycle - 12 is aurora now. The trains, they run evermore.)
Filed bug 723146 for nsCheckSummedOutputStream improvements.
Filed bug 723153 for cleaning up the old database.
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae2d52111205

Regression  Trace Malloc MaxHeap increase 20.3% on CentOS release 5 (Final) Mozilla-Inbound
---------------------------------------------------------------------------------------------
    Previous: avg 26145933.333 stddev 5968.211 of 30 runs up to revision 66fcee339e32
    New     : avg 31442740.000 stddev 21819.555 of 5 runs since revision 44a0dc4fb9ff
    Change  : +5296806.667 (20.3% / z=887.503)
    Graph   : http://mzl.la/xJXSES


I suspect this is the fixed-size 6M buffer in nsBufferedInputStream. Will add a patch to make it dependent on the actual file size.
The Talos memory use regression almost certainly comes from the fixed size 6M buffer used for avoiding the double read. Now sizing it exactly right.

Performance regressions are trickier because there are multiple potential causes. The worst I found was that we rescan the filesystem on each probe to see which databases are present. This patch checks once on startup and after every update, and caches the result elsewhere.

From comparing the try run with other try runs I think the regression is gone or reduced.

https://tbpl.mozilla.org/?tree=Try&rev=e102f89a5cf8
Attachment #593756 - Flags: review?(dcamp)
https://hg.mozilla.org/mozilla-central/rev/44a0dc4fb9ff
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 13
This was backed out by
https://hg.mozilla.org/mozilla-central/rev/ae2d52111205
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Sorry, yeah, I realised as soon as I pressed submit that the backout was in the same 96-cset merge :-)
What about using mmap?
What would the upsides be? It doesn't give any more guarantee that the result is cached, and it's certainly not faster than regular reads for the purely serial access we do. Obvious downside: it's not available in a streaming API, so it requires a bunch more code to be written.
Attachment #593756 - Flags: review?(dcamp) → review+
https://hg.mozilla.org/mozilla-central/rev/173f90d397a8
https://hg.mozilla.org/mozilla-central/rev/db52b4916cde
Status: REOPENED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
I noticed that the old urlclassifier3.sqlite database wasn't removed when the new folder was added. Also, I see that the files for the 'test' provider are always present, next to the ones from Google.

I see that one of the tests (head_urlclassifier.js) has a cleanup procedure - that's exactly what is missing.
>I noticed that the old url3classifier3.sqlite database wasn't removed when the new 
>folder was added.

See comment 78.

>Also, I see that the files for 'test' provider are always present, next to the 
>ones form Google.

Working as designed; see for example bug 693389.
For those following along, who may not have seen gcp's excellent summary post :-)
http://www.morbo.org/2012/02/new-safebrowsing-backend.html
Depends on: 725872
Comment on attachment 592682 [details] [diff] [review]
Patch 1. Add the optimized store for SafeBrowsing.

>+  LOG(("Applied %d update(s) to %s.", applied, PromiseFlatCString(store->TableName()).get()));
You call PromiseFlatCString a number of times on store->TableName(), which is already flat, so in the best case you're just exercising the compiler's PGO code...
Blocks: 741528
No longer blocks: 741528
Attached patch Patch. Backout part 1 (obsolete) — Splinter Review
[Approval Request Comment]
Backout due to bug 744993:

a01cf079ee0b Bug 730247
1a6d008acb4f Bug 729928
f8bf3795b851 Bug 729640
35bf0d62cc30 Bug 726002
a010dcf1a973 Bug 726002
e9291f227d63 Bug 725597
db52b4916cde Bug 673470
173f90d397a8 Bug 673470
Attachment #616616 - Flags: approval-mozilla-central?
Attachment #616616 - Flags: approval-mozilla-aurora?
Attached patch Patch. Backout part 2 (obsolete) — Splinter Review
Attachment #616618 - Flags: approval-mozilla-central?
Attachment #616618 - Flags: approval-mozilla-aurora?
This isn't actually on beta yet, is it? Didn't this land in the 13 cycle?
Sorry, misread the approval flags - nevermind!
Depends on: 744993
Attachment #616616 - Flags: approval-mozilla-central? → approval-mozilla-central+
Attachment #616618 - Flags: approval-mozilla-central? → approval-mozilla-central+
When this re-lands, please also pick up the toolkit/components/url-classifier/Entries.h fix from bug 738533. The patch will be landing without it.
Attachment #616616 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Attachment #616618 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Comment on attachment 616616 [details] [diff] [review]
Patch. Backout part 1

Sorry - thought this bug was reopened because the backout was backed out. My mistake. Approved for Aurora 13 (or Beta 13 if the merge occurs before we land).
Attachment #616616 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
Attachment #616618 - Flags: approval-mozilla-aurora- → approval-mozilla-aurora+
Depends on: 750625
Carrying forward review. Only changes are rebasing.
Attachment #592682 - Attachment is obsolete: true
Attachment #592683 - Attachment is obsolete: true
Attachment #592684 - Attachment is obsolete: true
Attachment #592686 - Attachment is obsolete: true
Attachment #592687 - Attachment is obsolete: true
Attachment #592688 - Attachment is obsolete: true
Attachment #593174 - Attachment is obsolete: true
Attachment #593175 - Attachment is obsolete: true
Attachment #593177 - Attachment is obsolete: true
Attachment #616616 - Attachment is obsolete: true
Attachment #616618 - Attachment is obsolete: true
Attachment #650036 - Flags: review+
Fixes the performance regression seen on the initial landing. Already reviewed => rebased.
Attachment #650037 - Flags: review+
Originally landed in bug 725597, already reviewed.
Attachment #593756 - Attachment is obsolete: true
Attachment #650038 - Flags: review+
Originally landed in bug 726002. Rebased.
Attachment #650039 - Flags: review+
Originally landed in 726002, already reviewed, rebased.
Attachment #650040 - Flags: review+
Bug 729640, already reviewed, rebased.
Attachment #650041 - Flags: review+
Bug 729928, already reviewed, rebased.
Attachment #650042 - Flags: review+
Bug 730247, already reviewed, rebased.
Attachment #650044 - Flags: review+
The comments describing the sbstore/HashStore on-disk format were erroneous.  These mistakes were found when writing a Python tool to analyze the databases: https://github.com/gcp/sbdbdump
Attachment #650045 - Flags: review?(dcamp)
We knock out addprefixes that match received subs, but this function also deletes the sub that caused the knockout. If we have previously had a hit on a malware/phishing site, and store an addcomplete for it, we need to knock out that addcomplete. 

Because we use per-client randomization, this is done by generating both randomized and non-randomized subs, where we assign the non-randomized subs to subchunk 0.

This reorders/cleans up the ProcessSubs function to do the Complete removals based on matching prefixes first, clean up the fake Subs (which can only be in SubPrefixes), and then do the normal knockout afterwards (which now only needs to check half as much data).
Attachment #650049 - Flags: review?(dcamp)
Add an option to disable the per-client randomization. This is useful for debugging the protocol and running tests, as it makes the database behavior reproducible.
Attachment #650052 - Flags: review?(dcamp)
In one instance the order of arguments to the per-client randomization function was reversed, causing about half of the prefixes in the database to be wrong. This happens even if per-client randomization is disabled.
Attachment #650053 - Flags: review?(dcamp)
There were many superfluous sort calls in the existing code. We only need to sort the prefixes once: directly after they have gone through a Merge() call. (Note that in that function, we could quicksort the new prefixes and then mergesort them into the already sorted database, but we probably don't need to care about such further optimization right now.)

Add assertions and some functions that confirm that all the other code maintains the sorted order of the list, when running in debug mode.

>return *((uint32*)a) - *((uint32*)b);

Dave, this doesn't work :)

None of the sorts were working, causing subprefix knockouts (among others) to be entirely broken.
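
The usual problem with that subtraction pattern is that the difference of two uint32 values is itself unsigned; once the true difference no longer fits in an int, the converted result gets the wrong sign and the comparator becomes inconsistent. An explicit three-way comparison avoids this; a sketch:

#include <cstdint>
#include <cstdlib>   // qsort

static int CompareUint32(const void* a, const void* b) {
  uint32_t va = *static_cast<const uint32_t*>(a);
  uint32_t vb = *static_cast<const uint32_t*>(b);
  if (va < vb) return -1;
  if (va > vb) return 1;
  return 0;
}

// Usage: qsort(array, count, sizeof(uint32_t), CompareUint32);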

This also cleans up another part of the file format documentation being wrong.
Attachment #650056 - Flags: review?(dcamp)
If we receive a SubPrefix that refers to an addChunk that we already have, but the SubPrefix survives the Knockout phase, this must mean the update server already optimized away the AddPrefix the Sub was referring to, in order to save bandwidth. The SubPrefix obviously can't ever do anything in this case.

Detect such Prefixes and remove then after the Knockout phase.

It should be noted that the old Firefox Safebrowing code was also doing this optimization. (rv = mPendingSubStore.ExpireAddChunk(tableId, chunkNum);)
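In sketch form (illustrative names and simplified containers, not the actual patch), the post-knockout cleanup amounts to:

  #include <algorithm>
  #include <cstdint>
  #include <set>
  #include <vector>

  struct SubPrefix { uint32_t addChunk; uint32_t subChunk; uint32_t prefix; };

  // After knockout, a sub that still refers to an add chunk we already have
  // can never match anything: the server optimized the add away, so drop it.
  void DropDeadSubs(std::vector<SubPrefix>& subs,
                    const std::set<uint32_t>& addChunksWeHave)
  {
    subs.erase(
      std::remove_if(subs.begin(), subs.end(),
        [&](const SubPrefix& s) {
          return addChunksWeHave.count(s.addChunk) != 0;
        }),
      subs.end());
  }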
Attachment #650069 - Flags: review?(dcamp)
At some point we made the PrefixSet code fallible with respect to OOM situations. This was possible because the old SQLite-backed store could always fall back to doing disk lookups with SQLite in case there wasn't enough memory to construct our in-memory database.

Because SQLite is gone, and the PrefixSet is now the sole place where the essential AddPrefixes are stored, this fallback is no longer available.

We now apply "Security or DEATH" (again) by making the PrefixSet crash rather than OOM.

The new code only needs to manipulate a dataset half as large as the previous implementation, and generally tries to avoid allocating more memory than necessary. On top of that, our diagnostics for Firefox crashes caused by OOM have improved, so we should be better able to understand and fix things if this happens again.
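As a minimal illustration of the policy, in plain C++ with an illustrative helper (not the actual Mozilla allocation hooks):

  #include <cstdint>
  #include <cstdio>
  #include <cstdlib>

  // With SQLite gone there is no fallback store to serve lookups from, so if
  // we can't allocate the PrefixSet arrays we abort rather than silently run
  // without Safe Browsing protection.
  static uint32_t* AllocPrefixArrayOrCrash(size_t count)
  {
    uint32_t* p = static_cast<uint32_t*>(malloc(count * sizeof(uint32_t)));
    if (!p) {
      fputs("PrefixSet: out of memory while building the lookup set\n", stderr);
      abort();
    }
    return p;
  }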
Attachment #650079 - Flags: review?(dcamp)
Attachment #650079 - Flags: feedback?(justin.lebar+bug)
Minor code cleanup pointed out in this bug.
Attachment #650082 - Flags: review?(dcamp)
See notes in this bug and Bug 738533. There's no reason to use NS_QuickSort over qsort, and it simplifies the code.
Attachment #650084 - Flags: review?(dcamp)
The PrefixSet we have now is threadsafe and can be used, with minimal blocking, while database updates are in progress. This was a complicated but useful thing to have when we still had an SQLite backend, as it allowed almost every query to be resolved immediately on the main thread without ever blocking or hitting disk (though it might defer a lookup if we are still loading the prefixset from disk).

This patch removes the multithread handling entirely from PrefixSet. This has several advantages: it's simpler, it supports empty prefixsets without the ugly hacks, and *it cuts the maximum memory requirement in half*.

The new SafeBrowsing code is more modular, and splits the database into a prefixset per safebrowsing list. It hasn't ever supported probing from within the main thread, and when it originally landed no performance regressions were reported for this, so the complications in prefixset are no longer useful.

In case we ever do need to accelerate lookups by doing them from the main thread, I'd advise using a separate, simpler, LRU-style cache for fragments or prefixes, similar to what the code had before we added PrefixSets.
Attachment #650088 - Flags: review?(dcamp)
Attachment #650045 - Flags: review?(dcamp) → review+
Attachment #650046 - Flags: review?(dcamp) → review+
Attachment #650047 - Flags: review?(dcamp) → review+
> We now apply "Security or DEATH" (again) by making the PrefixSet crash rather than OOM.

What's the maximum amount of memory allocated here?
Attachment #650053 - Flags: review?(dcamp) → review+
Attachment #650056 - Flags: review?(dcamp) → review+
Attachment #650082 - Flags: review?(dcamp) → review+
Attachment #650084 - Flags: review?(dcamp) → review+
Attachment #650049 - Flags: review?(dcamp) → review+
Attachment #650069 - Flags: review?(dcamp) → review+
Attachment #650083 - Flags: review?(dcamp) → review+
Attachment #650088 - Flags: review?(dcamp) → review+
Comment on attachment 650052 [details] [diff] [review]
Patch 13. Provide an option to disable per-client randomization.

Review of attachment 650052 [details] [diff] [review]:
-----------------------------------------------------------------

If the pref changes, won't databases built before the pref change break silently?
How are you testing these changes?  Can we get something automated to catch these problems if they spring up again?
>If the pref changes, won't databases built before the pref change break silently?

Yes. It's meant for debugging, in the "don't toggle this unless you know what you're doing" style.

>How are you testing these changes?  Can we get something automated to catch these 
>problems if they spring up again?

There's a tracking bug for improving the tests to actually test the real protocol:
https://bugzilla.mozilla.org/show_bug.cgi?id=750751

But, what I *really* did here to debug this was to use the tool I linked earlier: https://github.com/gcp/sbdbdump

I compile 2 Firefoxen, one with the old code, one with the new code, both with per-client randomization disabled, and launch them both at the same time pointing to different profiles. I then use the sbdbdump tool to compare the databases from time to time. They should not differ greatly or start to diverge, though they can obviously differ a little due to collisions or the server not sending 100% the same data.

I've simply kept fixing bugs and using the tool to analyze the remaining differences until I can receive and process the entire current set without any unexpected divergence. The current patchset seems to do this, i.e. it will produce the same database contents as the old code.

Based on this, I'm quite confident there are no serious bugs remaining, perhaps more so than if we'd just pass a static set of tests, and I'd consider landing the current code before we have bug 750751. We can keep monitoring the divergence for a few more weeks manually and react if needed. 

I do think bug 727370 should land reasonably soon after, and in the same versions, as this one lands.
>What's the maximum amount of memory allocated here?

The biggest single set is 400K add prefixes + 170k sub prefixes.

Classifier calls BeginUpdate which calls ReadEntireStore. This means we will have the entire store in RAM at this time:

400k * (uint32 prefix, uint32 chunk)
170k * (uint32 prefix, uint32 chunk, uint32 chunk)

3.2M + 2.0M = 5.2M

ReadSubPrefixes temporarily keeps two copies of the data live during reading (this is fixable). It SetCapacities the nsTArray so there's no loss from that. So peak is 3.2M + 2.0M*2 = 7.2M

It will then call GetPrefixes followed by AugmentAdds, which will load only the add prefixes. So another 400k * uint32 = 1.6M. We might be able to save that by making GetPrefixes aware of the datastructure it's filling into, though that'd make the code uglier. The extra storage is freed immediately at this point. (Peak = 5.2M + 1.6M = 6.8M)

While applying an update, we do an nsTArray AppendElements onto the existing database. I suppose that at worst this can double the memory usage if the nsTArray's capacity needs to resize. So new peak = 6.4M (3.2M*2) + 2.0M = 8.4M, resident = 5.2M

After this point, we could free the SubPrefixes part of the data already, lowering peak and resident by 2.0M (We don't do this right now!)

We then call Build->ConstructPrefixSet. This needs to make a temporary uint32 array, so peak = 5.2M + 1.6M = 6.8M, but it frees the AddPrefixes immediately after (6.8M-3.2M => 3.6M). SetPrefixes/MakePrefixSet then get called, which will allocate 2 uint32 arrays and 1 uint16 one. The worst case is a bit hard to analyze here because it's dependent on the compression, and we're guaranteed to get compression if the data is big. The unrealistic worst case is 400k * (uint32 + uint32) = 3.2M, realistically it'll be closer to 1M (400k * uint16). New peak = 3.6M + 3.2M = 6.8M. Realistic peak = 3.6M + 1M = 4.6M

Because the PrefixSet is resident, I'm doing a Compact() call at this point. This could, worst case, temporarily double the memory usage for one of the uint32 arrays. Peak = 6.8M + 1.6M = 8.4M. (Realistic = 4.6M + 2M = 6.6M)

So, the peak memory usage at 2 points is 8.4M. Because the second occurrence (during PrefixSet construction) is less realistic, the real peak will always happen when we have the data resident and are merging more data into the biggest nsTArray, potentially causing a realloc. 

Fixing this might be possible by adding slack space to process the update at the very beginning. Most updates seem to be <100k entries, so if we allocate that slack space (100k/400k=25%), we would turn that 8.4M peak into 3.2M * 1.25 + 2.0M = 6.0M

The next remaining peaks are then 7.2M, which requires fixing ReadSubPrefixes, and 6.8M, which requires GetPrefixes to work with the AddPrefixArray data structure directly.

But I'd hope we are able to temporarily allocate 8.4M.
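For concreteness, the record layouts behind those numbers look roughly like this (a sketch; the real code uses nsTArray and its own struct definitions):

  #include <cstdint>
  #include <cstdio>

  struct AddPrefix { uint32_t prefix; uint32_t addChunk; };                     // 8 bytes
  struct SubPrefix { uint32_t prefix; uint32_t addChunk; uint32_t subChunk; };  // 12 bytes

  int main()
  {
    const size_t kAdds = 400000, kSubs = 170000;
    // 400k * 8 B = 3.2M, 170k * 12 B ~= 2.0M, resident total ~= 5.2M.
    printf("adds:     %.1f MB\n", kAdds * sizeof(AddPrefix) / 1e6);
    printf("subs:     %.1f MB\n", kSubs * sizeof(SubPrefix) / 1e6);
    printf("resident: %.1f MB\n",
           (kAdds * sizeof(AddPrefix) + kSubs * sizeof(SubPrefix)) / 1e6);
    return 0;
  }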
This implements the ideas from the previous comment, and should cut the peak memory usage from 8.4M to ~7.5M. It also avoids a lot of reallocation when merging updates. Unfortunately the "slack" space we preallocate is a magic number, as the only way to know it is to see what Google's servers are actually sending out. It also isn't possible to make it depend on the size of the DB, because the size of an update you receive isn't correlated with the size of the database.
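The preallocation itself is just a capacity bump before merging; in plain-C++ analogy form (the real code would use nsTArray, and the names here are illustrative, not the patch's actual identifiers):

  #include <cstdint>
  #include <vector>

  // Reserve room for a typical update before merging, so the big resident
  // array isn't reallocated -- and temporarily duplicated -- mid-update.
  // 150,000 is a guess based on the update sizes the servers currently send;
  // it can't be derived from the local database size.
  const size_t kUpdateSlack = 150000;

  void ReserveUpdateSlack(std::vector<uint32_t>& addPrefixes)
  {
    addPrefixes.reserve(addPrefixes.size() + kUpdateSlack);
  }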

This means that this patch will cause Mr. Talos to complain to me that we regress MaxHeap by 1.8M, which is quite annoying. (Even though the MaxHeap of a real installation would be lowered.)

What do you think?
Attachment #650503 - Flags: review?(justin.lebar+bug)
Attachment #650052 - Flags: review?(dcamp) → review+
Oops.
Attachment #650619 - Flags: review?(justin.lebar+bug)
Attachment #650503 - Attachment is obsolete: true
Attachment #650503 - Flags: review?(justin.lebar+bug)
> [...]
> But I'd hope we are able to temporarily allocate 8.4M.

Wow, that was impressive.  You earned this f+.  :)
Comment on attachment 650079 [details] [diff] [review]
Patch 17. Make the PrefixSet/LookupCache construction infallible again.

f=me, but perhaps you want to accumulate (URLCLASSIFIER_PS_FAILURE, 0)?  Otherwise it's hard to tell whether an uptick in failures is due to an uptick in usage or a regression.
Attachment #650079 - Flags: feedback?(justin.lebar+bug) → feedback+
Comment on attachment 650619 [details] [diff] [review]
Patch 22. v2 Reduce peak RAM. Add slack space.

I haven't read the other 21 patches in this bug, so I don't feel comfortable reviewing this.

It's not clear to me that this is completely right, but I have more questions than answers.  There are clownshoes issues -- is 150,000 the right number of elements to expand, or will that leave a bunch of the allocated space unused?  (Maybe nsTArray should just use malloc_usable_size.  In fact, it probably should.)

Also, suppose we go over the additional 150,000 elements.  Now nsTArray will double its already-large size.  This is perhaps worse than the original behavior.  How confident are you that we'll stay within the 150,000 elements?  Is it worth adding telemetry for this?

I'll take another look in the morning, perhaps with all the patches applied to my tree so I have some idea what the heck is going on.  :)
Attachment #650619 - Flags: review?(justin.lebar+bug)
Attachment #650619 - Flags: review?(dcamp)
Attachment #650619 - Flags: feedback?(justin.lebar+bug)
>There are clownshoes issues -- is 150,000 the right number of elements to expand, 
>or will that leave a bunch of the allocated space unused?  

For most updates, that will leave part of that space unused. For the test databases, it will leave 149999 of them unused, hence the expected Talos regression in our tests.

>Also, suppose we go over the additional 150,000 elements.  Now nsTArray will 
>double its already-large size.

Unless I misread the nsTArray source, that won't happen. The array is added to via AppendElement*s* calls, each of which does a single, up-front SetCapacity to the combined size of both arrays. So the behavior shouldn't be worse than what we have.

>How confident are you that we'll stay within the 150,000 elements?  Is it worth 
>adding telemetry for this?

Might be a good idea.

I'm not strongly attached to this patch because I agree it's not particularly elegant. For what it's worth, Chrome switched to deques recently: http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/safe_browsing/safe_browsing_store.cc?revision=107415&view=markup

But as far as I know such modern STL functionality is not available in our codebase?
Attachment #650079 - Flags: review?(dcamp) → review+
Would it make sense to handle patch 22 (maybe alongside patch 17) as a separate bug?
I'd take 17 with this one (it's made possible by the rewrite). Patch 22 is optional; we can revisit it if we see OOMs.

Looks like we're ready to land? I'll keep the database compare running over the weekend. Meanwhile we can see if Google replies regarding the per-client randomization. If we can get rid of that, I'd do it here too, as it simplifies much of the code that this adds.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #125)
> Based on this, I'm quite confident there are no serious bugs remaining,
> perhaps more so than if we'd just pass a static set of tests, and I'd
> consider landing the current code before we have bug 750751. We can keep
> monitoring the divergence for a few more weeks manually and react if needed. 

This isn't ideal, but: this patch doesn't regress the poor testing state of the safebrowsing code, it's a better basis on which to build testing code, and it's a pretty big overall win.  I do hope we can get attention on bug 750751 sooner rather than later though.

> I do think bug 727370 should land reasonably soon after, and in the same
> versions, as this one lands.

Agreed.

(In reply to Gian-Carlo Pascutto (:gcp) from comment #135)
> I'd take 17 with this one (it's made possible by the rewrite). Patch 22 is
> optional, we can revisit it if we see OOMs.

OK.  We should probably open a followup bug; it's easy for a patch on this bug to get lost in the noise.

> Looks like we're ready to land? 

I think so.
Tackling patch 22 in a separate bug sounds good to me.  I'll follow up once it's filed.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #135)
> Meanwhile we can see if Google replies regarding the per-client
> randomization. If we can get rid of that, I'd do it here too, as it
> simplifies many of the code that this adds.

I must be missing some context, but talk of getting rid of the randomization seems surprising. I'd like to hear more about this (ideally in a new bug). Seems like we're straying pretty far away from the summary at this point, so I think you should move as much work as you can into new bugs.
The randomization is much harder to deal with in the new code because we don't store the non-randomized hashes any more, so it's closely related to this bug. Likewise, removing the randomization will require blowing all the databases away a second time, because we can't recover the original data.

For this reason, it should preferably land at the same time as this bug, even though we can certainly track it in a separate bug.
Comment on attachment 650619 [details] [diff] [review]
Patch 22. v2 Reduce peak RAM. Add slack space.

Clearing f?, since we're going to handle this in a separate bug (right?).

(I had another look at the relevant tarray code, and I think this is basically fine, btw.)
Attachment #650619 - Flags: feedback?(justin.lebar+bug)
It's possible to get an empty chunk (optimized away) in one update, and to get it again (not empty) in an older update. This means that we can both a) store information that is unneeded and b) because we drop SubPrefix entries belonging to an addchunk we already have, after we finish knockout we potentially won't have the SubPrefixes to knock out the now useless data when we receive that addchunk again.

Skip processing chunks we already have. It's a performance optimization, too.

The old code also had this behavior, in this snippet:

  if (!InsertChunkId(mCachedAddChunks, chunkNum)) {
    LOG(("Ignoring duplicate add chunk %d in table %d", chunkNum, tableId));
    return NS_OK;
  }

With this patch, after several days, I still get a 100% match on the phishing database, one single missing AddPrefix in the malware database (no idea what causes that one...collision?), and 36 missing SubPrefixes that all refer to an AddChunk so old that the clients can never receive it. The latter looks to me like one client got a slightly more recent update than the other one, where those subs were already optimized away. 

I'm slightly puzzled by that one missing Prefix, but this doesn't look problematic enough that it'd stop me from landing. (And I think isolating that difference *will* likely need Bug 750751 to get 100% reproducible updates)
Attachment #650619 - Attachment is obsolete: true
Attachment #650619 - Flags: review?(dcamp)
Attachment #651753 - Flags: review?(dcamp)
Attachment #651753 - Flags: review?(dcamp) → review+
(In reply to Gian-Carlo Pascutto (:gcp) from comment #139)
> The randomization

Sorry, I think I was just confused about what you meant when you referred to "randomization". I thought you were referring to the gethashnoise code, but it sounds like you're referring to something else entirely.
There was a bug in the previous patch with respect to Completes: they will appear with the chunk of the corresponding Prefix set (which we will have), but they should not be skipped.
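The corrected skip condition boils down to something like this (an illustrative helper, not the actual patch):

  // Prefixes from a chunk we've already processed can be skipped, but a
  // Complete arrives under the chunk number of the prefix it extends --
  // which we have by definition -- so Completes must never be skipped.
  bool ShouldSkipChunkData(bool alreadyHaveChunk, bool isComplete)
  {
    return alreadyHaveChunk && !isComplete;
  }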
Attachment #651753 - Attachment is obsolete: true
Attachment #651920 - Flags: review?(dcamp)
Last one before landing.

Try run of all the above:
https://tbpl.mozilla.org/?tree=Try&rev=2dd3c6ca79f3
Attachment #651922 - Flags: review?(dcamp)
Attachment #651922 - Flags: review?(dcamp) → review+
Attachment #651920 - Flags: review?(dcamp) → review+
Bug 782887, bug 723153, bug 727370 and bug 750751 will contain the further development of this feature (if this doesn't get backed out, which I truly hope it won't).
uint32? :/
On this bug and some of the dependent bugs that landed at the same time, the target milestone was still set to mozilla13. Please can you check when landing on inbound that the milestone is correct (or else leave it blank and the new merge tool will choose the current milestone) :-)
Depends on: 797302
Depends on: 825627
Product: Firefox → Toolkit
Regressions: 1781027