Closed Bug 669410 Opened 13 years ago Closed 13 years ago

Use a PrefixSet to reduce amount of times SQLite DB must be queried

Categories

(Toolkit :: Safe Browsing, defect)
Priority: Not set
Severity: minor

Tracking

Status: RESOLVED FIXED
Target Milestone: Firefox 9

People

(Reporter: gcp, Assigned: gcp)

References

Details

Attachments

(10 files, 34 obsolete files)

23.35 KB, patch, gcp: review+
26.89 KB, patch, gcp: review+
43.62 KB, patch, gcp: review+
12.46 KB, patch, gcp: review+
3.56 KB, patch, gcp: review+
26.83 KB, patch, gcp: review+
4.22 KB, patch, gcp: review+
20.02 KB, patch, gcp: review+
23.06 KB, patch, gcp: review+
2.47 KB, patch

Currently, on each site visited, Firefox queries a roughly 50 MB SQLite database to gather the list of SHA-256 hash prefixes of URLs with malware or phishing links on that domain. This list is potentially empty if there is no known malware on that domain.

Because the hashes are distributed randomly, browsing at a reasonable speed effectively requires this DB to be cached in memory. There is some code already present to speed up the most common cases (browsing links within a domain, revisiting a clean domain).

With limited changes and using some code we already have, it's possible to avoid hitting the disk in almost all cases. This is a first phase in getting rid of the SQLite DB entirely.
Assignee: nobody → gpascutto
Blocks: 669407
It's possible to build a prefix tree of known hashes and domains on startup and each time the SQLite DB gets an update. Mehdi Mulani already wrote a JavaScript version of that data structure, including the needed tests.

To get this working, the following is still required:

1) JavaScript can only run on the main thread, whereas the Safe Browsing code uses a worker thread in the background. I managed to (painfully, due to inexperience) work around this by using NS_GetProxyForObject. Need to investigate whether this can be replaced by nsIRunnable.

2) There are about 1M prefixes in the DB right now. The JavaScript that constructs the tree takes some time and blocks the main thread. We probably need to construct it incrementally so we don't block the UI too much.

3) The DB stores domain hashes, prefix hashes, and full hashes. We can ignore the full hashes. As long as we keep the SQLite DB, there is no disadvantage. If it disappears, we will send a few more queries to Google when a user revisits a known malware or phishing site (not a common usage scenario!). I managed to store both the domain hashes and prefix hashes in the same tree by XOR-ing them with magic constants, but this still needs some debugging.
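
For illustration, a minimal sketch of the encoding described in point 3, assuming two arbitrary fixed 32-bit magic constants (the name ENCODE_DOMAIN_MAGIC shows up in the later patches; the values below are placeholders, not the real ones):

// Fold domain keys and URL-fragment prefix keys into a single set by XOR-ing
// each class of key with its own constant. Placeholder values, not the
// constants used in the actual patch.
static const PRUint32 ENCODE_DOMAIN_MAGIC = 0xd1e8c2b9;
static const PRUint32 ENCODE_PREFIX_MAGIC = 0x4a6b3e57;

static PRUint32 EncodeDomainKey(PRUint32 aDomainHash)
{
  return aDomainHash ^ ENCODE_DOMAIN_MAGIC;
}

static PRUint32 EncodePrefixKey(PRUint32 aPrefixHash)
{
  return aPrefixHash ^ ENCODE_PREFIX_MAGIC;
}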
If you're talking about building a prefix tree of all hashes that are in the client database, I think this will probably be pretty slow -- we were doing something similar in C++ on the server side and it took about 5 minutes to build the data structure (and you'd have to repeat this on every update).  You might want to investigate using a Bloom filter like Chrome is doing.

Also, just to clarify, the host keys are provided as a hint, but it's not required that you use them if your data structure works better without them (as is the case in Chrome's implementation).
>If you're talking about building a prefix tree of all hashes that are in the 
>client database, I think this will probably be pretty slow -- we were doing 
>something similar in C++ on the server side and it took about 5 minutes to 
>build the data structure (and you'd have to repeat this on every update).

Hrm, we have this working already, and it's taking about 5s, in debug mode, with the Prefix Tree code in JavaScript. The problem is that we currently block the main thread during this 5s.

When starting up it can be serialized from disk, so it should be even faster.

The tricky part will be handling updates. I'd like to update the tree in-place, but there are many nasty cases, like deleting a key that causes the 2 neighboring keys to be more than 2^16 away from each other. So it remains to be seen if this is really feasible. Reconstructing the tree on each update would be annoying, especially on mobile where it will eat more than 5s CPU time.
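
For context, a minimal sketch of the delta-encoded layout this constraint comes from, assuming sorted 32-bit prefixes stored as occasional full index entries plus 16-bit deltas (the member names mirror the later C++ patch; error handling is omitted):

// Build the compressed representation: a full 32-bit prefix starts each index
// block, and entries within 2^16 of their predecessor are stored as 16-bit
// deltas. Deleting an entry can leave two neighbours more than 2^16 apart,
// which forces a new index entry; that is the in-place update problem above.
static void BuildDeltas(const nsTArray<PRUint32>& aSortedPrefixes,
                        nsTArray<PRUint32>& aIndexPrefixes,
                        nsTArray<PRUint32>& aIndexStarts,
                        nsTArray<PRUint16>& aDeltas)
{
  if (aSortedPrefixes.IsEmpty())
    return;

  aIndexPrefixes.AppendElement(aSortedPrefixes[0]);
  aIndexStarts.AppendElement(0);

  PRUint32 previous = aSortedPrefixes[0];
  for (PRUint32 i = 1; i < aSortedPrefixes.Length(); i++) {
    PRUint32 diff = aSortedPrefixes[i] - previous;
    if (diff > 0xFFFF) {
      // Too far from the previous entry: start a new index block.
      aIndexPrefixes.AppendElement(aSortedPrefixes[i]);
      aIndexStarts.AppendElement(aDeltas.Length());
    } else {
      aDeltas.AppendElement(static_cast<PRUint16>(diff));
    }
    previous = aSortedPrefixes[i];
  }
}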

>You might want to investigate using a Bloom filter like Chrome is doing.

I understood they are moving away from that. There are some disadvantages in that approach.

The main problem is that you can never have it as your main data-structure, because it has collisions. So you need the bloom filter *and* the database. Chrome reduced the overhead by storing the database more efficiently than SQLite does, but it's still overhead.

It also has false positives, which would cause us to leak where the user is browsing. 

The advantage of the Bloom filter is that you can control the size of the in-memory part.

>Also, just to clarify, the host keys are provided as a hint, but it's not 
>required that you use them if your data structure works better without them 
>(as is the case in Chrome's implementation).

I've left them in because they allow taking an early exit without running the entire fragmentation and lookup of all the generated URLs. They take about half of the size of the tree though, so I might want to revisit this after measuring the speed impact. You certainly could be right that they aren't worth it.
(In reply to comment #3)
> Hrm, we have this working already, and it's taking about 5s, in debug mode,
> with the Prefix Tree code in JavaScript. The problem is that we currently
> block the main thread during this 5s.
> 
> When starting up it can be serialized from disk, so it should be even faster.

Ah, ok great.  Could be that the implementation we were using was just particularly slow to build the tree.

Any stats on the memory usage of the prefix tree once constructed?

> >You might want to investigate using a Bloom filter like Chrome is doing.
> 
> I understood they are moving away from that. There are some disadvantages in
> that approach.
> 
> The main problem is that you can never have it as your main data-structure,
> because it has collisions. So you need the bloom filter *and* the database.
> Chrome reduced the overhead by storing the database more efficiently than
> SQLite does, but it's still overhead.

Ah, sounds like you're more up to date on the Chrome implementation than I am.  Agreed on your points about needing a database to back the Bloom filter, as well as the FP's (hash prefixes don't uniquely identify a URL, but it's understandable if you want to avoid unneeded pingbacks).

One other thing I was wondering about the prefix tree approach -- how would you handle chunk deletes? (ad: and sd:)

> I've left them in because they allow taking an early exit and not run the
> entire fragmentation and lookup of all the generated URLs. They take about
> half of the size of the tree though, so I might want to revisit this after
> measuring the speed impact. You certainly could be right that they aren't
> worth it.

Sure, sounds reasonable.
(Just to be clear, I meant how would you handle chunk deletes if you were to get rid of the SQLite database, as it sounds like you want to do eventually)
>Any stats on the memory usage of the prefix tree once constructed?

Currently it's 2.2M, but this is including the domain prefixes. Without them, it would be about 1.2M.

I will have some patches for this up as soon as (1) and (2) in comment 1 are OK.

>One other thing I was wondering about the prefix tree approach -- how would 
>you handle chunk deletes? (ad: and sd:)

The tree needs to be augmented with the chunk information, which will double its size (which should still be acceptable afterwards). Then we'll either have to reconstruct it (easier/slower) or delete entries from it (hairy/faster).

There are tricky aspects here. The chunk numbers are bigger than 2^16, but there are only about 12000 distinct ones in use, so we preferably want to make use of that redundancy (extra indirection).
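
As a purely hypothetical sketch of that indirection (nothing like this is in the attached patches): keep one sorted table of the distinct chunk numbers and store a 16-bit index per entry instead of the full 32-bit chunk number.

// Hypothetical: deduplicate chunk numbers into one sorted table built up
// front, then refer to them with 16-bit indices, since only ~12000 distinct
// chunk numbers are in use.
class ChunkTable {
public:
  void Init(const nsTArray<PRUint32>& aSortedDistinctChunks) {
    mChunks.Clear();
    mChunks.AppendElements(aSortedDistinctChunks.Elements(),
                           aSortedDistinctChunks.Length());
  }
  // Binary search; the caller guarantees aChunk is present in the table.
  PRUint16 IndexOf(PRUint32 aChunk) const {
    PRUint32 lo = 0, hi = mChunks.Length();
    while (lo < hi) {
      PRUint32 mid = lo + ((hi - lo) >> 1);
      if (mChunks[mid] < aChunk) lo = mid + 1; else hi = mid;
    }
    return static_cast<PRUint16>(lo);
  }
  PRUint32 ChunkAt(PRUint16 aIndex) const { return mChunks[aIndex]; }
private:
  nsTArray<PRUint32> mChunks;
};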

I hope I didn't miss anything. The current update code is more complicated than I would expect from the protocol description. I'm not entirely sure what the moz_subs table we have now does, for example.
Here's a first bunch of patches. This is a work in progress; the following issues remain, and I'm putting it here so other people can start commenting on it.

1) Doug requested that the usage of Proxy objects be replaced by nsIRunnables. I have a patch for this, but it causes errors in an unrelated part of the code that handles updates. I have no idea why; either I messed up and my patch is buggy, there are issues mixing Proxies with Runnables, or I'm exposing an existing bug. That patch is here:

http://hg.mozilla.org/users/gpascutto_mozilla.com/patches/file/41fcc8537504/remove-prefix-proxy

2) The list of prefixes is now sent to JavaScript in chunks. The idea was that this would prevent the main thread from blocking too long while handling the tree construction. However, this causes a performance drop: the tree construction has to reallocate memory when dealing with each chunk, and because this uses ctypes.ArrayTypes, I don't believe there's a nice JS way to quickly copy everything over, so it uses a simple loop right now. If it's not possible to do that in a nicer way, the code can probably be adjusted to preallocate the memory. But it won't be as nice.

What's worse is that as far as I can tell the responsiveness of the UI is still bad during the tree construction.

3) To make (2) worse, the tree construction seems to be called much more often than expected. It's called on every SQLite DB open/close and on every update. I need to instrument where all those calls are coming from, and why there are so many.

4) I didn't include a patch yet to reduce the SQLite caches. After discussing with dcamp, it likely needs to be smart and increase the caches again when processing an update. The code for this is already present, but disabled. So this will need some (performance) testing. But that doesn't make much sense until the above issues are dealt with.

5) The code right now checks a small cache of host keys in the main thread before firing off an (asynchronous) lookup to the worker thread. With the tree present, that cache can be removed and much work can be postponed until the tree (which lives in the main thread!) has been checked. Unfortunately it's also nontrivial, as all the database access currently lives in the worker thread, and should stay there so I/O doesn't block the main thread. I have a partial patch here:

http://hg.mozilla.org/users/gpascutto_mozilla.com/patches/file/41fcc8537504/probe-in-main-thread
See Also: → 672284
Attachment #545920 - Attachment is obsolete: true
Attachment #547634 - Flags: review?(dcamp)
Attachment #545923 - Attachment is obsolete: true
Attachment #547635 - Flags: review?(dcamp)
Attachment #545924 - Attachment is obsolete: true
Attachment #545925 - Attachment is obsolete: true
Attachment #547636 - Flags: review?(dcamp)
Attachment #547637 - Flags: review?(dcamp)
Second revision. Notes:

1) The PrefixSet is now in C++ instead of JS. This is way faster, removes a lot of thread juggling, makes the chunked loading unnecessary (the prefixset construction is almost instantaneous), and the code is simpler, too. Some bugs for corner cases have been fixed.

2) Patch to reduce the caches, and increase upon updates, is included. But see comment 40 in bug 650649 for some strange stuff.

3) In the main thread, we now do a probe inside the PrefixSet and avoid posting anything to the background thread if we get a clean result. This allows removing all of the old internal caching (we can probably remove some support code in ClassifierUtils too). If the probe indicates that the prefixset is still loading, we use the old path. (A rough sketch of this probe path follows this comment.)

4) I kept the test code for the old caches that no longer exist (test_cleankeycache.js) but adapted it to the new behavior. This served a bit as a sanity check to see if things were happening as expected. Maybe these tests can go.

5) The code assumes that there is at least one entry in the prefixset. Some of the code in the tests is adapted to ensure this, and in case the urlclassifier store is empty, a sentinel is added.
I believed that the extra URLs we add (http://www.mozilla.org/firefox/its-a-trap.html) should make an empty prefixset impossible in real use, but that led to: http://tbpl.mozilla.org/?tree=Try&rev=81f95515a021
I'm not sure I fully understand why that happened.

6) This code keeps the host keys together with the fragment keys, leading to about 2.1M of memory usage. This could be reduced to 1.2M if we didn't use the host keys, at the cost of giving up the fast path and opportunistic probe. But I would like to see a performance measurement of the impact. What's the best way to measure this accurately?

These patches alone aren't enough to get SafeBrowsing on mobile, but they do mean that the backing store for the prefixes can be much slower or more compact than before.
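
A rough sketch of the fast path from point 3 above, using the Probe() signature that appears in the later patches (the method name and surrounding dispatch are simplified stand-ins for the real DBService code):

// Main-thread fast path: opportunistically probe the in-memory PrefixSet and
// only post a lookup to the worker thread when the probe cannot rule the host
// out (set still being constructed, or prefix present).
nsresult CheckCleanHost(nsIUrlClassifierPrefixSet* aPrefixSet,
                        PRUint32 aDomainKey, PRBool* aClean)
{
  PRBool ready = PR_FALSE;   // opportunistic probe: don't wait for construction
  PRBool found = PR_FALSE;
  nsresult rv = aPrefixSet->Probe(aDomainKey, &ready, &found);
  NS_ENSURE_SUCCESS(rv, rv);

  // Only a fully built set that lacks the key proves the host clean;
  // anything else falls back to the old worker-thread lookup.
  *aClean = ready && !found;
  return NS_OK;
}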
Will try to review this in detail quickly, but I'm pretty concerned about reading through that entire sqlite database at startup before we can classify anything.

It shouldn't be too hard to serialize the 2.1 MB prefixset to disk and read that in one fell swoop. I suspect that'd be a heck of a lot quicker than paging through the entire unnecessarily huge sqlite db at startup?
Sure, but the above patches need to be working first or we won't have anything to bootstrap the prefixset and serialize :-)

I'm going to start working on that and storing chunks + add/remove now.
Comment on attachment 547637 [details] [diff] [review]
Patch v2 4. Reduce SQLite caches when not updating


> nsresult
>+nsUrlClassifierDBServiceWorker::SetCacheSize(
>+  nsCOMPtr<mozIStorageConnection> aConnection, PRInt32 aCacheSize)

You can just take a mozIStorageConnection *, the caller is holding a reference for the lifetime of the call.
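
A hedged sketch of the suggested signature; the PRAGMA below is an assumption about how the cache gets sized, not a quote from the patch:

// Take a raw mozIStorageConnection*; the caller holds a reference for the
// lifetime of the call, so the extra nsCOMPtr is unnecessary.
nsresult
nsUrlClassifierDBServiceWorker::SetCacheSize(mozIStorageConnection* aConnection,
                                             PRInt32 aCacheSize)
{
  nsCAutoString cacheSizePragma("PRAGMA cache_size=");
  cacheSizePragma.AppendInt(aCacheSize);
  return aConnection->ExecuteSimpleSQL(cacheSizePragma);
}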
Attachment #547637 - Flags: review?(dcamp) → review+
Comment on attachment 547634 [details] [diff] [review]
Patch v2 1. Add the PrefixTree datastructure

Review of attachment 547634 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +105,5 @@
> +  mIndexStarts.AppendElement(mDeltas.Length());
> +
> +  PRUint32 numOfDeltas = 0;
> +  PRUint32 currentItem = prefixes[0];
> +  for (int i = 1; i < aLength; i++) {

Should probably match aLength's data type here.

@@ +146,5 @@
> +  } else if (value > target) {
> +    return BinSearch(start, i-1, target);
> +  }
> +  return i;
> +}

An iterative implementation would be better here.
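
For illustration, an iterative lower-bound style search over the same array (a sketch only; when the target is absent it returns a nearby index, matching the caveat quoted in a later review comment):

// Iterative binary search over mIndexPrefixes[start..end): returns the index
// of |target| when present, otherwise the first index whose value is not less
// than the target. The caller compensates for the not-found case.
PRUint32
nsUrlClassifierPrefixSet::BinSearch(PRUint32 start, PRUint32 end, PRUint32 target)
{
  PRUint32 lo = start;
  PRUint32 hi = end;
  while (lo < hi) {
    PRUint32 mid = lo + ((hi - lo) >> 1);
    if (mIndexPrefixes[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}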

@@ +181,5 @@
> +
> +  if (mIndexPrefixes[i] > target ||
> +      (i < mIndexPrefixes.Length() - 2 &&
> +       mIndexPrefixes[i+2] < target)) {
> +    LOG(("XXX: this should not happen"));

NS_WARNING(), or maybe even NS_ASSERTION() should be used here.
Comment on attachment 547635 [details] [diff] [review]
Patch v2 2. Use the Prefix Tree to short-circuit lookups

Review of attachment 547635 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
@@ +12,2 @@
>    boolean contains(in unsigned long aPrefix);
> +  PRUint32 estimateSize();

(Usually would need to update uuid while updating an interface, but this will presumably land at the same time, so not a big deal).

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +3957,5 @@
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  mPrefixSet = do_CreateInstance("@mozilla.org/url-classifier/prefixset;1", &rv);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +

I'd just use "new nsUrlClassifierPrefixSet()", no need to get xpcom involved here.  You can use nsRefPtr rather than nsCOMPtr to deal with the C++ class instead of the nsIGarbage.

Sad that we need to make prefixset an xpcom object at all, but it does make testing more convenient.
Comment on attachment 547636 [details] [diff] [review]
Patch v2 3. Probe the PrefixSet from the main thread

Review of attachment 547636 [details] [diff] [review]:
-----------------------------------------------------------------

This patch blocks the main UI for the amount of time it takes to build the prefixset.  That's a somewhat large operation, are you confident it'll never get near the 50ms max we try to avoid blocking?

It might be better (though with some memory consumption tradeoffs) to build up a new prefixset without holding the lock, and lock just to swap it in.
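
A sketch of that build-then-swap approach, assuming the array members and lock shown elsewhere in these patches (not the code that landed):

// Build the new arrays without holding mPrefixTreeLock, then take the lock
// only for the cheap swaps, so a main-thread probe is never blocked for the
// duration of the construction.
nsresult
nsUrlClassifierPrefixSet::SetPrefixes(const PRUint32* aPrefixes, PRUint32 aLength)
{
  nsTArray<PRUint32> newIndexPrefixes;
  nsTArray<PRUint32> newIndexStarts;
  nsTArray<PRUint16> newDeltas;

  // ... expensive construction into the local arrays goes here ...

  {
    MutexAutoLock lock(mPrefixTreeLock);
    mIndexPrefixes.SwapElements(newIndexPrefixes);
    mIndexStarts.SwapElements(newIndexStarts);
    mDeltas.SwapElements(newDeltas);
  }
  return NS_OK;
}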

::: toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
@@ +9,5 @@
>                     in unsigned long aLength);
>    void addPrefixes([const, array, size_is(aLength)] in unsigned long aPrefixes,
>                     in unsigned long aLength);
>    boolean contains(in unsigned long aPrefix);
> +  boolean probe(in unsigned long aPrefix, inout boolean aReady);

When adding methods to the idl, you should generate a new uuid for the interface.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1415,5 @@
>  
> +  *clean = PR_TRUE;
> +
> +  for (int i = 0; i < lookupHosts.Length(); i++) {
> +    nsUrlClassifierDomainHash hostKeyHash;

Match iterator and length data types here please
Attachment #547634 - Attachment is obsolete: true
Attachment #547635 - Attachment is obsolete: true
Attachment #547636 - Attachment is obsolete: true
Attachment #547637 - Attachment is obsolete: true
Attachment #548369 - Flags: review?(dcamp)
Attachment #547634 - Flags: review?(dcamp)
Attachment #547635 - Flags: review?(dcamp)
Attachment #547636 - Flags: review?(dcamp)
Attachment #548371 - Flags: review?(dcamp)
Attachment #548372 - Flags: review?(dcamp)
This also fixes the naming to consistently use PrefixSet instead of PrefixTree.
Attachment #549026 - Flags: review?(dcamp)
Summary: Use a PrefixTree to reduce amount of times SQLite DB must be queried → Use a PrefixSet to reduce amount of times SQLite DB must be queried
Attachment #549051 - Flags: review?(dcamp)
Attachment #549051 - Flags: review?(dcamp) → review+
Comment on attachment 548369 [details] [diff] [review]
Patch v3 1. Add the PrefixTree datastructure

Review of attachment 548369 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +99,5 @@
> +  }
> +
> +  // XXX: Probably shouldn't assume that |aPrefixes| doesn't contain
> +  // duplicates..
> +

Let's either fix this now or have a followup bug filed/mentioned.

@@ +171,5 @@
> +
> +  // XXX: |binsearch| does not necessarily return the correct index (when the
> +  // target is not found) but rather it returns an index at least one away
> +  // from the correct index.
> +  // Because of this, we need to check if the target lies before the beginning

This isn't an XXX in the "fixme" sense, is it?  Just an explanation of what the code is doing?
Attachment #548369 - Flags: review?(dcamp) → review+
Attachment #548372 - Flags: review?(dcamp) → review+
Comment on attachment 548371 [details] [diff] [review]
Patch v3 3. Probe the PrefixSet from the main thread

Review of attachment 548371 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1053,5 @@
> +nsresult GetHostKeys(const nsACString &spec,
> +                     nsTArray<nsCString> &hostKeys);
> +
> +// Check for a canonicalized IP address.
> +PRBool IsCanonicalizedIP(const nsACString& host);

Should these two methods be static?

@@ +1413,5 @@
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  *clean = PR_TRUE;
> +
> +  for (nsAutoTArray<nsCString, 2>::size_type i = 0;

hrm, that's probably more correct, but PRUint32 is much easier to read (and that particular size_type is a simple typedef).

@@ +1420,5 @@
> +    hostKeyHash.FromPlaintext(lookupHosts[i], mHash);
> +
> +    // First probe the Prefix Tree for presence
> +    PRUint32 domainkey = hostKeyHash.ToUint32() ^ ENCODE_DOMAIN_MAGIC;
> +

Do we have a followup bug to test the perf/memory tradeoffs of caring about domain keys?

@@ +1425,5 @@
> +    PRBool found;
> +    PRBool ready = PR_FALSE;  /* opportunistic probe */
> +     rv = mPrefixSet->Probe(domainkey, &ready, &found);
> +     NS_ENSURE_SUCCESS(rv, rv);
> +     LOG(("CheckCleanHost Probed %X ready: %d found: %d ",

Indentation change here.
Overall this seems like a good thing to do, and is an important step toward replacing sqlite.

I think we need to quantify the performance changes from the previous implementation before landing, though - in particular:

* The impact of walking the sqlite db to rebuild that prefixtree during an update.
* The memory savings of dropping the sqlite cache aggressively vs. the extra usage of the prefixtree.
* Startup impact of reading the prefixtree on first load, with a full db.

I suspect that all of those are a win here, but we should be deliberate about it.

This is a big enough change that we should be ready to turn it off in aurora if it causes problems.  Might be as simple as reverting to the old codebase if needed.
Attachment #548370 - Flags: review?(dcamp) → feedback+
Comment on attachment 548371 [details] [diff] [review]
Patch v3 3. Probe the PrefixSet from the main thread

feedback+'ing the remaining patches instead of r+. They look good to me, but:

a) I think the performance changes should be understood before landing
b) I'm not technically an url-classifier peer. tony@ponderer.org should sign off before landing big changes.
Attachment #548371 - Flags: review?(dcamp) → feedback+
Blocks: 650649
Carrying forward review
Attachment #548369 - Attachment is obsolete: true
Attachment #550336 - Flags: review+
Attachment #548370 - Attachment is obsolete: true
Attachment #550337 - Flags: review?(tony)
Attachment #550337 - Flags: feedback+
Attachment #550337 - Attachment is patch: true
Attachment #550337 - Attachment description: Patch v3 2. Use the Prefix Tree to short-circuit lookups → Patch v4 2. Use the Prefix Tree to short-circuit lookups
Attachment #548371 - Attachment is obsolete: true
Attachment #550338 - Flags: review?(tony)
Attachment #550338 - Flags: feedback+
Carrying forward review
Attachment #548372 - Attachment is obsolete: true
Attachment #550339 - Flags: review+
Attachment #549026 - Attachment is obsolete: true
Attachment #550341 - Flags: review?(tony)
Attachment #549026 - Flags: review?(dcamp)
Carrying forward review
Attachment #549051 - Attachment is obsolete: true
Attachment #550342 - Flags: review+
This is an *optional* patch that reduces the storage from 2.1M to 1.2M at the cost of a GetFragments call for every lookup. It is most likely a win overall, and simplifies the code. See also comment 30. Need to get performance data on this.
Attachment #550343 - Flags: review?(tony)
(In reply to comment #32)
> Comment on attachment 548371 [details] [diff] [review] [diff] [details] [review]
> Patch v3 3. Probe the PrefixSet from the main thread
> 
> feedback+'ing the remaining bugs instead of r+.  The look good to me, but:
> 
> a) I think the performance changes should be understood before landing

I too would like to know the performance impact before landing.  Here's an idea for a perf test:
Use wireshark/fiddler to catch safebrowsing traffic and save the response data for a while (a day or maybe a few days). Keep each update from the server in a separate file and replay them against the old code vs the new PrefixTree code. It would also be good to see what the perf impact is if the user hasn't used Firefox in a week and a big update occurs.

Another way to measure perf might be to use Test Pilot.  That would require landing the code to collect data from nightlies, but maybe that's OK (I'm not familiar with the release process to know).
Attachment #550336 - Attachment is obsolete: true
Attachment #556492 - Flags: review+
Attachment #550337 - Attachment is obsolete: true
Attachment #556493 - Flags: review?(tony)
Attachment #556493 - Flags: feedback+
Attachment #550337 - Flags: review?(tony)
Attachment #550338 - Attachment is obsolete: true
Attachment #556494 - Flags: review?(tony)
Attachment #556494 - Flags: feedback+
Attachment #550338 - Flags: review?(tony)
Attachment #550339 - Attachment is obsolete: true
Attachment #556496 - Flags: review+
Attachment #550341 - Attachment is obsolete: true
Attachment #556497 - Flags: review?(tony)
Attachment #550341 - Flags: review?(tony)
Attachment #550342 - Attachment is obsolete: true
Attachment #556498 - Flags: review+
Attachment #550343 - Attachment is obsolete: true
Attachment #556499 - Flags: review?(tony)
Attachment #550343 - Flags: review?(tony)
Attachment #556498 - Attachment description: Patch v4 6. Remove the old fragment caches → Patch v5 6. Remove the old fragment caches
Patch rebased to current m-c and expanded with 2 parts:

- As explained in bug 669407, if the hostkeys are dropped we risk that every Firefox browser gets a collision and slowdown on the same site/page. To stop this, all keys are rehashed with a per-user random key. I originally used SHA-256 for this as described in that bug, but it is too slow. It has been replaced by MurmurHash3, and I confirmed that it gives approximately the same number of collisions as SHA-256.

- There is an extra patch which adds telemetry for the critical operations. This allows us to monitor the performance impact of this patch in the field and take action if any users are seeing a bigger than expected performance impact.

Some measurements on my machine (standard desktop, somewhat old disk): 

- Complete update of the database (dcamp's script in bug 673470): before ~20.9s, now ~22.2s.
- Constructing the PrefixSet after an update or on first startup: 0.5-2 seconds, in the background. Most of this is the read of the entire DB.
- Loading the PrefixSet (normal startup): 10-50ms, depending on disk activity. It's 1 disk seek + 1M serial read.

Memory usage changes:

- During normal usage, never more than 1M (down from >40M in worst case before).
- During an update, at most 40M, and basically identical to the old behavior.

The size of the updates has no effect on the performance impact of this patch.
Fix silly typo.
Attachment #556502 - Attachment is obsolete: true
Attachment #556507 - Flags: review?(tglek)
Attachment #556502 - Flags: review?(tglek)
(In reply to Gian-Carlo Pascutto (:gcp) from comment #50)
>
> - During normal usage, never more than 1M (down from >40M in worst case
> before).

I like that! :)
Comment on attachment 556507 [details] [diff] [review]
Patch v5. 9. Add telemetry for new PrefixSet operations


> /**
>+ * Url-Classifier telemetry
>+ */
>+#ifdef MOZ_URL_CLASSIFIER
>+HISTOGRAM(URLCLASSIFIER_PS_FILELOAD_TIME, 1, 2000, 10, EXPONENTIAL, "Time spent loading PrefixSet from file (ms)")
>+HISTOGRAM(URLCLASSIFIER_PS_CONSTRUCT_TIME, 1, 10000, 15, EXPONENTIAL, "Time spent constructing PrefixSet from DB (ms)")
>+HISTOGRAM(URLCLASSIFIER_PS_LOOKUP_TIME, 1, 500, 10, EXPONENTIAL, "Time spent per PrefixSet lookup (ms)")
>+#endif

I suspect your max values are overly conservative.
I would change your maximum values: 2s->1s, 10s->3s. The exact value will still be sent with .sum component of the histogram, but the histogram will look nicer.
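
For context, a sketch of how such a histogram is typically fed (assuming the Telemetry::AutoTimer helper of that era; the actual probe points in the patch may differ, and the method name is illustrative):

// Record the elapsed milliseconds into the histogram declared above when the
// timer goes out of scope.
nsresult
nsUrlClassifierPrefixSet::LoadFromFile(nsIFile* aFile)
{
  Telemetry::AutoTimer<Telemetry::URLCLASSIFIER_PS_FILELOAD_TIME> timer;
  // ... open the file and deserialize the prefix arrays ...
  return NS_OK;
}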
Attachment #556507 - Flags: review?(tglek) → review+
Comment on attachment 556493 [details] [diff] [review]
Patch v5 2. Use the Prefix Tree to short-circuit lookups

Review of attachment 556493 [details] [diff] [review]:
-----------------------------------------------------------------

It's been a long time since I've looked at this code, so this is mostly just a review of style and agreement that we should try the prefix tree.

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +278,5 @@
>    const PRBool StartsWith(const nsUrlClassifierHash<PARTIAL_LENGTH>& hash) const {
>      NS_ASSERTION(sHashSize >= PARTIAL_LENGTH, "nsUrlClassifierHash must be at least PARTIAL_LENGTH bytes long");
>      return memcmp(buf, hash.buf, PARTIAL_LENGTH) == 0;
>    }
> +  const PRUint32 ToUint32() {

Nit: The const here doesn't really do anything.

@@ +3474,5 @@
> +    const PRUint8 *blobdomain = mDomainPrefixStatement->AsSharedBlob(0, &size);
> +    if (!blobdomain || (size != DOMAIN_LENGTH))
> +      return PR_FALSE;
> +
> +    domainval = *((PRUint32*)blobdomain);

Nit: static_cast?

@@ +3494,5 @@
> +    const PRUint8 *blobdomain = mAllPrefixStatement->AsSharedBlob(0, &size);
> +    if (!blobdomain || (size != DOMAIN_LENGTH))
> +      return PR_FALSE;
> +
> +    domainval = *((PRUint32*)blobdomain);

Nit: static_cast?

@@ +3515,5 @@
> +  }
> +
> +  LOG(("SB domains: %d prefixes: %d fulldomain: %d\n", dcnt, pcnt, fcnt));
> +
> +

Nit: Extra blank line?
Attachment #556493 - Flags: review?(tony) → review+
Comment on attachment 556494 [details] [diff] [review]
Patch v5 3. Probe the PrefixSet from the main thread

Review of attachment 556494 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1384,1 @@
>                                                 PRBool *clean)

Nit: Align with ( on previous line.

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +218,5 @@
>    return NS_OK;
>  }
> +
> +NS_IMETHODIMP
> +nsUrlClassifierPrefixSet::Probe(PRUint32 aPrefix, PRBool* aReady, PRBool* aFound)

Nit: It would be nice if the spacing around * was consistent with the rest of the file.

@@ +222,5 @@
> +nsUrlClassifierPrefixSet::Probe(PRUint32 aPrefix, PRBool* aReady, PRBool* aFound)
> +{
> +  MutexAutoLock lock(mPrefixTreeLock);
> +
> +  /* check whether we are opportunistically probing or should wait */

Nit: Maybe use C++ style // comments so it's easier to comment out a block?

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.h
@@ +79,3 @@
>    PRUint32 BinSearch(PRUint32 start, PRUint32 end, PRUint32 target);
>  
> +  //  boolean indicating whether |setPrefixes| has been

Nit: Extra space after // ?
Attachment #556494 - Flags: review?(tony) → review+
Comment on attachment 556497 [details] [diff] [review]
Patch v5 5. Make the PrefixSet persistent + consistency fix

Review of attachment 556497 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +329,5 @@
> +  nsCOMPtr<nsILocalFile> file(do_QueryInterface(aFile, &rv));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  AutoFDClose fileFd;
> +  rv = file->OpenNSPRFileDesc(PR_RDONLY, 0644, &fileFd);

Do file permissions matter if you're opening as read only?

@@ +340,5 @@
> +  PRFileMap * fileMap = PR_CreateFileMap(fileFd, size, PR_PROT_READONLY);
> +  if (!fileMap) {
> +    mapWorks = PR_FALSE;
> +  } else {
> +    buf = (PRUint32*)PR_MemMap(fileMap, 0, size);

static_cast

@@ +350,5 @@
> +
> +  if (!mapWorks) {
> +    /* fallback if mmap doesn't work */
> +    LOG(("MemMap failed, falling back to PR_Read"));
> +    return LoadFromFd(fileFd);

If you're reading the whole file and closing it, what's the benefit of mmapping?

@@ +356,5 @@
> +    PRUint32 offset = 0;
> +
> +    if (buf[offset++] == PREFIXSET_VERSION_MAGIC) {
> +      PRUint32 indexSize = buf[offset++];
> +      PRUint32 deltaSize = buf[offset++];

Do you have to verify you're not overflowing buf?

@@ +367,5 @@
> +      nsTArray<PRUint32> mNewIndexStarts;
> +      nsTArray<PRUint32> mNewIndexPrefixes;
> +      nsTArray<PRUint16> mNewDeltas;
> +
> +      mNewIndexPrefixes.AppendElements(buf+offset, indexSize);

Same here, do you need to verify you're not reading past the end of |buf|?

@@ +372,5 @@
> +      offset += indexSize;
> +      mNewIndexStarts.AppendElements(buf+offset, indexSize);
> +      offset += indexSize;
> +      if (deltaSize > 0) {
> +        PRUint16 * buf16 = (PRUint16*)(buf+offset);

static_cast
Attachment #556497 - Flags: review?(tony) → review-
Attachment #556499 - Flags: review?(tony) → review+
Comment on attachment 556501 [details] [diff] [review]
Patch v5 8. Rehash with a per-user key to prevent collisions

Review of attachment 556501 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +3468,5 @@
> +*/
> +static nsresult KeyedHash(PRUint32 aPref, PRUint32 aDomain,
> +                          PRUint32 aKey, PRUint32 *aOut)
> +{
> +  /* This is a reimplementation of MurmurHash3 32-bit

Why did you decide to use MurmurHash3?  I'm not a hash expert, so it makes me a bit nervous and it would be nice if we could use an existing hash function.
Attachment #556492 - Attachment is obsolete: true
Attachment #558566 - Flags: review+
Attachment #556493 - Attachment is obsolete: true
Attachment #558568 - Flags: review+
Attachment #556494 - Attachment is obsolete: true
Attachment #556496 - Attachment is obsolete: true
Attachment #558572 - Flags: review+
Attachment #556497 - Attachment is obsolete: true
Attachment #558573 - Flags: review?(tony)
Attachment #556498 - Attachment is obsolete: true
Attachment #556499 - Attachment is obsolete: true
Attachment #558575 - Flags: review+
Attachment #556501 - Attachment is obsolete: true
Attachment #558577 - Flags: review?(tony)
Attachment #556501 - Flags: review?(tony)
Attachment #556507 - Attachment is obsolete: true
Attachment #558580 - Flags: review+
>I suspect your max values are overly conservative.
>I would change your maximum values: 2s->1s, 10s->3s. The exact value will still 
>be sent with .sum component of the histogram, but the histogram will look 
>nicer.

I reduced these a bit. Not too much, because there might be some bad cases when there is disk I/O that we want to spot. Also, they're exponential, so it doesn't matter that much, right?

> +  const PRUint32 ToUint32() {
>Nit: The const here doesn't really do anything.

Right, it should have been at the end. I had big problems here; this function seems to be miscompiled by the version of GCC that is in Fedora 12, when compiling for 32-bit and using optimizations including PGO (i.e. one of the configurations on the tryserver). I narrowed it down far enough that the compiler is almost certainly to blame:

http://hg.mozilla.org/try/rev/656d2af9ff1f (broken)
versus
http://hg.mozilla.org/try/rev/a4bc19779cf4 (working)

Adding the logging made the problem disappear. I ended up slightly reformulating this function and its caller, which made the problem go away.

>Nit: static_cast?

Changed most of these to reinterpret_cast (we're copying char buffers and void* thingies into uint32)

>Do file permissions matter if you're opening as read only?

Fixed

>If you're reading the whole file and closing it, what's the benefit of mmapping?

For some reason I thought mmap was generally faster. Reading a file sequentially is not one of those cases, it seems. I removed this entire path and only use normal file reading now. This also addresses your concerns about potentially overrunning the buffer (the normal read path has checks).
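
A rough sketch of that plain-read path with the size checks (constant and member names are taken from the review excerpts above; the exact checks in the landed patch may differ):

// Read the serialized PrefixSet without mmap: a small header (version magic,
// index size, delta size) followed by the three arrays. Any short read or a
// magic mismatch aborts the load.
nsresult
nsUrlClassifierPrefixSet::LoadFromFd(PRFileDesc* aFd)
{
  PRUint32 header[3];
  PRInt32 read = PR_Read(aFd, header, sizeof(header));
  NS_ENSURE_TRUE(read == PRInt32(sizeof(header)), NS_ERROR_FAILURE);
  NS_ENSURE_TRUE(header[0] == PREFIXSET_VERSION_MAGIC, NS_ERROR_FAILURE);

  PRUint32 indexSize = header[1];
  PRUint32 deltaSize = header[2];

  nsTArray<PRUint32> newIndexPrefixes;
  nsTArray<PRUint32> newIndexStarts;
  nsTArray<PRUint16> newDeltas;
  newIndexPrefixes.SetLength(indexSize);
  newIndexStarts.SetLength(indexSize);
  newDeltas.SetLength(deltaSize);

  read = PR_Read(aFd, newIndexPrefixes.Elements(), indexSize * sizeof(PRUint32));
  NS_ENSURE_TRUE(read == PRInt32(indexSize * sizeof(PRUint32)), NS_ERROR_FAILURE);
  read = PR_Read(aFd, newIndexStarts.Elements(), indexSize * sizeof(PRUint32));
  NS_ENSURE_TRUE(read == PRInt32(indexSize * sizeof(PRUint32)), NS_ERROR_FAILURE);
  if (deltaSize > 0) {
    read = PR_Read(aFd, newDeltas.Elements(), deltaSize * sizeof(PRUint16));
    NS_ENSURE_TRUE(read == PRInt32(deltaSize * sizeof(PRUint16)), NS_ERROR_FAILURE);
  }

  mIndexPrefixes.SwapElements(newIndexPrefixes);
  mIndexStarts.SwapElements(newIndexStarts);
  mDeltas.SwapElements(newDeltas);
  return NS_OK;
}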

>Why did you decide to use MurmurHash3?  I'm not a hash expert, so it makes me a bit nervous and 
>it would be nice if we could use an existing hash function.

I presume that by "existing hash function" you mean a cryptographic hash like MD5 or SHA. The problem with these is that they have elaborate setup and finalization phases, and as a result are very slow for hashing small amounts of data, which is our use case ((3 x 4 bytes) x 600 000 hashes!). Think on the order of 20-60s on a normal PC. So I had to find something faster. I ended up with MurmurHash3 for the following reasons:

- Of the non-cryptographic hashes (FNV, Jenkins, ...) it appears to be reasonably well researched.
- It is the fastest general purpose hash with good dispersion that I could find.
- It has a specific (faster) formulation for 32-bit output, which is exactly what we want.
- The code is compact and I could easily reimplement a specific version of the algorithm for the 12-byte case.
- The list of software using it (libstdc++, nginx, memcached, hadoop, kyoto cabinet) is very encouraging.
- I calculated that a perfect hash is expected to generate about 42 collisions in the PrefixSet. Changing the hashing function from SHA-256 to MurmurHash3 increased the total number of collisions by only 2. This is small enough to be negligible (the increase could be random chance), and for 600k entries does not appear to be an issue for practical use.
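
For reference, a sketch of MurmurHash3 (x86, 32-bit output) specialized for three 32-bit words, using the published constants; the order in which the real KeyedHash feeds the prefix, domain and per-user key, and the seed it uses, are assumptions here:

// MurmurHash3 x86_32 over exactly 12 bytes of input: three full blocks, no
// tail. Matches the declared KeyedHash signature; the body is illustrative.
static nsresult KeyedHash(PRUint32 aPref, PRUint32 aDomain,
                          PRUint32 aKey, PRUint32* aOut)
{
  const PRUint32 c1 = 0xcc9e2d51;
  const PRUint32 c2 = 0x1b873593;
  PRUint32 blocks[3] = { aPref, aDomain, aKey };
  PRUint32 h1 = 0;                        // seed (assumed)

  for (int i = 0; i < 3; i++) {
    PRUint32 k1 = blocks[i];
    k1 *= c1;
    k1 = (k1 << 15) | (k1 >> 17);         // rotl32(k1, 15)
    k1 *= c2;
    h1 ^= k1;
    h1 = (h1 << 13) | (h1 >> 19);         // rotl32(h1, 13)
    h1 = h1 * 5 + 0xe6546b64;
  }

  h1 ^= 12;                               // total input length in bytes
  h1 ^= h1 >> 16;                         // finalization mix
  h1 *= 0x85ebca6b;
  h1 ^= h1 >> 13;
  h1 *= 0xc2b2ae35;
  h1 ^= h1 >> 16;

  *aOut = h1;
  return NS_OK;
}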

Try run of all patches:

https://tbpl.mozilla.org/?tree=Try&usebuildbot=1&rev=cfaa1e1147f2
> I presume that with "existing hash function" you mean a cryptographic hash
> like MD5 or SHA. The problem with these is that they have elaborate setup
> and finalization phases, and as a result are very slow for hashing small
> amounts of data, which is our use case ((3 x 4 bytes) x 600 000 hashes!).
> Think on the order of 20-60s on a normal PC. So I had to find something
> faster. I ended up with MurmurHash3 for the following reasons:
> 
> - Of the non-cryptographic hashes (FNV, Jenkins, ...) it appears to be
> reasonably well researched.
> - It is the fastest general purpose hash with good dispersion that I could
> find.
> - It has a specific (faster) formulation for 32-bit output, which is exactly
> what we want.
> - The code is compact and I could easily reimplement a specific version of
> the algorithm for the 12-byte case.
> - The list of software using it (libstdc++, nginx, memcached, hadoop, kyoto
> cabinet) is very encouraging.
> - I calculated that a perfect hash is expected to generate about 42
> collisions in the Prefix set. Changing the hashing function from SHA-256 to
> Murmurhash3 increased the total amount of collisions by only 2. This is
> small enough to be neglible (increase could be random chance), and for 600k
> entries does not appear to be any issue for practical use.

Please put that wonderfully informative text in a comment in the code :)
Added Murmurhash3 explanation as requested.
Attachment #558577 - Attachment is obsolete: true
Attachment #558651 - Flags: review?(tony)
Attachment #558577 - Flags: review?(tony)
Comment on attachment 558573 [details] [diff] [review]
Patch v6 5. Make the PrefixSet persistent + consistency fix

Review of attachment 558573 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierPrefixSet.cpp
@@ +318,5 @@
> +    LOG(("Version magic mismatch, not loading"));
> +    return NS_ERROR_FAILURE;
> +  }
> +
> +  LOG(("Loading PrefixSet successfull"));

nit: typo: successful

@@ +363,5 @@
> +    written = PR_Write(fileFd, mDeltas.Elements(), deltaSize * sizeof(PRUint16));
> +    NS_ENSURE_TRUE(written > 0, NS_ERROR_FAILURE);
> +  }
> +
> +  LOG(("Saving PrefixSet successfull\n"));

nit: typo: successful
Attachment #558573 - Flags: review?(tony) → review+
Comment on attachment 558651 [details] [diff] [review]
Patch v7 8. Rehash with a per-user key to prevent collisions

Review of attachment 558651 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1037,5 @@
> +//  www.mail.hostname.com/foo/bar -> mail.hostname.com
> +static nsresult GetKey(const nsACString& spec, nsUrlClassifierDomainHash& hash,
> +                       nsICryptoHash * aCryptoHash);
> +
> +/* We have both a prefix and a domain. Drop the domain, but

nit: Maybe use C++ style comments to be consistent with the rest of this file.

@@ +3455,5 @@
>  
>    return NS_OK;
>  }
>  
> +/* We have both a prefix and a domain. Drop the domain, but

nit: Maybe use C++ style comments to be consistent with the rest of this file.

@@ +3473,5 @@
> +*/
> +static nsresult KeyedHash(PRUint32 aPref, PRUint32 aDomain,
> +                          PRUint32 aKey, PRUint32 *aOut)
> +{
> +  /* This is a reimplementation of MurmurHash3 32-bit

nit: Maybe use C++ style comments to be consistent with the rest of this file.
Attachment #558651 - Flags: review?(tony) → review+
Incorporate review comments.
Attachment #558573 - Attachment is obsolete: true
Attachment #559152 - Flags: review+
Incorporate review comments.
Attachment #559154 - Flags: review+
Attachment #558651 - Attachment is obsolete: true
Keywords: checkin-needed
Whiteboard: [inbound]
This bug has been marked [inbound] and I see 5 patches in the repository with this bug number, but no links to landed patches have been posted in this bug...
Comment on attachment 558566 [details] [diff] [review]
Patch v6 1. Add the PrefixTree datastructure

>--- /dev/null
>+++ b/toolkit/components/url-classifier/nsIUrlClassifierPrefixSet.idl
>@@ -0,0 +1,12 @@
>+#include "nsISupports.idl"
>+
>+interface nsIArray;
>+
>+[scriptable, uuid(6e9f759a-3f8d-4552-99ed-dbf0ea0a5f67)]
>+interface nsIUrlClassifierPrefixSet : nsISupports
>+{
>+  void setPrefixes([const, array, size_is(aLength)] in unsigned long aPrefixes,
>+                   in unsigned long aLength);
>+
>+  boolean contains(in unsigned long aPrefix);
>+};

Add a license header to this file. This is not optional.
Added missing copyright header.
(In reply to Gian-Carlo Pascutto (:gcp) from comment #78)
> Created attachment 560630 [details] [diff] [review]
> Patch v1 9. Add missing copyright header

https://hg.mozilla.org/integration/mozilla-inbound/rev/d9bfdfbd9a89
Depends on: 741528
No longer depends on: 741528
Product: Firefox → Toolkit