Closed Bug 665930 Opened 13 years ago Closed 13 years ago

Safebrowsing code has minor bugs, protocol violations

Categories

(Toolkit :: Safe Browsing, defect)


RESOLVED FIXED
Firefox 7

People

(Reporter: gcp, Assigned: gcp)

References

Details

(Whiteboard: [testday-2011-06-24])

Attachments

(3 files, 5 obsolete files)

Attached patch Patch v1 (obsolete) — Splinter Review
There are some bugs in the Safe Browsing code, leading to minor protocol violations.

1) When generating the list of URL fragments, it's possible for an empty path (top-level domain) to be generated twice. The impact is limited because the duplicate entry will be intercepted by the cache.

2) When hitting a site with a numerical IP hostname in the URL that is known to contain an exploit anywhere, the fragmenting code will generate URLs containing truncated versions of the IP. This is unnecessary and a violation of the protocol spec.

Patch attached.
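For reference, a simplified sketch of what the fragmenting routine is supposed to produce, per the Safe Browsing v2 lookup rules. This is an illustrative model, not the actual nsUrlClassifierDBService code: query-string handling is omitted, and IsNumericIP is a hypothetical stand-in for the real canonicalized-IP check. It shows both fixes: numeric IP hosts are never truncated, and a set keeps the top-level "/" fragment from being emitted twice.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Simplified stand-in: treat a host as a numeric IP if it contains only
// digits and dots (the real code canonicalizes IPs before this point).
static bool IsNumericIP(const std::string& host) {
  for (char c : host) {
    if (!std::isdigit(static_cast<unsigned char>(c)) && c != '.') {
      return false;
    }
  }
  return !host.empty();
}

// Generate lookup fragments: the exact host plus up to four suffixes of
// its last five components (skipping the bare TLD), each combined with
// the exact path, "/", and up to three leading directories. Numeric IP
// hosts are never truncated (bug 2), and the std::set deduplicates the
// top-level "/" fragment (bug 1).
std::vector<std::string> GetLookupFragments(const std::string& host,
                                            const std::string& path) {
  std::vector<std::string> hosts{host};
  if (!IsNumericIP(host)) {
    std::vector<std::string> comps;
    size_t start = 0, dot;
    while ((dot = host.find('.', start)) != std::string::npos) {
      comps.push_back(host.substr(start, dot - start));
      start = dot + 1;
    }
    comps.push_back(host.substr(start));
    const size_t n = comps.size();
    for (size_t i = (n > 5 ? n - 5 : 1); i + 2 <= n; ++i) {
      std::string suffix = comps[i];
      for (size_t j = i + 1; j < n; ++j) suffix += "." + comps[j];
      hosts.push_back(suffix);
    }
  }
  std::set<std::string> paths{path, "/"};
  size_t depth = 0;
  for (size_t i = 1; i < path.size() && depth < 3; ++i) {
    if (path[i] == '/') {
      paths.insert(path.substr(0, i + 1));
      ++depth;
    }
  }
  std::vector<std::string> fragments;
  for (const auto& h : hosts)
    for (const auto& p : paths) fragments.push_back(h + p);
  return fragments;
}
```

With this model, "192.168.0.1" with path "/" yields only "192.168.0.1/" — no truncated-IP hosts and no duplicate top-level fragment.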
Attachment #540775 - Flags: review?(tony)
Assignee: nobody → gpascutto
Comment on attachment 540775 [details] [diff] [review]
Patch v1

>+  // Check an empty path (for whole-domain blacklist entries)
>+  if (!path.Equals(EmptyCString())) {

!path.IsEmpty()
Attachment #540775 - Attachment is patch: true
Is it possible to write a test for this?

(cc ian to find someone on the safe browsing team to actually verify that this matches the protocol)
>Is it possible to write a test for this?

This is fairly low-level, so it probably needs a compiled-code test to exercise the fragmenting routine. But:

http://mxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/tests/Makefile.in

has:

# XXX Get this to work in libxul builds.
## simple c++ tests (no xpcom)
#CPP_UNIT_TESTS = \
#	TestUrlClassifierUtils.cpp \
#	$(NULL)

Help?
Awesome work Gian-Carlo!

I believe (1) can be tested by making a toy HashCompleter and checking what hashes are asked to be completed when visiting a URL (kind of as described in bug 633644).

Possibly (2) could be tested in the same way.

The basic idea is to add good prefixes and bad prefixes to the DBService and check which are completed for, i.e. which the DBService matches.
Attachment #540775 - Attachment is obsolete: true
Attachment #541282 - Flags: review?(tony)
Attachment #540775 - Flags: review?(tony)
Attachment #541300 - Flags: review?(tony)
Thanks for CCing me tony. sending it around internally to see who can take a look.
Regarding case (1), the comments in bug 633644 mention another failure case that wasn't fixed by the first version of the patch. This is now fixed in v2. They're also correct that those duplicated fragments can end up calling the hashcompleter twice, so we are in fact leaking what URL is being visited, and causing unnecessary load on the service.

I included a testcase for this.

For (2), it's not easy to construct a JS test: before fragmenting, the code first gathers all known prefixes for a given host, but the code doing that correctly handles numerical IP hostnames, so it will only return prefixes for the full IP. 
If the fragmenting code now generates excess fragments by stripping parts of the IP, their 32-bit hash prefix must still match the fragments from the full IP before a hashcompletion is sent out. In other words, you need an actual 32-bit SHA-256 collision to trigger this.
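To illustrate the gating: a completion request only goes out when a fragment's 32-bit prefix matches a stored prefix, so the excess truncated-IP fragments are unobservable without a genuine partial hash collision. A minimal sketch, using std::hash purely as a dependency-free stand-in for the first four bytes of SHA-256 (the real code uses SHA-256):

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Stand-in for the first 4 bytes of SHA-256(fragment). std::hash is used
// here only to keep the sketch self-contained; it is not the real hash.
uint32_t Prefix32(const std::string& fragment) {
  return static_cast<uint32_t>(std::hash<std::string>{}(fragment));
}

// A hash completion is only requested when a fragment's 32-bit prefix
// matches a prefix already in the local database, so a bogus truncated-IP
// fragment needs a real 32-bit collision before anything leaves the client.
bool WouldRequestCompletion(const std::string& fragment,
                            uint32_t storedPrefix) {
  return Prefix32(fragment) == storedPrefix;
}
```

A JS test would therefore have to construct two distinct fragments whose 32-bit prefixes collide, i.e. a partial SHA-256 collision, which is impractical to arrange.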
The testing code for dbservice most probably has some copy-paste typos and the comments don't correctly reflect what it is doing. This should fix that.
Attachment #541301 - Flags: review?(tony)
Comment on attachment 541301 [details] [diff] [review]
Patch v2 extra Fix some dbservice testcases

Review of attachment 541301 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #541301 - Flags: review?(tony) → review+
Comment on attachment 541300 [details] [diff] [review]
Patch v2 3/3 Fix for fragmenting of numerical IPs

># HG changeset patch
># User Gian-Carlo Pascutto <gpascutto@mozilla.com>
># Parent 98e2a913f76b394cabff150e555de7e82b5a9855
>Bug 665930 - Safe Browsing: Fix fragmenting of numerical IPs.
>
>diff --git a/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp b/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
>--- a/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
>+++ b/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
>@@ -1451,28 +1451,31 @@ nsUrlClassifierDBServiceWorker::GetLooku
>    * a) The exact hostname of the url
>    * b) The 4 hostnames formed by starting with the last 5 components and
>    *    successivly removing the leading component.  The top-level component
>    *    can be skipped.
>    */
>   nsTArray<nsCString> hosts;
>   hosts.AppendElement(host);
> 
>-  host.BeginReading(begin);
>-  host.EndReading(end);
>-  int numComponents = 0;
>-  while (RFindInReadable(NS_LITERAL_CSTRING("."), begin, end) &&
>-         numComponents < MAX_HOST_COMPONENTS) {
>-    // don't bother checking toplevel domains
>-    if (++numComponents >= 2) {
>-      host.EndReading(iter);
>-      hosts.AppendElement(Substring(end, iter));
>+  int numComponents;
>+  if (!IsCanonicalizedIP(host)) {
>+    host.BeginReading(begin);
>+    host.EndReading(end);
>+    numComponents = 0;

nit: There's not really an advantage to reusing numComponents here for the host components and later for the path components, and it invites some confusion.  I'd suggest maybe having a separate hostComponents variable that's scoped inside of this "if", and renaming the later variable to pathComponents.
Attachment #541300 - Flags: review?(tony) → review+
Comment on attachment 541283 [details] [diff] [review]
Patch v2 2/3 Add testcase for the fragmentation routine

It would be great if you could add the testcases from section 6.2 of http://code.google.com/p/google-safe-browsing/wiki/Protocolv2Spec to give us some more coverage.
Comment on attachment 541282 [details] [diff] [review]
Patch v2 1/3 Fix for the fragmentation routine

Review of attachment 541282 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/url-classifier/nsUrlClassifierDBService.cpp
@@ +1489,2 @@
>    path.BeginReading(iter);
>    path.EndReading(end);

Nit: It looks like BeginReading() has an optional second parameter that will fill in the end of the string. We can remove this extra EndReading() call if we use it.
Attachment #541282 - Flags: review?(tony) → review+
Comment on attachment 541283 [details] [diff] [review]
Patch v2 2/3 Add testcase for the fragmentation routine

Review of attachment 541283 [details] [diff] [review]:
-----------------------------------------------------------------

I would land this with patch part 1/3 (keep the code change and test together as a single commit).
Attachment #541283 - Flags: review?(tony) → review+
Attachment #541282 - Attachment is obsolete: true
Attachment #541283 - Attachment is obsolete: true
Attachment #541300 - Attachment is obsolete: true
Attachment #541301 - Attachment is obsolete: true
Attachment #541577 - Flags: review?(tony)
Attachment #541578 - Flags: review?(tony)
Review comments incorporated, except for the BeginReading() suggestion. The prototype doesn't seem to match, and I couldn't find any working code using it outside nsXXXString itself, but plenty of code making the two separate calls.
Comment on attachment 541577 [details] [diff] [review]
Patch v3 1/3 Fix the fragmentation routine

Review of attachment 541577 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #541577 - Flags: review?(tony) → review+
Comment on attachment 541578 [details] [diff] [review]
Patch v3 2/3 Fix the fragmentation of numerical IPs

Review of attachment 541578 [details] [diff] [review]:
-----------------------------------------------------------------
Attachment #541578 - Flags: review?(tony) → review+
question: does this also block change in protocol?  ie http versus https?
>question: does this also block change in protocol?  ie http versus https?

I'm not sure what you're asking exactly, but these patches don't affect that at all. They don't operate on the scheme part of the URL, only from the hostname and on.
Just wondering whether the block would occur on https://www.mozilla.org if the block occurs on http://www.mozilla.org, and whether we may need to worry about something like that. My concern was that some websites might end up changing their login structure to track you, since logging in often switches you to https://. I'm not sure if this is a valid concern.

(In reply to comment #24)
> >question: does this also block change in protocol?  ie http versus https?
> 
> I'm not sure what you're asking exactly, but these patches don't affect that
> at all. They don't operate on the scheme part of the URL, only from the
> hostname and on.
Whiteboard: [testday-2011-06-24]
Comment on attachment 541579 [details] [diff] [review]
Patch v3 Fix dbservice testcase typos

carrying forward review
Attachment #541579 - Flags: review+
For an external way to verify the fixes, see the description in bug 633644.
Product: Firefox → Toolkit