Closed Bug 370860 Opened 17 years ago Closed 17 years ago

A very long URI ('/////...') hangs under phishing protection code

Categories

(Toolkit :: Safe Browsing, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: bzbarsky, Assigned: tony)

Details

(Keywords: hang, verified1.8.1.4)

Attachments

(4 files, 4 obsolete files)

Testcase coming up.  I dug a bit, and we seem to be hitting GC a lot (at least in one of my builds).  Also, PROT_URLCanonicalizer.fullyDecodeURL_ seems to be popping up.

The URL in this case is 130KB long, for what it's worth.
Flags: blocking-firefox3?
Note: this comes from bug 370445 comment 34 and following.
It might be worth profiling this with venkman too, I guess...
Flags: blocking1.8.1.3?
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: PC → All
Assignee: nobody → tony
Status: ASSIGNED → NEW
Attached patch move canonicalizeURL to c++ (obsolete) — Splinter Review
I'm seeing about 2000ms spent in CanonicalizeURL for the original code with a string of length 100k.  This version takes about 45ms.

This isn't perfect, since the time spent is linear to the length of the URL.  We're discussing possibly changing the protocol with the server so we truncate after a few k of URL.  We still would have regular expressions to check for phishing URLs so we wouldn't lose coverage if we truncate.  IE only handles URLs of about 2k so that seems pretty safe.
Attachment #255872 - Flags: review?(bryner)
Comment on attachment 255872 [details] [diff] [review]
move canonicalizeURL to c++

>--- build/nsToolkitCompsCID.h	19 Oct 2006 02:24:29 -0000	1.21
>+++ build/nsToolkitCompsCID.h	21 Feb 2007 08:46:56 -0000
>+#define NS_URLCLASSIFIERUTILS_CONTRACTID \
>+    "@mozilla.org/url-classifier/urlutils;1"
>+

Seems like just "/utils" would be more consistent with the class name.

>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ url-classifier/src/nsUrlClassifierUtils.cpp	21 Feb 2007 08:46:56 -0000
>+inline int int_to_hex_digit(int i) {
>+  NS_ASSERTION((i >= 0) && (i <= 15), "int too big in int_to_hex_digit");
>+  return ((i < 10) ? (i + '0') : ((i - 10) + 'A'));
>+}
>+
>+inline int hex_digit_to_int(char c) {
>+  NS_ASSERTION(isxdigit(c), "not a hex digit in hex_digit_to_int");
>+  return ((c >= 'a') ? ((c - 'a') + 10) :
>+          (c >= 'A') ? ((c - 'A') + 10) :
>+          (c - '0'));
>+}

I'd declare these "static", and lose the "inline" unless you know for sure that there's a big benefit to inlining and that the compiler won't do it for some reason.  Also, fix the brace style (beginning brace on a new line).

>+bool
>+nsUrlClassifierUtils::Decode(const nsACString & url, nsACString & _retval)

bool -> PRBool (unless the portability rules have changed recently).  Same in a few other spots.

>+bool
>+nsUrlClassifierUtils::SpecialEncode(const nsACString & url, nsACString & _retval)
>+{
>+  bool changed = false;
>+  const char* curChar = url.BeginReading();
>+  const char* end = url.EndReading();
>+
>+  while (curChar != end) {
>+    unsigned char c = static_cast<unsigned char>(*curChar);

NS_STATIC_CAST, same below and in the unit test.

>+bool
>+nsUrlClassifierUtils::ShouldURLEscape(const unsigned char c) {
>+  return c <=32 || c == '%' || c >=127;

space before 32.  And you could make this method const.

>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ url-classifier/src/nsUrlClassifierUtils.h	21 Feb 2007 08:46:56 -0000
>+  // Disallow copy constructor
>+  nsUrlClassifierUtils(nsUrlClassifierUtils&);

I think the parameter needs to be "const" to override the built-in copy constructor.

>--- /dev/null	1 Jan 1970 00:00:00 -0000
>+++ url-classifier/tests/TestUrlClassifierUtils.cpp	21 Feb 2007 08:46:56 -0000
>@@ -0,0 +1,155 @@

You should probably put the standard license header here.

>+#include <stdio.h>
>+#include <ctype.h>
>+#include "nsString.h"
>+#include "nsUrlClassifierUtils.h"
>+
>+#define STRING_EQUALS(expected, actual) \
>+  { if (!(expected).Equals((actual))) { \
>+      fprintf(stderr, "FAILED: expected |%s| but was |%s|\n", \
>+              (expected).get(), (actual).get()); \
>+    } else { gPassedTests++; } \
>+    gTotalTests++; \
>+  }
>+

As it is, this could probably be a function rather than a macro, but as long as it's a macro, maybe you should use __FILE__ and __LINE__ to indicate where the error happened?  Also, the name of the macro kind of implies that it returns whether the strings are equal.  Maybe better to call it CHECK_EQUALS or something?

>+inline int int_to_hex_digit(int i) {
>+  NS_ASSERTION((i >= 0) && (i <= 15), "int too big in int_to_hex_digit");
>+  return ((i < 10) ? (i + '0') : ((i - 10) + 'A'));
>+}
>+

Same comments as in the other file.

>+void TestDecodeHelper(const char* in, const char* expected) {

brace to next line.
Attachment #255872 - Flags: review?(bryner) → review-
Attached patch v2: move canonicalizeURL to c++ (obsolete) — Splinter Review
All changes along with using print32 instead of int
Attachment #255872 - Attachment is obsolete: true
Attachment #256512 - Flags: review?(bryner)
(In reply to comment #4)
> >+bool
> >+nsUrlClassifierUtils::SpecialEncode(const nsACString & url, nsACString & _retval)
> >+{
> >+  bool changed = false;
> >+  const char* curChar = url.BeginReading();
> >+  const char* end = url.EndReading();
> >+
> >+  while (curChar != end) {
> >+    unsigned char c = static_cast<unsigned char>(*curChar);
> 
> NS_STATIC_CAST, same below and in the unit test.

Actually, I think this is okay now; see bug 348748.
Comment on attachment 256512 [details] [diff] [review]
v2: move canonicalizeURL to c++

Does it work to just have int_to_hex_digit return a char, instead?  That might save some casting.

Need to apply the PRInt32 fix to the unit test.

Looks good to check in with those changes.
Attachment #256512 - Flags: review?(bryner) → review+
Attached patch v3: move canonicalizeURL to c++ (obsolete) — Splinter Review
print32 in unittests and remove casts for int_to_hex_digit
Attachment #256512 - Attachment is obsolete: true
Attachment #256564 - Flags: superreview?(darin.moz)
Comment on attachment 256564 [details] [diff] [review]
v3: move canonicalizeURL to c++

>Index: url-classifier/src/nsUrlClassifierUtils.cpp

>+static char int_to_hex_digit(PRInt32 i)
>+{
>+  NS_ASSERTION((i >= 0) && (i <= 15), "int too big in int_to_hex_digit");
>+  return NS_STATIC_CAST(char, ((i < 10) ? (i + '0') : ((i - 10) + 'A')));
>+}
>+
>+static PRInt32 hex_digit_to_int(char c)
>+{
>+  NS_ASSERTION(isxdigit(c), "not a hex digit in hex_digit_to_int");
>+  return ((c >= 'a') ? ((c - 'a') + 10) :
>+          (c >= 'A') ? ((c - 'A') + 10) :
>+          (c - '0'));
>+}

I'd be surprised if these don't already exist somewhere else in Moz
or NSPR.


>+nsUrlClassifierUtils::CanonicalizeURL(const nsACString & url, nsACString & _retval)
>+{
>+  nsCString decodedUrl(url);
>+  nsCString temp;

nsCAutoString would probably be better here so that you benefit from
stack storage if the strings are short.


>+PRBool
>+nsUrlClassifierUtils::Decode(const nsACString & url, nsACString & _retval)
>+{
>+  PRBool changed = PR_FALSE;
>+
>+  const char* curChar = url.BeginReading();
>+  const char* end = url.EndReading();
>+
>+  _retval.Truncate();
>+  while (curChar != end) {
>+    if (*curChar == '%' && curChar + 1 != end && curChar + 2 != end && 
>+        isxdigit(*(curChar + 1)) && isxdigit(*(curChar + 2))) {
>+      // one hex digit == 4 bits,so need to shift the high hex digit up 4 bits.
>+      char c = (hex_digit_to_int(*(curChar + 1)) << 4 |
>+                hex_digit_to_int(*(curChar + 2)));
>+      _retval.Append(c);
>+      changed = PR_TRUE;
>+
>+      // Skip self plus two hex digits.
>+      curChar += 3;
>+    } else {
>+      _retval.Append(*curChar);
>+      curChar++;
>+    }
>+  }
>+  return changed;
>+}

Did you consider using one of the unescape routines from nsEscape.h?


>+PRBool
>+nsUrlClassifierUtils::SpecialEncode(const nsACString & url, nsACString & _retval)
>+{

It'd be good to document what is special about this encoding ;-)


>Index: url-classifier/src/nsUrlClassifierUtils.h

>+  // This function will encode all "special" characters in typical url encoding,
>+  // that is %hh where h is a valid hex digit.  The characters which are encoded
>+  // by this function are any ascii characters under 32(control characters and
>+  // space), 37(%), and anything 127 or above (special characters).  Url is the
>+  // string to encode, ret is the encoded string.  Function returns true if
>+  // ret != url.
>+  PRBool SpecialEncode(const nsACString & url, nsACString & _retval);

Oh, I see... perhaps a "link" to this comment should be near the implementation
in the .cpp file?
use nsEscape.h:NS_UnescapeURL instead of Decode, use nsCAutoString.  I didn't see any common code in nsEscape to hex encode an int.  It seems to be done manually in multiple places (similar implementation but uses an array of hex chars instead).
Attachment #256564 - Attachment is obsolete: true
Attachment #256601 - Flags: superreview?(darin.moz)
Attachment #256564 - Flags: superreview?(darin.moz)
Comment on attachment 256601 [details] [diff] [review]
v4: move canonicalizeURL to c++

>+class nsUrlClassifierUtils : public nsIUrlClassifierUtils
>+{
>+public:
>+  nsUrlClassifierUtils();
>+  ~nsUrlClassifierUtils() {};

Kill the trailing semi-colon.  It might cause compilation problems on some systems.

sr=darin
Attachment #256601 - Flags: superreview?(darin.moz) → superreview+
on trunk. branch patch coming up.

Checking in build/nsToolkitCompsCID.h;
/cvsroot/mozilla/toolkit/components/build/nsToolkitCompsCID.h,v  <--  nsToolkitCompsCID.h
new revision: 1.22; previous revision: 1.21
done
Checking in build/nsToolkitCompsModule.cpp;
/cvsroot/mozilla/toolkit/components/build/nsToolkitCompsModule.cpp,v  <--  nsToolkitCompsModule.cpp
new revision: 1.38; previous revision: 1.37
done
Checking in url-classifier/content/enchash-decrypter.js;
/cvsroot/mozilla/toolkit/components/url-classifier/content/enchash-decrypter.js,v  <--  enchash-decrypter.js
new revision: 1.12; previous revision: 1.11
done
Checking in url-classifier/content/trtable.js;
/cvsroot/mozilla/toolkit/components/url-classifier/content/trtable.js,v  <--  trtable.js
new revision: 1.14; previous revision: 1.13
done
Removing url-classifier/content/url-canonicalizer.js;
/cvsroot/mozilla/toolkit/components/url-classifier/content/url-canonicalizer.js,v  <--  url-canonicalizer.js
new revision: delete; previous revision: 1.5
done
Checking in url-classifier/public/Makefile.in;
/cvsroot/mozilla/toolkit/components/url-classifier/public/Makefile.in,v  <--  Makefile.in
new revision: 1.9; previous revision: 1.8
done
RCS file: /cvsroot/mozilla/toolkit/components/url-classifier/public/nsIUrlClassifierUtils.idl,v
done
Checking in url-classifier/public/nsIUrlClassifierUtils.idl;
/cvsroot/mozilla/toolkit/components/url-classifier/public/nsIUrlClassifierUtils.idl,v  <--  nsIUrlClassifierUtils.idl
initial revision: 1.1
done
Checking in url-classifier/src/Makefile.in;
/cvsroot/mozilla/toolkit/components/url-classifier/src/Makefile.in,v  <--  Makefile.in
new revision: 1.12; previous revision: 1.11
done
Checking in url-classifier/src/nsUrlClassifierTable.js;
/cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierTable.js,v  <--  nsUrlClassifierTable.js
new revision: 1.5; previous revision: 1.4
done
RCS file: /cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierUtils.cpp,v
done
Checking in url-classifier/src/nsUrlClassifierUtils.cpp;
/cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierUtils.cpp,v  <--  nsUrlClassifierUtils.cpp
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierUtils.h,v
done
Checking in url-classifier/src/nsUrlClassifierUtils.h;
/cvsroot/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierUtils.h,v  <--  nsUrlClassifierUtils.h
initial revision: 1.1
done
Checking in url-classifier/tests/Makefile.in;
/cvsroot/mozilla/toolkit/components/url-classifier/tests/Makefile.in,v  <--  Makefile.in
new revision: 1.5; previous revision: 1.4
done
RCS file: /cvsroot/mozilla/toolkit/components/url-classifier/tests/TestUrlClassifierUtils.cpp,v
done
Checking in url-classifier/tests/TestUrlClassifierUtils.cpp;
/cvsroot/mozilla/toolkit/components/url-classifier/tests/TestUrlClassifierUtils.cpp,v  <--  TestUrlClassifierUtils.cpp
initial revision: 1.1
done
Checking in url-classifier/tests/test_bug356355.xhtml;
/cvsroot/mozilla/toolkit/components/url-classifier/tests/test_bug356355.xhtml,v  <--  test_bug356355.xhtml
new revision: 1.5; previous revision: 1.4
done
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite+
Attached patch patch for branchSplinter Review
same as v4: move canonicalizeURL to C++ sans mochitest bits
Attachment #256730 - Flags: superreview?(darin.moz)
Attachment #256730 - Flags: approval1.8.1.3?
robcee/waldo: are these new c++ tests running in the TUnit tinderbox run?
https://bugzilla.mozilla.org/attachment.cgi?id=256601&action=diff#mozilla/toolkit/components/url-classifier/tests/Makefile.in_sec1

...seems to indicate that they are (see CPPSRCS and SIMPLE_PROGRAMS).  Then again, nothing's actually being tested about the programs -- not the return value, not anything printed to stdout/stderr, not that they don't crash, etc. -- so if this tree were to fall, no one would hear it.  (This might even be standard operating procedure for compiled tests; I don't recall seeing any that included some sort of halt-the-build failure mechanism, but I could have just forgotten.)
Yes, they need to be added explicitly to a make check target.
Flags: in-testsuite+ → in-testsuite?
Attached patch non-zero return code on error (obsolete) — Splinter Review
Following the instructions on the d.m.o, return a non-zero error code when a test fails.
http://developer.mozilla.org/en/docs/How_to_add_a_build-time_test
Attachment #256775 - Flags: review?(benjamin)
Comment on attachment 256775 [details] [diff] [review]
non-zero return code on error

Err, this doesn't quite work. I'll try something else.
Attachment #256775 - Attachment is obsolete: true
Attachment #256775 - Flags: review?(benjamin)
Returning a non-zero code works fine so long as it's checked.  Prepending running the test with $(EXIT_ON_ERROR) should (untested) suffice to make the build halt for such a failure.  (It expands to |set -e;|, which according to the bash man page causes execution to stop if a command exits with non-zero status.)

Benjamin, is EXIT_ON_ERROR in config/rules.mk a "public" definition for makefiles?  The number of makefiles currently using it is small enough that it's possible they should be using something else.  If it's public, we probably need to prepend that before all compiled-code tests that run so that if they fail they break the build.
well, it's not needed for most makefiles - you only need it when you use for loops or something like that (because if one line in a makefile has a nonzero exit status, it fails, so the only cases that need it is if you execute multiple stuff in a single line)
This seems to work.  Going to submit since it's just a test change.
Attachment #256730 - Flags: superreview?(darin.moz) → superreview+
Verified.  Time is now less than 3 seconds over here for sure; possibly less than one second.
Status: RESOLVED → VERIFIED
Flags: blocking1.8.1.4? → blocking1.8.1.4+
Flags: blocking-firefox3?
Comment on attachment 256730 [details] [diff] [review]
patch for branch

Once upon a time we tried pushing nsPIFoo for private interfaces, you could have used that for nsPIUrlClassifierUtils to reinforce the "DONT USE THIS DIRECTLY" comment. Never really caught on, but seemed like a useful distinction.

Given the size of the patch we'd like a second review before landing on the branch.
Attachment #256730 - Flags: review?
Comment on attachment 256730 [details] [diff] [review]
patch for branch

approved for 1.8.1.4, a=dveditz for release-drivers
Attachment #256730 - Flags: review?
Attachment #256730 - Flags: approval1.8.1.4?
Attachment #256730 - Flags: approval1.8.1.4+
on branch
Keywords: fixed1.8.1.4
Filed bug 379390 on a very similar issue posted to Full-disclosure
Summary: A very long URI hangs under phishing protection code → A very long URI ('/////...') hangs under phishing protection code
Keywords: hang
verified fixed 1.8.1.4 using :

Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.4pre) Gecko/20070508 BonEcho/2.0.0.4pre ID:2007050804

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/2007050105 Firefox/2.0.0.4 Linux Fedora FC 6

Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.4) Gecko/2007050106 Firefox/2.0.0.4 (RC1)

and Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.4) Gecko/2007050106 Firefox/2.0.0.4

no hang on testcase, Firefox works stable and normal - adding verified keyword

Product: Firefox → Toolkit
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: