570975 - Don't convert UTF-8 strings to UTF-16 in MatchAutoCompleteFunction

Reporter

Description

•

14 years ago

Attached file Profile — Details

I've seen this problem on a few occasions, and now I have a useful profile over it.  With very large strings inside the location bar, Firefox starts to use 100% of the CPU, and becomes completely unresponsive, to the point that you need to kill it.

We spend 80% of our time in mozilla::places::MatchAutoCompleteFunction::OnfunctionCall.  from that, 55% of the time is spent in mozilla::storage::ArgValueArray::GetString, and 20% is spent in nsAString_internal::Assign.  The GetString call breaks down to 35% in sqlite3VdbeChangeEncoding.

I'm attaching the Shark profile.  What I did to reproduce it was open a very large URL, and click in the location bar.  I *tried* to close the tab as well, but I'm not sure if Firefox got stuck before the close tab command could be processed or not.

Shawn Wilsher :sdwilsh

Comment 1

•

14 years ago

So, the problem here is that we do a whole bunch of conversion between UTF-8 and UTF-16.  I'm throwing together a patch which should fix this, but something in the back of my mind tells me there was a reason we needed to use UTF-16...

Shawn Wilsher :sdwilsh

Comment 2

•

14 years ago

We can make location bar searches use less CPU and go faster here for Firefox 4 (hell, we could backport this if the fix is as simple as I think it is).  I think this should block as a result of this feature being used a bunch.

Assignee: nobody → sdwilsh

Status: NEW → ASSIGNED

blocking2.0: --- → ?

(no longer active)

Reporter

Comment 3

•

14 years ago

I've had *many* performance problems with the location bar.  Are you also interested in other bug reports coming from me in that regard?  I'm pretty sure that they're all places, but I don't have profiles right not to back up my assertion.

Shawn Wilsher :sdwilsh

Comment 4

•

14 years ago

(In reply to comment #3)
> I've had *many* performance problems with the location bar.  Are you also
> interested in other bug reports coming from me in that regard?  I'm pretty sure
> that they're all places, but I don't have profiles right not to back up my
> assertion.
Yes, especially if you can attach profiles.  I'm not seeing this stuff, but I do occasionally here people complaining about it.

Shawn Wilsher :sdwilsh

Comment 5

•

14 years ago

Attached patch v1.0 (obsolete) — Details — Splinter Review

Use only UTF-8 instead of doing a bunch of conversions.  We should have someone who is better familiar with our string apis to take a look at this too.  Finding someone now...

Attachment #450197 - Flags: review?(mak77)

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 6

•

14 years ago

Comment on attachment 450197 [details] [diff] [review]
v1.0

>diff --git a/toolkit/components/places/src/SQLFunctions.cpp b/toolkit/components/places/src/SQLFunctions.cpp

>+    if (IsUTF8(unescapedSpec)) {
>+      _fixedSpec.Assign(unescapedSpec);
>+    }
>+    else {
>+      _fixedSpec.Assign(aURISpec);
>+    }
> 
>-    if (StringBeginsWith(_fixedSpec, NS_LITERAL_STRING("http://")))
>+    if (StringBeginsWith(_fixedSpec, NS_LITERAL_CSTRING("http://")))
>       _fixedSpec.Cut(0, 7);
>-    else if (StringBeginsWith(_fixedSpec, NS_LITERAL_STRING("https://")))
>+    else if (StringBeginsWith(_fixedSpec, NS_LITERAL_CSTRING("https://")))
>       _fixedSpec.Cut(0, 8);
>-    else if (StringBeginsWith(_fixedSpec, NS_LITERAL_STRING("ftp://")))
>+    else if (StringBeginsWith(_fixedSpec, NS_LITERAL_CSTRING("ftp://")))
>       _fixedSpec.Cut(0, 6);

inconsistent braces, I know you said me it is fixed locally, just a reminder.


>diff --git a/toolkit/components/places/src/SQLFunctions.h b/toolkit/components/places/src/SQLFunctions.h

>  * The Initial Developer of the Original Code is
>- * Mozilla Corporation.
>+ * Mozilla Foundation.

nit: "the Mozilla Foundation" iirc

looks good, and the perf gain is nice.

Attachment #450197 - Flags: review?(mak77) → review+

Shawn Wilsher :sdwilsh

Comment 7

•

14 years ago

vlad checked this out on irc, so I'll land this shortly.

Shawn Wilsher :sdwilsh

Comment 8

•

14 years ago

Unfortunately, this patch isn't going to work so well:
NeilAway> sdwilsh: ok, so the difference between Strings and CStrings is that case insensitive comparison on CStrings is ascii-only

This would be a regression...

Shawn Wilsher :sdwilsh

Updated

•

14 years ago

Depends on: 571098

Shawn Wilsher :sdwilsh

Comment 9

•

14 years ago

Comment on attachment 450197 [details] [diff] [review]
v1.0

This patch is no good for non-ASCII...

Attachment #450197 - Attachment is obsolete: true

Attachment #450197 - Flags: review-

Shawn Wilsher :sdwilsh

Comment 10

•

14 years ago

This bug can't be fixed until some resolution of bug 571098.

Assignee: sdwilsh → nobody

Status: ASSIGNED → NEW

Shawn Wilsher :sdwilsh

Updated

•

14 years ago

Depends on: 145975
No longer depends on: 571098

Dietrich Ayala (:dietrich)

Comment 11

•

14 years ago

(In reply to comment #10)
> This bug can't be fixed until some resolution of bug 571098.

which is a dupe of bug 145975.

Dietrich Ayala (:dietrich)

Comment 12

•

14 years ago

i'm not clear on how widespread of a problem this is. we don't know yet if this is a regression or if it's been here since 3.0.

we also have no clear owner or path to fixing this (bug 145975 is ancient and unfixed).

i'm not going to mark this blocking at this time.

if we can get details from support that this occurs often enough, or 100% reproducible STR that'll tell us how often it might be hit, then i'll re-evaluate blocking at that time.

blocking2.0: ? → -

Timothy Nikkel (:tnikkel)

Comment 14

•

14 years ago

If this bug is indeed the same as bug 526727, I had 100% reproducible STR in bug 526727, and I even found the regression range. It is new since 3.5 at least. Asking for blocking again to re-evaluate based on this.

blocking2.0: - → ?

(no longer active)

Reporter

Updated

•

14 years ago

Keywords: regression

Dave Townsend [:mossop]

Comment 15

•

14 years ago

Pasting data urls into the address bar is not a common action so I wouldn't block based on that.

blocking2.0: ? → -

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Summary: Firefox chews 100% of CPU with large strings inside the location bar → Don't convert UTF-8 strings to UTF-16 in MatchAutoCompleteFunction

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

14 years ago

Attached patch Patch v2 (obsolete) — Details — Splinter Review

This patch is based on Patch v3.3 (attachment 468545 [details] [diff] [review]) in bug 145975.

I'll look at the perf difference in a moment.

Justin Lebar (not reading bugmail)

Assignee

Comment 17

•

14 years ago

Well, the perf difference is...big.  I guess we could time how long these queries take to run.  But we're cutting out a whole hot function, ConvertUTF8toUTF16.  Also, UTF8CharEnumerator no longer shows up in profiles either.  Those together are about equal to the overhead of pthreads lock / unlock I see.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Blocks: 590376

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

14 years ago

Attached patch Patch v3 (obsolete) — Details — Splinter Review

Built on top of bug 145975, patch v4.

The UTF-8 findOnBoundary is about 1.8x faster for ASCII strings than converting to UTF-16 and comparing.  Ignoring the UTF-8 -> UTF-16 conversion overhead, the new findOnBoundary function is about as fast as the old one for ASCII strings.

Unfortunately, for unicode strings, the new code is slower by a factor of 0.6, including the UTF-8 -> UTF-16 overhead of the old code.

Since URLs are ASCII and both URLs and page titles go through MatchAutoCompleteFunctions::findOnBoundary, I'd expect this to roughly be a wash for users with lots of non-ASCII pages in their history, and a clear win for users with mostly ASCII text in their histories.

We can probably do better, though.  Bug 590376 may be a start.

Attachment #468597 - Attachment is obsolete: true

Shawn Wilsher :sdwilsh

Comment 19

•

14 years ago

(In reply to comment #18)
> Since URLs are ASCII and both URLs and page titles go through
> MatchAutoCompleteFunctions::findOnBoundary, I'd expect this to roughly be a
> wash for users with lots of non-ASCII pages in their history, and a clear win
> for users with mostly ASCII text in their histories.
URLs, page/bookmark titles, tags, and the search string to be more exact

Shawn Wilsher :sdwilsh

Comment 20

•

14 years ago

(In reply to comment #18)
> Since URLs are ASCII and both URLs and page titles go through
> MatchAutoCompleteFunctions::findOnBoundary, I'd expect this to roughly be a
> wash for users with lots of non-ASCII pages in their history, and a clear win
> for users with mostly ASCII text in their histories.
Also, URLs are not required to be ASCII.

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

14 years ago

Turns out the UTF8 case-insensitive comparison code was missing a crucial fastpath.  I've added it back in, and now I'm seeing a speedup of 2.6x on ASCII and 2.1x on Unicode strings.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Blocks: 593893

Justin Lebar (not reading bugmail)

Assignee

Comment 22

•

14 years ago

Attached patch Patch v4 (obsolete) — Details — Splinter Review

New patch, based on patch v5 in Bug 145975.  Benchmark results in a moment.

Attachment #471355 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

14 years ago

I'm seeing a 1.5x - 2x speedup in findOnBoundary and findAnywhere, depending on the workload.  Also, just -O3 gives about a 1.5x speedup across the board (including for the current UTF-16 routines).

Full results are in bug 145975 comment 77.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Attachment #472876 - Flags: review?(mak77)

Attachment #472876 - Flags: feedback?(sdwilsh)

Shawn Wilsher :sdwilsh

Comment 24

•

14 years ago

(In reply to comment #23)
> I'm seeing a 1.5x - 2x speedup in findOnBoundary and findAnywhere, depending on
> the workload.  Also, just -O3 gives about a 1.5x speedup across the board
> (including for the current UTF-16 routines).
Is that -O3 for just this file?

Justin Lebar (not reading bugmail)

Assignee

Comment 25

•

14 years ago

(In reply to comment #24)
> Is that -O3 for just this file?

It's -O3 everywhere.  I suspect that we'd see a similar speedup using -O3 just on that file and in nsUnicharUtils.cpp, but it's somewhat tricky to specify -O3 on nsUnicharUtils.cpp alone, since it gets built twice, both with and without PIC, and once is in toolkit/library, by way of some dependency...  Anyway, it could be done if wanted to, but I didn't try after I saw that it wasn't easy.

I should say that my 1.5x - 2x speedup is in *addition* to the 1.5x speedup you get from -O3.  I'm comparing UTF-16 convert and compare with -O3 to UTF-8 compare with -O3.

Shawn Wilsher :sdwilsh

Comment 26

•

14 years ago

Could you maybe provide an interdiff between my changes and yours?  It's hard to tell what's different to be honest.

Justin Lebar (not reading bugmail)

Assignee

Comment 27

•

14 years ago

I unbitrotted patch v1, but interdiff still can't handle it.  Sorry.  I tried Brendan's hack, `diff patch1 patch2 | grep '[^<>] [-+]'` but the output is basically garbage, at least to my eyes.

fixupURISpec and OnFunctionCall are unchanged.  The header file has just lost a few functions, but is otherwise unchanged.  The rest is substantially different.

Shawn Wilsher :sdwilsh

Comment 28

•

14 years ago

(In reply to comment #27)
> I unbitrotted patch v1, but interdiff still can't handle it.  Sorry.  I tried
> Brendan's hack, `diff patch1 patch2 | grep '[^<>] [-+]'` but the output is
> basically garbage, at least to my eyes.
> 
> fixupURISpec and OnFunctionCall are unchanged.  The header file has just lost a
> few functions, but is otherwise unchanged.  The rest is substantially
> different.
Alright, I can do the review of this, but probably next week.

Marco Bonardo [:mak] (Away Apr 25 - May 5)

Comment 29

•

14 years ago

(In reply to comment #28)
> Alright, I can do the review of this, but probably next week.

feel free to steal it from me, since I'm starting thinking I won't be able to go at it till wednesday.

Justin Lebar (not reading bugmail)

Assignee

Comment 30

•

14 years ago

Attached patch Patch v4.1 (obsolete) — Details — Splinter Review

Minor unbitrotting.

Attachment #472876 - Attachment is obsolete: true

Attachment #474273 - Flags: review?(sdwilsh)

Attachment #472876 - Flags: review?(mak77)

Attachment #472876 - Flags: feedback?(sdwilsh)

Justin Lebar (not reading bugmail)

Assignee

Comment 31

•

14 years ago

I wonder if this 1.5x - 2x location bar perf win is something we should block on.  If the current trajectory is to be believed, if we don't mark it as a blocker, it (and its dependency, bug 145975) probably won't get reviewed.

blocking2.0: - → ?

Shawn Wilsher :sdwilsh

Comment 32

•

14 years ago

(In reply to comment #31)
> I wonder if this 1.5x - 2x location bar perf win is something we should block
> on.  If the current trajectory is to be believed, if we don't mark it as a
> blocker, it (and its dependency, bug 145975) probably won't get reviewed.
Yeah, I can't see myself or mak really having cycles for this given the open blocker count right now...

Mike Beltzner [:beltzner, not reading bugmail]

Comment 33

•

14 years ago

So, if the risk involved is low and it's just a matter of review cycles, I actually think we *should* block on this.

Perception of performance is a major goal for Firefox 4, and a 2x more responsive awesomebar would go a long way to that end.

Shawn Wilsher :sdwilsh

Comment 34

•

14 years ago

(In reply to comment #33)
> Perception of performance is a major goal for Firefox 4, and a 2x more
> responsive awesomebar would go a long way to that end.
Note that that is 2x for ASCII text, less for non-ASCII.  Let's not let our English speaking bias over-inflate the win here, since I know we have a lot of users who don't speak or type English (although many URls are certainly predominantly ASCII).

Justin Lebar (not reading bugmail)

Assignee

Comment 35

•

14 years ago

(In reply to comment #34)
> Note that that is 2x for ASCII text, less for non-ASCII.  Let's not let our
> English speaking bias over-inflate the win here, since I know we have a lot of
> users who don't speak or type English (although many URls are certainly
> predominantly ASCII).

See bug 145975 comment 77: It's 2x for ASCII and 1.6-1.9x faster for the Unicode strings I was testing.

Since FindOnBoundary doesn't really understand non-Latin text, I'd expect bug 593893 to have a larger impact on Unicode text than on Latin text.  (I might also expect it to have a larger impact on snappiness than this bug, to be honest.)

Dietrich Ayala (:dietrich)

Comment 36

•

14 years ago

Up to 2x awesomebar perf improvement, fixing a perf regression, patches in hand, just waiting on review -> blocking+. We can re-evaluate if the review result increases the known risk of the change.

Marco/Shawn: is there something about this patch that makes it a difficult or time-consuming review?

blocking2.0: ? → betaN+

Justin Lebar (not reading bugmail)

Assignee

Comment 37

•

14 years ago

FWIW, bug 145975 is probably the more-difficult one to review.

Shawn Wilsher :sdwilsh

Comment 38

•

14 years ago

(In reply to comment #36)
> Marco/Shawn: is there something about this patch that makes it a difficult or
> time-consuming review?
I haven't spent much time looking at non-blocking stuff because I'm buried in blockers right now.  Will skim it tomorrow and let you know.

Shawn Wilsher :sdwilsh

Comment 39

•

14 years ago

(In reply to comment #37)
> FWIW, bug 145975 is probably the more-difficult one to review.
Also, needs that bug, which was marked blocking-

Justin Lebar (not reading bugmail)

Assignee

Comment 40

•

14 years ago

(In reply to comment #39)
> Also, needs that bug, which was marked blocking-

Not to get off topic, but was bug 145975 ever marked blocking-?  It's currently blocking?, but maybe it was blocking- at some point in the past.

Shawn Wilsher :sdwilsh

Comment 41

•

14 years ago

(In reply to comment #40)
> Not to get off topic, but was bug 145975 ever marked blocking-?  It's currently
> blocking?, but maybe it was blocking- at some point in the past.
bsmedberg marked it blocking- on 2010-08-23 10:06:25 PDT

Shawn Wilsher :sdwilsh

Comment 42

•

14 years ago

(In reply to comment #38)
> I haven't spent much time looking at non-blocking stuff because I'm buried in
> blockers right now.  Will skim it tomorrow and let you know.
Hey look!  I lied.  To make up for it, I'm doing the full review now.

Shawn Wilsher :sdwilsh

Comment 43

•

14 years ago

Comment on attachment 474273 [details] [diff] [review]
Patch v4.1

I'll note that your re-ordering of some methods in these files has made the diff unnecessarily complex.  In the next revision, could you please undo that?  It's making it more difficult to review as a result.

>+++ b/toolkit/components/places/src/SQLFunctions.cpp
>-  /* static */
>-  void
>+   /* static */
>+   void
please don't change the indentation here.  It should line up with MatchAutoCompleteFunction::fixupURISpec

>   MatchAutoCompleteFunction::fixupURISpec(const nsCString &aURISpec,
>-                                          nsString &_fixedSpec)
>+                                          nsCString &_fixedSpec)
>   {
>     nsCString unescapedSpec;
>     (void)NS_UnescapeURL(aURISpec, esc_SkipControl | esc_AlwaysCopy,
>                          unescapedSpec);
> 
>-    // If this unescaped string is valid UTF-8, we'll convert it.  Otherwise,
>+    // If this unescaped string is valid UTF-8, we'll use it.  Otherwise,
>     // we will simply convert our original string.
You'll want to change the use of convert in the second line too

>+  MatchAutoCompleteFunction::findBeginning(const nsDependentCSubstring &aToken,
>+                                           const nsACString &aSourceString)
>   {
>+    // We can't use StringBeginsWith here, unfortunately.  Although it will
>+    // happily take a case-insensitive UTF8 comparator, it eventually calls
>+    // nsACString::Equals, which checks that the two strings contain the same
>+    // number of bytes before calling the comparator.  This is clearly not what
>+    // we want.
>+
>+    if (aToken.IsEmpty())
>+      return true;
nit: brace this please

>+
>+    const_char_iterator tokenStart(aToken.BeginReading()),
>+                        tokenEnd(aToken.EndReading()),
>+                        sourceStart(aSourceString.BeginReading()),
>+                        sourceEnd(aSourceString.EndReading());
>+
>+    // We didn't check that the source was non-empty, so check upfront in the loop.
I think it'd be more clear to do the check explicitly.

>+    PRBool err;
>+    while (sourceStart < sourceEnd &&
>+           CaseInsensitiveUTF8CharsEqual(sourceStart, tokenStart,
>+                                         sourceEnd, tokenEnd,
>+                                         &sourceStart, &tokenStart, &err)) {
sad to see that this uses PRBool - can we still change it to bool?

>+      // We found the token!
>+      if (tokenStart >= tokenEnd)
>+        return true;
nit: brace please

>+  // findAnywhere and findOnBoundary do almost the same thing, so it's natural
>+  // to implement them in terms of a single function.  They're both
>+  // performance-critical functions, however, and checking aOnlyOnBoundary
>+  // makes them a bit slower.  Our solution is to define findInString as
>+  // NS_ALWAYS_INLINE and rely on the compiler to optimize out the
>+  // aOnlyOnBoundary check.
>+  static NS_ALWAYS_INLINE bool
>+  findInString(const nsDependentCSubstring &aToken,
>+               const nsACString &aSourceString,
>+               bool aOnlyOnBoundary)
So, this function is static to the file, yeah?  Places code would prefer you to use an anonymous namespace, and place it at the top in a heading for anonymous helpers like here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/places/src/History.cpp#157

>   {
>+    // It's not particularly meaningful to call this function with an empty
>+    // token, but CaseInsensitiveUTF8CharsEqual assumes that there's at least
>+    // one byte left in the both strings, so we should check.
I hope it asserts this invariant.

>+    if (aToken.IsEmpty())
>+      return true;
nit: brace please

>+    nsACString::const_char_iterator tokenStart(aToken.BeginReading()),
>+                                    tokenEnd(aToken.EndReading()),
>+                                    sourceStart(aSourceString.BeginReading()),
>+                                    sourceEnd(aSourceString.EndReading());
Of course, if you make it a static method on the class, you can use the class typedef so you don't have super long lines

> 
>     // The start of aSourceString is considered a word boundary, so start there.
>     do {
>+      // We are on a word boundary (if aOnlyOnBoundary is true).  See if aToken
>+      // matches sourceStart.
one of these comments is redundant, I think

>+      // Check whether the first character in the token matches the character
>+      // at sourceStart.  At the same time, get a pointer to the next character
>+      // in both the token and the source.
>+      nsACString::const_char_iterator sourceNext, tokenCur;
>+      PRBool err;
>+      if (CaseInsensitiveUTF8CharsEqual(sourceStart, tokenStart,
>+                                        sourceEnd, tokenEnd,
>+                                        &sourceNext, &tokenCur, &err)) {
> 
>+        // Don't need to check err here -- if CaseInsensitiveUTF8CharCompare
>+        // returns err, it'll also return false, and we'll catch err outside
>+        // the if.
I think that in both cases we should just call err dummy, or maybe CaseInsensitiveUTF8CharsEqual should make passing that in optional (I would prefer the latter).  Seems like we don't usually need to know error information, so no need in making consumers have to pass something in for it.

>+      // If something went wrong above, get out of here!
>+      if (NS_UNLIKELY(err))
>+        return false;
Although, we seem to care here about it.  Can we use "error" as the variable name instead please?

>+      if (aOnlyOnBoundary &&
>+          (('a' <= *sourceStart && *sourceStart <= 'z') ||
>+           ('A' <= *sourceStart && *sourceStart <= 'Z'))) {
We added the isWordBoundary method for code readability.  In theory, it'd be inlined anyway.  Can we keep it please?

>+        // Advance to the next word boundary.  We consider anything that's not
>+        // [a-z] to be a word boundary -- this lets us match CamelCase words.
>+        //
>+        // Since we'll halt as soon as we see a non-ASCII letter we can do a
>+        // simple byte-by-byte comparison and avoid the overhead of a
>+        // UTF8CharEnumerator.
>+
>+        sourceStart++;
>+        while (sourceStart < sourceEnd && 'a' <= *sourceStart && *sourceStart <= 'z')
>+          sourceStart++;
ditto with nextWordBoundary

>+  bool
>+  MatchAutoCompleteFunction::findAnywhere(const nsDependentCSubstring &aToken,
>+                                          const nsACString &aSourceString)
>+    // We can't use FindInReadable here; it works only for ASCII.
>+
>+    return findInString(aToken, aSourceString, false);
I'd actually prefer to pass in an enum to findInString to dictate behavior as it's more readable at the call site.

This is mostly r- because I want to see it again.  This looks OK though, but one more iteration please.

Attachment #474273 - Flags: review?(sdwilsh) → review-

Justin Lebar (not reading bugmail)

Assignee

Comment 44

•

14 years ago

(In reply to comment #43)

Thanks for the review!  I'll put up a new patch soon -- for now, a few comments:

> >+    const_char_iterator tokenStart(aToken.BeginReading()),
> >+                        tokenEnd(aToken.EndReading()),
> >+                        sourceStart(aSourceString.BeginReading()),
> >+                        sourceEnd(aSourceString.EndReading());
> >+
> >+    // We didn't check that the source was non-empty, so check upfront in the loop.
> I think it'd be more clear to do the check explicitly.

Here's the full code:

> +    while (sourceStart < sourceEnd &&
> +           CaseInsensitiveUTF8CharsEqual(...)) {
> +
> +      if (tokenStart >= tokenEnd)
> +        return true;
> +    }
> +
> +    return false;

Do you mean changing it to

  if (sourceStart < sourceEnd)
    return false;

  while (CaseInsensitiveUTF8CharsEqual(...) {
    if (tokenStart >= tokenEnd)
      return true;

    if (sourceStart < sourceEnd)
      return false;
  }

  return false;

?  This seems worse to me.

> >+    PRBool err;
> >+    while (sourceStart < sourceEnd &&
> >+           CaseInsensitiveUTF8CharsEqual(sourceStart, tokenStart,
> >+                                         sourceEnd, tokenEnd,
> >+                                         &sourceStart, &tokenStart, &err)) {
> sad to see that this uses PRBool - can we still change it to bool?

I'm not really familiar with our rules for PRBool vs. bool, but I believe that
CaseInsensitiveUTF8CharsEqual lives somewhere where PRBool is required, or is
exported in such a way as to require PRBool.  I could check if you can direct
me to the rules.

> >   {
> >+    // It's not particularly meaningful to call this function with an empty
> >+    // token, but CaseInsensitiveUTF8CharsEqual assumes that there's at least
> >+    // one byte left in the both strings, so we should check.
> I hope it asserts this invariant.

It does, although only in debug mode.

> >+    nsACString::const_char_iterator tokenStart(aToken.BeginReading()),
> >+                                    tokenEnd(aToken.EndReading()),
> >+                                    sourceStart(aSourceString.BeginReading()),
> >+                                    sourceEnd(aSourceString.EndReading());
> Of course, if you make it a static method on the class, you can use the class
> typedef so you don't have super long lines

I'll see if I can NS_ALWAYS_INLINE a class-static method without putting the implementation in the header file.  If I can't, I'll put it in an anonymous namespace.

> >+      // Check whether the first character in the token matches the character
> >+      // at sourceStart.  At the same time, get a pointer to the next character
> >+      // in both the token and the source.
> >+      nsACString::const_char_iterator sourceNext, tokenCur;
> >+      PRBool err;
> >+      if (CaseInsensitiveUTF8CharsEqual(sourceStart, tokenStart,
> >+                                        sourceEnd, tokenEnd,
> >+                                        &sourceNext, &tokenCur, &err)) {
> > 
> >+        // Don't need to check err here -- if CaseInsensitiveUTF8CharCompare
> >+        // returns err, it'll also return false, and we'll catch err outside
> >+        // the if.
> I think that in both cases we should just call err dummy, or maybe
> CaseInsensitiveUTF8CharsEqual should make passing that in optional (I would
> prefer the latter).  Seems like we don't usually need to know error
> information, so no need in making consumers have to pass something in for it.

I'd prefer to keep the error parameter as required so that ignoring it is an
explicit decision.

> >+      if (aOnlyOnBoundary &&
> >+          (('a' <= *sourceStart && *sourceStart <= 'z') ||
> >+           ('A' <= *sourceStart && *sourceStart <= 'Z'))) {
> We added the isWordBoundary method for code readability.  In theory, it'd be
> inlined anyway.  Can we keep it please?
>
> [...]
>
> ditto with nextWordBoundary

I took these out because I was initially pretty confused by them.

  if (!isWordBoundary(ToLowerCase(*sourceStart++)))
    sourceStart = nextWordBoundary(sourceStart, sourceEnd);

What's a word boundary?  Is it whitespace, or whitespace and punctuation?
Numbers too?  Why do we have to convert *sourceStart to lower-case?  What does
case have to do with word boundaries, anyway?  And if it's *next* word
boundary, why do we have to *sourceStart++?

At least, that's what went through my head when I read the original code.  I
figured that it was pretty reasonable to factor it out when it was called from
two places, but now that it appears only once, it's clearer to me to write it
explicitly.

That said, it's not a big deal either way.

Justin Lebar (not reading bugmail)

Assignee

Comment 45

•

14 years ago

(In reply to comment #43)
> I'll note that your re-ordering of some methods in these files has made the
> diff unnecessarily complex.  In the next revision, could you please undo that? 
> It's making it more difficult to review as a result.

I agree that this is a nasty diff, although I feel like it's more important that the code itself follows a logical order than that this patch is easy to read.  I'll see if patience diff helps, and if not, I'll see if I can fix the order to make a nicer diff.

Shawn Wilsher :sdwilsh

Comment 46

•

14 years ago

(In reply to comment #44)
> Do you mean changing it to
No, I mean before the loop, do:
if (aSourceString.Length() == 0) {
  return false;
}
because that is what the comment led me to believe that you are checking.  However, now that I think about it, we should never ever have a source string having zero length (and should assert that).  I think maybe your comment is not properly explaining what you are actually checking (it is to some degree looking at the code you posted, but I didn't realize you cared about the second case based on the comment)?

> I'm not really familiar with our rules for PRBool vs. bool, but I believe that
> CaseInsensitiveUTF8CharsEqual lives somewhere where PRBool is required, or is
> exported in such a way as to require PRBool.  I could check if you can direct
> me to the rules.
There's nothing set in stone.   It's usually module owners discretion, and we can use it anywhere that we do not cross XPConnect boundaries.

> I took these out because I was initially pretty confused by them.
> 
>   if (!isWordBoundary(ToLowerCase(*sourceStart++)))
>     sourceStart = nextWordBoundary(sourceStart, sourceEnd);
> 
> What's a word boundary?  Is it whitespace, or whitespace and punctuation?
> Numbers too?  Why do we have to convert *sourceStart to lower-case?  What does
> case have to do with word boundaries, anyway?  And if it's *next* word
> boundary, why do we have to *sourceStart++?
Right, but all these questions is exactly why we factored it out into its own method in the first place.  The consumer only cares about it being on a word boundary, whatever we define that to be.  We let isWordBoundary dictate that for us, and it makes the consumer more readable.  Right now we have to read a bunch of code in the would-be-consumer that is basically "determine if this is a word boundary".
These search functions are complicated, so we tried to make them as simple as possible to read and understand at a high level, and if you wanted to drill down, see the helper methods.  I can find the review comment where I was asked factor this stuff out in the first place if you'd like since I think it had a better argument for it (I had the same opinion as you at one point).

> At least, that's what went through my head when I read the original code.  I
> figured that it was pretty reasonable to factor it out when it was called from
> two places, but now that it appears only once, it's clearer to me to write it
> explicitly.
Although, it's also true that we wanted it factored out because it was in two places, that wasn't the only reason.

Justin Lebar (not reading bugmail)

Assignee

Comment 47

•

14 years ago

(In reply to comment #46)
> No, I mean before the loop, do:
> if (aSourceString.Length() == 0) {
>   return false;
> }
> because that is what the comment led me to believe that you are checking. 
> However, now that I think about it, we should never ever have a source string
> having zero length (and should assert that).

Okay.

> I think maybe your comment is not
> properly explaining what you are actually checking (it is to some degree
> looking at the code you posted, but I didn't realize you cared about the second
> case based on the comment)?

Ah, I see.  Yes, that comment is misleading.  Maybe we should just take it out.
 
> > I'm not really familiar with our rules for PRBool vs. bool, but I believe that
> > CaseInsensitiveUTF8CharsEqual lives somewhere where PRBool is required, or is
> > exported in such a way as to require PRBool.  I could check if you can direct
> > me to the rules.
> There's nothing set in stone.   It's usually module owners discretion, and we
> can use it anywhere that we do not cross XPConnect boundaries.
> 
> > I took these out because I was initially pretty confused by them.
> > 
> >   if (!isWordBoundary(ToLowerCase(*sourceStart++)))
> >     sourceStart = nextWordBoundary(sourceStart, sourceEnd);
> 
> Although, it's also true that we wanted it factored out because it was in two
> places, that wasn't the only reason.

Sure, I'm happy to put it back in.  Something like

  sourceStart = nextWordBoundary(sourceStart, sourceNext, sourceEnd);

would be clearer to me, since it encapsulates the fact that we always move forward at least one character, and we end up at a word boundary.

Shawn Wilsher :sdwilsh

Comment 48

•

14 years ago

(In reply to comment #47)
>   sourceStart = nextWordBoundary(sourceStart, sourceNext, sourceEnd);
sourceNext would be an [out] param?  I might bicker about the signature a bit once I see it implemented, but I'm fine with this.

Justin Lebar (not reading bugmail)

Assignee

Comment 49

•

14 years ago

Attached patch Patch v5 (obsolete) — Details — Splinter Review

Attachment #474273 - Attachment is obsolete: true

Attachment #484986 - Flags: review?(sdwilsh)

Justin Lebar (not reading bugmail)

Assignee

Comment 50

•

14 years ago

Attached patch Patch v5 - patience diff of SQLFunctions.cpp (obsolete) — Details — Splinter Review

This is a much nicer diff of SQLFunctions.cpp.

Justin Lebar (not reading bugmail)

Assignee

Comment 51

•

14 years ago

Attached patch Interdiff v4.1 -> v5 — Details — Splinter Review

Justin Lebar (not reading bugmail)

Assignee

Comment 52

•

14 years ago

Hm...actually, that patience diff didn't seem to help as much as I'd hoped.

Justin Lebar (not reading bugmail)

Assignee

Comment 53

•

14 years ago

Attached patch Patch v5.1 (nicer diff than v5) — Details — Splinter Review

Clever diffing algorithms are of no use!  I broke down and just re-ordered the functions, like I should have from the beginning.  :)

Attachment #484986 - Attachment is obsolete: true

Attachment #484988 - Attachment is obsolete: true

Attachment #484992 - Flags: review?(sdwilsh)

Attachment #484986 - Flags: review?(sdwilsh)

Shawn Wilsher :sdwilsh

Comment 54

•

14 years ago

Note: I'm probably fine with a part 2 that moves the functions which will then be trivial to review.

Shawn Wilsher :sdwilsh

Comment 55

•

14 years ago

Comment on attachment 484992 [details] [diff] [review]
Patch v5.1 (nicer diff than v5)

>+++ b/toolkit/components/places/src/SQLFunctions.cpp

>+////////////////////////////////////////////////////////////////////////////////
>+//// Anonymous Helpers
>+namespace {
nit: newline between these please

>+  enum FindInStringBehavior {
>+    eFindOnBoundary,
>+    eFindAnywhere
>+  };
Move this above the FindInString method please

>+  static NS_ALWAYS_INLINE const_char_iterator
>+  nextWordBoundary(const_char_iterator start,
>+                   const_char_iterator next,
>+                   const_char_iterator end) {
Places style dictates formatting like this:
/**
 * blah blah blah
 *
 * @param aStart
 *        blah blah blah
 * ...
 * @return blah blah blah
 */
static
NS_ALWAYS_INLINE const_char_iterator
nextWordBoundary(const_char_iterator aStart,
                 const_char_iterator aNext,
                 const_char_iterator aEnd)
{

Additionally, we are not mutating these, so I'd like to make them all const_char_iterator const (so const char* const).

>+    // Skip to the next word boundary if we're at an ASCII letter (i.e.
>+    // [a-zA-Z]).  Otherwise, advance forward one character using the next
>+    // pointer we're passed.
>+
lose the newline since it makes the comment seem detached from the code below it.  This applies in other places too.

>+  // findAnywhere and findOnBoundary do almost the same thing, so it's natural
>+  // to implement them in terms of a single function.  They're both
>+  // performance-critical functions, however, and checking aOnlyOnBoundary
>+  // makes them a bit slower.  Our solution is to define findInString as
>+  // NS_ALWAYS_INLINE and rely on the compiler to optimize out the
>+  // aOnlyOnBoundary check.
>+  static NS_ALWAYS_INLINE bool
>+  findInString(const nsDependentCSubstring &aToken,
>+               const nsACString &aSourceString,
>+               FindInStringBehavior aBehavior)
>+  {
ditto re: formatting

>+    // It's not particularly meaningful to call this function with an empty
>+    // token, but CaseInsensitiveUTF8CharsEqual assumes that there's at least
>+    // one byte left in the both strings, so we should check.
>+    if (aToken.IsEmpty())
>+      return true;
The token can never ever be empty, but the search string can be (I lied in comment 46.  I was thinking about the token, not the source string).  Just assert it is non-empty.

>+    // We cannot match anything if there is nothing to search.
>+    if (aSourceString.IsEmpty())
>+      return false;
nit: brace this please

>+  MatchAutoCompleteFunction::findBeginning(const nsDependentCSubstring &aToken,
>+                                           const nsACString &aSourceString)
>   {
>+    NS_PRECONDITION(!aSourceString.IsEmpty(),
>+                    "Source string should be non-empty.");
As stated above, it's the token we can assert about, not the source string.
nit: move this to the first line of the method please.

r=sdwilsh with these changes

Attachment #484992 - Flags: review?(sdwilsh) → review+

Shawn Wilsher :sdwilsh

Updated

•

14 years ago

Whiteboard: [needs new patch]

Justin Lebar (not reading bugmail)

Assignee

Comment 56

•

14 years ago

There are testcases which push empty source strings through these functions.

See http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1287698308.1287699852.1354.gz

May I change the assertion back to an if, or do you want me to try and fix the testcases?

Shawn Wilsher :sdwilsh

Comment 57

•

14 years ago

Like I said in comment 55 - I was mistaken in comment 46.  The source string can be null, but the token should never be.  Assert about the token, not the source.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Whiteboard: [needs new patch]

Justin Lebar (not reading bugmail)

Assignee

Comment 58

•

14 years ago

Oh, I see -- I pushed the wrong version to try and ended up being very confused.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

14 years ago

Assignee: nobody → justin.lebar+bug

Justin Lebar (not reading bugmail)

Assignee

Comment 59

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/e7bf870a8bb4

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

neil@parkwaycc.co.uk

Comment 60

•

13 years ago

Comment on attachment 484992 [details] [diff] [review]
Patch v5.1 (nicer diff than v5)

>+    // We can't use StringBeginsWith here, unfortunately.  Although it will
>+    // happily take a case-insensitive UTF8 comparator, it eventually calls
>+    // nsACString::Equals, which checks that the two strings contain the same
>+    // number of bytes before calling the comparator.  This is clearly not what
>+    // we want.
Because changing the case of a character could change the number of bytes?

Justin Lebar (not reading bugmail)

Assignee

Comment 61

•

13 years ago

Yes.  The comment should be more precise!

Shawn Wilsher :sdwilsh

Comment 62

•

13 years ago

(In reply to comment #61)
> Yes.  The comment should be more precise!
rs=me to fix that

Justin Lebar (not reading bugmail)

Assignee

Comment 63

•

13 years ago

Comment updated in cedar:

http://hg.mozilla.org/projects/cedar/rev/3078ed6318fb

(no longer active)

Reporter

Comment 64

•

13 years ago

(In reply to comment #63)
> Comment updated in cedar:
> 
> http://hg.mozilla.org/projects/cedar/rev/3078ed6318fb

And on m-c:

http://hg.mozilla.org/mozilla-central/rev/3078ed6318fb

Profile 14 years ago (no longer active) 6.03 KB, text/plain		Details
v1.0 14 years ago Shawn Wilsher :sdwilsh 15.56 KB, patch	mak : review+ sdwilsh : review-	Details \| Diff \| Splinter Review
Patch v2 14 years ago Justin Lebar (not reading bugmail) 14.29 KB, patch		Details \| Diff \| Splinter Review
Patch v3 14 years ago Justin Lebar (not reading bugmail) 18.33 KB, patch		Details \| Diff \| Splinter Review
Patch v4 14 years ago Justin Lebar (not reading bugmail) 20.48 KB, patch		Details \| Diff \| Splinter Review
Patch v4.1 14 years ago Justin Lebar (not reading bugmail) 20.34 KB, patch	sdwilsh : review-	Details \| Diff \| Splinter Review
Patch v5 14 years ago Justin Lebar (not reading bugmail) 21.29 KB, patch		Details \| Diff \| Splinter Review
Patch v5 - patience diff of SQLFunctions.cpp 14 years ago Justin Lebar (not reading bugmail) 14.11 KB, patch		Details \| Diff \| Splinter Review
Interdiff v4.1 -> v5 14 years ago Justin Lebar (not reading bugmail) 12.28 KB, patch		Details \| Diff \| Splinter Review
Patch v5.1 (nicer diff than v5) 14 years ago Justin Lebar (not reading bugmail) 21.42 KB, patch	sdwilsh : review+	Details \| Diff \| Splinter Review