Open Bug 590376 Opened 10 years ago Updated 4 years ago

Speed up MatchAutoCompleteFunction's functions by lower-casing the token before it hits SQL

Categories

(Toolkit :: Places, defect, P3)

defect

Tracking

()

People

(Reporter: justin.lebar+bug, Unassigned)

References

Details

(Keywords: perf)

Even with the patch in bug 570975, MatchAutoCompleteFunction::findOnBoundary and the case-insensitive UTF8 comparator is a major hotspot.

findOnBoundary spends the vast majority of its time not finding matches.  In particular, the most common operation is:
 * compare the first character of the token we're searching for to a character in the search string
 * notice that they're not identical
 * convert both characters to lower-case
 * compare the lower-cased characters
 * notice that they're not identical.

One way I think we could speed this up is to convert the token to lower-case before we query the database.  Then findOnBoundary could use a custom half-case-insensitive comparator which assumes that one operand is already lowercased.

bz pointed out in bug 145975 comment #59 that this will do the Wrong Thing with some letters, such as the Turkish dotless i, but it appears that we don't handle this properly right now anyway.

Although they're less of a hotspot in my profiles, we could do the same for MatchAutoCompleteFunction::findAnywhere and MatchAutoCompleteFunction::findBeginning.
Keywords: perf
For my reference, some relevant code lives in nsPlacesAutoComplete.js; search for params.searchString.
I'd like to add that we should assert in debug builds that whatever is passed to this is, in fact, lowercase already.
I think it makes sense to convert the search string to UTF-32 before passing it to the autocomplete function.  That way, we can skip parsing its Unicode inside the loop altogether.
(In reply to comment #3)
> I think it makes sense to convert the search string to UTF-32 before passing it
> to the autocomplete function.  That way, we can skip parsing its Unicode inside
> the loop altogether.
Do we even have UTF-32 string classes?
(In reply to comment #4)
> Do we even have UTF-32 string classes?

Not to my knowledge, no.
It looks like I can pass blobs through the SQL parameters, but the JS interface is still magic to me.
(In reply to comment #6)
> It looks like I can pass blobs through the SQL parameters, but the JS interface
> is still magic to me.
I can probably shine light on the magic, but it is unclear what you are trying to do here.
My thought is: When I convert the string the user typed to lower-case, also convert the string to UTF-32.  Pass the lower-case UTF-32 string to the search function.

This way, the search function does no work on the typed string.  The case-insensitive comparison is:

 * Take a character from the UTF-8 string from the database.
 * Convert it to lower-case.  The result is a uint32.
 * Compare that lower-case character to the current character in the UTF-32 search string.
(In reply to comment #6)
> It looks like I can pass blobs through the SQL parameters, but the JS interface
> is still magic to me.
You want bindBlobByName then.
The magic code is in mozStorageAsyncStatementParams, which takes query.params.foo="bar" and calls mStatement->BindByName(foo).

For now, I'm just going to add a function mozIStorageStatementParams::bindStringAsUTF32.  We can try to figure out a cleaner way to do this later.
(In reply to comment #10)
> The magic code is in mozStorageAsyncStatementParams, which takes
> query.params.foo="bar" and calls mStatement->BindByName(foo).
> 
> For now, I'm just going to add a function
> mozIStorageStatementParams::bindStringAsUTF32.  We can try to figure out a
> cleaner way to do this later.
I thought you wanted to bind a blob here, in which case I gave you the method you want to call on a mozIStorageBindingParams object (which a mozIStorageAsyncStatement object is)
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.