Autocomplete should match any part of the string

VERIFIED FIXED in mozilla1.9.2a1

Status

()

Toolkit
Form Manager
P3
enhancement
VERIFIED FIXED
10 years ago
2 years ago

People

(Reporter: Mike Linksvayer, Assigned: MattN)

Tracking

(Blocks: 1 bug, {verified1.9.2})

unspecified
mozilla1.9.2a1
verified1.9.2
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 8 obsolete attachments)

1.92 KB, application/xml
Details
98.55 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

10 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008061015 Firefox/3.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008061015 Firefox/3.0

The search bar seems to match previous searches only from the beginning of the string, e.g., if I've previously searched for "perl program" and "programming perl" typing "perl" would only match the first.

It would be great if the search bar matched text entered against text anywhere in previous search strings.

Reproducible: Always

Steps to Reproduce:
1. From search bar, search for "python program"
2. From search bar, search for "programming in python"
3. From search bar, type "python" 
Actual Results:  
only "python program" appears in list of previous searches.

Expected Results:  
Both "python program" and "programming in python" should appear in previous searches list.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Yeah, we should do this when we start making autocomplete smart.
Priority: -- → P3
Summary: search bar should find previous searches as location bar finds visited pages: not only beginning of string → search bar autocomplete should match any part of the string
Target Milestone: --- → Firefox 3.2a1
I'll have a patch shortly for satchel that will make this work
Assignee: nobody → mnoorenberghe
Status: NEW → ASSIGNED
Component: Search → Form Manager
Depends on: 469443
Product: Firefox → Toolkit
QA Contact: search → form.manager
Summary: search bar autocomplete should match any part of the string → Autocomplete should match any part of the string
Target Milestone: Firefox 3.6a1 → mozilla1.9.2a1
Created attachment 381348 [details] [diff] [review]
v.1 with word boundary bonuses (WIP to apply to 370117)

Working patch with word boundary bonuses but does not re-use previous search results since word boundary bonuses need to be re-calculated every time the input text is changed.
Attachment #381348 - Flags: review?(dolske)
Created attachment 381407 [details] [diff] [review]
 v.1.1 with preference for prefix matching bonus (WIP)

Patch applies to patch on bug 370117. See https://wiki.mozilla.org/Firefox/Namoroka/Improved_form_history for information about prefs in browser.formfill.*.
Attachment #381348 - Attachment is obsolete: true
Attachment #381407 - Flags: review?(dolske)
Attachment #381348 - Flags: review?(dolske)
Created attachment 386654 [details] [diff] [review]
v.2 use previous results

I have added a unit test to check the performance of satchel's autocomplete search.  When comparing the implementation in bug 469443 to the implementation in this patch I came up with the following results for search time:

Search with 30000 entries for field	Old	New time (ms)
"" w/o previous result			1392	2713
"r" w/o previous result			149	266
"re" w/o previous result		123	806
"rea" w/o previous result		74	216
"rea" with "re" previous result		99	158
"real" with "rea" previous result	8	16
"real" with "re" previous result	105	94
"real" w/o previous result		73	157

Search with 1000 entries for field (more realistic)
"" w/o previous result			46	94
"re" w/o previous result		6	30

There are three key outcomes:
1. The SQLite queries are slower now because the query is more complex and returns more records
2. Filtering previous results is faster
3. The first set of data is for a form history database with 30,000 records for the one field I am searching on which is very high and would be extremely rare for the average user.
Attachment #381407 - Attachment is obsolete: true
Attachment #386654 - Flags: review?(dolske)
Attachment #381407 - Flags: review?(dolske)
Created attachment 387257 [details]
moz_formhistory DBMonster Schema

I used a tool called DBMonster (http://dbmonster.kernelpanic.pl/) to generate the test databases with random English words.  Attached is the XML schema file for the moz_formhistory table.  Date column ranges need to be updated before using the schema file.  The value field contains random English words with a maximum length of 30.

Set dbmonster.jdbc.url to the path to formhistory.sqlite in dbmonster.properties:
dbmonster.jdbc.url=jdbc:sqlite:/Users/mozilla/profiles/my_profile/formhistory.sqlite
Can you update this patch after updating your patch to bug 370117? [It doesn't apply cleanly right now, so might as well kill two birds with one stone...]
Created attachment 389174 [details] [diff] [review]
v.3 (fix bitrot)
Attachment #386654 - Attachment is obsolete: true
Attachment #389174 - Flags: review?(dolske)
Attachment #386654 - Flags: review?(dolske)
Comment on attachment 389174 [details] [diff] [review]
v.3 (fix bitrot)


>-        if (aPreviousResult &&
>-                aSearchString.substr(0, aPreviousResult.searchString.length) == aPreviousResult.searchString) {
>+         if (aPreviousResult && aPreviousResult.searchString.length != 1 &&
>+                 aSearchString.indexOf(aPreviousResult.searchString) >= 0) {

The indexOf() bit isn't immediately obvious... Add a comment noting that we'll reuse the previous search when the new search result will be a subset of the old result.

>+             let tempEntries = result.wrappedJSObject.entries;

s/tempEntries/entries/.

>+             entryLoop:
>             for (let i = result.matchCount - 1; i >= 0; i--) {

Use tempEntries.length instead of matchCount.

>-                let match = result.getValueAt(i);
>+                 let match = tempEntries[i].text;

"match" is only used once, but tempEntries[i] is used repeatedly. Better to have a "let entry = entries[i]", and just use entry.text instead of "match".

>+                 for (var token in searchTokens) {
>+                     // Remove results that do not contain the token
>+                     // XXX bug 394604 -- .toLowerCase can be wrong for some intl chars
>+                     if (match.toLowerCase().indexOf(searchTokens[token]) < 0) {

This would be better as "for each (let token in searchTokens)" and "...indexOf(token)".

>+                         this.log("Removing autocomplete entry '" + match + "'");
>+                         tempEntries.splice(i, 1);

Hmmmm...

An alternate approach to using splice() would be create a new array, and only .push() entries that we want to leave on. I'm not sure what the cost of splice() is, might be good to ask the JS guys. I think shaver made it fast in FF3.0...

>+                 // re-calculate scores on remaining items
>+                 tempEntries[i].totalScore = this._calculateScore(tempEntries[i], aSearchString, searchTokens);

Just update .totalScore in _calculateScore(), no need to return it.

The comment seems superfluous (and maybe slightly wrong). Maybe s/calculateScore/updateEntryWeight/?


>+                 if (this._debug) // XXX update score in text when debugging
>+                     tempEntries[i].text =
>+                         tempEntries[i].text.replace(/\(.+\)$/, "(" + tempEntries[i].totalScore + ")");

Nuke this, since it makes using form history in debug mode difficult.

>+        let trimmedSearchLength = searchString.trim().length;

Can we just do "searchString = searchString.trim()" here? I'm not sure having the whitespace is useful, though maybe I'm missing something.

>+         // if only one character typed, only do prefix matching
>+         if (trimmedSearchLength > 1) {

Nit: make the comment match the block... "Only do substring matching when more than 1 character is typed"

>+             // for each word, calculate word boundary weights and add word to WHERE clause of SQL
>+             for (var i in searchTokens) {

I think this should just be a plain "for(i=0, i < searchTokens.length; i++)" loop, to make it crystal clear that the SQL string you're building just contains inserted numbers, and not the actual search string (which would be unsafe).

>+                 weightCalc += "(value LIKE :tokenBegin" + i + " ESCAPE '/') + " +
>+                                     "(value LIKE :tokenBoundary" + i + " ESCAPE '/') + ";
>+                 where += "AND (value LIKE :tokenContains" + i + " ESCAPE '/') ";
>+             }
>+             // add more weight if we have a traditional prefix match and
>+             // multiply boundary bonuses by boundary weight
>+             weightCalc += ":prefixWeight * (value LIKE :valuePrefix ESCAPE '/')";
>+             params.prefixWeight = this._prefixWeight;

>+             weightedBonus = "* (1 + (" + this._boundaryWeight + " / 100.0) * (" + weightCalc + "))";

Set weightCalc to this value at the top of the block, so it's just built up sequentially. Shouldn't need separate "weightedBonus" and "weightCalc" vars.

Should probably have a brief comment at the top of the block explaining the approach here, too.

>+         } else if (trimmedSearchLength == 1) {
>+             where = "AND (value LIKE :valuePrefix ESCAPE '/') ";
>+         }

I'd add an else{} here to explicitly deal with trimmedSearchLength == 0, so it's clear that |where| will be an empty string and that's intentional.


>+         let query = "SELECT value, " +
>+                     "ROUND( " +
>+                         "(1.0 + :agedWeight / 100.0 * (firstUsed < :expiryDate)) * " +
>+                         "timesUsed / MAX(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
>+                         "MAX(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) / "+
>+                         ":bucketSize "+
>+                     ", 3) AS frecency, " +
>+                     "ROUND( " +
>+                         "(1.0 + :agedWeight / 100.0 * (firstUsed < :expiryDate)) * " +
>+                         "timesUsed / MAX(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
>+                         "MAX(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) / "+
>+                         ":bucketSize "+
>+                         weightedBonus +

It seems suboptimal to be computing the frecency twice in the statement...

Can statements refer to previous values? EG, "... AS frecency, ROUND(frecency + ..."

If not, I'd compute the boundary bonus as an independent item, and just have the JS code set "entry.totalScore = row.frecency + row+boundaryBonus"?


>+             if (trimmedSearchLength == 1)
>+                 stmt.params.valuePrefix = stmt.escapeStringForLIKE(searchString.trim(), "/") + "%";
>+             else if (trimmedSearchLength > 0)
>+                 stmt.params.valuePrefix = escapedSearch + "%";

This seems odd, because the else would contain the first condition too. Is > 1 what's meant here? What about length == 0? Add that as an explicit else block, as above?

>+             if (trimmedSearchLength > 1)
>+                 for (var i in searchTokens) {
>+                     stmt.params["tokenBegin" + i] = searchTokens[i] + "%";
>+                     stmt.params["tokenBoundary" + i] =  "% " + searchTokens[i] + "%";
>+                     stmt.params["tokenContains" + i] = "%" + searchTokens[i] + "%";
>+                 }

If-block needs braces around the for-block, since although there's just 1 statement in the block it's a multiline statement.


>+    _calculateScore : function (entry, aSearchString, searchTokens) {
>+        let weightCalc = 0;
>+        // for each word, calculate word boundary weights
>+        for (var token in searchTokens) {

Use "for each" here too.

>+            weightCalc += (entry.text.toLowerCase().indexOf(searchTokens[token]) === 0);

== 0, not === 0.

>+        // now add more weight if we have a traditional prefix match and
>+        // multiply boundary bonuses by boundary weight
>+        weightCalc += this._prefixWeight * (entry.text.toLowerCase().indexOf(aSearchString.toLowerCase()) === 0);
== 0, not === 0.


Will look at tests in more detail in next review pass.
Attachment #389174 - Flags: review?(dolske) → review-
Created attachment 390327 [details] [diff] [review]
v.4 (push instead of splice reused + other review comments)

(In reply to comment #9)
> (From update of attachment 389174 [details] [diff] [review])
> >+                         this.log("Removing autocomplete entry '" + match + "'");
> >+                         tempEntries.splice(i, 1);
> 
> Hmmmm...
> 
> An alternate approach to using splice() would be create a new array, and only
> .push() entries that we want to leave on. I'm not sure what the cost of
> splice() is, might be good to ask the JS guys. I think shaver made it fast in
> FF3.0...

From a quick benchmark, pushing entries into a new array is much more efficient than splicing for large arrays.  Since we generally remove more entries than we keep with every keystroke it makes sense to switch to .push()


> >+        let trimmedSearchLength = searchString.trim().length;
> 
> Can we just do "searchString = searchString.trim()" here? I'm not sure having
> the whitespace is useful, though maybe I'm missing something.

Currently a trailing space affects the prefix match calculation so that if a user enters "fat " (trailing space) for example, existing entries that have "fat " as a prefix (ie. "fat cat") would get a bonus whereas "fattest cow" would not.
 
> >+         let query = "SELECT value, " +
> >+                     "ROUND( " +
> >+                         "(1.0 + :agedWeight / 100.0 * (firstUsed < :expiryDate)) * " +
> >+                         "timesUsed / MAX(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
> >+                         "MAX(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) / "+
> >+                         ":bucketSize "+
> >+                     ", 3) AS frecency, " +
> >+                     "ROUND( " +
> >+                         "(1.0 + :agedWeight / 100.0 * (firstUsed < :expiryDate)) * " +
> >+                         "timesUsed / MAX(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
> >+                         "MAX(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) / "+
> >+                         ":bucketSize "+
> >+                         weightedBonus +
> 
> It seems suboptimal to be computing the frecency twice in the statement...
> 
> Can statements refer to previous values? EG, "... AS frecency, ROUND(frecency +
> ..."

Unfortunately, SELECT expressions cannot refer to aliases but I have changed the query to calculate the parts individually and just order by the product.

> 
> If not, I'd compute the boundary bonus as an independent item, and just have
> the JS code set "entry.totalScore = row.frecency + row+boundaryBonus"?

To clarify, the values are multiplied:
"entry.totalScore = row.frecency * row.boundaryBonus"
Attachment #389174 - Attachment is obsolete: true
Attachment #390327 - Flags: review?(dolske)
Comment on attachment 390327 [details] [diff] [review]
v.4 (push instead of splice reused + other review comments)

>+            let entries = result.wrappedJSObject.entries;
>+            let currentEntries = [];

Nit: currentEntries isn't a good name, as it implies it's the, err, current entries (when it's really the new results being filted). Maybe newEntries or filteredEntries?

>+                for each (let token in searchTokens) {
>+                    // Remove results that do not contain the token
>+                    // XXX bug 394604 -- .toLowerCase can be wrong for some intl chars
>+                    if (entry.text.toLowerCase().indexOf(token) < 0) {
>+                        continue entryLoop;
>+                    }
>+                }

A bit more compact:

  // Remove results that do not contain the token
  // XXX bug 394604 -- .toLowerCase can be wrong for some intl chars
  if(searchTokens.some(function (tok) entry.text.toLowerCase().indexOf(tok) < 0))
      continue

Eliminates the need for the entryLoop label, too.

>+                // re-calculate score
>+                this._calculateScore(entry, aSearchString, searchTokens);

I guess no need for the comment at all now. :)

>+            result.wrappedJSObject.entries.sort(this._sortBytotalScore);

This is the only place the sort callback is used. Make it a function local autoCompleteSearch(), or maybe as an anonymous inline...

>+        // only do substring matching when more than one character is typed
>+        if (trimmedSearchLength > 1) {

Move "let where, weightCalc" to right after the comment.

>+            weightCalc += ":prefixWeight * (value LIKE :valuePrefix ESCAPE '/')";
>+            params.prefixWeight = this._prefixWeight;
>+            weightCalc += "))";

The last weightCalc part can just go in the string 2 lines up.

Also, make sure you're adding trailing spaces where needed. You're generating a final query that looks like "...blah()blah...", which is probably only working because the parser is ok with no space after a paren?

>+        } else if (trimmedSearchLength == 1) {
>+            where = "AND (value LIKE :valuePrefix ESCAPE '/')";
>+        } else {
>+            where = "";
>+        }

Explicitly set weightCalc to 1 in these two blocks, so it's clear that it's not really being used.

Let's also rename weightCalc / weightedBonus to be more specific that it's the word boundary bonus. boundaryCalc / boundaryBonus, maybe?

>-                        "timesUsed / max(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
>-                        "max(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) * "+
>-                        "max(1.0, :agedWeight * (firstUsed < :expiryDate)) / " +
>+                        "timesUsed / MAX(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
>+                        "MAX(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) * "+
>+                        "MAX(1.0, :agedWeight * (firstUsed < :expiryDate)) / " +

Hmm, it would be better to s/max/MAX/ in your other patch which is adding these.

>+            if (trimmedSearchLength > 1) {
>+                stmt.params.valuePrefix = stmt.escapeStringForLIKE(searchString, "/") + "%";
>+                for (var i in searchTokens) {

let is the new var! Change the other "var"s in this patch to "let"s.

This is also another spot that should be a plain "for(i=0...)" loop.

>+                    stmt.params["tokenBegin" + i] = searchTokens[i] + "%";
>+                    stmt.params["tokenBoundary" + i] =  "% " + searchTokens[i] + "%";
>+                    stmt.params["tokenContains" + i] = "%" + searchTokens[i] + "%";

These all need to use stmt.escapeStringForLIKE() before adding "%".

>+            } else if (trimmedSearchLength == 1) {
>+                stmt.params.valuePrefix = stmt.escapeStringForLIKE(searchString.trim(), "/") + "%";
>+            } else {
>+                // no addional params need to be substituted into the query when the length is zero
>+            }

In the final else, stmt.params.valuePrefix will be null, no? Just set that outside the trimmedSearchLength checks, and I guess the last two blocks can simply go away if they're both empty.

>+    _calculateScore : function (entry, aSearchString, searchTokens) {
...
>+        weightCalc += this._prefixWeight * (entry.text.toLowerCase().indexOf(aSearchString.toLowerCase()) == 0);
>+        entry.totalScore = Math.round(entry.frecency * (1 + (this._boundaryWeight / 100.0) * weightCalc));

Can you clean up the line wrapping here to be ~80 column sane?


>+++ b/toolkit/components/satchel/test/unit/perf_autocomplete.js

>+        fh = Cc["@mozilla.org/satchel/form-history;1"].
>+              getService(Ci.nsIFormHistory2);
>+        fac = Cc["@mozilla.org/satchel/form-autocomplete;1"].
>+                getService(Ci.nsIFormAutoComplete);
>+        prefs = Cc["@mozilla.org/preferences-service;1"].
>+                  getService(Ci.nsIPrefBranch);
>+

Nit: alignment Cc/getService.

>+    } catch (e) {
>+      throw "FAILED in test #" + testnum + " -- " + e;
>+    }
>+}
>\ No newline at end of file

Add newline. :)
Attachment #390327 - Flags: review?(dolske) → review-
Created attachment 390614 [details] [diff] [review]
v.5 bug fix with test & address review
Attachment #390327 - Attachment is obsolete: true
Attachment #390614 - Flags: review?(dolske)
Attachment #390614 - Flags: review?(dolske) → review-
Comment on attachment 390614 [details] [diff] [review]
v.5 bug fix with test & address review

>+                this.log("Reusing autocomplete entry '" + entry.text + "'");
>+                this._calculateScore(entry, aSearchString, searchTokens);

I think aSearchString should be the trimmed search string here.

Typing "<space>foo" won't be counted as a prefix match to the saved entry "foobar".

The more I think about it, the less I think trailing space (as affected by trim()) is important, and it seems more like a complication that's going to lead to bugs. I think the function argument should be aUntrimmedSearchString, and then set "let searchString = aUntrimmedSearchString.trim()".

We could get a similar effect, though, by having the scoring give a bonus when the search token exactly matches a token in the saved entry's value. This then also works for and token within a multitoken search.

>+            result.wrappedJSObject.entries = filteredEntries;
>+            result.wrappedJSObject.entries.sort(sortBytotalScore);

filteredEntries.sort(sortBytotalScore);
result.wrappedJSObject.entries = filteredEntries;

>+            // update searchResult to reflect if there are no matches left
>+            if (entries.length == 0)
>+                result.wrappedJSObject.searchResult = Ci.nsIAutoCompleteResult.RESULT_NOMATCH;

This should check filteredEntries.length.

Better yet, make .searchResult a getter that returns RESULT_NOMATCH or RESULT_SUCCESS depending on the length of .entries...

Does .defaultIndex need updating here too? I think so. Needs to go to -1 when there's NOMATCH, and should reflect the new position in filteredEntries (if the item is added, or the current filtedEntries.length if it's not).

>             this.log("Creating new autocomplete search result.");
>             let entries = this.getAutoCompleteValues(aInputName, aSearchString);

I think the trimmed search string can be passed in here now.

>+            // build up the word boundary and prefix match bonus calculation
>+            boundaryCalc = "(1 + (" + this._boundaryWeight + " / 100.0) * (";

No /100 scaling.

>+            // for each word, calculate word boundary weights for the SELECT clause and
>+            // add word to the WHERE clause of the query
>+            for (let i = 0; i < searchTokens.length; i++) {
>+                boundaryCalc += "(value LIKE :tokenBegin" + i + " ESCAPE '/') + " +
>+                              "(value LIKE :tokenBoundary" + i + " ESCAPE '/') + ";

>+                where += "AND (value LIKE :tokenContains" + i + " ESCAPE '/') ";

So, I've noticed while running with this patch that short search strings yield lots of irrelevant results because they're matching in the middle of the string (vs at a word boundary).

For now, can we only match on word boundaries and deal with actual substring matches in a separate patch? We'll need to tweak some weights so that they're sometimes still available, but should generally be ordered below the word boundary matches.

>+    _calculateScore : function (entry, aSearchString, searchTokens) {
>+        let boundaryCalc = 0;

Double check that this computes the same weight as the query would? Looks right to me, but just to be sure. :)

Finally, can you rerun the perf numbers with the updated patch to see if they're inline with previous results?
Created attachment 391388 [details] [diff] [review]
v.6 trim search, defaultIndex, bonuses

(In reply to comment #13)
> So, I've noticed while running with this patch that short search strings yield
> lots of irrelevant results because they're matching in the middle of the string
> (vs at a word boundary).
> 
> For now, can we only match on word boundaries and deal with actual substring
> matches in a separate patch? We'll need to tweak some weights so that they're
> sometimes still available, but should generally be ordered below the word
> boundary matches.

As discussed, I tweaked the weighting to move the word boundary matches higher.

> >+    _calculateScore : function (entry, aSearchString, searchTokens) {
> >+        let boundaryCalc = 0;
> 
> Double check that this computes the same weight as the query would? Looks right
> to me, but just to be sure. :)

I tested the calculations with a large set of data and the results were the same.
Attachment #390614 - Attachment is obsolete: true
Attachment #391388 - Flags: review?(dolske)
Comment on attachment 391388 [details] [diff] [review]
v.6 trim search, defaultIndex, bonuses


>+++ b/modules/libpref/src/init/all.js
...
>-pref("browser.formfill.bucketSize",       5);
>+pref("browser.formfill.bucketSize",       1);

FTR, we should file a followup bug to either fix or remove the bucketing, since the current scheme (w/ size > 1) wasn't working well. The ideas we discussed were:
  - Use some kind of non-linear grouping, since the weights were spanning a large range
  - Order the first N results (~10?) by totalweight, and make the rest just alphabetical.

>--- a/toolkit/components/satchel/src/nsFormAutoComplete.js
...
>+        if (aPreviousResult && aPreviousResult.searchString.trim().length > 1 &&
>+            searchString.indexOf(aPreviousResult.searchString.trim()) >= 0) {
>             this.log("Using previous autocomplete result");
>             result = aPreviousResult;
>-            result.wrappedJSObject.searchString = aSearchString;
>+            result.wrappedJSObject.searchString = searchString;

Shouldn't this be set to aUntrimmedSearchString?

Ditto for when creating a new FormAutoCompleteResult.

>+                this.log("Reusing autocomplete entry '" + entry.text + "' (" + entry.frecency +" / " + entry.totalScore + ")");
>+                this._calculateScore(entry, searchString, searchTokens);

Clean up line wrapping, and put the log after the calculateScore() so that it shows the new weight.

Still need to update .defaultIndex here, since replacing .entries will likely make it invalid.

>+            if (entries.length == 0)
>+                result.wrappedJSObject.defaultIndex = -1;

Still need to change this to check filteredEntries.length.

>+            // build up the word boundary and prefix match bonus calculation
>+            boundaryCalc = "MAX(1, :boundaryWeight * (";
...

I think it flows better if you start with the prefixWeight...

  boundaryCalc = "MAX(1, :prefixWeight * (value LIKE :valuePrefix ESCAPE '/') ";
  for (...) {
      boundaryCalc += "+ (value LIKE ...) " +
                      "+ (value LIKE ...) ";
  }
  boundaryCalc += ")";

Which also avoids the need for the dummy "0".

>     if (entries.length > 0) {
>-        this.searchResult = Ci.nsIAutoCompleteResult.RESULT_SUCCESS;
>         this.defaultIndex = 0;
>     }

Remove braces too.
Attachment #391388 - Flags: review?(dolske) → review-
Created attachment 391471 [details] [diff] [review]
v.7 address review comments

(In reply to comment #15)
> >+++ b/modules/libpref/src/init/all.js
> ...
> >-pref("browser.formfill.bucketSize",       5);
> >+pref("browser.formfill.bucketSize",       1);
> 
> FTR, we should file a followup bug to either fix or remove the bucketing, since
> the current scheme (w/ size > 1) wasn't working well. The ideas we discussed
> were:
>   - Use some kind of non-linear grouping, since the weights were spanning a
> large range
>   - Order the first N results (~10?) by totalweight, and make the rest just
> alphabetical.

Filed as bug 507271
Attachment #391388 - Attachment is obsolete: true
Attachment #391471 - Flags: review?(dolske)
Comment on attachment 391471 [details] [diff] [review]
v.7 address review comments

Yay. I went through everything again, and just had a few more final small nits. r+ with these fixed...

>+        let searchString = aUntrimmedSearchString.trim();

I think we should throw a .toLowerCase() in here too, to make sure it isn't missed elsewhere. That also avoids the need for a number of toLowerCase() calls, which means less string creation.

>+            while (stmt.step()) {
>+                let entry = {
>+                    text:       stmt.row.value,
>+                    frecency:   stmt.row.frecency,
>+                    totalScore: Math.round(...)
>+                }
>+                values.push(entry);

Similarly, there are a number of places where we keep checking entry.text.toLowerCase(), so it should be faster to just go ahead and store a textLowerCase property once (again, less string creation).

>             stmt = this._dbCreateStatement(query, params);

Nuts. Now that we're generating different queries depending on the input, it's not such a good idea to cache these statements. Add a 3rd arg to _dbCreateStatement(), and have it not cache the statement when it's set.


> toolkit/components/satchel/test/unit/formhistory_1000.sqlite

FWIW, I think the script you used to create this has a bug... There seem to be quite a few entries with value == "".
Attachment #391471 - Flags: review?(dolske) → review+
(In reply to comment #17)

> Nuts. Now that we're generating different queries depending on the input, it's
> not such a good idea to cache these statements. Add a 3rd arg to
> _dbCreateStatement(), and have it not cache the statement when it's set.

Eh, never mind, you've changed my mind. There are almost always only going to be 1 or 2 token terms in the query; as the user types more we handle the filtering in JS. Creating queries with more token terms isn't very likely to happen, and is still going to be a fairly small total number of statements.
Created attachment 392014 [details] [diff] [review]
v.8 reduce string creation

(In reply to comment #17)
> (From update of attachment 391471 [details] [diff] [review])
> > toolkit/components/satchel/test/unit/formhistory_1000.sqlite
> 
> FWIW, I think the script you used to create this has a bug... There seem to be
> quite a few entries with value == "".

I didn't set the minimum length so it probably defaults to 0 which gives the empty string.  There is also no unique constraint enforced on the name/value columns to stop duplicates at the DB level.
Attachment #391471 - Attachment is obsolete: true
Pushed http://hg.mozilla.org/mozilla-central/rev/0f04a94f8089

Woot! Great job!
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED

Updated

9 years ago
Blocks: 510168
verified with: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b1pre) Gecko/20090925 Namoroka/3.6b1pre
Status: RESOLVED → VERIFIED
Flags: in-litmus?
Keywords: verified1.9.2
removing in-litmus flag, it no longer exists
Flags: in-litmus?
Depends on: 1325603
You need to log in before you can comment on or make changes to this bug.