Closed Bug 370117 Opened 17 years ago Closed 15 years ago

form autocomplete should sort by frequency of use

Categories

(Toolkit :: Form Manager, enhancement)

enhancement
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: cragos, Assigned: MattN)

References

()

Details

(Keywords: verified1.9.2)

Attachments

(1 file, 11 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1

The drop down history of text boxes is currently sorted in a case sensitive manner, similar to the UNIX sort command.  However, in Windows, Mac OSX, and most Linux desktop environments, this does not match the normal text sorting methods.

In the short term, it should start sorting without regard to case.

However, in the long term, it would be awesome to be able to sort them by frequency of use.  I have one text box where I've got 20-30 entries in the history, but 80-90% of the time I enter text into that box, I use the same string.  If Firefox recognized this, I wouldn't have to type anything.  It would be at the top of the text box' history list, and available with only two clicks.

Reproducible: Always

Steps to Reproduce:
Click on a frequently used text box, such as the google search box.  Then hit the down arrow or click on it again, to pull up the history sorted in the manner described above.
Actual Results:  
text sorted by case

Expected Results:  
text not sorted by case, and, hopefully, by frequency
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a3pre) Gecko/20070211 Minefield/3.0a3pre
See also Bug 360113.
I entered 6 different addresses in the location bar beginning with an "a" but could not discover an understandable order in the history dropdown.
The autocomplete on the Google page adds search results to the bottom.
I assume for both cases there are existing bugs but I couldn't find them. Maybe the next triager has more luck. 
Summary: Text box history currently sorted by case - should either not be case sensitive or sort by frequency of use → Text box history (autocomplete drop-down) currently sorted by case - should either not be case sensitive or sort by frequency of use
Attached two PNG screenshots, these taken in Ubuntu Linux, Firefox version 1.5.0.9, while GNOME runs as the WM.  So that's two seperate versions of the Fox and two seperate OSes that I've produced the problem in.
https://bugzilla.mozilla.org/show_bug.cgi?id=360113 is tangentially related to this one.  There, the reporter points out that the address bar is already sorted by frequency of use, which means the code's already in place to track such things.

I propose some intrepid and competent soul create the option requested above,
but make it control not only the address bar, but all auto-fill drop down
lists.  Let us choose whether we want all such dialogs sorted by frequency of
use or alphabetical order.
Yeah, when I click 5 times on the same entry it finally moves to top.
Sub-domains are added alphabetically.
I would expect that the last typed or chosen address moved to top. That's why I first didn't see any logic in the order.
Might be an idea for an extension.
We should figure out how to do this, shouldn't be that hard...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking-firefox3?
OS: Windows XP → All
Hardware: PC → All
Summary: Text box history (autocomplete drop-down) currently sorted by case - should either not be case sensitive or sort by frequency of use → form autocomplete should sort by frequency of use
Whiteboard: [wanted-firefox3]
Target Milestone: --- → Firefox 3 M9
We'd need to start storing how many times a given entry has been used, which would require adding a column to the form history table, I think.
Assignee: nobody → mano
Attached patch patch (obsolete) — Splinter Review
Attachment #278078 - Flags: review?(gavin.sharp)
Flags: blocking-firefox3? → blocking-firefox3+
Whiteboard: [wanted-firefox3]
moving out bugs that don't need to block b1
Target Milestone: Firefox 3 M9 → Firefox 3 M10
Target Milestone: Firefox 3 M10 → Firefox 3 M11
Priority: -- → P5
Gavin, what's up here?
Flags: wanted-firefox3+
Flags: blocking-firefox3-
Flags: blocking-firefox3+
Priority: P5 → P4
Target Milestone: Firefox 3 Mx → Firefox 3 M11
(In reply to comment #11)
> Gavin, what's up here?

I gave some comments to Mano over IRC a little while ago, I should have put them in the bug. I think the autocomplete changes from bug 395739 allows us to fix both this bug and that one in a cleaner way, although I wouldn't be opposed to do it this way for Firefox 3 if it's really wanted. My other comment was about changing the API and using a more efficient SQLite statement, I'll post that log in a sec.
Comment on attachment 278078 [details] [diff] [review]
patch

14:06 <gavin_> it seems like the API we'd rather have is incrementUseCount() and a readonly useCount attr
14:07 <gavin_> and use [07 12:20:25] <Mossop> update <table> set <column>=<column>+1 where ...
14:07 <gavin_> for incrementuseCount()
14:07 <Mano> gavin_: i could see the use case for resetUseCount though
14:07 <gavin_> well, we can add that if we need it
Attachment #278078 - Flags: review?(gavin.sharp) → review-
It can be useful to go up and down in both additive and multiplicative fashion, so the query could be combined if we really wanted to share it.

UPDATE ... use_count * ?1 + ?2 ...

increment: ?1 = 1, ?2 = 1
reset: ?1 = 0, ?2 = 0
rebalance: ?1 = .75, ?2 = 0 (lessen weights of unselected items)
saturating-inc: ?1 = .99, ?2 = 1 (maxes out at 100)

I kinda like the REPLACE INSERT that I've got for bug 395739... It'll either update or create rows when necessary and only if the url exists in moz_places.

// Leverage the PRIMARY KEY (place_id, input) to insert/update entries
"REPLACE INTO moz_inputhistory "
"SELECT h.id, IFNULL(input, ?2), IFNULL(use_count, 0) + 1 "
"FROM moz_places h "
"LEFT JOIN moz_inputhistory ON place_id = h.id AND input = ?2 "
"WHERE h.url = ?1"

But if I'm going to rebalance all the other entries, I probably wouldn't do it one by one though....
Product: Firefox → Toolkit
Attached patch Patch v.1 (obsolete) — Splinter Review
Taking this for the 3.1 code sprint...

This is a fairly trivial change, now that we already store and update a timesUsed column for form data.

We should eventually look at using something closer to a real frecency algorothm (and matching with different field names), but this is a simple, low-risk change that could easily make FF3.1.
Assignee: mano → dolske
Attachment #254786 - Attachment is obsolete: true
Attachment #254787 - Attachment is obsolete: true
Attachment #278078 - Attachment is obsolete: true
Attachment #364012 - Flags: review?(gavin.sharp)
Attachment #364012 - Flags: review?(gavin.sharp) → review-
Comment on attachment 364012 [details] [diff] [review]
Patch v.1

Ugh, so...
timesUsed is not an index.  If it is an index, sorting is going to be faster.  We should check this before we add an index, but my understanding of the optimizer says this is the case.
Talked IRL, and decided that isn't an issue, because we're not really doing any more work than we already do. So it being an index or not shouldn't matter.

But Gavin noted that just a usage-count is probably going to make entries jump around. I think it shouldn't be too hard to add in some real frecency code, so I'll do that.

I haven't looked at how Places calculated things yet, but I'm basically thinking:

* primarily sort by use count
* penalize entries that haven't been used recently
* promote entries that have been used very recently ("I just did this")
* probably bin the above, and then sort alphabetically. (ie, entries with approximately the same frecency should just be alphabetical)

I guess the one thing form history doesn't have that Places does is the adaptive learning parts. Dunno if we need to add that here or not.
Frecency works by calculating things ahead of time. If we assume the number of autocomplete results won't be too many, dynamically sorting might not be too bad.

Like an estimate could be.. ORDER BY (NOW() - lastUse < 1 month) * timesUsed DESC, alphabetically

So only for things used within the month, sort it by its timesUsed value; otherwise sort alphabetically.

I just ran this and it seems to work..

SELECT * FROM moz_formhistory WHERE fieldname="email" ORDER BY (1235600000000000 - lastUsed < 2000000000000) * timesUsed DESC, UPPER(value) ASC

Definitely won't be hitting on an index, but that's all the code change you need -- no need to precompute a frecency value ahead of time.
What about (additionally or instead of Edward's suggestion) sorting by ROUND(timesUsed / 10) so that if you frequently use two or three different values, they don't always appear in an unpredictable place if their timesUsed count happens to be quite close? Alternatively, we could also create logarithmic buckets with ROUND(LOG(timesUsed + 1)) and a log-base to be determined as a poor man's distinction between hardly, seldom, occasionally and often used, which would also improve result stability.
Assignee: dolske → mnoorenberghe
Severity: minor → enhancement
Priority: P4 → --
Target Milestone: mozilla1.9beta3 → mozilla1.9.2
Patch depends on JS implementation in bug 439716
See https://wiki.mozilla.org/Firefox/Namoroka/Improved_form_history for prefs
Feedback would be appreciated on the default pref. values.  Benchmarking & performance testing is still required.
Attachment #378399 - Flags: review?(dolske)
Depends on: 439716
I am currently working on an improved patch that matches based on words instead of exact phrases.
Depends on: 469443
No longer depends on: 439716
Removed other features from last patch that were not directly related to this bug
Attachment #378399 - Attachment is obsolete: true
Attachment #380547 - Flags: review?(dolske)
Attachment #378399 - Flags: review?(dolske)
Justin, if you'd kindly explain what has been addressed here that makes my issue moot? Unless the "frequency of use" variable only counts one time uses, which I don't see anywhere, and seems like a gross misuse of what is trying to be accomplished here...
We're not going to implement chronological-only sorting (as you suggested in 474880), but are implement frecency based sorting (a blend of frequent and recent, which has been hugely useful in the awesomebar). That's close enough to dupe it over.
No, it's not.
Attached patch Patch v.4 (with unit tests) (obsolete) — Splinter Review
Attachment #380547 - Attachment is obsolete: true
Attachment #385276 - Flags: review?(dolske)
Attachment #380547 - Flags: review?(dolske)
Attachment #364012 - Attachment is obsolete: true
FYI, for Weave, we're using a "frecency" like value to pick the "top 200" form entries to sync.

I've got it set up to normalize timesUsed to a value from 0 to 1 and a lastUsed from 0 to 1. I multiply the two values together for each form entry and sort by that descending.

From bug 494952 comment 10:

SELECT * FROM moz_formhistory ORDER BY 1.0 * (lastUsed - (SELECT lastUsed FROM
moz_formhistory ORDER BY lastUsed ASC LIMIT 1)) / ((SELECT lastUsed FROM
moz_formhistory ORDER BY lastUsed DESC LIMIT 1) - (SELECT lastUsed FROM
moz_formhistory ORDER BY lastUsed ASC LIMIT 1)) * timesUsed / (SELECT timesUsed
FROM moz_formhistory ORDER BY timesUsed DESC LIMIT 1) DESC LIMIT 200;

;)

Which is more readable form..

SELECT * FROM moz_formhistory ORDER BY 1.0 * (lastUsed - minLast) / (maxLast -
minLast) * timesUsed / minTimes DESC LIMIT 200
Comment on attachment 385276 [details] [diff] [review]
Patch v.4 (with unit tests)

>diff --git a/modules/libpref/src/init/all.js b/modules/libpref/src/init/all.js

> // Satchel (Form Manager) prefs
>-pref("browser.formfill.enable",     true);
>-pref("browser.formfill.debug",      false);
>-
>+pref("browser.formfill.bucketSize",       5);
>+pref("browser.formfill.debug",            false);
>+pref("browser.formfill.enable",           true);
>+pref("browser.formfill.maxTimeGroupings", 14);
>+pref("browser.formfill.timeGroupingSize", 604800);
>+pref("browser.formfill.timesUsedScale",   10);

Don't make these alphabetical, better to group by functionality here.

I think we eventually don't want these prefs, they'll just be hardcoded constants in the JS. But let's go ahead and leave these in for a short period of time, so that if people find awesomecomplete isn't working well, they can tweak them. And file a followup bug to remove these prefs.


>-    _prefBranch  : null,
>-    _enabled : true,  // mirrors browser.formfill.enable preference
>-    _debug   : false, // mirrors browser.formfill.debug
>+    _prefBranch        : null,
>+    _bucketSize        : 5,
>+    _debug             : false, // mirrors browser.formfill.debug
>+    _enabled           : true,  // mirrors browser.formfill.enable preference
>+    _maxTimeGroupings  : 14,
>+    _timeGroupingSize  : 7 * 24 * 60 * 60 * 1000 * 1000,
>+    _timesUsedScale    : 10,

Ditto for grouping.

>+        let query = "SELECT value, " +
>+                    "ROUND( " +
>+                        "CAST(timesUsed as REAL) / :timesUsedScale * " +
>+                        "timesUsed / max(1.0, (lastUsed - firstUsed) / :timeGroupingSize) * " +
>+                        "max(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize) / "+
>+                        ":bucketSize "+
>+                    ") AS frecency " +

As discussed... I think we can remove the "CAST(timesUsed as REAL) / :timesUsedScale" term from the query. The second term is already weighting for timesUsed (and scaling it to account for age), so having it twice doesn't seem necessary. It also result in it scaling by the square of timesUsed, which seems undesirable. The timesUsed scaling will still be unbounded, whereas the age scale is capped to 14, so timesUsed will still tend to be the dominant term (which does seem desirable).

I think we should also add another term here, to give a special bonus to items with firstUsed < expirationTime. Long-lived entries that have defeated expiration seem more likely to be useful that random recent data.

>-            while (stmt.step())
>-                values.push(stmt.row.value);
>+            while (stmt.step()) {
>+                let rowVal = stmt.row.value;
>+                if (this._debug)
>+                    rowVal += " (" + stmt.row.frecency + ")";
>+                this.log(rowVal);
>+                values.push(rowVal);

Don't change this block.

Altering rowVal in debug mode makes it hard to use form history (because we'll actually autocomplete that value, so you have to keep deleting it). This is a mildly hot loop, so even the this.log() call has some overhead. And I don't want the debug spew in my console. :)


>+++ b/toolkit/components/satchel/test/unit/test_autocomplete.js
...
>+        // ===== Tests with constant timesUsed and varying lastUsed date =====
>+
>+        var now = 1000 * Date.now();
>+        for (let i = 0; i < maxTimeGroupings; i++) {
>+            let useDate = now - (i * timeGroupingSize);
>+            fh.DBConnection.executeSimpleSQL(
>+              "INSERT INTO moz_formhistory "+
>+              "(fieldname, value, timesUsed, firstUsed, lastUsed) " +
>+              "VALUES ("+
>+                  "'field1', " +
>+                  "'value" + i + "', " +
>+                  timesUsedScale * bucketSize * timeGroupingSize + ", " +
>+                  useDate + ", " +
>+                  useDate +
>+              ");");
>+        }

Why not just set timesUsed to "1" here?

I think it would be better to have useDate increasing with i, so that it's going against the expected alphabetical order. Actually, it would be better to add twice as many entries, to test that entries within a bucket are sorted alphabetically.

Eg, results would look like:

   [ v        ]
    | valueE |
    | valueF |
    | valueC |
    | valueD |
    | valueA |
    | valueB |
    ----------

Where valueA is added 1st with oldest time, valueB 2nd with a slightly younger time (but same bucket), etc.

>+        for (let i = 0; i < maxTimeGroupings; i++) {
>+            do_check_true(parseInt(results.getValueAt(i).substr(5)) == ++lastFound);
>+        }

These kind of tests would be clearer as:

do_check_eq(results.getValueAt(i), "value"+x);

That way if unexpected stuff comes in, the test failure will be explicit about what was actually received.

Or, instead of computing x (since the values jump around), just add a big block of

do_check_eq(results.getValueAt(1), "valueA");
do_check_eq(results.getValueAt(2), "valueB");
do_check_eq(results.getValueAt(3), "valueC");

Cut'n'paste is cheap here. :)

>+        // ===== Tests with constant use dates and varying timesUsed =====

Similar comments for the tests in this section.

Also add a test for having some "senior citizen" entries (firstUsed > expireTime), to make sure they're weighted higher.

Also, do all of the existing satchel tests (unit test, mochitest) pass with these changes?
Attachment #385276 - Flags: review?(dolske) → review-
Yes, unit and mochitests pass
Attachment #385276 - Attachment is obsolete: true
Attachment #389058 - Flags: review?(dolske)
Attached patch Patch v.5 (pref. changes) (obsolete) — Splinter Review
Attachment #389058 - Attachment is obsolete: true
Attachment #389173 - Flags: review?(dolske)
Attachment #389058 - Flags: review?(dolske)
Comment on attachment 389173 [details] [diff] [review]
Patch v.5 (pref. changes)


>+        this._initFormExpiryDays();

Move this down with the other initial pref setup.

It would be clearer to just have this._expireDays = this._getFormExpireDays().


>+        let query = "SELECT value, " +
>+                    "ROUND( " +
>+                        "(1.0 + :agedWeight / 100.0 * (firstUsed < :expiryDate)) * " +

The agedWeight usage is weird, because it seems to imply it scales from 0-100, but ends up becoming 1-2.

Just use "max(1.0, :agedWeight * (firstUsed < :expiryDate))"?

And move towards the end of the statement, since the next 2 parts are really the key frecency bits.

>+                        "max(1.0, :maxTimeGroupings - max(0, :now - lastUsed) / :timeGroupingSize) / "+

The inner max(0) shouldn't be needed, if you somehow have entries from the future, they'll just get a bigger bonus. Not worth special casing, I think.

IE, just "max(1.0, :maxTimeGroupings - (:now - lastUsed) / :timeGroupingSize)"

>+                    ") AS frecency " +

It would be good to add a comment right above the statement, briefly explaining the parts/rationale that makes up the the frecency calculation.


>+    _initFormExpiryDays : function () {
>+        if (this._expireDays)
>+            return this._expireDays;

This doesn't need to act like a smart getter, since it's only called once.

>+        this._prefsBranch = Cc["@mozilla.org/preferences-service;1"].
>+                                getService(Ci.nsIPrefBranch);

"Cc" and "getService" should be aligned vertically.

>+        if (this._prefsBranch.prefHasUserValue("browser.formfill.expire_days")) {
>+            this._expireDays = this._prefsBranch.getIntPref("browser.formfill.expire_days");
>+        } else {
>+            this._expireDays = this._prefsBranch.getIntPref("browser.history_expire_days");
>+        }
>+        return this._expireDays;

No { } braces needed since each block is 1 line. Just return the value directly.

  if (x)
      return ...
  else
      return ...


>+function padLeft(number, length) {
>+    var str = number + '';
>+    while (str.length < length) {
>+        str = '0' + str;
>+    }

Nit: no braces needed


>+function getFormExpiryDays () {
>+    let expireDays;
>+    if (prefs.prefHasUserValue("browser.formfill.expire_days")) {
>+        expireDays = prefs.getIntPref("browser.formfill.expire_days");
>+    } else {
>+        expireDays = prefs.getIntPref("browser.history_expire_days");
>+    }
>+    return expireDays;
>+}

Similar comments as above...


>+        fh = Cc["@mozilla.org/satchel/form-history;1"].
>+              getService(Ci.nsIFormHistory2);
>+        fac = Cc["@mozilla.org/satchel/form-autocomplete;1"].
>+                getService(Ci.nsIFormAutoComplete);
>+        prefs = Cc["@mozilla.org/preferences-service;1"].
>+                  getService(Ci.nsIPrefBranch);

Nit: "Cc" and "getService" should be aligned.


>+        for (let i = 0; i < numRecords; i+=2) {
>+            let useDate = now - (i/2 * bucketSize * timeGroupingSize);
>+            fh.DBConnection.executeSimpleSQL(

Wrap this loop with a transaction, it should run faster.


>+        for (let i = 0; i < timesUsedSamples; i++) {
>+            let timesUsed = (timesUsedSamples - i);
>+            fh.DBConnection.executeSimpleSQL(

Ditto.

>+        // ===== 6 =====
>+        // Check search result ordering with empty search term
>+        testnum++;
>+        results = fac.autoCompleteSearch("field2", "v", null);

Wrong comment, search string is "v". :)
Attachment #389173 - Flags: review?(dolske) → review-
Attached patch v.6 address review comments (obsolete) — Splinter Review
Attachment #389173 - Attachment is obsolete: true
Attachment #390120 - Flags: review?(dolske)
Attachment #390120 - Flags: review?(dolske) → review+
Comment on attachment 390120 [details] [diff] [review]
v.6 address review comments

>+         * see https://wiki.mozilla.org/Firefox/Namoroka/Improved_form_history for more details

URLs tend to go stale / drift out of sync with the code, remove.

>+    _getFormExpiryDays : function () {
>+        let prefsBranch = Cc["@mozilla.org/preferences-service;1"].
>+                            getService(Ci.nsIPrefBranch);

Nit: align Cc/getService.
Attached patch Patch v.6 final (fixed nits) (obsolete) — Splinter Review
Attachment #390120 - Attachment is obsolete: true
Keywords: checkin-needed
Attachment #390540 - Attachment is obsolete: true
http://hg.mozilla.org/mozilla-central/rev/913aaa2f38d3
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: mozilla1.9.2 → mozilla1.9.2a1
Flags: in-testsuite+
No longer blocks: 360113
verified with: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2b1pre) Gecko/20090925 Namoroka/3.6b1pre
Status: RESOLVED → VERIFIED
Flags: in-litmus?
Keywords: verified1.9.2
This order is totally stupid. How to find the correct e-mail address or key-words if not in alphabetical order? :-|

Other people request the update of extension Searchbar Autocomplete Order for Firefox 3.6: https://addons.mozilla.org/en-US/firefox/addon/7826 or bug 418343


I can't figure the persons creating this regression really use a browser... All other browsers are in alphabetical order. Please don't lose new users, it's already difficult to convince them.
(In reply to comment #42)
> This order is totally stupid. How to find the correct e-mail address or
> key-words if not in alphabetical order? :-|

You usually type them in :)

Additionally the list is narrowed once you start typing. In my opinion the list is not for "searching key-words" but for filling out forms with your most used words.
(In reply to comment #43)
> In my opinion the list is not for "searching key-words" but for filling
> out forms with your most used words.



The Firefox search engine (ctrl+K) is in the same order. It is for key words.

And the forms are not only for name/address, most of the time they are for words, title, URL (eg Digg), companies (financial sites), search, etc...


I guess you use more often search engines than put your name in a form.
removing in-litmus flag, it no longer exists
Flags: in-litmus?
Depends on: 1325603
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: