Closed Bug 410133 Opened 12 years ago Closed 11 years ago

Create a custom SQL matching/ranking function for functionality and performance

Categories: Toolkit :: Places, defect

Status: RESOLVED WONTFIX

People: Reporter: Mardak, Unassigned

Attachments: 2 files, 2 obsolete files

We're using LIKE for matching titles and URLs in the autocomplete, but we don't need the full functionality of LIKE (multiple %, _, escaping, etc.), and it doesn't provide enough information (its output is 0 or 1).

Using multiple LIKEs on similar patterns (%query%, query%, %.query%) to determine the "betterness" of a match is wasteful.

We can implement our own function that does the same matching and more, while avoiding unnecessary LIKE code paths:

1) Did the query match the string? (If only partially, how much of the query was used?)
2) Is the query an exact match on the target? (How much of the target did it match?)
3) Did it match at the beginning of the string? (How close to the front?)
4) Is it at a word boundary? (How close to a boundary?)

This is useful for matching titles and URLs as well as adaptive input history.
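A minimal C++ sketch of what a single-pass ranking function answering questions 2 to 4 could look like. Instead of running three LIKE patterns, it scores every occurrence of the query once and keeps the best score, with 0 meaning no match. The weights (kMatched, kExact, and so on), the function names, and the ASCII-only case folding are assumptions for illustration. This is not the patch's actual nsNavHistoryRank code. Partial-query matching (the "how much used" part of question 1) is left out, and so is wrapping the function as a mozIStorageFunction for use from SQL.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical weights chosen for illustration; the patch's real constants
// are not given in this bug.
constexpr int kMatched = 10;      // 1) the query occurs in the target
constexpr int kExact = 50;        // 2) the query is the whole target
constexpr int kAtStart = 20;      // 3) the match starts at index 0 ...
constexpr int kFrontWindow = 30;  //    ... plus a bonus that shrinks by 1 per char from the front
constexpr int kAtBoundary = 10;   // 4) the match starts a word

// ASCII-only lowercasing; it keeps the length, so indices stay valid in the original.
std::string lowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// True if a match at index i starts a "word": at the front of the string, after a
// non-alphanumeric character (space, '/', '.', ...), or at a lower-to-upper
// CamelCase transition. Uses the original casing of `target`.
bool startsWord(const std::string& target, std::size_t i) {
  if (i == 0) return true;
  const unsigned char prev = static_cast<unsigned char>(target[i - 1]);
  const unsigned char cur = static_cast<unsigned char>(target[i]);
  if (!std::isalnum(prev)) return true;
  return std::islower(prev) && std::isupper(cur);
}

// Scores every case-insensitive occurrence of `query` in `target` in one pass and
// returns the best score; 0 means no match. Partial-query matching is omitted.
int rankMatch(const std::string& query, const std::string& target) {
  if (query.empty()) return 0;
  const std::string q = lowerAscii(query);
  const std::string t = lowerAscii(target);
  int best = 0;
  for (std::size_t pos = t.find(q); pos != std::string::npos; pos = t.find(q, pos + 1)) {
    int score = kMatched;
    if (pos == 0 && q.size() == t.size()) score += kExact;
    if (pos == 0) score += kAtStart;
    score += std::max(0, kFrontWindow - static_cast<int>(pos));
    if (startsWord(target, pos)) score += kAtBoundary;
    best = std::max(best, score);
  }
  return best;
}
```

With these weights, the sketch agrees with the unit-test ideas listed later in this bug. For example, for "moz" it ranks http://site/TheMoz above http://site/Themoz (CamelCase boundary), and for "mozilla" it ranks "Mozilla" above "Mozillazine" (exact match).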
Attached patch v1 (obsolete)
In the context of URLs..
1) if the title matches, how much of the URL did the query match?
2) typing/modifying a URL will prefer the exact match over one with an extra ?query or #hash
3) give preference to domains near the front (bug 409895)
4) word boundaries like / and CamelCase (bug 393678)

titles..
1) how much of the query matched the title
2) typing out the full/short title
3) match the part of the title that users see first
4) prefer words after spaces

adaptive..
1) support partial adaptive matches
2) adaptive inputs are generally short and can match fully
3) prefer starting matches
Assignee: nobody → edilee
Status: NEW → ASSIGNED
Attachment #294803 - Flags: review?(sspitzer)
Blocks: 410136
Ideas for unit tests..

query: moz
http://moz over http://mac

query: http://site/moz
http://site/moz over http://site/moz?stuff

query: moz
http://site/moz over http://site/other/stuff/moz

query: moz
http://site/TheMoz over http://site/Themoz


query: moz
Mozilla over Mac

query: mozilla
Mozilla over Mozillazine

query: moz
"Mozilla" over "Some blog about Mozilla"

query: mo
"The Mozilla Site" over "Seamonkey"
Blocks: 409023
Attached patch v1.1 (obsolete)
Added a preference for closeness to the end boundary as well, so it can favor fully typed words, e.g., "new" matches "The new stuff" better than "Some news".
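The end-boundary preference can be sketched as a check on the character right after the match. endsWord is a hypothetical helper that mirrors the usual start-of-word rule: the next character is non-alphanumeric, or a lower-to-upper CamelCase transition occurs. The patch's actual boundary rules are not reproduced in this bug.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: does the match occupying [pos, pos + len) of `target`
// end on a word boundary? A match ending on a boundary suggests the user typed
// a whole word ("new" in "The new stuff") rather than a prefix ("new" in "news").
bool endsWord(const std::string& target, std::size_t pos, std::size_t len) {
  if (len == 0 || pos + len > target.size()) return false;  // not a valid match
  const std::size_t end = pos + len;
  if (end == target.size()) return true;  // match runs to the end of the string
  const unsigned char last = static_cast<unsigned char>(target[end - 1]);
  const unsigned char next = static_cast<unsigned char>(target[end]);
  if (!std::isalnum(next)) return true;             // followed by a space, '/', '.', ...
  return std::islower(last) && std::isupper(next);  // the next CamelCase word begins
}
```

A ranking function could add a bonus when a match both starts and ends on a boundary. That way, "new" scores higher in "The new stuff" (followed by a space) than in "Some news" (followed by 's').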
Attachment #294803 - Attachment is obsolete: true
Attachment #294876 - Flags: review?(sspitzer)
Attachment #294803 - Flags: review?(sspitzer)
These builds contain just patch v1.1 and a patch that uses the rank function to filter and sort results. No adaptive stuff here. This still uses chunking by time, so the first cut is still determined mainly by visit date.

https://build.mozilla.org/tryserver-builds/2007-12-29_21:05-edward.lee@engineering.uiuc.edu-awesome.rankend/
Can you measure how much of a perf impact this will have? Esp w/ the first few chars typed.
Attached patch v1.2
For typing a single char 'p'..
I instrumented the AutoCompleteFullHistorySearch method to print out how long it takes to complete and I averaged over ~75 calls to it.
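One portable way to take this kind of measurement is to wrap each call in a steady-clock timer and average the results. The CallTimer class is a hypothetical standard-C++ stand-in. The actual instrumentation added to AutoCompleteFullHistorySearch is not shown in this bug.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: times repeated calls with a monotonic clock and
// reports the mean wall-clock duration in milliseconds.
class CallTimer {
 public:
  template <typename F>
  void time(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    totalMs_ += std::chrono::duration<double, std::milli>(end - start).count();
    ++calls_;
  }

  // Mean duration per call in ms, or 0 if nothing has been timed yet.
  double averageMs() const { return calls_ ? totalMs_ / calls_ : 0.0; }
  std::size_t calls() const { return calls_; }

 private:
  double totalMs_ = 0.0;
  std::size_t calls_ = 0;
};
```

Based on the averages reported below, the rank-20 filter adds about 6.3 ms (roughly 13%) over trunk, and sorting by rank adds about 11.6 ms (roughly 24%).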

trunk: 48.19ms

v1.2 and select those with rank 20+: 54.53ms
WHERE (url_rank >= 20 OR title_rank >= 20)

v1.2 and sort by rank: 59.80ms
ORDER BY MAX(url_rank, title_rank) * 10 + url_rank + title_rank + h.typed * 50 DESC
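The arithmetic of the two SQL clauses above can be written out as plain C++ so that the weights are easy to see. The function names are hypothetical. urlRank and titleRank stand for whatever the rank function returned for the URL and the title, and typed stands for the 0/1 value of h.typed in the query.

```cpp
#include <algorithm>

// Same arithmetic as the v1.2 ORDER BY clause:
//   MAX(url_rank, title_rank) * 10 + url_rank + title_rank + h.typed * 50
int sortScore(int urlRank, int titleRank, int typed) {
  return std::max(urlRank, titleRank) * 10 + urlRank + titleRank + typed * 50;
}

// Same test as the WHERE clause used for the first measurement:
//   url_rank >= 20 OR title_rank >= 20
bool passesFilter(int urlRank, int titleRank) {
  return urlRank >= 20 || titleRank >= 20;
}
```

For example, a URL-only match with url_rank 30 scores 30 * 10 + 30 = 330. A typed page whose URL and title both rank 20 scores 20 * 10 + 20 + 20 + 50 = 290. The MAX term therefore lets one strong match outweigh the typed bonus.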
Attachment #294876 - Attachment is obsolete: true
Attachment #295136 - Flags: review?(sspitzer)
Attachment #294876 - Flags: review?(sspitzer)
Attached image screenshot of v1.2
Trunk matches 'p' for httP, so in comparison..
I know this is not the right place to ask but...
What, if any, method can extension developers use to tweak or override the awesomebar results? I think we will see extensions want some kind of way to "influence" the results.
(In reply to comment #8)
> I know this is not the right place to ask but...
> What, if any, method can extension developers use to tweak or override the
> awesomebar results? I think we will see extensions want some kind of way to
> "influence" the results.

Just looking at the code, an extension could implement mozIStorageFunction and supply that as the ranking function instead. Of course, the current patch only uses nsNavHistoryRank, but it'd be easy enough to change it to use the category manager instead.

Also, «if (PRUnichar('A') <= aChar && aChar <= PRUnichar('Z'))» seems like a slightly unwise thing to do. :)

(In reply to comment #9)
> > What, if any, method can extension developers use to tweak or override the
> > awesomebar results? I think we will see extensions want some kind of way to
> > "influence" the results.
> 
> Just looking at the code, an extension could implement mozIStorageFunction and
> supply that as the ranking function instead. Of course, the current patch only
> uses nsNavHistoryRank, but it'd be easy enough to change it to use the category
> manager instead.
> 
> Also, «if (PRUnichar('A') <= aChar && aChar <= PRUnichar('Z'))» seems like a
> slightly unwise thing to do. :)
> 

Daniel, good idea.

Edward / Seth: can we add something like what Daniel suggests (use category support for the mozIStorageFunction) and also clean up some of the code that isn't extension-friendly?
In response to http://ed.agadak.net/2008/01/not-just-awesome and this bug.

I find the Awesome bar in 3.0b2 perfect as is. I don't see any benefit in trying to enhance the search results (see blog entry); I find the proposed change not an enhancement but a regression.

I'm now using the Awesome bar to find matches inside words or at word endings, not just at the beginning. I can find what I'm looking for very quickly this way.

For example, when looking for "planet" I can type "pl", which finds every "pl" in my URLs in addition to all the "planet" strings. In fact, I read a lot of URLs containing "planet".

So to find one "planet" instead of the other:

- I type "/p" to find all "..../p....", which for me means ".../planet". One result.
- I type "t." to find all "....t.....", which for me means "t.mozilla.org". One result.

If instead I type "pl", I will find both ".../planet" and "planet.mozilla.org" as well as other, less relevant results. I get more precision when searching inside strings.

By the way, I can now find almost anything using only two-letter combinations.

So if this patch doesn't let me do that kind of two-letter search, I would consider it a regression.
(In reply to comment #11)
> In response to http://ed.agadak.net/2008/01/not-just-awesome and this bug.
> 
> I find the Awesome bar in 3.0b2 perfect as is.

In further reply (and because I have an account here and not on Edward Lee's blog): I find the awesome bar a regression for many URLs that I could previously type and *know* FF2.0.x would match from history, e.g.

by typing 'n' it would remember news.bbc.co.uk
by typing 'l' it would remember lwn.net
by typing 'i' it would remember ibank.barclays.co.uk

Now I need to type several letters and am still never sure what it is going to match on; sometimes it's part of the URL, other times it's part of the <TITLE>.

Even turning on browser.urlbar.matchOnlyTyped doesn't make it as friendly as it used to be.

I'd appreciate it if FF3 still showed all the new matches, but listed them further down than the matches on the leftmost part of the URL as typed.
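The ordering requested here can be sketched as a filter followed by a stable partition: keep every new substring match, but list URLs that start with the typed text first. The helper names are hypothetical. The code also assumes that queries and URLs are lowercase and that a leading scheme and "www." are stripped before the prefix test. This is an illustration of the request, not how Firefox 3 ranks results.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical: strip "http://" or "https://", then "www.", so the remaining text
// starts with the host name a user would type (e.g. "news.bbc.co.uk/").
std::string withoutSchemeAndWww(const std::string& url) {
  std::string s = url;
  const char* const kSchemes[] = {"http://", "https://"};
  for (const char* scheme : kSchemes) {
    const std::string prefix(scheme);
    if (s.compare(0, prefix.size(), prefix) == 0) {
      s.erase(0, prefix.size());
      break;
    }
  }
  if (s.compare(0, 4, "www.") == 0) s.erase(0, 4);
  return s;
}

// Keeps every URL that contains `query` (both assumed lowercase) and moves those
// whose host starts with `query` ahead of the rest. stable_partition preserves
// the incoming order (e.g. by visit date) within each group.
std::vector<std::string> leftmostFirst(const std::string& query, std::vector<std::string> urls) {
  urls.erase(std::remove_if(urls.begin(), urls.end(),
                            [&](const std::string& u) {
                              return withoutSchemeAndWww(u).find(query) == std::string::npos;
                            }),
             urls.end());
  std::stable_partition(urls.begin(), urls.end(), [&](const std::string& u) {
    return withoutSchemeAndWww(u).compare(0, query.size(), query) == 0;
  });
  return urls;
}
```

Typing 'n' would then put news.bbc.co.uk first while still listing other URLs that contain an 'n', such as lwn.net, further down.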
I haven't tried it myself, but because I'm just adding a custom SQL function, an extension would only need to unregister this function and register its own.

See what I did to register a JavaScript SQL callback function for the download manager:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/toolkit/mozapps/downloads/content/downloads.js&rev=1.127&mark=107-111#107
(In reply to comment #13)
> I haven't tried myself, but because I'm just adding a custom sql function, an
> extension would only need to unregister this function and register their own.
> 
> See what I did to register a javascript sql callback function for the download
> manager:

That wouldn't work in this case, because the C++ code does «new nsNavHistoryRank» in order to initialize the sql function. If it obtained the object using a contract id instead, then an extension could register their own implementation under that contract id, essentially replacing your implementation. I only suggest the category manager because it would make it easier for the extension to give the user multiple choices.
I agree with Daniel: you need to create it via contract ID. If there were a way to pass a sub-name to the contract ID (like autosuggest does), that would make it very easy for extensions to modify the algorithm.
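The difference between hard-coding «new nsNavHistoryRank» and looking the implementation up by name can be sketched in plain C++. This is only an analogy. RankFunction, RankRegistry, SubstringRank and PrefixRank are hypothetical names, and the map stands in for XPCOM's contract-ID and category-manager lookup rather than using either API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface standing in for mozIStorageFunction.
struct RankFunction {
  virtual ~RankFunction() = default;
  virtual int rank(const std::string& query, const std::string& target) const = 0;
};

// Default implementation: 1 if the query occurs anywhere in the target.
struct SubstringRank : RankFunction {
  int rank(const std::string& q, const std::string& t) const override {
    return t.find(q) != std::string::npos ? 1 : 0;
  }
};

// An alternative an extension might supply: only prefix matches count.
struct PrefixRank : RankFunction {
  int rank(const std::string& q, const std::string& t) const override {
    return t.compare(0, q.size(), q) == 0 ? 1 : 0;
  }
};

// Name-to-factory lookup, analogous to creating a component by contract ID.
class RankRegistry {
 public:
  using Factory = std::function<std::unique_ptr<RankFunction>()>;

  // Registering under an existing name replaces the previous implementation.
  void registerFactory(const std::string& name, Factory f) { factories_[name] = std::move(f); }

  // Returns a new instance, or nullptr if nothing is registered under `name`.
  std::unique_ptr<RankFunction> create(const std::string& name) const {
    const auto it = factories_.find(name);
    if (it == factories_.end()) return nullptr;
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

The code that runs the query only ever asks the registry for a name. An extension that registers a different factory under the same name therefore replaces the default, and registering several names would let the user choose between algorithms, which is the benefit Daniel sees in the category manager.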
Edward, see bug 394038 comments #48 and #50 for a potential issue.
Attachment #295136 - Flags: review?(moco)
No longer blocks: 409023
We've switched to processing results in the Places code for custom matching. Bug 399213 would cover making it more extensible, potentially allowing other sources and filters.
Assignee: edilee → nobody
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Component: Location Bar and Autocomplete → Places
Product: Firefox → Toolkit
QA Contact: location.bar → places
Resolution: --- → WONTFIX