Closed Bug 725821 Opened 12 years ago Closed 12 years ago

Use an FTS to speed up awesomebar searches

Categories

(Toolkit :: Storage, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: taras.mozilla, Unassigned)

Details

(Whiteboard: [Snappy:P1])

We currently do a bunch of expensive SQL to power awesomebar searches. A l10n/unicode-friendly FTS would be a huge speedup.
Thunderbird has a version of the porter stemmer that can handle non-ASCII stuff and does bi-gram token generation for CJK stuff:

http://mxr.mozilla.org/comm-central/source/mailnews/extensions/fts3/src/fts3_porter.c
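For readers unfamiliar with bi-gram token generation, here is a minimal Python sketch of the general idea. It is not the Thunderbird code linked above: the Unicode ranges, the helper names (`is_cjk`, `char_class`, `tokenize`) and the omission of stemming are all assumptions made for illustration. Runs of CJK characters are split into overlapping two-character tokens, while other alphanumeric runs become one lowercased token each.

```python
from itertools import groupby


def is_cjk(ch):
    """Rough CJK test (assumed ranges, not Thunderbird's actual tables)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def char_class(ch):
    """Classify a character as CJK, ordinary word character, or separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def tokenize(text):
    """Emit one token per non-CJK word and overlapping bi-grams per CJK run."""
    tokens = []
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        if cls == "word":
            tokens.append(run.lower())
        elif cls == "cjk":
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        # separators produce no tokens
    return tokens


tokens = tokenize("Mozilla 東京都庁 search")
```

Overlapping bi-grams mean that any query of two or more CJK characters can be matched against indexed tokens without a dictionary-based word segmenter.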
Yeah, the first test implementation should probably use the Thunderbird tokenizer. If we find issues, we can start exploring other options.
Drew already did a lot of research on this in the past and collected notes. IIRC, in the end he found no actual speed gain, and for matches at the beginning of strings there may even have been a small loss. Here, though, we aren't really after raw speed; we're looking for a way to avoid full table scans and save on IO.
(In reply to Taras Glek (:taras) from comment #0)
> We currently do a bunch of expensive SQL to power awesomebar searches. A
> l10n/unicode-friendly FTS would be a huge speedup.

Which FTS? Where's the data showing this is a win?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> (In reply to Taras Glek (:taras) from comment #0)
> > We currently do a bunch of expensive SQL to power awesomebar searches. A
> > l10n/unicode-friendly FTS would be a huge speedup.
> 
> Which FTS? Where's the data showing this is a win?

It has to be measured. Drew did some measurements in the past and found that in the most common cases the perf is the same. Currently, though, a query with no matches, or one whose results come late, has to read the whole table and scan every single row, because there is no index to use. Considering the table is many MB of data, that's a lot of IO for each typed character.
As an added note, practically 90% of Places memory usage is caused by awesomebar searches.
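The full-scan behaviour described here can be observed with SQLite's EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module and a hypothetical `pages` table with a `pages_title` index; it is not the real Places schema or the actual awesomebar query. It only shows that a substring LIKE cannot use an ordinary index, while an equality lookup can.

```python
import sqlite3

# Hypothetical stand-in table, not the real Places schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT)")
conn.execute("CREATE INDEX pages_title ON pages (title)")
conn.executemany(
    "INSERT INTO pages (url, title) VALUES (?, ?)",
    [
        ("https://www.mozilla.org/", "Mozilla"),
        ("https://bugzilla.mozilla.org/", "Bugzilla Main Page"),
        ("https://example.com/", "Example Domain"),
    ],
)


def plan(sql, params=()):
    """Return the human-readable detail column of each query plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Substring match: the planner has no index to use and visits every row.
substring_plan = plan("SELECT url FROM pages WHERE title LIKE ?", ("%zilla%",))

# Equality match: the planner can seek directly through the index.
exact_plan = plan("SELECT url FROM pages WHERE title = ?", ("Mozilla",))

print(substring_plan)
print(exact_plan)
```

The exact wording of the plan text varies between SQLite versions, but the first plan reports a SCAN of the table and the second a SEARCH using `pages_title`.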
I e-mailed Drew Willcoxon regarding his past experience adding FTS to AwesomeBar. These were his main comments:

- We didn't have a good way to tokenize non-ASCII text, and without a good tokenizer FTS is doomed

- FTS worked shockingly well on whole-word queries, but on prefix queries it was sometimes actually much worse than the full-table, linear scan done by LIKE.  Also, AwesomeBar needs arbitrary substring matching. 

- Sharding didn't make enough difference, especially since we need to find the top dozen or so results ordered by frecency
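
To illustrate the whole-word versus substring point in the comments above, here is a toy inverted index in Python. It is only a sketch under stated assumptions: the sample titles, the whitespace tokenizer and the function names are made up for illustration, and a real FTS engine stores its terms quite differently. The part that carries over is that a token index answers "which rows contain this exact word" directly, but has nothing to look up for a fragment in the middle of a word.

```python
from collections import defaultdict

# Made-up sample data: row id -> page title.
titles = {
    1: "mozilla firefox release notes",
    2: "bugzilla main page",
    3: "firefox addons",
}

# Inverted index: token -> set of row ids containing that token.
index = defaultdict(set)
for row_id, title in titles.items():
    for token in title.split():
        index[token].add(row_id)


def whole_word(term):
    """One dictionary lookup, independent of the number of rows."""
    return set(index.get(term, set()))


def prefix(term):
    """Union the row sets of every indexed token starting with the prefix."""
    hits = set()
    for token, row_ids in index.items():
        if token.startswith(term):
            hits |= row_ids
    return hits


def substring_scan(term):
    """Fallback for arbitrary substrings: look at every row, like LIKE '%term%'."""
    return {row_id for row_id, title in titles.items() if term in title}


print(whole_word("firefox"))    # rows 1 and 3
print(prefix("fire"))           # rows 1 and 3
print(whole_word("zilla"))      # empty: "zilla" is not a token
print(substring_scan("zilla"))  # rows 1 and 2
```

In this toy, a short prefix can expand to many tokens whose row sets all have to be merged. That may be one reason prefix queries can end up costing more than expected, though that is an assumption and not something measured in this bug.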
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX