Use an FTS in Places
Categories
(Toolkit :: Places, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox54 | --- | affected |
People
(Reporter: mak, Unassigned)
References
(Depends on 2 open bugs, Blocks 4 open bugs)
Details
(Whiteboard: [fxsearch])
Reporter | ||
Comment 1•9 years ago
|
||
Comment 2•7 years ago
|
||
Reporter | ||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
Reporter | ||
Comment 7•7 years ago
|
||
Comment 8•7 years ago
|
||
Comment 9•7 years ago
•
|
||
Ah, so I looked into doing this in mozilla/application-services (e.g. rust-places), and there's a larger issue than tokenizers: FTS queries don't support non-prefix matches.
That is to say, it only supports MATCH_BEGINNING, e.g. if you're searching for 'ample', there's no way to use the FTS index such that it would find 'example'. (exam
would find it though, since it has good support for prefix matches, although they can have some perf issues unless you add a prefix index).
Because of this, it seems like you would need to always have the current logic present as a fallback. It's possible this would still be good enough, if we only use the fallback when the FTS query doesn't return enough results. Still, this would be a degradation in terms of search quality.
It's worth noting that we'd need a custom tokenizer still to implement our notion of 'boundary' also.
Reporter | ||
Comment 10•7 years ago
|
||
(In reply to Thom Chiovoloni [:tcsc] from comment #9)
That is to say, it only supports MATCH_BEGINNING, e.g. if you're searching for 'ample', there's no way to use the FTS index such that it would find 'example'. (
exam
would find it though, since it has good support for prefix matches, although they can have some perf issues unless you add a prefix index).Because of this, it seems like you would need to always have the current logic present as a fallback. It's possible this would still be good enough, if we only use the fallback when the FTS query doesn't return enough results. Still, this would be a degradation in terms of search quality.
It's worth noting that we'd need a custom tokenizer still to implement our notion of 'boundary' also.
Yeah, I'm not sure in general the value of matching anywhere is so critical... For our use cases matching on boundaries may be enough and it's the default behavior. Surely to have proper boundaries we need a proper tokenizer, for urls for example.
Comment 11•7 years ago
|
||
As more Thunderbird use-case context, Thunderbird's global search autocomplete logic used to build[1] suffix trees[2] built against a preloaded set of the user's more popular contacts and tags/labels so that you could type "drew" and match "Andrew" even though the FTS index would never find that.
While it would be possible to build a suffix-tree just from the most "frecent" data, it should also be possible to use the fts5vocab virtual table[3] to periodically extract a "likely terms" list from the database and build a suffix-tree over those terms in order to use that as a basis for FTS queries for those discovered terms.
I should also note that Thunderbird's plan going forward had been to engage in proactive database sharding, partitioning databases based on both time and frecency-esque factors which were already used for ordering results lists. That way we would query a smaller set of data that was much more likely to contain what we'd display at the top of the results list, and then only continue sequentially opening and searching the other databases until we obtained enough results or reached an arbitrary threshold before showing a "search more" affordance.
1: It looks like autocomplete is broken on Ubuntu's current Thunderbird build, presumably due to platform changes.
2: https://en.wikipedia.org/wiki/Suffix_tree with my probably not amazing implementation at https://searchfox.org/comm-central/source/mailnews/db/gloda/modules/suffixtree.js
3: https://www.sqlite.org/fts5.html#the_fts5vocab_virtual_table_module
Updated•3 years ago
|
Description
•