Closed Bug 808872 Opened 12 years ago Closed 4 years ago

Use text indexing for AwesomeBar searches

Categories

(Firefox for Android Graveyard :: Data Providers, defect, P5)

All
Android
defect

Tracking

(fennec+)

RESOLVED INCOMPLETE
Tracking Status
fennec + ---

People

(Reporter: rnewman, Unassigned)

References

(Depends on 1 open bug)

Details

(Whiteboard: [Snappy][diamond][lang=java][needs android expertise])

Attachments

(1 file, 1 obsolete file)

This is probably in the wrong component, but I'm hoping it'll be duped, so hey :) Right now, queries in the AwesomeBar use a SQL string containment query to filter inline in the frecency order query. That requires a full-table scan, which of course is linear in database size, which is why we've been spending so much time lately working on expiry, pruning, dropping icons, etc. Our database indices are largely useless for this reason. That it takes 3+ seconds *warm* to filter for a search term in history on my Galaxy S3 is an opportunity for improvement. So how about this: build a text index on our search columns. In Postgres that'd be easy -- just add a trigram index. On Android there are other issues, so we'll probably have to lift out of the DB, but I'm vaguely optimistic that we can index in a way that beats our current times and works with our memory limitations. This also offers the opportunity to do fancy awesomebar improvements like typo correction (fanout in the index search according to Levenshtein distance…). Folks: please dupe or WONTFIX as appropriate :)
Thanks for reporting this rnewman. Apparently, Android's SQLite ships with a FTS extension that we could use. See: http://blog.andresteingress.com/2011/09/30/android-quick-tip-using-sqlite-fts-tables/ http://sqlite.org/fts3.html This looks very promising. Definitely worth trying.
tracking-fennec: --- → ?
Assignee: nobody → lucasr.at.mozilla
tracking-fennec: ? → +
Assignee: lucasr.at.mozilla → nobody
Blocks: 937179
Hardware: ARM → All
Whiteboard: [Snappy] → [Snappy][mentor=rnewman][diamond bug][lang=java][needs android expertise]
Mentor: rnewman
Whiteboard: [Snappy][mentor=rnewman][diamond bug][lang=java][needs android expertise] → [Snappy][diamond bug][lang=java][needs android expertise]
Whiteboard: [Snappy][diamond bug][lang=java][needs android expertise] → [Snappy][diamond][lang=java][needs android expertise]
(In reply to Lucas Rocha (:lucasr) from comment #1) > Thanks for reporting this rnewman. Apparently, Android's SQLite ships with a > FTS extension that we could use. See: > > > http://blog.andresteingress.com/2011/09/30/android-quick-tip-using-sqlite- > fts-tables/ > http://sqlite.org/fts3.html > > This looks very promising. Definitely worth trying. This looks promising but doesn't seem to solve quite the problem we want. See the latter paragraph of the introductory section: http://sqlite.org/fts3.html#section_1 This system matches only complete tokens. While the tokeniser is customisable (and we could write our own, I suppose), none of the provided tokenisers break words into fragments (though there is a Porter stemmer, which is quite exciting). This means that we'll never have any sensible suggestions until the user finishes typing a word: significantly less useful than what we have right now (though faster, of course). I think we need to be careful not to severely reduce suggestion quality in our quest to make the performance suck less. Richard mentioned trigram indices in comment 0: one of those would do nicely. These seem to exist for SQLite: http://jonasfj.dk/blog/2012/08/trilite-fast-string-matching-in-sqlite/ https://github.com/jonasfj/trilite If we used that, we'd never have any suggestions until you've typed three letters, then things would get sensible. It's possible the post-processing step of the trigram index query might eat into our performance gain: unclear.
On the other hand, the documentation Trilite says "do NOT use TriLite in a production environment, unless you are me.", so maybe what we really want is an evil hack on top of FTS using a custom tokeniser. But, yes, this unfortunately isn't as simple as "plug in FTS and have a party" :( ... That said, there is one obvious and straightforward optimisation we could do with FTS, though its usefulness is questionable: The query in the address bar can be considered a sequence of complete tokens and a single incomplete token (the one the user is in the process of typing). We could run a fast query with FTS using the complete tokens first and then run an evil table scan with the incomplete token afterwards (unioning the results when the slow query gets back to us). It's unclear if that situation happens often enough to be considered useful.
(In reply to Chris Kitching [:ckitching] from comment #2) > This system matches only complete tokens. While the tokeniser is > customisable (and we could write our own, I suppose), none of the provided > tokenisers break words into fragments (though there is a Porter stemmer, > which is quite exciting). This means that we'll never have any sensible > suggestions until the user finishes typing a word: significantly less useful > than what we have right now (though faster, of course). I think we need to > be careful not to severely reduce suggestion quality in our quest to make > the performance suck less. Aaand forget most of that: I somehow managed to miss that FTS has prefix querying. That *is* cool. This seems like just what we need, then...
Well, I've got a patch for this that seems to work. My main problem at this point is Android appears to crash hideously when I attempt to use the notindexed flag for fts. That's used to prevent the FTS from indexing fields we don't want it to, and can at worst be worked around by using a view (though I'd prefer to avoid the clutter by just using a cleverly-configured external-content FTS table connected to "combined" with notindexed set on everything that's not title or url). This patch is somewhat rough (with debugging code and whatnot still left in)./ Is the approach taken sane? This feels too easy. Did I miss something stupid, or should I get this cleaned up and landed?
Assignee: nobody → chriskitching
Status: NEW → ASSIGNED
Attachment #8467530 - Flags: feedback?(rnewman)
Comment on attachment 8467530 [details] [diff] [review] Use SQLite's FTS module to accelerate AwesomeBar queries. Review of attachment 8467530 [details] [diff] [review]: ----------------------------------------------------------------- ::: mobile/android/base/db/BrowserDatabaseHelper.java @@ +737,5 @@ > + public void createFreeTextIndex(SQLiteDatabase db) { > + Log.e("TOASTERTIME", "Creating free text indexes!"); > + > + // Create the virtual table to index the combined view. > + db.execSQL("CREATE VIRTUAL TABLE IF NOT EXISTS " + COMBINED_FREE_TEXT_INDEX + " USING fts4(" + fts4 is API 11+ only. This is probably relevant to your interests: http://stackoverflow.com/questions/2421189/version-of-sqlite-used-in-android/4377116#4377116 @@ +748,5 @@ > + Combined.DATE_LAST_VISITED + "," + > + Combined.FAVICON_ID + "," + > + // Don't generate a free text index on the columns that we don't care about... > + // TODO: Figure out why this is broken on Android... :/ > +// "notindexed=\"" + Combined.BOOKMARK_ID + "\"," + These don't need quotes. @@ +750,5 @@ > + // Don't generate a free text index on the columns that we don't care about... > + // TODO: Figure out why this is broken on Android... :/ > +// "notindexed=\"" + Combined.BOOKMARK_ID + "\"," + > +// "notindexed=\"" + Combined.HISTORY_ID + "\"," + > +// "notindexed=\"" + Combined._ID + "\"," + If this is _rowid, you might not be able to avoid indexing it, assuming sqlite doesn't just ignore numeric columns. @@ +756,5 @@ > +// "notindexed=\"" + Combined.DATE_LAST_VISITED + "\"," + > +// "notindexed=\"" + Combined.FAVICON_ID + "\"," + > + "matchinfo=fts3," + > + // Suck information you don't have from the combined view: allowing us to query this table > + // as if it werre the combined view. were @@ +757,5 @@ > +// "notindexed=\"" + Combined.FAVICON_ID + "\"," + > + "matchinfo=fts3," + > + // Suck information you don't have from the combined view: allowing us to query this table > + // as if it werre the combined view. > + "content=\"" + VIEW_COMBINED + "\"," + If this continues to fail, consider trying separate indexes on history and bookmarks, avoiding the use of a view here.
(In reply to Chris Kitching [:ckitching] from comment #5) > Well, I've got a patch for this that seems to work. My main problem at this > point is Android appears to crash hideously when I attempt to use the > notindexed flag for fts. What's the error? What does the log say with setprop log.tag.Database VERBOSE ?
Comment on attachment 8467530 [details] [diff] [review] Use SQLite's FTS module to accelerate AwesomeBar queries. Clearing flag after discussing in person. This is the right direction.
Attachment #8467530 - Flags: feedback?(rnewman)
Mentor: rnewman
See Also: → 1054580
So this turns out to not be nearly as straightforward as expected. At first, it is tempting to try constructing an FTS index for the bookmarks and history tables independently. Then, we could make a modified combined view that uses these two indexes. That's not possible: if a column in a view contains values from more than one FTS table, use of the MATCH keyword causes a crash. Okay, so we can have a history_fts and a bookmarks_fts column in our new view, and do something like "history_fts MATCH ... OR bookmarks_fts MATCH ... ", right? Nope. The MATCH keyword cannot be used in the context of an OR. (it has a bunch of other restrictions, too). Plus, two MATCH operations would be slower. So, the way forward that seems destined to not fail hideously is to construct a single FTS table on the combined view itself. We could then construct a combined_with_fts view which is `combined JOIN combined_fts` (where combined_fts is the FTS virtual table). ... Join on what, exactly? For a vanilla table, we would just JOIN on the auto-incrementing id field. Then you end up with a view containing both the normal and the FTS-indexed versions of the data and make sure to write your WHERE conditions carefully. This is necessary because one cannot create *ordinary* indexes on an FTS virtual table. Equality operations on an FTS table are both inaccurate (due to data discarded by the tokeniser) and slow. As a result, we'd have url, url_fts, title, title_fts columns in our new view. We'd then rewrite equality checks to hit url (which would use the ordinary index) and MATCH checks to use url_fts (which would use the FTS index). However, combined has no equivalent of an id field. Adding a new column that is just ROW_NUMBER() is no good, because this is prone to change randomly as stuff gets inserted "into the middle" of the view (which can occur due to modifcations of the bookmarks/history tables). That'd cause the docids in our FTS table to no longer match the id value, and our suggestions to all be nonsense. Each row in the combined view has either a history_id, a bookmarks_id, or both. Each of these ids is an auto-increment field that started from zero, so they're not unique (you can have two rows with a.history_id = b.bookmark_id). Were history_ids and bookmark_ids mutually unique, we could JOIN combined to combined_fts on "COALESCE(bookmark_id, history_id) = combined_fts.docid" (remembering that an FTS table consists of primary key "docid" field, plus some indexed text fields.). It seems like this might be a neat approach to the problem. Any objections to my rewriting absolutely all the ids in the database?
Actually, a ridiculous alternative idea occurs to me that avoids needing to rewrite every id in the universe. Add Integer.MAX_VALUE/2 to every bookmark (or history) id. Rows in the FTS table will no longer be colliding (you won't have a history entry in the FTS index that wants to have the same id as a totally unrelated bookmarks index), and all will be well. Until someone's history table grows astronomically large. ... Thoughts on this hack? Alternative ideas welcomed?
I misspoke slightly above: It's also impossible to have separate URL and title columns in the combined FTS index, because you cannot use MATCH with OR. So we cannot say: SELECT url, title FROM combined_with_fts WHERE url_fts MATCH "*cake*" OR title_fts MATCH "*cake*" Which is what we would want to do. Instead, we'd need to make a single column (probably holding the concatenation of url and title) to use in our WHERE clause.
Attachment #8467530 - Attachment is obsolete: true
Well, here's my current work. This patch should prettymuch do the trick: does this look like a sane way to go? My concern is that testing the SQLite query performance on my laptop shows that queries on combined_with_fts are exceptionally slow (about ten times slower than the old LIKE query), yet EXPLAIN QUERY PLAN is not showing anything scary. This can't be right: comparing performance of a straightforward query on history/bookmarks using LIKE vs. using FTS (plus a join) shows a big win for FTS on any nontrivally-sized database, yet blend it with the combined view in this way and it all goes horribly wrong. :(
Does this problem get any easier if we manually insert into a separate table of arbitrary schema, rather than relying on indexing existing tables or views?
(In reply to Richard Newman [:rnewman] from comment #14) > Does this problem get any easier if we manually insert into a separate table > of arbitrary schema, rather than relying on indexing existing tables or > views? You mean to build a table more amenable to full-text indexing, then index that and use it for Awesomebar queries? Sure, we could do that. The most obvious approach would be to have a table that is just an actualised version of the combined view that we keep in sync and give an auto-increment id column to. That's a sort of ugly solution, though: keeping it in sync would need quite a lot of evil spaghetti in a lot of places. A nicer approach would be to eliminate the duplication of data between the history/bookmarks table. If we created a "webpages" table that held, say, url, title, favicon_id (and maybe a few other things: stuff that is associated with a page), then we could greatly simplify our awesomebar query. We could then eliminate the UNION ALL clause from the combined view and rewrite it to use a FTS index defined over the "webpages" table. The bookmarks/history tables would be redefined to refer to the webpages table for those columns. Hopefully SQLite isn't stupid enough for the cost of the new JOINs we'd need to be troublesome...
filter on [mass-p5]
Priority: -- → P5
Assignee: chriskitching → nobody
Status: ASSIGNED → NEW
See Also: → 1173164
See Also: → 775672
Depends on: 1425333
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195 Needinfo :susheel if you think this bug should be re-triaged.
We have completed our launch of our new Firefox on Android. The development of the new versions use GitHub for issue tracking. If the bug report still reproduces in a current version of [Firefox on Android nightly](https://play.google.com/store/apps/details?id=org.mozilla.fenix) an issue can be reported at the [Fenix GitHub project](https://github.com/mozilla-mobile/fenix/). If you want to discuss your report please use [Mozilla's chat](https://wiki.mozilla.org/Matrix#Connect_to_Matrix) server https://chat.mozilla.org and join the [#fenix](https://chat.mozilla.org/#/room/#fenix:mozilla.org) channel.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → INCOMPLETE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: