Bug 1184823 - Elastic Quicksearch
Status: NEW
: bmo-goal
Product: bugzilla.mozilla.org
Classification: Other
Component: Search
: Production
: Unspecified Unspecified
-- normal (vote)
: ---
Assigned To: Dylan Hardison [:dylan]
Mentors:
Depends on: 1307485 1274418 1316660
Blocks:
Reported: 2015-07-16 20:53 PDT by Dylan Hardison [:dylan]
Modified: 2017-01-04 13:50 PST
7 users (show)
See Also:
Due Date:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
WIP.patch (14.38 KB, patch)
2016-03-04 14:19 PST, Dylan Hardison [:dylan]
no flags
1184823_3.patch (43.96 KB, patch)
2016-03-07 10:30 PST, Dylan Hardison [:dylan]
no flags
1184823_4.patch (58.11 KB, patch)
2016-03-28 20:28 PDT, Dylan Hardison [:dylan]
no flags
1184823_5.patch (109.30 KB, patch)
2016-04-04 21:17 PDT, Dylan Hardison [:dylan]
dkl: review-
1184823_6.patch (65.92 KB, patch)
2016-04-07 09:55 PDT, Dylan Hardison [:dylan]
no flags

Description Dylan Hardison [:dylan] 2015-07-16 20:53:12 PDT
Tracking bug for work to make quicksearch faster via ElasticSearch.
Comment 1 Byron Jones ‹:glob› 2015-07-16 21:21:47 PDT
most of this looks sane to me.

>  Write translation layer to translate quicksearches into ES queries.
>  Modifications in the form of additional hooks to the BMO and
>  TrackingFlags extensions.

don't use hooks - update the search code directly.

> Some consideration to paging will need to be thought of.

pagination is a non-goal of this work.
Comment 2 David Lawrence [:dkl] 2015-07-17 09:08:36 PDT
(In reply to Byron Jones ‹:glob› from comment #1)
> >  Write translation layer to translate quicksearches into ES queries.
> >  Modifications in the form of additional hooks to the BMO and
> >  TrackingFlags extensions.
> 
> don't use hooks - update the search code directly.

Ugh. Wish this was happening after the master merge. Anything we add to core now is that much more possible conflict to resolve. Wish extension hooks did not come with the overhead we try to avoid as they make this type of thing much easier :(

dkl
Comment 3 Dylan Hardison [:dylan] 2015-07-17 14:22:18 PDT
(In reply to Byron Jones ‹:glob› from comment #1)
> most of this looks sane to me.
> 
> >  Write translation layer to translate quicksearches into ES queries.
> >  Modifications in the form of additional hooks to the BMO and
> >  TrackingFlags extensions.
> 
> don't use hooks - update the search code directly.
Treat those extensions as if they were core? Okay.

> > Some consideration to paging will need to be thought of.
> 
> pagination is a non-goal of this work.

Well, the simple case is a max limit on results. I'll do that.
Comment 4 Dylan Hardison [:dylan] 2015-07-17 14:23:50 PDT
(In reply to David Lawrence [:dkl] from comment #2)
> (In reply to Byron Jones ‹:glob› from comment #1)
> > >  Write translation layer to translate quicksearches into ES queries.
> > >  Modifications in the form of additional hooks to the BMO and
> > >  TrackingFlags extensions.
> > 
> > don't use hooks - update the search code directly.
> 
> Ugh. Wish this was happening after the master merge. Anything we add to core
> now is that much more possible conflict to resolve. Wish extension hooks did
> not come with the overhead we try to avoid as they make this type of thing
> much easier :(

There are only a few places in the code where I will be making changes, buglist.cgi and some related areas.
Mostly the code will be in the form of new modules (Bugzilla::Search::Elastic::*, and a PushConnector).
Comment 5 Mark Côté [:mcote] 2015-10-13 08:39:30 PDT
This got derailed due to security work, so downgrading this from a goal to a "big" item until we catch up.
Comment 6 Dylan Hardison [:dylan] 2016-02-08 10:58:45 PST
Progress on this has been good, actually!

Comments are now stored as parent/child relations to the bugs. After reading and re-reading the Elasticsearch book (the O'Reilly one?), I believe this is the best approach for our data.
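
As a rough illustration of what a parent/child relation between bugs and comments could look like, here is a minimal sketch assuming the Elasticsearch 2.x mapping syntax that was current in early 2016. The type names `bug` and `comment` and the field names `short_desc` and `body` are assumptions for the example, not taken from the patch.

```python
# Hypothetical index definition: each comment is a child document of a bug.
# Type and field names are illustrative, not taken from the patch.
index_body = {
    "mappings": {
        "bug": {
            "properties": {
                "short_desc": {"type": "string"},
            }
        },
        "comment": {
            # Declares that every "comment" document has a "bug" parent.
            "_parent": {"type": "bug"},
            "properties": {
                "body": {"type": "string"},
            },
        },
    }
}


def bugs_with_comment_matching(text):
    """Build a query for bugs that have at least one comment matching `text`."""
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"match": {"body": text}},
            }
        }
    }
```

With a layout like this, a comment can be indexed or updated without reindexing its bug, which may be part of why the approach was attractive; the thread itself does not spell out the reasoning.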

Meanwhile, adding tracking flags has caused the spectre of the out-of-memory killer to rear its head.
Using $user->address was also leaking memory (filed a bug for the UserProfile as well).

In addition to getting bulk loading right, I have been cooking up an API to get "what has changed" information out of Bugzilla. For non-bug objects, I'm using the audit tables. Bug objects themselves provide enough information.

What still remains an open question: there could be huge performance impacts from users changing their names (and thus invalidating many, many Elasticsearch records). :(
Comment 7 Dylan Hardison [:dylan] 2016-03-04 14:19:01 PST
Created attachment 8726942
WIP.patch

Note this doesn't pass sanity tests, or have boilerplate. It's been severely gutted and re-architected a few times now.

However, the bulk_index.pl script works. It can work incrementally (but you have to comment out the ->create_mapping() call), and it is pretty fast.

Meanwhile, it is able to parse the most common types of quicksearches.

The plan is for it to raise exceptions when asked for fields that don't exist in ES -- it doesn't do that yet.

You can index your Bugzilla DB with the indexer and use search.pl to generate ES-compatible queries. But mostly this is just up for general feedback.
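
To make the idea of a translation layer concrete, here is a minimal sketch of turning a quicksearch string into an Elasticsearch query body and raising an exception for fields the index does not have. It is written in Python for brevity (the patch itself is Perl), it handles only `field:value` tokens and bare words, and `INDEXED_FIELDS`, `MAX_RESULTS`, and `UnsupportedField` are hypothetical names invented for the example. The real quicksearch grammar is considerably richer.

```python
class UnsupportedField(Exception):
    """Raised when a quicksearch names a field the index does not have."""


# Hypothetical set of indexed fields; the real list lives in the patch.
INDEXED_FIELDS = {"product", "component", "short_desc", "keywords"}
MAX_RESULTS = 500  # assumed cap on results, standing in for pagination


def quicksearch_to_es(quicksearch):
    """Translate 'field:value' tokens and bare words into an ES bool query."""
    must = []
    for token in quicksearch.split():
        field, sep, value = token.partition(":")
        if sep and value:
            if field not in INDEXED_FIELDS:
                raise UnsupportedField(field)
            must.append({"match": {field: value}})
        else:
            # Bare words search the summary in this sketch.
            must.append({"match": {"short_desc": token}})
    return {"size": MAX_RESULTS, "query": {"bool": {"must": must}}}
```

For example, `quicksearch_to_es("product:Firefox crash")` produces a `bool` query with two `match` clauses, while `quicksearch_to_es("votes:5")` raises `UnsupportedField`.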
Comment 8 Dylan Hardison [:dylan] 2016-03-07 08:27:31 PST
I got around to making my "find changed bugs" logic use bugs_activity. 
Considering this is going to be run against (a slave) every 20s, I'd like a DBA to look it over.


SELECT DISTINCT bug_id
  FROM bugs_activity
  JOIN fielddefs ON fieldid = fielddefs.id
 WHERE UNIX_TIMESTAMP(bug_when) > $mtime
   AND fielddefs.name IN ('keywords', 'short_desc', 'product', 'component',
                          'cf_crash_signature', 'alias', 'status_whiteboard', 'bug_status', 'resolution')

How horrible is this going to be? What can I do to make it better, if anything?
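
One factor that usually matters for a query of this shape is whether the filter on bug_when can use an index. Wrapping an indexed column in a function generally prevents that, whereas comparing the bare column against a converted constant does not. The sketch below shows the effect using SQLite from Python as a stand-in for MySQL; the two-column table, the index name, and the sample rows are assumptions for the example, and the real query also joins fielddefs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs_activity (bug_id INTEGER, bug_when TEXT);
    CREATE INDEX bugs_activity_bug_when_idx ON bugs_activity (bug_when);
    INSERT INTO bugs_activity VALUES
        (1, '2016-03-01 00:00:00'),
        (2, '2016-03-07 09:00:00'),
        (3, '2016-03-07 09:30:00');
""")

# Column wrapped in a function, like UNIX_TIMESTAMP(bug_when) > $mtime.
WRAPPED = """SELECT DISTINCT bug_id FROM bugs_activity
             WHERE CAST(strftime('%s', bug_when) AS INTEGER) > ?"""
# Bare column compared to a converted constant, like
# bug_when > FROM_UNIXTIME($mtime).
BARE = """SELECT DISTINCT bug_id FROM bugs_activity
          WHERE bug_when > datetime(?, 'unixepoch')"""


def plan(sql, mtime):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (mtime,)).fetchall()
    return " | ".join(row[-1] for row in rows)


mtime = 1457308800  # 2016-03-07 00:00:00 UTC
print(plan(WRAPPED, mtime))  # expect a full SCAN of the table
print(plan(BARE, mtime))     # expect a SEARCH using the bug_when index
```

Both forms return the same bug ids; only the access path differs. On the MySQL side, running EXPLAIN on each variant against the real schema would be the way to confirm the same behaviour.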
Comment 9 Dylan Hardison [:dylan] 2016-03-07 09:25:22 PST
It looks like the following is faster by a lot:

SELECT DISTINCT bug_id
  FROM bugs_activity
  JOIN fielddefs ON fieldid = fielddefs.id
 WHERE bug_when > FROM_UNIXTIME($mtime)
   AND fielddefs.name IN ('keywords', 'short_desc', 'product', 'component',
                          'cf_crash_signature', 'alias', 'status_whiteboard', 'bug_status', 'resolution')
Comment 10 Dylan Hardison [:dylan] 2016-03-07 10:30:14 PST
Created attachment 8727511
1184823_3.patch

More stuff. This has the js/field user auto-completion backed by a native REST wrapper around Elasticsearch. It's not quite as fast, but still pretty fast.

Currently adding anonymized search queries to t/015_*.t (not in this patch, because it could be sensitive)

Based on the corpus, the operators I still need to support are:

1) anywords (used by the keyword syntax, !foo)
2) notsubstring
3) notequals

Probably more, but the not* ones present a particular problem with translating them to ES. If I can't figure it out today, they will be dropped from initial support. We'll still be able to handle a large number of searches this way.
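
For reference, one common way to express negation in the Elasticsearch query DSL is to wrap the positive clause in a `bool` query's `must_not`. The sketch below, in Python for brevity, shows that shape for the operators listed above. It is only an illustration under simplifying assumptions: it treats fields as exact-value (not analyzed), does not escape wildcard characters, and does not address whatever made the not* operators hard in the actual patch, which the thread does not describe. The `translate` and `negate` helpers are hypothetical.

```python
def negate(clause):
    """Wrap an ES query clause so that it matches documents it would not."""
    return {"bool": {"must_not": [clause]}}


def translate(field, operator, value):
    """Translate a handful of Bugzilla search operators into ES clauses."""
    if operator == "equals":
        return {"term": {field: value}}
    if operator == "substring":
        # Note: '*' and '?' inside value are not escaped in this sketch.
        return {"wildcard": {field: "*" + value + "*"}}
    if operator == "anywords":
        # Any of the whitespace-separated words may match.
        return {"terms": {field: value.split()}}
    if operator == "notequals":
        return negate(translate(field, "equals", value))
    if operator == "notsubstring":
        return negate(translate(field, "substring", value))
    raise ValueError("unsupported operator: " + operator)
```

For example, `translate("bug_status", "notequals", "RESOLVED")` yields a `bool` query whose `must_not` list holds a single `term` clause.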
Comment 11 Dylan Hardison [:dylan] 2016-03-28 20:28:52 PDT
Created attachment 8735691
1184823_4.patch

last semi-working copy (in the middle of refactoring, so this is a snapshot 'cause I wanted to share)
Comment 12 Dylan Hardison [:dylan] 2016-04-04 21:17:31 PDT
Created attachment 8738025
1184823_5.patch
Comment 13 David Lawrence [:dkl] 2016-04-07 08:32:46 PDT
Comment on attachment 8738025
1184823_5.patch

Review of attachment 8738025:
-----------------------------------------------------------------

Remove references to Alive.pm (debugging) and spin a new patch.
Thanks
Comment 14 Dylan Hardison [:dylan] 2016-04-07 09:55:47 PDT
Created attachment 8739091
1184823_6.patch
Comment 15 Jesse Ruderman 2016-06-01 20:06:39 PDT
Will I still be able to search for exact strings, including strings that contain punctuation? (I ask because it's important to me in Bugzilla, and because many search systems don't support it.)
Comment 16 Dylan Hardison [:dylan] 2016-06-02 07:38:21 PDT
(In reply to Jesse Ruderman from comment #15)
> Will I still be able to search for exact strings, including strings that
> contain punctuation? (I ask because it's important to me in Bugzilla, and
> because many search systems don't support it.)

Yes, with caveats (for comments this isn't allowed, for space considerations).
It is also possible to force a fallback onto the SQL-based system.
(A fallback also occurs if a field that is not indexed is searched for, or if an operator that is not supported is used.)
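
The fallback behaviour described above can be sketched roughly as follows. This is a Python illustration of the control flow only, not the patch's code: `NotIndexed`, `INDEXED_FIELDS`, `SUPPORTED_OPERATORS`, and the `force_sql` flag are hypothetical names, and the two backend functions are placeholders that just report which path was taken.

```python
class NotIndexed(Exception):
    """Raised by the ES path for unindexed fields or unsupported operators."""


INDEXED_FIELDS = {"product", "component", "short_desc"}  # assumed
SUPPORTED_OPERATORS = {"equals", "substring"}            # assumed


def search_elastic(criteria):
    """Reject criteria the index cannot answer, else use Elasticsearch."""
    for field, operator, _value in criteria:
        if field not in INDEXED_FIELDS or operator not in SUPPORTED_OPERATORS:
            raise NotIndexed((field, operator))
    return "elastic"  # placeholder for running the ES query


def search_sql(criteria):
    return "sql"  # placeholder for the existing SQL search


def search(criteria, force_sql=False):
    """Prefer Elasticsearch, falling back to SQL when it cannot answer."""
    if force_sql:
        return search_sql(criteria)
    try:
        return search_elastic(criteria)
    except NotIndexed:
        return search_sql(criteria)
```

Here `search([("product", "equals", "Firefox")])` takes the Elasticsearch path, while an unindexed field, an unsupported operator, or `force_sql=True` takes the SQL path.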

If you provide me a list of your searches, I'll test them against my system and tell you the result. I used a corpus of 4000 common quicksearches to figure out what queries to focus on.
