Closed Bug 1184823 Opened 5 years ago Closed 2 years ago

Elastic Quicksearch

Categories

(bugzilla.mozilla.org :: Search, enhancement)

Production
enhancement
Not set
normal

RESOLVED WONTFIX

People

(Reporter: dylan, Assigned: dylan)

Attachments

(5 obsolete files)

Tracking bug for work to make quicksearch faster via ElasticSearch.
most of this looks sane to me.

>  Write translation layer to translate quicksearches into ES queries.
>  Modifications in the form of additional hooks to the BMO and
>  TrackingFlags extensions.

don't use hooks - update the search code directly.

> Some consideration to paging will need to be thought of.

pagination is a non-goal of this work.
(In reply to Byron Jones ‹:glob› from comment #1)
> >  Write translation layer to translate quicksearches into ES queries.
> >  Modifications in the form of additional hooks to the BMO and
> >  TrackingFlags extensions.
> 
> don't use hooks - update the search code directly.

Ugh. Wish this was happening after the master merge. Anything we add to core now is that much more possible conflict to resolve. Wish extension hooks did not come with the overhead we try to avoid as they make this type of thing much easier :(

dkl
(In reply to Byron Jones ‹:glob› from comment #1)
> most of this looks sane to me.
> 
> >  Write translation layer to translate quicksearches into ES queries.
> >  Modifications in the form of additional hooks to the BMO and
> >  TrackingFlags extensions.
> 
> don't use hooks - update the search code directly.
Treat those extensions as if they were core? Okay.

> > Some consideration to paging will need to be thought of.
> 
> pagination is a non-goal of this work.

Well, the simple case is a max limit on results. I'll do that.
(In reply to David Lawrence [:dkl] from comment #2)
> (In reply to Byron Jones ‹:glob› from comment #1)
> > >  Write translation layer to translate quicksearches into ES queries.
> > >  Modifications in the form of additional hooks to the BMO and
> > >  TrackingFlags extensions.
> > 
> > don't use hooks - update the search code directly.
> 
> Ugh. Wish this was happening after the master merge. Anything we add to core
> now is that much more possible conflict to resolve. Wish extension hooks did
> not come with the overhead we try to avoid as they make this type of thing
> much easier :(

There are only a few places in the code where I will be making changes, buglist.cgi and some related areas.
Mostly the code will be in the form of new modules (Bugzilla::Search::Elastic::*, and a PushConnector).
User Story: (updated)
This got derailed due to security work, so downgrading this from a goal to a "big" item until we catch up.
Keywords: bmo-goal → bmo-big
Progress on this has been good, actually!

Comments are now stored as children of their bugs, using a parent/child relationship. After reading and re-reading the Elasticsearch book (the O'Reilly one?), I believe this is the best fit for our data.
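
To illustrate the parent/child layout described above, here is a minimal sketch and not the code from the patch. It assumes the `join` field type from Elasticsearch 6 and later (older releases expressed the same idea with a `_parent` mapping, which is probably what was available when this comment was written). The field names `relation` and `thetext`, the document ID scheme, and the helper function names are all hypothetical.

```python
# Sketch only: field names, ID scheme, and helpers are hypothetical.
# The "join" field is the Elasticsearch 6+ way to model parent/child documents.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "short_desc": {"type": "text"},
            "thetext": {"type": "text"},
            # Declares that "comment" documents are children of "bug" documents.
            "relation": {"type": "join", "relations": {"bug": "comment"}},
        }
    }
}


def bug_document(bug_id, short_desc):
    """Build a parent (bug) document plus the ID and routing to index it with."""
    doc_id = str(bug_id)
    return {
        "id": doc_id,
        "routing": doc_id,
        "body": {"short_desc": short_desc, "relation": {"name": "bug"}},
    }


def comment_document(comment_id, bug_id, text):
    """Build a child (comment) document.

    A child must be routed to the same shard as its parent, so the routing
    value is the parent's document ID.
    """
    parent_id = str(bug_id)
    return {
        "id": f"c{comment_id}",
        "routing": parent_id,
        "body": {
            "thetext": text,
            "relation": {"name": "comment", "parent": parent_id},
        },
    }


def bugs_with_comment_matching(text):
    """Query for bugs that have at least one comment matching the text."""
    return {
        "query": {
            "has_child": {"type": "comment", "query": {"match": {"thetext": text}}}
        }
    }
```

One appeal of this layout is that a new comment can be indexed as its own small document without re-indexing the parent bug.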

Meanwhile, adding tracking flags has caused the spectre of the out of memory killer to rear its head.
Using $user->address was also leaking memory (I filed a bug for the UserProfile as well).

In addition to getting bulk loading right, I have been cooking up an API to get "what has changed" information out of Bugzilla. For non-bug objects, I'm using the audit tables. Bug objects themselves provide enough information.

One question remains open: there could be a huge performance impact from users changing their names (and thus invalidating many, many Elasticsearch records). :(
Keywords: bmo-big → bmo-goal
Depends on: 1250688
Attached patch WIP.patch (obsolete) — Splinter Review
Note this doesn't pass sanity tests or have boilerplate. It's been severely gutted and re-architected a few times now.

However, the bulk_index.pl script works. It can work incrementally (but you have to comment out the ->create_mapping() call), and it is pretty fast.

Meanwhile, it is able to parse the most common types of quicksearches.

The plan is for it to raise exceptions when asked for fields that don't exist in ES -- it doesn't do that yet.

You can index your Bugzilla db with the indexer and use search.pl to generate ES-compatible queries. But mostly this is just up for general feedback.
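
As a rough illustration of what translating a quicksearch into an ES query could look like, including the planned exception for fields that are not in the index, here is a minimal sketch. It is not the code from the patch. The names `translate`, `UnsupportedField`, and `INDEXED_FIELDS` are hypothetical, the field list is borrowed from the change-detection query posted later in this thread, and sending bare words to `short_desc` only is a simplification of real quicksearch behaviour. The `size` cap reflects the "max limit on results" mentioned earlier; the default of 500 is an arbitrary assumption.

```python
class UnsupportedField(Exception):
    """Raised when a quicksearch names a field the index does not hold."""


# Assumption: the set of indexed fields; not taken from the patch.
INDEXED_FIELDS = {
    "keywords", "short_desc", "product", "component", "cf_crash_signature",
    "alias", "status_whiteboard", "bug_status", "resolution",
}


def translate(quicksearch, limit=500):
    """Turn a whitespace-separated quicksearch into an ES bool query body."""
    must = []
    for token in quicksearch.split():
        field, sep, value = token.partition(":")
        if not sep:
            # Bare word: search the summary (simplified).
            field, value = "short_desc", token
        elif field not in INDEXED_FIELDS:
            raise UnsupportedField(field)
        must.append({"match": {field: value}})
    # "size" caps the number of hits, a simple stand-in for pagination.
    return {"size": limit, "query": {"bool": {"must": must}}}
```

A caller could catch `UnsupportedField` and hand the search to the existing SQL code instead.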
Attachment #8726942 - Flags: feedback?(dkl)
User Story: (updated)
I got around to making my "find changed bugs" logic use bugs_activity. 
Considering this is going to be run against (a slave) every 20s, I'd like a DBA to look it over.


SELECT DISTINCT bug_id
          FROM bugs_activity
          JOIN fielddefs on fieldid = fielddefs.id
        WHERE UNIX_TIMESTAMP(bug_when) > $mtime
          AND fielddefs.name IN ("keywords", "short_desc", "product", "component", 
                                 "cf_crash_signature", "alias", "status_whiteboard", "bug_status", "resolution")

How horrible is this going to be? What can I do to make it better, if anything?
Flags: needinfo?(mpressman)
It looks like the following is faster by a lot:

SELECT DISTINCT bug_id
          FROM bugs_activity
          JOIN fielddefs on fieldid = fielddefs.id
        WHERE bug_when > FROM_UNIXTIME($mtime)
          AND fielddefs.name IN ("keywords", "short_desc", "product", "component", 
                                 "cf_crash_signature", "alias", "status_whiteboard", "bug_status", "resolution")
Attached patch 1184823_3.patch (obsolete) — Splinter Review
More stuff. This has the js/field user auto-completion backed by a native REST wrapper around Elasticsearch. It's not quite as fast, but still pretty fast.

Currently adding anonymized search queries to t/015_*.t (not in this patch, because it could be sensitive)

Based on the corpus, the operators I still need to support are:

1) anywords (used by the keyword syntax, !foo)
2) notsubstring
3) notequals

Probably more, but the not* ones present a particular problem with translating them to ES. If I can't figure it out today, they will be dropped from initial support. We'll still be able to handle a large number of searches this way.
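
For reference, one possible way to express these operators in the Elasticsearch query DSL is to wrap the positive clause in `bool`/`must_not`. This is a sketch under stated assumptions and not what the patch does: the function names are hypothetical, the fields are assumed to be indexed as exact (non-analyzed) values, and `must_not` also matches documents where the field is missing entirely, which may not agree with what the SQL search returns for NULL values. That mismatch could be part of why the not* operators are awkward to translate.

```python
def escape_wildcard(value):
    """Escape the characters that are special in an ES wildcard pattern."""
    for ch in ("\\", "*", "?"):
        value = value.replace(ch, "\\" + ch)
    return value


def negate(clause):
    """Wrap a clause so that only documents NOT matching it are returned."""
    return {"bool": {"must_not": [clause]}}


def translate_operator(field, operator, value):
    """Map a Bugzilla-style search operator to an ES query clause (sketch)."""
    if operator == "equals":
        return {"term": {field: value}}
    if operator == "notequals":
        return negate({"term": {field: value}})
    if operator == "substring":
        return {"wildcard": {field: "*" + escape_wildcard(value) + "*"}}
    if operator == "notsubstring":
        return negate({"wildcard": {field: "*" + escape_wildcard(value) + "*"}})
    if operator == "anywords":
        # Matches documents containing any one of the given words.
        return {"terms": {field: value.split()}}
    raise ValueError(f"unsupported operator: {operator}")
```

Leading-wildcard queries can be slow on large indexes, so a real implementation might prefer an n-gram analyzer for substring matching.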
Attachment #8726942 - Attachment is obsolete: true
Attachment #8726942 - Flags: feedback?(dkl)
Attachment #8727511 - Flags: feedback?(dkl)
Attached patch 1184823_4.patch (obsolete) — Splinter Review
last semi-working copy (in the middle of refactoring, so this is a snapshot 'cause I wanted to share)
Attached patch 1184823_5.patch (obsolete) — Splinter Review
Attachment #8727511 - Attachment is obsolete: true
Attachment #8735691 - Attachment is obsolete: true
Attachment #8727511 - Flags: feedback?(dkl)
Flags: needinfo?(mpressman)
Attachment #8738025 - Flags: review?(dkl)
Comment on attachment 8738025 [details] [diff] [review]
1184823_5.patch

Review of attachment 8738025 [details] [diff] [review]:
-----------------------------------------------------------------

Remove references to Alive.pm (debugging) and spin new patch. 
Thanks
Attachment #8738025 - Flags: review?(dkl) → review-
Attached patch 1184823_6.patch (obsolete) — Splinter Review
Attachment #8738025 - Attachment is obsolete: true
Attachment #8739091 - Flags: review?(dkl)
Blocks: 1274418
Will I still be able to search for exact strings, including strings that contain punctuation? (I ask because it's important to me in Bugzilla, and because many search systems don't support it.)
(In reply to Jesse Ruderman from comment #15)
> Will I still be able to search for exact strings, including strings that
> contain punctuation? (I ask because it's important to me in Bugzilla, and
> because many search systems don't support it.)

Yes, with caveats (exact-string search isn't available for comments, for space reasons).
It is also possible to force a fallback onto the sql-based system.
(A fallback also occurs if a field that is not indexed is searched for, or if an operator that is not supported is used.)
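
The fallback rule described above could be sketched roughly as follows. This is an illustration only: `choose_backend`, `INDEXED_FIELDS`, and `SUPPORTED_OPERATORS` are hypothetical names, and the contents of both sets are assumptions (the field list is the one from the change-detection query earlier in this thread).

```python
# Assumptions: neither set is taken from the actual implementation.
INDEXED_FIELDS = {
    "keywords", "short_desc", "product", "component", "cf_crash_signature",
    "alias", "status_whiteboard", "bug_status", "resolution",
}
SUPPORTED_OPERATORS = {"equals", "substring", "allwords", "anywords"}


def choose_backend(criteria, force_sql=False):
    """Pick the search backend for a list of (field, operator) pairs.

    Returns "sql" when the caller forces it, when any field is not indexed,
    or when any operator is not supported; otherwise "elasticsearch".
    """
    if force_sql:
        return "sql"
    for field, operator in criteria:
        if field not in INDEXED_FIELDS or operator not in SUPPORTED_OPERATORS:
            return "sql"
    return "elasticsearch"
```

For example, `choose_backend([("product", "equals")])` selects Elasticsearch, while adding a criterion on an unindexed field sends the whole search to SQL.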

If you provide me a list of your searches, I'll test them against my system and tell you the result. I used a corpus of 4000 common quicksearches to figure out what queries to focus on.
Flags: needinfo?(jruderman)
Flags: needinfo?(jruderman)
Depends on: 1307478
Depends on: 1307485
Attachment #8739091 - Flags: review?(dkl)
Depends on: 1316660
No longer depends on: 1250688
No longer depends on: 1307478
No longer blocks: 1274418
Depends on: 1274418
Attachment #8739091 - Attachment is obsolete: true
User Story: (updated)
Blocks: 1362222

As per dylan, Elasticsearch is no more.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX