Elasticsearch Indexer / Bulk Indexer

RESOLVED FIXED

Status

()

bugzilla.mozilla.org
General
RESOLVED FIXED
11 months ago
8 months ago

People

(Reporter: dylan, Assigned: dylan)

Tracking

Production
Dependency tree / graph

Details

Attachments

(1 attachment, 5 obsolete attachments)

(Assignee)

Description

11 months ago
Split out the search indexer from bug 1184823 so that that it can be reviewed and deployed separately from the more complicated search code
(Assignee)

Updated

11 months ago
Blocks: 1307485
(Assignee)

Comment 1

10 months ago
Created attachment 8800789 [details] [diff] [review]
1307478_1.patch

here's the indexer code. It's a little improved from before.
Attachment #8800789 - Flags: review?(dkl)
(Assignee)

Comment 2

10 months ago
Created attachment 8800815 [details] [diff] [review]
1307478_2.patch

Fixed some warnings and a performance issue. This loads about 900 bugs per second now.
Attachment #8800789 - Attachment is obsolete: true
Attachment #8800789 - Flags: review?(dkl)
Attachment #8800815 - Flags: review?(dkl)
(Assignee)

Comment 3

10 months ago
Created attachment 8800817 [details] [diff] [review]
1307478_3.patch

accidentally included the commented out comment indexer part, fixed now.
Attachment #8800815 - Attachment is obsolete: true
Attachment #8800815 - Flags: review?(dkl)
Attachment #8800817 - Flags: review?(dkl)
(Assignee)

Comment 4

10 months ago
Created attachment 8800917 [details] [diff] [review]
1307478_4.patch

Added more code to the bulk indexer, it can run as a service now.

Also added dependency info

It doesn't perform daemonization itself but I think that's okay. Rest of the code is the same.
Attachment #8800817 - Attachment is obsolete: true
Attachment #8800817 - Flags: review?(dkl)
Attachment #8800917 - Flags: review?(dkl)

Comment 5

10 months ago
Comment on attachment 8800917 [details] [diff] [review]
1307478_4.patch

Review of attachment 8800917 [details] [diff] [review]:
-----------------------------------------------------------------

1. missing template/en/default/admin/params/elastic.html.tmpl for config descriptions.

Have a few more things I want to try but you can work on these in meantime.

::: Bugzilla/Comment.pm
@@ +22,4 @@
>  use Bugzilla::Util;
>  
>  use List::Util qw(first);
> +use Scalar::Util qw(blessed);

Still need weaken and isweak for line 287 so do not change this line.

::: Bugzilla/Elastic/Indexer.pm
@@ +175,5 @@
> +sub bulk_load {
> +    my ( $self, $class ) = @_;
> +
> +    $self->put_mapping($class);
> +    my $bulk    = $self->_bulk_helper($class);

nit: extra whitespace

::: bulk_index.pl
@@ +33,5 @@
> +);
> +
> +my $indexer = Bugzilla::Elastic::Indexer->new(
> +    $debug_sql ? ( debug_sql => 1 ) : (),
> +    $progress_bar ? ('Term::ProgressBar' ) : (),

Change to:

 $progress_bar ? ( progress_bar => 'Term::ProgressBar' ) : (),

@@ +60,5 @@
> +        my $start_comments = time;
> +        $indexer->bulk_load('Bugzilla::Comment');
> +        print "    ", time - $start_comments, " seconds\n" if $verbose > 1;
> +
> +        $loop->stop unless $loop;

$once is not used currently. Change to:

$loop->stop if $once || !$loop;
Attachment #8800917 - Flags: review?(dkl) → review-
(Assignee)

Comment 6

9 months ago
Created attachment 8808399 [details] [diff] [review]
1307478_6.patch

also on dev
Attachment #8800917 - Attachment is obsolete: true
Attachment #8808399 - Flags: review?(dkl)
(Assignee)

Comment 7

9 months ago
Created attachment 8808402 [details] [diff] [review]
1307478_7.patch

Forgot one thing. :-)
Attachment #8808399 - Attachment is obsolete: true
Attachment #8808399 - Flags: review?(dkl)
Attachment #8808402 - Flags: review?(dkl)

Comment 8

9 months ago
Comment on attachment 8808402 [details] [diff] [review]
1307478_7.patch

Review of attachment 8808402 [details] [diff] [review]:
-----------------------------------------------------------------

fubar would know best how he wants the bulk-index.pl to run on the jobqueue server so the it should be a service daemon probably. Or maybe just a cron job. Not sure.

Also should bulk-index.pl live in scripts/? No fussed either way.

Awesome work! r=dkl

::: Bugzilla/Comment.pm
@@ +22,4 @@
>  use Bugzilla::Util;
>  
>  use List::Util qw(first);
> +use Scalar::Util qw(blessed);

Need to add back weaken and isweak.

::: bulk_index.pl
@@ +5,5 @@
> +#
> +# This Source Code Form is "Incompatible With Secondary Licenses", as
> +# defined by the Mozilla Public License, v. 2.0.
> +
> +# This file has detailed POD docs, do "perldoc checksetup.pl" to see them.

Remove this line. Not relevant.

@@ +14,5 @@
> +BEGIN { Bugzilla->extensions }
> +
> +use Bugzilla::Elastic::Indexer;
> +
> +use Term::ProgressBar;

You have this as a recommends in Makefile.PL but bulk-index.pl will fail if this module is not installed.

@@ +29,5 @@
> +    'verbose|v+'  => \$verbose,
> +    'debug-sql'    => \$debug_sql,
> +    'progress-bar' => \$progress_bar,
> +    'once|n'       => \$once,
> +);

Would like to see some usage output if --help (maybe a new bug)
Attachment #8808402 - Flags: review?(dkl) → review+
(In reply to David Lawrence [:dkl] from comment #8)
>
> fubar would know best how he wants the bulk-index.pl to run on the jobqueue
> server so the it should be a service daemon probably. Or maybe just a cron
> job. Not sure.

One question, in two parts: What happens if the daemon/cronjob fail to run for an extended amount of time, and what about if/when we fail over to AWS where ES does not exist (and probably won't for a while) ?

If the answer is "nothing bad" then I'm ok with treating it like other cronjobs that run on the admin node. Otherwise, we can run it as a daemon and have Nagios check for it and alert the MOC like the push or jobqueue daemons.
(In reply to Kendall Libby [:fubar] from comment #9)
> (In reply to David Lawrence [:dkl] from comment #8)
> >
> > fubar would know best how he wants the bulk-index.pl to run on the jobqueue
> > server so the it should be a service daemon probably. Or maybe just a cron
> > job. Not sure.
> 
> One question, in two parts: What happens if the daemon/cronjob fail to run
> for an extended amount of time, and what about if/when we fail over to AWS
> where ES does not exist (and probably won't for a while) ?

Dylan may be able to provide better answer but from the way it is coded currently, it should be a non-issue as it will import the bugs that do not yet exist and update any that have changed based on the modification times.

> If the answer is "nothing bad" then I'm ok with treating it like other
> cronjobs that run on the admin node. Otherwise, we can run it as a daemon
> and have Nagios check for it and alert the MOC like the push or jobqueue
> daemons.

I feel that having a nagios monitor would be a nice to have. Right now it is positioned as a convenience system to speed up user completion and bug searches and BMO would continue to operate fine if it were to disappear temporarily but I could see us definitely relying on it more in the future.

dkl
(Assignee)

Comment 11

9 months ago
(In reply to Kendall Libby [:fubar] from comment #9)
> (In reply to David Lawrence [:dkl] from comment #8)
> >
> > fubar would know best how he wants the bulk-index.pl to run on the jobqueue
> > server so the it should be a service daemon probably. Or maybe just a cron
> > job. Not sure.
> 
> One question, in two parts: What happens if the daemon/cronjob fail to run
> for an extended amount of time, and what about if/when we fail over to AWS
> where ES does not exist (and probably won't for a while) ?
> 
> If the answer is "nothing bad" then I'm ok with treating it like other
> cronjobs that run on the admin node. Otherwise, we can run it as a daemon
> and have Nagios check for it and alert the MOC like the push or jobqueue
> daemons.

If the indexer isn't running, searches will return stale data. Stale data is bad -- so I think it should be an alert if the daemon isn't running.
(In reply to Dylan Hardison [:dylan] from comment #11)
> If the indexer isn't running, searches will return stale data. Stale data is
> bad -- so I think it should be an alert if the daemon isn't running.

Good point. I was still thinking the stale data would be harmless but I see where it could cause confusion if search results returned bugs that it was not supposed to or bugs were missing that should be there. 

dkl
So... what happens if/when we fail over to AWS? Exactly how bad is bad?
(In reply to Kendall Libby [:fubar] from comment #13)
> So... what happens if/when we fail over to AWS? Exactly how bad is bad?

IMO and it may work this way in Dylans patch for quicksearch and user autocomplete is that if the elastic server is not available it just falls back to old behavior without any error to the user. That would work for AWS being that we do not have an elastic server there.

dkl
(Assignee)

Updated

9 months ago
Depends on: 1319503
(Assignee)

Updated

9 months ago
Depends on: 1321662

Updated

8 months ago
No longer blocks: 1184823
(Assignee)

Comment 15

8 months ago
To git@github.com:mozilla-bteam/bmo.git
   840bbad83..419f3ae9f  master -> master
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.