Closed
Bug 1184431
Opened 9 years ago
Closed 9 years ago
Bug searching is slow on Postgres
Categories
(Bugzilla :: Query/Bug List, enhancement)
Bugzilla
Query/Bug List
Tracking
()
RESOLVED
FIXED
Bugzilla 5.0
People
(Reporter: mtyson, Assigned: mtyson)
Details
Attachments
(1 file, 3 obsolete files)
|
6.10 KB,
patch
|
LpSolit
:
review+
|
Details | Diff | Splinter Review |
The Bugzilla Search.pm code uses the SQL POSITION() function to do substring matching. Under some circumstances this can be horrendously slow with a Postgres database. Attached is a patch to remove the use of POSITION() and replace it with the LIKE operator. For the query I'm using to test with, this reduces execution time from 15 minutes to 12 seconds. MySQL performance appears to be unaffected by the change.
Attachment #8634569 -
Flags: review?(dylan)
| Assignee | ||
Comment 1•9 years ago
|
||
Comment on attachment 8634569 [details] [diff] [review] likesearch.patch As glob pointed out on IRC: <glob> mtyson, what if $fragment contains % ?
Attachment #8634569 -
Flags: review?(dylan)
Comment 2•9 years ago
|
||
Sql has the escape keyword to handle that.
Comment 3•9 years ago
|
||
Comment on attachment 8634569 [details] [diff] [review] likesearch.patch >+# sql_nlike - A case sensitive NOT LIKE operation >+# sql_nilike - A case insensitive NOT LIKE operation You should name them sql_not_like and sql_not_ilike, because the "n" by itself is confusing. It makes me think about sort vs nsort (numeric sort). Also, please add POD for these methods, else your patch won't pass runtests.pl.
Comment 4•9 years ago
|
||
(In reply to Matt Tyson from comment #0) > For the query I'm using to test with, this reduces execution time from 15 > minutes to 12 seconds. That's an interesting improvement. :) Do you have some queries to suggest to the reviewer to validate the perf improvement?
| Assignee | ||
Comment 5•9 years ago
|
||
(In reply to Frédéric Buclin from comment #4) > That's an interesting improvement. :) Do you have some queries to suggest to > the reviewer to validate the perf improvement? I think to test the performance properly you would need a postgres database with a large number of users, components and bugs in it. Due to the amount of data we have in the database the actual generated SQL for this bugzilla search is about 800kb when saved out to a file. If you don't have such a database handy then testing the performance of this might be difficult. The query goes like this: 'Flags' 'matches regular expression' 'regex' 'Flags' 'does not match regular expression' 'slightly different regex' 'Flags' 'does not match regular expression' 'yet another regex' 'Reporter' 'contains none of the strings' 'foo@,bar@,baz@,foobar@,foobarbaz@' 'Component' 'contains none of the strings' 'component1,component2,component3,component4' It's the 'contains none of the strings' parts that wreck performance.
| Assignee | ||
Comment 6•9 years ago
|
||
Ok, this is a second attempt. This leaves the SQL POSITION() style functions intact, but puts them in the new sql_like() functions. This shouldn't cause any changes to the generated SQL for any of the databases that bugzilla supports, but allows us to override them for a specific database implementation if required. Also included in this patch is a postgres overload, which does use the SQL LIKE functions, and escapes any special characters that may affect the output of LIKE.
Attachment #8634569 -
Attachment is obsolete: true
Attachment #8635103 -
Flags: review?(dylan)
| Assignee | ||
Comment 7•9 years ago
|
||
The previous patch would fail when the column is an integer type. ERROR: operator does not exist: integer ~~* text This updated patch will cast the field to a text type so the LIKE operation doesn't fail.
Attachment #8635103 -
Attachment is obsolete: true
Attachment #8635103 -
Flags: review?(dylan)
Attachment #8645569 -
Flags: review?(dylan)
Comment 8•9 years ago
|
||
Comment on attachment 8645569 [details] [diff] [review] sql_like.patch >+++ b/Bugzilla/DB.pm >+sub sql_like { >+ my ($self, $fragment, $column) = @_; Here, the 3rd argument is a DB column so $column is a good name for this variable, but in DB/Pg.pm, the 3rd argument is $text, which is confusing. I think $column better reflects what this 3rd argument really is. >+Outputs SQL to search for an instance of a string (fragment) >+in a table column (column) Nit: please add a dot at the end of the sentence. >+Note: This is not necessarily generate an ANSI LIKE statement, but s/is/does/ ? >+Formatted SQL to return results from columns that contain the fragment Nit: dot. >+Outputs SQL to search for columns (column) that I<do not> contain >+instances of the string (fragment) ditto >+Formated sql to return results from columns that do not contain the fragment s/sql/SQL/ >+++ b/Bugzilla/DB/Pg.pm >+sub sql_like { >+ my ($self, $fragment, $text, $not) = @_; As said above, $column is a better name than $text, because it better reflects what that argument really is. Please rename this variable to $column. I see that the 4th argument $not is never passed, but it can stay for future usage. >+sub sql_ilike { >+ my ($self, $fragment, $text, $not) = @_; Same comment about $text vs $column. >+++ b/Bugzilla/Search.pm > sub _casesubstring { > my ($self, $args) = @_; >+ my ($full_field, $unquoted) = @$args{qw(full_field value)}; $value is a better name than $unquoted, IMO. > sub _substring { > my ($self, $args) = @_; >+ my ($full_field, $unquoted) = @$args{qw(full_field value)}; Same here. > sub _notsubstring { > my ($self, $args) = @_; >+ my ($full_field, $unquoted) = @$args{qw(full_field value)}; And here. Otherwise it looks good, so r=LpSolit with the changes above. I'm just wondering why we don't use LIKE everywhere instead of in Pg.pm only. To not break anything? :) I also did some testing with Pg 9.4.4, and queries are 30-50% faster with your patch even with a small DB, so it's still a valuable perf improvement. Very nice!
Attachment #8645569 -
Flags: review?(dylan) → review+
Comment 9•9 years ago
|
||
IMO, it should land in the 5.0 branch, especially based on the huge perf improvement reported in comment 0. If we regress something, we have time to back it out before 5.0.2.
Status: NEW → ASSIGNED
Flags: approval5.0?
Target Milestone: --- → Bugzilla 5.0
Comment 10•9 years ago
|
||
Comment on attachment 8645569 [details] [diff] [review] sql_like.patch >diff --git a/Bugzilla/DB.pm b/Bugzilla/DB.pm >+ return $self->sql_position($quoted, $column) . " > 0" >+ return $self->sql_iposition($quoted, $column) . " = 0" I almost forgot: please add a semicolon at the end of these lines.
| Assignee | ||
Comment 11•9 years ago
|
||
> I'm just wondering why we don't use LIKE everywhere instead of in Pg.pm only. To not break anything? :)
Yes. Postgres is a bit persnickety when it comes to comparing field types. For the LIKE operations to work we need to explicitly cast every field to text otherwise the search can fail when numeric fields are involved. We would need to override in Pg.pm anyway to use the postgres cast operators.
MySQL doesn't care about types so much, but the search operators are different. Postgres has ILIKE to do a case insensitive match, MySQL (and Oracle?) require you to do UPPER(column) LIKE UPPER('foobar').
Since MySQL performance doesn't seem affected by this patch, and I don't have the ability to test Oracle. I felt it was safer to leave them alone.
Attachment #8645569 -
Attachment is obsolete: true
Attachment #8661512 -
Flags: review?(LpSolit)
Comment 12•9 years ago
|
||
(In reply to Matt Tyson from comment #11) > For the LIKE operations to work we need to explicitly cast every field to > text otherwise the search can fail when numeric fields are involved. We > would need to override in Pg.pm anyway to use the postgres cast operators. Ah right, good point. So yes, I agree, let's use LIKE for Pg only. > Since MySQL performance doesn't seem affected by this patch, and I don't > have the ability to test Oracle. I felt it was safer to leave them alone. Yes, I'm fine with that. :)
Comment 13•9 years ago
|
||
Comment on attachment 8661512 [details] [diff] [review] sql_like.patch Perfect, thanks! :) r=LpSolit
Attachment #8661512 -
Flags: review?(LpSolit) → review+
Updated•9 years ago
|
Flags: approval5.0? → approval5.0+
Comment 14•9 years ago
|
||
To ssh://gitolite3@git.mozilla.org/bugzilla/bugzilla.git cdcb6f1..05f74fa master -> master To ssh://gitolite3@git.mozilla.org/bugzilla/bugzilla.git f7de896..cea1aff 5.0 -> 5.0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•