Replace CONCAT and MATCH with Bugzilla::DB function call

RESOLVED FIXED in Bugzilla 2.20

Status

()

Bugzilla
Bugzilla-General
P1
enhancement
RESOLVED FIXED
13 years ago
13 years ago

People

(Reporter: Max Kanat-Alexander, Assigned: Tomas Kopal)

Tracking

2.19.2
Bugzilla 2.20
Bug Flags:
approval +

Details

(URL)

Attachments

(1 attachment, 7 obsolete attachments)

8.32 KB, patch
Max Kanat-Alexander
: review+
Details | Diff | Splinter Review
(Reporter)

Description

13 years ago
There is one place we use CONCAT in Bugzilla, which is not ANSI SQL, and not
supported by PostgreSQL.

From the MySQL manual:

MySQL Server doesn't support the standard SQL || operator for string
concatenation; use CONCAT() instead. Because CONCAT() takes any number of
arguments, it's easy to convert use of the || operator to MySQL Server.
(Reporter)

Comment 1

13 years ago
By the way, it's this line in Bugzilla::Search:

$ff = "CONCAT($flagtypes.name, $flags.status)";
(Reporter)

Comment 2

13 years ago
Note that you must explicitly cast both arguments to varchar for the PostgreSQL
version -- PostgreSQL cannot CONCAT a varchar and a char, whereas MySQL can.
(Reporter)

Comment 3

13 years ago
Let's also throw MATCH into this patch, since they both only exist in one place.

For now, we'll just use a LIKE operator for MATCH, I think.
Summary: Replace CONCAT with Bugzilla::DB function call → Replace CONCAT and MATCH with Bugzilla::DB function call
(Reporter)

Updated

13 years ago
Priority: -- → P1
Target Milestone: --- → Bugzilla 2.20
(Assignee)

Comment 4

13 years ago
Created attachment 176118 [details] [diff] [review]
V1

Here we go...
(Assignee)

Updated

13 years ago
Attachment #176118 - Flags: review?
(Assignee)

Comment 5

13 years ago
(In reply to comment #2)
> Note that you must explicitly cast both arguments to varchar for the PostgreSQL
> version -- PostgreSQL cannot CONCAT a varchar and a char, whereas MySQL can.

Hmmm, weird. It "works for me"(TM) :-). varchar(50) and char(1), Postgres 7.4.7.
Can you test it again, please?

And I forgot to mention: the patch is well tested on MySQL, but I have done just
very basic testing on Postgres.
(Assignee)

Updated

13 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 6

13 years ago
I'm using PostgreSQL 7.3, so that's probably the difference.
(Assignee)

Comment 7

13 years ago
(In reply to comment #6)
> I'm using PostgreSQL 7.3, so that's probably the difference.

Ok, so... Should we add the type cast, or should we bump the version up to 7.4?
Adding the type cast is not that simple, IMHO. If we cast it in the Pg module to
varchar, the function is no longer generic - what if you pass in e.g. text? And
what limit will we put on the varchar?
As it is used at exactly one place ATM, it does not matter so much, but we can
hit a problem if someone use it somewhere else. Maybe casting both arguments to
text would help?
(Reporter)

Comment 8

13 years ago
Yeah, both arguments should be casted to the same thing, probably text.

As you'll see in the PostgreSQL manual, the CONCAT operation always returns a
"text" type. So if we cast both of the arguments to text, there's no problem.
(Assignee)

Comment 9

13 years ago
(In reply to comment #8)
> Yeah, both arguments should be casted to the same thing, probably text.
> 
> As you'll see in the PostgreSQL manual, the CONCAT operation always returns a
> "text" type. So if we cast both of the arguments to text, there's no problem.

Ok, sounds good, I will fix the patch.
(Assignee)

Updated

13 years ago
Attachment #176118 - Flags: review?
(Assignee)

Comment 10

13 years ago
Created attachment 176269 [details] [diff] [review]
V1.1

The only change from the previous version is that Pg specific implementation
was added. It's similar to ANSI version, it just casts both parameters to text
prior to joining them.
Attachment #176118 - Attachment is obsolete: true
Attachment #176269 - Flags: review?
(Reporter)

Comment 11

13 years ago
Comment on attachment 176269 [details] [diff] [review]
V1.1

>+our @_abstract_methods = qw(new REQUIRED_VERSION PROGRAM_NAME

  I'd rather the constants were on their own line. That's why I put them on
their own line. I plan to add more later.

>+sub sql_string_concat {
>+    my ($self, @params) = @_;
>+    
>+    return join(' || ', @params);
>+}
>+

  Are there no existing sql_ methods in this thing? I forget if this is where
this ought to go -- see my patch on bug 284352; I'm working on adding a little
bit of organization to Bugzilla::DB.

>-             push(@supptables, "INNER JOIN longdescs $table ON bugs.bug_id " . 
>-                               "= $table.bug_id");
>+             push(@supptables, "INNER JOIN longdescs AS $table " .
>+                               "ON bugs.bug_id = $table.bug_id");

  It looks like this change has nothing to do with this patch...?

>-             my $term1 = "MATCH($table.thetext) AGAINST(".&::SqlQuote($v).")";
>-             my $term2 = "MATCH(bugs.short_desc) AGAINST(".&::SqlQuote($v).")";
>+             my $term1 = $dbh->sql_fulltext_search("$table.thetext",
>+                                                   ::SqlQuote($v));

  Nit: Put ${table}.thetext so it's clearer that "thetext" isn't part of the
variable. :-)

>+             # Generate the condition by running the operator-specific

  I think you mean "database-specific"?

>+             $ff = $dbh->sql_string_concat("$flagtypes.name", "$flags.status");

  Nit: Same thing there, about the {}.

>+    # This is a quick hack to get at least some searching with postgres
>+    # working. We need to revisit this and implement proper full text
>+    # search (probably employing the tsearch2 contrib module).
>+    return "$column ~* $text";

  That should be LIKE "%$text%" -- it's much faster, usually. Or maybe ILIKE,
if the match is supposed to be case-insensitive.

>+    return 'CAST(' . join(' AS text) || CAST(', @params) . ' AS text)';

  Can you not use ::text for some reason, instead of CAST?

  Also, you didn't join @params, so that will just give us something like
ARRAY. We probably need to foreach over @params there, I'd think.

  -Max
Attachment #176269 - Flags: review? → review-
(Assignee)

Comment 12

13 years ago
Created attachment 176331 [details] [diff] [review]
V1.2
Attachment #176269 - Attachment is obsolete: true
Attachment #176331 - Flags: review?(mkanat)
(Reporter)

Comment 13

13 years ago
Comment on attachment 176331 [details] [diff] [review]
V1.2

The regex search thing really does need to be changed to LIKE for PostgreSQL.
Attachment #176331 - Flags: review?(mkanat) → review-
(Assignee)

Comment 14

13 years ago
You are too fast, Max, let me at least comment on your points ;-).

(In reply to comment #11)
> (From update of attachment 176269 [details] [diff] [review] [edit])
> >+our @_abstract_methods = qw(new REQUIRED_VERSION PROGRAM_NAME
> 
>   I'd rather the constants were on their own line. That's why I put them on
> their own line. I plan to add more later.

Agreed, sorry, I overlooked the new. I am used to have constants first and
methods next :-), and I would prefer to keep them in the same order in the list
as in the modules, so that's the reason I changed this.

> 
> >+sub sql_string_concat {
> >+    my ($self, @params) = @_;
> >+    
> >+    return join(' || ', @params);
> >+}
> >+
> 
>   Are there no existing sql_ methods in this thing? I forget if this is where
> this ought to go -- see my patch on bug 284352; I'm working on adding a little
> bit of organization to Bugzilla::DB.

Currently, all sql_ methods are abstract, these are the first. My intention was
to have sql_ first, and bz_ next, but I didn't quite suceed :-). Fixed.
It's hard to keep them in order when there are several patches modifying the
same file :-).

> 
> >-             push(@supptables, "INNER JOIN longdescs $table ON bugs.bug_id " . 
> >-                               "= $table.bug_id");
> >+             push(@supptables, "INNER JOIN longdescs AS $table " .
> >+                               "ON bugs.bug_id = $table.bug_id");
> 
>   It looks like this change has nothing to do with this patch...?

Well, yes, you are right. It's just few lines above the change I was doing and I
think this is more readable. I will undo it if you insist :-).

> 
> >-             my $term1 = "MATCH($table.thetext) AGAINST(".&::SqlQuote($v).")";
> >-             my $term2 = "MATCH(bugs.short_desc) AGAINST(".&::SqlQuote($v).")";
> >+             my $term1 = $dbh->sql_fulltext_search("$table.thetext",
> >+                                                   ::SqlQuote($v));
> 
>   Nit: Put ${table}.thetext so it's clearer that "thetext" isn't part of the
> variable. :-)

This was just copy of the old code, but fixed anyway :-).

> 
> >+             # Generate the condition by running the operator-specific
> 
>   I think you mean "database-specific"?

I don't know what the original author meant, I was just re-wrapping the lines to
make them reasonably short :-). Again, will undo if you want.

> 
> >+             $ff = $dbh->sql_string_concat("$flagtypes.name", "$flags.status");
> 
>   Nit: Same thing there, about the {}.

Done.

> 
> >+    # This is a quick hack to get at least some searching with postgres
> >+    # working. We need to revisit this and implement proper full text
> >+    # search (probably employing the tsearch2 contrib module).
> >+    return "$column ~* $text";
> 
>   That should be LIKE "%$text%" -- it's much faster, usually. Or maybe ILIKE,
> if the match is supposed to be case-insensitive.

Well, that's not so simple. The $text is already quoted, so you will need to use
string concatenation to add the % marks. I think regexp is simpler, and we need
to put proper full text there as soon as posible anyway. But I will change it if
you insist hard enough :-).

> 
> >+    return 'CAST(' . join(' AS text) || CAST(', @params) . ' AS text)';
> 
>   Can you not use ::text for some reason, instead of CAST?
> 

Yes, I can, but according to Postgres doco, CAST is the standart and prefered
way of diong it, ::text is there just for historical reasons. I prefer not to
use obsoleted stuff :-). At the end it will produce the same result and I am
sure there is no performance difference :-).

>   Also, you didn't join @params, so that will just give us something like
> ARRAY. We probably need to foreach over @params there, I'd think.

I did, look again. It's tested and it works :-).
(Reporter)

Comment 15

13 years ago
OK.

Yes, please change it from a regex to an ILIKE, even if it takes string
contatenation.
(Assignee)

Comment 16

13 years ago
Created attachment 176337 [details] [diff] [review]
V1.3

Ok, the Pg full text search now splits the phrase we are looking for up to
separate words and searches for all of them using ILIKE. It's probably the
closest we can get to full blown full text search without actually using one
:-). Do not expect this to be fast, though.
Attachment #176331 - Attachment is obsolete: true
Attachment #176337 - Flags: review?(mkanat)
(Reporter)

Comment 17

13 years ago
Comment on attachment 176337 [details] [diff] [review]
V1.3

$text is already sql-quoted. :-)
Attachment #176337 - Flags: review?(mkanat) → review-
(Assignee)

Comment 18

13 years ago
Created attachment 176342 [details] [diff] [review]
V2

Another iteration. The simulated full text search is now fully ANSI SQL to
provide fall back for any DB not supporting full text search natively, and as
such has been moved to DB.pm. Problem with quoting fixed.
Attachment #176337 - Attachment is obsolete: true
Attachment #176342 - Flags: review?
(Reporter)

Comment 19

13 years ago
Comment on attachment 176342 [details] [diff] [review]
V2

>+    # convert all words to lowercase
>+    my @lower_words = map(lc, @words);

  I think that has to be \&lc, doesn't it?

  Otherwise, this looks fine. :-)
Attachment #176342 - Flags: review? → review+
(Assignee)

Comment 20

13 years ago
(In reply to comment #19)
> (From update of attachment 176342 [details] [diff] [review] [edit])
> >+    # convert all words to lowercase
> >+    my @lower_words = map(lc, @words);
> 
>   I think that has to be \&lc, doesn't it?
> 
>   Otherwise, this looks fine. :-)
> 

No, it's ok as it is. See http://perldoc.perldrunks.org/functions/map.html for
examples.
But as you suggested, it will be simpler to do lc on the whole text before the
split. I will change that...
(Assignee)

Comment 21

13 years ago
Created attachment 176352 [details] [diff] [review]
V2.1

Fixed the lower case conversion on the text, no other changes.
Attachment #176342 - Attachment is obsolete: true
Attachment #176352 - Flags: review?(mkanat)
(Reporter)

Comment 22

13 years ago
Comment on attachment 176352 [details] [diff] [review]
V2.1

Looks great! :-)
Attachment #176352 - Flags: review?(mkanat) → review+
(Reporter)

Updated

13 years ago
Flags: approval?
(Assignee)

Comment 23

13 years ago
Bugger, just realized another thing: this generic implementation of full text
search using LIKE will do the search, but the relevance computation which is
done later is broken. Search.pm is using the feature of MySQL which returns
relevance when you actually select the MATCH function (opposed to using it in
the WHERE clause). But if you do select on "text LIKE expr", you will get boolean.

The best solution I can think of right now is to enclose the generic
implementation in CASE statement returning 1 on match and 0 on no match. That
way we'll get relevance 0 or 1 - not perfect, but better than nothing.
(Reporter)

Comment 24

13 years ago
Can we just return relevance 0 or 1 just for the generic function, but keep 
everything else the same for MySQL?
(Assignee)

Comment 25

13 years ago
(In reply to comment #24)
> Can we just return relevance 0 or 1 just for the generic function, but keep 
> everything else the same for MySQL?

Yes, MySQL is not affected by this at all.
What I am just not sure is on Postgres (and other DBs), what will happen when
the integer result gets in the WHERE clause - will something like "SELECT * FROM
bugs WHERE (CASE WHEN id > 100 THEN 1 ELSE 0)" work? Does WHERE accept integer
and treat is boolean, or will it break?

I will have to test it tonight.
(Assignee)

Comment 26

13 years ago
Created attachment 176558 [details] [diff] [review]
V2.2

Ok, here is a version I believe should work correctly - always. I modified the
generic full text search function to return 1 or 0 as a result (MySQL is
returning floating point relevance, so the meaning is similar), and the part in
Search.pm which is going to WHERE clause is changed to result in boolean (as
Posgtres requires). I believe that even the relevance as computed over all
comments will make some sense now :-).
No other changes were done (except for un-bitrotting).
Attachment #176352 - Attachment is obsolete: true
Attachment #176558 - Flags: review?(mkanat)
(Reporter)

Comment 27

13 years ago
Comment on attachment 176558 [details] [diff] [review]
V2.2

OK, this looks good to me. If it causes any real trouble on PostgreSQL, we can
work on that when we have a running PostgreSQL Bugzilla. :-)
Attachment #176558 - Flags: review?(mkanat) → review+
(Assignee)

Comment 28

13 years ago
Created attachment 176706 [details] [diff] [review]
V2.3

Ooops, typo in the postgres full text search. But now it's actually tested on
both postgres and MySQL :-).
Attachment #176558 - Attachment is obsolete: true
Attachment #176706 - Flags: review?(mkanat)
(Reporter)

Updated

13 years ago
Attachment #176706 - Flags: review?(mkanat) → review+
Flags: approval? → approval+
(Reporter)

Comment 29

13 years ago
Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.27; previous revision: 1.26
done
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.88; previous revision: 1.87
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.4; previous revision: 1.3
done
Checking in Bugzilla/DB/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Pg.pm,v  <--  Pg.pm
new revision: 1.3; previous revision: 1.2
done
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.