Last Comment Bug 173130 - (bz-sybase) Allow Bugzilla to work with Sybase
(bz-sybase)
: Allow Bugzilla to work with Sybase
Status: NEW
:
Product: Bugzilla
Classification: Server Software
Component: Bugzilla-General (show other bugs)
: unspecified
: All All
: P4 enhancement with 6 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
: default-qa
Mentors:
Depends on: 201030 283076 285547 bz-dbschema 153129 156834 174295 180086 182136 190432 196101 248001 bz-dbinstall 250547 250832 253357 253360 280493
Blocks: bz-zippy
  Show dependency treegraph
 
Reported: 2002-10-07 14:17 PDT by Myk Melez [:myk] [@mykmelez]
Modified: 2008-08-24 04:37 PDT (History)
15 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
Search.pm modifications (15.57 KB, patch)
2003-03-07 19:38 PST, Dave Miller [:justdave] (justdave@bugzilla.org)
no flags Details | Diff | Splinter Review
original Sybase schema conversion (16.56 KB, text/plain)
2003-06-10 10:54 PDT, Dave Miller [:justdave] (justdave@bugzilla.org)
no flags Details
Bugzilla/DBCompat.pm (5.20 KB, text/plain)
2003-06-12 17:13 PDT, Dave Miller [:justdave] (justdave@bugzilla.org)
no flags Details

Description Myk Melez [:myk] [@mykmelez] 2002-10-07 14:17:30 PDT
Bugzilla should support Sybase databases.
Comment 1 Myk Melez [:myk] [@mykmelez] 2002-10-07 14:18:09 PDT
See also bug 98304 (PostgreSQL support).
Comment 2 krishna 2003-01-29 06:25:06 PST
See also Bug# 189947 (Oracle support)
Comment 3 Dave Miller [:justdave] (justdave@bugzilla.org) 2003-03-07 19:38:46 PST
Created attachment 116610 [details] [diff] [review]
Search.pm modifications

This may be worth moving to another bug and checking in separately, but we can
probably decide that later.  Wanted to put this somewhere that people could
look at it.  This attachment is the changes I had to make to Search.pm to get
buglists to work correctly on Sybase.  And this code happens to work on MySQL,
too :)	BBaetz tells me this SQL will suck on Postgres prior to 7.4 though.

The primary changes here is that Sybase performs explicit joins (those declared
using the word JOIN) first before it performs implicit joins (those with
commas).  This causes any LEFT JOIN that includes the bugs table to fail,
because bugs is listed first, and there's a comma between bugs and the next
mentioned table in better than 90% of the queries.  By changing all of the
comma joins to be INNER JOIN, the joins are performed in order, and the
bugs.bug_id column will already exist in the prepared table when the following
joins against it happen.

Also, ANSI SQL says if we use GROUP BY, that we have to include every column
mentioned in the SELECT part in the GROUP BY clause.  Sybase won't let you put
more than 31 columns in a GROUP BY, and also has temporary table size
limitations with GROUP BY that are pretty darn restrictive.  So the "GROUP BY
bugs.bug_id" has been replaced with "DISTINCT bugs.bug_id" at the beginning of
the SELECT part.

The join ordering may have other side effects (like if a LEFT JOIN happens
before a later INNER JOIN perhaps?) but we haven't run into any problems yet on
Sybase at least.  All of the columns come back correctly, and the queries seem
to product the same results as the old version.  I'd love it if anyone could
prove whether those possible side effects would really happen or not.
Comment 4 Dave Miller [:justdave] (justdave@bugzilla.org) 2003-03-07 19:44:22 PST
CCing other people that are working on cross-DB support for comment
Comment 5 Bradley Baetz (:bbaetz) 2003-03-07 19:53:07 PST
actually, forget what I said about the order mattering - I don't think it does,
since the USING/ON constrains what we're joining against

-        $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who =
$::userid ";
+        $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who =
$::userid ";

Why did you have to change that ordering arround?
Comment 6 Dave Miller [:justdave] (justdave@bugzilla.org) 2003-03-08 10:33:26 PST
>-        $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who =
$::userid ";
>+        $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who =
$::userid ";
>
>Why did you have to change that ordering arround?

Because I've had experience in some places where it'll choke if the column
references on the left side of the comparison isn't already in the prepared table.
Comment 7 Dave Miller [:justdave] (justdave@bugzilla.org) 2003-06-10 10:54:59 PDT
Created attachment 125323 [details]
original Sybase schema conversion

This is the *original* conversion of the Schema to Sybase syntax that we did
back in December.  There's been a lot of changes to it since then, but getting
the current schema coerced back into something I can post publicly would take a
while, and I'm mainly posting this for the benefit of someone asking a question
about it on the developers mailing list.
Comment 8 Dave Miller [:justdave] (justdave@bugzilla.org) 2003-06-12 17:13:21 PDT
Created attachment 125542 [details]
Bugzilla/DBCompat.pm

This attachment is the DB Compatibility module we did.	There are a couple
usage examples at the bottom, but I suck at POD docs, so the documentation is
pretty incomplete. :)
Comment 9 Jeroen Ruigrok van der Werven 2003-09-05 15:26:05 PDT
See bug 98304, attachment 130979 [details] for an updated DBCompat.pm with more 
documentation.
Comment 10 Tomas Kopal 2004-07-17 23:27:14 PDT
Also see bug 251960 with very similar patch, working on Postgres.
Comment 11 Tomas Kopal 2004-07-19 04:03:06 PDT
I have compared the first patch (Search.pm) against the attachment 153568 [details] [diff] [review] in bug
251960, and they are different only in:
 - attachment 153568 [details] [diff] [review] is against a bit newer sources, making it easier to apply
 - attachment 153568 [details] [diff] [review] is nicer formatted ;-)
 - attachment 153568 [details] [diff] [review] does not address the GROUP BY vs DISTINCT problem.

IMHO the best move would be to go for attachment 153568 [details] [diff] [review] in bug 251960 and to
move the GROUP BY vs DISTINCT discussion to other bug, probably bug 174295.
Comment 12 Tomas Kopal 2004-07-27 20:56:39 PDT
I have just looked at the DISTINCT vs GROUP BY problem from the PostgreSQL point
of view and, unfortunatelly, this does not work here. As soon as you do JOIN,
you have to have all columns in GROUP BY, except for columns used only in
aggregate function call (if I understand correctly ANSI SQL 92, which PostgreSQL
implements in this case). So we need GROUP BY for all columns for PostgreSQL. If
Sybase can't do it, we need special case either for Sybase or PostgreSQL,
probably in DBCompat.

Suggestions?
Comment 13 Joel Peshkin 2004-07-27 21:21:58 PDT
I think we are going to need an SQL object that lets us add selected fields, add
selected aggregate fields, add group-by fields that we mean, and then let it use
its knowledge of the database backend to decide which of the other fields need
to be added to the group-by.  Essentially, SQL code generation will need to have
an abstraction layer that knows which of the selected fields need to be
auto-appended to the group-by condition.

Similarly, this will have to understand where it can use aliases and where it
needs to look bak to the original definition of the alias and inline it. etc...
Comment 14 Tomas Kopal 2004-07-27 23:31:19 PDT
Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
xxxx". Does this work on Sybase?
I still hope we can come up with a query which just works, without the need for
another abstraction object, which would be probably pretty complex and error prone.
Comment 15 Tomas Kopal 2004-07-28 00:37:45 PDT
I can't find any way how to get rid of the HAVING clause, and with it Postgres
refuses to run the query without all columns in the GROUP BY.
Solution would be to use subselects (I got it actually working with subselect).
What is current opinion on this? Are we going to require MySQL 4.1 for 2.20? Can
we start using subselects?
Comment 16 Joel Peshkin 2004-07-28 02:49:34 PDT
2.20 will still support mysql3 but large sites will be expected to be on mysql4
for performance.  SQL that mysql3 does not optimize well will be permitted.

2.22 will probably require mysql4
Comment 17 Ed Sabol 2004-07-28 02:52:01 PDT
(In reply to comment #14)
> Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
> DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
> starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
> xxxx". Does this work on Sybase?

I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking 
if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not 
very common.

> I still hope we can come up with a query which just works, without the need for
> another abstraction object, which would be probably pretty complex and error prone.

Me too.
Comment 18 Bradley Baetz (:bbaetz) 2004-07-28 03:31:11 PDT
The way to do this sort of thing is with subselects. Depending on exactly what
you're trying to do, you may need to use an 'inline view' (ie SELECT * from a,
(SELECT * from b ....) AS c WHERE ....), or have an IN or EXISTs subselect. Or
possibly a correlated subquery.

Exactly which one is better to use depends on exactly what you're trying to do
in a particular situation.

Which is another reason why DB abstraction layers inherently suck. :)
Comment 19 Joel Peshkin 2004-07-28 04:13:00 PDT
I'm not suggesting a universal abstraction.  However, as we assemble sql
statements if we build the parts up using something slightly more insightful
than text concatenation, when we get to assembling the whole statement, we can
adjust the final assembly of that particular type of statement to suit the
various databases.

For the example at hand, this may be a matter of pushing select fields into a
list and group fields (that we really mean) onto a list and then having a common
function that takes the 2 lists and returns the GROUP BY clause.

Incidentally, if I'm not mistaken, HAVING is death on query performance, though
optimizers have gotten a lot better since I first adopted that dogma.

Comment 20 Bradley Baetz (:bbaetz) 2004-07-28 04:31:13 PDT
HAVING isn't always the best thing to do, but its better than doing it yourself
in perl.

My point is that abstracting SQL away to that level of detail lose flexability.
Can't we just drop mysql3 support, and get rid of 90% of this.
Comment 21 Tomas Kopal 2004-07-28 05:25:11 PDT
Another option would be to have two paths, one doing it "the right way (tm)",
i.e. with subselects, and having some fallback for DBs which does not support it
yet (MySQL3). Once MySQL4 is widely deployed and we stop supporting it, we can
remove this "legacy" stuff. In Search.pm, it would affect just two places which
currently use HAVING, search for "flag is not" and search for "percent complete".
I know it's close to hack and can complicate maintenance, but it's probably
still better than abstraction layer.
BTW, MySQL4.0 is AFAIK still not enough for subselects, 4.1 is required. So we
still probably need a version of Search.pm without subselects for a while :-(.
Comment 22 Tomas Kopal 2004-07-28 05:33:37 PDT
(In reply to comment #17)
> (In reply to comment #14)
> > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
> > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
> > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
> > xxxx". Does this work on Sybase?
> 
> I'm having difficulty following your train of thought. What are you asking
about Sybase? If you're asking 
> if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause,
then, yes, it is, but, it's not 
> very common.

I was actually asking whether anyone tried doing search for a bug not having
specified flag on Sybase with this change applied, i.e. to excercise the query
with HAVING part, and with GROUP BY replaced by DISTINCT. But if Sybase supports
HAVING without GROUP BY, it will probably work...
Comment 23 Ed Sabol 2004-07-28 20:28:02 PDT
(In reply to comment #15)
> I can't find any way how to get rid of the HAVING clause, and with it Postgres
> refuses to run the query without all columns in the GROUP BY.

I'm sorry... Could you post a complete example of such a query?
Comment 24 Tomas Kopal 2004-08-30 06:02:52 PDT
(In reply to comment #23)
> (In reply to comment #15)
> > I can't find any way how to get rid of the HAVING clause, and with it Postgres
> > refuses to run the query without all columns in the GROUP BY.
> 
> I'm sorry... Could you post a complete example of such a query?

Sure. Here is relevant part of the query Search.pm generate when searching for
all bugs which do not have review flag set:

SELECT bugs.bug_id, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT
NULL THEN 1 ELSE 0 END) AS allflags_0, SUM(CASE WHEN (flagtypes_0.name ||
flags_0.status) != 'review' THEN 1 ELSE 0 END) AS matchingflags_0 FROM bugs LEFT
JOIN flags AS flags_0 ON (bugs.bug_id = flags_0.bug_id AND flags_0.is_active =
1) LEFT JOIN flagtypes AS flagtypes_0 ON (flags_0.type_id = flagtypes_0.id)
GROUP BY bugs.bug_id HAVING SUM(CASE WHEN (flagtypes_0.name || flags_0.status)
IS NOT NULL THEN 1 ELSE 0 END) = SUM(CASE WHEN (flagtypes_0.name ||
flags_0.status) != 'review' THEN 1 ELSE 0 END);

Trouble is with the "GROUP BY bugs.bug_id". If you specify it, postgres wants to
have all other columns which appear in the query in the GROUP BY too. Which
means some twenty or thirty plus columns for full query. That seems to upset
sybase, which have limit on number of GROUP BY parameters. If you leave it out
completely, postgres complains that it's not there.

It seems that the HAVING is not the only problem, if I remove it and leave just
the select, I still have to provide GROUP BY... I have very vague idea why is it
so, and no idea how to fix it without having special code for different DBs :-(.
Comment 25 Bradley Baetz (:bbaetz) 2004-08-30 06:11:36 PDT
Use a subselect instead. It becomes much clearer, cleaner, and quicker.
Comment 26 David Lawrence [:dkl] 2004-08-30 06:17:43 PDT
(In reply to comment #25)
> Use a subselect instead. It becomes much clearer, cleaner, and quicker.

If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+)
then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {}
elsif ($db_driver eq 'EverythingElse') {}.

Comment 27 Tomas Kopal 2004-08-30 06:42:36 PDT
(In reply to comment #26)
> (In reply to comment #25)
> > Use a subselect instead. It becomes much clearer, cleaner, and quicker.
> 
> If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+)
> then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {}
> elsif ($db_driver eq 'EverythingElse') {}.
> 
> 

I would love to use them, but, unfortunatelly, subselects starts at MySQL 4.1.
So we can't use them yet, at least no without the mentioned if{}else{} (which
make everything even worse).
Comment 28 Joel Peshkin 2004-08-30 06:55:33 PDT
For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3
but still works.   Anybody who complains about performance can be pushed to
MySQL4, but we have not agreed to break compatibility with MySQL3.
Comment 29 Bradley Baetz (:bbaetz) 2004-08-30 14:00:32 PDT
Yes, it will need ot be conditional, and also mysql4 needs to be tested for
speed - the curent query was very much tuned for it and its
one-index-per-table-per-query restriction.
Comment 30 Myk Melez [:myk] [@mykmelez] 2004-08-31 12:46:01 PDT
(In reply to comment #28)
> For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3
> but still works.   Anybody who complains about performance can be pushed to
> MySQL4, but we have not agreed to break compatibility with MySQL3.

Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19
timeframe.
Comment 31 Dave Miller [:justdave] (justdave@bugzilla.org) 2004-08-31 14:24:30 PDT
(In reply to comment #30)
> Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19
> timeframe.

You realize we're freezing for 2.20 in 2 weeks, right?  Might be better to hold
off on that for 2.21/2.22.  I doubt the freeze period for 2.20 will be anywhere
near as long as 2.18 took.
Comment 32 Myk Melez [:myk] [@mykmelez] 2004-08-31 15:12:44 PDT
>You realize we're freezing for 2.20 in 2 weeks, right?  Might be better to hold
>off on that for 2.21/2.22.  I doubt the freeze period for 2.20 will be anywhere
>near as long as 2.18 took.

Sure, makes sense.
Comment 33 Dave Miller [:justdave] (justdave@bugzilla.org) 2005-03-24 00:20:50 PST
Reassigning bugs that I'm not actively working on to the default component owner
in order to try to make some sanity out of my personal buglist.  This doesn't
mean the bug isn't being dealt with, just that I'm not the one doing it.  If you
are dealing with this bug, please assign it to yourself.

Note You need to log in before you can comment on or make changes to this bug.