Open Bug 173130 (bz-sybase) Opened 22 years ago Updated 16 years ago

Allow Bugzilla to work with Sybase

Categories

(Bugzilla :: Bugzilla-General, enhancement, P4)

enhancement

Tracking

()

People

(Reporter: myk, Unassigned)

References

(Depends on 3 open bugs)

Details

Attachments

(3 files)

Bugzilla should support Sybase databases.
See also bug 98304 (PostgreSQL support).
Blocks: bz-zippy
Depends on: 156834
Alias: bz-sybase
Depends on: 180086
Depends on: 153129
Depends on: 182136
Depends on: 174295
Summary: support Sybase → Allow Bugzilla to work with Sybase
Depends on: 190432
See also Bug# 189947 (Oracle support)
Depends on: 196101
Attached patch Search.pm modifications — — Splinter Review
This may be worth moving to another bug and checking in separately, but we can
probably decide that later.  Wanted to put this somewhere that people could
look at it.  This attachment is the changes I had to make to Search.pm to get
buglists to work correctly on Sybase.  And this code happens to work on MySQL,
too :)	BBaetz tells me this SQL will suck on Postgres prior to 7.4 though.

The primary changes here is that Sybase performs explicit joins (those declared
using the word JOIN) first before it performs implicit joins (those with
commas).  This causes any LEFT JOIN that includes the bugs table to fail,
because bugs is listed first, and there's a comma between bugs and the next
mentioned table in better than 90% of the queries.  By changing all of the
comma joins to be INNER JOIN, the joins are performed in order, and the
bugs.bug_id column will already exist in the prepared table when the following
joins against it happen.

Also, ANSI SQL says if we use GROUP BY, that we have to include every column
mentioned in the SELECT part in the GROUP BY clause.  Sybase won't let you put
more than 31 columns in a GROUP BY, and also has temporary table size
limitations with GROUP BY that are pretty darn restrictive.  So the "GROUP BY
bugs.bug_id" has been replaced with "DISTINCT bugs.bug_id" at the beginning of
the SELECT part.

The join ordering may have other side effects (like if a LEFT JOIN happens
before a later INNER JOIN perhaps?) but we haven't run into any problems yet on
Sybase at least.  All of the columns come back correctly, and the queries seem
to product the same results as the old version.  I'd love it if anyone could
prove whether those possible side effects would really happen or not.
CCing other people that are working on cross-DB support for comment
actually, forget what I said about the order mattering - I don't think it does,
since the USING/ON constrains what we're joining against

-        $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who =
$::userid ";
+        $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who =
$::userid ";

Why did you have to change that ordering arround?
>-        $query .= " LEFT JOIN cc ON cc.bug_id = bugs.bug_id AND cc.who =
$::userid ";
>+        $query .= " LEFT JOIN cc ON bugs.bug_id = cc.bug_id AND cc.who =
$::userid ";
>
>Why did you have to change that ordering arround?

Because I've had experience in some places where it'll choke if the column
references on the left side of the comparison isn't already in the prepared table.
This is the *original* conversion of the Schema to Sybase syntax that we did
back in December.  There's been a lot of changes to it since then, but getting
the current schema coerced back into something I can post publicly would take a
while, and I'm mainly posting this for the benefit of someone asking a question
about it on the developers mailing list.
Attached file Bugzilla/DBCompat.pm —
This attachment is the DB Compatibility module we did.	There are a couple
usage examples at the bottom, but I suck at POD docs, so the documentation is
pretty incomplete. :)
See bug 98304, attachment 130979 [details] for an updated DBCompat.pm with more 
documentation.
Depends on: bz-dbschema
Depends on: 201030
Depends on: bz-dbinstall
Depends on: 248001
Depends on: 250832
Depends on: 250547
Also see bug 251960 with very similar patch, working on Postgres.
I have compared the first patch (Search.pm) against the attachment 153568 [details] [diff] [review] in bug
251960, and they are different only in:
 - attachment 153568 [details] [diff] [review] is against a bit newer sources, making it easier to apply
 - attachment 153568 [details] [diff] [review] is nicer formatted ;-)
 - attachment 153568 [details] [diff] [review] does not address the GROUP BY vs DISTINCT problem.

IMHO the best move would be to go for attachment 153568 [details] [diff] [review] in bug 251960 and to
move the GROUP BY vs DISTINCT discussion to other bug, probably bug 174295.
I have just looked at the DISTINCT vs GROUP BY problem from the PostgreSQL point
of view and, unfortunatelly, this does not work here. As soon as you do JOIN,
you have to have all columns in GROUP BY, except for columns used only in
aggregate function call (if I understand correctly ANSI SQL 92, which PostgreSQL
implements in this case). So we need GROUP BY for all columns for PostgreSQL. If
Sybase can't do it, we need special case either for Sybase or PostgreSQL,
probably in DBCompat.

Suggestions?
I think we are going to need an SQL object that lets us add selected fields, add
selected aggregate fields, add group-by fields that we mean, and then let it use
its knowledge of the database backend to decide which of the other fields need
to be added to the group-by.  Essentially, SQL code generation will need to have
an abstraction layer that knows which of the selected fields need to be
auto-appended to the group-by condition.

Similarly, this will have to understand where it can use aliases and where it
needs to look bak to the original definition of the alias and inline it. etc...
Depends on: 253357
Depends on: 253360
Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
xxxx". Does this work on Sybase?
I still hope we can come up with a query which just works, without the need for
another abstraction object, which would be probably pretty complex and error prone.
I can't find any way how to get rid of the HAVING clause, and with it Postgres
refuses to run the query without all columns in the GROUP BY.
Solution would be to use subselects (I got it actually working with subselect).
What is current opinion on this? Are we going to require MySQL 4.1 for 2.20? Can
we start using subselects?
2.20 will still support mysql3 but large sites will be expected to be on mysql4
for performance.  SQL that mysql3 does not optimize well will be permitted.

2.22 will probably require mysql4
(In reply to comment #14)
> Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
> DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
> starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
> xxxx". Does this work on Sybase?

I'm having difficulty following your train of thought. What are you asking about Sybase? If you're asking 
if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause, then, yes, it is, but, it's not 
very common.

> I still hope we can come up with a query which just works, without the need for
> another abstraction object, which would be probably pretty complex and error prone.

Me too.
The way to do this sort of thing is with subselects. Depending on exactly what
you're trying to do, you may need to use an 'inline view' (ie SELECT * from a,
(SELECT * from b ....) AS c WHERE ....), or have an IN or EXISTs subselect. Or
possibly a correlated subquery.

Exactly which one is better to use depends on exactly what you're trying to do
in a particular situation.

Which is another reason why DB abstraction layers inherently suck. :)
I'm not suggesting a universal abstraction.  However, as we assemble sql
statements if we build the parts up using something slightly more insightful
than text concatenation, when we get to assembling the whole statement, we can
adjust the final assembly of that particular type of statement to suit the
various databases.

For the example at hand, this may be a matter of pushing select fields into a
list and group fields (that we really mean) onto a list and then having a common
function that takes the 2 lists and returns the GROUP BY clause.

Incidentally, if I'm not mistaken, HAVING is death on query performance, though
optimizers have gotten a lot better since I first adopted that dogma.

HAVING isn't always the best thing to do, but its better than doing it yourself
in perl.

My point is that abstracting SQL away to that level of detail lose flexability.
Can't we just drop mysql3 support, and get rid of 90% of this.
Another option would be to have two paths, one doing it "the right way (tm)",
i.e. with subselects, and having some fallback for DBs which does not support it
yet (MySQL3). Once MySQL4 is widely deployed and we stop supporting it, we can
remove this "legacy" stuff. In Search.pm, it would affect just two places which
currently use HAVING, search for "flag is not" and search for "percent complete".
I know it's close to hack and can complicate maintenance, but it's probably
still better than abstraction layer.
BTW, MySQL4.0 is AFAIK still not enough for subselects, 4.1 is required. So we
still probably need a version of Search.pm without subselects for a while :-(.
(In reply to comment #17)
> (In reply to comment #14)
> > Actually, ignore my previous post (comment #12), I was wrong. If I use SELECT
> > DISTINCT and remove the GROUP BY, it mostly works even with Postgres. Problems
> > starts when HAVING kicks in, e.g. if you specify boolean operation "flag is not
> > xxxx". Does this work on Sybase?
> 
> I'm having difficulty following your train of thought. What are you asking
about Sybase? If you're asking 
> if it's legal in Sybase to specify a HAVING clause without a GROUP BY clause,
then, yes, it is, but, it's not 
> very common.

I was actually asking whether anyone tried doing search for a bug not having
specified flag on Sybase with this change applied, i.e. to excercise the query
with HAVING part, and with GROUP BY replaced by DISTINCT. But if Sybase supports
HAVING without GROUP BY, it will probably work...
(In reply to comment #15)
> I can't find any way how to get rid of the HAVING clause, and with it Postgres
> refuses to run the query without all columns in the GROUP BY.

I'm sorry... Could you post a complete example of such a query?
(In reply to comment #23)
> (In reply to comment #15)
> > I can't find any way how to get rid of the HAVING clause, and with it Postgres
> > refuses to run the query without all columns in the GROUP BY.
> 
> I'm sorry... Could you post a complete example of such a query?

Sure. Here is relevant part of the query Search.pm generate when searching for
all bugs which do not have review flag set:

SELECT bugs.bug_id, SUM(CASE WHEN (flagtypes_0.name || flags_0.status) IS NOT
NULL THEN 1 ELSE 0 END) AS allflags_0, SUM(CASE WHEN (flagtypes_0.name ||
flags_0.status) != 'review' THEN 1 ELSE 0 END) AS matchingflags_0 FROM bugs LEFT
JOIN flags AS flags_0 ON (bugs.bug_id = flags_0.bug_id AND flags_0.is_active =
1) LEFT JOIN flagtypes AS flagtypes_0 ON (flags_0.type_id = flagtypes_0.id)
GROUP BY bugs.bug_id HAVING SUM(CASE WHEN (flagtypes_0.name || flags_0.status)
IS NOT NULL THEN 1 ELSE 0 END) = SUM(CASE WHEN (flagtypes_0.name ||
flags_0.status) != 'review' THEN 1 ELSE 0 END);

Trouble is with the "GROUP BY bugs.bug_id". If you specify it, postgres wants to
have all other columns which appear in the query in the GROUP BY too. Which
means some twenty or thirty plus columns for full query. That seems to upset
sybase, which have limit on number of GROUP BY parameters. If you leave it out
completely, postgres complains that it's not there.

It seems that the HAVING is not the only problem, if I remove it and leave just
the select, I still have to provide GROUP BY... I have very vague idea why is it
so, and no idea how to fix it without having special code for different DBs :-(.
Use a subselect instead. It becomes much clearer, cleaner, and quicker.
(In reply to comment #25)
> Use a subselect instead. It becomes much clearer, cleaner, and quicker.

If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+)
then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {}
elsif ($db_driver eq 'EverythingElse') {}.

(In reply to comment #26)
> (In reply to comment #25)
> > Use a subselect instead. It becomes much clearer, cleaner, and quicker.
> 
> If MySQL 4.0+ is now required (which I assume this is the case now with 2.19+)
> then you can do that otherwise, if will be another if ($db_driver eq 'mysql') {}
> elsif ($db_driver eq 'EverythingElse') {}.
> 
> 

I would love to use them, but, unfortunatelly, subselects starts at MySQL 4.1.
So we can't use them yet, at least no without the mentioned if{}else{} (which
make everything even worse).
For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3
but still works.   Anybody who complains about performance can be pushed to
MySQL4, but we have not agreed to break compatibility with MySQL3.
Yes, it will need ot be conditional, and also mysql4 needs to be tested for
speed - the curent query was very much tuned for it and its
one-index-per-table-per-query restriction.
(In reply to comment #28)
> For 2.20, the current plan is that we can use SQL that is not optimal for MySQL3
> but still works.   Anybody who complains about performance can be pushed to
> MySQL4, but we have not agreed to break compatibility with MySQL3.

Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19
timeframe.
(In reply to comment #30)
> Bug 204217 is about requiring MySQL 4.0, which I think we should do in the 2.19
> timeframe.

You realize we're freezing for 2.20 in 2 weeks, right?  Might be better to hold
off on that for 2.21/2.22.  I doubt the freeze period for 2.20 will be anywhere
near as long as 2.18 took.
>You realize we're freezing for 2.20 in 2 weeks, right?  Might be better to hold
>off on that for 2.21/2.22.  I doubt the freeze period for 2.20 will be anywhere
>near as long as 2.18 took.

Sure, makes sense.
Depends on: 280493
Depends on: 283076
Depends on: 285547
Reassigning bugs that I'm not actively working on to the default component owner
in order to try to make some sanity out of my personal buglist.  This doesn't
mean the bug isn't being dealt with, just that I'm not the one doing it.  If you
are dealing with this bug, please assign it to yourself.
Assignee: justdave → general
QA Contact: mattyt-bugzilla → default-qa
Severity: normal → enhancement
Priority: -- → P4
You need to log in before you can comment on or make changes to this bug.