Create a database abstraction layer

RESOLVED DUPLICATE of bug 237862

Status

()

enhancement
RESOLVED DUPLICATE of bug 237862
16 years ago
14 years ago

People

(Reporter: anthony, Assigned: mkanat)

Tracking

Details

(Reporter)

Description

16 years ago
I've been giving some thought to this, because I'm developing a new
web-database application (in which, of course, I'm wildly stealing ideas from
Bugzilla :-). The summary says it all. We currently have people wanting to run
Bugzilla on PostgreSQL (bug 98304), Sybase (bug 173130), Oracle (bug 189947),
and of course MySQL. So here's a bunch of extra-crazy ideas which raise some
issues, but which I think are worth considering because they could be worked
into something good.

Essentially the idea is that instead of

 SendSQL("INSERT INTO mytab (cola, colb, colc) VALUES ($vala, $valb, $valc)");

you'd write

  db_insert('mytable', ['cola', 'colb', 'colc'], [vala, valb, valc]); 

Now of course this is the basic idea, and of course it is really much harder
than that. But I think that it might be not as hard as it initially seems. OK,
db_insert needs to know the data types (so that it quotes the values, or
converts them if they are dates). You can give it a third list, with the data
types.  But this might not be needed; db_insert might already know the data
types. How?  By having learnt about the database structure at the beginning of
execution. This is explained in bug 146679, so I'll only give a coding example
here. Here's the definition of the users table from checksetup.pl:

$table{profiles} =
   'userid mediumint not null auto_increment primary key,
    login_name varchar(255) not null,
    cryptpassword varchar(34),
    realname varchar(255),
    disabledtext mediumtext not null,
    mybugslink tinyint not null default 1,
    emailflags mediumtext,
    refreshed_when datetime not null,
    unique(login_name)';

Here, by contrast, is the definition of the users table in my application, sorry
it's Python:

tables = [
    ...

    db.DbTable("users", "Registered users",
        (("id",         "integer",      NOTNULL,  PK,   NOFK,   SEQ),
         ("login",      "varchar(16)",  NOTNULL),
         ("password",   "varchar(28)",  NOTNULL),
         ("full_name",  "varchar(64)",  WITHNULL),
         ("user_type",  "smallint",     NOTNULL, NOPK,  "user_types(id)"),
         ("remarks",    "text",         WITHNULL))),

   ...
]

We need not explain all details; you get the idea: instead of writing SQL, we
make a table describing the table, and our abstraction mechanism will then have
to convert that into SQL. The tables definition goes out of checksetup.pl into
a module which is imported by checksetup.pl and by the rest of Bugzilla.
Result: Bugzilla is always aware of the database structure.

So far so good. We can insert rows. We can create tables. We can update the
database on version upgrades more elegantly:

  foreach my $table (@tables) {
    if ($table->exists) {
      $table->check_columns;
    } else {
      $table->create;
    }
  }

But what about selects? Selecting is the hardest part. Let's give it a try:

  db_select(list_of_expressions,
            list_of_inner_joined_tables,
            description_of_outer_joins, # this mustn't be too hard
            conditions                  # (esp when compared to this one :-)
  );

The conditions is what must be converted to a where clause. A where clause is
simple expressions ANDed and ORed together. The conditions could be specified
as a tree, the leaves of which would be the expressions and the way of
connecting them would show whether they should be ANDed or ORed. Alternatively,
since in all SQLs I guess that the syntax of the ANDs, ORs, and brackets is
identical, we could manually join the expressions together in a string. The
tree is probably more complex, but it is a generalization of the @specialchart
used in Bugzilla::Search, and might lead to better query construction code.

OK, but how do I specify an expression (either for the SELECT list of
expressions or for the conditions)? I'd guess that defining another data
structure might really start becoming overkill, so expressions could be
constructed as strings, the way proposed by David Miller with DBCompat.pm (bug
173130). (We can forget about subselects for the time being, since we don't use
them.)

This was ideas on the interface. What about the implementation? This can be
done either as in DBCompat.pm, which is simple but has support for all RDBMS's
in the same file in the form of "if ... elsif ... elsif ...", or it could be
done using ideas such as DBIx::AnyDBD, which keeps RDBMS interfaces separate
but adds yet another abstraction layer.

Oh, and I just found out I forgot about GROUP BY and ORDER BY :-), but if we
can do the rest, we can do these as well.

See also bug 104682, bug 131136, bug 174295, bug 204217, bug 98446, and the
DBIx::Abstract module.
(Reporter)

Comment 1

16 years ago
4 months later...

The disillusionment :-)

What has happened during the last 4 months is that I've been laying the
foundation to the Python web application I was talking about above. I mostly did
database design. I also read Fabian Pascal's book ("Practical Issues in Database
Management: A Reference for the Thinking Practitioner").

First, describing the schema in a hash or other Perl or Python data structure
results, as I found out, in code that is much more difficult for your eye to
parse than are SQL DDL statements. Rather than doing this, it may be better to
write a database creation script using CREATE TABLE statements, and write a Perl
script to parse the SQL and create the data structures.

Second, it is obvious that the whole idea is, to a certain extent, absurd: there
exists, already, an abstraction layer, namely SQL, but sadly each RDBMS has a
different SQL, which makes us want to build another abstraction layer on top of
what should be the abstraction layer.

Third, if you care about enforcing integrity at the database level, you have to
write not only the primary key, unique, foreign key, and check constraints; you
have to write quite a few triggers. In addition, if you use object-oriented
concepts in the database design, you are likely to have supertypes and subtypes,
and you'll need views. These are things that you either can't do in our
abstraction layer, or that are extremely hard to do. I can't see any way to
abstract database functions; you'd have to write them for each database.

Fourth, RDBMS's differ not only in the interface, but in the functionality as
well. If an application is to be portable across a number of RDBMS's, it has to
use the lowest common denominator of these RDBMS's, and if your database is
simple enough to use that, you might well use BerkeleyDB or something, to make
it neat and easy to install and administer. If you really need an RDBMS, it's
probably more reasonable to choose one that fits your needs and stick to that.

I did considerable work in order to write a partly functioning abstraction layer
in Python, but I abandoned it a week ago as I came to realize all the above.

The bottom line is that if Bugzilla's database is simple enough to be made
portable, it might be easier and more reasonable to use BerkeleyDB and write the
extra code needed to do all required queries, than to create an abstraction
layer, which would require, I estimate, several weeks of full time work, the
result of which would be of doubtful theoretical and practical value.

So, in the short term, I propose that we implement RDBMS support with less
extreme solutions, such as bug 131136.

Suggest WONTFIX.
Assignee: justdave → mkanat
QA Contact: mattyt-bugzilla → default-qa
(Assignee)

Comment 2

14 years ago
Yeah, I considered doing something like this, too. In the end, what we ended up
doing was the current Bugzilla::DB structure, along with Bugzilla::DB::Schema.

I think it works pretty well.

*** This bug has been marked as a duplicate of 237862 ***
Status: NEW → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.