Closed Bug 309663 Opened 19 years ago Closed 17 years ago

Oracle thinks that an empty string is a NULL value

Categories

(Bugzilla :: Database, enhancement)

2.21
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 3.2

People

(Reporter: lance.larsh, Assigned: xiaoou.wu)

References

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041119
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20041119

In Oracle SQL, empty strings are considered NULL values.  A number of tables in
the current schema have NOT NULL constraints on columns which are routinely
populated with empty strings.  When using an Oracle database, these
INSERT/UPDATE statements fail because they violate the NOT NULL constraint.

In many cases, removing the NOT NULL constraint from affected columns may be the
most appropriate fix, but there may be cases where this isn't right.  Should we
file a separate bug for each case, or maybe just file a separate bug for the
ones where removing the NOT NULL won't work?

The examples I've found so far, just by running checksetup.pl:
   RESOLUTION.VALUE (this one will go away when bug 106589 gets fixed)
   PRODUCTS.MILESTONEURL
   GROUPS.USERREGEXP
   PROFILES.DISABLEDTEXT

Reproducible: Always

Steps to Reproduce:
(NOTE:  Testing obviously requires a database driver for the DB of interest, and
as of this moment the driver for Oracle isn't done yet since this bug blocks it.
 The steps below show how to test the code against any database for which a
driver is available.)
1. Run checksetup.pl on a clean installation.
2. Edit localconfig to set the db_* parameters to point to a fresh db with no
schema created yet.
3. Run checksetup.pl again (this time to create the schema).

Actual Results:  
As an example, checksetup.pl fails populating RESOLUTION.VALUE, with the error
below:
  checksetup.pl: DBD::Oracle::st execute failed: 
    ORA-01400: cannot insert NULL into ("BUGS"."RESOLUTION"."VALUE")
    (DBD ERROR: error possibly near <*> indicator at char 57 in
      'INSERT INTO resolution (value,sortkey,isactive)
       VALUES (:<*>p1,:p2,:p3)')
    [for Statement
      "INSERT INTO resolution (value,sortkey,isactive)
       VALUES (?,?,?)"
     with ParamValues: :p3=1, :p1='', :p2=100] at ./checksetup.pl line 1775
  checksetup.pl:      
    main::PopulateEnumTable('resolution','','FIXED','INVALID','WONTFIX',
                            'LATER','REMIND','DUPLICATE','WORKSFORME',...)
    called at ./checksetup.pl line 1809


Expected Results:  
INSERTs/UPDATEs should succeed.
Blocks: bz-oracle
Version: unspecified → 2.21
OK, this is one I actually have an issue with. An empty string is *not* a NULL
value. There's no way to change this behavior in Oracle, per-database?

Removing the NOT NULL constraints globally is not acceptable. At the *absolute
worst*, we could have the Oracle driver automatically make all columns NULL
instead of NOT NULL.
Assignee: database → lance.larsh
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Don't insert empty strings into NOT NULL columns → Oracle thinks that an empty string is a NULL value
Or, we could override bz_alter_column_raw and bz_add_column to not create NOT
NULL constraints ever, on text/varchar/char fields.

This is really just so... silly. There's no way to make it behave ANSI-standard?
Yeah, this issue can be a real headache.  Unfortunately there's no way to change
that behavior in Oracle.

(In reply to comment #2)
> Or, we could override bz_alter_column_raw and bz_add_column to not create NOT
> NULL constraints ever, on text/varchar/char fields.

Removing the NOT NULL constraint just for Oracle won't really work, because some
queries would end up returning different results on Oracle than on other
databases.  For example, if the app inserts '' into a row, the query "SELECT *
FROM mytable WHERE textcol <> 'whatever'" would return that row on MySQL but
would not return that row on Oracle (since SQL's "<>" operator doesn't match
NULL values, and Oracle would consider '' as NULL).  To really make that work,
we'd also have to change the app to always insert NULL instead of '' (in
addition to removing the NOT NULL constraint).  Then queries would always return
the same results on all databases, but it could end up being a ton of work to
change the app.

Perhaps a better approach (which would be much less work) would be to leave the
NOT NULL constraint on the columns, but have the Oracle module use a trigger on
each column to change empty strings to a single space on INSERT/UPDATE.  A space
is still not the same as the empty string, but at least then I think Oracle's
behavior would match Sybase, SQL Server and DB2 in their handling of empty
strings.  It might even work to use DBI's ChopBlanks attribute to turn this
single space back into an empty string automatically on SELECT.
Oy, the headaches.

Yeah, I remember this problem quite well from the farce of a Sybase port. :)

We did the "make an empty string be a single space" thing, and just ran trim()
on everything that came out of the DB that could potentially be empty. (We do
this on half the fields already anyway, just extend it to the ones that
routinely deal in empty strings).

This would work well across all databases if we enforce it across the board.

Sybase actually converts an empty string to a space for you (in an attempt to be
ANSI-compat, since it knows an empty string isn't NULL, it just can't deal with
it internally).

OK. We need to come up with some way to handle this bug that doesn't impact
every Bugzilla developer for all of the future. Anything that would require us
to catch things in review won't work -- we'll just end up breaking Oracle
support all the time, and it'll be mysterious to our end users.

In order to do this... we may have to get into SQL parsing. That is, we may have
to detect every time we put an empty string into the database, and replace it
with a string containing a control character (some character with a low hex
value that will never show up normally). And every time we get that string back
out of the database, the driver should convert it to an empty string.

This could involve some interesting DBI work, but it would also be valuable code
for other databases.
Do we have anything that actually needs to care about the difference between an
empty string and a space?  Why not just use a space instead of some mysterious
control character?  Sybase actually converts empty strings to single spaces for
you, and I think Lance mentioned there was a trick that could be pulled
server-side on Oracle to make it do the same.  As long as we make it a hard and
fast rule to trim() anything that comes out of the database that shouldn't trip
anyone up.
I think the problem is that "making it a hard and fast rule" is something that a
reviewer has to do, and something we can't automatically enforce in the code.
That means we *could* miss it, and probably *will* at some point. And it's the
sort of bug that would be *extremely* difficult to track down once it started
occuring.

As it stands now, I can think of very few places where we trim data that comes
out of the DB. Usually we trim it before we put it in.
I know it's not pleasant to think about, but honestly the way to get maximum
portability of string handling across all databases is to avoid the use of empty
strings altogether, since I think all databases support NULL strings and
non-empty strings in a similar fashion.  It's only the handling of empty strings
that varies significantly.  But I won't bring it up again...  ;)

(In reply to comment #7)
> I think the problem is that "making it a hard and fast rule" is something that
> a reviewer has to do, and something we can't automatically enforce in the
> code.

If we go with the "convert empty string to single space" approach, can't we use
DBI's ChopBlanks attribute on the connection, so that DBI automatically trims
everything that it gets from the DB?  Then we don't have to worry about the
requiring the app's SQL to trim things.  Of course, this approach would cause
problems if there's anywhere that the app requires the leading/trailing space to
be preserved.

Another thing to worry about is that any database which converts empty strings
to single spaces will return different results from queries which use SQL
functions that compute a string's length (unless of course the query trims the
argument to the length function).  But this may not be a big deal because I'm
not sure the length functions have portable names anyway (meaning we'd likely be
avoiding those functions already).

Of course, we're still left with the hard-and-fast rule "don't rely on the
database preserving space".  But if we used ChopBlanks on all databases, at
least such problems would appear to all developers, no matter which db they're
using.
The only field I can think of where the leading and trailing spaces are
important is the comment field, and attachments (but attachments aren't text
anyhow).

However, ChopBlanks won't help. The docs say this:

"The ChopBlanks attribute can be used to control the trimming of trailing space
characters from fixed width character (CHAR) fields. No other field types are
affected, even where field values have trailing spaces."

I don't think we even have any fixed-width character columns.

I'm fairly sure the only way to handle this will be to do some DBI hacking. It
would be interesting to investigate how other cross-DB perl libraries (or
non-perl libraries) handle this particular problem.

I don't think it's really feasible to treat all empty strings as NULL. Somebody
could really want to enter an empty string into a field, and I think an empty
string is a valid value for some columns, and not for others.
(In reply to comment #8)
> Then we don't have to worry about the requiring the app's SQL to trim things. 

I meant in the perl after retrieving the values.

I don't really think this is anywhere near as big a deal as mkanat is thinking
it'll be.

I can think of two places in the existing code where we rely on a string coming
out of the database to be empty.

shutdownhtml, and disabledtext on useraccounts.

Fix those two places to trim() the data after it comes out of the database
before comparing (or better yet, make them both flags in addition to the text
fields, and don't compare the text as a means of telling whether it's active),
add the server-side trigger to enforce it on Oracle, and be done with it.

Yes, it might trip someone down the road, but realistically, what would you ever
do that requires telling the difference between an empty string and a non-empty
one in a database field where it doesn't make sense to allow nulls?  Although
the logic might trip someone who needs to set up something like that, the
chances of someone ever setting up something like that are almost nil.
Well, we're still going to need to hack DBI for data on the way in, or require
reviewers to always put in a single space when they mean to put in an empty
string. Isn't there an "is empty" Boolean Chart?

I'm concerned about overloading "a string with a single space" to mean "an empty
string that isn't NULL." It's not so much that I can point out a particular
problem with it at the moment (besides ambiguity), but that it requires us to
predict a future that we can't predict. That is, I really have no idea if
choosing to use a single space for an empty string will cause us problems in the
future.

The problem is two things, as far as I know:

(a) The fact that for Oracle, '' will never be matched by anything unless we use
IS NULL. That's the one we've been talking about, mostly.

(b) Lance: Does DBI take out an empty string from Oracle as an empty string, or
as "undef"? DBI treats NULL as "undef" and we rely on that behavior. We also
rely on it not to treat an empty string as "undef."

In general, the best solution that I can see is, indeed, as we have all
basically decided, to come up with some way to differentiate between NULL and an
empty string. I'm just trying to come up with the way that will be the most
automatic, and won't require attention from me or another reviewer. Reviewers
already have a lot of things to remember about database-compatibility, and they
slip by a lot of the time if I'm not reviewing the bug personally.
(In reply to comment #11)
> Well, we're still going to need to hack DBI for data on the way in, or require
> reviewers to always put in a single space when they mean to put in an empty
> string. Isn't there an "is empty" Boolean Chart?

No DBI hacking required, it's all done server-side.


> (b) Lance: Does DBI take out an empty string from Oracle as an empty string,
> or as "undef"? DBI treats NULL as "undef" and we rely on that behavior. We
> also rely on it not to treat an empty string as "undef."

There's no such thing as an empty string in Oracle, that's the point of this
discussion.  If it's actually an empty string, you'll get a NULL back, which
will be an undef to Perl.  The transition from empty string to a space happens
at insertion time, and is done automatically by the server (we don't have to
mess with it and neither does DBI) or at least the server can be configured to
do so.

> I'm just trying to come up with the way that will be the most
> automatic, and won't require attention from me or another reviewer.

Fix the two places that need to know the difference so they trim() before
comparison, and stop worrying about it. ;)

It's really a pretty rare situation.  The two places that depend on it now are
major hacks anyway, and probably shouldn't be depending on it at all to begin with.
(In reply to comment #12)
> (In reply to comment #11)
> > Well, we're still going to need to hack DBI for data on the way in, or require
> > reviewers to always put in a single space when they mean to put in an empty
> > string. Isn't there an "is empty" Boolean Chart?
> 
> No DBI hacking required, it's all done server-side.

Well, we CAN handle it server-side, as I originally suggested.  But this would
require creating a trigger (a db schema object) on every NOT NULL column, and it
could get messy managing that through schema upgrades.

A possibly better alternative I discovered yesterday is that DBD::Oracle already
has an option to automatically convert SQL parameters bound to empty Perl
strings to ' ' during execute().  This would allow DBD::Oracle to do the '' to '
' conversion for us without having to maintain all those triggers in the schema.
 The downside is that it only works for bound parameters, as in:

   my $sth = $dbh->prepare("INSERT INTO t (textcol) VALUES (?, ?)");
   $sth->execute("");

but does not catch embedded SQL string literals, as in:

  $dbh->do("INSERT INTO t (textcol) VALUES ('')");

Note that using this option would affect every SQL operation (SELECT, INSERT,
UPDATE, etc) using bound parameters.  It might require some thought to figure
out if that would cause problems or not.  My first guess is that it should work,
since all operations are transformed the same way (assuming you don't use SQL
string literals, that is).

> There's no such thing as an empty string in Oracle, that's the point of this
> discussion.

Exactly.  Think of it this way:  Oracle has no concept of an empty string- only
NULL strings and non-empty strings.  NULL strings in Oracle are handled by
Perl/DBI just like NULL strings from any other db (i.e., undef).
(In reply to comment #11)
> (a) The fact that for Oracle, '' will never be matched by anything unless we use
> IS NULL. That's the one we've been talking about, mostly.

Correct, but we avoid the need to use IS NULL by doing the single space conversion.

Also, keep in mind that lots of other big databases (e.g., SQL Server, Sybase,
DB2) convert '' to spaces by default anyway, so if you want Bugzilla eventually
to run on those databases, they would face the same issue.
Hrm. The DBD::Oracle solution doesn't sound too terrible, but does it do the
same thing for strings coming *out* of the database? That is, in perl, '' is
false but ' ' is true (and we likely rely on that all over the place).

The only problem with it is requiring people to always use placeholders, which
is another point that requires oversight from me. It could also involve possibly
serious hacking, depending on exactly what code we'd have to change. I think
we're in for changing more than just disabledtext and shutdownhtml, to get total
consistency on this.

Also, we don't want to *always* use placeholders -- using them frequently
prevents the database from caching query results. And there are places we
definitely want the DB to cache query results if at all possible.

The triggers also sound like they could be a good idea (I don't mind managing
schema changes, since that's what DB::Schema does best), but would they also
work when we're comparing things in a WHERE clause?
(In reply to comment #15)
> Hrm. The DBD::Oracle solution doesn't sound too terrible, but does it do the
> same thing for strings coming *out* of the database? That is, in perl, '' is
> false but ' ' is true (and we likely rely on that all over the place).

Not sure, but from my quick glance at it, I don't think the CompatMode option
does anything on the result side.

> The triggers also sound like they could be a good idea (I don't mind managing
> schema changes, since that's what DB::Schema does best), but would they also
> work when we're comparing things in a WHERE clause?

For every text column of every table, we would set up a trigger to fire on each
INSERT or UPDATE to a column, changing the inserted/updated value to a space
whenever the SQL tried to use an empty string, as follows:

   CREATE TABLE t ( col VARCHAR(32) NOT NULL );
   CREATE TRIGGER t_col_tr
     BEFORE INSERT OR UPDATE OF col ON t
     FOR EACH ROW
     WHEN ( NEW.col IS NULL )
     BEGIN
       :NEW.col := ' ';
     END;

As a result, the database really stores the non-empty string that consists of a
single space, and the NOT NULL constraint is not violated.  Triggers can't do
anything on the query side, so we still face the issue of '' not being the same
as '' without trimming.
OK. So, what we need is:

* Every INSERT or UPDATE SQL statement containing an empty string needs to
insert *A* (where *A* is some particular value we choose) instead of an empty
string. Most DBs use a single-space for *A*; I wanted a control character. We
could use a single space, I suppose.

* Every SQL statement containing an empty string as a WHERE or HAVING criteria
needs to automatically replace that empty string with *A*.

* Every time we get *A* out of the database, it needs to be translated back into
an empty string before anything sees it.

All we need is for all of the above to be true, as far as perl is concerned. I
don't really care what happens in the DB, as long as those three conditions are
all kept. From our discussion here, I'm still fairly certain that the only
totally reliable way of accomplishing the above is to override some DBI
functions in Bugzilla::DB::Oracle, but if you can come up with some other way of
maintaining the above three conditions, I'm OK with that, too.
(In reply to comment #17)
> From our discussion here, I'm still fairly certain that the only
> totally reliable way of accomplishing the above is to override some DBI
> functions in Bugzilla::DB::Oracle

Ok, I've added method overrides in Bugzilla::DB::Oracle for all of the relevant
DBD::Oracle::db/DBD::Oracle::st methods.  Now that I've done the work, the
difference between using a single space, a control character, or a control
string is trivial.

Some of the DBI fetch methods call each other internally, so whatever we do at
fetch time to undo what we did at insert time needs to be idempotent.  Therefore
I opted against using a single space, since it's difficult to make removal of a
space idempotent.  I decided to use a control string "__BZ_EMPTY_STR__".  This
will be easier to debug than a single control character since it will be more
obvious where to start debugging if the app gets "__BZ_EMPTY_STR__" back from a
query.
(In reply to comment #18)
Great, that sounds perfect.
Severity: normal → enhancement
Assignee: lance.larsh → xiaoou.wu
Target Milestone: --- → Bugzilla 3.2
This was worked around in our creation of Bugzilla::DB::Oracle.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.