Closed Bug 98446 Opened 23 years ago Closed 18 years ago

Use SELECT COUNT(*), not SELECT *, to check for record existence

Categories

(Bugzilla :: Bugzilla-General, enhancement, P3)

2.15
enhancement

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: CodeMachine, Unassigned)

References

Details

It is more efficient to do a SELECT COUNT(*) FROM table WHERE condition and
check the number returned than it is to do a SELECT * FROM table WHERE condition
and check whether a row has been returned.

The latter is used all over Bugzilla, and is slower because the DB has to
prepare all the data even if it doesn't have to send it (or is it that the data
is batched so we always receive at least X records?).

Here are some timings on the bugs table in my copy of the Landfill DB (* =
SELECT *, C = SELECT COUNT(*), B = SELECT bug_id:

* 0.0120871067047119
C 0.00199902057647705
C 0.00133705139160156
* 0.0156440734863281
* 0.0149340629577637
B 0.00438308715820312
C 0.0013120174407959
C 0.00119400024414062
B 0.00460505485534668
B 0.00448298454284668


I'd expect this to be worse on b.m.o, but who knows.  I'm not sure whether any
common code paths have this perf drain and therefore whether we're going to get
a decent perf gain anywhere for b.m.o, but there's a second part to this report.

It is common to simply want to determine whether a record exists that matches a
condition, and I'm introducing a function called RecordExists to deal with the
problem.  The function is called like:

RecordExists( "table", "condition" )

which is a lot simpler than doing it long hand and so will reduce code
complexity.  RecordExists uses COUNT(*) so it's an easy way to bring the usage
about.

RecordExists is introduced on the patch on bug #94534.  We will also need to
deal with bug #95426 first since that bites this usage.
Depends on: bz-custres, 95426
Priority: -- → P3
Target Milestone: --- → Bugzilla 2.18
Mine.
Assignee: justdave → matty
QA Contact: matty → jake
Status: NEW → ASSIGNED
Blocks: bz-perf
In fact, its simpler to SELECT <somefield> LIMIT 1, because then the db can stop
at the first entry it finds - COUNT requires finding all of them. I don't thin
its worth abstracting that away, really.
Summary: Use SELECT COUNT(*) not SELECT * for counts and use RecordExists. → Use SELECT * LIMIT 1 not SELECT * for counts and use RecordExists.
This doesn't require cust res :-)

Gerv
No longer depends on: bz-custres
SELECT 1 FROM whatever WHERE yada LIMIT 1 might be even faster (doesn't require
a single record read) but I don't know if all dbs will accept it; MySQL and
PostgreSQL do.
This was dependent on custres because that had the RecordExists sub.

And if a database didn't handle LIMIT, the RecordExists sub would be a nice way
of handling the dialectal issue in a central place.

Mind you, SELECT COUNT(*) FROM table ( with no where condition ) is probably
still faster.  AFAICR, this was the default if you specified no condition to
RecordExists.
Taking Jake's bugs...  his Army Reserve unit has been deployed.
QA Contact: jake → justdave
All 2.18 bugs that haven't been touched in over 60 days and aren't flagged as
blockers are getting pushed out to 2.20
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Severity: normal → enhancement
Summary: Use SELECT * LIMIT 1 not SELECT * for counts and use RecordExists. → Use SELECT COUNT(*), not SELECT *, to check for record existence
This bug has not been touched by its owner in over six months, even though it is
targeted to 2.20, for which the freeze is 10 days away. Unsetting the target
milestone, on the assumption that nobody is actually working on it or has any
plans to soon.

If you are the owner, and you plan to work on the bug, please give it a real
target milestone. If you are the owner, and you do *not* plan to work on it,
please reassign it to nobody@bugzilla.org or a .bugs component owner. If you are
*anybody*, and you get this comment, and *you* plan to work on the bug, please
reassign it to yourself if you have the ability.
Target Milestone: Bugzilla 2.20 → ---
(In reply to comment #4)
> SELECT 1 FROM whatever WHERE yada LIMIT 1 might be even faster

I agree. I also think this is the fastest query we can have to check the
existence of a record:

my $has_db_entries =
    $dbh->selectrow_array("SELECT 1 FROM $table WHERE ... " .
                           dbh->sql_limit(1), undef, @args) || 0;

$dbh->sql_limit() is DB specific.

myk, justdave, could you give us some test results on b.m.o or landfill for example?
Keywords: qawanted
Target Milestone: --- → Bugzilla 2.22
The trunk is now frozen to prepare Bugzilla 2.22. Enhancement bugs are retargetted to 2.24.
Target Milestone: Bugzilla 2.22 → Bugzilla 2.24
Assignee: mattyt-bugzilla → general
Status: ASSIGNED → NEW
QA Contact: justdave → default-qa
We don't have any "SELECT *" statements in all of Bugzilla anymore, from my grep.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: qawanted
Resolution: --- → WORKSFORME
Target Milestone: Bugzilla 2.24 → ---
You need to log in before you can comment on or make changes to this bug.