Open Bug 829413 Opened 12 years ago Updated 1 year ago

Bugzilla::Bug->comments() shouldn't load all comments if 'after' or 'to' args are provided

Tracking

()

Status:

NEW

People

(Reporter: glob, Unassigned)

Details

(Keywords: perf)

Keywords:

perf

Votes:

Attachments

(1 obsolete file)

same problem 1 year ago Joshua Zubaly 35 bytes, text/html		Details

:glob ✱

Reporter

Description

•

12 years ago

currently Bugzilla::Bug->comments() acts as follows: if (!self->{comments}) { load all comments, save in self->{comments} } @comments = self->{comments} if (params->{after}) { @comments = grep { comment->date > params->{after} } @comments } if (params->{to}) { @comments = grep { comment->date <= params->{to} } @comments } return @comments the problem with this approach is even if the caller only wants a subset of comments, we always hit the database for all comments. this could be quite expensive on bugs with a lot of comments. as $self->{comments} lives for the duration of the current request only, it doesn't provide any value for most calls. i don't see anywhere in our code where we use 'after' or 'to', outside the webservice modules. i think if 'after' or 'to' parameters are provided, instead of loading all comments from the database, we should pass the date filters to the database and let it perform filtering.

Frédéric Buclin

Comment 1

•

12 years ago

With the current DB schema, there is no perf penalty with the current code. Here is the SQL query executed by Bugzilla::Bug->comments: EXPLAIN SELECT comment_id, bug_id, who, bug_when, work_time, thetext, isprivate, already_wrapped, type, extra_data FROM longdescs WHERE bug_id = 29286 ORDER BY bug_when, comment_id; +------+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------------+ | 1 | SIMPLE | longdescs | ref | longdescs_bug_id_idx | longdescs_bug_id_idx | 3 | const | 180 | Using where; Using filesort | +------+-------------+-----------+------+----------------------+----------------------+---------+-------+------+-----------------------------+ Adding |AND bug_when > 'YYYY-MM-DD'| to the WHERE part doesn't help. As you can see below, despite the longdescs_bug_when_idx index is a possible candidate to retrieve data, the DB server is still using the longdescs_bug_id_idx index and the 180 rows/comments belonging to this bug are still scanned to compare the date. So the DB server still has to retrieve all comments from this bug: EXPLAIN SELECT comment_id, bug_id, who, bug_when, work_time, thetext, isprivate, already_wrapped, type, extra_data FROM longdescs WHERE bug_id = 29286 AND bug_when > '2009-02-21' ORDER BY bug_when, comment_id; +------+-------------+-----------+------+---------------------------------------------+----------------------+---------+-------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+------+---------------------------------------------+----------------------+---------+-------+------+-----------------------------+ | 1 | SIMPLE | longdescs | ref | longdescs_bug_when_idx,longdescs_bug_id_idx | longdescs_bug_id_idx | 3 | const | 180 | Using where; Using filesort | +------+-------------+-----------+------+---------------------------------------------+----------------------+---------+-------+------+-----------------------------+ Things become different if we add a new index foo(bug_id bug_when). The current longdescs_bug_when_idx(bug_when) index is of no use to get comments as seen above, but the new one gives interesting results. The first query, without the time range, gives: +------+-------------+-----------+------+----------------------------------------+-------------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+------+----------------------------------------+-------------------+---------+-------+------+-------------+ | 1 | SIMPLE | longdescs | ref | longdescs_bug_id_idx,longdescs_foo_idx | longdescs_foo_idx | 3 | const | 179 | Using where | +------+-------------+-----------+------+----------------------------------------+-------------------+---------+-------+------+-------------+ As you can see, the DB server now chooses the new longdescs_foo_idx index and the Extra column shows that "Using filesort" is gone. With the time range, we get: +------+-------------+-----------+-------+---------------------------------------------------------------+-------------------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-----------+-------+---------------------------------------------------------------+-------------------+---------+------+------+-----------------------+ | 1 | SIMPLE | longdescs | range | longdescs_bug_when_idx,longdescs_bug_id_idx,longdescs_foo_idx | longdescs_foo_idx | 11 | NULL | 1 | Using index condition | +------+-------------+-----------+-------+---------------------------------------------------------------+-------------------+---------+------+------+-----------------------+ Now type = range instead of type = ref, but more interesting is that there is now only one row/comment retrieved instead of the 180 comments found before. Now, I don't know if this translates into a real perf win. In my testing, the current code is already very fast: 0.01 second to collect 180 comments. Things can hardly be faster.

Severity: normal → enhancement

Keywords: perf

OS: Mac OS X → All

Hardware: x86 → All

:glob ✱

Reporter

Comment 2

•

12 years ago

thanks for looking at this so quickly :) be aware you're only measuring the database query -- it takes time and memory to marshal the results to perl, to store them, and then to filter the array.

Frédéric Buclin

Comment 3

•

12 years ago

(In reply to Byron Jones ‹:glob› from comment #2) > be aware you're only measuring the database query -- it takes time and > memory to marshal the results to perl, to store them, and then to filter the > array. Sure, but to be really efficient, I showed that adding an index on (bug_id, bug_when) would let the DB server avoid an index scan and only return relevant rows. So not only would Perl have less rows to compute, but also the DB server would have less rows to analyze. So what I'm saying is that if we want to fix this bug in a very efficient way, we should both add an index and restrict the time range. But we would have to be careful and check if not using the cache would be a perf penalty or not.

Frédéric Buclin

Comment 4

•

12 years ago

(In reply to Byron Jones ‹:glob› from comment #0) > doesn't provide any value for most calls. i don't see anywhere in our code > where we use 'after' or 'to', outside the webservice modules. We use 'after' and 'to' to generate bugmails, to get the new comment(s), see Bugzilla::BugMail::Send(): my $comments = $bug->comments({ after => $start, to => $end });

:glob ✱

Reporter

Updated

•

11 years ago

Assignee: general → glob

:glob ✱

Reporter

Updated

•

11 years ago

Assignee: glob → general

BMO Automation

Updated

•

1 year ago

Attachment #9383262 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Bugzilla::Bug->comments() shouldn't load all comments if 'after' or 'to' args are provided

Categories

(Bugzilla :: Bugzilla-General, enhancement)

Tracking

()

People

(Reporter: glob, Unassigned)

References

Details

(Keywords: perf)

Crash Data

Security

(public)

User Story

Attachments

(1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Updated

Updated

Attachment

General

Description

File Name

Content Type