Closed Bug 399370 Opened 17 years ago Closed 16 years ago

Fulltext search with a LIKE on bugs.short_desc is too slow.

Categories

(Bugzilla :: Query/Bug List, defect)

3.1.2
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 3.2

People

(Reporter: mkanat, Assigned: mkanat)

References

Details

(Keywords: perf, Whiteboard: [wanted-bmo])

Attachments

(1 file, 2 obsolete files)

Right now, fulltext search on b.m.o is just too slow, because we're doing a LIKE on bugs.short_desc, and that requires a pretty intensive table scan of a large table.

My suggested solution is to create a new table called bugs_fulltext, and to have one row per bug with short_desc and the full, combined text of all the comments. This will also allow us to do fulltext searching on the *entirety* of the comments.
Attached patch v1 (obsolete) — Splinter Review
Okay, this does exactly what I said, above. This works really well, and finally handles every problem we had with fulltext search. It should also lead to a much better and more relevant fulltext search on MySQL-based installations.

There are two comments fields--one with all comments and one with only non-private comments, so that non-insiders can't search and find bugs based on comments they shouldn't be able to see.
Assignee: query-and-buglist → mkanat
Status: NEW → ASSIGNED
Attachment #290083 - Flags: review?(bugzilla-mozilla)
Comment on attachment 290083 [details] [diff] [review]
v1

>Index: Bugzilla/Install/DB.pm

>+sub _populate_bugs_fulltext {

>+    # We only populate the table if it's empty...
>+    if (!$fulltext) {

Nit: you should rather write: return unless $fulltext instead of this big IF block. You would avoid useless indentation.


Note that I'm not too happy to see comments duplicated three times (assuming 99% of comments are not private). And you are updating the bugs_fulltext table even if the bug summary didn't change nor its comments (nor their privacy). Is there no better way to do this, using views or something?
(In reply to comment #2)
> Note that I'm not too happy to see comments duplicated three times (assuming
> 99% of comments are not private). And you are updating the bugs_fulltext table
> even if the bug summary didn't change nor its comments (nor their privacy). Is
> there no better way to do this, using views or something?

  The problem is that there's no standard GROUP_CONCAT function. So even if there was a view, we couldn't make it in any ANSI way. Also, I'm not sure you can have a FULLTEXT index on a view of an InnoDB table. (I'm not sure you can have a FULLTEXT index on a view at all.)
Attachment #290083 - Flags: review?(justdave)
Comment on attachment 290083 [details] [diff] [review]
v1

Can I get *anybody* to review this?
Attachment #290083 - Flags: review?(wurblzap)
Attachment #290083 - Flags: review?(dkl)
We can't release 3.2 if fulltext continues to be as slow as it is. It's practically useless, it's so annoyingly slow on bmo.
Flags: blocking3.2+
Don't suppose Jesse could review this?
Comment on attachment 290083 [details] [diff] [review]
v1

Yes, I would totally take a review from Jesse. I'm actually the DB owner, but this touches a few tiny places that I *don't* own, so I at least need somebody else to look at it and comment.

This patch has probably bitrotted, but you can still look it over for general goodness.
Attachment #290083 - Flags: review?(wurblzap)
Attachment #290083 - Flags: review?(justdave)
Attachment #290083 - Flags: review?(jjclark1982)
Attachment #290083 - Flags: review?(dkl)
Attachment #290083 - Flags: review?(bugzilla-mozilla)
Attachment #290083 - Flags: review?
Attachment #290083 - Flags: review?
(In reply to comment #7)
> (From update of attachment 290083 [details] [diff] [review])
bugs_fulltext.comments and comments_noprivate should be LONGTEXT, right?
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 290083 [details] [diff] [review] [details])
> bugs_fulltext.comments and comments_noprivate should be LONGTEXT, right?

  Right. I wrote this patch before we had those. I do need to update it.
Comment on attachment 290083 [details] [diff] [review]
v1

I have verified that this version creates, updates, and searches the fulltext database correctly under MySQL 5.1.
Attachment #290083 - Flags: review?(jjclark1982) → review+
Attached patch v1 - jjclark (obsolete) — Splinter Review
The version I tested, including updates for Search.pm functions and bug->process. I assume that the new database transaction model does not need the fulltext table to be specifically named for locking.
Attachment #290083 - Attachment is obsolete: true
Attachment #301210 - Flags: review?(mkanat)
Great, thanks! Do you have any comments on the design, or anything?
Do you think there would be any performance issues with extending this method to include all fields?
(In reply to comment #13)
> Do you think there would be any performance issues with extending this method
> to include all fields?

  You mean every field in the bugs table concatted into one field? Or into separate fields? I think it would just add complexity for little gain, really.
(In reply to comment #13)
> Do you think there would be any performance issues with extending this method
> to include all fields?
> 

Probably a new bug should be added to make this an option for custom fields. I can see an advantage of also searching whiteboard, and keywords. But I think that probably needs to be taken care of in a separate bug since its new functionality.

Comment on attachment 301210 [details] [diff] [review]
v1 - jjclark

This patch has _sync_fulltext in at least one wrong place, so I've fixed it and am making a new patch.
Attachment #301210 - Flags: review?(mkanat) → review-
Attached patch v3Splinter Review
Okay, this one seems to work just fine, and has everything in the right place. It also addresses Xiaoou's comments.
Attachment #301210 - Attachment is obsolete: true
Attachment #301425 - Flags: review+
Flags: approval?
v3 crashes while updating databases to UTF-8. (mysql  Ver 14.14 Distrib 5.1.22-rc, for apple-darwin8.5.1 )

I get no errors when having checksetup create a new database, but when I create an empty database and then run checksetup, I get this error:

Converting bugs_fulltext.short_desc to be stored as UTF-8...
DBD::mysql::db do failed: Column 'short_desc' cannot be part of FULLTEXT index [for Statement "ALTER TABLE bugs_fulltext CHANGE COLUMN short_desc short_desc 
                              varchar(255) CHARACTER SET binary NOT NULL"] at Bugzilla/DB/Mysql.pm line 660
	Bugzilla::DB::Mysql::bz_setup_database('Bugzilla::DB::Mysql=HASH(0xdf87e4)') called at checksetup.pl line 144

"CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.
Whiteboard: [wanted-bmo]
(In reply to comment #18)
> I get no errors when having checksetup create a new database, but when I create
> an empty database and then run checksetup, I get this error:

  Hrm, try that with the very latest tip. We have some code in there that might prevent that, that I *just* checked in.

  However, I suspect it will still fail in the following case:

  1) Drop your DB.
  2) Turn off the utf8 parameter.
  3) Run checksetup.
  4) Turn on the utf8 parameter.
  5) Run checksetup again.

(In reply to comment #19)
> "CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.

  We do an intermediate transition to BINARY so that MySQL doesn't try to translate the bytes as latin1.
Yes, it seems to be working now. Can you explain the change so we can apply it to our local branch?
(In reply to comment #20)
> (In reply to comment #19)
> > "CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.
> 
>   We do an intermediate transition to BINARY so that MySQL doesn't try to
> translate the bytes as latin1.

Oh, I didn't realize that was just the intermediate one.
(In reply to comment #21)
> Yes, it seems to be working now. Can you explain the change so we can apply it
> to our local branch?

  If you mean PRACA, you should talk to me about that separately. If your local branch is not currently exactly sync'ed with Bugzilla tip, then I strongly advise that you not pull patches from the tip. However, the change was bug 374951.
Okay, since nobody's reported any problems with this patch, it's good to go, as far as I know.
Flags: approval? → approval+
We have been using this patch on PRACA for a couple weeks and have not experienced any problems.
Is there no perf issues due to comments duplication, even on slow hardware? Some bugs have like 300 comments, and adding a single "ok" to a comment is going to recreate entries twice (private and no private).

Also, to be sure: when a bug is deleted, DELETE => CASCADE is going to remove entries from the bugs_fulltext table automatically, right? Are we sure that all supported DBs support triggers? If not, you should add "DELETE FROM bugs_fulltext WHERE bug_id = ?" to $bug->remove_from_db().
(In reply to comment #26)
> Is there no perf issues due to comments duplication, even on slow hardware?

  Correct. The maximum comment length is 64K. So all you'd be doing is writing 64K to the disk twice or three times instead of once--not a big deal. (Also, any comment that's actually 64K would be pretty ridiculous.)

> Also, to be sure: when a bug is deleted, DELETE => CASCADE is going to remove
> entries from the bugs_fulltext table automatically, right? Are we sure that all
> supported DBs support triggers?

  Yes, DELETE CASCADE will do this, and yes, all DBs support triggers.
Besides code simplicity.. is their a reason why you didn't update the full text columns instead of replacing them? The current code would do an extra 2 DB Select statements on the longdescs table. Now the time for that operation would not be a lot.. but I am just curious why you didnt want to do an append instead.
(In reply to comment #28)
> Besides code simplicity.. is their a reason why you didn't update the full text
> columns instead of replacing them?

  Um, I do update them.
+    $dbh->do('UPDATE bugs_fulltext SET comments = ?, comments_noprivate = ?
+               WHERE bug_id = ?', undef, $all, $nopriv_string, $self->id);
+}

This will update the values... but what I am asking is why not do the following

 $dbh->do('UPDATE bugs_fulltext SET comments = CONCAT(comments,?), comments_noprivate = CONCAT(comments_noprivate,?)
               WHERE bug_id = ?', undef, $all, $nopriv_string, $self->id);

And then
$all, $nopriv_string can only be the new comment.

This has advantages and disadvantages, but would be a lot faster for the database to do.
I don't solve performance problems that don't exist, particularly if I don't have direct evidence that the solution solves the problem. (Which is pretty hard if the problem doesn't exist.)
Since to build the comment you are doing 2 comment queries... it will be slower as currently patched. Of course your right its not likely to be a problem. Why do premature optimizations anyway. I simply asked as a response to lpsolits comment.
Checking in importxml.pl;
/cvsroot/mozilla/webtools/bugzilla/importxml.pl,v  <--  importxml.pl
new revision: 1.82; previous revision: 1.81
done
Checking in Bugzilla/Bug.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Bug.pm,v  <--  Bug.pm
new revision: 1.236; previous revision: 1.235
done
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.158; previous revision: 1.157
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.58; previous revision: 1.57
done
Checking in Bugzilla/DB/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Pg.pm,v  <--  Pg.pm
new revision: 1.27; previous revision: 1.26
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.98; previous revision: 1.97
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v  <--  Mysql.pm
new revision: 1.20; previous revision: 1.19
done
Checking in Bugzilla/Install/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Install/DB.pm,v  <--  DB.pm
new revision: 1.50; previous revision: 1.49
done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Blocks: 437369
Blocks: 437602
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: