Fulltext search with a LIKE on bugs.short_desc is too slow.

RESOLVED FIXED in Bugzilla 3.2

Status

()

Bugzilla
Query/Bug List
--
major
RESOLVED FIXED
11 years ago
10 years ago

People

(Reporter: Max Kanat-Alexander, Assigned: Max Kanat-Alexander)

Tracking

({perf})

3.1.2
Bugzilla 3.2
Dependency tree / graph
Bug Flags:
approval +
blocking3.2 +

Details

(Whiteboard: [wanted-bmo])

Attachments

(1 attachment, 2 obsolete attachments)

v3
16.64 KB, patch
Max Kanat-Alexander
: review+
Details | Diff | Splinter Review
(Assignee)

Description

11 years ago
Right now, fulltext search on b.m.o is just too slow, because we're doing a LIKE on bugs.short_desc, and that requires a pretty intensive table scan of a large table.

My suggested solution is to create a new table called bugs_fulltext, and to have one row per bug with short_desc and the full, combined text of all the comments. This will also allow us to do fulltext searching on the *entirety* of the comments.
(Assignee)

Comment 1

11 years ago
Created attachment 290083 [details] [diff] [review]
v1

Okay, this does exactly what I said, above. This works really well, and finally handles every problem we had with fulltext search. It should also lead to a much better and more relevant fulltext search on MySQL-based installations.

There are two comments fields--one with all comments and one with only non-private comments, so that non-insiders can't search and find bugs based on comments they shouldn't be able to see.
Assignee: query-and-buglist → mkanat
Status: NEW → ASSIGNED
Attachment #290083 - Flags: review?(bugzilla-mozilla)

Comment 2

11 years ago
Comment on attachment 290083 [details] [diff] [review]
v1

>Index: Bugzilla/Install/DB.pm

>+sub _populate_bugs_fulltext {

>+    # We only populate the table if it's empty...
>+    if (!$fulltext) {

Nit: you should rather write: return unless $fulltext instead of this big IF block. You would avoid useless indentation.


Note that I'm not too happy to see comments duplicated three times (assuming 99% of comments are not private). And you are updating the bugs_fulltext table even if the bug summary didn't change nor its comments (nor their privacy). Is there no better way to do this, using views or something?
(Assignee)

Comment 3

11 years ago
(In reply to comment #2)
> Note that I'm not too happy to see comments duplicated three times (assuming
> 99% of comments are not private). And you are updating the bugs_fulltext table
> even if the bug summary didn't change nor its comments (nor their privacy). Is
> there no better way to do this, using views or something?

  The problem is that there's no standard GROUP_CONCAT function. So even if there was a view, we couldn't make it in any ANSI way. Also, I'm not sure you can have a FULLTEXT index on a view of an InnoDB table. (I'm not sure you can have a FULLTEXT index on a view at all.)
(Assignee)

Updated

11 years ago
Attachment #290083 - Flags: review?(justdave)
(Assignee)

Comment 4

11 years ago
Comment on attachment 290083 [details] [diff] [review]
v1

Can I get *anybody* to review this?
Attachment #290083 - Flags: review?(wurblzap)
Attachment #290083 - Flags: review?(dkl)
(Assignee)

Comment 5

11 years ago
We can't release 3.2 if fulltext continues to be as slow as it is. It's practically useless, it's so annoyingly slow on bmo.
Flags: blocking3.2+

Comment 6

11 years ago
Don't suppose Jesse could review this?
(Assignee)

Comment 7

11 years ago
Comment on attachment 290083 [details] [diff] [review]
v1

Yes, I would totally take a review from Jesse. I'm actually the DB owner, but this touches a few tiny places that I *don't* own, so I at least need somebody else to look at it and comment.

This patch has probably bitrotted, but you can still look it over for general goodness.
Attachment #290083 - Flags: review?(wurblzap)
Attachment #290083 - Flags: review?(justdave)
Attachment #290083 - Flags: review?(jjclark1982)
Attachment #290083 - Flags: review?(dkl)
Attachment #290083 - Flags: review?(bugzilla-mozilla)
Attachment #290083 - Flags: review?
(Assignee)

Updated

11 years ago
Attachment #290083 - Flags: review?

Comment 8

11 years ago
(In reply to comment #7)
> (From update of attachment 290083 [details] [diff] [review])
bugs_fulltext.comments and comments_noprivate should be LONGTEXT, right?
(Assignee)

Comment 9

11 years ago
(In reply to comment #8)
> (In reply to comment #7)
> > (From update of attachment 290083 [details] [diff] [review] [details])
> bugs_fulltext.comments and comments_noprivate should be LONGTEXT, right?

  Right. I wrote this patch before we had those. I do need to update it.

Comment 10

10 years ago
Comment on attachment 290083 [details] [diff] [review]
v1

I have verified that this version creates, updates, and searches the fulltext database correctly under MySQL 5.1.
Attachment #290083 - Flags: review?(jjclark1982) → review+

Comment 11

10 years ago
Created attachment 301210 [details] [diff] [review]
v1 - jjclark

The version I tested, including updates for Search.pm functions and bug->process. I assume that the new database transaction model does not need the fulltext table to be specifically named for locking.
Attachment #290083 - Attachment is obsolete: true
Attachment #301210 - Flags: review?(mkanat)
(Assignee)

Comment 12

10 years ago
Great, thanks! Do you have any comments on the design, or anything?

Comment 13

10 years ago
Do you think there would be any performance issues with extending this method to include all fields?
(Assignee)

Comment 14

10 years ago
(In reply to comment #13)
> Do you think there would be any performance issues with extending this method
> to include all fields?

  You mean every field in the bugs table concatted into one field? Or into separate fields? I think it would just add complexity for little gain, really.

Comment 15

10 years ago
(In reply to comment #13)
> Do you think there would be any performance issues with extending this method
> to include all fields?
> 

Probably a new bug should be added to make this an option for custom fields. I can see an advantage of also searching whiteboard, and keywords. But I think that probably needs to be taken care of in a separate bug since its new functionality.

(Assignee)

Comment 16

10 years ago
Comment on attachment 301210 [details] [diff] [review]
v1 - jjclark

This patch has _sync_fulltext in at least one wrong place, so I've fixed it and am making a new patch.
Attachment #301210 - Flags: review?(mkanat) → review-
(Assignee)

Comment 17

10 years ago
Created attachment 301425 [details] [diff] [review]
v3

Okay, this one seems to work just fine, and has everything in the right place. It also addresses Xiaoou's comments.
Attachment #301210 - Attachment is obsolete: true
Attachment #301425 - Flags: review+
(Assignee)

Updated

10 years ago
Flags: approval?

Comment 18

10 years ago
v3 crashes while updating databases to UTF-8. (mysql  Ver 14.14 Distrib 5.1.22-rc, for apple-darwin8.5.1 )

I get no errors when having checksetup create a new database, but when I create an empty database and then run checksetup, I get this error:

Converting bugs_fulltext.short_desc to be stored as UTF-8...
DBD::mysql::db do failed: Column 'short_desc' cannot be part of FULLTEXT index [for Statement "ALTER TABLE bugs_fulltext CHANGE COLUMN short_desc short_desc 
                              varchar(255) CHARACTER SET binary NOT NULL"] at Bugzilla/DB/Mysql.pm line 660
	Bugzilla::DB::Mysql::bz_setup_database('Bugzilla::DB::Mysql=HASH(0xdf87e4)') called at checksetup.pl line 144

"CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.
Whiteboard: [wanted-bmo]
(Assignee)

Comment 20

10 years ago
(In reply to comment #18)
> I get no errors when having checksetup create a new database, but when I create
> an empty database and then run checksetup, I get this error:

  Hrm, try that with the very latest tip. We have some code in there that might prevent that, that I *just* checked in.

  However, I suspect it will still fail in the following case:

  1) Drop your DB.
  2) Turn off the utf8 parameter.
  3) Run checksetup.
  4) Turn on the utf8 parameter.
  5) Run checksetup again.

(In reply to comment #19)
> "CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.

  We do an intermediate transition to BINARY so that MySQL doesn't try to translate the bytes as latin1.

Comment 21

10 years ago
Yes, it seems to be working now. Can you explain the change so we can apply it to our local branch?
(In reply to comment #20)
> (In reply to comment #19)
> > "CHARACTER SET binary" ??  That doesn't look like UTF-8 to me.
> 
>   We do an intermediate transition to BINARY so that MySQL doesn't try to
> translate the bytes as latin1.

Oh, I didn't realize that was just the intermediate one.
(Assignee)

Comment 23

10 years ago
(In reply to comment #21)
> Yes, it seems to be working now. Can you explain the change so we can apply it
> to our local branch?

  If you mean PRACA, you should talk to me about that separately. If your local branch is not currently exactly sync'ed with Bugzilla tip, then I strongly advise that you not pull patches from the tip. However, the change was bug 374951.
(Assignee)

Comment 24

10 years ago
Okay, since nobody's reported any problems with this patch, it's good to go, as far as I know.
Flags: approval? → approval+

Comment 25

10 years ago
We have been using this patch on PRACA for a couple weeks and have not experienced any problems.

Comment 26

10 years ago
Is there no perf issues due to comments duplication, even on slow hardware? Some bugs have like 300 comments, and adding a single "ok" to a comment is going to recreate entries twice (private and no private).

Also, to be sure: when a bug is deleted, DELETE => CASCADE is going to remove entries from the bugs_fulltext table automatically, right? Are we sure that all supported DBs support triggers? If not, you should add "DELETE FROM bugs_fulltext WHERE bug_id = ?" to $bug->remove_from_db().
(Assignee)

Comment 27

10 years ago
(In reply to comment #26)
> Is there no perf issues due to comments duplication, even on slow hardware?

  Correct. The maximum comment length is 64K. So all you'd be doing is writing 64K to the disk twice or three times instead of once--not a big deal. (Also, any comment that's actually 64K would be pretty ridiculous.)

> Also, to be sure: when a bug is deleted, DELETE => CASCADE is going to remove
> entries from the bugs_fulltext table automatically, right? Are we sure that all
> supported DBs support triggers?

  Yes, DELETE CASCADE will do this, and yes, all DBs support triggers.

Comment 28

10 years ago
Besides code simplicity.. is their a reason why you didn't update the full text columns instead of replacing them? The current code would do an extra 2 DB Select statements on the longdescs table. Now the time for that operation would not be a lot.. but I am just curious why you didnt want to do an append instead.
(Assignee)

Comment 29

10 years ago
(In reply to comment #28)
> Besides code simplicity.. is their a reason why you didn't update the full text
> columns instead of replacing them?

  Um, I do update them.

Comment 30

10 years ago
+    $dbh->do('UPDATE bugs_fulltext SET comments = ?, comments_noprivate = ?
+               WHERE bug_id = ?', undef, $all, $nopriv_string, $self->id);
+}

This will update the values... but what I am asking is why not do the following

 $dbh->do('UPDATE bugs_fulltext SET comments = CONCAT(comments,?), comments_noprivate = CONCAT(comments_noprivate,?)
               WHERE bug_id = ?', undef, $all, $nopriv_string, $self->id);

And then
$all, $nopriv_string can only be the new comment.

This has advantages and disadvantages, but would be a lot faster for the database to do.
(Assignee)

Comment 31

10 years ago
I don't solve performance problems that don't exist, particularly if I don't have direct evidence that the solution solves the problem. (Which is pretty hard if the problem doesn't exist.)

Comment 32

10 years ago
Since to build the comment you are doing 2 comment queries... it will be slower as currently patched. Of course your right its not likely to be a problem. Why do premature optimizations anyway. I simply asked as a response to lpsolits comment.
(Assignee)

Comment 33

10 years ago
Checking in importxml.pl;
/cvsroot/mozilla/webtools/bugzilla/importxml.pl,v  <--  importxml.pl
new revision: 1.82; previous revision: 1.81
done
Checking in Bugzilla/Bug.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Bug.pm,v  <--  Bug.pm
new revision: 1.236; previous revision: 1.235
done
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.158; previous revision: 1.157
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.58; previous revision: 1.57
done
Checking in Bugzilla/DB/Pg.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Pg.pm,v  <--  Pg.pm
new revision: 1.27; previous revision: 1.26
done
Checking in Bugzilla/DB/Schema.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema.pm,v  <--  Schema.pm
new revision: 1.98; previous revision: 1.97
done
Checking in Bugzilla/DB/Schema/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Schema/Mysql.pm,v  <--  Mysql.pm
new revision: 1.20; previous revision: 1.19
done
Checking in Bugzilla/Install/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Install/DB.pm,v  <--  DB.pm
new revision: 1.50; previous revision: 1.49
done
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Blocks: 437369

Updated

10 years ago
Blocks: 437602
(Assignee)

Updated

10 years ago
Duplicate of this bug: 287169
You need to log in before you can comment on or make changes to this bug.