Closed
Bug 232612
Opened 20 years ago
Closed 19 years ago
enable boolean mode fulltext searches
Categories
(Bugzilla :: Query/Bug List, enhancement)
Bugzilla
Query/Bug List
Tracking
()
RESOLVED
FIXED
Bugzilla 2.22
People
(Reporter: myk, Assigned: myk)
References
(Blocks 1 open bug, )
Details
(Whiteboard: [requires MySQL 4.0.1 or higher])
Attachments
(1 file, 4 obsolete files)
4.52 KB,
patch
|
LpSolit
:
review+
|
Details | Diff | Splinter Review |
Boolean mode fulltext searches provide advanced options for finding bugs and should be turned on in Bugzilla once we require MySQL 4.0 (the first version of MySQL to support them). They are worse at determining relevance, however, so we may not want to turn them on by default but rather have an option for turning them on (either explicitly or by scanning the search string for boolean prefixes).
Assignee | ||
Updated•20 years ago
|
Assignee: justdave → myk
Target Milestone: --- → Bugzilla 2.20
Assignee | ||
Comment 1•20 years ago
|
||
Assignee | ||
Comment 2•20 years ago
|
||
Assignee | ||
Updated•20 years ago
|
Attachment #140693 -
Attachment is obsolete: true
Comment 3•20 years ago
|
||
Very MySQL specific. But we knew that already ;) I will try to implement this somehow in PostgreSQL.
Comment 4•20 years ago
|
||
Bugzilla 2.20 feature set is now frozen as of 15 Sept 2004. Anything flagged enhancement that hasn't already landed is being pushed out. If this bug is otherwise ready to land, we'll handle it on a case-by-case basis, please set the blocking2.20 flag to '?' if you think it qualifies.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
Updated•20 years ago
|
Whiteboard: [patchv2 applied to b.m.o]
Updated•20 years ago
|
Attachment #140694 -
Flags: review?
Assignee | ||
Comment 5•20 years ago
|
||
Comment on attachment 140694 [details] [diff] [review] patch v2: fix for syntax error and indentation Dave, can you review this simple, simple patch?
Attachment #140694 -
Flags: review? → review?(justdave)
Assignee | ||
Updated•19 years ago
|
Attachment #140694 -
Flags: review?(justdave) → review?(LpSolit)
Comment 6•19 years ago
|
||
/me notes the dependencies on this bug. Feature requires MySQL 4.0.1 or newer. We have two options. 1) Start requiring MySQL 4.0.1 or newer (it's probably a bit premature to require anything newer than 3.23.x right now because of distribution difficulties, but that'll *very* likely change within the next year) 2) Check the server version, and only attempt to use this feature if the installed server is new enough for it.
Comment 7•19 years ago
|
||
Comment on attachment 140694 [details] [diff] [review] patch v2: fix for syntax error and indentation Tested on my installation (with MySQL 4.0.20) and it works fine. So r=LpSolit. But we have to increase the minimal version of MySQL to 4.0.1 before checking in this patch!
Attachment #140694 -
Flags: review?(LpSolit) → review+
Updated•19 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [patchv2 applied to b.m.o] → [patchv2 applied to b.m.o] [requires MySQL 4.0.1 or higher]
Assignee | ||
Comment 8•19 years ago
|
||
This patch unrots the previous one, since the way we handle boolean mode searches has changed.
Attachment #140694 -
Attachment is obsolete: true
Attachment #188992 -
Flags: review?(LpSolit)
Comment 9•19 years ago
|
||
Comment on attachment 188992 [details] [diff] [review] patch v3: unrotted >+ # Figure out if we're dealing with a boolean search. >+ my $boolean_mode = ($v =~ /[+-<>()~*"]/); Why don't we just do that inside of Bugzilla::DB::Mysql's sql_fulltext_search? Then we'll get it whenever we fulltext-search anything.
Comment 10•19 years ago
|
||
Comment on attachment 188992 [details] [diff] [review] patch v3: unrotted $boolean_mode description has to be added into the POD docs of DB.pm. Your patch seems to work as expected, based on the few tests I did. So r=LpSolit assuming you will post a new patch with the $boolean_mode description included. (carry forward my r+)
Attachment #188992 -
Flags: review?(LpSolit) → review+
Assignee | ||
Comment 11•19 years ago
|
||
> Why don't we just do that inside of Bugzilla::DB::Mysql's
> sql_fulltext_search? Then we'll get it whenever we fulltext-search anything.
Good point. Done. Frederic, this means that the $boolean_mode parameter is no
more, so I haven't added it to the POD documentation. It also means that patch
is sufficiently different from the way it used to be that it needs another
review.
In particular, the code now SQL quotes the text inside sql_fulltext_search()
rather than in Search.pm, since boolean mode needs to be determined from the
unquoted string (otherwise we'd think all searches were in boolean mode on a
database that quotes with double-quotes).
The new approach is also cleaner, since we're quoting closer to where the
quotes are actually needed, and simpler, since we no longer need to unquote the
string before using it in the ANSI version of the method.
Attachment #188992 -
Attachment is obsolete: true
Attachment #189094 -
Flags: review?(LpSolit)
Assignee | ||
Comment 12•19 years ago
|
||
We don't need to trick taint about @words, since we derive @words via a regexp, which suffices to detaint them.
Assignee | ||
Updated•19 years ago
|
Attachment #189094 -
Attachment is obsolete: true
Attachment #189127 -
Flags: review?(LpSolit)
Updated•19 years ago
|
Attachment #189094 -
Flags: review?(LpSolit)
Comment 13•19 years ago
|
||
Comment on attachment 189127 [details] [diff] [review] patch v5: no need to trick taint about @words I did some searches using both the "Find a specific bug" and "Advanced Search". Here are the typical SQL statements I get: Find a specific bug (MySQL): SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_assigned_to.login_name, map_reporter.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE))/COUNT(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE)) + MATCH(bugs.short_desc) AGAINST('test >assignee' IN BOOLEAN MODE)) AS relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to = map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter = map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id = longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id WHERE ((bugs.bug_status IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED'))) AND (((MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE) > 0) OR (bugs.short_desc REGEXP '(^|[^a-z0-9])test($|[^a-z0-9])' AND bugs.short_desc REGEXP '(^|[^a-z0-9])\\>assignee($|[^a-z0-9])'))) AND bugs.creation_ts IS NOT NULL AND ((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY relevance LIMIT 200 Find a specific bug (non-MySQL): SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_assigned_to.login_name, map_reporter.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END)/COUNT(CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END) + CASE WHEN (LOWER(bugs.short_desc) LIKE '%test%' AND LOWER(bugs.short_desc) LIKE '%>assignee%') THEN 1 ELSE 0 END) AS relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to = map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter = map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id = longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id WHERE ((bugs.bug_status IN ('UNCONFIRMED','NEW','ASSIGNED','REOPENED'))) AND (((CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END > 0) OR (bugs.short_desc REGEXP '(^|[^a-z0-9])test($|[^a-z0-9])' AND bugs.short_desc REGEXP '(^|[^a-z0-9])\\>assignee($|[^a-z0-9])'))) AND bugs.creation_ts IS NOT NULL AND ((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY relevance desc LIMIT 200 Advanced search (MySQL): SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_assigned_to.login_name, map_reporter.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE))/COUNT(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE)) + MATCH(bugs.short_desc) AGAINST('test >assignee' IN BOOLEAN MODE)) AS relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to = map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter = map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id = longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id WHERE ((MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN MODE) > 0)) AND bugs.creation_ts IS NOT NULL AND ((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY bugs.bug_id LIMIT 200 Advanced search (non-MySQL): SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status, bugs.resolution, map_assigned_to.login_name, map_reporter.login_name, bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END)/COUNT(CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END) + CASE WHEN (LOWER(bugs.short_desc) LIKE '%test%' AND LOWER(bugs.short_desc) LIKE '%>assignee%') THEN 1 ELSE 0 END) AS relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to = map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter = map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id = longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON bug_group_map.bug_id = bugs.bug_id WHERE ((CASE WHEN (LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0 END > 0)) AND bugs.creation_ts IS NOT NULL AND ((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY bugs.bug_id LIMIT 200 Some remarks, constatations: 1) The WHERE clause: - The WHERE clause does not look at the bug summary in the Advanced Search, meaning that if the words we are looking for are in the bug summary only, we will miss the bug. What would prevent us from doing something like: WHERE (MATCH(longdescs_0.thetext, bugs.short_desc) AGAINST('test >assignee' IN BOOLEAN MODE) > 0) ? If that's not a performance issue, it would make sense to open a bug to implement this feature. - Why does "Find a specific bug" use LIKE for comments in the WHERE clause, but REGEXP for the bug summary? Couldn't we use LIKE too? Maybe would LIKE be faster than this REGEXP? If yes, open a bug. 2) The ORDER BY clause: - "Find a specific bug" orders the results by relevance. Advanced Search doesn't. That's a bug IMO. Especially because buglist.cgi displays "Bugs on this list are sorted by relevance, with the most relevant bugs at the top.". But we know that's not true, except when using "Find a specific bug". But AFAIK, all these points are not related to your patch. So r=LpSolit
Attachment #189127 -
Flags: review?(LpSolit) → review+
Updated•19 years ago
|
Flags: approval?
Assignee | ||
Comment 14•19 years ago
|
||
> - The WHERE clause does not look at the bug summary in the Advanced Search, > meaning that if the words we are looking for are in the bug summary only, we > will miss the bug. Right. That's a known limitation of Bugzilla which is actually fixed in the Bugzilla tip. Unfortunately, that fix regresses performance significantly for large installations, so we've backed out that patch for b.m.o. > What would prevent us from doing something like: > WHERE (MATCH(longdescs_0.thetext, bugs.short_desc) AGAINST('test >assignee' IN > BOOLEAN MODE) > 0) ? Well, first of all, you can only list multiple columns from the same table in the MATCH function (they have to exactly match a set of columns on which a fulltext index has been created, and you can't create cross-table indexes), so this would return an error. You could instead do two separate MATCHes, but then MySQL won't use the fulltext indexes, and it will be extremely slow (so slow, in fact, that queries will time out on b.m.o and probably many other Bugzilla installations). At one point, I knew why this was the case, but I don't quite remember anymore. Bug 145588, comment 35 says something about it. It may have something to do with MySQL's one index per table limitation, or it may be something about doing a union of summary and comment matches (i.e. summary matches *OR* comments match). > - Why does "Find a specific bug" use LIKE for comments in the WHERE clause, > but REGEXP for the bug summary? Couldn't we use LIKE too? Maybe would LIKE > be faster than this REGEXP? If yes, open a bug. I guess you are referring to the ANSI version. Presumably, the ANSI version is not intended to be used unless absolutely necessary, so great care has not been taken with making it work as well as possible. I imagine that LIKE would be faster but REGEXP would be more accurate (since it's possible to specify word boundaries in it, and fulltext searches are concerned with words, not substring matches). > - "Find a specific bug" orders the results by relevance. Advanced Search > doesn't. That's a bug IMO. Especially because buglist.cgi displays "Bugs on > this list are sorted by relevance, with the most relevant bugs at the top.". > But we know that's not true, except when using "Find a specific bug". Yes. It's tough, since the advanced search has a sort field that always has some value selected, but I imagine that most users searching for content want their results sorted by relevance, so it would make sense to override that value (but we should find a way to distinguish between users who didn't explicitly set an order and those who did, perhaps by changing the default sort order to something like "default"). Thanks for the review! Checking in... Checking in Bugzilla/Search.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v <-- Search.pm new revision: 1.102; previous revision: 1.101 done Checking in Bugzilla/DB.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v <-- DB.pm new revision: 1.57; previous revision: 1.56 done Checking in Bugzilla/DB/Mysql.pm; /cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v <-- Mysql.pm new revision: 1.26; previous revision: 1.25 done
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Flags: approval? → approval+
Resolution: --- → FIXED
Updated•18 years ago
|
Whiteboard: [patchv2 applied to b.m.o] [requires MySQL 4.0.1 or higher] → [requires MySQL 4.0.1 or higher]
Updated•11 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•