Closed Bug 232612 Opened 20 years ago Closed 19 years ago

enable boolean mode fulltext searches

Categories

(Bugzilla :: Query/Bug List, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 2.22

People

(Reporter: myk, Assigned: myk)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [requires MySQL 4.0.1 or higher])

Attachments

(1 file, 4 obsolete files)

Boolean mode fulltext searches provide advanced options for finding bugs and
should be turned on in Bugzilla once we require MySQL 4.0 (the first version of
MySQL to support them).  They are worse at determining relevance, however, so we
may not want to turn them on by default but rather have an option for turning
them on (either explicitly or by scanning the search string for boolean prefixes).
No longer depends on: 204217
Depends on: 204217
Assignee: justdave → myk
Target Milestone: --- → Bugzilla 2.20
Attached patch patch v1: implements feature (obsolete) — Splinter Review
Attachment #140693 - Attachment is obsolete: true
Very MySQL specific. But we knew that already ;) I will try to implement this
somehow in PostgreSQL.
Bugzilla 2.20 feature set is now frozen as of 15 Sept 2004.  Anything flagged
enhancement that hasn't already landed is being pushed out.  If this bug is
otherwise ready to land, we'll handle it on a case-by-case basis, please set the
blocking2.20 flag to '?' if you think it qualifies.
Target Milestone: Bugzilla 2.20 → Bugzilla 2.22
Whiteboard: [patchv2 applied to b.m.o]
Attachment #140694 - Flags: review?
Comment on attachment 140694 [details] [diff] [review]
patch v2: fix for syntax error and indentation

Dave, can you review this simple, simple patch?
Attachment #140694 - Flags: review? → review?(justdave)
Attachment #140694 - Flags: review?(justdave) → review?(LpSolit)
/me notes the dependencies on this bug.

Feature requires MySQL 4.0.1 or newer.

We have two options.
1) Start requiring MySQL 4.0.1 or newer (it's probably a bit premature to
require anything newer than 3.23.x right now because of distribution
difficulties, but that'll *very* likely change within the next year)
2) Check the server version, and only attempt to use this feature if the
installed server is new enough for it.
Comment on attachment 140694 [details] [diff] [review]
patch v2: fix for syntax error and indentation

Tested on my installation (with MySQL 4.0.20) and it works fine. So r=LpSolit.

But we have to increase the minimal version of MySQL to 4.0.1 before checking
in this patch!
Attachment #140694 - Flags: review?(LpSolit) → review+
Status: NEW → ASSIGNED
Whiteboard: [patchv2 applied to b.m.o] → [patchv2 applied to b.m.o] [requires MySQL 4.0.1 or higher]
Attached patch patch v3: unrotted (obsolete) — Splinter Review
This patch unrots the previous one, since the way we handle boolean mode
searches has changed.
Attachment #140694 - Attachment is obsolete: true
Attachment #188992 - Flags: review?(LpSolit)
Comment on attachment 188992 [details] [diff] [review]
patch v3: unrotted

>+            # Figure out if we're dealing with a boolean search.
>+            my $boolean_mode = ($v =~ /[+-<>()~*"]/);

  Why don't we just do that inside of Bugzilla::DB::Mysql's
sql_fulltext_search? Then we'll get it whenever we fulltext-search anything.
Comment on attachment 188992 [details] [diff] [review]
patch v3: unrotted

$boolean_mode description has to be added into the POD docs of DB.pm.

Your patch seems to work as expected, based on the few tests I did. So
r=LpSolit assuming you will post a new patch with the $boolean_mode description
included. (carry forward my r+)
Attachment #188992 - Flags: review?(LpSolit) → review+
Attached patch patch v4: does what Max suggests (obsolete) — Splinter Review
> Why don't we just do that inside of Bugzilla::DB::Mysql's
> sql_fulltext_search? Then we'll get it whenever we fulltext-search anything.

Good point.  Done.  Frederic, this means that the $boolean_mode parameter is no
more, so I haven't added it to the POD documentation.  It also means that patch
is sufficiently different from the way it used to be that it needs another
review.

In particular, the code now SQL quotes the text inside sql_fulltext_search()
rather than in Search.pm, since boolean mode needs to be determined from the
unquoted string (otherwise we'd think all searches were in boolean mode on a
database that quotes with double-quotes).

The new approach is also cleaner, since we're quoting closer to where the
quotes are actually needed, and simpler, since we no longer need to unquote the
string before using it in the ANSI version of the method.
Attachment #188992 - Attachment is obsolete: true
Attachment #189094 - Flags: review?(LpSolit)
We don't need to trick taint about @words, since we derive @words via a regexp,
which suffices to detaint them.
Attachment #189094 - Attachment is obsolete: true
Attachment #189127 - Flags: review?(LpSolit)
Attachment #189094 - Flags: review?(LpSolit)
Blocks: 300548
Comment on attachment 189127 [details] [diff] [review]
patch v5: no need to trick taint about @words 

I did some searches using both the "Find a specific bug" and "Advanced Search".
Here are the typical SQL statements I get:


Find a specific bug (MySQL):

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, map_assigned_to.login_name, map_reporter.login_name,
bugs.bug_status, bugs.resolution, bugs.short_desc,
(SUM(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN
MODE))/COUNT(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN
MODE)) + MATCH(bugs.short_desc) AGAINST('test >assignee' IN BOOLEAN MODE)) AS
relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to
= map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter
= map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id =
longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON
bug_group_map.bug_id = bugs.bug_id WHERE ((bugs.bug_status IN
('UNCONFIRMED','NEW','ASSIGNED','REOPENED'))) AND (((MATCH(longdescs_0.thetext)
AGAINST('test >assignee' IN BOOLEAN MODE) > 0) OR (bugs.short_desc REGEXP
'(^|[^a-z0-9])test($|[^a-z0-9])' AND bugs.short_desc REGEXP
'(^|[^a-z0-9])\\>assignee($|[^a-z0-9])'))) AND bugs.creation_ts IS NOT NULL AND
((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY relevance
LIMIT 200



Find a specific bug (non-MySQL):

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, map_assigned_to.login_name, map_reporter.login_name,
bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(CASE WHEN
(LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE
'%>assignee%') THEN 1 ELSE 0 END)/COUNT(CASE WHEN (LOWER(longdescs_0.thetext)
LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0
END) + CASE WHEN (LOWER(bugs.short_desc) LIKE '%test%' AND
LOWER(bugs.short_desc) LIKE '%>assignee%') THEN 1 ELSE 0 END) AS relevance FROM
bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to =
map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter =
map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id =
longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON
bug_group_map.bug_id = bugs.bug_id WHERE ((bugs.bug_status IN
('UNCONFIRMED','NEW','ASSIGNED','REOPENED'))) AND (((CASE WHEN
(LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE
'%>assignee%') THEN 1 ELSE 0 END > 0) OR (bugs.short_desc REGEXP
'(^|[^a-z0-9])test($|[^a-z0-9])' AND bugs.short_desc REGEXP
'(^|[^a-z0-9])\\>assignee($|[^a-z0-9])'))) AND bugs.creation_ts IS NOT NULL AND
((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY relevance desc
LIMIT 200



Advanced search (MySQL):

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, map_assigned_to.login_name, map_reporter.login_name,
bugs.bug_status, bugs.resolution, bugs.short_desc,
(SUM(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN
MODE))/COUNT(MATCH(longdescs_0.thetext) AGAINST('test >assignee' IN BOOLEAN
MODE)) + MATCH(bugs.short_desc) AGAINST('test >assignee' IN BOOLEAN MODE)) AS
relevance FROM bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to
= map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter
= map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id =
longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON
bug_group_map.bug_id = bugs.bug_id WHERE ((MATCH(longdescs_0.thetext)
AGAINST('test >assignee' IN BOOLEAN MODE) > 0)) AND bugs.creation_ts IS NOT
NULL AND ((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY
bugs.bug_id LIMIT 200



Advanced search (non-MySQL):

SELECT bugs.bug_id, bugs.bug_severity, bugs.priority, bugs.bug_status,
bugs.resolution, map_assigned_to.login_name, map_reporter.login_name,
bugs.bug_status, bugs.resolution, bugs.short_desc, (SUM(CASE WHEN
(LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE
'%>assignee%') THEN 1 ELSE 0 END)/COUNT(CASE WHEN (LOWER(longdescs_0.thetext)
LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE '%>assignee%') THEN 1 ELSE 0
END) + CASE WHEN (LOWER(bugs.short_desc) LIKE '%test%' AND
LOWER(bugs.short_desc) LIKE '%>assignee%') THEN 1 ELSE 0 END) AS relevance FROM
bugs INNER JOIN profiles AS map_assigned_to ON (bugs.assigned_to =
map_assigned_to.userid) INNER JOIN profiles AS map_reporter ON (bugs.reporter =
map_reporter.userid) INNER JOIN longdescs AS longdescs_0 ON (bugs.bug_id =
longdescs_0.bug_id AND longdescs_0.isprivate < 1) LEFT JOIN bug_group_map ON
bug_group_map.bug_id = bugs.bug_id WHERE ((CASE WHEN
(LOWER(longdescs_0.thetext) LIKE '%test%' AND LOWER(longdescs_0.thetext) LIKE
'%>assignee%') THEN 1 ELSE 0 END > 0)) AND bugs.creation_ts IS NOT NULL AND
((bug_group_map.group_id IS NULL)) GROUP BY bugs.bug_id ORDER BY bugs.bug_id
LIMIT 200


Some remarks, constatations:

1) The WHERE clause:

- The WHERE clause does not look at the bug summary in the Advanced Search,
meaning that if the words we are looking for are in the bug summary only, we
will miss the bug.

What would prevent us from doing something like:
WHERE (MATCH(longdescs_0.thetext, bugs.short_desc) AGAINST('test >assignee' IN
BOOLEAN MODE) > 0)  ?

If that's not a performance issue, it would make sense to open a bug to
implement this feature.

- Why does "Find a specific bug" use LIKE for comments in the WHERE clause, but
REGEXP for the bug summary? Couldn't we use LIKE too? Maybe would LIKE be
faster than this REGEXP? If yes, open a bug.


2) The ORDER BY clause:

- "Find a specific bug" orders the results by relevance. Advanced Search
doesn't. That's a bug IMO. Especially because buglist.cgi displays "Bugs on
this list are sorted by relevance, with the most relevant bugs at the top.".
But we know that's not true, except when using "Find a specific bug".


But AFAIK, all these points are not related to your patch. So r=LpSolit
Attachment #189127 - Flags: review?(LpSolit) → review+
Flags: approval?
> - The WHERE clause does not look at the bug summary in the Advanced Search,
> meaning that if the words we are looking for are in the bug summary only, we
> will miss the bug.

Right.  That's a known limitation of Bugzilla which is actually fixed in the
Bugzilla tip.  Unfortunately, that fix regresses performance significantly for
large installations, so we've backed out that patch for b.m.o.


> What would prevent us from doing something like:
> WHERE (MATCH(longdescs_0.thetext, bugs.short_desc) AGAINST('test >assignee' IN
> BOOLEAN MODE) > 0)  ?

Well, first of all, you can only list multiple columns from the same table in
the MATCH function (they have to exactly match a set of columns on which a
fulltext index has been created, and you can't create cross-table indexes), so
this would return an error.

You could instead do two separate MATCHes, but then MySQL won't use the fulltext
indexes, and it will be extremely slow (so slow, in fact, that queries will time
out on b.m.o and probably many other Bugzilla installations).

At one point, I knew why this was the case, but I don't quite remember anymore.
 Bug 145588, comment 35 says something about it.  It may have something to do
with MySQL's one index per table limitation, or it may be something about doing
a union of summary and comment matches (i.e. summary matches *OR* comments match).


> - Why does "Find a specific bug" use LIKE for comments in the WHERE clause, 
> but REGEXP for the bug summary? Couldn't we use LIKE too? Maybe would LIKE
> be faster than this REGEXP? If yes, open a bug.

I guess you are referring to the ANSI version.  Presumably, the ANSI version is
not intended to be used unless absolutely necessary, so great care has not been
taken with making it work as well as possible.  I imagine that LIKE would be
faster but REGEXP would be more accurate (since it's possible to specify word
boundaries in it, and fulltext searches are concerned with words, not substring
matches).


> - "Find a specific bug" orders the results by relevance. Advanced Search
> doesn't. That's a bug IMO. Especially because buglist.cgi displays "Bugs on
> this list are sorted by relevance, with the most relevant bugs at the top.".
> But we know that's not true, except when using "Find a specific bug".

Yes.  It's tough, since the advanced search has a sort field that always has
some value selected, but I imagine that most users searching for content want
their results sorted by relevance, so it would make sense to override that value
(but we should find a way to distinguish between users who didn't explicitly set
an order and those who did, perhaps by changing the default sort order to
something like "default").


Thanks for the review!  Checking in...

Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.102; previous revision: 1.101
done
Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.57; previous revision: 1.56
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.26; previous revision: 1.25
done
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Flags: approval? → approval+
Resolution: --- → FIXED
Whiteboard: [patchv2 applied to b.m.o] [requires MySQL 4.0.1 or higher] → [requires MySQL 4.0.1 or higher]
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.