Closed Bug 137011 Opened 22 years ago Closed 20 years ago

most summary searches take 13-17 seconds

Categories

(Bugzilla :: Query/Bug List, defect)

2.10
x86
Windows 98
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: jruderman, Assigned: nobody)

References

Details

(Keywords: perf)

Simple summary searches are very slow.  This slows me down a lot, since most of
my searches are in one of these forms:
+foo               summary contains foo
foo                (summary or component or product or whiteboard) contains foo
foo bar            (foo and bar)         using the atoms above for foo, bar
foo,bar baz,zap    (foo or bar) and (baz or zap)

Rough measurements of search times:

bookmarklets                                          15 s (2 hits)
foopy                                                 15 s (1 hit)
ALL bookmarklets                                      14 s
+bookmarklets                                         13 s
ALL foopy                                             17 s

table reflow                                          44 s
opacity +select                                       45 s (0 hits)

comment by jruderman@hmc.edu contains 'bookmarklets'   2 s (57 hits)
comment by ksosez@softhome.net contains 'duplic'       3 s (1172 hits)

I did the comment-contains searches using the comment-contains form on
http://www.cs.hmc.edu/~jruderma/s/bugz.html and the rest using
http://www.cs.hmc.edu/~jruderma/quicksearch.html.
Keywords: perf
I think mysql doesn't use the index for case insensitive substring searches. I'd
have to test, though.
Using a case-sensitive summary search only reduces the search time from 17s to 12s.
This would be because, um, there isn't an index on short_desc.

short_desc is also a mediumtext, which can hold up to 16777215 characters. See
bug 96431.

Indexes on medium text are only supported port 3.23.2, and bugzilla currently
supports older mysql versions - see bug 87958

(Note that this bug only depends on one of the above bugs being fixed)

However, the other problem is that an index can only work on the beginning part
of the string. (Later mysql verions also have fulltext searches, but that only
matches whole words). so searching in a substring can't use an index - which
part of a string would you index?

Not sure if there is a way arround that.

myk, any ideas?
Depends on: 87958, 96431
Don't think there's anything we can do about this, so moving off to Future.
Target Milestone: --- → Future
Most of the examples given by the reporter are full word searches, not substring
searches, and the former can indeed be handled by fulltext searches, although
the full power of fulltext searches is only available in MySQL 4.0.1+.  
No longer blocks: bz-perf
Most of my searches are substring searches, not full-word searches.  For
example, I often search for "auto compl" to get "auto-complete", "autocomplete",
and "auto completion".  I'm sorry I wasn't clear with my examples.
Would changing short_desc to a more sensible length and adding an index provide
a big performance improvement, or probably not?

Gerv
Nope. You can't use an index unless you're searching form the beginning (eg LIKE
'Foo%'), because thast where the index is based on.

If we have full text indexing (*cough* Pg *cough* :-), then we could use that
for 'all words' searches. (Yes, I know mysql has it too, but its only in 4.0 IIRC)

The basic problem is that searching for a substring cannot use an index,
directly - what would you index (OK, you could index every substring, and Pg's
full text stuff may evne give you an option to do so. I shudder to think what
the index size would be, though)
I think 3.23.31 and up (or something like that) has fulltext indexes.  We
weren't using 4.0 on Syndicomm, and we used the fulltext indexes for message
board searches.
mysql doesn't have FT indexing, though. What 3.23 has is full text _matching_.
Its probably an artificial distinction, mind you, although we do want the
boolean stuff. It also doesn't do substrings, only whole word searches.

See
http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

We need 4.01 to do boolean matches, which is really what we want. The 3.23
version is not only less flexable, it won't return results opccuring in more
than half the rows...

Both mysql and Pg have different defintions of 'words' though, neither of which
match our current definition. They also both have stopwords, and I don't know if
we want those.

I also don't want to think about how big this index will be...
OK, then why did we have to create indexes in the database before we could use it?

OK, here we go.  The very first sentence of URL you just referred me to, and I
quote "As of Version 3.23.23, MySQL has support for full-text indexing"
but yes, the indexes can get pretty big.
Bleh, lets try this again. :)

What I _meant_ to say was that indexes of type fulltext (mysql or pg) do not
provide the functionality to replace the current 'all words as substring'
option. We could use it for 'all words', but since thats not the common option,
and the index would be massive, I don't think its worth it.

PG's FTS stuff allows stemming, which may make it OK to use (see
http://openfts.sourceforge.net/tsearch/README.tsearch for the pg index plugin)
Target Milestone: Future → Bugzilla 2.20
Summary searches on bugzilla.mozilla.org are a lot faster than when I reported
this bug.
Assignee: endico → nobody
wow, lots of stuff here
We have a fulltext index on short_desc, now, since bug 145588.

Feel free to re-open this if you still have any troubles.
Status: NEW → RESOLVED
Closed: 20 years ago
Depends on: 145588
No longer depends on: 96431
Resolution: --- → WORKSFORME
clearing target of DUPLICATE/WONTFIX/INVALID/WORKSFORME so they'll show up as
untriaged if they get reopened.
Target Milestone: Bugzilla 2.20 → ---
QA Contact: matty_is_a_geek → default-qa
You need to log in before you can comment on or make changes to this bug.