Closed
Bug 447933
Opened 16 years ago
Closed 16 years ago
Flat search for "better" doesn't find "better gmail" or "better gmail 2" add-ons
Categories
(addons.mozilla.org Graveyard :: Public Pages, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
4.0.2
People
(Reporter: stephend, Assigned: cpollett)
Details
Attachments
(1 obsolete file)
Sadly, I don't know if this is a regression; I hope this isn't a duplicate, either.

A search for "better" doesn't yield either the "Better Gmail" or "Better Gmail 2" add-ons:
https://addons.mozilla.org/en-US/firefox/search?q=better

However, an explicit search for "better gmail" yields both:
https://addons.mozilla.org/en-US/firefox/search?q=better+gmail&cat=all

(I have added this testcase to my Selenium test suite.)
Comment 3•16 years ago (Assignee)
I think this happens because "better" is a MySQL full-text stopword, so a search on it doesn't return anything.
Comment 4•16 years ago
Correct. "Better" is not indexed, so you can't find it. However, other words *containing* the stopword (composites such as "BetterGReader") will be found, which is why we are seeing 7 results here, all of which contain such composites. A more obvious example is searching for "the", which mostly returns *Themes* because "theme" contains "the". "The" itself, however, is a stopword, so it won't be indexed or matched.
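To make the behaviour described above concrete, here is a toy model in Python. It is only a sketch: it assumes the search matches by word prefix, it uses a two-word stopword subset taken from this thread, and it is not MySQL's actual indexing code. All names are hypothetical.

```python
import re

# Illustrative subset only; MySQL's built-in stopword list is much longer.
STOPWORDS = {"better", "the"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each non-stopword token to the set of add-on names containing it."""
    index = {}
    for name, text in docs.items():
        for token in tokenize(text):
            if token in STOPWORDS:
                continue  # stopwords never reach the index
            index.setdefault(token, set()).add(name)
    return index

def prefix_search(index, term):
    """Return add-ons that have an indexed token starting with `term`."""
    term = term.lower()
    hits = set()
    for token, names in index.items():
        if token.startswith(term):
            hits |= names
    return hits

docs = {
    "Better Gmail": "Better Gmail",
    "BetterGReader": "BetterGReader for Google Reader",
}
index = build_index(docs)
```

In this model, `prefix_search(index, "better")` finds only the composite "BetterGReader", because the standalone word "better" was dropped at indexing time, while `prefix_search(index, "gmail")` still finds "Better Gmail".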
Comment 5•16 years ago (Assignee)
I looked at the MySQL sources. The default stopword list is English only. http://www.ranks.nl/stopwords/ has suggested stopword lists for other languages, but you can only have one stopword list at a time. So we can give a warning when someone uses one of the default stopwords, but the behaviour might seem a little funky if the locale is not English.
Comment 6•16 years ago
Why funky? If a word is not on the list, it will end up in the index, so for example the German articles "der", "die", and "das" will get indexed, and thus there's no need to warn users searching for them. I guess if "better" means something very special in another language, users would be confused if we told them it's a common word, but we need to warn them either way.
Comment 7•16 years ago
OK, not really a comment for this bug, but let's fix this temporarily if we can. We should really consider a more robust search solution (does one exist?) that allows us to manage separate stoplists and gives us flexibility in our ranking: spelling suggestions, special boolean searches, etc.
Comment 8•16 years ago
We may want to investigate http://www.sphinxsearch.com/features.html
Comment 10•16 years ago (Assignee)
The proposed patch prints a message like:

"better" as a frequently occurring word is not indexed except as a substring of longer words.

for each stop-word among the search terms. The list of stop-words in the patch is a hard-coded array in search_controller, because the default MySQL stop-words are compiled into the MySQL executable and are not directly accessible by querying the DB. For the reasons mentioned in previous comments, English stop-words are used for all locales: MySQL supports only one stop-word file, and by default it uses English words.
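The warning logic described above could look roughly like the following Python sketch. It is not the attached patch (which lives in search_controller); the function name and the two-word stop-word list are assumptions for illustration only.

```python
# Hypothetical excerpt; the patch hard-codes MySQL's full default list.
STOPWORDS = {"better", "the"}

def stopword_warnings(query):
    """Return one warning per distinct stop-word among the search terms."""
    warnings = []
    seen = set()
    for term in query.lower().split():
        if term in STOPWORDS and term not in seen:
            seen.add(term)
            warnings.append(
                f'"{term}" as a frequently occurring word is not indexed '
                "except as a substring of longer words."
            )
    return warnings
```

A query such as "Better Gmail" would produce a single warning for "better", and a query without stop-words would produce none.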
Attachment #331584 -
Flags: review?(fwenzel)
Comment 12•16 years ago
I'm sorry to say that just warning people not to use a list of words isn't going to be a good idea, even temporarily. I just posted bug 456206 a moment ago (thanks for duping; I would never have guessed this one), and the problem there is that there is an extension named simply "Brief", which is apparently also a stop-word. Thus, it's impossible to search for this extension by name at all (and a user can't install it by name from FF3's Get Add-ons). A better solution is going to be needed here.
Comment 14•16 years ago
From bug 457952 comment 9: "Short term remedy: We strip stop words from the queries before executing them."
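As a rough illustration of that short-term remedy, stripping could be done along these lines. This is a sketch under assumptions: the function name is made up, and the stop-word set is a small subset taken from this thread rather than MySQL's real list.

```python
STOPWORDS = {"better", "the"}  # illustrative subset, not the full MySQL list

def strip_stopwords(query):
    """Drop stop-words from a query string, keeping the other terms in order."""
    kept = [term for term in query.split() if term.lower() not in STOPWORDS]
    return " ".join(kept)
```

Note that a query consisting only of stop-words comes back empty, so this approach alone does not help with an add-on whose whole name is a stop-word.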
Comment 15•16 years ago (Assignee)
That seems reasonable. One could probably also just get rid of stopwords altogether within the mysql config file without increasing DB load that much.
Comment 16•16 years ago
Comment on attachment 331584
patch to warn when someone is using stop words

Hi, Chris! Just r-ing this, not because your code is bad but because, according to the discussion here, warning people about stop words is not the way to go.
Attachment #331584 -
Flags: review?(fwenzel) → review-
Comment 17•16 years ago
(In reply to comment #15)
> That seems reasonable. One could probably also just get rid of stopwords
> altogether within the mysql config file without increasing DB load that much.

What concerns me is that this is a server-wide setting: *every* project using these DBs and possibly wanting to employ full-text search would then have to live with an empty stop word list.

CCing xb95: Mark, are the DB servers AMO runs on dedicated to the project, or do they carry other projects' DBs as well? In the latter case, we'd at the very least need to be aware of the possible side effects for other projects.
Comment 18•16 years ago
The AMO databases are dedicated; they serve no other purpose (well, besides SAMO/VAMO stuff). If you want to remove stopwords, we can do that. I see no reason not to.
Comment 19•16 years ago
Thanks. We'll plan this on the webdev side, then move forward with IT (server restart and index rebuild, which probably call for a maintenance window).
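For reference, on a MySQL server with MyISAM full-text indexes, stopword filtering is usually switched off through the `ft_stopword_file` option. The thread does not say which exact setting IT applied, so the fragment below is an assumption about how such a change might look, not a record of what was deployed.

```ini
# my.cnf (sketch; assumes MyISAM FULLTEXT indexes)
[mysqld]
# An empty value disables stopword filtering for full-text search.
ft_stopword_file = ''
```

The option is read at startup, and existing FULLTEXT indexes have to be rebuilt afterwards (for example with `REPAIR TABLE <table> QUICK`, where the table name is a placeholder), which matches the restart and index rebuild mentioned above.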
Comment 20•16 years ago (Assignee)
This might be the better solution. Otherwise, we have to maintain a list of MySQL stopwords to ignore in the AMO code, which strikes me as somewhat brittle.
Comment 21•16 years ago
"Tab mix plus" (no quotes) now finds the add-on as expected, as the first result, since we removed stop words.

To my confusion, "better" (no quotes) still does not find "better gmail". Searching for "better gmail" (no quotes), however, finds it as the first result in the set.

What is possible is that somebody searched for "better" recently so the result is still cached. We'll need to check again in a little while. If it still doesn't show up, it could be that more than 50% of all add-ons use "better" somewhere in their descriptions (cf. bug 458110 comment 13), which would still keep it from being indexed, though I find that quite unlikely.
Comment 22•16 years ago
(In reply to comment #21)
> What is possible is that somebody searched for "better" recently so the result
> is still cached. We'll need to check again in a little while.

Yup: the cache has been flushed, and searching for "better" (no quotes) now returns Better Gmail and Better Gmail 2 at the top of the result set. Crowd, please cheer. :)

Chris and Mark: Thank you for your support in this issue! Happily marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 23•16 years ago (Reporter)
I've also re-run my Selenium tests and confirmed Fred's results. We also haven't regressed performance (at least by Selenium's timings): we're still averaging 24 seconds for the Search testcase (http://svn.mozilla.org/addons/trunk/site/app/tests/search.html) on https://wiki.mozilla.org/QA/Tools/Selenium/AMO_Automation, over 20 runs. Verified FIXED, as I don't think there's anything left to do.
Status: RESOLVED → VERIFIED
Comment 24•16 years ago
*cheers* Just tested my particular case (bug 456206/comment 12) and searching for "Brief" now finds Brief as the first result. :) Is there some way we can track the possible performance hit over the next week or so, in addition to tests?
Updated•16 years ago
Attachment #331584 -
Attachment is obsolete: true
Comment 25•16 years ago
(In reply to comment #24)
> Is there some way we can track the possible performance hit over the next week
> or so, in addition to tests?

I don't think this is necessary: a comparison of the search table size before and after did show an increase, but it is still only in the single-digit (!) megabyte range, which is very small, in fact much smaller than I expected. Monitoring performance matters more when the queries change (bug 446122), as that can have a considerable impact on how expensive search is, and thus how well it performs.
Updated•16 years ago
Target Milestone: --- → 4.0.2
Updated•8 years ago
Product: addons.mozilla.org → addons.mozilla.org Graveyard