Closed Bug 900922 Opened 11 years ago Closed 5 years ago

Simple quote (') and apostrophe (’) don't cut words

Categories

(support.mozilla.org :: Search, defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: scoobidiver, Unassigned)

Attachments

(1 file)

The apostrophe (’), sometimes written as a simple quote ('), doesn't separate two words, so a search returns no results even though matching articles exist. The apostrophe is often used in French and Italian. See https://support.mozilla.org/fr/search?q=l%E2%80%99accueil or https://support.mozilla.org/fr/search?q=l%27accueil To partially work around that, accueil has been set as a keyword in https://support.mozilla.org/fr/kb/comment-definir-page-accueil and https://support.mozilla.org/fr/kb/page-accueil-defaut-firefox-a-acces-rapide-vers-fonctionnalites-courantes for a search like https://support.mozilla.org/fr/search?q=accueil
Not sure if there is anything we can do here. Mike?
ElasticSearch provides two French-capable analyzers. One of them is Snowball, which has a setting for French, and the other is the analyzer simply called French. I assumed that Snowball would be the better of the two. I did some quick checks, and French seems to deal with word combinations like d'accueil and l'accueil better than Snowball. I don't want to jump at the first sign of trouble though, so I want to do some more research here. However, modifying the analyzers beyond what they do natively would probably be more trouble than it is worth.
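For illustration only, here is a minimal Python sketch of the kind of elision handling that would make d'accueil and l'accueil match accueil. It is not the code of either ElasticSearch analyzer; the article list, the `strip_elision` and `tokenize` names, and the tokenizing regex are all assumptions made for this example.

```python
import re

# Assumed list of elided forms; an illustrative choice, not taken from the bug.
ELIDED_ARTICLES = {"l", "d", "m", "t", "n", "s", "j", "c", "qu",
                   "jusqu", "lorsqu", "puisqu", "quoiqu"}
# Both the simple quote (') and the typographic apostrophe (’).
APOSTROPHES = "'\u2019"


def strip_elision(token):
    """Drop a leading elided article such as l' or d’ from a single token."""
    for i, ch in enumerate(token):
        if ch in APOSTROPHES:
            prefix, rest = token[:i], token[i + 1:]
            if prefix.lower() in ELIDED_ARTICLES and rest:
                return rest
            return token  # apostrophe present, but not an elided article
    return token


def tokenize(text):
    """Split text into lowercase tokens, removing elided articles."""
    raw = re.findall(r"[\w'\u2019]+", text)
    return [strip_elision(tok).lower() for tok in raw]
```

With this sketch, a query such as `page d'accueil` yields the tokens `page` and `accueil`, while a word like `aujourd'hui` is left whole because `aujourd` is not in the assumed article list.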
Some more thought on this: We should get a sample of French search queries, and see how each of the analyzers deals with them. Then someone who speaks French can pick which one looks better. We should probably also do the same with some text of popular SUMO KB articles in French. Ibai: Can you get a list of common searches in French on SUMO? I don't know how to work the Google Analytics interface well enough to find the information I want.
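The comparison described above could be sketched roughly as follows. The two functions are hypothetical stand-ins written for this example, not the real Snowball or French analyzers; in practice each would be replaced by a call that returns the tokens the corresponding analyzer actually produces.

```python
import re


def analyzer_a(text):
    """Stand-in analyzer: lowercases and splits on whitespace only."""
    return text.lower().split()


def analyzer_b(text):
    """Stand-in analyzer: also drops a leading l'/d' (straight or curly apostrophe)."""
    return [re.sub(r"^[ld]['\u2019]", "", tok) for tok in text.lower().split()]


def compare(queries):
    """Return (query, tokens_a, tokens_b) for every query the analyzers disagree on."""
    rows = []
    for query in queries:
        tokens_a, tokens_b = analyzer_a(query), analyzer_b(query)
        if tokens_a != tokens_b:
            rows.append((query, tokens_a, tokens_b))
    return rows
```

Running `compare` over the sample of French queries would give a French speaker a short list of only the queries where the two analyzers differ, which is easier to review than the full output of both.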
Flags: needinfo?(ibai)
I can: https://docs.google.com/spreadsheet/ccc?key=0AiYrsx83iOr0dDdtejVXR2x3YjIxd0ZQOXFsRGNwWEE&usp=sharing It's a pretty rough list. I hope it's enough. Let me know if it isn't and I can get an unsampled report.
I think there's a module 45 somewhere, and spaces at the end should be removed to aggregate the same terms. That said, it helped me tweak keywords for common search terms and typos. Does French contain common typos such as words without accents? I will attach a file, based on the one from comment 4, that displays the assumed expected article and the current search result rank.
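As a rough sketch of the aggregation mentioned above, the following Python merges rows whose search terms differ only by surrounding spaces. The `(term, count)` row shape and the `aggregate_queries` name are assumptions, since the spreadsheet layout is not described here; lowercasing is an extra assumption that can be dropped.

```python
from collections import Counter


def aggregate_queries(rows):
    """Sum counts for search terms that differ only by whitespace or case.

    rows is an iterable of (term, count) pairs; this shape is assumed.
    """
    totals = Counter()
    for term, count in rows:
        # split/join removes leading and trailing spaces and collapses runs of spaces.
        normalized = " ".join(term.split()).lower()
        totals[normalized] += count
    return totals
```

Calling `aggregate_queries(rows).most_common()` then lists the merged terms from most to least frequent.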
Flags: needinfo?(ibai)
Priority: -- → P2
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX