Certain characters in some languages cause the new SUMO search indexing to fail

VERIFIED FIXED in 0.8

Status

support.mozilla.org
Knowledge Base Software
VERIFIED FIXED
9 years ago
8 years ago

People

(Reporter: nkoth, Assigned: nkoth)

Tracking

unspecified

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: sumo_only oldsearch, URL)

Attachments

(1 attachment)

(Assignee)

Description

9 years ago
The affected languages are Korean, Finnish, Polish, Russian, Slovak, Ukrainian, and Japanese.
(Assignee)

Comment 1

9 years ago
Proposed solution: write a logging system that identifies the offending characters whenever indexing crashes, and exclude those characters from indexing with a filter.

But we should check upstream (Sphinx) first. Laura, did you say you have a contact there?
Target Milestone: --- → 0.7.2
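A minimal Python sketch of the logging filter proposed above. It assumes the offending characters are the ones the XML 1.0 "Char" production disallows (control characters, lone surrogates, U+FFFE/U+FFFF). The report never actually identifies them, and filter_for_index is a hypothetical name, not part of the SUMO indexer.

```python
import logging
import re

log = logging.getLogger("indexer")

# Assumption: the characters that break indexing are the ones XML 1.0 forbids.
# This class matches anything outside the XML 1.0 "Char" production.
INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def filter_for_index(doc_id, text):
    """Log each disallowed character in `text`, then return `text` without them.

    Hypothetical helper; the real indexer did not necessarily work this way.
    """
    for match in INVALID_XML_CHARS.finditer(text):
        log.warning("doc %s: dropping U+%04X at offset %d",
                    doc_id, ord(match.group()), match.start())
    return INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(filter_for_index(42, "abc\x00def"))
```

Logging the code point and offset is the part that would reveal which characters are at fault. The same compiled pattern then doubles as the exclusion filter.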

Updated

9 years ago
Assignee: nobody → laura

Comment 2

9 years ago
Ran out of time for these.
Target Milestone: 0.7.2 → 0.7.3
(Assignee)

Comment 3

9 years ago
After upgrading to Sphinx 0.9.8.1 and testing this on my dev server, the problem seems to affect other languages as well (English is fine, though). Could this have more to do with how the XML files are written out, or with some server configuration?

To reproduce this problem, edit scripts/sphinx/indexer.php, change the languages in the AND tiki_pages.`lang` IN filter of the MySQL query, and then generate the XML by running the script.

(There is also UI in tiki-admin_newsearch.php that triggers the indexing.)
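Comment 3 asks whether the XML output itself could be at fault. As a hedged illustration, the Python sketch below writes documents in Sphinx's xmlpipe2 format, escaping every field and encoding the whole stream as UTF-8 in a single place. Whether indexer.php emits xmlpipe2 is an assumption, as are the field names title/body/tags; xmlpipe2_docset is a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr


def xmlpipe2_docset(docs, fields=("title", "body", "tags")):
    """Return a UTF-8 encoded xmlpipe2 stream for `docs`.

    `docs` is an iterable of (doc_id, {field: text}) pairs. The field names
    are hypothetical placeholders, not the real SUMO schema.
    """
    parts = ['<?xml version="1.0" encoding="utf-8"?>',
             "<sphinx:docset>", "<sphinx:schema>"]
    parts += [f"<sphinx:field name={quoteattr(name)}/>" for name in fields]
    parts.append("</sphinx:schema>")
    for doc_id, values in docs:
        parts.append(f'<sphinx:document id="{int(doc_id)}">')
        for name in fields:
            # escape() handles &, < and > so article text cannot break the XML.
            parts.append(f"<{name}>{escape(values.get(name, ''))}</{name}>")
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    # Encode once, at the very end, so no single field can skip the UTF-8 step.
    return "\n".join(parts).encode("utf-8")


print(xmlpipe2_docset([(7, {"title": "安装", "body": "a < b", "tags": "install"})]).decode("utf-8"))
```

Encoding the assembled stream once, rather than field by field, is one way to make it harder for a single field to slip through unencoded.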
(Assignee)

Comment 4

9 years ago
Can others who have a development environment reproduce this?

Updated

9 years ago
Target Milestone: 0.7.3 → 0.8
(Assignee)

Comment 5

9 years ago
Update: I seem to have overcome this by making sure that every single field is passed through utf8_encode (the tags had been left out).

Now to see if we can search for these international documents...
Assignee: laura → nelson
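The fix in Comment 5 relies on PHP's utf8_encode(), which converts an ISO-8859-1 string to UTF-8. A rough Python analogue is sketched below, applied to every field of a row with the tags included. The row layout and encode_row are hypothetical. Note that utf8_encode() assumes Latin-1 input, so bytes that are already UTF-8 would be double-encoded.

```python
def utf8_encode(raw: bytes) -> bytes:
    """Rough analogue of PHP's utf8_encode(): read `raw` as ISO-8859-1
    and re-encode it as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def encode_row(row: dict) -> dict:
    """Encode every field of a (hypothetical) indexer row, tags included."""
    return {field: utf8_encode(value) for field, value in row.items()}


row = {"title": b"caf\xe9", "tags": b"na\xefve"}
print(encode_row(row))
```

Running every field through one helper, instead of calling the conversion per field, is what keeps a column like the tags from being skipped.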
(Assignee)

Comment 6

9 years ago
Created attachment 351847 [details] [diff] [review]
patch for multilingual search

This fixes it for me, but can others check as well?

The way to trigger a manual indexing: https://bugzilla.mozilla.org/show_bug.cgi?id=464851#c6

The test case I used: I searched for 安装 in Simplified Chinese. You should get a list of Chinese articles, followed by a list of English forum threads containing "installation". (安装 means "installation".)

Sometimes (notably when I paste the search terms into the search box and press Enter instead of clicking the submit button) I get garbled text in the search results. That might be a separate bug, but if anyone knows why, let me know. (A Firefox bug?)
Attachment #351847 - Flags: review?(laura)
(Assignee)

Updated

9 years ago
Attachment #351847 - Flags: review?(smirkingsisyphus)

Updated

9 years ago
Attachment #351847 - Flags: review?(laura) → review+

Updated

9 years ago
Blocks: 467486

Comment 7

9 years ago
I don't have the permissions to grant reviews yet, but applying the patch and re-indexing works as it should.

To test, I searched for "のカスタマイズ" in Japanese and got proper results in both Japanese and English (customization/customizing).

Updated

9 years ago
Attachment #351847 - Flags: review?(smirkingsisyphus) → review+
(Assignee)

Comment 8

9 years ago
Fixed in r20605/r20606.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
(Assignee)

Updated

9 years ago
Duplicate of this bug: 467486
Verified FIXED using http://support-stage.mozilla.org/tiki-newsearch.php?locale=en&q=%E3%81%AE%E3%82%AB%E3%82%B9%E3%82%BF%E3%83%9E%E3%82%A4%E3%82%BA&where=all&l=ja&filter_lang=1&author=&filter_author=0&en_too=0&type=0&answered=0&lastmodif=0&offset=0.
Status: RESOLVED → VERIFIED
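The q parameter in the verification URL is percent-encoded UTF-8. Decoding it with Python's standard library shows that it is the same Japanese query used in Comment 7, searched with l=ja.

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://support-stage.mozilla.org/tiki-newsearch.php?locale=en"
       "&q=%E3%81%AE%E3%82%AB%E3%82%B9%E3%82%BF%E3%83%9E%E3%82%A4%E3%82%BA"
       "&where=all&l=ja&filter_lang=1&author=&filter_author=0&en_too=0"
       "&type=0&answered=0&lastmodif=0&offset=0")

# parse_qs percent-decodes values as UTF-8 by default and drops blank
# values such as author=.
params = parse_qs(urlsplit(url).query)
print(params["q"][0], params["l"][0])  # のカスタマイズ ja
```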

Updated

8 years ago
Whiteboard: sumo_only oldsearch