The languages are Korean, Finnish, Polish, Russian, Slovak, Ukrainian, and Japanese.
Proposed solution: write a logging system that records which characters are involved whenever it crashes, and exclude them from indexing using a filter. But we should also check upstream (Sphinx). Laura, did you say you have a contact with someone?
Ran out of time for these.
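For illustration only, here is a minimal Python sketch of the "log the offending characters, then filter them" idea. It assumes the crash is triggered by characters that XML 1.0 does not allow, which nothing in this bug confirms. The function names are hypothetical and are not part of the indexer.

```python
import re

# Assumption: the problem characters are those outside the XML 1.0 "Char"
# production (tab, LF, CR, and the ranges below are the only ones allowed).
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_illegal_chars(text):
    """Return (offset, codepoint) pairs for characters XML 1.0 disallows."""
    return [(m.start(), "U+%04X" % ord(m.group()))
            for m in _XML_ILLEGAL.finditer(text)]

def strip_illegal_chars(text):
    """Return text with the disallowed characters removed."""
    return _XML_ILLEGAL.sub("", text)
```

Logging the output of `find_illegal_chars` for each document before indexing would show which characters, if any, fall in this class. `strip_illegal_chars` is the corresponding filter.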
After upgrading to Sphinx 0.9.8.1 and testing this on my dev server, it seems to affect other languages as well (but English is fine). Could this have more to do with how the XML files are written out, or with server configuration of some sort? To reproduce this problem, edit scripts/sphinx/indexer.php, change the AND tiki_pages.`lang` IN filter in the MySQL query, and then generate the XML by running it. (There is also UI in tiki-admin_newsearch.php that triggers the indexing.)
Can others who have a development environment reproduce this?
Update: I seem to have overcome this by making sure every single thing (tags had been left out) is passed through utf8_encode. Now to see if we can search for these international documents...
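A rough Python sketch of the same idea, not the actual patch: make sure every field is UTF-8 before it is written to the XML. PHP's utf8_encode converts ISO-8859-1 to UTF-8 unconditionally. The sketch below instead checks first, so input that is already UTF-8 is left alone. The fallback to ISO-8859-1 and the field names in the usage line are assumptions.

```python
def to_utf8(raw):
    """Return raw as UTF-8 bytes.

    Assumption: any byte string that is not valid UTF-8 is ISO-8859-1.
    """
    if isinstance(raw, str):
        return raw.encode("utf-8")
    try:
        raw.decode("utf-8")          # already valid UTF-8: keep as-is
        return raw
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1").encode("utf-8")

def encode_row(row):
    """Apply to_utf8 to every field of a row, including tags."""
    return {key: to_utf8(value) for key, value in row.items()}
```

For example, `encode_row({"title": "安装", "tags": b"caf\xe9"})` returns both fields as UTF-8 bytes, converting the ISO-8859-1 tag value along the way.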
Created attachment 351847
patch for multilingual search

This fixes it for me, but can others check as well? The way to trigger a manual indexing: https://bugzilla.mozilla.org/show_bug.cgi?id=464851#c6

The test case I used: searched for 安装 in Simplified Chinese (安装 means "installation"). You should get a list of Chinese articles with that term, followed by a list of English forum threads with "installation" in them.

Sometimes (notably when I paste the search terms into the search box and press Enter instead of using the submit button) I get garbled text in the search results. That might be a separate bug (a Firefox bug?), but if anyone knows why, let me know.
I don't have the permissions to grant reviews yet, but applying the patch and re-indexing works as it should. To test, I searched for "のカスタマイズ" in Japanese and got proper results for Japanese and English (customization/customizing).
Verified FIXED using http://support-stage.mozilla.org/tiki-newsearch.php?locale=en&q=%E3%81%AE%E3%82%AB%E3%82%B9%E3%82%BF%E3%83%9E%E3%82%A4%E3%82%BA&where=all&l=ja&filter_lang=1&author=&filter_author=0&en_too=0&type=0&answered=0&lastmodif=0&offset=0.
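For anyone reading the verification URL, this short Python snippet decodes its percent-encoded query string using only the standard library. It shows that the q parameter is the UTF-8 encoding of the Japanese search term quoted earlier in this bug.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://support-stage.mozilla.org/tiki-newsearch.php?locale=en"
    "&q=%E3%81%AE%E3%82%AB%E3%82%B9%E3%82%BF%E3%83%9E%E3%82%A4%E3%82%BA"
    "&where=all&l=ja&filter_lang=1&author=&filter_author=0&en_too=0"
    "&type=0&answered=0&lastmodif=0&offset=0"
)

# parse_qs percent-decodes values as UTF-8 by default.
params = parse_qs(urlsplit(url).query)
search_term = params["q"][0]     # "のカスタマイズ"
search_lang = params["l"][0]     # "ja"
print(search_term, search_lang)
```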