Closed
Bug 321427
Opened 19 years ago
Closed 16 years ago
Advanced search for Turkish İ - Unicode Character LATIN CAPITAL LETTER I WITH DOT ABOVE - U+0130 İ fails
Categories: Bugzilla :: Query/Bug List (defect)
Tracking: RESOLVED FIXED, Target Milestone: Bugzilla 3.2
People (Reporter: gangleri, Assigned: mkanat)
Attachments (1 file, 1 obsolete file):
patch, 3.16 KB (LpSolit: review+)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Hello!

See first: http://www.fileformat.info/info/unicode/char/0130/index.htm

Unicode Character LATIN CAPITAL LETTER I WITH DOT ABOVE - U+0130
HTML Entity (decimal): &#304;
HTML Entity (hex): &#x130;
UTF-8 (hex): 0xC4 0xB0 (c4b0), URL-encoded as %c4%b0 or %C4%B0

Please read: http://bugzilla.wikimedia.org/show_bug.cgi?id=2761 == [Bug MediaZilla 2761]: Capitalization of "i" is not "I" in Turkish

LATIN CAPITAL LETTER I WITH DOT ABOVE appears both in a comment of http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=3296 and in the "Keywords:" field of the same bug. Nevertheless, Advanced Search fails to find this bug on landfill. I noticed this problem because http://bugzilla.wikimedia.org/query.cgi?format=advanced generates false positives, i.e. pages that should not belong to the search result.

Best regards,
reinhardt [[user:gangleri]]

Reproducible: Always

Steps to Reproduce:
- follow the instructions and use copy and paste to insert the special characters
- or use the keyboard as described at http://www.fileformat.info/info/unicode/char/0130/index.htm
- or change the search URL using &long_desc=%C4%B0

Actual Results:
"Zarro Boogs found."

Expected Results:
Only bugs containing LATIN CAPITAL LETTER I WITH DOT ABOVE - U+0130 should be found.

Notes:
Search should work independently of the interface language. Because the case-conversion functions (whatever they are called, e.g. capital() and lowercase()) are language-dependent, Bugzilla should offer an "exact search" option in which *no* normalization is applied to the search string. Such a feature would be better than the current behaviour.
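The code point, HTML entities and byte values quoted above can be double-checked with a few lines of Python. This is only a convenient inspection sketch using the standard library (Python plays no part in Bugzilla); it shows that the &long_desc=%C4%B0 parameter is simply the percent-encoded UTF-8 form of U+0130.

```python
import unicodedata
from urllib.parse import quote, unquote

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTTED_CAPITAL_I = "\u0130"


def describe(ch: str) -> dict:
    """Return the code point, name and common encodings of one character."""
    utf8 = ch.encode("utf-8")
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "html_decimal": f"&#{ord(ch)};",
        "html_hex": f"&#x{ord(ch):X};",
        "utf8_hex": utf8.hex(),
        "url_encoded": quote(ch),  # percent-encodes the UTF-8 bytes
    }


if __name__ == "__main__":
    for key, value in describe(DOTTED_CAPITAL_I).items():
        print(f"{key}: {value}")
    # The query-string value from the report decodes back to the same character.
    print(unquote("%C4%B0") == DOTTED_CAPITAL_I)
```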
Updated•19 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter
Updated•19 years ago
Summary: Advanced search for Turkish capital of Unicode Character LATIN CAPITAL LETTER I WITH DOT ABOVE - U+0130 İ fails → Advanced search for Turkish İ - Unicode Character LATIN CAPITAL LETTER I WITH DOT ABOVE - U+0130 İ fails
Reporter
Comment 1•19 years ago
This bug is about Advanced Search on landfill, not the simple search / "Find a Specific Bug". https://bugzilla.mozilla.org/show_bug.cgi?id=316836 == Search bugs http://bugzilla.wikimedia.org/query.cgi?format=specific does not handle Unicode strings correctly, which was marked as a duplicate of https://bugzilla.mozilla.org/show_bug.cgi?id=126266 == Use UTF-8 (Unicode) charset encoding for pages and email for NEW installations. "Find a Specific Bug" works on landfill with İ.
Comment 2•19 years ago
This query finds the bug:
http://landfill.bugzilla.org/bugzilla-tip/buglist.cgi?query_format=advanced&long_desc=%C4%B0&long_desc_type=regexp
This one doesn't:
http://landfill.bugzilla.org/bugzilla-tip/buglist.cgi?query_format=advanced&long_desc=%C4%B0&long_desc_type=allwordssubstr
Strange. MySQL?
Assignee
Comment 3•17 years ago
This should work now in Bugzilla 3.0.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Comment 4•17 years ago
Clicking the links in comment 2 shows it doesn't.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Assignee
Comment 5•17 years ago
Hrm. Maybe with a different MySQL collation this would work properly?
Comment 6•17 years ago
Searching for UTF-8 text is completely broken. I just installed a clean Bugzilla 3.0.2, and search doesn't work if I search for anything outside the Latin-1 encoding.
Updated•17 years ago
Flags: blocking3.2?
Assignee
Comment 7•17 years ago
Okay, this should definitely at least be looked into before 3.2.
Flags: blocking3.2? → blocking3.2+
Updated•16 years ago
Status: REOPENED → NEW
Target Milestone: --- → Bugzilla 3.2
Comment 8•16 years ago
(In reply to comment #2)
> buglist.cgi?long_desc=%C4%B0&long_desc_type=regexp
> finds the bug.
> buglist.cgi?long_desc=%C4%B0&long_desc_type=allwordssubstr
> doesn't.

Let's add a 3rd query:
buglist.cgi?long_desc_type=casesubstring&long_desc=%C4%B0

Appending &debug=1 to all three queries shows that:
1) the regexp one uses:
   longdescs_.thetext REGEXP 'İ'
2) the allwordssubstr one (case-insensitive) uses:
   INSTR(CAST(LOWER(longdescs_.thetext) AS BINARY), CAST('i̇' AS BINARY)) > 0
3) the casesubstring one (case-sensitive) uses:
   INSTR(CAST(longdescs_.thetext AS BINARY), CAST('İ' AS BINARY)) > 0

So the problem seems to be that 'i̇' is not seen as the lowercase flavor of 'İ', and so MySQL returns no match.
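The mismatch can be modelled outside the database. The sketch below rests on an assumption not verified against Bugzilla or MySQL source: the Perl side applies the full Unicode lowercase mapping, which turns U+0130 into two code points ('i' followed by U+0307 COMBINING DOT ABOVE, the 'i̇' seen in the debug output), while the database's LOWER() uses a simple one-to-one mapping that yields a plain 'i'. Python's str.lower() stands in for Perl's lc; db_lower() is a hypothetical model of the database side.

```python
# Stand-ins: str.lower() plays the role of Perl's lc (full Unicode mapping);
# db_lower() is a HYPOTHETICAL model of a database LOWER() that uses a
# simple one-to-one mapping.

def full_lower(text: str) -> str:
    """Full Unicode lowercase mapping: U+0130 becomes 'i' + U+0307."""
    return text.lower()


def db_lower(text: str) -> str:
    """Hypothetical simple mapping: each character lowercases to exactly one
    character. Where the full mapping yields several code points (U+0130),
    keep only the first one, i.e. a plain 'i'."""
    return "".join(ch.lower()[0] for ch in text)


def allwordssubstr_match(needle: str, haystack: str) -> bool:
    """Mimic INSTR(LOWER(text), <lowercased needle>) > 0: the needle and the
    haystack are lowercased by DIFFERENT functions."""
    return full_lower(needle) in db_lower(haystack)


def casesubstring_match(needle: str, haystack: str) -> bool:
    """Mimic the case-sensitive INSTR on the raw text."""
    return needle in haystack


comment_text = "Keywords: \u0130stanbul"

print(len(full_lower("\u0130")))                     # 2
print(allwordssubstr_match("\u0130", comment_text))  # False
print(casesubstring_match("\u0130", comment_text))   # True
```

Under these assumptions the case-sensitive search succeeds and the case-insensitive one fails, which matches the behaviour of queries 2 and 3 above.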
Comment 9•16 years ago
I tested with PostgreSQL 8.2.6, and it has the same problem.
Comment 10•16 years ago
In Search::GetByWordListSubstr(), I tried replacing (using PostgreSQL):
    push(@list, $dbh->sql_position(lc($sql_word), "LOWER($field)") . " > 0");
by:
    push(@list, $dbh->sql_position("LOWER($sql_word)", "LOWER($field)") . " > 0");
but this doesn't help. Instead of 0 bugs, it now returns all bugs.
Comment 11•16 years ago
As reported by bbaetz on IRC, there isn't a one-to-one mapping between lowercase and uppercase in Turkish; see http://rt.perl.org/rt3/Public/Bug/Display.html?id=36953 and also perldoc perlunicode, under lc: "Things to do with locales (Lithuanian, Turkish, Azeri) do not work since Perl does not understand the concept of Unicode locales."
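The missing one-to-one mapping shows up with any locale-independent Unicode case mapping. The sketch uses Python's str methods, which, like Perl's lc/uc per the perlunicode note, know nothing about Turkish: dotted i and dotless ı (U+0131) share the uppercase I, and İ lowercases to two code points, so an upper/lower round trip does not always return the original string.

```python
DOTLESS_I = "\u0131"         # LATIN SMALL LETTER DOTLESS I
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def round_trips(ch: str) -> bool:
    """True if upper-casing and then lower-casing gives back the original."""
    return ch.upper().lower() == ch


# Both 'i' and 'ı' upper-case to the same ASCII 'I' ...
assert "i".upper() == DOTLESS_I.upper() == "I"
# ... so the dotless letter does not survive a round trip,
print(round_trips(DOTLESS_I))          # False
# and İ lower-cases to 'i' + U+0307 COMBINING DOT ABOVE (two code points).
print(len(DOTTED_CAPITAL_I.lower()))   # 2
# Turkish rules would map I -> ı; locale-independent lower() gives 'i' instead.
print("I".lower() == DOTLESS_I)        # False
```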
Assignee
Comment 12•16 years ago
Okay. So we should find a way to use sql_istrcmp or something similar to do case-insensitive substring matching, instead of using Perl's lc.
Updated•16 years ago
Assignee: query-and-buglist → jjclark1982
Comment 13•16 years ago
In theory this should work if we replace code like
    $$term = $dbh->sql_position(lc($$q), "LOWER($$ff)") . " > 0";
with
    $$term = $dbh->sql_position($dbh->sql_istring($$q), $dbh->sql_istring($$ff)) . " > 0";
However, I am having a lot of trouble ensuring that the entered value ($$q) is in the correct encoding. encode('utf8', decode('utf8', $$q)) appears to print the correct value, but passing this to MySQL does not match correctly.
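The point of this replacement is that the search term and the searched text must be case-folded by the same function. A minimal sketch of that principle in Python, assuming one folding function (str.lower() here, as a hypothetical stand-in for whatever sql_istring expands to) applied to both operands of a position search; position() and iposition() are illustrative names, not Bugzilla functions.

```python
def position(fragment: str, text: str) -> int:
    """1-based position of fragment in text, 0 if absent (like SQL INSTR)."""
    return text.find(fragment) + 1


def iposition(fragment: str, text: str, istring=str.lower) -> int:
    """Case-insensitive position: fold BOTH operands with the same function.
    `istring` is a hypothetical stand-in for $dbh->sql_istring."""
    return position(istring(fragment), istring(text))


text = "Keywords: \u0130stanbul"

# Folding only one side misses the match ...
print(position("\u0130", text.lower()))  # 0
# ... while folding both sides with the same function finds it.
print(iposition("\u0130", text))         # 11
```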
Assignee
Comment 14•16 years ago
(In reply to comment #13)
> However, I am having a lot of trouble ensuring that the entered value ($$q) is
> in the correct encoding. encode('utf8',decode('utf8',$$q)) appears to print the
> correct value, but passing this to mysql does not match correctly.

Oh, don't mess with the encoding of anything; that shouldn't be necessary at all, if this is 3.1.x.
Assignee
Comment 15•16 years ago
Hey jjclark, any progress on this? This is one of our few code blockers for 3.2.
Comment 16•16 years ago
Is it as simple as that? I didn't test this patch.
Attachment #327334 -
Flags: review?(jjclark1982)
Assignee
Comment 17•16 years ago
Comment on attachment 327334 (patch, v1)

This won't work on MySQL. Our sql_position for MySQL was made case-sensitive:
    INSTR(CAST($text AS BINARY), CAST($fragment AS BINARY))
We could make a sql_iposition, though, which could handle it. It could default to calling istring on both its arguments, and MySQL could have its own version.
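The proposed sql_iposition can be pictured as a small class hierarchy: a generic version that wraps both arguments in sql_istring and delegates to sql_position, plus a MySQL override. This is a Python illustration of the shape of the idea, not Bugzilla's Perl code. The MySQL sql_position string is the one quoted in this comment; the generic POSITION(... IN ...) syntax, the LOWER() default for sql_istring and the plain INSTR() override are assumptions.

```python
class GenericDB:
    """Hypothetical base class that builds SQL fragments as strings."""

    def sql_istring(self, expr: str) -> str:
        # Assumed default: case-insensitive comparisons go through LOWER().
        return f"LOWER({expr})"

    def sql_position(self, fragment: str, text: str) -> str:
        # Assumed generic (PostgreSQL-style) position syntax.
        return f"POSITION({fragment} IN {text})"

    def sql_iposition(self, fragment: str, text: str) -> str:
        # Default: apply the SAME istring to both arguments, then delegate.
        return self.sql_position(self.sql_istring(fragment),
                                 self.sql_istring(text))


class MySQLDB(GenericDB):
    def sql_position(self, fragment: str, text: str) -> str:
        # Case-sensitive form, as quoted in the comment above.
        return f"INSTR(CAST({text} AS BINARY), CAST({fragment} AS BINARY))"

    def sql_iposition(self, fragment: str, text: str) -> str:
        # Assumed override: no BINARY cast, so the column's
        # case-insensitive collation decides the comparison.
        return f"INSTR({text}, {fragment})"


for db in (GenericDB(), MySQLDB()):
    print(db.sql_iposition("'\u0130'", "longdescs_.thetext") + " > 0")
```

Keeping the case-insensitive logic behind one overridable method means each database backend decides how to fold case, instead of Search.pm mixing Perl's lc with SQL LOWER().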
Attachment #327334 -
Flags: review?(jjclark1982) → review-
Assignee
Comment 18•16 years ago
I didn't realize there were so few LOWER/lc calls in Search.pm; I can probably fix this myself.
Assignee: jjclark1982 → mkanat
Assignee
Comment 19•16 years ago
I've tested this and it generates the right SQL. So at this point, if we don't work, it's a bug in the database, not in Bugzilla. :-)
Attachment #327334 -
Attachment is obsolete: true
Attachment #327344 -
Flags: review?(LpSolit)
Assignee
Comment 20•16 years ago
Comment on attachment 327344 (v2)

I want to write a more extensive patch for the tip that uses sql_iposition everywhere that we currently use LOWER() in sql_position.
Attachment #327344 -
Attachment description: v2 → v2 (3.2)
Assignee
Comment 21•16 years ago
Comment on attachment 327344 (v2)

Actually, I'll just do that in a separate bug.
Attachment #327344 -
Attachment description: v2 (3.2) → v2
Comment 22•16 years ago
Comment on attachment 327344 (v2)

Looks correct to me, so r=LpSolit. Someone who is familiar with Turkish characters will have to test it for us after checkin.
Attachment #327344 -
Flags: review?(LpSolit) → review+
Assignee
Updated•16 years ago
Flags: approval3.2+
Flags: approval+
Assignee
Comment 23•16 years ago
tip:
Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.115; previous revision: 1.114
done
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.160; previous revision: 1.159
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.62; previous revision: 1.61
done

3.2:
Checking in Bugzilla/DB.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB.pm,v  <--  DB.pm
new revision: 1.112.2.1; previous revision: 1.112
done
Checking in Bugzilla/Search.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Search.pm,v  <--  Search.pm
new revision: 1.159.2.1; previous revision: 1.159
done
Checking in Bugzilla/DB/Mysql.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/DB/Mysql.pm,v  <--  Mysql.pm
new revision: 1.60.2.1; previous revision: 1.60
done
Status: NEW → RESOLVED
Closed: 17 years ago → 16 years ago
Resolution: --- → FIXED
Comment 24•16 years ago
Will try to get the Pardus team involved.
Comment 25•16 years ago
http://bugs.pardus.org.tr/show_bug.cgi?id=7621 filed
Comment 26•16 years ago
Right now landfill returns 16 bugs:
http://landfill.bugzilla.org/bugzilla-tip/buglist.cgi?query_format=advanced&short_desc_type=allwordssubstr&short_desc=%C4%B0
The correct test case (http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=3296) is found, but all accented 'i' variants (í, Î, Ì) are returned as well.
Comment 27•16 years ago
Confirmed by Bugzilla-tr staff: http://bugs.pardus.org.tr/show_bug.cgi?id=7621#c7
QA passed; one can safely pronounce this CLOSED :-)