Closed
Bug 254740
Opened 20 years ago
Closed 19 years ago
Quick Search is case sensitive with international (cyrillic) characters
Categories
(MailNews Core :: Internationalization, defect)
MailNews Core
Internationalization
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.8beta1
People
(Reporter: asqueella, Assigned: jshin1987)
References
Details
(Keywords: intl)
Attachments
(1 file, 2 obsolete files)
8.43 KB,
patch
|
neil
:
review+
Bienvenu
:
superreview+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040806 Firefox/0.9.1+ (best browser EVAR1!!) Build Identifier: version 0.7+ (20040806) When I get a mail with its subject in Russian, first character is uppercase, say "Привет", I can't find it by entering the same word lowercase ("привет"). It works with English characters, so I assume this is a bug. Reproducible: Always Steps to Reproduce: 1. Get a mail with a subject string beginning with an uppercase cyrillic letter 2. Quick-filter it for the same string, only lowercase Actual Results: the message is not found Expected Results: the message is found I see this bug with both Thunderbird nightly and Mozilla 1.7rc2, so I believe it should be in MailNews.
Comment 1•20 years ago
|
||
From a quick look at nsMsgSearchTerm::MatchString, Contains (as used by QuickSearch) uses PL_strcasestr; Is uses EqualsIgnoreCase; BeginsWith uses PL_strncmp and EndsWith uses PL_strcmp - unless DO_I18N_YET is defined in which case they would use INTL_StrBeginWith and INTL_StrEndWith, had anyone bothered to implement them. So, Is is the only case that will work with international characters; Contains may only match ASCII upper/lower case while Begin and End are just wrong.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 2•20 years ago
|
||
With UniCharUtils, it'd be easy to 'implement' them. I'm taking.
Assignee: smontagu → jshin
OS: Windows XP → All
Hardware: PC → All
Assignee | ||
Comment 3•20 years ago
|
||
It works, but I left alone the old code (I made them fallbacks) because nsIMsgSearch* interfaces are rather poorly documented (perhaps, it was hastily converted from NS 3.x/4.x codebase. 'INTL_' prefix and 'csid' are remnants of that time, I guess.) so that I'm not quite sure what encodings are used.
Reporter | ||
Comment 4•20 years ago
|
||
is this bug related to bug 55455? I'm asking because bug 208262, which reports the same issue as here, was duped to bug 55455. There are more than 200 bugs in Internationalization component, so maybe this helps you. (Attachment 16309 [details] [diff] from bug 55455 also changes nsMsgSearchTerm.cpp)
Updated•20 years ago
|
Product: MailNews → Core
Assignee | ||
Comment 6•20 years ago
|
||
Neil, David, and Scott, what do you think of the patch? It's conservative in that I kept the old code as a fallback while taking care of non-ASCII cases because of the reason given in comment #3. I can be brave and get rid of the fallback.
Comment 7•20 years ago
|
||
I don't think you need the fallback. m_value.string will always be in UTF8, because it's assigned from ToNewUTF8String in nsMsgSearchValueImpl::SetStr.
Assignee | ||
Comment 8•20 years ago
|
||
Thanks, Neil, for figuring that out. I got rid of the fallback.
Attachment #155494 -
Attachment is obsolete: true
Attachment #173139 -
Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #173139 -
Flags: review?(smontagu)
Comment 9•20 years ago
|
||
Comment on attachment 173139 [details] [diff] [review] update without fallback Just looking at this quickly, I think some of the indentation is off, but it looks as if the IsEmpty check is wrong, as you need to check the raw value string. Also I think you should have David look at this too.
Assignee | ||
Comment 10•20 years ago
|
||
Thanks for catching my mistake in 'IsEmpty' case. The indentation was also fixed. Btw, while testing this, I found that IMAP folder search is broken (I thought the local folder search and IMAP folder search took the same code path, but this patch only affects local folder search in addition to quick search). I'll file a bug on IMAP folder search issue.
Attachment #173139 -
Attachment is obsolete: true
Attachment #173207 -
Flags: superreview?(bienvenu)
Attachment #173207 -
Flags: review?(neil.parkwaycc.co.uk)
Assignee | ||
Updated•20 years ago
|
Attachment #173139 -
Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #173139 -
Flags: review?(smontagu)
Comment 11•20 years ago
|
||
Comment on attachment 173207 [details] [diff] [review] update >+ nsAutoString needle; >+CaseInsensitiveFindInReadable( const nsAString& aPattern, const nsAString& aHay) These humorous variable names aren't international-safe :-P Nobody else has a FindInReadable(nsAString&, nsAString&)... Begin/EndReading actually return the iterator, so you could simplify it enough to justify inlining the code i.e. nsAString::const_iterator searchBegin, searchEnd; if (CaseInsensitiveFindInReadable(needle, utf16StrToMatch.BeginReading(searchBegin), utf16StrToMatch.BeginReading(searchEnd))) result = PR_TRUE; Or simply define CaseInsensitiveFindInReadable as inline (might be a good idea for the other version too).
Attachment #173207 -
Flags: review?(neil.parkwaycc.co.uk) → review+
Comment 12•19 years ago
|
||
Comment on attachment 173207 [details] [diff] [review] update is this going to slow down the non-unicode charset case, the 8 bit ascii case?
Attachment #173207 -
Flags: superreview?(bienvenu) → superreview+
Comment 13•19 years ago
|
||
(In reply to comment #12) >(From update of attachment 173207 [details] [diff] [review] [edit]) >is this going to slow down the non-unicode charset case, the 8 bit ascii case? You do mean 7 bit ascii, I hope? In the Begins With and Ends With cases it will definitely be slower because we're actually doing case insensitive searches which it doesn't look like we were doing before. We could probably speed it up by holding the query string as unicode instead of UTF - after all that's how we get it from JS.
Assignee | ||
Comment 14•19 years ago
|
||
(In reply to comment #13) > (In reply to comment #12) > >(From update of attachment 173207 [details] [diff] [review] [edit] [edit]) > >is this going to slow down the non-unicode charset case, the 8 bit ascii case? > You do mean 7 bit ascii, I hope? I think he meant ISO-8859-1 (of course, that's not 8 bit ascii..). Anyway, for ISO-8859-1 cases, it didn't work properly (it was as much broken for characters between U+00A0 and U+00FF as for any other cases) so that there's no comparision to make. For pure ASCII cases, the additional cost may come from the function call overhead because I believe that our case-folding routine (in intl/unicharutil) is optimized for ASCII characters. > In the Begins With and Ends With cases it will definitely be slower because > we're actually doing case insensitive searches which it doesn't look like we > were doing before. Hmm, should I just use the default string comparator (which doesn't do any case-folding and is faster)? Are beginWith and EndWith supposed to be case-insensitive? I think they're > We could probably speed it up by holding the query string as > unicode instead of UTF - after all that's how we get it from JS. Gee, you must have meant 'UTF-16 instead of UTF-8' (both are transformation forms of Unicode/ISO 10646 coded character set).
Comment 15•19 years ago
|
||
(In reply to comment #14) >(In reply to comment #13) >>We could probably speed it up by holding the query string as >>unicode instead of UTF - after all that's how we get it from JS. >Gee, you must have meant 'UTF-16 instead of UTF-8' (both are transformation >forms of Unicode/ISO 10646 coded character set). Actually I meant "PRUnichar* instead of char*" but your way works too :-)
Comment 16•19 years ago
|
||
Does this patch only apply to QuickSearch, or does it also fix Search Messages and Filters? bug 194514 -- Search is case sensitive in Russian and insensitive in English bug 202558 -- "Begins With" and "Ends With" is case sensitive
Assignee | ||
Comment 17•19 years ago
|
||
fix landed (about a day ago) with Neil's comment about inlining addressed. thanks for r/sr Mike, this fixes those problems as well as long as folders are local. For remote (IMAP) folders, this doesn't fix them. (see comment #10)
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 18•19 years ago
|
||
Thanks, Jungshik. btw, you left a comment dangling up there: (In reply to comment #14) > (In reply to comment #13) > [...] > > In the Begins With and Ends With cases it will definitely be slower because > > we're actually doing case insensitive searches which it doesn't look like we > > were doing before. > > Hmm, should I just use the default string comparator (which doesn't do any > case-folding and is faster)? Are beginWith and EndWith supposed to be > case-insensitive? I think they're You're not considering going back to change those comparisons to case-sensitive, are you? I think all these comparisons should be case-insensitive at base; they need to be consistent with one another. If bug 167505 is ever implemented, then they would be c-s only in response to a setting in the UI.
Assignee | ||
Comment 19•19 years ago
|
||
You can rest assured :-) I kept the case-insensitivity
Assignee | ||
Comment 20•19 years ago
|
||
For IMAP folders, I filed bug 284856
Assignee | ||
Comment 21•19 years ago
|
||
*** Bug 256265 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 22•19 years ago
|
||
Verified using 20050311 thunderbird build. Thanks!
Status: RESOLVED → VERIFIED
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•