Closed Bug 254740 Opened 20 years ago Closed 19 years ago

Quick Search is case sensitive with international (cyrillic) characters

Categories

(MailNews Core :: Internationalization, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.8beta1

People

(Reporter: asqueella, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(1 file, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040806 Firefox/0.9.1+ (best browser EVAR1!!)
Build Identifier: version 0.7+ (20040806)

When I get a mail with its subject in Russian, first character is uppercase, say
"Привет", I can't find it by entering the same word lowercase ("привет"). It
works with English characters, so I assume this is a bug.

Reproducible: Always
Steps to Reproduce:
1. Get a mail with a subject string beginning with an uppercase cyrillic letter
2. Quick-filter it for the same string, only lowercase


Actual Results:  
the message is not found

Expected Results:  
the message is found

I see this bug with both Thunderbird nightly and Mozilla 1.7rc2, so I believe it
should be in MailNews.
From a quick look at nsMsgSearchTerm::MatchString, Contains (as used by
QuickSearch) uses PL_strcasestr; Is uses EqualsIgnoreCase; BeginsWith uses
PL_strncmp and EndsWith uses PL_strcmp - unless DO_I18N_YET is defined in which
case they would use INTL_StrBeginWith and INTL_StrEndWith, had anyone bothered
to implement them.

So, Is is the only case that will work with international characters; Contains
may only match ASCII upper/lower case while Begin and End are just wrong.
Status: UNCONFIRMED → NEW
Ever confirmed: true
With UniCharUtils, it'd be easy to 'implement' them. I'm taking.
Assignee: smontagu → jshin
OS: Windows XP → All
Hardware: PC → All
Attached patch patch (1st attempt) (obsolete) — Splinter Review
It works, but I left alone the old code (I made them fallbacks) because
nsIMsgSearch* interfaces are rather poorly documented (perhaps, it was hastily
converted from NS 3.x/4.x codebase. 'INTL_' prefix and 'csid' are remnants of
that time, I guess.) so that I'm not quite sure what encodings are used.
is this bug related to bug 55455? I'm asking because bug 208262, which reports
the same issue as here, was duped to bug 55455. There are more than 200 bugs in
Internationalization component, so maybe this helps you. (Attachment 16309 [details] [diff] from
bug 55455 also changes nsMsgSearchTerm.cpp)
*** Bug 55455 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
Neil, David, and Scott, what do you think of the patch? It's conservative in
that I kept the old code as a fallback while taking care of non-ASCII cases
because of the reason given in comment #3. I can be brave and get rid of the
fallback. 
Status: NEW → ASSIGNED
Keywords: intl
Target Milestone: --- → mozilla1.8beta
I don't think you need the fallback. m_value.string will always be in UTF8,
because it's assigned from ToNewUTF8String in nsMsgSearchValueImpl::SetStr.
Attached patch update without fallback (obsolete) — Splinter Review
Thanks, Neil, for figuring that out. I got rid of the fallback.
Attachment #155494 - Attachment is obsolete: true
Attachment #173139 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #173139 - Flags: review?(smontagu)
Comment on attachment 173139 [details] [diff] [review]
update without fallback

Just looking at this quickly, I think some of the indentation is off, but it
looks as if the IsEmpty check is wrong, as you need to check the raw value
string. Also I think you should have David look at this too.
Attached patch updateSplinter Review
Thanks for catching my mistake in 'IsEmpty' case. The indentation was also
fixed.

Btw, while testing this, I found that IMAP folder search is broken (I thought
the local folder search and IMAP folder search took the same code path, but
this patch only affects local folder search in addition to quick search). I'll
file a bug on IMAP folder search issue.
Attachment #173139 - Attachment is obsolete: true
Attachment #173207 - Flags: superreview?(bienvenu)
Attachment #173207 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #173139 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #173139 - Flags: review?(smontagu)
Blocks: 194514
Blocks: 256265
Comment on attachment 173207 [details] [diff] [review]
 update

>+  nsAutoString needle; 
>+CaseInsensitiveFindInReadable( const nsAString& aPattern, const nsAString& aHay)
These humorous variable names aren't international-safe :-P

Nobody else has a FindInReadable(nsAString&, nsAString&)... Begin/EndReading
actually return the iterator, so you could simplify it enough to justify
inlining the code i.e.

nsAString::const_iterator searchBegin, searchEnd;
if (CaseInsensitiveFindInReadable(needle,
    utf16StrToMatch.BeginReading(searchBegin),
    utf16StrToMatch.BeginReading(searchEnd)))
  result = PR_TRUE;

Or simply define CaseInsensitiveFindInReadable as inline (might be a good idea
for the other version too).
Attachment #173207 - Flags: review?(neil.parkwaycc.co.uk) → review+
Comment on attachment 173207 [details] [diff] [review]
 update

is this going to slow down the non-unicode charset case, the 8 bit ascii case?
Attachment #173207 - Flags: superreview?(bienvenu) → superreview+
(In reply to comment #12)
>(From update of attachment 173207 [details] [diff] [review] [edit])
>is this going to slow down the non-unicode charset case, the 8 bit ascii case?
You do mean 7 bit ascii, I hope?
In the Begins With and Ends With cases it will definitely be slower because
we're actually doing case insensitive searches which it doesn't look like we
were doing before. We could probably speed it up by holding the query string as
unicode instead of UTF - after all that's how we get it from JS.
(In reply to comment #13)
> (In reply to comment #12)
> >(From update of attachment 173207 [details] [diff] [review] [edit] [edit])
> >is this going to slow down the non-unicode charset case, the 8 bit ascii case?
> You do mean 7 bit ascii, I hope?

I think he meant ISO-8859-1 (of course, that's not 8 bit ascii..). Anyway, for
ISO-8859-1 cases, it didn't work properly (it was as much broken for characters
between U+00A0 and U+00FF  as for any other cases) so that there's no
comparision to make. For pure ASCII cases, the additional cost may come from the
function call overhead because  I believe that our case-folding routine (in
intl/unicharutil) is optimized for ASCII characters. 

> In the Begins With and Ends With cases it will definitely be slower because
> we're actually doing case insensitive searches which it doesn't look like we
> were doing before. 

  Hmm, should I just use the default string comparator (which doesn't do any
case-folding and is faster)?  Are beginWith and EndWith supposed to be
case-insensitive? I think they're 

> We could probably speed it up by holding the query string as
> unicode instead of UTF - after all that's how we get it from JS.

  Gee, you must have meant 'UTF-16 instead of UTF-8' (both are transformation
forms of Unicode/ISO 10646 coded character set).  
(In reply to comment #14)
>(In reply to comment #13)
>>We could probably speed it up by holding the query string as
>>unicode instead of UTF - after all that's how we get it from JS.
>Gee, you must have meant 'UTF-16 instead of UTF-8' (both are transformation
>forms of Unicode/ISO 10646 coded character set).  
Actually I meant "PRUnichar* instead of char*" but your way works too :-)
Does this patch only apply to QuickSearch, or does it also fix Search Messages 
and Filters?

 bug 194514 -- Search is case sensitive in Russian and insensitive in English
 bug 202558 -- "Begins With" and "Ends With" is case sensitive
fix landed (about a day ago) with Neil's comment about inlining addressed.  
thanks for r/sr

Mike, this fixes those problems as well as long as folders are local. For remote
(IMAP) folders, this doesn't fix them.
(see comment #10)

Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Thanks, Jungshik.

btw, you left a comment dangling up there:

(In reply to comment #14)
> (In reply to comment #13)
> [...]
> > In the Begins With and Ends With cases it will definitely be slower because
> > we're actually doing case insensitive searches which it doesn't look like we
> > were doing before. 
> 
>   Hmm, should I just use the default string comparator (which doesn't do any
> case-folding and is faster)?  Are beginWith and EndWith supposed to be
> case-insensitive? I think they're 


You're not considering going back to change those comparisons to case-sensitive, 
are you?  I think all these comparisons should be case-insensitive at base; they 
need to be consistent with one another.  If bug 167505 is ever implemented, then 
they would be c-s only in response to a setting in the UI.
You can rest assured :-) I kept the case-insensitivity 
For IMAP folders, I filed bug 284856
No longer blocks: 256265
*** Bug 256265 has been marked as a duplicate of this bug. ***
Verified using 20050311 thunderbird build. Thanks!
Status: RESOLVED → VERIFIED
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: