Closed Bug 54195 Opened 24 years ago Closed 24 years ago

Macintosh: Local mail search is not working properly for non-ascii headers

Categories

(MailNews Core :: Internationalization, defect, P2)

PowerPC
All

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: marina, Assigned: wtc)

Details

(Whiteboard: [rtm++])

Attachments

(1 file)

Searching headers with non-ascii key will bring you back all messages that have 
non-ascii chars in their headerignoring specific search. For ex.: you entered 
"é" as your search criteria for headers, the result will be everything including  
"á" and JA chars
is this a regression? adding nsbeta3. Naoki, if this cannot be in nsbeta3 for
the timing, please add the keyword rtm
Is this Macintosh only? Otherwise, we should reopen the bug for local mail 
search.
in the same build on Windows the Search is not working at all (xml error) 
Could you specify the build ID you are testing?
the latest build i looked at on Win is 2000-09-22-08, Mac is 2000-09-25-12
I tried the branch win32 build id 2000092608 but I did not see the XML error.
Please try that version.
I used Mac is 2000-09-25-12 and I was able to search "Santé" from my local 
folder. Marina, what is your pref setting (view default charset)?
The default is set to Western-8859, and if you enter "Santé" it'll find it, but 
if you want to match only one critetia like "é" it'll bring you back everything
I can reproduce it on Macintosh but not on Windows build (used today's branch 
build).
Status: NEW → ASSIGNED
Summary: Local mail search is not working properly for non-ascii headers → Macintosh: Local mail search is not working properly for non-ascii headers
Cannot make beta3, adding rtm.
Keywords: rtm
The search code eventually calls PL_strncasecmp() and it cannot handle 8 bit 
characters correctly. The mapping table only contains 128 items while it is 
indexed by unsigned char. So characters above 127 reference memory out of 
boundary which is very scary. For 8 bit character matching, it depends on the 
state of the memory after the table. I assume it's zero for Macintosh so any 8 
bit characters would match.

 20 
 21 static const unsigned char uc[] =
 22 {
 23     '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007',
 24     '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017',
 25     '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027',
 26     '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037',
 27     ' ',    '!',    '"',    '#',    '$',    '%',    '&',    '\'',
 28     '(',    ')',    '*',    '+',    ',',    '-',    '.',    '/',
 29     '',    '1',    '2',    '3',    '4',    '5',    '6',    '7',
 30     '8',    '9',    ':',    ';',    '<',    '=',    '>',    '?',
 31     '@',    'A',    'B',    'C',    'D',    'E',    'F',    'G',
 32     'H',    'I',    'J',    'K',    'L',    'M',    'N',    'O',
 33     'P',    'Q',    'R',    'S',    'T',    'U',    'V',    'W',
 34     'X',    'Y',    'Z',    '[',    '\\',   ']',    '^',    '_',
 35     '`',    'A',    'B',    'C',    'D',    'E',    'F',    'G',
 36     'H',    'I',    'J',    'K',    'L',    'M',    'N',    'O',
 37     'P',    'Q',    'R',    'S',    'T',    'U',    'V',    'W',
 38     'X',    'Y',    'Z',    '{',    '|',    '}',    '~',    '\177'
 39 };
 40 

 59 
 60 PR_IMPLEMENT(PRIntn)
 61 PL_strncasecmp(const char *a, const char *b, PRUint32 max)
 62 {
 63     const unsigned char *ua = (const unsigned char *)a;
 64     const unsigned char *ub = (const unsigned char *)b;
 65 
 66     if( ((const char *)0 == a) || (const char *)0 == b ) 
 67         return (PRIntn)(a-b);
 68 
 69     while( max && (uc[*ua] == uc[*ub]) && ('\0' != *a) )
 70     {
 71         a++;
 72         ua++;
 73         ub++;
 74         max--;
 75     }
 76 
 77     if( 0 == max ) return (PRIntn)0;
 78 
 79     return (PRIntn)(uc[*ua] - uc[*ub]);
 80 }
 81 

Adding the default owner of nspr to cc.

I will try to fix this by changing the mail search code not to use case 
comparison if the strings contains 8 bit characters. Eventually, we want to do 
the comparison using UCS2 instead of UTF-8. Since UTF-8 is multibyte, it is not 
efficient for string matching.
Here is a patch for nspr to extend the table to 256. This does not support case 

less comparison for 8 bit characters but at least prevents the incorrect memory 

reference and supports case insensitive comparison.

Wtc, is it possible to use this patch?



Index: strccmp.c

===================================================================

RCS file: /cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c,v

retrieving revision 3.2

diff -c -2 -r3.2 strccmp.c

*** strccmp.c	1999/04/21 21:37:54	3.2

--- strccmp.c	2000/10/03 22:30:09

***************

*** 36,40 ****

      'H',    'I',    'J',    'K',    'L',    'M',    'N',    'O',

      'P',    'Q',    'R',    'S',    'T',    'U',    'V',    'W',

!     'X',    'Y',    'Z',    '{',    '|',    '}',    '~',    '\177'

  };

  

--- 36,56 ----

      'H',    'I',    'J',    'K',    'L',    'M',    'N',    'O',

      'P',    'Q',    'R',    'S',    'T',    'U',    'V',    'W',

!     'X',    'Y',    'Z',    '{',    '|',    '}',    '~',    '\177',

!     '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207',

!     '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217',

!     '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227',

!     '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237',

!     '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247',

!     '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257',

!     '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267',

!     '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277',

!     '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307',

!     '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317',

!     '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327',

!     '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337',

!     '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347',

!     '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357',

!     '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367',

!     '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377'

  };

  

You said:
> Here is a patch for nspr to extend the table to 256.
> This does not support case less comparison for 8 bit
> characters but at least prevents the incorrect memory 
> reference and supports case insensitive comparison.

I am confused.  Does this patch support case insensitive
comparison or not?
> reference and supports case insensitive comparison.
Correction: I meant to say case "sensitive" comparison, case insensitive is not
supported for 8 bit characters. I will attach the patch as an attachment.
Naoki - Per our conversation, please get a review and super-review for this bug, 
then FTang or I will present it to PDT for RTM+.
The patch is being reviewed by wtc. He also proposed a different change using
tolower(). But that may cause problems for non 7 bit ASCII (e.g. false match,
case sensitive match failure).
Change to P2.
With the current implementation of PL_strncasecmp(), non 7 bit characters 
comparison always references out of table boundary for any platform.
Although Macintosh got the apparent problem (8 bit character matches with any 
character), it could cause any kind of incorrect comparison resutls.
And it is also possible to cause uninitialized memory reference too.
Also the problem is not specific to local mail search but applies to any 
functionality which uses PL_strncasecmp() directly/indirectly.
Severity: normal → major
Priority: P3 → P2
Whiteboard: waiting for the module owner's approval
I think we really need to fix this. Otherwise, we will also hit ABR also in the
purify. put [rtm+ need info] since nhotta still waiting for some answer.
Whiteboard: waiting for the module owner's approval → waiting for the module owner's approval[rtm+ need info]
I reviewed the patch (id=16066). It is fine.
-- Wan-Teh Chang, NSPR module owner
Whiteboard: waiting for the module owner's approval[rtm+ need info] → [rtm+ need info]
a=brendan@mozilla.org. Get this into the trunk today.  Thanks,

/be
It looks like nspr is not branched for Netscape_20000922_BRANCH. It has a branch
tag NSPRPUB_CLIENT_BRANCH. This means if I check in to the trunk also affect
Netscape branch?
So far, I have not got a super review. Brendan, can I regard you as a super
reviewer?
Cc'ing leaf to answer nhotta's branch question.

I certainly hope my a= counts as super-review!

/be
Marked as super-reviwed.

Ftang - Please change status to [RTM+], so that we can get PDT's approval 
[RTM++] to check this one in.
Whiteboard: [rtm+ need info] → [rtm need info] super-reviewed
rtm+ per ftang
Whiteboard: [rtm need info] super-reviewed → [rtm+] super-reviewed
rtm+ per ftang
Whiteboard: [rtm+] super-reviewed → rtm+
The branch tag problem was my local tree problem. I updated my local branch tree
with the right tag Netscape_20000922_BRANCH.
Whiteboard: rtm+ → [rtm+]
PDT marking [rtm++]
Whiteboard: [rtm+] → [rtm++]
I checked in the patch to the branch.
But I could not check in to the trunk because I do not have a privilege.
Reassign to wtc, please check in the patch to NSPRPUB_CLIENT_BRANCH, thanks.

Assignee: nhotta → wtc
Status: ASSIGNED → NEW
I checked in the fix on the NSPRPUB_CLIENT_BRANCH of NSPR.
/cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c, revision 3.2.72.1
(Note that the SeaMonkey main trunk pulls NSPRPUB_CLIENT_BRANCH.)

I also checked in the fix on the main trunk of NSPR.
/cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c, revision 3.5
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
The fix is also checked in on the NSPRPUB_RELEASE_4_0_BRANCH.
For the issue of using unicode to support case insensitive matching for 8 bit
characters, I filed a bug 55455.
Naoki, we need sdditional info to verifyu this bug fix.
Quoting from one of your last comments:

> With the current implementation of PL_strncasecmp(), non 7 bit characters
> comparison always references out of table boundary for any platform.

I take it that on some platforms like Windows, they happen to be lucky
most of the time and do not lead to comparison failure but we cannot be sure.
Is this right?

> Although Macintosh got the apparent problem (8 bit character matches with any
> character), it could cause any kind of incorrect comparison resutls.

Therefore, we need to look at a variety of non-ASCII matching under all platforms
for verification. It is possible that before the fix, we simply did not encounter
the kind of failure which could be caused by the problem in the nspr code.
It would be nice to know if potential problems are likely to occur if
the data you're searching through are large, or some other condition.

> And it is also possible to cause uninitialized memory reference too.

This seems like a random problem and cannot be easily re-created as a test case.
I guess we will have to substitute a variety of testing and reply on the results
of such data.

> Also the problem is not specific to local mail search but applies to any
> functionality which uses PL_strncasecmp() directly/indirectly.

Can you list major areas other than local mail where PL_strncasecmp()
is used?
OS: All
For Windows, I did not see this problem. But before the fix, it might had
happened if the search term is small and number of comparison is large.

Here is lxr result of PL_strncasecmp usage in mozilla.
http://lxr.mozilla.org/seamonkey/search?string=PL_strncasecmp
It also called indirectly (e.g. by PL_strcaserstr).
So it is not easy to list all functionality.

To verify the bug. We should try Macintosh local search and Window and Linux.
Then make sure no false match and no search failure.
Make sure to set default mail view charset. Local search is depending on that
charset.


Thank you, Naoki, for the suggestions.
It looks like it would be best if we could divide work
among us. The main part of the etsting would have be on
Mac where the original problem was discovered.

1. Use a short word (e.g. 1 character word) on a large body of data for the
   upperlimit case. But also conduct other non-ASCII search locally. Do a variety
    of cases, short to long word search on small to large amount of mail data.
2. Conduct similar testing on Linux and Windows.

marina -- please take the Mac part and report the results here.
momoi (Windows) and ji (Linux) will help with the other platforms
and report back here. Thanks!

Checked linux 2000103009-mn6 build. This problem doesn't occur on linux.
verified this fix with  Mac MN6 and trunk build 2000-10-31, the problem
discribed in this bug report doesn't exist, tested in a small folder (10
messages)and a bigger one (2000 messages) doing search with an exact string,
part of the string and one criteria like "é". In all cases got the right result
Marina,
If this has been verified please mark it so else it shows up on queries for
people looking to verify bugs.  Thanks.
cc: marina on previous comments.
verifying as such in branch build using 2000-11-03 Mac branch
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: