Closed
Bug 54195
Opened 24 years ago
Closed 24 years ago
Macintosh: Local mail search is not working properly for non-ascii headers
Categories
(MailNews Core :: Internationalization, defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: marina, Assigned: wtc)
Details
(Whiteboard: [rtm++])
Attachments
(1 file)
Searching headers with non-ascii key will bring you back all messages that have non-ascii chars in their headerignoring specific search. For ex.: you entered "é" as your search criteria for headers, the result will be everything including "á" and JA chars
is this a regression? adding nsbeta3. Naoki, if this cannot be in nsbeta3 for the timing, please add the keyword rtm
Comment 2•24 years ago
|
||
Is this Macintosh only? Otherwise, we should reopen the bug for local mail search.
in the same build on Windows the Search is not working at all (xml error)
Comment 4•24 years ago
|
||
Could you specify the build ID you are testing?
the latest build i looked at on Win is 2000-09-22-08, Mac is 2000-09-25-12
Comment 6•24 years ago
|
||
I tried the branch win32 build id 2000092608 but I did not see the XML error. Please try that version.
Comment 7•24 years ago
|
||
I used Mac is 2000-09-25-12 and I was able to search "Santé" from my local folder. Marina, what is your pref setting (view default charset)?
The default is set to Western-8859, and if you enter "Santé" it'll find it, but if you want to match only one critetia like "é" it'll bring you back everything
Comment 9•24 years ago
|
||
I can reproduce it on Macintosh but not on Windows build (used today's branch build).
Status: NEW → ASSIGNED
Summary: Local mail search is not working properly for non-ascii headers → Macintosh: Local mail search is not working properly for non-ascii headers
Comment 11•24 years ago
|
||
The search code eventually calls PL_strncasecmp() and it cannot handle 8 bit characters correctly. The mapping table only contains 128 items while it is indexed by unsigned char. So characters above 127 reference memory out of boundary which is very scary. For 8 bit character matching, it depends on the state of the memory after the table. I assume it's zero for Macintosh so any 8 bit characters would match. 20 21 static const unsigned char uc[] = 22 { 23 '\000', '\001', '\002', '\003', '\004', '\005', '\006', '\007', 24 '\010', '\011', '\012', '\013', '\014', '\015', '\016', '\017', 25 '\020', '\021', '\022', '\023', '\024', '\025', '\026', '\027', 26 '\030', '\031', '\032', '\033', '\034', '\035', '\036', '\037', 27 ' ', '!', '"', '#', '$', '%', '&', '\'', 28 '(', ')', '*', '+', ',', '-', '.', '/', 29 '', '1', '2', '3', '4', '5', '6', '7', 30 '8', '9', ':', ';', '<', '=', '>', '?', 31 '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 32 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 33 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 34 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', 35 '`', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 36 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 37 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 38 'X', 'Y', 'Z', '{', '|', '}', '~', '\177' 39 }; 40 59 60 PR_IMPLEMENT(PRIntn) 61 PL_strncasecmp(const char *a, const char *b, PRUint32 max) 62 { 63 const unsigned char *ua = (const unsigned char *)a; 64 const unsigned char *ub = (const unsigned char *)b; 65 66 if( ((const char *)0 == a) || (const char *)0 == b ) 67 return (PRIntn)(a-b); 68 69 while( max && (uc[*ua] == uc[*ub]) && ('\0' != *a) ) 70 { 71 a++; 72 ua++; 73 ub++; 74 max--; 75 } 76 77 if( 0 == max ) return (PRIntn)0; 78 79 return (PRIntn)(uc[*ua] - uc[*ub]); 80 } 81
Comment 12•24 years ago
|
||
Adding the default owner of nspr to cc. I will try to fix this by changing the mail search code not to use case comparison if the strings contains 8 bit characters. Eventually, we want to do the comparison using UCS2 instead of UTF-8. Since UTF-8 is multibyte, it is not efficient for string matching.
Comment 13•24 years ago
|
||
Here is a patch for nspr to extend the table to 256. This does not support case less comparison for 8 bit characters but at least prevents the incorrect memory reference and supports case insensitive comparison. Wtc, is it possible to use this patch? Index: strccmp.c =================================================================== RCS file: /cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c,v retrieving revision 3.2 diff -c -2 -r3.2 strccmp.c *** strccmp.c 1999/04/21 21:37:54 3.2 --- strccmp.c 2000/10/03 22:30:09 *************** *** 36,40 **** 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', ! 'X', 'Y', 'Z', '{', '|', '}', '~', '\177' }; --- 36,56 ---- 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', ! 'X', 'Y', 'Z', '{', '|', '}', '~', '\177', ! '\200', '\201', '\202', '\203', '\204', '\205', '\206', '\207', ! '\210', '\211', '\212', '\213', '\214', '\215', '\216', '\217', ! '\220', '\221', '\222', '\223', '\224', '\225', '\226', '\227', ! '\230', '\231', '\232', '\233', '\234', '\235', '\236', '\237', ! '\240', '\241', '\242', '\243', '\244', '\245', '\246', '\247', ! '\250', '\251', '\252', '\253', '\254', '\255', '\256', '\257', ! '\260', '\261', '\262', '\263', '\264', '\265', '\266', '\267', ! '\270', '\271', '\272', '\273', '\274', '\275', '\276', '\277', ! '\300', '\301', '\302', '\303', '\304', '\305', '\306', '\307', ! '\310', '\311', '\312', '\313', '\314', '\315', '\316', '\317', ! '\320', '\321', '\322', '\323', '\324', '\325', '\326', '\327', ! '\330', '\331', '\332', '\333', '\334', '\335', '\336', '\337', ! '\340', '\341', '\342', '\343', '\344', '\345', '\346', '\347', ! '\350', '\351', '\352', '\353', '\354', '\355', '\356', '\357', ! '\360', '\361', '\362', '\363', '\364', '\365', '\366', '\367', ! '\370', '\371', '\372', '\373', '\374', '\375', '\376', '\377' };
Assignee | ||
Comment 14•24 years ago
|
||
You said:
> Here is a patch for nspr to extend the table to 256.
> This does not support case less comparison for 8 bit
> characters but at least prevents the incorrect memory
> reference and supports case insensitive comparison.
I am confused. Does this patch support case insensitive
comparison or not?
Comment 15•24 years ago
|
||
> reference and supports case insensitive comparison.
Correction: I meant to say case "sensitive" comparison, case insensitive is not
supported for 8 bit characters. I will attach the patch as an attachment.
Comment 16•24 years ago
|
||
Comment 17•24 years ago
|
||
Naoki - Per our conversation, please get a review and super-review for this bug, then FTang or I will present it to PDT for RTM+.
Comment 18•24 years ago
|
||
The patch is being reviewed by wtc. He also proposed a different change using tolower(). But that may cause problems for non 7 bit ASCII (e.g. false match, case sensitive match failure).
Comment 19•24 years ago
|
||
Change to P2. With the current implementation of PL_strncasecmp(), non 7 bit characters comparison always references out of table boundary for any platform. Although Macintosh got the apparent problem (8 bit character matches with any character), it could cause any kind of incorrect comparison resutls. And it is also possible to cause uninitialized memory reference too. Also the problem is not specific to local mail search but applies to any functionality which uses PL_strncasecmp() directly/indirectly.
Severity: normal → major
Priority: P3 → P2
Whiteboard: waiting for the module owner's approval
Comment 20•24 years ago
|
||
I think we really need to fix this. Otherwise, we will also hit ABR also in the purify. put [rtm+ need info] since nhotta still waiting for some answer.
Whiteboard: waiting for the module owner's approval → waiting for the module owner's approval[rtm+ need info]
Assignee | ||
Comment 21•24 years ago
|
||
I reviewed the patch (id=16066). It is fine. -- Wan-Teh Chang, NSPR module owner
Updated•24 years ago
|
Whiteboard: waiting for the module owner's approval[rtm+ need info] → [rtm+ need info]
Comment 22•24 years ago
|
||
a=brendan@mozilla.org. Get this into the trunk today. Thanks, /be
Comment 23•24 years ago
|
||
It looks like nspr is not branched for Netscape_20000922_BRANCH. It has a branch tag NSPRPUB_CLIENT_BRANCH. This means if I check in to the trunk also affect Netscape branch? So far, I have not got a super review. Brendan, can I regard you as a super reviewer?
Comment 24•24 years ago
|
||
Cc'ing leaf to answer nhotta's branch question. I certainly hope my a= counts as super-review! /be
Comment 25•24 years ago
|
||
Marked as super-reviwed. Ftang - Please change status to [RTM+], so that we can get PDT's approval [RTM++] to check this one in.
Whiteboard: [rtm+ need info] → [rtm need info] super-reviewed
Comment 26•24 years ago
|
||
rtm+ per ftang
Whiteboard: [rtm need info] super-reviewed → [rtm+] super-reviewed
Comment 28•24 years ago
|
||
The branch tag problem was my local tree problem. I updated my local branch tree with the right tag Netscape_20000922_BRANCH.
Updated•24 years ago
|
Whiteboard: rtm+ → [rtm+]
Comment 30•24 years ago
|
||
I checked in the patch to the branch. But I could not check in to the trunk because I do not have a privilege. Reassign to wtc, please check in the patch to NSPRPUB_CLIENT_BRANCH, thanks.
Assignee: nhotta → wtc
Status: ASSIGNED → NEW
Assignee | ||
Comment 31•24 years ago
|
||
I checked in the fix on the NSPRPUB_CLIENT_BRANCH of NSPR. /cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c, revision 3.2.72.1 (Note that the SeaMonkey main trunk pulls NSPRPUB_CLIENT_BRANCH.) I also checked in the fix on the main trunk of NSPR. /cvsroot/mozilla/nsprpub/lib/libc/src/strccmp.c, revision 3.5
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 32•24 years ago
|
||
The fix is also checked in on the NSPRPUB_RELEASE_4_0_BRANCH.
Comment 33•24 years ago
|
||
For the issue of using unicode to support case insensitive matching for 8 bit characters, I filed a bug 55455.
Comment 34•24 years ago
|
||
Naoki, we need sdditional info to verifyu this bug fix. Quoting from one of your last comments: > With the current implementation of PL_strncasecmp(), non 7 bit characters > comparison always references out of table boundary for any platform. I take it that on some platforms like Windows, they happen to be lucky most of the time and do not lead to comparison failure but we cannot be sure. Is this right? > Although Macintosh got the apparent problem (8 bit character matches with any > character), it could cause any kind of incorrect comparison resutls. Therefore, we need to look at a variety of non-ASCII matching under all platforms for verification. It is possible that before the fix, we simply did not encounter the kind of failure which could be caused by the problem in the nspr code. It would be nice to know if potential problems are likely to occur if the data you're searching through are large, or some other condition. > And it is also possible to cause uninitialized memory reference too. This seems like a random problem and cannot be easily re-created as a test case. I guess we will have to substitute a variety of testing and reply on the results of such data. > Also the problem is not specific to local mail search but applies to any > functionality which uses PL_strncasecmp() directly/indirectly. Can you list major areas other than local mail where PL_strncasecmp() is used?
OS: All
Comment 35•24 years ago
|
||
For Windows, I did not see this problem. But before the fix, it might had happened if the search term is small and number of comparison is large. Here is lxr result of PL_strncasecmp usage in mozilla. http://lxr.mozilla.org/seamonkey/search?string=PL_strncasecmp It also called indirectly (e.g. by PL_strcaserstr). So it is not easy to list all functionality. To verify the bug. We should try Macintosh local search and Window and Linux. Then make sure no false match and no search failure. Make sure to set default mail view charset. Local search is depending on that charset.
Comment 36•24 years ago
|
||
Thank you, Naoki, for the suggestions. It looks like it would be best if we could divide work among us. The main part of the etsting would have be on Mac where the original problem was discovered. 1. Use a short word (e.g. 1 character word) on a large body of data for the upperlimit case. But also conduct other non-ASCII search locally. Do a variety of cases, short to long word search on small to large amount of mail data. 2. Conduct similar testing on Linux and Windows. marina -- please take the Mac part and report the results here. momoi (Windows) and ji (Linux) will help with the other platforms and report back here. Thanks!
Comment 37•24 years ago
|
||
Checked linux 2000103009-mn6 build. This problem doesn't occur on linux.
Reporter | ||
Comment 38•24 years ago
|
||
verified this fix with Mac MN6 and trunk build 2000-10-31, the problem discribed in this bug report doesn't exist, tested in a small folder (10 messages)and a bigger one (2000 messages) doing search with an exact string, part of the string and one criteria like "é". In all cases got the right result
Comment 39•24 years ago
|
||
Marina, If this has been verified please mark it so else it shows up on queries for people looking to verify bugs. Thanks.
Comment 40•24 years ago
|
||
cc: marina on previous comments.
Reporter | ||
Comment 41•24 years ago
|
||
verifying as such in branch build using 2000-11-03 Mac branch
Status: RESOLVED → VERIFIED
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•