Closed Bug 45222 Opened 25 years ago Closed 25 years ago

IMAP search for Japanese fails when search key contains 0x5C or 0x22

Categories

(MailNews Core :: Internationalization, defect, P2)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: mozilla)

Details

(Whiteboard: [nsbeta3+][pdtp2])

** Observed with 7/10/2000 Win32 build ** This is a left over bug from Bug 5933 where basic IMAP search for non-ASCII key data has been implemented. Unfortunately this still leaves a problem with those JPN characters whose JIS 2nd-byte corresponds to 0x5C (\) or 0x22 (*) which have special significance in IMAP protocols. See a SCOPUS Communicator bug 343598 for details, particularly my comments: Additional Comments From momoi 05/03/1999 02:51 There I summarize what needs to be looked to delclare that the problem has been fixed. Here are some characters which will fail: "niti" (Day) in Kanji, "a" as in Hiragana "a". See other comments in Bug 5933.
We should use literal string over quoted string for IMAP search to avoid this sort of problem.
nsbeta3, reassign to taka.
Assignee: nhotta → taka
Keywords: nsbeta3
accepted
Status: NEW → ASSIGNED
nsbeta3+ per i18n bug review meeting.
Whiteboard: [nsbeta3+]
What happen to English message which have \ and * in it ? For example, if I have a English message, the subject is "Hello\My Test" and I search "o\M", can I find it ? or if I have a message which subject is "Hello*My Test" and I search for "o*M" , can I find it ? If the answer is NO, then we have a generic escaping issue that MailNews folks didn't do.
I did not paste in all the comments from the internal bug database but in that bug, I stated that these special characters are missed even in ASCII search. 0x5C (\) -- backslash 0x22 (") -- double-quote (there was an error in my original bug report above. 0x22 is not (*). ) These 2 are the "quoted specials" in RFC 2060. So, yes, the complete fix will have to deal with dealing with these in ASCII, or not using quoted strings in IMAP search.
Strings such as "hello\kitty" or "common"key"" cannot be searched in ASCII currently. I take it that taka will take care of all these problems.
My plan is to switch to string literal so that we don't have to worry about escaping all those characters.
The quoted-string code should search for characters not permitted in quoted strings, switching to a literal when it finds them. It should avoid literals in other cases as synchronizing literals require an extra round trip.
taka what is the status. Is this on track ? Please consider buffering time for code review and different opinion.
i'm sorry that it's taking long time. this is my top priority thing to do now. i'll update you within a few days.
Mark it as P2
Priority: P3 → P2
taka is testing his approach now. He said he will post his patch shortly today.
Whiteboard: [nsbeta3+] → [nsbeta3+]ETA:9/1
Per jgmyer's comment and the fact that there is a bug in escaping code, I kept quoted string code rather than rewriting them for string literal. Following is the diff from the tip of tree for review: D:\0826\mozilla\mailnews\base\search\src>cvs diff -u nsMsgSearchAdapter.cpp Index: nsMsgSearchAdapter.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v retrieving revision 1.25 diff -u -r1.25 nsMsgSearchAdapter.cpp --- nsMsgSearchAdapter.cpp 2000/08/20 01:08:29 1.25 +++ nsMsgSearchAdapter.cpp 2000/08/30 12:36:09 @@ -619,19 +619,35 @@ useQuotes = !reallyDredd || (nsAutoString(convertedValue).FindChar((PRUnichar)' ') != -1); - if (useQuotes) + + // now convert to char* and escape quoted_specials + value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); + if (value) { - PRUnichar *oldConvertedValue = convertedValue; - convertedValue = EscapeQuoteImapSearchProtocol(convertedValue); - nsCRT::free(oldConvertedValue); + if (useQuotes) + { + char *oldValue = value; + // max escaped length is one extra character for every character in the cmd. + char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); + if (newValue) + { + char *p = newValue; + while (1) + { + char ch = *value++; + if (!ch) + break; + if (ch == '"' || ch == '\\') + *p++ = '\\'; + *p++ = ch; + } + *p = '\0'; + value = nsCRT::strdup(newValue); // realloc down to smaller size + nsCRT::free(newValue); + } + nsCRT::free(oldValue); + } } - - // now convert to char* - value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); - // if couldn't convert, send as is - if (!value) - value = NS_ConvertUCS2toUTF8(convertedValue).ToNewCString(); - nsCRT::free(convertedValue); valueWasAllocated = PR_TRUE; D:\0826\mozilla\mailnews\base\search\src>
this looks ok to me from what I remember after poking around in this code adding bienvenu in case he wants to review too
Whiteboard: [nsbeta3+]ETA:9/1 → [nsbeta3+]fix in hand. code reviewed. Wati for more review
I don't really know this code, so I'll just second Alec's comment. It looks ok, and performance isn't an issue since this code just executes when we're building search terms, which is rare.
So far, I haven't heard of any complains. I'll check it in late today (JST).
Whiteboard: [nsbeta3+]fix in hand. code reviewed. Wati for more review → [nsbeta3+]code reviwed. Wait for tree open to check in.
please check it in.
checked in. mark as fixed.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
With 9/11/2000 Win32 build, characters like "hon" as in "nihon" now can be searched. So also we can search with hiragana "a". (Note that I misstated above that "ni" in "nihon" cannot be searched. "ni" or "niti" has no problem. It was "hon" that contains 0x5C byte. However, this is not enough, there are other other characters like 0x5C (\) -- backslash 0x22 (") -- double-quote in English/ASCII which will fail. I'm out of time for this one and so will pass this on to ji. Please check out ASCII ones in addition to Japanese ones.
QA Contact: momoi → ji
Checked 2000-09-18-08 win32 build. When the keystring contains 0x5c characters, like "hon", IMAP search doesn't list the mail which matches the search criteria. When the key string contains hiragana "a", an error dialog pops up, saying that "The current command didn't succeed. The mail server responded:Unexpected extra arguments to Search." Reopened the bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I could really use some help on this one.
Ftang/Nhotta - Can you check to see if Taka can help Alec?
It's actually assign to him. Taka san, could you take a look at this?
There is a piece of code in NECKO which replaces '\' with '/'. I'll try to find a way to fix the problem with minimum impact.
If you generate %5C instead of \ in the mailnews code, will necko change it back to \ ?
Whiteboard: [nsbeta3+]code reviwed. Wait for tree open to check in. → [nsbeta3+]
didn't work.
i've found a way to work around NECKO and i'm pretty confident that I can fix this by end of tomorrow. i'll post a diff for review once tomorrow.
pdt agrees p2.
Whiteboard: [nsbeta3+] → [nsbeta3+][pdtp2]
Target Milestone: --- → M18
D:\0919\mozilla\mailnews\base\search\src>cvs diff -u nsMsgSearchAdapter.cpp Index: nsMsgSearchAdapter.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v retrieving revision 1.26 diff -u -r1.26 nsMsgSearchAdapter.cpp --- nsMsgSearchAdapter.cpp 2000/09/05 09:07:24 1.26 +++ nsMsgSearchAdapter.cpp 2000/09/21 15:07:20 @@ -217,8 +217,16 @@ return result; } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar *nsMsgSearchAdapter::EscapeSearchUrl (const PRUnichar *nntpCommand) { + return nsCRT::strdup(nntpCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is two extra characters for every character in the cmd. PRUnichar *scratchBuf = (PRUnichar*) PR_Malloc(sizeof(PRUnichar) * (3*nsCRT::strlen(nntpCommand) + 1)); @@ -246,11 +254,20 @@ nsCRT::free (scratchBuf); } return result; +#endif } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar * nsMsgSearchAdapter::EscapeImapSearchProtocol(const PRUnichar *imapCommand) { + return nsCRT::strdup(imapCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is one extra character for every character in the cmd. PRUnichar *scratchBuf = @@ -276,11 +293,20 @@ nsCRT::free (scratchBuf); } return result; +#endif } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar * nsMsgSearchAdapter::EscapeQuoteImapSearchProtocol(const PRUnichar *imapCommand) { + return nsCRT::strdup(imapCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is one extra character for every character in the cmd. PRUnichar *scratchBuf = @@ -306,6 +332,7 @@ nsCRT::free (scratchBuf); } return result; +#endif } @@ -616,37 +643,32 @@ // do all sorts of crazy escaping convertedValue = reallyDredd ? EscapeSearchUrl (searchTermValue) : EscapeImapSearchProtocol(searchTermValue); - useQuotes = !reallyDredd || (nsAutoString(convertedValue).FindChar((PRUnichar)' ') != -1); - // now convert to char* and escape quoted_specials value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); if (value) { - if (useQuotes) + char *oldValue = value; + // max escaped length is one extra character for every character in the cmd. + char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); + if (newValue) { - char *oldValue = value; - // max escaped length is one extra character for every character in the cmd. - char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); - if (newValue) + char *p = newValue; + while (1) { - char *p = newValue; - while (1) - { - char ch = *value++; - if (!ch) - break; - if (ch == '"' || ch == '\\') - *p++ = '\\'; - *p++ = ch; - } - *p = '\0'; - value = nsCRT::strdup(newValue); // realloc down to smaller size - nsCRT::free(newValue); + char ch = *value++; + if (!ch) + break; + if ((useQuotes ? ch == '"' : 0) || ch == '\\') + *p++ = '\\'; + *p++ = ch; } - nsCRT::free(oldValue); + *p = '\0'; + value = nsCRT::strdup(newValue); // realloc down to smaller size + nsCRT::free(newValue); } + nsCRT::free(oldValue); } nsCRT::free(convertedValue); valueWasAllocated = PR_TRUE; D:\0919\mozilla\mailnews\base\search\src>
D:\0919\mozilla\mailnews\imap\src>cvs diff -u nsImapService.cpp Index: nsImapService.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/imap/src/nsImapService.cpp,v retrieving revision 1.159 diff -u -r1.159 nsImapService.cpp --- nsImapService.cpp 2000/09/20 12:23:42 1.159 +++ nsImapService.cpp 2000/09/21 15:03:48 @@ -772,7 +772,12 @@ urlSpec.AppendWithConversion(hierarchySeparator); urlSpec.Append((const char *) folderName); urlSpec.Append('>'); - urlSpec.Append(aSearchUri); + // escape aSearchUri so that IMAP special characters (i.e. '\') + // won't be replaced with '/' in NECKO. + // it will be unescaped in nsImapUrl::ParseUrl(). + char *search_cmd = nsEscape((char *)aSearchUri, url_XAlphas); + urlSpec.Append(search_cmd); + nsCRT::free(search_cmd); rv = mailNewsUrl->SetSpec((char *) urlSpec.GetBuffer()); if (NS_SUCCEEDED(rv)) { D:\0919\mozilla\mailnews\imap\src>
two files need to be updated. search criteria will be URL escaped so that IMAP special character '\' won't be replaced with '/' in NECKO. search criteria string will be unescaped later before being sent to server. a couple of methods in nsMsgSearchAdapter class need to be rewritten. escaping must be done after converting from UCS2. i'll revisit these after M18.
Can you make sure that searching for things like %, /, and " still work? The escaping that is done on PRUnichar* was done to avoid excess conversions (if you try to change them back to char*, you'll see what I'm talking about) - it seems like it should be perfectly valid to escape/unescape PRUnichar* stuff. After you've verified that the above search strings work, then r=alecf.
>it seems like it should be perfectly valid to escape/unescape PRUnichar* stuff. char is 8 bits, so you escape it into %xx PRUnichar is 16 bits, how can you escape / unescape it correctly ?
alec- taka hack from Japan, in a different time zone from us. His AIM is mozippe. Sometime I saw him on line after 5:00PM PST.
well, I assume / would become PRUnichar('%') plus the unicode equivalent of the ASCII charcode for /.
There are several characters other than U+002F which use a 2F octet in their ISO-2022-* representation. Since a U+0025 character will cause an ISO-2022-* encoder to shift into ASCII mode, how can you possibly escape in a (PRUnicode *) a non-ASCII character which uses a 2F octet in is ISO-2022-* representation?
double checked that all ASCII range characters including ", \, and % can be searched correctly against nsmail-1.mcom.com. Alec, please review the code above. I'll check in as soon as tree becomes open and you give me a go sign.
checked in. r=alecf, a=alecf mark as fixed.
Status: REOPENED → RESOLVED
Closed: 25 years ago25 years ago
Resolution: --- → FIXED
As long as block bug 53736 is resolved, I'll verify this.
Verified with win32 2000092409 build. It is fixed.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.