45222 - IMAP search for Japanese fails when search key contains 0x5C or 0x22

Reporter

Description

•

25 years ago

** Observed with 7/10/2000 Win32 build ** This is a left over bug from Bug 5933 where basic IMAP search for non-ASCII key data has been implemented. Unfortunately this still leaves a problem with those JPN characters whose JIS 2nd-byte corresponds to 0x5C (\) or 0x22 (*) which have special significance in IMAP protocols. See a SCOPUS Communicator bug 343598 for details, particularly my comments: Additional Comments From momoi 05/03/1999 02:51 There I summarize what needs to be looked to delclare that the problem has been fixed. Here are some characters which will fail: "niti" (Day) in Kanji, "a" as in Hiragana "a". See other comments in Bug 5933.

Takayuki Tei

Assignee

Comment 1

•

25 years ago

We should use literal string over quoted string for IMAP search to avoid this sort of problem.

nhottanscp

Comment 2

•

25 years ago

nsbeta3, reassign to taka.

Assignee: nhotta → taka

Keywords: nsbeta3

Takayuki Tei

Assignee

Comment 3

•

25 years ago

accepted

Status: NEW → ASSIGNED

Frank Tang

Comment 4

•

25 years ago

nsbeta3+ per i18n bug review meeting.

Whiteboard: [nsbeta3+]

Frank Tang

Comment 5

•

25 years ago

What happen to English message which have \ and * in it ? For example, if I have a English message, the subject is "Hello\My Test" and I search "o\M", can I find it ? or if I have a message which subject is "Hello*My Test" and I search for "o*M" , can I find it ? If the answer is NO, then we have a generic escaping issue that MailNews folks didn't do.

Katsuhiko Momoi

Reporter

Comment 6

•

25 years ago

I did not paste in all the comments from the internal bug database but in that bug, I stated that these special characters are missed even in ASCII search. 0x5C (\) -- backslash 0x22 (") -- double-quote (there was an error in my original bug report above. 0x22 is not (*). ) These 2 are the "quoted specials" in RFC 2060. So, yes, the complete fix will have to deal with dealing with these in ASCII, or not using quoted strings in IMAP search.

Katsuhiko Momoi

Reporter

Comment 7

•

25 years ago

Strings such as "hello\kitty" or "common"key"" cannot be searched in ASCII currently. I take it that taka will take care of all these problems.

Takayuki Tei

Assignee

Comment 8

•

25 years ago

My plan is to switch to string literal so that we don't have to worry about escaping all those characters.

John G. Myers

Comment 9

•

25 years ago

The quoted-string code should search for characters not permitted in quoted strings, switching to a literal when it finds them. It should avoid literals in other cases as synchronizing literals require an extra round trip.

Frank Tang

Comment 10

•

25 years ago

taka what is the status. Is this on track ? Please consider buffering time for code review and different opinion.

Takayuki Tei

Assignee

Comment 11

•

25 years ago

i'm sorry that it's taking long time. this is my top priority thing to do now. i'll update you within a few days.

Frank Tang

Comment 12

•

25 years ago

Mark it as P2

Priority: P3 → P2

Frank Tang

Comment 13

•

25 years ago

taka is testing his approach now. He said he will post his patch shortly today.

Whiteboard: [nsbeta3+] → [nsbeta3+]ETA:9/1

Takayuki Tei

Assignee

Comment 14

•

25 years ago

Per jgmyer's comment and the fact that there is a bug in escaping code, I kept quoted string code rather than rewriting them for string literal. Following is the diff from the tip of tree for review: D:\0826\mozilla\mailnews\base\search\src>cvs diff -u nsMsgSearchAdapter.cpp Index: nsMsgSearchAdapter.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v retrieving revision 1.25 diff -u -r1.25 nsMsgSearchAdapter.cpp --- nsMsgSearchAdapter.cpp 2000/08/20 01:08:29 1.25 +++ nsMsgSearchAdapter.cpp 2000/08/30 12:36:09 @@ -619,19 +619,35 @@ useQuotes = !reallyDredd || (nsAutoString(convertedValue).FindChar((PRUnichar)' ') != -1); - if (useQuotes) + + // now convert to char* and escape quoted_specials + value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); + if (value) { - PRUnichar *oldConvertedValue = convertedValue; - convertedValue = EscapeQuoteImapSearchProtocol(convertedValue); - nsCRT::free(oldConvertedValue); + if (useQuotes) + { + char *oldValue = value; + // max escaped length is one extra character for every character in the cmd. + char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); + if (newValue) + { + char *p = newValue; + while (1) + { + char ch = *value++; + if (!ch) + break; + if (ch == '"' || ch == '\\') + *p++ = '\\'; + *p++ = ch; + } + *p = '\0'; + value = nsCRT::strdup(newValue); // realloc down to smaller size + nsCRT::free(newValue); + } + nsCRT::free(oldValue); + } } - - // now convert to char* - value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); - // if couldn't convert, send as is - if (!value) - value = NS_ConvertUCS2toUTF8(convertedValue).ToNewCString(); - nsCRT::free(convertedValue); valueWasAllocated = PR_TRUE; D:\0826\mozilla\mailnews\base\search\src>

Alec Flett

Comment 15

•

25 years ago

this looks ok to me from what I remember after poking around in this code adding bienvenu in case he wants to review too

Frank Tang

Updated

•

25 years ago

Whiteboard: [nsbeta3+]ETA:9/1 → [nsbeta3+]fix in hand. code reviewed. Wati for more review

David :Bienvenu

Comment 16

•

25 years ago

I don't really know this code, so I'll just second Alec's comment. It looks ok, and performance isn't an issue since this code just executes when we're building search terms, which is rare.

Takayuki Tei

Assignee

Comment 17

•

25 years ago

So far, I haven't heard of any complains. I'll check it in late today (JST).

Frank Tang

Updated

•

25 years ago

Whiteboard: [nsbeta3+]fix in hand. code reviewed. Wati for more review → [nsbeta3+]code reviwed. Wait for tree open to check in.

Frank Tang

Comment 18

•

25 years ago

please check it in.

Takayuki Tei

Assignee

Comment 19

•

25 years ago

checked in. mark as fixed.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

Katsuhiko Momoi

Reporter

Comment 20

•

25 years ago

With 9/11/2000 Win32 build, characters like "hon" as in "nihon" now can be searched. So also we can search with hiragana "a". (Note that I misstated above that "ni" in "nihon" cannot be searched. "ni" or "niti" has no problem. It was "hon" that contains 0x5C byte. However, this is not enough, there are other other characters like 0x5C (\) -- backslash 0x22 (") -- double-quote in English/ASCII which will fail. I'm out of time for this one and so will pass this on to ji. Please check out ASCII ones in addition to Japanese ones.

QA Contact: momoi → ji

ji

Comment 21

•

25 years ago

Checked 2000-09-18-08 win32 build. When the keystring contains 0x5c characters, like "hon", IMAP search doesn't list the mail which matches the search criteria. When the key string contains hiragana "a", an error dialog pops up, saying that "The current command didn't succeed. The mail server responded:Unexpected extra arguments to Search." Reopened the bug.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Alec Flett

Comment 22

•

25 years ago

I could really use some help on this one.

Jaime Rodriguez, Jr.

Comment 23

•

25 years ago

Ftang/Nhotta - Can you check to see if Taka can help Alec?

nhottanscp

Comment 24

•

25 years ago

It's actually assign to him. Taka san, could you take a look at this?

Takayuki Tei

Assignee

Comment 25

•

25 years ago

There is a piece of code in NECKO which replaces '\' with '/'. I'll try to find a way to fix the problem with minimum impact.

Frank Tang

Comment 26

•

25 years ago

If you generate %5C instead of \ in the mailnews code, will necko change it back to \ ?

Whiteboard: [nsbeta3+]code reviwed. Wait for tree open to check in. → [nsbeta3+]

Takayuki Tei

Assignee

Comment 27

•

25 years ago

didn't work.

Takayuki Tei

Assignee

Comment 28

•

25 years ago

i've found a way to work around NECKO and i'm pretty confident that I can fix this by end of tomorrow. i'll post a diff for review once tomorrow.

selmer (gone)

Comment 29

•

25 years ago

pdt agrees p2.

Whiteboard: [nsbeta3+] → [nsbeta3+][pdtp2]

Target Milestone: --- → M18

Takayuki Tei

Assignee

Comment 30

•

25 years ago

D:\0919\mozilla\mailnews\base\search\src>cvs diff -u nsMsgSearchAdapter.cpp Index: nsMsgSearchAdapter.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v retrieving revision 1.26 diff -u -r1.26 nsMsgSearchAdapter.cpp --- nsMsgSearchAdapter.cpp 2000/09/05 09:07:24 1.26 +++ nsMsgSearchAdapter.cpp 2000/09/21 15:07:20 @@ -217,8 +217,16 @@ return result; } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar *nsMsgSearchAdapter::EscapeSearchUrl (const PRUnichar *nntpCommand) { + return nsCRT::strdup(nntpCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is two extra characters for every character in the cmd. PRUnichar *scratchBuf = (PRUnichar*) PR_Malloc(sizeof(PRUnichar) * (3*nsCRT::strlen(nntpCommand) + 1)); @@ -246,11 +254,20 @@ nsCRT::free (scratchBuf); } return result; +#endif } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar * nsMsgSearchAdapter::EscapeImapSearchProtocol(const PRUnichar *imapCommand) { + return nsCRT::strdup(imapCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is one extra character for every character in the cmd. PRUnichar *scratchBuf = @@ -276,11 +293,20 @@ nsCRT::free (scratchBuf); } return result; +#endif } +/* + 09/21/2000 - taka@netscape.com + This method is bogus. Escape must be done against char * not PRUnichar * + should be rewritten later. + for now, just duplicate the string. +*/ PRUnichar * nsMsgSearchAdapter::EscapeQuoteImapSearchProtocol(const PRUnichar *imapCommand) { + return nsCRT::strdup(imapCommand); +#if 0 PRUnichar *result = nsnull; // max escaped length is one extra character for every character in the cmd. PRUnichar *scratchBuf = @@ -306,6 +332,7 @@ nsCRT::free (scratchBuf); } return result; +#endif } @@ -616,37 +643,32 @@ // do all sorts of crazy escaping convertedValue = reallyDredd ? EscapeSearchUrl (searchTermValue) : EscapeImapSearchProtocol(searchTermValue); - useQuotes = !reallyDredd || (nsAutoString(convertedValue).FindChar((PRUnichar)' ') != -1); - // now convert to char* and escape quoted_specials value = TryToConvertCharset(convertedValue, destCharset, reallyDredd); if (value) { - if (useQuotes) + char *oldValue = value; + // max escaped length is one extra character for every character in the cmd. + char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); + if (newValue) { - char *oldValue = value; - // max escaped length is one extra character for every character in the cmd. - char *newValue = (char*)PR_Malloc(sizeof(char) * (2*nsCRT::strlen(value) + 1)); - if (newValue) + char *p = newValue; + while (1) { - char *p = newValue; - while (1) - { - char ch = *value++; - if (!ch) - break; - if (ch == '"' || ch == '\\') - *p++ = '\\'; - *p++ = ch; - } - *p = '\0'; - value = nsCRT::strdup(newValue); // realloc down to smaller size - nsCRT::free(newValue); + char ch = *value++; + if (!ch) + break; + if ((useQuotes ? ch == '"' : 0) || ch == '\\') + *p++ = '\\'; + *p++ = ch; } - nsCRT::free(oldValue); + *p = '\0'; + value = nsCRT::strdup(newValue); // realloc down to smaller size + nsCRT::free(newValue); } + nsCRT::free(oldValue); } nsCRT::free(convertedValue); valueWasAllocated = PR_TRUE; D:\0919\mozilla\mailnews\base\search\src>

Takayuki Tei

Assignee

Comment 31

•

25 years ago

D:\0919\mozilla\mailnews\imap\src>cvs diff -u nsImapService.cpp Index: nsImapService.cpp =================================================================== RCS file: /cvsroot/mozilla/mailnews/imap/src/nsImapService.cpp,v retrieving revision 1.159 diff -u -r1.159 nsImapService.cpp --- nsImapService.cpp 2000/09/20 12:23:42 1.159 +++ nsImapService.cpp 2000/09/21 15:03:48 @@ -772,7 +772,12 @@ urlSpec.AppendWithConversion(hierarchySeparator); urlSpec.Append((const char *) folderName); urlSpec.Append('>'); - urlSpec.Append(aSearchUri); + // escape aSearchUri so that IMAP special characters (i.e. '\') + // won't be replaced with '/' in NECKO. + // it will be unescaped in nsImapUrl::ParseUrl(). + char *search_cmd = nsEscape((char *)aSearchUri, url_XAlphas); + urlSpec.Append(search_cmd); + nsCRT::free(search_cmd); rv = mailNewsUrl->SetSpec((char *) urlSpec.GetBuffer()); if (NS_SUCCEEDED(rv)) { D:\0919\mozilla\mailnews\imap\src>

Takayuki Tei

Assignee

Comment 32

•

25 years ago

two files need to be updated. search criteria will be URL escaped so that IMAP special character '\' won't be replaced with '/' in NECKO. search criteria string will be unescaped later before being sent to server. a couple of methods in nsMsgSearchAdapter class need to be rewritten. escaping must be done after converting from UCS2. i'll revisit these after M18.

Alec Flett

Comment 33

•

25 years ago

Can you make sure that searching for things like %, /, and " still work? The escaping that is done on PRUnichar* was done to avoid excess conversions (if you try to change them back to char*, you'll see what I'm talking about) - it seems like it should be perfectly valid to escape/unescape PRUnichar* stuff. After you've verified that the above search strings work, then r=alecf.

Frank Tang

Comment 34

•

25 years ago

>it seems like it should be perfectly valid to escape/unescape PRUnichar* stuff. char is 8 bits, so you escape it into %xx PRUnichar is 16 bits, how can you escape / unescape it correctly ?

Frank Tang

Comment 35

•

25 years ago

alec- taka hack from Japan, in a different time zone from us. His AIM is mozippe. Sometime I saw him on line after 5:00PM PST.

Alec Flett

Comment 36

•

25 years ago

well, I assume / would become PRUnichar('%') plus the unicode equivalent of the ASCII charcode for /.

John G. Myers

Comment 37

•

25 years ago

There are several characters other than U+002F which use a 2F octet in their ISO-2022-* representation. Since a U+0025 character will cause an ISO-2022-* encoder to shift into ASCII mode, how can you possibly escape in a (PRUnicode *) a non-ASCII character which uses a 2F octet in is ISO-2022-* representation?

Takayuki Tei

Assignee

Comment 38

•

25 years ago

double checked that all ASCII range characters including ", \, and % can be searched correctly against nsmail-1.mcom.com. Alec, please review the code above. I'll check in as soon as tree becomes open and you give me a go sign.

Takayuki Tei

Assignee

Comment 39

•

25 years ago

checked in. r=alecf, a=alecf mark as fixed.

Status: REOPENED → RESOLVED

Closed: 25 years ago → 25 years ago

Resolution: --- → FIXED

ji

Comment 40

•

25 years ago

As long as block bug 53736 is resolved, I'll verify this.

ji

Comment 41

•

25 years ago

Verified with win32 2000092409 build. It is fixed.

Status: RESOLVED → VERIFIED

Myk Melez [:myk] [@mykmelez]

Updated

•

21 years ago

Product: MailNews → Core

Nobody; OK to take it and work on it

Updated

•

17 years ago

Product: Core → MailNews Core