Closed Bug 5933 Opened 22 years ago Closed 19 years ago

International support for IMAP4 search

Categories

(MailNews Core :: Internationalization, defect, P1)

All
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mozilla, Assigned: nhottanscp)

References

Details

(Whiteboard: nsbeta2+]Exception Feature)

(This bug imported from BugSplat, Netscape's internal bugsystem.  It
was known there as bug #88257
http://scopus.netscape.com/bugsplat/show_bug.cgi?id=88257
Imported into Bugzilla on 05/04/99 17:49)

Messenger client should have fall back mechanism just in case IMAP4 server
doesn't support the charset used with SEARCH command. For example, when it's
working in Japanese char encoding, it should work like below:

     result = SEARCH(UTF8 string with charset "UTF-8");
     if (result == NO) {     // UTF-8 may not be supported.
         result = SEARCH(ISO-2022-JP string with charset "ISO-2022-JP");
         if (result == NO) { // ISO-2022-JP may not be supported.
             result = SEARCH(AS IS without charset);
             if (result == NO)
                 printf("Couldn't match any");
         }
     }

     Notice: By checking whether the search string contains only ASCII or not,
     you can skip first two SEARCH().  It's up to implementation.

Messenger client in Communicator 4.0 doesn't work like above.  It sends SEARCH
command with "Shift_JIS" charset, and gives up without retrying if server
response is "NO".
John, isn't this on the Gromit "In" list?
Actually, there is one change to the algorithm specified here.

As the very first step, if search string contains only US-ASCII (regardless of
encoding of the search UI), then SEARCH with charset=US-ASCII otherwise continue
as listed here.
Seems more search related than IMAP related. If you disagree, assign back to me.
Yes; it's search related, so it goes to Scott :-)

We'll need to invent some way to allow multiple passes at a single search scope,
which we don't have right now.

To clarify jfriend's point, if the search string contains only US-ASCII, we only
try US-ASCII, and not any i18n charset stuff.

I'd also like to clarify whether the IMAP server must send a NO response
if it doesn't know the charset, or whether it can just search and not find any
matches.
Setting TFV to 4.5.
Mass moving bugs from product version 5.0 to 4.5 since that's where the bugs are
now (no change to TFV).
Setting qa assigned to field.
Not a PR1 stopper.
Bulk change: Bug assigned to mail/news engineer but no component specified.
Changed to mail/news component.
<sorry for the bug notification intrusion.  Product version on this bug shows
1.0 (due to a bugsplat bug).  Correcting all mail/news bugs numbered < 90000 to
product version 4.0.   Bulk changing this.>
FYI.

tintin.mcom.com is now running MS4.0 Beta, which supports various
charset options for IMAP SEARCH.  If you don't have a test environment
now, ask sheneman@netscape.com for an account.

However, I would recommend using WSU's IMAP4 server as a reference
as well. It also has a good SEARCH implementation.
Phil, wasn't this the I18N bug we were talking about with Naoki and Bob? Were
you going to end up doing this one?

changing QA field to gbush
Bouncing over to Phil.
M15, I hope. I won't get to this for 4.5b2
Later. Too many more serious bugs for 4.5.
How can this remain "latered"?  Negotiation of search
charsets with our own MS4.0 is something most major mail clients
can perform now, e.g. Outlook, WinBiff, etc. We should be
competitive and support the search charset negotiation. Without
this, our IMAP search for Japanese and other non-ASCII languages
would not work. How can we promote our clients to enterprise
customers without this feature working?

MS4.0 is nearing completion. This bug should be a perfect candidate
for 4.51.

Re-opening for consideration in 4.51.
In case we need to review how this functionality should work,
I consulted taka and came up with the following summary of the
spec.

** Proposed steps for negotiating down the IMAP search charset. **

0. Check the 'capability' of the IMAP server for UTF-8.

   IMAP4 capability command should return something like the
   following in response to "a capability" command:

     a capability
     * CAPABILITY IMAP4 IMAP4rev1 ACL QUOTA LITERAL  NAMESPACE UIDPLUS
     LANGUAGE XSENDER X-NETSCAPE XSERVERINFO AUTH=PLAIN AUTH=LOGIN
     a OK Completed

If the return string contains "X-NETSCAPE", we can be assured of UTF-8
search capability with this server.

(Note: If you see X-NETSCAPE in the response to the CAPABILITY command, there's
       a 100% guarantee that the server will recognize the UTF-8 charset.  Do NOT
       rely on the banner message because it's configurable; the user may change
       it to something else. You can always try UTF-8 as charset whether or not
       it's IMAP4 server (it will fail if the server doesn't know UTF-8). )

1. Determine if the search string contains any 8-bit characters.

   ---> If not (=only 7-bit data), send the search string in ASCII.

2. If 1) is yes, then assume that the search string is in the System
   Charset (or the global default -- e.g. in 4.5 we use global default
   for LDAP servers so that more than one charset can be used for search.)
   Convert it to UTF-8 and send to the server. If the server accepts it,
   then it should return matches if there are any matches.

3. If the request in 2 is rejected by the server, then, send the string in
   the standard mail charset matching the System (or the global default)
   charset. (For example, iso-2022-jp for the Japanese Win/Mac system
   charset, Shift_JIS.)

4. If the request in 3 is rejected, then send the raw search
   string (as is) without any charset specification.
   And this completes the client's responsibility.
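
The steps above could be sketched roughly as follows. This is an illustration only, not code from the Messenger tree: HasCapability, IsSevenBit, Attempt, Reply and SearchWithFallback are hypothetical names, the charset conversion is assumed to have been done by the caller, and actually sending the command is left to a callback.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Step 0: true if the CAPABILITY response lists the given token
// (e.g. "X-NETSCAPE") as a whole whitespace-separated word.
bool HasCapability(const std::string& capabilityLine, const std::string& token) {
    std::istringstream words(capabilityLine);
    std::string word;
    while (words >> word) {
        if (word == token) return true;
    }
    return false;
}

// Step 1: true if the search string contains only 7-bit data.
bool IsSevenBit(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// One SEARCH attempt: the charset label to send ("" means no charset
// specification) and the search key already converted to that charset.
struct Attempt {
    std::string charset;
    std::string key;
};

// Tagged server reply, reduced to the two cases that matter here.
enum class Reply { Ok, No };

// Steps 2-4: try each attempt in order and stop at the first one the
// server accepts. Returns the index of the accepted attempt, or -1 if
// the server answered NO to all of them.
int SearchWithFallback(const std::vector<Attempt>& attempts,
                       const std::function<Reply(const Attempt&)>& sendSearch) {
    for (std::size_t i = 0; i < attempts.size(); ++i) {
        if (sendSearch(attempts[i]) == Reply::Ok) return static_cast<int>(i);
    }
    return -1;
}
```

For a Japanese search key the caller would pass three attempts (UTF-8, ISO-2022-JP, then the raw string with an empty charset); for a 7-bit key a single attempt with no charset would be enough.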

Open issue: Should we use the global default or the system charset
            as the basis for the source charset?  The global default is
            more flexible in that we can input in different charsets
            if proper keyboards or input methods are available as we
            change the global default.
qa assigned shouldn't be gbush.  Should be someone in msanz's group.
There are two issues.
First, the pref mailnews.force_ascii_search is set to true.
Second, we need to convert the search string to the mail charset,
which is JIS in the case of Japanese. We are currently using the folder csid, which
is ShiftJIS or EUC.

Here is a change I applied to my local tree.
Index: search.cpp
===================================================================
RCS file: /m/src/ns/lib/libmsg/search.cpp,v
retrieving revision 1.112.4.2.2.42
diff -c -r1.112.4.2.2.42 search.cpp
*** search.cpp	1998/10/01 04:24:55	1.112.4.2.2.42
--- search.cpp	1998/11/10 18:53:45
***************
*** 2182,2188 ****
--- 2182,2192 ----

  		// Ask the newsgroup/folder for its csid.
  		if (m_scope->m_folder)
  		{
  			dst_csid = m_scope->m_folder->GetFolderCSID() & ~CS_AUTO;
  dst_csid = INTL_DefaultMailCharSetID(dst_csid);
  		}

  	}

  	// default means that our best guess is to get the default window char set ID
This sounds like a lot of work, so I think we shouldn't commit to doing this
for 4.51, unless a customer escalation comes in which forces us to do it.
Clearing TFV. Please see me before setting the TFV.

BTW, I think Naoki's proposed change above is partial, at best, and defeats the
per-folder CSID that we allow the user to set.
Why can it sound like a lot of work?  Naoki shows everything to fix.
What is wrong with partial solution?  Any serious side effect?

Although I don't mind what TFV it's got, I do care if customers in Japan
find all other IMAP clients work with Messaging Server 4.0, but
only Netscape client (except Messenger Express 4.1) doesn't with
Netscape's own IMAP server.

I've waited almost 10 months. And it seems like I have to keep
waiting more.  Am I expecting too much?
>and defeats the per-folder CSID that we allow the user to set.
That has been true anyway, as we restrict to ASCII only. The other issue is that
we only support a single charset inside the search dialog. A more complicated
issue is the folder hierarchy, which may mix charsets.
So, those issues need to be solved in the future. But I am not sure we should
support only ASCII until we solve those issues.
> Why can it sound like a lot of work?

Because none of the other searching code takes more than one attempt at a search
based on the results of previous attempts.

> Naoki shows everything to fix.

That is absolutely not true. Naoki shows how to convert to the mail server's
charset only. That does not implement the algorithm Kat showed in his 10/29/98
comments.

> I've waited almost 10 months. And it seems like I have to keep waiting more.
> Am I expecting too much?

As I said above, the question of when we add this feature is determined by
customer escalations. There are lots of other features that people have wanted
for longer than 10 months that we're not doing in 4.51.
After discussing various pros and cons, we have decided to
open a new bug for fulfilling a minimum IMAP search
requirement for the Japanese market. A new bug does not
ask for server-client negotiation, and should be handled by
the escalation team.
The new bug is: 334536.
TFV 5.0
I (or someone else) will be moving enhancements, etc, bugs targeted for 5.0 to
bugzilla in the near future.

------- Additional Comments From paulmac  May-04-1999 17:44 -------

Okay, time to close out old bugsplat bugs - Please move to bugzilla if this
one is still relevant or mark won't fix, please.

------- Additional Comments From momoi  May-04-1999 17:49 -------

Well, this is still a valid bug.
Let's move to 5.0 and send it to the Mail/News team.
Target Milestone: M9
Blocks: 7228
Target Milestone: M9 → M13
search is moving out.
Search won't be implemented until after Beta 1, so this bug does not need to be
fixed until after Beta 1
Assignee: phil → mscott
Status: REOPENED → NEW
Target Milestone: M13 → M14
mscott owns the search backend, so reassigning to him for M14. Searching is not
a B1 feature.
Target Milestone: M14 → M16
triaging...this is not a beta2 bug.
Target Milestone: M16 → M18
Based on Beta2 Criteria http://client/seamonkey/prd/beta2criteria.html.
This is a beta2 P1 bug; should we add the beta2 keyword to this bug?
Karen, the beta2 doc says we need to implement a search back end, which is a
separate bug. We need the search backend before we can start fixing bugs like
this which have been around since 4.5. =(

I don't see any mention of this bug in the beta2 docs so I'm not sure what you
were looking at or maybe you were thinking about the comment to implement search
for beta2?
I suck, I was only looking under mail, not under mail i18n, in the beta2 docs.

moving back to a beta2 milestone. Thanks for catching my mistake Karen!

I18N, are you guys sure this is a beta2 stopper?
Target Milestone: M18 → M17
4.x didn't do this - I can't believe it would be a beta stopper for 6.0, and we 
could ship with it as well - we always have before.
From Beta2 Criteria http://client/seamonkey/prd/beta2criteria.html.
1) Scroll down to see the Features
2) Select I18N Features.
3) Select Mail I18N
4) Search for Mail/News Tasks - IMAP I18N - IMAP search 5933 - P1

P.S. I don't know what I18N means. Does anybody know?
I18N = Internationalization. I believe that the i18n group says it's a beta 
stopper. I just don't think we're going to have time to do it.
OK. I am just checking & trying to clarify that.
Then the document should be modified!!
This bug was transferred from 4.x bug system.
What we need for beta2 is i18n IMAP search to work. It is working in 4.x.
In 4.x, if the ASCII search fails, then it falls back to another query using a 
folder charset.
But for mozilla, it is easier and better to do UTF-8 query since we have a 
query string in unicode.

This is an IMAP spec.

We made some very hard choices to ship 4.5 and this was one of the features
that was cut at the very end.

The mail server guys have been very adamant that the client needs to support
this and were very disappointed that it fell off the 4.5 list at the end of
that development cycle.

taka and jgmyers can provide more data on what will break for who without this
long awaited feature...
I'd be surprised if we get 80% of the search functionality that was in 4.5 into
6.0 - getting > 100% would be a miracle. If you hadn't noticed, we haven't even
started search yet!
Putting beta2 for i18n beta2 criteria items. Contact bobj for question.
Keywords: beta2
> This is an IMAP spec.

I don't see this mentioned in RFC 2060 or 2683. Please give the spec reference
which supports your claim.
Blocks: 35851
Keywords: nsbeta2
Putting on [nsbeta2-] radar. 
Keywords: beta2
Whiteboard: [nsbeta2-]
As the bug is old and the original comment is not consistent with what we need 
for beta2, I am rewriting the i18n requirement for beta2 (which is the same 
level of support as the current 4.x client). I also changed the summary.
For beta2, we need US-ASCII search and charset specified search (i18n search). 

Here is how we can do it:
* Apply a 7-bit check to the search string. Assuming the search string is Unicode 
(PRUnichar* or UTF-8), we can check each character of the search string for < 128.
* If the search string is 7-bit, then do a US-ASCII search (search with no 
charset specified).
* If the search string is 8-bit, then get the folder charset, convert the Unicode 
string to the folder charset and specify the charset in the search command.
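
A minimal sketch of the decision described in this comment, assuming the search key arrives as UTF-8. ChooseSearchCharset and BuildSearchPrefix are hypothetical names, not Mozilla APIs, and the conversion of the key to the folder charset is deliberately left out; only the choice between a plain SEARCH and SEARCH CHARSET is shown.

```cpp
#include <string>

// Returns the charset label to send with SEARCH, or "" when the key is
// 7-bit and a plain US-ASCII search (no charset specified) is enough.
std::string ChooseSearchCharset(const std::string& utf8Key,
                                const std::string& folderCharset) {
    for (unsigned char c : utf8Key) {
        if (c >= 0x80) return folderCharset;  // 8-bit data: use folder charset
    }
    return "";
}

// Builds the command up to (but not including) the search key, e.g.
// "A1 SEARCH SUBJECT" or "A1 SEARCH CHARSET ISO-2022-JP SUBJECT".
// The key itself, converted to the chosen charset, would follow.
std::string BuildSearchPrefix(const std::string& tag,
                              const std::string& attribute,
                              const std::string& utf8Key,
                              const std::string& folderCharset) {
    std::string command = tag + " SEARCH";
    const std::string charset = ChooseSearchCharset(utf8Key, folderCharset);
    if (!charset.empty()) {
        command += " CHARSET " + charset;
    }
    command += " " + attribute;
    return command;
}
```

Because every non-ASCII character in UTF-8 is encoded with bytes of 0x80 or above, the byte-wise check gives the same answer as checking each Unicode character for < 128.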
Summary: IMAP4 search doesn't retry if first attempt fails → International support for IMAP4 search
clear nsbeta2-
Whiteboard: [nsbeta2-]
ftang, why did you clear nsbeta2-..can you state your case?
Whiteboard: [NEED INFO]
Since search has been an approved feature exception, this goes hand in hand with
that. It basically says make our IMAP search I18N friendly when we implement it =).
On exception list for PR2, removing 5/16...giving [nsbeta2+]Exception Feature 
status.
Whiteboard: [NEED INFO] → nsbeta2+]Exception Feature
It's my understanding that the mail team cut search today.
so, like the last bug, I did a bunch of i18n work yesterday.

And a reality check from everyone: This bug is over 2 years old now, a carryover
from 4.5.. the general i18n-ness of search is already covered in bug 11659..
kinda seems like this should just be a dupe. 

if however this bug is referring to the algorithm described at the top of this
file, I believe it may never have been implemented in 4.x.. and if that's the
case I'm not sure why this would be nsbeta2+

in any case, I think this should either go to bienvenu or myself to lighten
scott's load.
So after your i18n fixes, are we now close to parity 
with 4.5 and later? The spec there was described in
nhotta@netscape.com 2000-05-01 16:00 comment above.
That should be the minimum -- it has been implemented before
and current users of Communicator will expect as much.
I _think_ so... we won't know for certain until we have a UI.
I haven't seen the equivalent of the algorithm described at the top of this 
bug...it might be there though
The algorithm which retries with a different character set if no hits are found
was not implemented in 4.x. Since that's what this bug was about originally, I'm
guessing that we should separate that issue (which we're not addressing for
seamonkey) from the issue of 4.x parity WRT i18n searching (which we should
address for seamonkey).
>I _think_ so... we won't know for certain until we have a UI.
Do we have a bug for that? As soon as that is resolved iqa can test i18n search.
ok, does anyone object to me marking this a dupe of 11659 (which has been marked
fixed) then? bienvenu has apparently got IMAP search working, and I have
supposedly made the whole search backend i18n friendly...

*** This bug has been marked as a duplicate of 11659 ***
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
nhotta, see bug 33101 for the search UI frontend bug.
I just added you to the CC
I object, actually. Alecf, this bug refers to a specific algorithm for IMAP4
searching that escalation engineering implemented in 4.6. This bug is to track this
when we implement search for IMAP.

It's separate from the random i18n filter and search bug you marked it a dup of.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
I don't really see how i18n search can be done, despite what Alec has done. My 
understanding was that the i18n group had to provide us with API's that existed 
in 4.5 but no longer exist in 6.0 in order for i18n search to work. Of course, 
I've been out of it for a long time, but that was my understanding.
The last time we talked about NNTP search API and we dropped that from beta2.
For IMAP, I think the things I mentioned in 2000-05-01 16:00 are available in 6.0 
(e.g. getting a folder charset, conversion from unicode to a folder 
charset, etc.).
is the latter also true for local? Convert the headers to unicode using the 
charset and do a unicode comparison with the utf8->unicode converted search 
string? What about message bodies? We can't really convert the whole message 
body to unicode in memory, can we?
I believe that some of the search code converts the unicode search term to the
folder's charset, then performs the search with this converted string.
alecf's description is how local searching is supposed to work (and did in 4.x).


For local search, header search requires MIME decoding. Since the MIME decoder 
returns unicode, that can be compared with the search term.
For local body search, I believe we converted the body (not the search term). 
Here is the 4.x code; I believe DO_I18N was defined in 4.x (otherwise Japanese 
search wouldn't work).
http://lxr.mozilla.org/mozilla/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#687

#ifdef DO_I18N
                    // In here we do I18N conversion if we get the converter
                    char *newBody = nsnull;
                    newBody = (char *)INTL_CallCharCodeConverter(conv, (unsigned char *) buf, (int32) PL_strlen(buf));
                    if (newBody && (newBody != buf))
                    {
                        // CharCodeConverter return the char* to the orginal string
                        // we don't want to free body in that case 
                        compare = newBody;
                    }
#endif

DO_I18N is not in the 4.5 code; it was added to 6.0 so the code would compile 
because things like INTL_CreateCharCodeConverter don't exist in 6.0 - I think 
this was one area where we need a 6.0 equivalent way of doing this.
you know, it's actually going to be EASIER for me to convert the user-entered
value to the folder's charset and do the body search that way. Anyone object if
I do it that way? It'll be faster too.
I take that back, it's not as simple as I had hoped.. converting the body is the
easy way right now.
Reposting my comment from 2000-05-01 16:00, which contains the I18N requirement for 
nsbeta2.

> As the bug is old and the original comment is not consistent with what we need 
> for beta2, I am rewriting the i18n requirement for beta2 (which is the same 
> level of support as the current 4.x client). I also changed the summary.
> For beta2, we need US-ASCII search and charset specified search (i18n search). 
> 
> Here is how we can do it:
> * Apply a 7-bit check to the search string. Assuming the search string is Unicode 
> (PRUnichar* or UTF-8), we can check each character of the search string for < 128.
> * If the search string is 7-bit, then do a US-ASCII search (search with no 
> charset specified).
> * If the search string is 8-bit, then get the folder charset, convert the Unicode 
> string to the folder charset and specify the charset in the search command.




Adding jaimejr@netscape.com and putterman@netscape.com to cc.
Added myself to Cc.
Taka and I started to look at the code. The search criteria string is UTF-8 and 
there is also a function to get a folder charset. The 7-bit check can be done 
easily against a UTF-8 string. Also, we can convert the string from UTF-8 to a 
folder charset.
Taka pointed out that we can use a literal string instead of a quoted string (which 
needs escaping for some charsets, e.g. ISO-2022-JP).
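
To illustrate Taka's point, here is a small sketch comparing the two IMAP string forms. ImapLiteral and ImapQuoted are hypothetical helper names, not functions from the patch. A literal carries an octet count followed by the raw bytes, so nothing inside it needs escaping; a quoted string has to backslash-escape any double quote or backslash byte, and such bytes can turn up in 7-bit multibyte encodings like ISO-2022-JP.

```cpp
#include <string>

// Literal form: "{" octet-count "}" CRLF, then the raw bytes unchanged.
// Note: on the wire a client normally sends the "{n}" line first and
// waits for the server's continuation request before sending the data;
// this helper only shows the resulting byte layout.
std::string ImapLiteral(const std::string& bytes) {
    return "{" + std::to_string(bytes.size()) + "}\r\n" + bytes;
}

// Quoted form, for comparison: '"' and '\' must be backslash-escaped.
std::string ImapQuoted(const std::string& bytes) {
    std::string out = "\"";
    for (char c : bytes) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}
```

With the literal form the converted search key can be passed through byte for byte, which avoids having to reason about which bytes of a given charset collide with IMAP's quoting characters.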
The patch (hooked up charset conversion) was reviewed. I will probably check in 
tomorrow.
Checked in, testable once the UI is functional again.
Status: REOPENED → RESOLVED
Closed: 20 years ago → 19 years ago
Resolution: --- → FIXED
** Checked with 7/10/2000 Win32 build **

OK, we are finally able to check on this because I can now
see attribute names. 

Here's what works:

1. With the default view charset set to ISO-2022-JP, a search with a single
 condition, or with "OR" across more than one attribute, works and finds the
 relevant messages when we input Japanese search keys.

What does not work:

1. Any search after the first one using a Japanese word produces no 
   change in the results, even if you change an attribute value to another
   Japanese word. Even if you close the Search window and re-open it,
   it does not seem possible to do another search. If you use ASCII values,
   you can do more than one search in a row successfully. This problem
   seems to be due to the use of non-ASCII data as search keys. 
2. Any change in attribute category, e.g. from Subject to 
   Sender, or from the "OR" conjunction to the "AND" conjunction, fails. This
   type of change causes the server to send an error message saying
   "Required argument was missing."
   This problem happens regardless of the charset of the attribute values
   used.

There are other problems but I have not sorted them out yet.

For item 2, I'll look for an existing bug. But for Item 1, I need to
re-open this bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reassigning to me.
Can we file a separate bug for the first problem since the international search 
itself has been enabled?

Assignee: mscott → nhotta
Status: REOPENED → NEW
Can we get a bit more analysis before deciding to file another
bug? If we know for sure that it has nothing to do with the 
way non-ASCII was implemented, then let's file a new bug.

Problem #1 makes Search in Japanese very difficult since 
users often try one key and then another in case the first didn't work.
Unless I am mistaken, the user will have to reboot Mozilla to try
the next search. That is really bad. 
I've looked at Problem #1 a bit further and it seems that
the problem is a bit more complex than I had described above.
It seems that if you pick certain Japanese words, you can
do more than 1 search at a time. When you use some other word,
it does not work until you use some other data that do not have
this problem. One example of a problem word is "Ni-hon" (Japan
in Kanji). I have not been able to do any search with it.
I used win32 build ID 2000071008 on WinNT 4 Japanese and I can search Japanese 
strings more than once.
First, I searched "mail" in Japanese then got some results.
Then I searched "homepage" in Japanese then got additional results and they were 
appended to the search result.
And I searched "welcome" in Japanese  then got additional results and they were 
appended to the search result.
So I cannot reproduce the problem. There may be a condition to reproduce this. 
Anyway, I prefer the problem to be filed separately.
Would you try "nihon" in Kanji and see if that works?

There seems to be another problem in search string
formation to send to the server. See the SCOPUS bug
we dealt with for Communicator, Bug ID 343598. The example
string described there, Hiragana "a", causes a server
error in Mozilla.
Please file a separate bug instead of reopening this feature bug. Individual bugs will 
help us track different cases. 
OK. I'll verify that the basic Intl IMAP functionality is 
working.
There are some misses and they will be filed as separate bugs.
Status: NEW → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
** Checked on 7/10/2000 Win32, Mac, and Linux builds **

On the above builds, the basic non-ASCII search function is now
working as long as the search keys match the default view charset
set in the Preferences dialog.
Marking it as verified fixed.
We will file separate bugs for problems with this new feature.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core