Bug 6983 - UTF-7 encoder/decoder needed
Status: VERIFIED FIXED
Product: MailNews Core
Classification: Components
Component: Internationalization
Version: Trunk
Hardware: All All
Importance: P3 normal
Target Milestone: M8
Assigned To: cata
QA Contact: Katsuhiko Momoi
Depends on: 6813
Reported: 1999-05-24 11:08 PDT by nhottanscp
Modified: 2008-07-31 01:22 PDT
CC List: 5 users

Description nhottanscp 1999-05-24 11:08:16 PDT
Both encoder and decoder are needed.

Here is a list of RFCs:
UTF-7 - ftp://ftp.isi.edu/in-notes/rfc2152.txt
IMAP RFC for modified UTF-7 - ftp://ftp.isi.edu/in-notes/rfc2060.txt
Comment 1 nhottanscp 1999-05-24 11:09:59 PDT
Set target to M7, adding bienvenu@netscape.com,bobj@netscape.com to cc.
Comment 2 John G. Myers 1999-05-26 23:41:59 PDT
M-UTF-7 is needed in both directions for IMAP folders.

I suggest not doing an encoder for plain UTF-7, making it a read-only charset.
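Since M-UTF-7 is needed in both directions for IMAP folder names, here is a minimal Python sketch of the rules in RFC 2060 section 5.1.3, for illustration only: printable US-ASCII other than "&" stands for itself, "&" is written "&-", and everything else is UTF-16 in a modified base64 that uses "," instead of "/" and always ends with "-". The function names `mutf7_encode` and `mutf7_decode` are hypothetical and this is not the Mozilla converter; the sample mailbox name is the example given in RFC 2060.

```python
import base64


def _is_direct(ch: str) -> bool:
    # Printable US-ASCII (0x20-0x7e) except '&' represents itself.
    return 0x20 <= ord(ch) <= 0x7E and ch != "&"


def mutf7_encode(name: str) -> str:
    """Encode a mailbox name in IMAP modified UTF-7 (hypothetical helper)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "&":
            out.append("&-")  # a literal '&' is written as "&-"
            i += 1
        elif _is_direct(ch):
            out.append(ch)
            i += 1
        else:
            # Gather the run of characters that must be shifted.
            j = i
            while j < len(name) and not _is_direct(name[j]) and name[j] != "&":
                j += 1
            b64 = base64.b64encode(name[i:j].encode("utf-16-be")).decode("ascii")
            # Drop '=' padding, swap '/' for ',', and always close with '-'.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            i = j
    return "".join(out)


def mutf7_decode(encoded: str) -> str:
    """Decode IMAP modified UTF-7 (hypothetical helper)."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch != "&":
            out.append(ch)
            i += 1
            continue
        end = encoded.index("-", i + 1)  # ValueError if the shift is unterminated
        chunk = encoded[i + 1:end]
        if chunk == "":
            out.append("&")
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore padding for the stdlib decoder
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


name = "~peter/mail/\u65e5\u672c\u8a9e/\u53f0\u5317"  # RFC 2060 example mailbox
encoded = mutf7_encode(name)
print(encoded)  # ~peter/mail/&ZeVnLIqe-/&U,BTFw-
assert mutf7_decode(encoded) == name
```

Unlike plain UTF-7, the shift character is "&" and the terminating "-" is mandatory, which keeps folder names with "/" hierarchy separators unambiguous.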
Comment 3 nhottanscp 1999-05-27 10:16:59 PDT
I think plain UTF-7 encoder is needed to send UTF-7 mails.
Comment 4 John G. Myers 1999-05-27 10:33:59 PDT
My point is that there is no need to send UTF-7 mail.  Such mail should be sent
in UTF-8 instead.
Comment 5 nhottanscp 1999-05-27 11:20:59 PDT
UTF-7 message send issue was forwarded to netscape.public.mozilla.mail-news.
Comment 6 bobj 1999-06-09 11:02:59 PDT
I don't think this bug is about IMAP. Bug 6814 is about supporting
modified UTF-7 for IMAP folder names.  This bug seems to be about sending
messages encoded in UTF-7.  I've updated the summary to reflect this.

I agree with jgmeyers -- I'm not sure how useful or advisable it is to send
UTF-7 email.  Let's see if we get any feedback from the news group.
Moving to M8.
Comment 7 nhottanscp 1999-06-16 10:14:59 PDT
We are not going to include UTF-7 in the message compose charset menu.
UTF-7 converters were checked in, so I am marking this as fixed.
Any quality issues with the converters should be filed separately.
Comment 8 bobj 1999-06-16 18:19:59 PDT
Changed Summary to reflect the bug.

This bug is NOT about sending UTF-7 email.  It was decided that we will not
support sending UTF-7 messages.  We will support viewing properly labeled UTF-7
messages.
Comment 9 Katsuhiko Momoi 1999-06-18 12:08:59 PDT
** Checked with 6/19/99 Win32 build **

The encoding part seems to be OK.
I'm not sure if decoding is working, but there are some problems
in viewing JPN strings:

Msgs sent from 4.6:

1. Can view body but not thread pane header nor msg window header

Msgs sent from 5.0:

2. Cannot view body.

However, I can view all of the above correctly with 4.6 client.

This may be due to Bug 8343.
In any case, decoding is not working and so I'll
re-open this bug.
Comment 10 leger 1999-06-18 13:43:59 PDT
Clearing Fixed resolution.
Comment 11 cata 1999-06-29 14:23:59 PDT
But Bug 8343 is not a converter problem. The decoder is just fine. So, what's
going on then? Please provide a test case and I'll try to see where we have the
problem and why.
Comment 12 Katsuhiko Momoi 1999-07-01 16:10:59 PDT
** Checked with 7/1/99 Win32 build **

This has been a complicated problem but I think we have almost
all the pieces resolved now.

First, the viewing problem.

1. The problem with viewing UTF-7 plain text mail from 4.6
   had to do with the fact that 4.6 did not use MIME header
   encoding; only the charset parameter (=UTF-7) is sent out.
   This explains why the headers could not be displayed.
2. Outlook Express 4 labels the header but not the body, and this
   caused the body not to be displayed without additional steps.

3. Outlook Express 5 sends the properly labeled header and
   body. Thus we have no problem with OE 5 UTF-7 msgs.

4. With Messenger 5.0 mail send, we were sending extraneous
   characters before and after the intended string until the
   6/30/99 build, and 5.0 was displaying something extra (a blank
   square) at the beginning of a string. Starting with the 7/1/99
   build, we seem to be sending only the intended string plus the
   coda character.

   Here's an example

   Intended string: êtíra

   6/30/99 build: +ABcA6g-t+AO0-ra+AA0AC
   7/1/99 build: +AOo-t+AO0-ra+AA0AC
   Outlook Express 5: +AOo-t+AO0-ra

   Note the +AA0AC at the end of the 7/1/99 encoded string.

I'm now inclined to mark this bug resolved, but still not sure why
we are inserting extra stuff at the end.
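The encoded samples in comment 12 can be cross-checked with Python's built-in `utf-7` codec, an independent decoder and not the Mozilla converter. The sketch below (variable names are illustrative) decodes the Outlook Express 5 string and the 6/30/99 string without its trailer, and computes the base64 form of CR LF as UTF-16BE.

```python
import base64

# Encoded strings quoted in comment 12 (pure ASCII, so bytes literals work).
oe5 = b"+AOo-t+AO0-ra"
build_0630 = b"+ABcA6g-t+AO0-ra"  # the trailing "+AA0AC" is left off

print(oe5.decode("utf-7") == "\u00eat\u00edra")  # True: "êtíra"

decoded_0630 = build_0630.decode("utf-7")
print(hex(ord(decoded_0630[0])))              # 0x17: stray leading control character
print(decoded_0630[1:] == "\u00eat\u00edra")  # True

# CR LF as UTF-16BE, base64-encoded with the '=' padding dropped, as UTF-7 does.
crlf_b64 = base64.b64encode("\r\n".encode("utf-16-be")).decode("ascii").rstrip("=")
print(crlf_b64)  # AA0ACg
```

Under these assumptions, the 6/30/99 output begins with an extra U+0017 control character, which would presumably render as the blank square mentioned above, and the quoted trailer +AA0AC matches the first five characters of AA0ACg, consistent with comment 17's observation that the line break was being base64-encoded rather than sent as plain CR LF.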
Comment 13 John G. Myers 1999-07-01 16:14:59 PDT
Didn't we decide to never send UTF-7 in mail messages?
Comment 14 Katsuhiko Momoi 1999-07-01 16:17:59 PDT
No, what we decided was that we would not ship with UTF-7 menu
for the mail composer. But someone can actually enable UTF-7 menu
easily via modifying the .xul file. I think we still need to send
correctly encoded characters in case someone enables the menu.
Comment 15 bobj 1999-07-02 14:47:59 PDT
Can we confirm (e.g., in the debugger or in standalone testing) that the converter
is not adding the extraneous bytes?  If so, we need to reassign it to
someone on the mail team.  Is this only happening for UTF-7?
Comment 16 bobj 1999-07-02 14:48:59 PDT
Actually, if we can confirm this is not the converter adding the extraneous
bytes, let's mark this bug FIXED and open a new bug.
Comment 17 nhottanscp 1999-07-02 16:18:59 PDT
I tried the same characters as Momoi-san; it looks like line feed (0x0A) is
encoded (so the message is encoded as one big line), although I don't know
whether we should encode it or not.
I also found that UTF-7 is actually included in the menu. I don't think I
added it; we should remove it later.
Comment 18 John G. Myers 1999-07-02 16:48:59 PDT
\u000a should not be encoded.
The MIME "text/*" top-level type requires all of its charsets to encode
line breaks as \x0d \x0a.
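Comment 18's point can be sketched in Python: normalize line breaks to CRLF first, then encode. This assumes CPython's built-in `utf-7` codec, which writes space, tab, CR and LF directly, so the line breaks stay as literal CR LF octets instead of ending up inside a base64 run. `to_crlf` is a hypothetical helper, not part of any Mozilla code.

```python
import re


def to_crlf(text: str) -> str:
    """Rewrite bare LF, bare CR and CRLF line breaks as CRLF."""
    # "\r\n" is tried first so an existing CRLF is not doubled.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


body = to_crlf("\u00eat\u00edra\nsecond line\n")
encoded = body.encode("utf-7")
print(encoded)  # b'+AOo-t+AO0-ra\r\nsecond line\r\n'
```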
Comment 19 bobj 1999-07-02 17:54:59 PDT
What he said.
Space (decimal 32), tab (decimal 9), carriage return (decimal 13),
and line feed (decimal 10) are allowed to be represented by their
ASCII equivalents.  Our converter probably should do so.

UTF-7 RFC (ftp://ftp.isi.edu/in-notes/rfc2152.txt):
   A UTF-7 stream represents 16-bit Unicode characters using 7-bit US-
   ASCII octets as follows:

   ...

      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
      (decimal 13), and line feed (decimal 10) characters may be
      directly represented by their ASCII equivalents. However, note
      that MIME content transfer encodings have rules concerning the use
      of such characters. Usage that does not conform to the
      restrictions of RFC 822, for example, would have to be encoded
      using MIME content transfer encodings other than 7bit or 8bit,
      such as quoted-printable, binary, or base64.
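Here is a minimal Python sketch of an encoder that follows Rule 3, assuming the directly written set is RFC 2152 Set D plus space, tab, CR and LF, with everything else (including the optional Set O characters) shifted into base64. `utf7_encode` is a hypothetical name, and this is not the Mozilla converter.

```python
import base64

# RFC 2152 Set D: characters that always represent themselves.
SET_D = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?"
)
# RFC 2152 Rule 3: space, tab, CR and LF may also be written directly.
RULE3_WHITESPACE = frozenset(" \t\r\n")
DIRECT = SET_D | RULE3_WHITESPACE
BASE64_ALPHABET = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
)


def utf7_encode(text: str) -> str:
    """Encode text as UTF-7, leaving Set D and Rule 3 whitespace unshifted."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in DIRECT:
            out.append(ch)
            i += 1
        elif ch == "+":
            out.append("+-")  # Rule 2: "+-" stands for a literal '+'
            i += 1
        else:
            # Collect the run of characters that must be base64-encoded.
            j = i
            while j < len(text) and text[j] not in DIRECT and text[j] != "+":
                j += 1
            run = text[i:j].encode("utf-16-be")
            out.append("+" + base64.b64encode(run).decode("ascii").rstrip("="))
            # Close with '-' when the next char could be misread as base64
            # (or is '-' itself), and for clarity at end of input; any other
            # direct character ends the shift implicitly.
            if j == len(text) or text[j] in BASE64_ALPHABET or text[j] == "-":
                out.append("-")
            i = j
    return "".join(out)


sample = "\u00eat\u00edra\r\n"    # "êtíra" followed by CR LF
print(repr(utf7_encode(sample)))  # '+AOo-t+AO0-ra\r\n'
```

Leaving CR and LF unshifted is what keeps the message from collapsing into one long line, as observed in comment 17.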
Comment 20 cata 1999-07-06 15:44:59 PDT
Right. Fixed.
Comment 21 Katsuhiko Momoi 1999-07-09 16:03:59 PDT
** Checked with 7/9/99 Win32 build **

I checked the sample string: êtíra
and confirmed that we are not encoding LF any more.
Thus the raw string looks like: +AOo-t+AO0-ra
which is what we desired for this.

I also checked "space" and "tab" and they are not
encoded, either.
I'm not sure about CR, however. I'm looking at the Windows
client, but we are not using CR+LF for line breaks.
I wonder if there is a convention or requirement to use Unix-style
LF only when sending mail.
In any case, I'm going to mark this fix verified now as all remaining issues
seem to have been verified.

I'll file a new bug to remove UTF-7 from the Charset menu for
M9 since we probably won't need it for regular debugging any more.
Comment 22 Katsuhiko Momoi 1999-07-09 16:12:59 PDT
A request to remove UTF-7 from the Mail Composer menu
has been filed as Bug 9555.
