Bug 3889 - Conversion failure: char corruption
Summary: Conversion failure: char corruption
Product: MailNews Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: x86 Windows NT
Importance: P3 normal
Target Milestone: M5
Assigned To: rhp (gone)
QA Contact: Katsuhiko Momoi
Depends on:
Reported: 1999-03-16 18:33 PST by Katsuhiko Momoi
Modified: 2008-07-31 01:22 PDT (History)


Description Katsuhiko Momoi 1999-03-16 18:33:34 PST
** Observed with 3/16/99 Mozilla Win32 build **

Send yourself 2 msgs:

1. One in HTML and one in plain text.
2. Both should contain a Japanese Subject line: "Kore ha Nihongo no Meeru desu."
3. The same wording as the subject line should be in the body text.
4. Send yourself these 2 msgs. Receive them and display them.

Note that both show character corruption right after
"Me" in "Meeru". Apparently the long vowel symbol is causing the problem.

Should check if this is a general conversion failure from JIS
to Unicode in this area.
Comment 1 Frank Tang 1999-03-16 21:34:59 PST
momoi, please clarify whether the sending was done on 5.0 or 4.5. Is this a
sending problem or a receiving problem? If the problem is receiving, then the
target should be M3; if the problem is sending, then mark it M4. I am marking it
as M3 for now, please change the target ASAP. Thanks.
Comment 2 Katsuhiko Momoi 1999-03-16 21:47:59 PST
The original msgs were sent by Comm4.51 and are correctly displayed
by the same client. This seems to be either a display/conversion problem
or a parsing problem in 5.0, i.e. mail headers and body may not be
parsed correctly when non-ASCII characters are there. TM should
be M3, therefore.
Comment 3 nhottanscp 1999-03-17 10:01:59 PST
I propose to test the exact same text by creating an ISO-2022-JP meta-tagged
HTML. This way, we can verify if this is a converter bug or somewhere in the
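
A test page of the kind proposed here could be generated roughly as follows. This is only a sketch: the function name `build_test_page` and the katakana spelling used for "Meeru" are assumptions, not taken from the actual test case.

```python
def build_test_page(body_text: str) -> bytes:
    """Build a small HTML page that declares ISO-2022-JP in a meta tag
    and is itself encoded in ISO-2022-JP (hypothetical helper)."""
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" '
        'content="text/html; charset=ISO-2022-JP">'
        "</head><body><p>" + body_text + "</p></body></html>"
    )
    return page.encode("iso2022_jp")

# Katakana assumed for "Meeru", written with escapes to keep this ASCII.
sample = "\u30e1\u30fc\u30eb"
page_bytes = build_test_page(sample)
```

Loading such a file directly in the browser exercises the charset converter and layout without involving the mail parsing code.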
Comment 4 Katsuhiko Momoi 1999-03-17 12:45:59 PST
This does not seem to be a straight conversion failure.
I see other kinds of corruption when messages get longer.
By the way, I see that the problem string I described above
gets displayed OK in some msgs (typically a 1-line msg) but gets
corrupted in longer ones.
1-line display test for the layout is at the above URL. The browser
has no problem with it.
Comment 5 nhottanscp 1999-03-17 14:00:59 PST
The problem does not happen with all Japanese characters.
It looks like it happens for some specific characters (of iso-2022-jp) which
conflict with HTML (e.g. 0x213C, where 3C is '<').
Since we have already decided to change libmime to use Unicode, the problem will
be solved when that change is available. Even in the new code, when to convert
to Unicode is important; we should consider this to avoid problems like this one.
Rich, can this be ready by M4?
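
The byte collision described here can be reproduced with Python's standard `iso2022_jp` codec. This is a minimal sketch, not the libmime code: `escape_bytes` is a hypothetical stand-in for any HTML processing that runs on raw bytes before charset conversion, and the katakana spelling of "Meeru" is assumed.

```python
import html

# Katakana assumed for "Meeru", written with escapes to keep this ASCII.
text = "\u30e1\u30fc\u30eb"

# ISO-2022-JP sends each JIS X 0208 character as two 7-bit bytes.
raw = text.encode("iso2022_jp")
lt_in_bytes = b"<" in raw  # the long vowel mark U+30FC is 0x21 0x3C

def escape_bytes(data: bytes) -> bytes:
    """Hypothetical byte-level HTML escaping done before conversion."""
    return data.replace(b"&", b"&amp;").replace(b"<", b"&lt;")

# Escaping first rewrites half of a two-byte character, so the
# decoded result no longer matches the original text.
corrupted = escape_bytes(raw).decode("iso2022_jp", errors="replace")

# Converting to Unicode first, then escaping text, keeps it intact.
safe = html.escape(raw.decode("iso2022_jp"))
```

The order of operations is the point: once the text is Unicode, '<' only ever means '<'.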
Comment 6 nhottanscp 1999-03-17 14:01:59 PST
Adding myself to cc-list.
Comment 7 rhp (gone) 1999-03-17 14:04:59 PST
My plate is really full for M4 already. I am doing some major reworking of
the output routines for libmime and I am not sure I will be able to address
this issue without some assistance.

- rhp
Comment 8 nhottanscp 1999-03-17 14:25:59 PST
We need to review where to put the Unicode converter in the new code. I18n eng.
can help with that; it is also possible to do a simple test (using the
characters which had problems in M3) when the code is checked in.
Comment 9 rhp (gone) 1999-04-03 07:17:59 PST
Just updating this bug. I probably won't be able to do any major investigation
in this area before M4. Naoki, since we are now generating XML for headers and
HTML for the body, we need to investigate the header issue to see the behavior
now. (Also, if the emitters need to be exporting charset information, I could
use a little help on what we should output.)

- rhp
Comment 10 nhottanscp 1999-04-15 16:54:59 PDT
I saw this working in my local build with Rich's change.
There is a bug in the UTF-8 encoder (#4968), so the text is truncated.
Comment 11 nhottanscp 1999-04-16 15:44:59 PDT
Rich, would you change the status to 'FIXED'?
Comment 12 Katsuhiko Momoi 1999-04-16 17:16:59 PDT
** Checked with 4/16/99 Win32 build **

Now that we can see non-ASCII body, I'm able to check on
this problem. I sent the same test string which failed originally
(see the URL above), and the result indicates that there is
still the same problem remaining in processing iso-2022-jp characters
containing 0x3C in the first or the 2nd byte.
As far as I can see this is not a truncation problem reported
Reopening it ...
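
For building further test strings, the characters whose iso-2022-jp bytes contain 0x3C can be enumerated with a short script. This is a sketch that relies on Python's `iso2022_jp` codec as the reference table; the function name `chars_with_lt_byte` is made up for illustration.

```python
def chars_with_lt_byte():
    """Return JIS X 0208 characters whose two-byte ISO-2022-JP form
    has 0x3C ('<') as the first or the second byte."""
    found = []
    for first in range(0x21, 0x7F):
        for second in range(0x21, 0x7F):
            if 0x3C not in (first, second):
                continue
            # ESC $ B switches to JIS X 0208, ESC ( B back to ASCII.
            seq = b"\x1b$B" + bytes([first, second]) + b"\x1b(B"
            try:
                found.append(seq.decode("iso2022_jp"))
            except UnicodeDecodeError:
                pass  # code point not assigned in this codec
    return found

affected = chars_with_lt_byte()
```

The result covers cell 0x3C of every row (second byte) plus all of row 0x3C (first byte), so the long vowel mark is only one of many candidates.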
Comment 13 Katsuhiko Momoi 1999-04-16 17:26:59 PDT
The display of the test string works on headers.
So maybe I'm just seeing the effects of the truncation bug in the body,
but this is hard to tell without seeing the rest of the string. There
is some indication that I'm seeing a messed-up HTML structure.
I'm going to leave the status as is but we need to look into this
Comment 14 nhottanscp 1999-04-16 18:10:59 PDT
I tried some of the test cases sent from momoi by using my local build.
My build has a hack to work around the UTF-8 truncation (#4968).
I saw the message body without truncation and it looks fine, except that some
characters display as boxes (but this can be seen in the thread pane).
So I think the new code is working fine. But the verification is not possible
because of the blocking bug (#4968).
Comment 15 nhottanscp 1999-04-16 20:36:59 PDT
I checked in error handling code which will reallocate the buffer in case the
converter's estimated length was incorrect, thus avoiding the truncation.
It will be included in the next build. Please verify this bug again with that
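
The general shape of that kind of error handling is sketched below. This is not the checked-in code: `encode_into` is a hypothetical converter that writes UTF-8 into a fixed-size buffer and reports how far it got, and `to_utf8` grows the buffer and retries when the estimate turns out to be too small.

```python
def encode_into(text: str, buf: bytearray):
    """Hypothetical converter: write UTF-8 for as many whole characters
    as fit in buf; return (chars_consumed, bytes_written)."""
    written = 0
    for consumed, ch in enumerate(text):
        chunk = ch.encode("utf-8")
        if written + len(chunk) > len(buf):
            return consumed, written  # buffer full before end of input
        buf[written:written + len(chunk)] = chunk
        written += len(chunk)
    return len(text), written

def to_utf8(text: str, estimate: int) -> bytes:
    """Convert with an estimated output size, reallocating on overflow
    instead of returning truncated output."""
    buf = bytearray(max(estimate, 1))
    while True:
        consumed, written = encode_into(text, buf)
        if consumed == len(text):
            return bytes(buf[:written])
        buf = bytearray(len(buf) * 2)  # estimate too small: grow, retry

sample = "\u30e1\u30fc\u30eb"  # three characters, nine bytes in UTF-8
result = to_utf8(sample, estimate=len(sample))
```

An estimate of one byte per character is too small for Japanese text in UTF-8, which needs three bytes per character here; without the retry loop the output would stop after the first character.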
Comment 16 bobj 1999-04-20 13:51:59 PDT
4968 was FIXED 4/19.
Comment 17 Katsuhiko Momoi 1999-04-20 18:52:59 PDT
** Checked with 4/20/99 Win32 build **

Now that the truncation bug has been fixed, I can now see
all the text in the mail body part.
The parsing failure does not occur with the original
problem string I used to file this bug. There are other
similar strings in other mail messages in my mailbox and
none show this type of parsing failure.

Marking the fix verified.
