Closed Bug 6890 Opened 25 years ago Closed 25 years ago

XML Parsing error (again) in the Message headers with 8-bit characters

Categories

(MailNews Core :: MIME, defect, P3)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: momoi, Assigned: nhottanscp)

Details

(Whiteboard: Regression: fails even M4 intl criteria)

** Observed with 5/20/99 Win32 build but not with 5/19/99 Win32 build **

I thought about re-opening Bug 4784, but since the symptoms are bizarre,
I decided to open a new one.
The outward appearance of the problem is the same as Bug 4784: I see
an error message that there is an XML parsing error in the
message pane headers of almost all the messages with 8-bit characters
in them. Here are some steps to reproduce this bug:

1. Choose a mailbox containing some 8-bit Subject headers (e.g. French,
   Japanese).
2. Throw away the summary file and start your 5/20 build.
3. Now view the Message pane headers. You may or may not see the
   XML parsing error. Now quit and re-start.
4. This time, if you did not see the parsing errors, you will probably see
   them now. If you did see the problem in step 3, you may not see it
   this time.
5. Try quitting and re-starting again. This time, the conditions in
   Step 4 may reverse.

I have experienced flip-flops as described in steps 3-5
several times, enough to think that this is bizarre. But the errors are
there often enough that this should be fixed for M6. This is
really a regression. For International Mail, we go back to
the pre-M4 state.

There are 2 types of errors. All 2-byte headers (e.g. Japanese)
produce XML parsing errors. Some 1-byte accented characters
produce parsing errors, while others simply misdisplay the
accented characters without the parsing error. These are the specific
problems when the errors occur.

I tried the 5/19 build and followed the same steps but was not able to
reproduce this bug.
QA Contact: 4080 → 1308
This fails our International Smoketest for M6.
Please fix it for M6 -- unless it's a 1-day wonder which will
go away.
Does 8-bit mean raw 8-bit or MIME encoded? Only MIME encoded headers are
supported currently. If we cannot see MIME encoded headers then that's a
regression bug.
The pref to send MIME encoded headers is this:
user_pref("mail.strictly_mime_headers", true);
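To illustrate the distinction asked about above, here is a minimal Python sketch (not Mozilla code) contrasting a MIME-encoded header with raw 8-bit bytes. The sample subject and the helper names `decode_mime_header` and `decode_raw_header` are assumptions made for illustration; only the standard-library `email.header` functions are real APIs.

```python
from email.header import decode_header, make_header


def decode_mime_header(value: str) -> str:
    """Decode RFC 2047 encoded-words (=?charset?Q?...?=) to Unicode."""
    return str(make_header(decode_header(value)))


def decode_raw_header(raw: bytes, assumed_charset: str = "utf-8"):
    """Try to decode unlabeled 8-bit bytes; return None if the guess is wrong."""
    try:
        return raw.decode(assumed_charset)
    except UnicodeDecodeError:
        return None


# Hypothetical sample: "Cafe" with an acute accent on the e,
# labeled as ISO-8859-1 and Q-encoded.
encoded = "=?iso-8859-1?Q?Caf=E9?="

# The same text as raw 8-bit bytes carries no charset label,
# so the reader has to guess the charset.
raw = b"Caf\xe9"

mime_text = decode_mime_header(encoded)
wrong_guess = decode_raw_header(raw)               # UTF-8 guess fails
right_guess = decode_raw_header(raw, "iso-8859-1")

print(wrong_guess is None)        # True
print(mime_text == right_guess)   # True
```

The MIME-encoded form names its own charset, which is presumably why only that form could be supported reliably at this point.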
Almost all the headers in my test messages are MIME-encoded, and they
are showing this error. So indeed this is a regression. The
symptoms are almost identical to 4784 except for the intermittent
nature of the problem.
Whiteboard: Regression: fails even M4 intl criteria
Target Milestone: M6
** Re-checked with 5/23/99 Win32 build **

This problem is continuing with this new build, so this is
not a 1-day problem. With this error we cannot show
Japanese Message headers and of course headers in other languages.
We cannot possibly ship M6 with this.
Setting TFV to M6 and copying choffmann.
Status: NEW → ASSIGNED
Can you send me the message that is showing this problem? That would be very
helpful in my debugging.

- rhp
Momoi san, I tested with windows/32bit/x86/1999-05-23-08-M6 and also with my local tree,
updated this morning, but cannot see this problem. Let's check again today.
Rich, I will send MIME encoded Latin1 mail using 4.5.
FYI, an easy way to enter an 8-bit char is Alt+202 on the num pad.
I have several messages with this type of encoding in the headers and I'm not
seeing the problem either.

- rhp
I was able to reproduce this problem again in yesterday's build.

As I said in the original report, the problem is more easily reproducible
with Japanese headers than Latin 1 headers though I have observed
it in both.

One critical step is that you might need to try re-starting
several times. Today I had to re-start 3 times before I got the
parsing errors on the Japanese msgs which were showing the
headers OK before these re-starts. The problem might go away
after that and they also come back again when you try re-starting
Messenger again and again.

At times, I saw this problem right after I deleted the .msf file
and re-started. Other times I didn't. There is some inconsistency in
getting the problem to reproduce, but it surely does occur.
Summary: XML Parsing error (again) in the Message headers with 8=bit characters → XML Parsing error (again) in the Message headers with 8-bit characters
This really sounds like a problem with the data I am receiving from the mail
store and not a parsing error. What is the XML error you are seeing? I can
probably tell from the message if you are getting valid data or not.

- rhp
I made some images of the bug in action under NT4-Japanese,
and placed them here:

http://rocknroll/users/momoi/publish/bugs/6890/
I see that one of the images, http://rocknroll/users/momoi/publish/bugs/6890/image6.JPG,
contains a string that looks like ISO-2022-JP, which should have been converted to UTF-8
instead.
Could any of this be related to ftang's fix that needs to be checked in? I know
that there are other bugs (e.g. vCard display with 8-bit data) that will be
fixed when he gets his stuff into the tree. I just can't get this thing to
reproduce and I am a bit at the end of my rope on this one for now :-(

- rhp
I said before that this problem is hard to reproduce on NT4-US.
Well, no more. I tried the "Smoketest" mailbox file I sent to rhp
on my laptop's NT4-US. I normally use the Japanese NT4 on this laptop,
but since I couldn't reproduce this problem on the NT4-US at work,
I tried the laptop one and I had no problem reproducing this problem.

There are some differences between the laptop NT4-US and the one I use
at work on a desktop. One is that I have things like VC++ 4.2 on
the work machine but not on my laptop. In any case, I got the problem
to appear at the 3rd restart on this mailbox file (the same one I
had sent to rhp):

http://rocknroll/users/momoi/publish/bugs/6890/smoketest.zip

Just select the first 2 msgs for display. If the problem doesn't show,
then quit, re-start and try the same 2 msgs, etc. If you happen to be
on a machine which has a problem, you should see the problem after 3-4
restarts and tries.
Assignee: rhp → nhotta
Status: ASSIGNED → NEW
Ok, I am seeing this problem now and after much tracking (thanks momoi for the
test messages), I know where it is failing, but unfortunately, I don't know
why. The line that is failing randomly is in comi18n.cpp and the exact line
that fails is:

        res = ccm->GetUnicodeEncoder(&aCharset, &encoder);

Now all of the arguments going into that routine are fine and I've stepped
through the function and seen it work, so it is a random problem with the
GetUnicodeEncoder() call.

I have stepped through the code with the test messages in smoketest.zip
and I've traced the first message being displayed correctly
AND incorrectly. The reason XML parsing fails is that when we go to convert
the string to UTF-8 and it fails, we end up escaping the wrong characters and
the "<" and ">" are not in the correct location for the XML parser. This isn't
good either and I will look into that, but it's not the real bug. The real bug
is finding why we can't locate an encoder. The input charset is "iso-8859-1"
and I am requesting "UTF-8" output.
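
A rough illustration of why a failed charset conversion can surface as an XML parsing error, written as a Python sketch rather than the actual libmime code. The element name `subject`, the sample header value, and both helper functions are hypothetical; the unconverted path only simulates passing ISO-8859-1 bytes through unchanged and does not reproduce the exact mis-escaping described above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML (UTF-8 assumed by default)."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def header_to_xml(raw: bytes, charset: str) -> bytes:
    """Convert to Unicode first, then escape markup, then emit UTF-8."""
    text = raw.decode(charset)
    return ("<subject>" + escape(text) + "</subject>").encode("utf-8")


def header_to_xml_unconverted(raw: bytes) -> bytes:
    """Simulate a failed conversion: bytes go out unconverted and unescaped."""
    return b"<subject>" + raw + b"</subject>"


# Hypothetical ISO-8859-1 header value containing an accented character
# and literal angle brackets.
latin1_value = b"Caf\xe9 <user@example.com>"

print(is_well_formed(header_to_xml(latin1_value, "iso-8859-1")))  # True
print(is_well_formed(header_to_xml_unconverted(latin1_value)))    # False
```

In the second case the parser rejects both the byte 0xE9, which is not valid UTF-8 on its own, and the unescaped angle brackets, so the symptom appears in the XML layer even though the root cause is the missing converter.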

Naoki, I am assigning this one over to you because I don't know where to go
next with this encoder/decoder problem. If you use the smoketest.zip and just
run apprunner.exe in the debugger, you will eventually get to the point where you see the
problem. If I can be of any help, please let me know.

- rhp

For what it is worth, the stack at that point is:

INTL_ConvertCharset(const char * 0x017ef48c, const char * 0x04498304, const
char * 0x043d84f0, const int 15, char * * 0x017ef488) line 1234
MIME_ConvertString(const char * 0x017ef48c, const char * 0x04498304, const char
* 0x043d84f0, char * * 0x017ef488) line 1378 + 34 bytes
mime_convert_rfc1522(const char * 0x043df5a0, int 34, const char * 0x00000000,
const char * 0x0449204c, char * * 0x017ef544, int * 0x017ef548, void *
0x043dd350) line 204 + 28 bytes
MimeHeaders_convert_rfc1522(MimeDisplayOptions * 0x043dd1f0, const char *
0x043df5a0, int 34, char * * 0x017ef570, int * 0x017ef56c) line 901 + 35 bytes
MimeHeaders_convert_header_value(MimeDisplayOptions * 0x043dd1f0, char * *
0x017ef5a0) line 117 + 27 bytes
MimeHeaders_write_all_headers(MimeHeaders * 0x043df670, MimeDisplayOptions *
0x043dd1f0, int 0) line 2014 + 13 bytes
MimeMessage_write_headers_html(MimeObject * 0x043dd160) line 591 + 21 bytes
MimeMessage_close_headers(MimeObject * 0x043dd160) line 344 + 9 bytes
MimeMessage_parse_line(char * 0x03e639d8, int 2, MimeObject * 0x043dd160) line
211 + 9 bytes
convert_and_send_buffer(char * 0x03e639d8, int 2, int 1, int (char *, unsigned
int, void *)* 0x044773c0 MimeMessage_parse_line(char *, int, MimeObject *),
void * 0x043dd160) line 148 + 15 bytes
mime_LineBuffer(const char * 0x00d84648, int 112, char * * 0x043dd188, int *
0x043dd190, unsigned int * 0x043dd198, int 1, int (char *, unsigned int, void
*)* 0x044773c0 MimeMessage_parse_line(char *, int, MimeObject *), void *
0x043dd160) line 235 + 29 bytes
MimeObject_parse_buffer(char * 0x00d842c0, int 1016, MimeObject * 0x043dd160)
line 218 + 49 bytes
mime_display_stream_write(_nsMIMESession * 0x043dd0d0, const char * 0x00d842c0,
int 1016) line 282 + 20 bytes
MimePluginInstance::Write(MimePluginInstance * const 0x043dd854, const char *
0x00d842c0, unsigned int 1016, unsigned int * 0x017efb58) line 379 + 20 bytes
plugin_stream_write(_NET_StreamClass * 0x043dd030, const char * 0x00d842c0,
long 1016) line 68 + 24 bytes
net_read_file_chunk(_ActiveEntry * 0x043ddb30) line 956 + 27 bytes
net_ProcessFile(_ActiveEntry * 0x043ddb30) line 1327 + 9 bytes
NET_ProcessNet(PRFileDesc * 0x00000000, int 1) line 3355 + 13 bytes
nsNetlibThread::NetlibMainLoop() line 304 + 9 bytes
nsNetlibThread::NetlibThreadMain(void * 0x00c057f0) line 260
_PR_NativeRunThread(void * 0x00c05600) line 379 + 13 bytes
_threadstartex(void * 0x00c05450) line 212 + 13 bytes
I have tried (7 times) to reproduce this with the same data Rich used with my
local build, without success.
Adding cata@netscape.com to cc since he owns the encoder code.
I will keep trying to reproduce this. But could someone also try this with
today's build?
I looked at 5/25/99 M7 build which presumably includes ftang's
fixes from yesterday. No change there. I can reproduce the problem
on my NT4-J.
It's not reproducible on my machine nor on Cata's.
I put in printf calls to debug on momoi's machine (it's about 20-30% occurrence on that
machine). The result confirms Rich's comment. It is failing to create the
Encoder. Input charset is "UTF-8" and the converter manager return value is
80500001 (NS_ERROR_UCONV_NOCONV). Neither the service manager nor the decoder creation
fails.
nsCharsetConverterManager::GetCharsetConverter in file
nsCharsetConverterManager.cpp returns this result code. I need Cata's help to
debug this code.

There is a service manager related issue (i.e. NS_WITH_SERVICE), but it does not
look related to this bug. The bug happens even without using
NS_WITH_SERVICE.

By the way, the earliest build where we can see this bug is the 5/18 build. It is not possible to
reproduce on some machines, and even where it is reproducible it does not always happen (about
20-30%).

I will look at this again tomorrow inside the encoder with Cata's help.
Assignee: nhotta → cata
I was able to dump the internal table of the converter manager.
Looks like the table got screwed after the 17th entry. And no entry for UTF-8.
Reassign to cata@netscape.com for further investigation.

#####FAILED: GetCharsetConverter
result = -2142240767 80500001
aSize = 42
0 ISO-8859-1
1 ISO-8859-2
2 ISO-8859-3
3 ISO-8859-4
4 ISO-8859-5
5 ISO-8859-6
6 ISO-8859-7
7 ISO-8859-8
8 ISO-8859-9
9 windows-1250
10 windows-1251
11 windows-1252
12 windows-1253
13 windows-1254
14 windows-1257
15 x-mac-roman
16 x-mac-ce
17 x-mac-ce
18 x-mac-ce
19 x-mac-ce
20 x-mac-ce
21 x-mac-ce
22 x-mac-ce
23 x-mac-ce
24 x-mac-ce
25 x-mac-ce
26 x-mac-ce
27 x-mac-ce
28 x-mac-ce
29 x-mac-ce
30 x-mac-ce
31 x-mac-ce
32 x-mac-ce
33 x-mac-ce
34 x-mac-ce
35 x-mac-ce
36 x-mac-ce
37 x-mac-ce
38 x-mac-ce
39 x-mac-ce
40 x-mac-ce
41 x-mac-ce
#####FAILED to get encoder
Charset = UTF-8
result = -2142240767 80500001
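
The dump above can be checked mechanically. The following Python sketch (an illustration, not the printf code used on momoi's machine) copies the 42 dumped names into a list, shows that the signed and hex forms of the result code are the same 32-bit value, finds where the entries start repeating, and confirms that UTF-8 is absent. The helper names are hypothetical.

```python
# Table contents copied from the dump above (aSize = 42).
DUMPED_TABLE = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5",
    "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9",
    "windows-1250", "windows-1251", "windows-1252", "windows-1253",
    "windows-1254", "windows-1257",
    "x-mac-roman",
] + ["x-mac-ce"] * 26  # indices 16 through 41


def as_unsigned_hex(signed_code: int) -> str:
    """Reinterpret a signed 32-bit result code as unsigned hex."""
    return format(signed_code & 0xFFFFFFFF, "08x")


def first_repeated_index(table):
    """Index of the first entry equal to its predecessor, or None."""
    for i in range(1, len(table)):
        if table[i] == table[i - 1]:
            return i
    return None


def has_charset(table, name: str) -> bool:
    """Case-insensitive membership test for a charset name."""
    return any(entry.lower() == name.lower() for entry in table)


print(len(DUMPED_TABLE))                    # 42
print(as_unsigned_hex(-2142240767))         # 80500001
print(first_repeated_index(DUMPED_TABLE))   # 17
print(has_charset(DUMPED_TABLE, "UTF-8"))   # False
```

Entries 0 through 16 are distinct and everything from index 17 on repeats `x-mac-ce`, which matches the observation that the table is bad after the 17th entry and has no UTF-8 entry to return.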
Target Milestone: M6 → M7
Kat mentioned the problem that occurs in Message headers with 8-bit chars. I'm
encountering (again, it wasn't there for a while) the problem with 8-bit Message
bodies. On my machine a message with 8-bit chars shows garbage in the body when
viewing (sending is all right). The confusing thing, and the reason it looks like a random
problem, is that I see the 8-bit message from Kat's Smoketest folder just fine. There
is no problem viewing this message in 4.6.

****** observed with 5/26, 5/27 and 5/28 builds ******
Cata, please check in your part of the fix then reassign to dp.
Assignee: cata → dp
My part of fix is in, reassigned to dp to investigate the FindFactory failure.
Assignee: dp → nhotta
I would prefer to open another bug on the FindFactory() failing part and make
this depend on it if that is ok.
Depends on: 7308
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Marking this as fixed.
With Cata's fix, there is going to be very little chance of seeing the current
problem (unless FindFactory for the UTF-8 converter fails).
Dp, please open the new bug for FindFactory().
Status: RESOLVED → VERIFIED
** Checked with 6/11/99 Win32 build **

I have started and re-started Messenger about 10 times
on the NT4 machine I had the original problem with.
In none of these 10 times did I see the original XML parser problem.
Prior to the fix, I would have seen this problem at about the
3rd or 4th try.
This result gives me confidence to say that the fix works.
Marking it verified fixed.
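As a rough sanity check on the verification above, the Python sketch below estimates how likely 10 clean restarts would be if the bug were still present. It assumes each restart fails independently with the 20-30% rate reported earlier in this bug; the thread does not establish independence, so treat the numbers as a back-of-the-envelope estimate. The function name is hypothetical.

```python
def prob_no_failure(p_fail: float, restarts: int) -> float:
    """Chance of seeing no failure in `restarts` independent tries."""
    return (1.0 - p_fail) ** restarts


# Failure rates taken from the 20-30% estimate reported earlier.
for p_fail in (0.2, 0.3):
    chance = prob_no_failure(p_fail, 10)
    print(f"p_fail={p_fail:.1f}: {chance:.3f}")
# p_fail=0.2: 0.107
# p_fail=0.3: 0.028
```

Under that assumption, ten clean restarts would happen by chance roughly 3% to 11% of the time with the bug unfixed.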
Product: MailNews → Core
Product: Core → MailNews Core