Last Comment Bug 8899 - Yahoo Japan (EUC) Page as attachment cannot be viewed inline
Status: VERIFIED FIXED
Product: MailNews Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: x86 Windows NT
Importance: P1 critical
Target Milestone: M10
Assigned To: nhottanscp
QA Contact: Katsuhiko Momoi
URL: http://home.netscape.com/ja
Duplicates: 8903
Depends on: 10605
Reported: 1999-06-25 17:44 PDT by Katsuhiko Momoi
Modified: 2008-07-31 01:22 PDT


Attachments
three example cases for incorrect charset detection (nsIStringCharsetDetector) (1.64 KB, text/plain)
1999-07-01 12:00 PDT, nhottanscp
Shift_SJIS text file contain a line - the same data as 1st case of the previous dump (37 bytes, text/plain)
1999-07-15 15:16 PDT, nhottanscp
Shift_SJIS text file contain a line - the same data as 3rd case of the previous dump (77 bytes, text/plain)
1999-07-15 15:16 PDT, nhottanscp
One line EUC text file. (152 bytes, text/html)
1999-08-10 14:11 PDT, nhottanscp

Description Katsuhiko Momoi 1999-06-25 17:44:50 PDT
** Observed with 6/25/99 Win32 M8 build **

When we send HTML pages using 4.6 or 4.7, some of these pages arrive
without the "Content-Disposition: inline" header. From the mail discussion
we had today and by actually using the pref item:

mail.inline_attachments

the current default seems to be "true".

But attachments like the above URL only show up as a link and
are not displayed inline. Here's what the headers look like:

--------------F3E7E5FE4178CFD21E1EBEBE
Content-Type: text/html
Content-Transfer-Encoding: base64
Content-Base: "http://home.netscape.com/ja/"
Content-Location: "http://home.netscape.com/ja/"

Note the absence of Content-Disposition line.
Messages which have the CTE line are displayed inline.

In 4.x, we actually did not listen to the CTE, but if the "View |
View attachment inline" menu is chosen, it shows any "displayable"
msg inline even without the CTE header.

1. This is Bug #1 in 5.0, i.e. the pref default is not working
   when the CTE is absent.
2. Issue #2:

In 5.0, we probably should consider listening to the CTE and rely
on the menu setting only if the CTE line is absent.

Even if we don't enable this menu item till later, we might
want to turn on this CTE-honoring in the backend now. Or is there
some reason against trusting the CTE? I don't know enough
about this issue to know if we should change 4.x behavior, which
ignored the CTE.
Comment 1 lchiang 1999-06-25 17:50:59 PDT
<update QA contact>
Comment 2 rhp (gone) 1999-06-26 13:29:59 PDT
Actually, I have a guess at what's going on here. Naoki, tell me if this makes
sense. This isn't an issue with the "mail.inline_attachments" pref or the
content-disposition header. They are working as they should, but gecko is not
displaying the output from libmime for the following reason.

First, to see that we are outputting the page inline, do the following:

1 - bring up messenger 5.0 and display the problem message
2 - now bring up a 5.0 browser window and load the URL:
file:///c:/temp/tempMessage.eml?header=none
(note: you probably won't see anything past the URL)
3 - Do a "View Source" - notice how there is source output for the entire web
page.

Now, this is what I think is happening. We start decoding everything to UTF-8
and the message and the body part is encoded with charset = "iso-2022-jp". When
we hit the web page, we start decoding that message to UTF-8, but there is no
charset= on the part, so we fall back to the body, which is iso-2022-jp. Now,
we do this conversion and output to Gecko, but the web page has a <META
HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=x-sjis"> line. I assume that
Gecko is listening to this and trying to display UTF-8 data (which is probably
wrong to begin with) as x-sjis.
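
A minimal sketch of the two-step mismatch described above, using Python's standard codecs as a stand-in for libmime and Gecko. The sample text and the use of errors="replace" are assumptions made for illustration; the real conversion code may fail in a different way.

```python
# Illustration only: Python's built-in codecs stand in for libmime and Gecko.
text = "日本語"  # assumed sample text, not taken from the attached page

# The attached page is really Shift_JIS (labelled x-sjis in its META tag).
page_bytes = text.encode("shift_jis")

# Step 1: the part has no charset=, so the body charset (iso-2022-jp) is
# used for the conversion. The bytes are not ISO-2022-JP, so the result
# is not the original text.
converted = page_bytes.decode("iso2022_jp", errors="replace")
utf8_output = converted.encode("utf-8")

# Step 2: the renderer trusts the page's own META charset and reads the
# UTF-8 output as Shift_JIS, which garbles it a second time.
displayed = utf8_output.decode("shift_jis", errors="replace")

# Control: decoding with the charset the page was written in works.
correct = page_bytes.decode("shift_jis")
```

Only the control path recovers the text; both mismatched steps produce something other than the original string.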

So, the bug about the content-disposition is invalid, but this is a problem we
need to figure out. Naoki, do you have any ideas?

Here is the output from libmime for the message body:

< META HTTP-EQUIV ="Content-Type" ="text/html; charset=UTF-8">< !doctype html
public "-//w3c//dtd html 4.0 transitional//en">< html>
ã..ã..ã.¯SJISã.®ã..ã.¼ã.¸ã.§ã..ã..< br>& nbsp;< p>< A HREF
="http://home.netscape.com/ja/"> http://home.netscape.com/ja/< /A>< /html><
BASE HREF ="http://home.netscape.com/ja/">< HTML>< HEAD>< TITLE> Netcenter
.Ö.æ.¤.±.»< /TITLE>< META HTTP-EQUIV ="Content-Type" ="text/html;
charset=x-sjis">< META http-equiv =PICS-Label ='(PICS-1.1
"http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0)'>  < META
http-equiv =PICS-Label ='(PICS-1.1 "http://www.classify.org/safesurf/" l gen
true r (SS~~000 1))'

Comment 3 rhp (gone) 1999-06-26 13:36:59 PDT
Actually, view inline is working; the real problem mentioned in this bug
is Bug #8903. Since I have that logged to me, I am
going to close this one.

- rhp
Comment 4 Katsuhiko Momoi 1999-06-26 15:33:59 PDT
You're right. The Content-Disposition issue seems to be bogus. I manually eliminated
the inline disposition line from one of the attachment msgs
that were displaying OK, and it still displayed OK.

Rich, there is a problem with your theory in that I have many other
msgs in which JPN attachments are all showing OK and all of them
have meta tags like "EUC-JP", "Shift_JIS", and "x-sjis" in addition
to the one matching the Japanese mail charset, "ISO-2022-JP".

But there is no problem showing them inline. I looked at Content-Base
and Content-Location headers for a clue but that did not help.

So there is something strange about attaching Netscape Japanese
Home Page -- which is the only page so far showing this problem.
Accordingly I modified the summary line.

I'm sending you and Naoki a mailbox file which contains a number of
messages -- the only one with this problem is the "NetCenter..." one
from Netscape Japanese Home Page.

If there is no objection to this, I'll re-open this bug later.
Comment 5 rhp (gone) 1999-06-26 15:46:59 PDT
I still think there will be a problem with the fact that we are going to try to
convert the attachment to UTF-8 and if we don't have a valid charset on that
parts Content-Type header, we will drop back to the one for the message itself
and if that isn't specified, we go back to us-ascii. All in all, I think we are
creating bogus UTF-8 for this page.

- rhp
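
The fallback order described in this comment can be sketched as follows. This is an illustration under assumptions, not the libmime code: the function name effective_charset and the lowercasing of the result are hypothetical.

```python
from typing import Optional

def effective_charset(part_charset: Optional[str],
                      message_charset: Optional[str]) -> str:
    """Pick the source charset used when converting a part to UTF-8.

    Order taken from the comment above: the part's own Content-Type
    charset, then the charset of the message itself, then us-ascii.
    (Hypothetical helper; libmime's real logic may differ.)
    """
    for candidate in (part_charset, message_charset):
        if candidate and candidate.strip():
            return candidate.strip().lower()
    return "us-ascii"

# The problem attachment: no charset= on the part, ISO-2022-JP body.
chosen = effective_charset(None, "ISO-2022-JP")
```

With these inputs the attachment is converted as iso-2022-jp, whatever charset the attached page was actually written in.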
Comment 6 Katsuhiko Momoi 1999-06-26 15:52:59 PDT
Naoki recently put in a Japanese auto-detection hack for attachments
in case the content-type charset parameter indicates that the main
body is in Japanese. My understanding is that this is why all
the JPN attachments are showing OK. There will of course be
a problem showing any other charset.
Comment 7 Katsuhiko Momoi 1999-06-26 19:25:59 PDT
I was able to re-create another msg which shows this problem.

Attach this page under HTML mail:

   1. http://kaze:20020/xsjis2.html

   This page was made with 2 modifications to the original

   2. http://kaze:20020/xsjis.html

   This latter is shown inline when sent as attachment. The former
   is not.

   The differences between these pages are as follows:

   A. I changed the Japanese <TITLE> ... </TITLE> in page 2 to the same one
      as the Netscape Japanese Home Page.
   B. I inserted a number of ascii lines  in page 1 before we get to the
      Japanese body part (<PRE> ... </PRE>) -- you see them displayed.

This problem then seems to depend on the type of data in the attached
page. It could be that Japanese auto-detection is failing with this
kind of page and gets into a condition rhp describes.

I'm re-opening the bug and re-assigning it to nhotta and changing the
Component to international and assigning myself as QA Contact.
Comment 8 nhottanscp 1999-06-28 10:06:59 PDT
Accepting.
Comment 9 nhottanscp 1999-06-28 11:34:59 PDT
I verified that the problem is in the auto detect implementation.
That part is going to be replaced by the new XPCOM interface
(nsIStringCharsetDetector) for M8. I will test this bug when I finish that
migration.
One change for M8 is that the auto-detection choice will be made by a pref instead of
using the main body charset as a hint (you can send comments on this issue to
me or the mozilla i18n newsgroup).
Comment 10 nhottanscp 1999-07-01 12:00:59 PDT
Created attachment 655 [details]
three example cases for incorrect charset detection (nsIStringCharsetDetector)
Comment 11 nhottanscp 1999-07-01 12:12:59 PDT
I checked in mime/src/comi18n.cpp which now uses nsIStringCharsetDetector.
Note that the new charset detector also detects incorrectly in some cases (see
the attachment). Of the two examples by momoi, the first is not shown inline
because of the wrong charset detection. In the second, some lines are
detected incorrectly (this case is shown inline, but some lines
display incorrectly).
So the remaining problem is the accuracy of the charset detectors. Assigning to Frank
and setting to M10 (I don't think this problem blocks other testing).
Also, there is the issue of whether we should show the attachment non-inline or show garbage in
case of detection failure, but that should probably be discussed separately.
Comment 12 nhottanscp 1999-07-06 09:52:59 PDT
*** Bug 8903 has been marked as a duplicate of this bug. ***
Comment 13 Frank Tang 1999-07-07 13:02:59 PDT
I have no idea what these three examples in the attachment mean. Are they EUC-JP
data or Shift_JIS data?
Comment 14 nhottanscp 1999-07-07 13:22:59 PDT
They are all Shift_JIS, from the two URLs http://kaze:20020/xsjis2.html and
http://kaze:20020/xsjis.html. The second result was correct. Both the first and
the third got wrong detection results.
Comment 15 Frank Tang 1999-07-14 11:59:59 PDT
Checked in the fix. Please verify.
Comment 16 nhottanscp 1999-07-15 15:16:59 PDT
Created attachment 900 [details]
Shift_SJIS text file contain a line - the same data as 1st case of the previous dump
Comment 17 nhottanscp 1999-07-15 15:16:59 PDT
Created attachment 901 [details]
Shift_SJIS text file contain a line - the same data as 3rd case of the previous dump
Comment 18 Katsuhiko Momoi 1999-07-15 15:19:59 PDT
** Checked with 7/15/99 Win32 M9 Build **

I was expecting to see the attached page displayed inline, but that did not
turn out to be the case. Also, when I clicked on the link, the page did not display
as Japanese either. Whatever change you made is not having the desired effect.

Re-opening it.
Comment 19 leger 1999-07-19 11:19:59 PDT
Clearing Fixed resolution due to ReOpen of this bug.
Comment 20 Frank Tang 1999-07-19 11:21:59 PDT
I need to change the cp1252 verifier to a better one.
Comment 21 Frank Tang 1999-07-22 15:30:59 PDT
Put in a temp fix by removing the UCS2BE, UCS2LE, and CP1252 verifiers from the
string-based version. Need to create a better CP1252 verifier for this case.
Naoki, if the temp fix works, then please DO NOT CLOSE THE BUG, but move it to
M10. I want to put in a better CP1252 verifier for this. Thanks
Comment 22 Katsuhiko Momoi 1999-07-23 13:24:59 PDT
The new module seems to be working better with the Browser
though it fails on an unlabeled ISO-2022-JP page.

On the Mail side, this causes a crash with all the attachment
test cases (ISO-2022-JP, EUC-JP, and Shift_JIS).

Basically, as Messenger begins to load an attachment,
it crashes.

Here's part of what I sent to Talkback.


Trigger Type:  Program Crash
Trigger Reason:  Access violation
Call Stack:    (Signature = nsXPCOMStringDetector::Report f121549c)
nsXPCOMStringDetector::Report [d:\builds\seamonkey\mozilla\intl\chardet\src\nsPSMDetectors.cpp, line 421]
MimeCharsetConverterClass::Convert [d:\builds\seamonkey\mozilla\mailnews\mime\src\comi18n.cpp, line 1410]
MIME_ConvertCharset [d:\builds\seamonkey\mozilla\mailnews\mime\src\comi18n.cpp, line 1549]
mime_convert_charset [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemoz2.cpp, line 160]
MimeInlineText_rotate_convert_and_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimetext.cpp, line 292]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeInlineText_parse_decoded_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimetext.cpp, line 237]
mime_decode_base64_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeenc.cpp, line 300]
MimeDecoderWrite [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeenc.cpp, line 603]
MimeLeaf_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeleaf.cpp, line 149]
MimeMultipart_parse_child_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemult.cpp, line 538]
MimeMultipart_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemult.cpp, line 207]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeObject_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeobj.cpp, line 220]
MimeMessage_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemsg.cpp, line 172]
convert_and_send_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 113]
mime_LineBuffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimebuf.cpp, line 235]
MimeObject_parse_buffer [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimeobj.cpp, line 220]
MimeMessage_parse_line [d:\builds\seamonkey\mozilla\mailnews\mime\src\mimemsg.cpp, line 110]
MimePluginInstance::Write [d:\builds\seamonkey\mozilla\mailnews\mime\src\plugin_inst.cpp, line 371]
plugin_stream_write [d:\builds\seamonkey\mozilla\network\cnvts\cvplugin.cpp, line 69]
net_read_file_chunk [d:\builds\seamonkey\mozilla\network\protocol\file\mkfile.c, line 964]
net_ProcessFile [d:\builds\seamonkey\mozilla\network\protocol\file\mkfile.c, line 1328]
NET_ProcessNet [d:\builds\seamonkey\mozilla\network\main\mkgeturl.c, line 3363]
ntdll.dll + 0x74fd (0x77f674fd) 0x0010c200
Comment 23 Katsuhiko Momoi 1999-07-23 13:29:59 PDT
If you want to see the full reports, you can view them here
by clicking on the Bug number on this page.

http://cyclone/reports/reporttemplate.cfm?style=1&reportID=1099
Comment 24 nhottanscp 1999-07-23 13:36:59 PDT
In my local build, I don't see the crash.
Instead libmime got result code 0x02be1fb0 after DoIt() call.
Libmime uses the main body's charset for this case (ISO-2022-JP)
and the attachment is displayed incorrectly or as a link.
Comment 25 Katsuhiko Momoi 1999-07-23 13:45:59 PDT
The crash does not occur if I don't explicitly have the following
line in the prefs50.js.

user_pref("intl.charset.detector", "japsm");

But then I only get the ISO-2022-JP attachment showing correctly.
I thought we defaulted to the Japanese detection module if
no prefs50.js is defined for detector. Or has that been
changed?
Comment 26 nhottanscp 1999-07-23 13:56:59 PDT
>I thought we defaulted to the Japanese detection module if
>no prefs50.js is defined for detector. Or has that been
>changed?
That's a bug in my code. It never fell back to 'japsm' in the first place.
Please file a separate bug for that. I can implement a fallback to 'japsm' or do
no charset detection (maybe this is better).
Comment 27 Katsuhiko Momoi 1999-07-27 13:24:59 PDT
Per ftang's request, the crash part of the bug was split into
Bug 10605.
The current bug cannot be verified until this new bug is fixed.
The dependency is also marked.
Comment 28 Frank Tang 1999-07-27 14:01:59 PDT
I have fixed 10605. Please verify again. Thanks
Comment 29 Katsuhiko Momoi 1999-07-29 13:18:59 PDT
With 7/29/99 Win32 (Necko) build, JIS and Shift_JIS attachments now can
be viewed inline. EUC pages don't load inline, however. EUC detection
is working on the Browser side with the JPN detector turned on. So,
this seems to be a mail-specific issue.
Sending it to Naoki with Target Milestone M10.
Comment 30 Katsuhiko Momoi 1999-07-29 13:20:59 PDT
Downgraded severity to critical.
Comment 31 nhottanscp 1999-08-02 15:18:59 PDT
> EUC pages don't load inline, however. EUC detection
> is working on the Browser side with the JPN detector turned on.
In messenger, charset detection is done for each line. Momoi-san, which EUC page
is failing? I could try to break it into HTML files of only one line each, then see if they
work in the browser.
Comment 32 nhottanscp 1999-08-10 13:49:59 PDT
Changed the title since Netscape Japan page is now viewable.
For the EUC page, I found that it works if I save the Yahoo Japan page as text and
attach it to a mail. This means some combination of EUC characters and HTML tags may
cause the problem. I will break the page into individual lines, then investigate.
Comment 33 nhottanscp 1999-08-10 14:11:59 PDT
Created attachment 1194 [details]
One line EUC text file.
Comment 34 nhottanscp 1999-08-10 14:15:59 PDT
I created a text file by retrieving one line from yahoo japan page.
That is viewable with 4.5 auto-detect on but not viewable with my local build
(pulled 8/10). I think this is why we cannot view EUC attachments:
the detection is applied line by line.
Reassign to Frank since this is a generic (not mail specific) charset detection
problem.
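
One plausible reason a single short line is hard to classify, sketched with Python's standard codecs rather than the Mozilla detectors: some EUC-JP byte sequences are also valid Shift_JIS. The sample string is an assumption chosen to show the overlap; it is not the line from the attached file, and the thread does not confirm that this is the actual cause.

```python
def decodes_cleanly(data: bytes, codec: str) -> bool:
    """Return True if the bytes are a valid sequence in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

# Assumed sample: hiragana whose EUC-JP bytes all fall in 0xA1-0xDF,
# the range Shift_JIS uses for single-byte half-width katakana.
line = "あいうえお".encode("euc_jp")

candidates = [codec for codec in ("euc_jp", "shift_jis")
              if decodes_cleanly(line, codec)]

as_euc = line.decode("euc_jp")
as_sjis = line.decode("shift_jis")
```

Both codecs accept the bytes but yield different text, so a detector that sees only this line has to rely on statistics or surrounding context to choose.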
Comment 35 nhottanscp 1999-08-11 15:07:59 PDT
BTW, I fixed a problem in the Windows native detector so it works with the browser.
user_pref("intl.charset.detector", "jams");
That detects the EUC file (attached) correctly.
Comment 36 Frank Tang 1999-08-13 11:34:59 PDT
No, this is not a generic problem, since only mail sends data to the detector line
by line. Please fix it by either changing libmime to send more data (say, the
whole file) to the detector, or keeping the last detected value somewhere. The time
spent on improving the detection algorithm for < 80 bytes will be much
longer than fixing libmime.
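
The two options suggested here can be sketched as follows. This is a toy model under assumptions: toy_detect is a hypothetical stand-in that only succeeds on inputs with enough non-ASCII bytes, and detect_whole and detect_sticky are illustrative names, not libmime functions.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Detector = Callable[[bytes], Optional[str]]

def toy_detect(data: bytes) -> Optional[str]:
    """Hypothetical detector: succeeds only with six or more high bytes."""
    high_bytes = sum(1 for b in data if b >= 0x80)
    return "euc-jp" if high_bytes >= 6 else None

def detect_whole(lines: List[bytes], detect: Detector, fallback: str) -> str:
    """Option 1: hand the detector the whole attachment at once."""
    return detect(b"".join(lines)) or fallback

def detect_sticky(lines: Iterable[bytes], detect: Detector,
                  fallback: str) -> Iterator[Tuple[bytes, str]]:
    """Option 2: detect per line, but reuse the last successful result."""
    last: Optional[str] = None
    for line in lines:
        found = detect(line)
        if found is not None:
            last = found
        # Lines seen before the first success still get the fallback.
        yield line, last or fallback

attachment = [
    b"<html><body>\n",
    "あいうえお\n".encode("euc_jp"),  # 10 non-ASCII bytes
    "あ\n".encode("euc_jp"),          # 2 non-ASCII bytes, too short alone
]
per_line = [cs for _, cs in detect_sticky(attachment, toy_detect, "iso-2022-jp")]
whole = detect_whole(attachment, toy_detect, "iso-2022-jp")
```

In this sketch the short third line inherits the result from the second line, while the first line is still converted with the fallback charset because nothing had been detected yet.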
Comment 37 nhottanscp 1999-08-13 11:57:59 PDT
But I think the new detector should be at least at the same level of accuracy as
4.X.
We may need separate bugs for the detector accuracy and the libmime issue.
Frank, what do you think?
Comment 38 nhottanscp 1999-08-13 13:26:59 PDT
> or keep the last detected value somewhere
I don't think this is good because the user will see garbage lines until the
detector succeeds.
> improving the detecting algorithm for < 80 bytes will be much longer
How about using the old 4.X detector for those cases? The old code may be ugly,
but it was tuned for Japanese HTML (both web and attachments). We may port it to
COM and make it a separate DLL; then japsm may call it if <80 bytes.
Comment 39 nhottanscp 1999-08-18 15:39:59 PDT
I plan to make the following changes for M10.
Port the 4.x ja detector to XPCOM and check it in to intl/chardet/src/classic.
A new pref "mail.charset.detector": libmime will use it when it is specified;
otherwise it will use "intl.charset.detector".
Comment 40 nhottanscp 1999-08-24 16:05:59 PDT
>Port 4.x ja detector to XPCOM and check in to intl/chardet/src/classic.
This is done. Can be specified by user_pref("intl.charset.detector",
"jaclassic"); Name of the DLL is "chardetc" and this is windows only (for now).
>A new pref "mail.charset.detector", libmime will use it when it is specified
>otherwise it will use "intl.charset.detector".
Not done yet.
Comment 41 nhottanscp 1999-08-25 15:05:59 PDT
I checked in mailnews/mime/src/comi18n.cpp rev 1.36.
user_pref("mail.charset.detector", "jaclassic");
With the above pref setting, I can view the Yahoo Japan attachment. If that's not specified,
then "intl.charset.detector" is used. If neither of those prefs is specified
(default), then no charset detection will happen.
I filed a separate bug for libmime data passing issue (feeding data line by
line) as #12481. Any charset detector specific bug (i.e. can reproduce in
browser) should be filed separately.
Marking as FIXED.
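
The pref lookup order described in this comment can be sketched as follows. The pref names come from the comment; the function name resolve_mail_detector and the dict-based pref store are assumptions for illustration, not the comi18n.cpp implementation.

```python
from typing import Mapping, Optional

def resolve_mail_detector(prefs: Mapping[str, str]) -> Optional[str]:
    """Return the detector name libmime would use, or None for no detection.

    Lookup order from the comment above: "mail.charset.detector" first,
    then "intl.charset.detector"; with neither set, no detection happens.
    (Hypothetical helper operating on a plain mapping of pref values.)
    """
    for name in ("mail.charset.detector", "intl.charset.detector"):
        value = prefs.get(name)
        if value:
            return value
    return None

mail_only = resolve_mail_detector({"mail.charset.detector": "jaclassic"})
both_set = resolve_mail_detector({
    "mail.charset.detector": "jaclassic",
    "intl.charset.detector": "japsm",
})
intl_only = resolve_mail_detector({"intl.charset.detector": "japsm"})
default = resolve_mail_detector({})
```

The mail-specific pref wins whenever it is set, which lets mail use "jaclassic" while the browser keeps a different detector.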
Comment 42 Katsuhiko Momoi 1999-09-17 07:38:59 PDT
** Checked with 9/16/99 Win32 M11 build **

I looked at "japsm", "jaclassic", and "jams".
They all were able to show JIS, EUC_JP and SJIS attachments inline
when specified in: mail.charset.detector.

(However, as expected, only "jaclassic" displayed the EUC attachment
 correctly. We need to either release-note this or make it the default
 for M11, if that is possible.)

Marking it verified/fixed.
