Open
Bug 71551
Opened 24 years ago
Updated 3 years ago
Add charset=unknown-8bit when an attachment is of .txt type
Categories
(MailNews Core :: MIME, defect)
Tracking
(Not tracked)
NEW
Future
People
(Reporter: momoi, Unassigned)
** Observed with 3/9/2001 Win32 build **
When a .txt type of file is attached to a message, currently,
Mozilla does not create the charset parameter in the
Content-type header. For example, the following are the typical
headers for such an attachment (for multipart messages):
Content-Type: text/plain; name="mysigJ.txt"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="mysigJ.txt"
According to RFC 2046 and RFC 2049, it seems that the charset
parameter must be present if we use text/plain type of
declaration for the Content-type and if the attached body
part contains 8-bit bytes originally, i.e. before applying
transfer encodings such as Base64 or Quoted-Printable.
However, the original motivation for leaving out the charset
parameter is precisely because there is no easy way to
discern the charset of .txt type of files. Unlike HTML,
there is no formal way to embed charset info in .txt type
of files.
jgmyers suggested that we can probably make use of
the information in RFC 1428 (above URL). Though the original
intention of using 'unknown-8bit' for the charset value is
for the non-MIME to MIME mail gateway servers, it is probably
not illegal to use this charset value for the current
purpose.
This proposal has some merits:
1. RFC 2046 states that the default charset in case the charset
parameter is missing is US-ASCII. Thus, receiving agents
are obligated to interpret text/plain multipart bodies without
charset parameter as US-ASCII. Mozilla currently either
applies auto-detection or settles on the View Default charset
instead.
2. The use of 'charset=unknown-8bit' then addresses the problem
in 1. RFC 1428 states that a body with this charset parameter
can be interpreted as seen fit by receiving agents.
3. RFC 1428 states also that:
"This character set is not intended to be used by mail composers. It
is assumed that the mail composer knows the character set in use and
will mark it with a character set value as specified in [1], ..."
However, what we have here is precisely the case when the mail composer
does not know the charset in use of an external .txt file.
Additional notes:
A. The proposal then is to use 'charset=unknown-8bit' when
attaching .txt files.
B. We also need to do an 8-bit check on such .txt files.
If a .txt file contains 8-bit bytes or the escape sequences used
by ISO-2022-xx encodings, then regard it as 8-bit.
Otherwise, regard it as US-ASCII.
C. In decoding such a charset parameter, we need to
apply auto-detection if an auto-detect module is chosen,
and then apply view default charset as the final fallback in
case there is no auto-detection applied.
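The sending side of notes A and B could be sketched roughly as follows. This is only an illustration in Python, not Mozilla code: the function names are hypothetical, and treating any ESC byte as the start of an ISO-2022-xx escape sequence is a simplifying assumption.

```python
import base64

ESC = 0x1B  # byte that introduces ISO-2022-xx escape sequences


def attachment_charset(data: bytes) -> str:
    """Pick a charset label for a .txt attachment (notes A and B)."""
    # Note B: bytes above 0x7F, or an ESC byte (assumed here to start an
    # ISO-2022-xx escape sequence), mean the text is not plain US-ASCII.
    if any(b > 0x7F or b == ESC for b in data):
        return "unknown-8bit"
    return "us-ascii"


def attachment_headers(filename: str, data: bytes) -> str:
    """Build the part headers and base64 body for the attachment."""
    charset = attachment_charset(data)
    body = base64.encodebytes(data).decode("ascii")
    return (
        f'Content-Type: text/plain; charset={charset}; name="{filename}"\n'
        "Content-Transfer-Encoding: base64\n"
        f'Content-Disposition: inline; filename="{filename}"\n'
        "\n"
        f"{body}"
    )
```

With this sketch, a Shift_JIS or ISO-2022-JP file would be labeled unknown-8bit, while a pure ASCII file would be labeled us-ascii.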
The end result of all of this is that the behavior of
decoding will be exactly the same as it was prior to this
fix. But the creation of such body parts will now be in
compliance with RFC 2046 & RFC 2049. It has been reported
that some mail agents have difficulty displaying body
parts without the charset parameter because they interpret
them as US-ASCII. If such body parts are in anything other
than US-ASCII, e.g. Japanese Shift_JIS, this would
cause a display problem in some mail viewing programs.
Comment 1 • 24 years ago (Reporter)
Note that currently 'charset=unknown-8bit' is treated
as ISO-8859-1. We probably should apply the default
viewing charset in such a case.
We should also check to see what we are really doing when
the charset parameter is missing in received msgs.
I would think that the correct behavior in such a
case would be:
1. Apply auto-detection if one is selected and ON.
2. When there is no auto-detection module ON, then
assume that it is US-ASCII.
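The receiving-side behavior suggested in this comment could be sketched as below. This is a hypothetical Python illustration, not the actual Mozilla logic: `effective_charset`, the `detector` callback, and `view_default` are assumed names, and falling through to the same fallback when a detector returns no guess is also an assumption.

```python
from typing import Callable, Optional

# A detector takes the raw bytes and returns a charset name or None.
Detector = Callable[[bytes], Optional[str]]


def effective_charset(declared: Optional[str],
                      detector: Optional[Detector],
                      view_default: str,
                      data: bytes) -> str:
    """Choose the charset used to display a received text/plain part."""
    label = (declared or "").strip().lower()
    if label and label != "unknown-8bit":
        return label  # a real charset label is honored as-is
    # 1. Apply auto-detection if a module is selected and ON.
    if detector is not None:
        guess = detector(data)
        if guess:
            return guess
    # 2. No detector (or no guess): unknown-8bit falls back to the default
    #    viewing charset, a missing parameter falls back to US-ASCII.
    if label == "unknown-8bit":
        return view_default
    return "us-ascii"
```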
Comment 2 • 24 years ago (Reporter)
http://bugzilla.mozilla.org/show_bug.cgi?id=71541
takes care of the default charset issue mentioned in
the notes immediately above this comment.
Comment 3 • 24 years ago (Reporter)
Points raised here have been discussed extensively
in Bugzilla-Japan (if you can read Japanese):
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=727
Comment 4 • 24 years ago
Momoi san, I have a couple of questions.
* In your additional notes B, is that check for sending or viewing?
* The proposal A, what kind of impact to the users of existing mailers (e.g.
NS4.x, NS6).
Status: NEW → ASSIGNED
Comment 5 • 24 years ago (Reporter)
> In your additional notes B, is that check for sending or viewing?
This is for sending only. In order to append 'unknown-8bit',
.txt data needs to be truly 8-bit.
> The proposal A, what kind of impact to the users of existing
> mailers (e.g. NS4.x, NS6).
I don't think this will break NS4.x. My recollection is that we
were not paying attention at all to multipart charset parameter
in NS4.x.
For NS6, it is more complicated. Bug 71541 points to dealing
with unknown-8bit as "ISO-8859-1". jgmyers is proposing to
change that to "default viewing charset". I think this is
the right approach since RFC 1428 says that it is up to the
receiving mail agent to decide the charset of such a body part.
What about IE, Eudora and other mailers? RFC 1428 is not on
standards track. It is informational. If other mailers
use this RFC as a guideline, then they would have their
own way of dealing with "unknown-8bit". But if they don't
know how to deal with this charset name but can deal with no
charset name, then that is somewhat of a concern.
If anyone reading this report has used other mailers with
the 'unknown-8bit' charset name in messages, please help.
I'll try to create data for this soon.
Comment 6 • 24 years ago
I don't think we want to do this soon since it would affect the existing users.
Other options could be:
* Have a flag to force to use OS file system (or user's selected) charset for
text/plain.
* UI to allow the user to set a charset per attachment (e.g. attachment info
dialog to be invoked from the message compose view).
Target Milestone: --- → Future
Comment 7 • 24 years ago (Reporter)
I like the combined approach, i.e. the default would be
the system charset but by right-clicking on a particular
attachment in the attachment pane, we could bring up
a dialog to change the charset.
As for breaking existing users if we adopt charset=unknown-8bit:
for NS6.0/Mozilla M18, what would happen if they receive
a multipart marked with unknown-8bit? Presumably it would not
be displayed correctly because it is not a known charset.
Would auto-detection not apply? If so, that would be a problem.
If auto-detection is not ON, then I don't see that it makes
any difference whether charset=(null) or charset=unknown-8bit;
in either case, it would be interpreted as
ISO-8859-1.
In any case, if we are allowing the user an option to set
charset, then this proposal can be postponed or even tabled.
NS 4 seems to use the default charset (iso-8859-1 in my case). This seems like
the best choice.
At least one person has complained that his mailer (VM under Emacs) displays
attachments without charset as ASCII, and characters outside ASCII
become \nnn, which is very ugly in non-English languages :-)
Comment 10 • 23 years ago
Bug 162440 is about extending nsIMsgAttachment to support a charset attribute
(as well as others).
Depends on: 162440
QA Contact: trix → stephend
Comment 11 • 21 years ago
IMHO the first step would be just to _allow_ attachments to have a charset.
Currently, if I quit Mozilla and hand-edit the message in the Drafts or Unsent
Messages folder, adding a charset, it still gets dropped by Mozilla.
IMHO the best default would be to use the user's locale setting (e.g. LC_CTYPE
on Unix).
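As a rough illustration of a locale-based default, the Python sketch below reads the encoding the current locale prefers. The function name and the US-ASCII fallback are assumptions, and mapping the result to a registered MIME charset name is not handled here.

```python
import codecs
import locale


def locale_default_charset(fallback: str = "us-ascii") -> str:
    """Return the locale's preferred encoding as a default charset label."""
    # False: only query the locale, do not call setlocale() as a side effect.
    name = locale.getpreferredencoding(False)
    try:
        codecs.lookup(name)  # make sure the name is a known codec
    except LookupError:
        return fallback
    return name.lower()
```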
OS: Windows NT → All
Hardware: PC → All
Comment 12 • 21 years ago
This should be a user configuration, with auto-detection when possible. For
instance, the user could choose UTF-8 if this is a valid UTF-8 file, otherwise
ISO-8859-1.
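A minimal Python sketch of that rule follows. The function name is hypothetical, and the configurable fallback stands in for whatever default the user would have chosen.

```python
def guess_text_charset(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Label the file UTF-8 if it decodes strictly as UTF-8, else fall back."""
    try:
        data.decode("utf-8")  # strict decoding rejects invalid sequences
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

Note that a pure ASCII file is also valid UTF-8, so this sketch would label it utf-8.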
Comment 13 • 21 years ago
(In reply to comment #12)
> This should be a user configuration, with auto-detection when possible.
User configuration is bug 72116.
Comment 14 • 21 years ago
Not exactly. I was speaking about a user configuration for the default charset.
Updated • 20 years ago
Product: MailNews → Core
Updated • 17 years ago
Assignee: nhottanscp → nobody
Status: ASSIGNED → NEW
QA Contact: stephend → mime
Updated • 17 years ago (Assignee)
Product: Core → MailNews Core
Updated • 3 years ago
Severity: normal → S3