Closed Bug 241821 Opened 21 years ago Closed 20 years ago

Mozilla gives dubious mime-type "text/plain" when I attach a file to outgoing e-mail

Categories

(MailNews Core :: Attachments, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 238152

People

(Reporter: ishikawa, Assigned: sspitzer)

References

Details

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113

This is a problem in 1.7b and 1.6 alike. (It could be older than this.)

Attaching a file gets an incorrect MIME type.

Recently, I had complaints from a few recipients of my e-mail that the attachment is unreadable. Or rather, to be exact, that my e-mail is UNREADABLE!? I thought this was bogus, since I attach a memo file as a separate attachment after writing a few lines of message. This can't be "unreadable"!?

(Background: there are three major character sets used in Japan: JIS, EUC and MS-Kanji. [Note that I have simplified the explanation drastically, but you get the idea. It is very chaotic.] ISO-2022-JP, a variant of JIS, is used for e-mail exchange. EUC (or ujis) is often used on workstations and PC Unixes. MS-Kanji and its variants are used by MS operating systems and Mac OS. (I forgot: sure, Unicode is making inroads these days, but it is often used internally and not exposed to userland yet.)

If you try to render a Japanese text file assuming an incorrect character set, the display is messed up.

With all three major Japanese character code sets [plus Unicode used internally in some commercial products], it is often safe to send a document file [even a so-called plain text file] to a different computer as an e-mail attachment. Viewers such as web browsers, intelligent editors such as Emacs, commercial editors and word processors can handle code conversion when it becomes necessary for a particular hardware platform.

This is why I often send a text file prepared with Emacs on my Unix/Linux machines either
- as an attachment file, letting the recipient handle the code set issue, OR
- by copying/pasting the document into the message buffer of Mozilla before sending, so that it goes out in the interoperable ISO-2022-JP encoding.
I would avoid the latter when the document in question is large, because copy and paste doesn't work easily for, say, a 600KB document from Emacs. I would simply attach the file to my outgoing e-mail in such cases. I think it used to work without major complaints. But now there is the problem I noted.)

Back to my original problem.

It turns out that the attached file, which I hope the recipient will save into an external file and view with a proper viewer, is given a somewhat problematic MIME type, as shown below. I created a very short file and attached it to an e-mail to myself to recreate the problematic situation. Please note the use of the "text/plain" type. I wonder why Mozilla gives the "text/plain" type to the file test.dat. I think it is trying to be clever by "GUESSING" the contents and got an incorrect result!

Example 1:

--------------020902060403030802040501
Content-Type: text/plain; name="test.dat"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="test.dat"

VGhpcyBmaWxlIGlzIGluIEVVQyBjb2RlLgqks6TspM8gRVVDIKWzobylyaTOpdWloaWkpeuk
x6S5oaMKVGhpcyBpcyBOT1QgaW4gSklTIGNvZGUuCqSzpOykzyBKSVMgpbOhvKXJpMekz6Si
pOqk3qS7pPOhowo=
--------------020902060403030802040501--

It seems that some e-mail readers on both Mac and Windows use the above MIME information to display the file contents using THEIR OWN IDEA OF THE CHARACTER SET USED, and thus show garbage on the screen and, worse, mess up the recipient's mail folder.

I got curious why I hadn't noticed this problem, since I CC:ed the same e-mail messages to myself. First of all, I found out that I hadn't turned on "Display Attachment inline" in my Mozilla settings. (This is a good way to avoid the problem of corrupted display.) As soon as I turned it ON, Mozilla showed the attached file in a readable manner. But the attachment was shown below a clearly marked horizontal bar (presumably to show that this is part of an attachment). I got suspicious about this.
Mozilla must be trying to be very, very clever in detecting the character set inside a file, it seems, and showing it in a readable manner. So I turned the automatic character set recognition off. Then the rendering was messed up. Quite natural, and this is all as it should be!

Back to the original problem.

I think the reason the recipients of my e-mail messages got a mangled message display is that the attachment, to which Mozilla unfortunately gave the `incorrect' "text/plain" content type, was handled as a text file (because of "text/plain") and was automagically rendered using the character set recognition routine of the recipient's mailer, which probably was not quite the same as Mozilla's. (And I believe everyone agrees here that there is no way the character set recognition routines will behave in the same manner in all applications!)

I think Mozilla is giving a WRONG/INAPPROPRIATE MIME type here after all.

When I tried to attach /bin/cat (a binary program file) to an e-mail to myself, Mozilla somehow gave the following MIME information. Note "application/x-vnd.mozilla.guess-from-ext".

Example 2:

--------------060403080600000207080505
Content-Type: application/x-vnd.mozilla.guess-from-ext; name="cat"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="cat"

f0VMRgEBAQAAAAAAAAAAAAIAAwABAAAA4IoECDQAAADYNgAAAAAAADQAIAAGACgAGAAXAAYA
AAA0AAAANIAECDSABAjAAAAAwAAAAAUAAAAEAAAAAwAAAPQAAAD0gAQI9IAECBMAAAATAAAA
..omission ...
--------------060403080600000207080505--

With the above MIME type, there was no way for Mozilla to "display" the contents of the attachment incorrectly, no matter how I tweaked the "Display attachment inline" and character code recognition settings. Again, this is all as it should be. With this MIME type, I think no sane e-mail client will try to render the attachment in the visible message buffer and mess up the display. No complaint from the recipient of my e-mail. That would be good.
So my conclusion is that Mozilla is doing something funny when it assigns a MIME type to an attachment, and sometimes gets it wrong.

Since "guessing" can't always succeed, I would rather see the unreliable guessing turned off completely, to avoid costly exchanges with my e-mail recipients such as "I can't read the latter part of your e-mail, please re-send" a day after my original e-mail. This might change the "user experience", as some software companies' ads often put it, so turning it off could be a clearly marked option in the main menu or something.

Any thoughts?

My take on this is to assign a MIME type that errs on the safe side. That is, make sure that the receiving e-mail client will handle the attached "text" document as an attached file rather than being clever about showing it inline (unless Mozilla offers a menu to override this somehow when we send an e-mail with an attachment). Also, treat the MIME type of received e-mail in a careful manner. (Not enabling the inline display of attachments achieves this goal rather well.)

----

An observation of similar bugs in Bugzilla. I typed "incorrect mime type attachment" into the Mozilla Bugzilla search and quite a few hits came up. I think we need to fix the MIME handling somehow. Some stood out since they have something in common.

236212 ... incorrect MIME type for PHP attachment.

239849 ... basically a duplicate of 236212. I mention this bug since the display corruption (or the lack of display at all) reported there is a similar kind of symptom to what my e-mail recipients may have experienced.

71551 ... text-related. This bug reports text-specific issues concerning incorrect MIME type. But I don't agree with the fine-tuning proposals mentioned in the discussion thread. Guessing the code inevitably fails. Unless you are 100% sure [and this could be asserted ONLY by the user, and s/he could even be wrong sometimes :-) ], make the attachment NOT "text" type.

18920 ... not directly related, but this could have happened to me as well.
I once tried to send the complaining recipient an MS-WORD DOC version of the document, since I thought this binary document format would be readable, but somehow that was not handled very well on the receiving end. But this particular problem I experienced COULD be an IE problem.

"Guessing fails inevitably" department:

215005 ... Not only does the display get mangled sometimes, it seems that some files even get corrupted. But this doesn't seem to be related to the incorrect MIME type issue.

Reproducible: Always

Steps to Reproduce:
1. Try attaching a data file when sending an e-mail. The data file probably needs to contain a sufficient amount of text-like data while actually having binary data inside.

Actual Results: Mozilla gave a "text/plain" Content-Type to the attachment.

Expected Results: I think Mozilla should give something different. "Application/binary" or something???
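The charset ambiguity described in the report can be reproduced outside any mail client. The following is a minimal Python sketch, not part of the original report: it decodes the base64 body of Example 1 and interprets the same bytes under three assumptions. The choice of Shift_JIS as the "wrong guess" is an illustrative assumption; the report does not say which charset the recipients' clients actually picked.

```python
import base64

# Base64 body of Example 1 (test.dat), copied from the report.
BODY = (
    "VGhpcyBmaWxlIGlzIGluIEVVQyBjb2RlLgqks6TspM8gRVVDIKWzobylyaTOpdWloaWkpeuk"
    "x6S5oaMKVGhpcyBpcyBOT1QgaW4gSklTIGNvZGUuCqSzpOykzyBKSVMgpbOhvKXJpMekz6Si"
    "pOqk3qS7pPOhowo="
)

raw = base64.b64decode(BODY)

# Interpretation 1: what the sender intended (the file is EUC-JP text).
as_euc = raw.decode("euc_jp")

# Interpretation 2 (assumed for illustration): a client that guesses
# Shift_JIS. errors="replace" keeps the sketch from raising on byte
# sequences that are invalid in that charset.
as_sjis = raw.decode("shift_jis", errors="replace")

# Interpretation 3: strict US-ASCII, which cannot represent bytes >= 0x80.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False

print(as_euc)
print(as_sjis)
print("pure US-ASCII:", is_ascii)
```

The same bytes yield readable Japanese under one assumption and different text under another, which is the mismatch the recipients appear to have run into.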
*** Bug 241825 has been marked as a duplicate of this bug. ***
> Any thoughts?

Yes: strive for conciseness. Your report is overwhelmingly long, and a lot of what isn't noise is supposition. Just the facts, please.

> If you try to render a Japanese text file assuming incorrect character set,
> the display is messed up.
> [snip]
> Example 1:
>
> --------------020902060403030802040501
> Content-Type: text/plain;
> name="test.dat"
> Content-Transfer-Encoding: base64
> Content-Disposition: inline;
> filename="test.dat"
>
> VGhpcyBmaWxlIGlzIGluIEVVQyBjb2RlLgqks6TspM8gRVVDIKWzobylyaTOpdWloaWkpeuk
> x6S5oaMKVGhpcyBpcyBOT1QgaW4gSklTIGNvZGUuCqSzpOykzyBKSVMgpbOhvKXJpMekz6Si
> pOqk3qS7pPOhowo=
> --------------020902060403030802040501--
>
> It seems that some e-mail readers both on Mac and Windows try to use
> the above MIME information to display the file contents using ITS OWN
> IDEA OF USED CHARACTER SET and thus show garbage on the screen
> and worse messed up the recipient's mail folder.

Note that the Content-Type header on this MIME section does not specify a character set. A header that did specify one would look like this, for instance:

Content-type: text/plain; charset=iso-2022-jp;

Therefore, whatever client is displaying the attachment has to either figure the charset out heuristically, or assume it's the same character set as the original mail.

> I think here mozilla's is giving a WRONG/INAPPROPRIATE MIME type after all.

If the file is actually plain text, the MIME type is correct. It does need a charset if it's not the expected text, and Mozilla does not provide a means to specify that. Bug 71551 addresses that in a simplistic way; bug 72116 requests a UI for it. This bug should be marked a duplicate of one of those two, or of bug 192262, which requests a UI to specify the Content-Type.

> 236212 ... incorrect mime type for PHP attachment.

Unrelated to this problem.

> 18920

Is not viewable (probably security-related); do you have access to this bug?

> 215005 it seems that some files gets corrupted even.

Unrelated to this problem.
I believe 18920 was a typo, as it doesn't seem to fit the description of this bug (FWIW, I can't see the bug, so it is not a Mozilla security bug).
(In reply to comment #2)
> Note that the Content-Type header on this MIME section does not specify a
> character set. For instance: Content-type: text/plain; charset=iso-2022-jp;
> Therefore, whatever client is displaying the attachment has to either figure
> the charset out heuristically, or assume it's the same character set
> as the original mail.

No. The default charset of the "text" type is US-ASCII. Assuming the same character set as the original mail, or automatically detecting the charset of text, is an extended function that Mozilla provides for the user's convenience.

See RFC 2046 ( http://www.faqs.org/rfcs/rfc2046.html )
> 4.1.2. Charset Parameter
> The default character set, which must be assumed
> in the absence of a charset parameter, is US-ASCII.
> The default character set, US-ASCII, has been the subject of some
> confusion and ambiguity in the past. Not only were there some
> ambiguities in the definition, there have been wide variations in
> practice. In order to eliminate such ambiguity and variations in the
> future, it is strongly recommended that new user agents explicitly
> specify a character set as a media type parameter in the Content-Type
> header field. "US-ASCII" does not indicate an arbitrary 7-bit
> character set, but specifies that all octets in the body must be
> interpreted as characters according to the US-ASCII character set.
> National and application-oriented versions of ISO 646 [ISO-646] are
> usually NOT identical to US-ASCII, and in that case their use in
> Internet mail is explicitly discouraged. The omission of the ISO 646
> character set from this document is deliberate in this regard. The
> character set name of "US-ASCII" explicitly refers to the character
> set defined in ANSI X3.4-1986 [US-ASCII].

If a text MIME type is specified (or defaulted to text/plain by omission), the charset should be specified and valid, although many mailers and text editors have an automatic character set detection mechanism and/or a character set selection mechanism.
For an attachment of unknown MIME type and/or unknown file extension, I think the following is better, as it is a general-purpose way to attach data to a mail:

Content-Type: application/octet-stream
Content-Transfer-Encoding: base64 or quoted-printable (or 7bit or 8bit if possible)
Content-Disposition: attachment
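As an illustration of the headers proposed in this comment, here is a minimal sketch using Python's standard `email` package rather than Mozilla's own code. The subject, body text, and payload are hypothetical; only the header combination comes from the comment.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "memo"  # hypothetical subject
msg.set_content("Please see the attached file.")

# Hypothetical payload: EUC-JP bytes whose charset the mailer cannot know.
data = "これは EUC コードのファイルです。\n".encode("euc_jp")

# Attach as opaque binary data. add_attachment() uses base64 for bytes
# and marks the part "Content-Disposition: attachment".
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="test.dat",
)

part = next(msg.iter_attachments())
print(part.get_content_type())
print(part["Content-Transfer-Encoding"])
print(part.get_content_disposition())
```

With this combination a receiving client has no textual type to render inline, so charset guessing does not come into play; the trade-off is that the recipient must save or open the file explicitly.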
Ishikawa-san, will adding the extension "dat" (with a preferable MIME type) to Profile/"Helper Applications" resolve the problem?

I attached an xxx.dat file (content is Shift_JIS text) with Mozilla on Win-2K under the following definitions.

(A) Windows Registry definition
HKEY_CLASSES_ROOT\.dat (Name=Content Type, Data=application/x-httpd-php)
(B) Mozilla's Helper Applications
Mime-Type = application/x-dat-file, Extension = dat

The generated headers are as follows. (The definition in Helper Applications was used.)
> Content-Type: application/x-dat-file; name="xxx.dat"
> Content-Transfer-Encoding: base64
> Content-Disposition: inline; filename="xxx.dat"
Answer to post #5.

Thank you, Wada-san.

Well, since I am using the Linux version of Mozilla, I have no idea where the MIME type association is stored. (Well, actually I have a hunch: /etc/mime.types.)

But even if we can override the MIME type given by Mozilla for files with the suffix ".dat", this doesn't solve the general problem. I mean, I can have ".xxa", ".xxb", ".xxc", ..., and other suffixes, and have files that contain binary data. If Mozilla decides to give "text/plain" or whatever type based on ITS OWN IDEA of the data contents, which is NOT SHARED by other mailers on other hardware/software platforms, the problem I mentioned in the original post will persist.

Giving the application/octet-stream type and encoding the contents seems to be the only reliable solution here.

Again, if a particular set of parties (senders and receivers) agree on certain rules (and can make sure that their e-mail clients on various platforms behave according to such an agreement), using those rules for adding MIME types to whatever the parties exchange among themselves may work. But unfortunately, in real life, we have problems.
Ishikawa-san, "Helper Applications" is a Mozilla preference.

If you already tried the "Helper Applications" setting and it did not work, it was probably due to a preceding dot in the extension specified in "Helper Applications". If you entered ".dat" as the extension, remove the preceding dot ("dat" only, instead of ".dat"). This seems to be a new Mozilla bug. See Bug 236212 Comment #12
I confirm this bug. (Recreated with trunk-nightly/Win-2K.)

The problem can be stated as follows:

If an Extension=>Mime-Type relation is defined in both Helper Applications and the system's inventory (the Windows registry on MS Windows) for the extension of the attached file, and if the attachment file is guessed to be a text file (not zip format, nor jpeg, png, gif, ...), Mozilla generates the attachment with "Content-Disposition: inline" and "Content-Type: text/plain;" with no charset parameter, even though the content of the attached file is not US-ASCII (EUC-JP, Shift_JIS or ISO-2022-JP in the reporter's case and in my recreation test case).

This is apparently a violation of RFC 2046 for text data other than US-ASCII.
WADA: what do you think this bug is about, that is not covered by either bug 71551 or bug 72116? Is the problem a bug, or an unimplemented feature?

One possible approach would be to specify a default charset for each text/* type (and for whatever other textual types there are) in the Helper Apps UI. But there will always be a case where a file (whether arriving via the net or being attached from your files) doesn't match the default and doesn't have its own charset specified. For attachments, we will still need a way to specify the charset on a per-file basis -- bug 72116.

Another possible approach would be for Mozilla to scan each text attachment to determine the charset; this could use the same heuristic at work in the View Source window, which I've seen make correct guesses on messages that were unspecified or even incorrectly specified. But the problem of needing to specify it individually is still there, in case of an incorrect scan (?).

Re: Helpers requiring extension entered without the dot (txt instead of .txt):
> This seems to be a new Mozilla's bug. See Bug 236212

As I noted there, bug 170090 (filed against 1.2) covers that problem, so it's not such a new bug.
(In reply to comment #6)
> Well, since I am using Linux version of mozilla, I have no idea
> where the MIME type association is stored.

I've found Boris Zbarsky's description of how the MIME type is determined from the extension on Unix.

>When reading from the local filesystem, we ask the OS for the type (at the moment,
>on Unix, this is done by looking at the extension and looking it up in
>the ~/.mime.types file, the /etc/mime.types file, and the gnome-vfs registry).

See Bug 242743 Comment 2
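The extension-based lookup discussed in the last few comments, combined with the reporter's preference for a safe default, can be sketched with Python's standard `mimetypes` module. This is an analogy, not Mozilla's implementation: the function name `attachment_type` and the `extra_types` parameter are hypothetical, and a fresh `MimeTypes()` instance uses Python's built-in table rather than the system files named in the quote.

```python
import mimetypes


def attachment_type(filename, extra_types=None):
    """Return a MIME type for filename based only on its extension.

    extra_types maps extensions (with leading dot) to MIME types and
    plays the role of user-defined "Helper Applications" entries.
    Unknown extensions fall back to application/octet-stream instead
    of a guess derived from the file's contents.
    """
    db = mimetypes.MimeTypes()
    for ext, mime in (extra_types or {}).items():
        db.add_type(mime, ext)
    guessed, _encoding = db.guess_type(filename)
    return guessed or "application/octet-stream"


# A user-defined mapping, as in the Win-2K test earlier in the thread.
print(attachment_type("xxx.dat", {".dat": "application/x-dat-file"}))

# An extension nobody has registered.
print(attachment_type("memo.xxa"))
```

The design choice here is that the fallback never inspects the contents, so two senders with the same extension table produce the same type for the same filename.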
(In reply to comment #9)

Mike:

The RFC requires a charset if "Content-Type: text" is used for data other than US-ASCII. However, perfect charset determination is impossible for anything other than US-ASCII. Therefore, "Content-Type: text" should not be used as the automatically determined Content-Type for non-US-ASCII data, even if Mozilla properly guessed that the attached file is a text file. (I prefer "Content-Type: application/octet-stream;" for text data of unknown charset.)

And the automatically determined Content-Disposition: should be attachment in this case, even when the mail.content_disposition_type setting in prefs.js is "Inline". In other words, "Content-Type: text" and "Content-Disposition: inline" should be used only when the text data is US-ASCII.

In addition, I think the mail.content_disposition_type setting of "inline" should not be applied to files such as jpeg or gif, even if Mozilla can display them inline. "Content-Disposition: inline" should be used only for "Content-Type: text" attachments.

If these automatically determined Content-Type: and Content-Disposition: values could be changed by the user during mail composition, or set in preferences, it would be very convenient for many users. But I think these are enhancement requests. (Some of them have already been requested, as you mentioned.)
(In reply to comment #11)
> perfect charset determination is impossible for other than US-ASCII.

I don't think 'perfect' is necessary; 'better than we currently have' would be more than acceptable, particularly since Mozilla is already quite clever about ID'ing charsets for display in the View Source window (in my experience, anyway).

> (I prefer "Content-Type: application/octet-stream;" for text data of unknown
> charset.)

Using text/plain with charset=unknown-8bit (as requested in bug 71551) would be better; why throw away the information that this file was ID'd as text/plain on the sender's system? Could be particularly useful if saving to a MIME-enabled filesystem (e.g. BeOS), or for future enhancements to Mozilla's mail/attachment viewer.

> And automatically determined Content-Disposition: should be attachment in this
> case, even when mail.content_disposition_type setting in prefs.js is "Inline".
> In other words, "Content-Type: text" and "Content-Disposition: inline" should
> be used only when text data of US-ASCII.

OK, but I don't think that's this bug. See bug 65794.

> In addition, I think mail.content_disposition_type setting of "inline" should
> not be applied for files such as jpeg, gif, even if Mozilla can display them
> in inline.
> "Content-Disposition: inline" should be used only for "Content-Type: text"
> attachment.

That I don't agree with. When my sister sends me a photo of her garden, I don't want to have to bother making an explicit action (and switching to a different window) just to view it. But again, that's Not This Bug.

I still do not see any reason this bug is not a dupe of one of the two suggested in comment 9.
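For comparison with the octet-stream approach, the alternative argued for in this comment (keep text/plain, but label the charset as unknown-8bit) might produce a part like the one built below. This is a sketch with Python's standard `email` package under stated assumptions: the payload and filename are hypothetical, and nothing here reflects how Mozilla would implement bug 71551.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Please see the attached file.")

# Hypothetical payload: text in a charset the sending mailer cannot name.
data = "これは EUC コードのファイルです。\n".encode("euc_jp")

# Keep the "this is text" information, but state explicitly that the
# charset is unknown instead of leaving the parameter out (which would
# imply US-ASCII under RFC 2046).
msg.add_attachment(
    data,
    maintype="text",
    subtype="plain",
    params={"charset": "unknown-8bit"},
    filename="test.dat",
)

part = next(msg.iter_attachments())
print(part["Content-Type"])
print(part.get_content_disposition())
```

A receiver that understands the label can still offer text-oriented handling, while one that does not has at least been told not to assume US-ASCII.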
(In reply to comment #12)

Mike, sorry for bothering you with my thoughts on issues other than the charset.

> I still do not see any reason this bug is not a dupe of one of the two suggested in comment 9.

I think Bug 71551 (using text/plain with charset=unknown-8bit) is a very good idea, and thanks to your suggestion I now believe Bug 71551 is one of the best solutions. So I also think this bug can be closed as a DUPE of another bug. But I cannot say which bug it is a DUPE of. The decision should be made by people who have the privilege to mark DUPEs, including the bug opener. Please note that I don't have that privilege.
I experienced a similar problem again today and looked into it from a fresh viewpoint.

I am now inclined to say that the Mozilla mailer should not send an attachment file as inline content, and should let the receiver use its own attachment handler (which presumably has a better idea of how to display the contents, including the correct deduction of the character code system, etc.).

So let's get rid of
Content-Disposition: inline
and instead use
Content-Disposition: attachment

This will solve the problem.

Related bug reports:

Bug 244829 mentions
>Expected Results:
>Display the text content
>Or do not display inline

I would now concur with "do not display inline". Bug 244829 is probably worth reading to resolve this bug report. Comment #7 there had this to say:

>Currently, I think it's
>assumed that text attachment has the same character encoding
>as the main body of
>the message. It doesn't hold in cases like this but in the
>majority of the cases
>it holds (although it may change as time goes by).

This assumption IS and HAS BEEN INVALID in Japan!!!

Bug 238152 has this to say:

> It's not easy to determine the charset of a text file being attached
>without prompting users to pick one. However, there are a couple of possibilities:
>1. assume text/* file being attached is in the locale character encoding
>2. assume it's in the same character encoding as the current character encoding for the mail composition
>3. prompt users to pick one

Assumptions 1 and 2 are invalid. We may have a different encoding in the attached file after all. Option 3 is prone to errors.

Since we can't get it right all the time, we should abandon the idea of specifying the correct charset and just decide to attach the file as a "non"-inlined, application-dependent file. (Only the receiver cares about how to read it, and s/he has an array of reading tools on her/his end.) It is not up to us (the sender of the attached file) to worry about it.
Product: MailNews → Core
(In reply to comment #14)
> So let's get rid of Content-Disposition: inline
> and instead use
> Content-Disposition: attachment
>
> this will solve the problem.

Well, yes and no. Per bug 65794, adding
user_pref("mail.content_disposition_type",1);
to user.js will send all attachments out as 'attachment' -- but Mozilla will ignore that disposition for known text/* types, and (attempt to) display them inline -- bug

> Related bug reports: [...] Bug 238152

Thank you for finding that -- marking this as a dupe.

> Since we can't get it right all the time, we should abandon
> the idea of specifying correct char set and just decide
> to attach the file as "non"-inlined application-dependent
> file.

Again: bug 71551 -- giving a charset of 'unknown' -- is an even better approach than changing Content-Disposition.

*** This bug has been marked as a duplicate of 238152 ***
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
(In reply to comment #15)
> but Mozilla will
> ignore that disposition for known text/* types, and (attempt to) display them
> inline -- bug

Sorry -- that should be "bug 147461"
Product: Core → MailNews Core