Closed Bug 12851 Opened 26 years ago Closed 24 years ago

Compliance with RFC 2368 (mailtourl) with regard to non-ASCII header values

Categories

(MailNews Core :: MIME, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: momoi, Assigned: nhottanscp)

References

()

Details

(Keywords: intl)

Attachments

(1 file)

** This point was originally raised in the mozilla discussion newsgroup for 4.x clients, but since we are carrying over some of the mail code from 4.x, we should also check this for 5.0. **

** This is not currently testable, since we don't seem to be supporting mailto URLs yet. **

RFC 2368 (http://www.cis.ohio-state.edu/rfc/rfc2368.txt) stipulates:

---------------------------------------------
2. Syntax of a mailto URL

Following the syntax conventions of RFC 1738 [RFC1738], a "mailto" URL has the form:

    mailtoURL  =  "mailto:" [ to ] [ headers ]
    to         =  #mailbox
    headers    =  "?" header *( "&" header )
    header     =  hname "=" hvalue
    hname      =  *urlc
    hvalue     =  *urlc

"#mailbox" is as specified in RFC 822 [RFC822]. This means that it consists of zero or more comma-separated mail addresses, possibly including "phrase" and "comment" components. Note that all URL reserved characters in "to" must be encoded: in particular, parentheses, commas, and the percent sign ("%"), which commonly occur in the "mailbox" syntax.

"hname" and "hvalue" are encodings of an RFC 822 header name and value, respectively. As with "to", all URL reserved characters must be encoded.

The special hname "body" indicates that the associated hvalue is the body of the message. The "body" hname should contain the content for the first text/plain body part of the message. The mailto URL is primarily intended for generation of short text messages that are actually the content of automatic processing (such as "subscribe" messages for mailing lists), not general MIME bodies.

... (some text omitted)

Within mailto URLs, the characters "?", "=", "&" are reserved.

... (some text omitted)

8-bit characters in mailto URLs are forbidden. MIME encoded words (as defined in [RFC2047]) are permitted in header values, but not for any part of a "body" hname.
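To make the quoted grammar concrete, here is a minimal Python sketch of splitting a mailto URL into its "to" and "headers" parts. It is not the Mozilla code; the function name parse_mailto is hypothetical. It splits on the reserved characters first and percent-decodes afterwards, and it assumes, as the RFC requires, that commas inside the "to" part arrive encoded as %2C.

```python
from urllib.parse import unquote


def parse_mailto(url: str):
    """Split a mailto URL into (recipients, headers) following the RFC 2368
    grammar: mailtoURL = "mailto:" [ to ] [ headers ].

    Headers come back as a list of (hname, hvalue) pairs, because the same
    hname may legitimately appear more than once.
    """
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "mailto":
        raise ValueError("not a mailto URL: %r" % url)

    # "?", "=" and "&" are reserved, so split on them BEFORE percent-decoding;
    # otherwise an encoded "=" or "?" inside a value would be misread.
    to_part, _, query = rest.partition("?")

    # "to" is #mailbox: comma-separated addresses, commas encoded as %2C.
    # Naive split: a comma inside a quoted RFC 822 "phrase" would break it.
    recipients = [a.strip() for a in unquote(to_part).split(",") if a.strip()]

    headers = []
    if query:
        for header in query.split("&"):
            hname, _, hvalue = header.partition("=")
            # unquote (not unquote_plus): "+" is not a space in mailto URLs.
            headers.append((unquote(hname), unquote(hvalue)))
    return recipients, headers
```

Applied to the example URL in this report, the subject hvalue comes back as the still-encoded word =?iso-8859-1?Q?Sant=E9?=. Percent-decoding alone stops there, and that is exactly what 4.7 shows.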
---------------------------------------------

Following these rules, one can make up a mailto URL which contains Latin 1 accents in the subject header value ("Santé"):

mailto:janedoe@netscape.nets?subject=%3D%3Fiso-8859-1%3FQ%3FSant%3DE9%3F%3D

When someone clicks on this link, we are supposed to insert "Santé" (without the double quotes) into the subject header. Here's what we do in Communicator 4.7:

Subject: =?iso-8859-1?Q?Sant=E9?=

The problem here is that we are not decoding the encoded string as we pull it into the Composer subject field. See this Latin 1 example:
http://kaze:8000/bugs/mailtourl8bitencoded.html

Interestingly, when the original subject string is in raw 8-bit, 4.7 is able to pull in the string as desired. For example, try this mailto URL, which contains an unencoded 8-bit accented character:
http://kaze:8000/bugs/mailtourl8bit.html

I think we need to do the following for RFC 2368 in 5.0:

1. Decode the properly encoded words in the subject and other headers of a mailto URL. (I'll provide a Japanese example later.)
2. Continue to support the raw 8-bit representation there and pull such data into the header field correctly. (It then eventually gets encoded properly as the mail goes out.)
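Step 1 above is a two-stage decode: first undo the URL percent-encoding, then decode any RFC 2047 encoded words. As a sketch only (Python's email.header module stands in for the Mozilla MIME decoder, and decode_mailto_hvalue is a hypothetical name), it looks like this:

```python
from email.header import decode_header, make_header
from urllib.parse import unquote


def decode_mailto_hvalue(hvalue: str) -> str:
    """Turn a percent-encoded mailto hvalue into the text to show in the
    compose window.

    Stage 1: undo URL percent-encoding (per RFC 2368 the hvalue is ASCII).
    Stage 2: decode RFC 2047 encoded words such as =?iso-8859-1?Q?Sant=E9?=.
    Values that contain no encoded words pass through unchanged.
    """
    raw = unquote(hvalue)
    return str(make_header(decode_header(raw)))
```

With the subject from the example URL, %3D%3Fiso-8859-1%3FQ%3FSant%3DE9%3F%3D becomes "Santé", not the literal =?iso-8859-1?Q?Sant=E9?= that 4.7 places in the Subject field. Step 2 (raw 8-bit values) needs a charset from outside the URL and is not covered here.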
Assignee: rhp → phil
Assignee: phil → rhp
Target Milestone: M14
Reassign to rhp for M14
Summary: Compliance with RFC 2368 (mailtourl) with regard to nono-ASCII header values → Compliance with RFC 2368 (mailtourl) with regard to non-ASCII header values
Bulk change to assigned. - rhp
Assignee: rhp → mscott
Status: ASSIGNED → NEW
This is actually a mailto: protocol handler issue. Scott, I am reassigning to you, but please change if you are the wrong person. Thanks. - rhp
QA Contact: lchiang → momoi
I'm assigning myself as QA contact since testing would have to involve more than Latin 1 strings.
If our current behavior is the same as 4.x, I don't think this is a beta stopper.
Target Milestone: M14 → M17
moving to future milestone.
Target Milestone: M17 → Future
Keywords: intl
CC'ing marina & ji, but keeping this bug for myself to QA if it gets worked on in the future.
This is now testable, and mailto URLs are not able to deal with 8-bit characters. ** As of 6/15/2001 Win32 trunk build **
Phil said:
> If our current behavior is the same as 4.x, I don't
> think this is a beta stopper.

Well, the current behavior is worse than 4.x, as it cannot deal at all with Asian 8-bit or MIME-encoded headers in mailto URLs. (It seems that Latin 1 examples of raw 8-bit headers might be working.) ji and/or marina, please test this.
<a href="mailto:tato@fureai.or.jp?subject=abc&body=def">test</a> is OK, but
<a href="mailto:tato@fureai.or.jp?subject=JapanesWords&body=JapanesWords">test</a> cannot be read.
Test page: http://game.gr.jp/chkmoz/11/
Many Japanese pages use this, so it is a serious problem.
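Raw 8-bit values like the Japanese ones above carry no charset label, so the handler has to borrow one. A plausible source is the charset of the page that contained the link. The sketch below assumes the link's bytes reach the handler percent-encoded in that page charset (Shift_JIS here), and decode_raw_8bit is a hypothetical helper. It shows why picking the wrong charset produces unreadable text.

```python
from urllib.parse import quote, unquote


def decode_raw_8bit(hvalue: str, page_charset: str) -> str:
    """Percent-decode a raw 8-bit hvalue using the charset of the page that
    held the link (an assumption: the URL itself names no charset).

    Undecodable bytes become U+FFFD instead of raising an error.
    """
    return unquote(hvalue, encoding=page_charset, errors="replace")


# Simulate a Shift_JIS page whose link bytes were percent-encoded as-is.
encoded = quote("日本語", encoding="shift_jis")
print(decode_raw_8bit(encoded, "shift_jis"))  # readable Japanese
print(decode_raw_8bit(encoded, "utf-8"))      # replacement characters
```

The same function covers the Latin 1 case that already appears to work (for example, Sant%E9 decoded with iso-8859-1).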
mscott, ftang, can we reconsider TM for this bug?
I think the 4.x parity issue can be resolved by relatively small changes (a fix similar to bug 51355). Momoi san, could you file it separately, and also include examples of real usage if possible?
4.x parity bug was filed as bug 87202.
So charset can be specified for headers. But how about a charset of the composing mail? Should that be taken from a charset of body? What if no body is specified for the mailto URL?
> So charset can be specified for headers. But how about
> a charset of the composing mail?

If "body" is specified according to this RFC, then we know what the charset is.

> Should that be taken from a charset of body? What
> if no body is specified for the mailto URL?

I think we need to default to the default mail encoding that is in the same family as the web page encoding, in case the two are different. If no web encoding is specified, then set it to the current encoding.
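The fallback order proposed above can be written down as a small policy function. This is only an illustrative sketch: choose_compose_charset and the MAIL_CHARSET_FAMILY table are hypothetical, and the table simply encodes the common convention of sending Japanese mail as ISO-2022-JP and Western mail as ISO-8859-1.

```python
from typing import Optional

# Hypothetical, illustrative table: web page charset -> conventional mail
# charset of the same "family". Not taken from Mozilla's preferences.
MAIL_CHARSET_FAMILY = {
    "shift_jis": "iso-2022-jp",
    "euc-jp": "iso-2022-jp",
    "iso-2022-jp": "iso-2022-jp",
    "windows-1252": "iso-8859-1",
    "iso-8859-1": "iso-8859-1",
}


def choose_compose_charset(body_charset: Optional[str],
                           page_charset: Optional[str],
                           current_charset: str) -> str:
    """Pick the compose window charset using the order proposed above."""
    # 1. A charset known for the "body" wins.
    if body_charset:
        return body_charset.lower()
    # 2. Otherwise, use the mail charset in the same family as the page.
    if page_charset:
        key = page_charset.lower()
        return MAIL_CHARSET_FAMILY.get(key, key)
    # 3. No page charset either: keep the current encoding.
    return current_charset
```

For example, a link on a Shift_JIS page with no body yields iso-2022-jp, and a page with no declared charset keeps whatever the compose window already uses.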
Multiple charsets can be specified in a header, so it is ambiguous even if we use the body's charset, because of the possibility of multiple choices. In fact, the current MIME decoder does not return the charset(s) used in the header. I wrote a patch to decode MIME encoded headers, but it does not set the charset of the compose window. I will attach it later.
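To illustrate the ambiguity, here is a sketch of a decoder that does report the charsets it saw. It uses Python's email.header.decode_header as a stand-in for the Mozilla MIME decoder, and charsets_in_header is a hypothetical name. A single header can legitimately mix encoded words in two charsets, so no one value is "the" header charset.

```python
import base64
from email.header import decode_header


def charsets_in_header(raw: str) -> list:
    """Return the distinct charsets named by RFC 2047 encoded words in a
    header value, in order of first appearance (empty if none)."""
    seen = []
    for _chunk, charset in decode_header(raw):
        if charset and charset not in seen:
            seen.append(charset)
    return seen


# One Subject value mixing a Latin 1 word and a Japanese word.
jp = base64.b64encode("日本".encode("iso-2022-jp")).decode("ascii")
raw = "=?iso-8859-1?Q?Sant=E9?= =?iso-2022-jp?B?" + jp + "?="
print(charsets_in_header(raw))
```

Here two charsets come back, so the compose window would still need a policy for choosing one.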
> Multiple charsets can be specified in a header. So it is ambiguous
> even we use body's charset because of the possibility of multiple
> choices.

That is true, but in a practical sense, web page designers should not be using two different encodings for header and body. In the vast majority of cases, these two should be, and will be, the same.

Let me also quote the following passage from the RFC cited above:

"The "body" hname should contain the content for the first text/plain body part of the message. The mailto URL is primarily intended for generation of short text messages that are actually the content of automatic processing (such as "subscribe" messages for mailing lists), not general MIME bodies."

The body content is supposed to be "short", usually containing only what is needed administratively for subscription and other purposes. It also looks like such messages should be of type text/plain, or at least provide a text/plain body part at minimum.
In fact, I am not sure MIME encoding in a mailto URL is practically useful. It is not supported by IE or 4.x. Using UTF-8 in the URL is simpler than including MIME encoded words inside the URL. Reassign to nhotta.
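The two approaches compared in the comment above can be set side by side. This Python sketch (both function names are hypothetical) builds the same subject once as percent-encoded UTF-8 and once as a percent-encoded RFC 2047 encoded word. The encoded-word form needs two layers of encoding and produces a longer URL.

```python
from email.header import Header
from urllib.parse import quote


def mailto_with_utf8_subject(addr: str, subject: str) -> str:
    # One layer: percent-encode the UTF-8 bytes of the subject.
    return "mailto:%s?subject=%s" % (addr, quote(subject, safe=""))


def mailto_with_encoded_word_subject(addr: str, subject: str, charset: str) -> str:
    # Two layers: wrap the subject in an RFC 2047 encoded word, then
    # percent-encode the "=" and "?" that the encoded word introduces.
    encoded_word = Header(subject, charset).encode()
    return "mailto:%s?subject=%s" % (addr, quote(encoded_word, safe=""))


print(mailto_with_utf8_subject("janedoe@example.com", "Santé"))
# mailto:janedoe@example.com?subject=Sant%C3%A9
print(mailto_with_encoded_word_subject("janedoe@example.com", "Santé", "iso-8859-1"))
```

Note that the UTF-8 form technically conflicts with the RFC 2368 rule quoted earlier ("8-bit characters in mailto URLs are forbidden") only if the bytes are left raw. Percent-encoded, the URL itself stays ASCII, and the receiver must simply assume UTF-8.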
Assignee: mscott → nhotta
Target Milestone: Future → ---
jbetak - can you take a look at nhotta's patch, review it, get people to sr it, and check it in if it is good?
Assignee: nhotta → jbetak
accepting - I'll try to land this after obtaining r/sr
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.4
mscott, sspitzer: what do you think? Could you r/sr?
Ducarroz is module owner of compose.
thanks for pointing it out! Ducarroz, what do you think of this one?
Experiencing trouble obtaining r/sr. I'll lobby harder. Nominating for nsbranch.
Whiteboard: nsbranch, have patch - need r/sr
Per Selmer suggestion, I am doing this myself. Moving nsbranch designation from the Status Whiteboard to Keywords.
Keywords: nsbranch
Whiteboard: nsbranch, have patch - need r/sr → have patch - need r/sr
Not a feature in IE or 4.x; moving to m0.9.6.
Target Milestone: mozilla0.9.4 → mozilla0.9.6
Blocks: 99171
nsbranch- since Frank moved it to 0.9.6
Keywords: nsbranch → nsbranch-
Reassign to nhotta.
Assignee: jbetak → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Comment on attachment 44035 (Patch, decode MIME encoded strings in mailto URL): looks good. R=ducarroz
Attachment #44035 - Flags: review+
Comment on attachment 44035 (Patch, decode MIME encoded strings in mailto URL): sr=sspitzer
Attachment #44035 - Flags: superreview+
instead of defining that CID, I'd rather use the contract id, but there isn't one. I'll log a bug to add a contract id to nsMsgMimeCID.h and use it in the nsMimeModule.cpp and the callers.
Whiteboard: have patch - need r/sr
Checked in to the trunk.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Product: MailNews → Core
Product: Core → MailNews Core