Closed Bug 12851 Opened 25 years ago Closed 23 years ago

Compliance with RFC 2368 (mailtourl) with regard to non-ASCII header values

Categories

(MailNews Core :: MIME, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: momoi, Assigned: nhottanscp)

References

()

Details

(Keywords: intl)

Attachments

(1 file)

** This point was originally raised in the mozilla discussion newsgroup
   for 4.x clients, but since we are carrying over some of the
   mail code from 4.x, we should also check this for 5.0 **

** This is not currently testable since we don't seem to be supporting
   the mailtourl yet. **

RFC 2368 (http://www.cis.ohio-state.edu/rfc/rfc2368.txt)

stipulates:

---------------------------------------------
2. Syntax of a mailto URL

   Following the syntax conventions of RFC 1738 [RFC1738], a "mailto"
   URL has the form:

     mailtoURL  =  "mailto:" [ to ] [ headers ]
     to         =  #mailbox
     headers    =  "?" header *( "&" header )
     header     =  hname "=" hvalue
     hname      =  *urlc
     hvalue     =  *urlc

   "#mailbox" is as specified in RFC 822 [RFC822]. This means that it
   consists of zero or more comma-separated mail addresses, possibly
   including "phrase" and "comment" components. Note that all URL
   reserved characters in "to" must be encoded: in particular,
   parentheses, commas, and the percent sign ("%"), which commonly occur
   in the "mailbox" syntax.

   "hname" and "hvalue" are encodings of an RFC 822 header name and
   value, respectively. As with "to", all URL reserved characters must
   be encoded.

   The special hname "body" indicates that the associated hvalue is the
   body of the message. The "body" hname should contain the content for
   the first text/plain body part of the message. The mailto URL is
   primarily intended for generation of short text messages that are
   actually the content of automatic processing (such as "subscribe"
   messages for mailing lists), not general MIME bodies.

...
... (some text omitted)

   Within mailto URLs, the characters "?", "=", "&" are reserved.
...
... (some text omitted)
   8-bit characters in mailto URLs are forbidden. MIME encoded words (as
   defined in [RFC2047]) are permitted in header values, but not for any
   part of a "body" hname.

---------------------------------------------

Following these rules, one can make up a mailtourl which contains
Latin 1 accents in the subject header value ("Santé"):

mailto:janedoe@netscape.nets?subject=%3D%3Fiso-8859-1%3FQ%3FSant%3DE9%3F%3D
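
For reference, here is a minimal sketch of how such a URL could be put together: the subject is first turned into an RFC 2047 encoded word, and the result is then percent-encoded so that the reserved "=" and "?" characters do not appear literally. It uses Python's standard library only to illustrate the two encoding layers; the function name is made up for this example, the address is left unescaped for simplicity, and the case of the encoding letter ("Q" vs "q") may differ from the URL above, which RFC 2047 treats as equivalent.

```python
from email.header import Header
from urllib.parse import quote


def mailto_with_encoded_subject(address, subject, charset="iso-8859-1"):
    """Build a mailto URL whose subject is an RFC 2047 encoded word.

    Hypothetical helper for illustration; not taken from any mail client.
    """
    # Layer 1: RFC 2047 encoded word, e.g. =?charset?Q?...?=
    encoded_word = Header(subject, charset).encode()
    # Layer 2: percent-encode everything that is reserved in a URL.
    return "mailto:" + address + "?subject=" + quote(encoded_word, safe="")


url = mailto_with_encoded_subject("janedoe@netscape.nets", "Santé")
print(url)
```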

When someone clicks on this link, we are supposed to insert
"Santé" (without the double quotes) into the subject header.

Here's what we do in Communicator 4.7:

Subject: =?iso-8859-1?Q?Sant=E9?=

Here the problem is that we are not decoding the encoded string
as we pull it into the Composer subject field. See this Latin 1
example:

http://kaze:8000/bugs/mailtourl8bitencoded.html

Interestingly, when the original subject string is in raw 8-bit,
4.7 is able to pull in the string as desired. For example, try this
mailtourl, which contains an unencoded 8-bit accented character:

http://kaze:8000/bugs/mailtourl8bit.html

I think we need to do the following for RFC 2368 in 5.0:

1. Decode properly encoded words in the subject and other headers
   of a mailtourl. (I'll provide a Japanese example later.)
2. Continue to support raw 8-bit representation there and pull such
   data into the header field correctly. (They then eventually get
   encoded properly as the mail goes out.)
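
The two steps above can be sketched roughly as follows. This is only an illustration in Python, not the Mozilla code: the function names are made up, the URL is handled as bytes so that raw 8-bit input can be modelled, and the fallback charset for raw 8-bit data (for example the charset of the page the link came from) is an assumption of this sketch.

```python
from email.header import decode_header, make_header
from urllib.parse import unquote_to_bytes


def decode_hvalue(hname, hvalue, fallback_charset):
    """Turn one percent-encoded hvalue (bytes) into display text."""
    raw = unquote_to_bytes(hvalue)
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        # Step 2: raw 8-bit data; interpret it in the assumed charset.
        return raw.decode(fallback_charset, errors="replace")
    if hname == "body":
        # RFC 2368 does not permit encoded words in the body.
        return text
    # Step 1: decode any RFC 2047 encoded words in the header value.
    return str(make_header(decode_header(text)))


def parse_mailto(url, fallback_charset="iso-8859-1"):
    """Split a mailto URL (bytes) into the "to" part and decoded headers."""
    prefix = b"mailto:"
    if url[:len(prefix)].lower() != prefix:
        raise ValueError("not a mailto URL")
    to, _, query = url[len(prefix):].partition(b"?")
    headers = {}
    for pair in query.split(b"&"):
        if not pair:
            continue
        name, _, value = pair.partition(b"=")
        hname = unquote_to_bytes(name).decode("ascii", errors="replace").lower()
        headers[hname] = decode_hvalue(hname, value, fallback_charset)
    to_text = unquote_to_bytes(to).decode(fallback_charset, errors="replace")
    return to_text, headers


# The MIME-encoded form and the raw 8-bit form of the same subject.
encoded = b"mailto:janedoe@netscape.nets?subject=%3D%3Fiso-8859-1%3FQ%3FSant%3DE9%3F%3D"
raw8bit = b"mailto:janedoe@netscape.nets?subject=Sant\xe9"
print(parse_mailto(encoded)[1]["subject"])
print(parse_mailto(raw8bit)[1]["subject"])
```

With a Latin 1 fallback, both URLs are meant to end up with the same subject text in the compose field.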
Assignee: rhp → phil
Assignee: phil → rhp
Target Milestone: M14
Reassign to rhp for M14
Summary: Compliance with RFC 2368 (mailtourl) with regard to nono-ASCII header values → Compliance with RFC 2368 (mailtourl) with regard to non-ASCII header values
Bulk change to assigned.

- rhp
Assignee: rhp → mscott
Status: ASSIGNED → NEW
This is actually a mailto: protocol handler issue.

Scott, I am reassigning to you, but please change if you are the wrong person.

Thanks.

- rhp
QA Contact: lchiang → momoi
I'm assigning myself as QA contact since testing would have to involve
more than Latin 1 strings.
If our current behavior is the same as 4.x, I don't think this is a beta stopper.
Target Milestone: M14 → M17
moving to future milestone.
Target Milestone: M17 → Future
Keywords: intl
CC'ing marina & ji, but keeping this bug for myself to QA if it gets worked on in the future.
This is now testable, and mailto URL handling is not
able to deal with 8-bit characters.

** As of 6/15/2001 Win32 trunk build **
Phil said:

> If our current behavior is the same as 4.x, I don't 
> think this is a beta stopper.

Well, the current behavior is worse than 4.x, as it cannot
deal at all with Asian 8-bit or MIME-encoded headers in
mailto URLs.
(It seems that Latin 1 examples of raw 8-bit headers
might be working.)

ji and/or marina, please test this.
<a href="mailto:tato@fureai.or.jp?subject=abc&body=def">test</a> is OK

<a href="mailto:tato@fureai.or.jp?
subject=JapanesWords&body=JapanesWords">test</a> cannot be read.

test page
http://game.gr.jp/chkmoz/11/

Many Japanese pages use this. Therefore, it is a serious problem.
mscott, ftang, can we reconsider TM for this bug?
I think the 4.x parity issue can be resolved by relatively small changes
(a fix similar to bug 51355).
Momoi san, could you file it separately, and also include examples of real
usage if possible?
4.x parity bug was filed as bug 87202.
So charset can be specified for headers. But how about a charset of the
composing mail? Should that be taken from a charset of body? What if no body is
specified for the mailto URL?
> So charset can be specified for headers. But how about 
> a charset of the composing mail? 

If "body" is specified according to this RFC, then we know
what the charset is.

> Should that be taken from a charset of body? What 
> if no body is specified for the mailto URL?

I think we need to default to the default mail encoding 
which is in the same family as the web page encoding in
case the 2 are different. If no web encoding is specified,
then set it to the current encoding.
Multiple charsets can be specified in a header. So it is ambiguous even if we
use the body's charset, because of the possibility of multiple choices. In fact,
the current MIME decoder does not return the charset(s) used in the header.
I wrote a patch to decode MIME encoded headers but it does not set a charset of
the compose window. I will attach it later.
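
To illustrate the ambiguity, here is a small sketch using Python's standard `email.header` module (not the Mozilla MIME decoder discussed here), which does report the charset of each decoded chunk. The helper name and the sample header are made up for this example; the sample mixes a Latin 1 encoded word with a UTF-8 one.

```python
from email.header import decode_header


def charsets_in_header(value):
    """Return the charset reported for each decoded chunk of a header value.

    Hypothetical helper for illustration; unencoded chunks report None.
    """
    return [charset for _, charset in decode_header(value)]


# One header value containing two encoded words with different charsets.
mixed = "=?iso-8859-1?Q?Sant=E9?= =?utf-8?Q?=E2=82=AC?="
print(charsets_in_header(mixed))
```

When more than one charset comes back, there is no single obvious choice for the charset of the compose window.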


> Multiple charsets can be specified in a header. So it is ambiguous
> even if we use the body's charset, because of the possibility of multiple
> choices.

That is true, but in a practical sense, web page designers should not
be using 2 different encodings for header and body. In the vast
majority of cases, these two should be and will be the same.
Let me also quote the following passage from the RFC cited above:

"The "body" hname should contain the content for
   the first text/plain body part of the message. The mailto URL is
   primarily intended for generation of short text messages that are
   actually the content of automatic processing (such as "subscribe"
   messages for mailing lists), not general MIME bodies."

The body content is supposed to be "short", usually containing
only what is needed administratively for subscription and
other purposes. It also looks like such msgs should be of
text/plain type or at least provide a text/plain body part
at minimum.


In fact, I am not sure MIME encoding in mailto URLs is practically useful.
It is not supported by IE and 4.x. Using UTF-8 for the URL is simpler than
including MIME encoded words inside the URL.
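
For comparison, a sketch of the UTF-8 alternative: the subject is percent-encoded directly as UTF-8 bytes, with no encoded-word layer. Note that RFC 2368 itself does not define this; the helper name is made up, and the example assumes the receiving client interprets the escaped bytes as UTF-8.

```python
from urllib.parse import quote, unquote


def mailto_with_utf8_subject(address, subject):
    """Build a mailto URL with the subject percent-encoded as UTF-8.

    Hypothetical helper for illustration; the address is left unescaped.
    """
    return "mailto:" + address + "?subject=" + quote(subject, safe="")


url = mailto_with_utf8_subject("janedoe@netscape.nets", "Santé")
print(url)
# Decoding is a single unescape step, assuming UTF-8 on the receiving side.
print(unquote(url.partition("subject=")[2]))
```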

Reassign to nhotta.
Assignee: mscott → nhotta
Target Milestone: Future → ---
jbetak- can you take a look at nhotta's patch, review it, get someone to sr it,
and check it in if it is good?
Assignee: nhotta → jbetak
accepting - I'll try to land this after obtaining r/sr
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.4
mscott, sspitzer: what do you think? Could you r/sr?
Ducarroz is module owner of compose.
thanks for pointing it out! Ducarroz, what do you think of this one?
experiencing trouble with obtaining r/sr. I'll lobby harder. Nominating for 
nsbranch.
Whiteboard: nsbranch, have patch - need r/sr
Per Selmer's suggestion, I am doing this myself.

Moving nsbranch designation from the Status Whiteboard to Keywords.
Keywords: nsbranch
Whiteboard: nsbranch, have patch - need r/sr → have patch - need r/sr
not a feature in IE nor 4.x, move to m0.9.6
Target Milestone: mozilla0.9.4 → mozilla0.9.6
Blocks: 99171
nsbranch- since Frank moved it to 0.9.6
Keywords: nsbranch → nsbranch-
Reassign to nhotta.
Assignee: jbetak → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Comment on attachment 44035 [details] [diff] [review]
Patch, decode MIME encoded strings in mailto URL.

looks good. R=ducarroz
Attachment #44035 - Flags: review+
Comment on attachment 44035 [details] [diff] [review]
Patch, decode MIME encoded strings in mailto URL.

sr=sspitzer
Attachment #44035 - Flags: superreview+
instead of defining that CID, I'd rather use the contract id, but there isn't one.

I'll log a bug to add a contract id to nsMsgMimeCID.h and use it in the
nsMimeModule.cpp and the callers.
Whiteboard: have patch - need r/sr
Checked in to the trunk.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Product: MailNews → Core
Product: Core → MailNews Core