Open Bug 601933 Opened 14 years ago Updated 2 years ago

remove RFC 2047 encoding support for HTTP header field parameters

Categories

(Firefox :: File Handling, task)

task

Tracking

()

REOPENED
Tracking Status
relnote-firefox --- -

People

(Reporter: julian.reschke, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug, )

Details

(Keywords: dev-doc-needed, site-compat)

Attachments

(1 file, 7 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10 (.NET CLR 3.5.30729)
Build Identifier: 3.6.10

When decoding the filename parameter in Content-Disposition headers, Firefox attempts to unescape using the RFC 2047 encoding. This is a bug according to the relevant specs (that encoding is for words in headers, not parameters), and is also only implemented on one other UA (Chrome).

Removing the support for RFC 2047 will reduce the code size, and is unlikely to break anything.

Reproducible: Always
Does that still allow non-ISO0-8859-1 filenames?
(In reply to comment #1)
> Does that still allow non-ISO0-8859-1 filenames?

Yes, of course. There's also support for the RFC2231/5987 encoding, which does that.
In the meantime I found out through jungshik@google.com that Google Mail actually uses this encoding, so it'll be hard to remove until this is fixed.
Blocks: 609667
The fix for bug 610054 has simplified this to calling a new variant of the method that is restricted to RFC 5987 support.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
This patch removes the RFC 2047 support in DecodeParameter, which shouldn't be in there according to the applicable specs, and also is only "supported" by Firefox and Chrome. For both UAs, a better syntax (RFC 2231) has been available "forever" (FF) or for quite some time (Chrome 10).

As far as I understand, this will not affect Thunderbird, which calls decodeRFC2047Header() directly.

Note that this patch *will* break download with non-ASCII filenames from GMail. I have tried many times to contact them but had no success, so maybe the best thing we can do is to apply the patch, and wait for somebody notice (hopefully before beta).
Attachment #617324 - Flags: review?(jduell.mcbugs)
Please send me the text that you've been sending to them; I'll see what I can do to get it over to the right people.
(In reply to Boris Zbarsky (:bz) from comment #6)
> Please send me the text that you've been sending to them; I'll see what I
> can do to get it over to the right people.

I tried multiple time on G+.

If you think it makes sense, I can try to come up with a coherent argument; also suggesting how they can actually *simplify* their code by removing extra cases.
> I tried multiple time on G+.

My point is that we have a direct contact list with Google for this sort of thing.  Might work better.

A coherent description of how the change will affect them and how they can deal would be perfect, yes.
1) Why doesn't RFC 2047 encoding apply to header field parameters?

See <http://greenbytes.de/tech/webdav/rfc2231.html#rfc.section.2.p.4>:

"2. MIME headers, like the RFC 822 headers they often appear in, are limited to 7bit US-ASCII, and the encoded-word mechanisms of RFC 2047 are not available to parameter values. This makes it impossible to have parameter values in character sets other than US-ASCII without specifying some sort of private per-parameter encoding."

Note that RFC 2231 updates RFC 2047.

Also note that HTTPbis has removed all mentions of RFC 2047; even in those places where it was allowed to be used it wasn't in practice. See <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/111>.

2) Why and where is it used anyway?

The first UA to "support" RFC 2047 encoding was Firefox, and it seems it just happened to be supported because the MIME header parser is shared with Thunderbird, and Thunderbird itself was using the encoding when *sending* messages.

The only UA that copied Firefox' behavior is Chrome, and that's not really surprising as it seems that the code was written by the same developer.

There are web sites that indeed use RFC 2047 encoding (GMail, probably other Google services, and maybe more...); and one explanation for it is that it "works" in both Firefox and Chrome and thus does allow a single code path.

Of course the same is true for the encoding defined in RFCs 2231 and 5987, which is supported by Firefox, Chrome (>=10), Opera, Konqueror, and IE (>=9).

3) Why is it a good idea to remove RFC 2047 support from browsers?

- it's non-compliant

- it only works in two UAs

- it's used for something for which a better and more interoperable solution exists (see RFCs 5987 and 6266)

- the use of RFC 2047 encoding can "bleed" over to other header fields; for instance, the Firefox code (MIME header parser) is used for type attributes in HTML as well (not sure what the situation in Chrome is)

4) Why is UA sniffing for Content-Disposition syntax bad?

Content-Disposition (with disposition type attachment) is non-trivial to test automatically (as it requires observing the UA's "save as" prompt).

Senders can indeed switch based on the "User-Agent" request header field, but this creates multiple code paths, which are even harder to test in a systematic manner. To make things worse, sometimes bugs are only fixed in one code path, although they should have been fixed in several ones.

Another consideration is that for many developers, doing things right for Content-Disposition remains hard, as thinking beyond US-ASCII is not common, and they prefer to observe popular sites instead of reading specifications (such as RFC 6266). Therefore, widely used sites doing things wrong may cause even more breakage in other sites.

5) What's the recommendation for cites that need to support "all" browsers?

Optimally, just treat UAs that do not support RFC 6266 as legacy; see <http://www.greenbytes.de/tech/webdav/rfc6266.html#rfc.section.D>.

If support for IE (<9) and Safari is indeed needed, then my recommendation is to special-case just these:

if (UA is IE older than IE9)
  percent-encode UTF8 representation of filename
else if (US is Safari)
  (do whatever is needed for Safari)
else
  use RFC 5987 encoding (using UTF8 character set)
  
The default rule will work with Chrome >= 10, Opera, Firefox, Konqueror, IE >= 9 and hopefully any new UA.
Mail sent.  They've got a bug filed, looking into this.
Is this also remove RFC 2047 support from Thunderbird?
Windows Mail/Windows Live Mail/Outlook do not support RFC 2231/5987 encoding as far as I know. Also, mail-gateway for Japanese mobile-phones also relies on RFC 2047 encoding. I don't think it's possible to remove RFC 2047 support even if G+ has been fixed. At least, the relevant code should be moved to MailNews Core.
(In reply to Masatoshi Kimura [:emk] from comment #11)
> Is this also remove RFC 2047 support from Thunderbird?

No.

> Windows Mail/Windows Live Mail/Outlook do not support RFC 2231/5987 encoding
> as far as I know. Also, mail-gateway for Japanese mobile-phones also relies
> on RFC 2047 encoding. I don't think it's possible to remove RFC 2047 support
> even if G+ has been fixed. At least, the relevant code should be moved to
> MailNews Core.

Indeed. If this change was applied, the remaining RFC 2047 support would be in a strictly separate function that could easily be moved over.
Julian, to confirm what you are suggesting for Gmail:

1. To create MIME Content-Disposition filename attribute, use RFC 5987 format and encoding, i.e. UTF-8 (or ISO-8859-1 if data can be encoded in it). Use no other encoding.
2. For sending the filename attribute info to browsers via HTTP, e.g. file download from a message, use UTF-8 for Firefox (all versions after 1.0) and IE9 and later. For IE older than IE9, send UTF-8 percent-encoded data. For Chrome 10 or later, use UTF-8/RFC 5987. For Safari, it should work like Chrome, no? If not, punt for Safari? 

I remember for file download on Gmail, we punted for Safari(i.e. nothing worked) but sent UTF-8 percent-escaped format to IE, and MIME-encoded format to Firefox and other browsers. The reason for the latter was because Firefox supported both RFC 2231 and RFC 2047.
(In reply to Katsuhiko  Momoi from comment #13)
> Julian, to confirm what you are suggesting for Gmail:
> 
> 1. To create MIME Content-Disposition filename attribute, use RFC 5987
> format and encoding, i.e. UTF-8 (or ISO-8859-1 if data can be encoded in
> it). Use no other encoding.
> 2. For sending the filename attribute info to browsers via HTTP, e.g. file
> download from a message, use UTF-8 for Firefox (all versions after 1.0) and
> IE9 and later. For IE older than IE9, send UTF-8 percent-encoded data. For
> Chrome 10 or later, use UTF-8/RFC 5987. For Safari, it should work like
> Chrome, no? If not, punt for Safari? 

I'm not sure about the distinction between 1) and 2). Does 1) refer to sending attachments by email? In which case that's out of scope for this bug; just leave things as they are.

For creating the filename attribute in an HTTP Content-Disposition header field:

- Special case those UAs that do not support RFC5987 (so IE < 9 and potentially Safari)
- Send RFC 5987 format using UTF-8 encoding scheme to everybody else

> I remember for file download on Gmail, we punted for Safari(i.e. nothing
> worked) but sent UTF-8 percent-escaped format to IE, and MIME-encoded format
> to Firefox and other browsers. The reason for the latter was because Firefox
> supported both RFC 2231 and RFC 2047.

Safari: if it doesn't work right now, the best plan is to continue not to special-case them; just send the default format (RFC 5987 / UTF-8), and Apple can fix things by simply implementing the specs properly.

IE < 9: sounds right to me.

Firefox: you made almost the right decision back then, but should have picked RFC 2231 (now RFC 5987) instead of RFC 2047; it's not only the "right" encoding to pick, it also just works almost everywhere else (Opera, Konqueror), and even did so a long time ago (ca. 2005).

Finally, in case you haven't seen that: http://greenbytes.de/tech/tc2231/
Comment on attachment 617324 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 617324 [details] [diff] [review]:
-----------------------------------------------------------------

Patch removes 2047 support straightforwardly enough.  But before we land this I'd like to confirm that 1) gmail has stopped using 2047 encoding; and 2) the Thunderbird folks can verify that this won't break anything of theirs (Mark, can you +f or reassign to the right person?).
Attachment #617324 - Flags: review?(jduell.mcbugs)
Attachment #617324 - Flags: review+
Attachment #617324 - Flags: feedback?(mbanner)
Comment on attachment 617324 [details] [diff] [review]
Proposed patch, incl test case

Thanks for the question. David knows the mime stuff better than me, so I'll pass to him.
Attachment #617324 - Flags: feedback?(mbanner) → feedback?(dbienvenu)
Comment on attachment 617324 [details] [diff] [review]
Proposed patch, incl test case

the bad news is that Thunderbird does call decodeParameter (this patch breaks Thunderbird builds unless the calling code in TB is changed). The good news is that there's only one caller, mime_decode_filename. But we need to support the user's already downloaded messages, so I don't think we have the luxury of no longer parsing gmail messages, for example.  I'm open to suggestions as to how we can parse those older headers ourselves.
Attachment #617324 - Flags: feedback?(dbienvenu) → feedback-
this patch also seems to break some of our unit tests, so the good news is that we have test coverage :-;
My understanding was that TB never never needs the inlined RFC2047-decoding, but always call the separate method instead. So it was not intended to change TB's behaviour at all. Will investigate.
(In reply to David :Bienvenu from comment #18)
> this patch also seems to break some of our unit tests, so the good news is
> that we have test coverage :-;

David, I tried the xpcshell tests (after removing the two parameters from the invocation), and didn't see a problem. Which tests should I try?
(In reply to Julian Reschke from comment #20)
> David, I tried the xpcshell tests (after removing the two parameters from
> the invocation), and didn't see a problem. Which tests should I try?

OK, tests in db/gloda are failing (I didn't realize because when you ran *all* tests the failures just scroll by, and the end result appears to be fine)
It appears that the code for GMail (the webs service, not the mail server) has recently changed to include the RFC2231/5987-encoded variant. (It still contains a 2047-encoded "filename" parameter, but as we preference "filename*" that's not critical). So it seems that we can start to remove RFC 2047 support from HTTP header field parsing (and then see who else breaks :-).

Will provide a new patch soonish.
Summary: remove RFC 2047 encoding support in Content-Disposition → remove RFC 2047 encoding support for HTTP header field parameters
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
New attempt: leave RFC 2047 code in, but disable it in the "HTTP" code path used from within Firefox. To do this, rename the *5987 variant to *HTTP (it hadn't been used yet anyway), and change all invocations from Firefox to use the *HTTP variant.

In order to not create unneeded code paths, I have enabled support for continuations in this code path, so we don't change too many things at once. We can disable continuations for the *HTTP variant in a future version; I have created the separate bug 776324 for that.
Assignee: nobody → julian.reschke
Attachment #617324 - Attachment is obsolete: true
Attachment #644744 - Flags: review?(jduell.mcbugs)
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
New attempt: leave RFC 2047 code in, but disable it in the "HTTP" code path used from within Firefox. To do this, rename the *5987 variant to *HTTP (it hadn't been used yet anyway), and change all invocations from Firefox to use the *HTTP variant.

In order to not create unneeded code paths, I have enabled support for continuations in this code path, so we don't change too many things at once. We can disable continuations for the *HTTP variant in a future version; I have created the separate bug 776324 for that.

(new patch fixes the commit message)
Attachment #644744 - Attachment is obsolete: true
Attachment #644744 - Flags: review?(jduell.mcbugs)
Attachment #644751 - Flags: review?(jduell.mcbugs)
Comment on attachment 644751 [details] [diff] [review]
Proposed patch, incl test case

Note that this *renames* an IDL method which is likely bad.

I can undo the renaming for now, but it would be good if we could change the method name to something that actually reflects the semantics. What's the best way to do that?
Attached patch Proposed patch, incl test cases (obsolete) — Splinter Review
New attempt: leave RFC 2047 code in, but disable it in the "HTTP" code path used from within Firefox. To do this, change all invocations from Firefox to use the *HTTP (5987) variant.

In order to not create unneeded code paths, I have enabled support for continuations in this code path, so we don't change too many things at once. We can disable continuations for the *HTTP variant in a future version; I have created the separate bug 776324 for that.

Try results: https://tbpl.mozilla.org/?tree=Try&rev=c2d8e972a619

I also have verified that GMail continues to work for me with that patch.
Attachment #644751 - Attachment is obsolete: true
Attachment #644751 - Flags: review?(jduell.mcbugs)
Attachment #644895 - Flags: review?(jduell.mcbugs)
Attached patch Proposed patch, incl test cases (obsolete) — Splinter Review
New attempt: leave RFC 2047 code in, but disable it in the "HTTP" code path used from within Firefox. To do this, change all invocations from Firefox to use the *HTTP (5987) variant.

In order to not create unneeded code paths, I have enabled support for continuations in this code path, so we don't change too many things at once. We can disable continuations for the *HTTP variant in a future version; I have created the separate bug 776324 for that.

Try results: https://tbpl.mozilla.org/?tree=Try&rev=c2d8e972a619

I also have verified that GMail continues to work for me with that patch (and added a specific test case using the format they currently use)
Attachment #644895 - Attachment is obsolete: true
Attachment #644895 - Flags: review?(jduell.mcbugs)
Attachment #644905 - Flags: review?(jduell.mcbugs)
Note that the current patch suffers fro bit rot.
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
This change re-purposes the *5987 variant of the API that was already there. It changes the browser code to use *that* signature (Thunderbird can continue to use the MIME variant).

For the *5987 variant, it disables RFC2047-decoding. It *also* re-enables continuation support so that we don't silently introduce yet another change. We can disable continuation support in a separate future change.

See also <https://tbpl.mozilla.org/?tree=Try&rev=178bfd69ae5e>
Attachment #644905 - Attachment is obsolete: true
Attachment #644905 - Flags: review?(jduell.mcbugs)
Attachment #708089 - Flags: review?(jduell.mcbugs)
Comment on attachment 708089 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 708089 [details] [diff] [review]:
-----------------------------------------------------------------

bz: do you have a bikeshed opinion on the IDL name here?

Otherwise this patch looks good.  I want to hold off landing it until the next code fork (i.e. until after next week).  And I want to check in with the Thunderbird folks one last time to make sure this won't break anything of theirs.

::: netwerk/mime/nsIMIMEHeaderParam.idl
@@ +75,2 @@
>     */
>    AString getParameter5987(in ACString aHeaderVal,

I'd actually prefer to rename this getParameterHTTP() as in your earlier patch.  It's certainly clearer to developers, and it seems unlikely that mail will ever switch to 5987, so the name ought to not get obsoleted.  And no code AFAIK has ever actually used this method so far IIRC.
Attachment #708089 - Flags: review?(jduell.mcbugs) → review+
IDL name for which?
Flags: needinfo?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #31)
> Otherwise this patch looks good.  I want to hold off landing it until the
> next code fork (i.e. until after next week).  And I want to check in with

Sure.

> the Thunderbird folks one last time to make sure this won't break anything
> of theirs.

The idea was this time to only modify behavior of functions that they do not use.

> ::: netwerk/mime/nsIMIMEHeaderParam.idl
> @@ +75,2 @@
> >     */
> >    AString getParameter5987(in ACString aHeaderVal,
> 
> I'd actually prefer to rename this getParameterHTTP() as in your earlier
> patch.  It's certainly clearer to developers, and it seems unlikely that
> mail will ever switch to 5987, so the name ought to not get obsoleted.  And
> no code AFAIK has ever actually used this method so far IIRC.

I agree that this would be better; if we agree on this I can provide a new patch quickly.
bz:

This would be the change that's in the IDL in attachment 644751 [details] [diff] [review] (i.e. change nsIMIMEHeaderParam.idl getParameter5987() to getParameterHTTP()).  I think this would be clearer to developers, and less likely to be obsoleted (we'll always need a function that handles getParameter for HTTP: it's not clear it'll always use RFC 5987 or that that will be the best name for the function).  I don't think any code has ever used the existing getParameter5987() so I don't expect this to break anything. (seems unlikely addons are using it).

If you're fine with this I assume we can just cut a new patch with the change and I'll mark it +sr.  Or you can +sr it yourself once we have new patch--I just don't want to make Julian write a new patch if we're not going to take it.
Flags: needinfo?(jduell.mcbugs) → needinfo?(bzbarsky)
Oh, I see.  It was about an attachment that wasn't even showing.  ;)

I have no problem with getParameterHTTP, if appropriately documented.
Flags: needinfo?(bzbarsky)
OK, will create a new patch soonish.
Attached patch Proposed patch, incl test case (obsolete) — Splinter Review
(patch updated to rename *5987 to *HTTP)
Attachment #708089 - Attachment is obsolete: true
Attachment #716070 - Flags: review?(jduell.mcbugs)
Comment on attachment 716070 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 716070 [details] [diff] [review]:
-----------------------------------------------------------------

Julian: all we need here is a uuid update and I think we're good to go.

::: netwerk/mime/nsIMIMEHeaderParam.idl
@@ +19,1 @@
>     * Given the value of a single header field  (such as

We need to update the UUID for nsIMIMEHeaderParam/idl.  Here's one I just generated:

  9c9252a1-fdaf-40a2-9c2b-a3dc45e28dde

@@ +34,5 @@
>     *
>     * <p>
> +   * Note that a lot of MUAs put RFC 2047-encoded parameters. Unfortunately,
> +   * this includes Mozilla as of 2003-05-30. Even more standard-ignorant MUAs,
> +   * web servers and application servers put 'raw 8bit characters'. This will

Comment still accurate?  We still see web servers doing this?

@@ +41,4 @@
>     * includes lang.
>     *
> +   * <p>
> +   * Note that GetParameter5987 skips some of the workarounds used for

s/5987/HTTP/
Attachment #716070 - Flags: review?(jduell.mcbugs) → feedback+
changed UUID, fixed one comment
Attachment #716070 - Attachment is obsolete: true
Attachment #716495 - Flags: review?(jduell.mcbugs)
(In reply to Jason Duell (:jduell) from comment #38)
> Comment on attachment 716070 [details] [diff] [review]
> Proposed patch, incl test case
> 
> Review of attachment 716070 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Julian: all we need here is a uuid update and I think we're good to go.
> 
> ::: netwerk/mime/nsIMIMEHeaderParam.idl
> @@ +19,1 @@
> >     * Given the value of a single header field  (such as
> 
> We need to update the UUID for nsIMIMEHeaderParam/idl.  Here's one I just
> generated:
> 
>   9c9252a1-fdaf-40a2-9c2b-a3dc45e28dde

OK.

> @@ +34,5 @@
> >     *
> >     * <p>
> > +   * Note that a lot of MUAs put RFC 2047-encoded parameters. Unfortunately,
> > +   * this includes Mozilla as of 2003-05-30. Even more standard-ignorant MUAs,
> > +   * web servers and application servers put 'raw 8bit characters'. This will
> 
> Comment still accurate?  We still see web servers doing this?

I believe/fear it is true, but I do not have any data on it.

> @@ +41,4 @@
> >     * includes lang.
> >     *
> > +   * <p>
> > +   * Note that GetParameter5987 skips some of the workarounds used for
> 
> s/5987/HTTP/

Done.
Comment on attachment 716495 [details] [diff] [review]
Proposed patch, incl test case

Try results: https://tbpl.mozilla.org/?tree=Try&rev=a241fb53e592
Comment on attachment 716495 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 716495 [details] [diff] [review]:
-----------------------------------------------------------------

None of the oranges in the try run appear to be new AFAICT, so let's give this a go.

Thanks for all your work on this Julian...

   https://hg.mozilla.org/integration/mozilla-inbound/rev/83535a72fb27
Attachment #716495 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/mozilla-central/rev/83535a72fb27
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
I've added this bug to the compatibility doc. Please correct the info if wrong.
https://developer.mozilla.org/en-US/docs/Site_Compatibility_for_Firefox_22

Also, please update relevant docs if any.
Keywords: dev-doc-needed
Since this is being documented elsewhere, we won't release note.

(In reply to Julian Reschke from comment #0)
> Removing the support for RFC 2047 will reduce the code size, and is unlikely
> to break anything.

For sake of support - if there were a web regression, what would it look like (errors)?
(In reply to Alex Keybl [:akeybl] from comment #45)
> For sake of support - if there were a web regression, what would it look
> like (errors)?

A download link where the download indeed takes place, but the suggest filename looks incorrect.
Keywords: site-compat
Depends on: 875615
Depends on: 883447
Why is this bug marked RESOLVED FIXED? The change was reverted in https://hg.mozilla.org/mozilla-central/rev/908ee156f377 and Firefox still accepts RFC2047 encoded filenames.
Flags: needinfo?(julian.reschke)
Yes, that is misleading. It should be re-opened.
Status: RESOLVED → REOPENED
Flags: needinfo?(julian.reschke)
Resolution: FIXED → ---
Product: Core → Firefox
Target Milestone: mozilla22 → ---
Not just GMail is using this encoding method, also Bugzilla 5.0.3 as used by the ACPICA Bugtracker is affected (e.g. https://bugs.acpica.org/attachment.cgi?id=1065). These headers are shown:

Content-disposition: attachment; filename="=?UTF-8?Q?lv-package-test=2Epatch?="
Content-Type: text/plain; name="=?UTF-8?Q?lv-package-test=2Epatch?="; charset=

(noticed this while using curl -JO to download an attachment)

From the Chrome issue tracker (https://bugs.chromium.org/p/chromium/issues/detail?id=66694#c27):
> Out of all Content-Disposition headers that were seen for downloads (not
> counting those seen but didn't result in a download):
>
>  1.2% used RFC 2047 encoding.
>  7% had non-ASCII bytes (i.e. unencoded bytes)
>
> Those are still pretty high. Also as jshin@ points out, usage varies
> wildly across markets. Use non non-ASCII bytes is at 19% of all headers in
>  South Korea, for example. We are going to be stuck with this for a while.

They have marked the change as WontFix. Should Firefox keep the
RFC2047-style decoding method for the sake of backwards-compatibility?
I think we should, yes.
Type: enhancement → task

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: julian.reschke → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: