Remove support for UTF-7 (and others) per HTML5 spec

RESOLVED FIXED in Firefox 5

Status

()

Core
Internationalization
RESOLVED FIXED
10 years ago
4 years ago

People

(Reporter: dveditz, Assigned: smontagu)

Tracking

(Blocks: 1 bug, {dev-doc-complete, html5})

Trunk
mozilla5
dev-doc-complete, html5
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox5 fixed, blocking2.0 -, status1.9.2 wontfix, status1.9.1 wontfix)

Details

(Whiteboard: [sg:want P3])

Attachments

(2 attachments, 8 obsolete attachments)

(Reporter)

Description

10 years ago
Section 8.2.2.2 of the WHAT-WG HTML5 spec says

  User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
  encodings. Support for UTF-32 is not recommended. This encoding
  is rarely used, and frequently misimplemented. 

http://www.whatwg.org/specs/web-apps/current-work/#character0

I don't know about UTF-32, but as far as I can tell UTF-7 is only used to demo XSS holes and isn't used for live web content. Even the authors of UTF-7 agree:

http://mail.apps.ietf.org/ietf/charsets/msg01746.html

Some bugs seem to imply we may need utf-7 or a variant for IMAP folder names. Don't know if that's still a requirement or if there's a way to preserve it for mail protocols while killing its use for the web and mail _content_.

Comment 1

10 years ago
We still need IMAP Mod utf-7 for IMAP folder names (and keywords), and will for the forseeable future, afaik. We use the intl converters to convert back and forth from unicode to imap mod utf7.
(Assignee)

Comment 2

10 years ago
Could we change the registered name of the converter to some arbitrary internal name so that it works when we call it manually for IMAP but doesn't recognize "utf-7" in content? 

OTOH, I think I remember Neil saying when we discussed this on IRC a little while ago that he still comes across UTF-7 in mail content.

Comment 3

10 years ago
IMAP uses MUTF7, "x-imap4-modified-utf7" - see http://mxr.mozilla.org/mailnews/search?string=mutf7

Comment 4

10 years ago
Within the IMAP community, the fact that MUTF7 exists is generally believed to
be a design error, but it still exists.  There is a draft proposal that might
deprecate it, but it's both controversial and not done:

  http://tools.ietf.org/html/draft-ietf-eai-imap-utf8

So despite the need for MUTF7 for the time being in IMAP mailbox names, removal
of support for the regular UTF-7 charset seems like a good idea.
Just wanted to poke this bug to see if anyone has given further thought on how we can remove UTF-7 globally.  We continue to see a steady trickle of XSS bugs that leverage UTF-7, so it would be great to shut it off once and for all.

Updated

8 years ago
Whiteboard: [sg:want P3]

Updated

8 years ago
Blocks: 530647
Keywords: html5
Created attachment 432349 [details] [diff] [review]
Remove UTF7 and move BasicUTF7
Attachment #432349 - Flags: review?(smontagu)
Created attachment 432350 [details] [diff] [review]
Merge BasicUTF7 and MUTF7
Attachment #432350 - Flags: review?(smontagu)
(Assignee)

Comment 8

8 years ago
So are we sure that there are no mail clients still around that send UTF-7 mails?

Comment 9

8 years ago
(In reply to comment #8)
MS Exchange 2000 uses UTF-7 for Delivery status notification messages
http://support.microsoft.com/kb/819831/en-us/

http://www.microsoft.com/exchange/2007/support/lifecycle/2000.mspx

Comment 10

8 years ago
(In reply to comment #9)
End of lifecycle of MS Exchange 2000 is January 1, 2011.

Comment 11

8 years ago
So, if we theoretically removed our UTF-7 support today, what issues could/would occur?

Comment 12

8 years ago
Webkit has been already removed UTF-7 support.

I filed a security bug around UTF-7 decoder of Webkit.
It is fixed by removal of UTF-7 support.

Comment 13

8 years ago
(In reply to comment #12)
> I filed a security bug around UTF-7 decoder of Webkit.
Oops, it is NOT correct description.
I thought it may be security-related, and filed it as Security-sensitive.
But, No security attack method using it was found.

Comment 14

8 years ago
(In reply to comment #13)
> (In reply to comment #12)
> > I filed a security bug around UTF-7 decoder of Webkit.
> Oops, it is NOT correct description.
> I thought it may be security-related, and filed it as Security-sensitive.
> But, No security attack method using it was found.

And main security-related reason was "UTF-7 is commonly used for XSS".
Comment on attachment 432349 [details] [diff] [review]
Remove UTF7 and move BasicUTF7

Canceling r? for now; comments still welcome though.
Attachment #432349 - Flags: review?(smontagu)

Updated

8 years ago
Attachment #432350 - Flags: review?(smontagu)

Comment 16

8 years ago
The way we handled this at Opera was disabling UTF-7 just for web content. UTF-32 support was nuked completely though.
(Assignee)

Comment 18

7 years ago
I'm going to investigate moving the UTF-7 converters into mailnews/imap or somewhere.
Status: NEW → ASSIGNED
(Assignee)

Comment 19

7 years ago
Created attachment 462910 [details] [diff] [review]
Remove utf-7 (intl part)

This patch breaks tests in mailnews/base, which is a Good Thing, because it means that there is already testing in place for the follow-up patch to reimplement UTF-7 in mailnews.
Attachment #462910 - Flags: review?(VYV03354)
(Assignee)

Comment 20

7 years ago
Created attachment 462911 [details] [diff] [review]
Remove utf-7 (parser part)
Attachment #462911 - Flags: review?(hsivonen)
(Assignee)

Comment 21

7 years ago
Created attachment 462912 [details] [diff] [review]
Remove utf-7 (parser part)

The right patch this time
Attachment #462911 - Attachment is obsolete: true
Attachment #462911 - Flags: review?(hsivonen)
(Assignee)

Comment 22

7 years ago
Comment on attachment 462910 [details] [diff] [review]
Remove utf-7 (intl part)

Removing review request -- we may still need the alias handling.
Attachment #462910 - Flags: review?(VYV03354)
(Assignee)

Updated

7 years ago
Depends on: 587475
(Assignee)

Comment 23

7 years ago
Filed bug 587475 with a patch reimplementing UTF-7 decoding and encoding in mailnews using character sinks.
Be real nice if we could get this into 2.0, then we could stop worrying about these UTF-7 smuggled XSS issues.
blocking2.0: --- → ?
(Assignee)

Comment 25

7 years ago
Created attachment 466601 [details] [diff] [review]
Remove utf-7 (intl part) v.2

Tested with the patch in bug 587475.
Attachment #462910 - Attachment is obsolete: true
(Assignee)

Updated

7 years ago
Attachment #466601 - Flags: review?(VYV03354)
(Assignee)

Updated

7 years ago
Attachment #462912 - Flags: review?(hsivonen)
(Assignee)

Comment 26

7 years ago
Created attachment 466602 [details] [diff] [review]
tests
(Assignee)

Updated

7 years ago
Blocks: 412431
(Assignee)

Comment 27

7 years ago
Unfortunately View Source of a UTF-7 message is broken with this approach. I haven't tested viewing an .eml file in browser, but I expect it will be broken also.
(Assignee)

Comment 28

7 years ago
Sorry, the previous comment belongs in bug 587475
Attachment #466601 - Flags: review?(VYV03354) → review+
Comment on attachment 462912 [details] [diff] [review]
Remove utf-7 (parser part)

If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML message that doesn't specify the encoding on the MIME layer but has <meta http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> inside the message?

Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7" from return a decoder when the HTML parser runs in mailnews?

In any case, please add a reftest that loads an HTML document that has <meta http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with <meta http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and Windows-1252.

Updated

7 years ago
blocking2.0: ? → -
(Assignee)

Comment 30

7 years ago
(In reply to comment #29)

> Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7"
> from return a decoder when the HTML parser runs in mailnews?

The new UTF-7 decoder in mailnews is a character sink, not an nsIUnicodeDecoder, so GetUnicodeDecoderRaw is not going to return a decoder in this scenario.

> If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML
> message that doesn't specify the encoding on the MIME layer but has <meta
> http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta
> http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7">
> inside the message?

"UTF-7" is special cased in MimeInlineText_convert_and_parse_line() in the patch for bug 587475 so that it will be decoded, and I have tested that this works both in text and HTML messages. "x-imap4-modified-utf7" will not be decoded.

> In any case, please add a reftest that loads an HTML document that has <meta
> http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with
> <meta http-equiv="Content-Type" content="text/html;
> charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and
> Windows-1252.

I don't mind doing this, but do you think it adds significant value beyond the xpcshell test in attachment 466602 [details] [diff] [review]?
Comment on attachment 462912 [details] [diff] [review]
Remove utf-7 (parser part)

(In reply to comment #30)
> (In reply to comment #29)
> 
> > Would GetUnicodeDecoderRaw when called with "utf-7" or "x-imap4-modified-utf7"
> > from return a decoder when the HTML parser runs in mailnews?
> 
> The new UTF-7 decoder in mailnews is a character sink, not an
> nsIUnicodeDecoder, so GetUnicodeDecoderRaw is not going to return a decoder in
> this scenario.

Great.

> > If mailnews restores an UTF-7 decoder, what's the expected behavior for an HTML
> > message that doesn't specify the encoding on the MIME layer but has <meta
> > http-equiv="Content-Type" content="text/html; charset=utf-7"> or <meta
> > http-equiv="Content-Type" content="text/html; charset=x-imap4-modified-utf7">
> > inside the message?
> 
> "UTF-7" is special cased in MimeInlineText_convert_and_parse_line() in the
> patch for bug 587475 so that it will be decoded, and I have tested that this
> works both in text and HTML messages. "x-imap4-modified-utf7" will not be
> decoded.

Code inspection of the patch suggests to me that internal encoding declaration (<meta>) saying UTF-7 wouldn't get decoded as UTF-7, but I'm OK with that.

> > In any case, please add a reftest that loads an HTML document that has <meta
> > http-equiv="Content-Type" content="text/html; charset=utf-7"> and another with
> > <meta http-equiv="Content-Type" content="text/html;
> > charset=x-imap4-modified-utf7"> and content decodes differently in UTF-7 and
> > Windows-1252.
> 
> I don't mind doing this, but do you think it adds significant value beyond the
> xpcshell test in attachment 466602 [details] [diff] [review]?

Since there's not going to be an nsIUnicodeDecoder for UTF-7 in any configuration, I think it's not of significant value to add more tests.

r=me.
Attachment #462912 - Flags: review?(hsivonen) → review+
Nominating for 2.0 (and I think should be backported to active branches as well).
blocking1.9.1: --- → ?
blocking1.9.2: --- → ?
blocking2.0: - → ?
(Reporter)

Comment 33

7 years ago
We'll want this on branches after we figure out what we're doing on trunk and that it's not breaking things we care about. Not blocking a particular release.
blocking1.9.1: ? → needed
blocking1.9.2: ? → needed
status1.9.1: --- → wanted
status1.9.2: --- → wanted
(Reporter)

Comment 34

7 years ago
(but I do hope it blocks Firefox 4)

Updated

7 years ago
blocking2.0: ? → betaN+

Updated

7 years ago
Blocks: 568516
(Assignee)

Comment 35

7 years ago
Created attachment 491486 [details] [diff] [review]
Flag utf-7 as XSS-vulnerable

Patch using the mechanism from bug 601429 to mark the charsets as vulnerable, which will make GetUnicodeDecoder return an error. This still depends on bug 587475, but that will be a lot simpler to fix now too.
Attachment #432349 - Attachment is obsolete: true
Attachment #432350 - Attachment is obsolete: true
Attachment #462912 - Attachment is obsolete: true
Attachment #466601 - Attachment is obsolete: true
It might make sense to reorder the .notForBrowser block so that x-imap4-modified-utf7 is next to utf-7, I was about to post asking why x-imap4-modified-utf7 doesn't get .notForBrowser.

Also, some comments explaining the difference between .notForBrowser and .isXSSVulnerable would be nice; the former list has a lot of things that I don't immediately see what the problem is (us-ascii, for instance) and am left wondering whether others of those ought to be tagged xss-vulnerable.
(Assignee)

Comment 37

7 years ago
.notForBrowser is *supposed* to mean "we need to be able to decode this if it appears in content, but we don't want to expose it in the UI", but you may well be right that some of the existing entries should become .isXSSVulnerable

Updated

7 years ago
blocking2.0: betaN+ → -
(Assignee)

Comment 38

7 years ago
Created attachment 500009 [details] [diff] [review]
Flag utf-7 as XSS-vulnerable v.2
Attachment #491486 - Attachment is obsolete: true
Attachment #500009 - Flags: review?(VYV03354)
(Assignee)

Comment 39

7 years ago
Created attachment 500010 [details] [diff] [review]
Tests updated
Attachment #466602 - Attachment is obsolete: true
Attachment #500009 - Flags: review?(VYV03354) → review+
(Assignee)

Updated

7 years ago
Attachment #500009 - Flags: approval2.0?

Comment 40

7 years ago
Comment on attachment 500009 [details] [diff] [review]
Flag utf-7 as XSS-vulnerable v.2

I believe this has missed the boat and we shouldn't introduce any new risk at this point. This should wait for the next 3-month cycle.
Attachment #500009 - Flags: approval2.0? → approval2.0-
(Assignee)

Updated

7 years ago
Depends on: 610267
Keywords: checkin-needed
Pushed:
  http://hg.mozilla.org/projects/cedar/rev/6a21a25f2400
  http://hg.mozilla.org/projects/cedar/rev/4d103ea8178d
Flags: in-testsuite+
Whiteboard: [sg:want P3] → [sg:want P3][fixed-in-cedar]
Target Milestone: --- → mozilla2.2

Updated

7 years ago
Keywords: checkin-needed
Version: unspecified → Trunk

Comment 42

7 years ago
http://hg.mozilla.org/mozilla-central/rev/6a21a25f2400
http://hg.mozilla.org/mozilla-central/rev/4d103ea8178d
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Whiteboard: [sg:want P3][fixed-in-cedar] → [sg:want P3]
I've added dev-doc-needed as at least that page must be upgraded:
https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko
Keywords: dev-doc-needed
Documented:

https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko

Also mentioned on Firefox 5 for developers; and while I was at it, I overhauled that page a bit and added some links to it. It's also tagged so it will show up when people go looking for things to do.
Keywords: dev-doc-needed → dev-doc-complete
(Reporter)

Updated

7 years ago
blocking1.9.1: needed → ---
blocking1.9.2: needed → ---
status1.9.1: wanted → wontfix
status1.9.2: wanted → wontfix
(Reporter)

Updated

7 years ago
status-firefox5: --- → fixed
You need to log in before you can comment on or make changes to this bug.