Closed Bug 604317 Opened 12 years ago Closed 11 years ago
Remove support for UTF-32 per HTML5 spec
In the past I never saw the point of removing UTF-32, even though HTML5 recommends against supporting it. However, it seems to make more sense to remove it than spend time on fixing bugs. Kimura-san, does that make sense to you?
Sure. It will not harm Web compat because Internet Explorer has never supported UTF-32 for HTML. MSXML supports UCS-4 (not ISO-10646-UCS-4, strangely). Also, I have never seen a real-world example of a UTF-32-encoded page.
Yay. Let's get rid of UTF-32. FWIW, Opera already got rid of UTF-32 support.
What about JSON?
I don't think this bug is high priority. Am I mistaken?
(In reply to comment #4)
> What about JSON?

The JSON RFC requires JSON content to be encoded in UTF-8, UTF-16 or UTF-32, but it doesn't require JSON consumers to support any of them. I can't think of any legitimate use case for sending JSON over the wire as UTF-32. AFAICT, UTF-32 is only mentioned for Unicode completeness. I'd be *extremely* surprised if a real site broke because of lack of UTF-32 decoding support for JSON.

(In reply to comment #5)
> I don't think this bug is high priority. Am I mistaken?

I see two reasons for removing UTF-32 support:

1) It exposes attack surface for the class of attacks where an XSS payload is encoded in an encoding that is not a superset of ASCII, so that a server-side sanitizer that works on ASCII thinks the data is safe, but a browser that decodes the data with that encoding's decoder turns the bytes into an executable script. This class of attacks is well documented for UTF-7, and UTF-7 support is going away in bug 414064. There has been a UTF-16-based attack vector against IE:
http://mail.apps.ietf.org/ietf/charsets/msg01846.html
It seems reasonable to expect UTF-32 to pose at least a risk of providing a similar attack vector. Unfortunately, we can't remove UTF-16 support without breaking sites. Fortunately, we could remove UTF-32 support without breaking anything except maybe some test cases.

2) UTF-32 decoders and encoders don't really provide any value to users or Web authors, since it doesn't make sense to use UTF-32 for interchange: it spends four bytes on every code point, which is ridiculously inefficient. When there's useless Web-exposed code sitting around, the code provides no benefit but can cause harm if there happens to be an exploitable bug in it. There's also an opportunity cost to fixing such bugs and to testing for their absence.

I can't say how this should translate into priorities, but it seems clear to me that it makes sense to remove UTF-32 support.
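A minimal Python sketch of both points, under simplifying assumptions: `naive_ascii_sanitizer_accepts` is a hypothetical byte-level filter that only looks for an ASCII `<script` sequence, `byte_sizes` is a hypothetical helper, and UTF-32 is encoded little-endian without a BOM. It illustrates the class of attack in general terms; it is not a reproduction of the IE UTF-16 issue linked above, and a real attack would also need the browser to be induced to decode the response as UTF-32.

```python
def naive_ascii_sanitizer_accepts(raw: bytes) -> bool:
    """Hypothetical server-side filter: treats the payload as ASCII bytes
    and rejects anything containing an opening script tag."""
    return b"<script" not in raw.lower()


def byte_sizes(text: str) -> dict[str, int]:
    """Hypothetical helper: encoded length of `text` in each Unicode
    encoding form, using the BOM-less little-endian codecs."""
    return {codec: len(text.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}


payload = "<script>alert(1)</script>"

# Sent as UTF-8 (an ASCII superset), the tag is visible to the byte-level filter.
print(naive_ascii_sanitizer_accepts(payload.encode("utf-8")))  # False

# Sent as UTF-32LE, each code point takes 4 bytes, so every ASCII byte of
# "<script" is followed by three zero bytes and the filter finds no match.
encoded = payload.encode("utf-32-le")
print(naive_ascii_sanitizer_accepts(encoded))  # True

# A client that decodes the same bytes as UTF-32LE gets the markup back intact.
print(encoded.decode("utf-32-le") == payload)  # True

# Point 2: the same ASCII text is four times larger in UTF-32 than in UTF-8.
print(byte_sizes("Hello, world"))
# {'utf-8': 12, 'utf-16-le': 24, 'utf-32-le': 48}
```

The filter is deliberately simplistic: any sanitizer that matches ASCII byte patterns without first decoding with the same encoding the browser will use has this blind spot, which is why every decoder a browser exposes adds to the attack surface.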
(In reply to comment #6)
> (In reply to comment #5)
> > I don't think this bug is high priority. Am I mistaken?
>
> I see two reasons for removing UTF-32 support:

I agree that removing UTF-32 support would be ok, but this is an answer to a question I did not ask.

> I can't say how this should translate into priorities

I asked about the priority of this bug. If doing this will fix critical security bugs, let's get to it. If not, I think we all have better things to do.
(In reply to comment #7)
> I asked about the priority of this bug. If doing this will fix critical
> security bugs, let's get to it. If not, I think we all have better things to
> do.

IMHO reducing attack surface is enough justification!
Attachment #483849 - Flags: review?(VYV03354) → review+
Not enough reward for the risk; please wait until after we branch for FF5.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
dev-doc-needed, since at least this page must be updated: https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko
Documented: https://developer.mozilla.org/en/Character_Sets_Supported_by_Gecko Also mentioned on Firefox 5 for developers; and while I was at it, I overhauled that page a bit and added some links to it. It's also tagged so it will show up when people go looking for things to do.