Closed Bug 1287071 Opened 8 years ago Closed 8 years ago

DOMParser.parseFromString does not honor XML encoding

Categories

(Core :: XML, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED INVALID

People

(Reporter: yan12125, Unassigned)

Details

I have an XML string encoded in Big5:

var data = atob('PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iYmlnNSIgPz48dGl0bGU+pKSk5TwvdGl0bGU+')

(That is, <?xml version="1.0" encoding="big5" ?><title>中文</title> when written out in UTF-8.)

DOMParser does not return the correct text:
(new DOMParser()).parseFromString(data, 'text/xml').firstChild.textContent

In dom/base/DOMParser.cpp I see that DOMParser::ParseFromString() hard-codes the encoding as UTF-8. Is this intentional per the W3C standards?
Boris, your thoughts?
Flags: needinfo?(bzbarsky)
The spec is at http://domparsing.spec.whatwg.org/#the-domparser-interface and the input to it is a sequence of UTF-16 code units, kinda (DOMString), not a sequence of bytes (which would be ByteString).

So the observed behavior is correct per spec.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(bzbarsky)
Resolution: --- → INVALID
Thanks. I was not sure whether the standard or Firefox needed modification. Now it's clear :)