Closed
Bug 240717
Opened 21 years ago
Closed 21 years ago
DOMParser.parseFromString() confused by character encodings
Categories
(Core :: XML, defect, P3)
Tracking
RESOLVED
FIXED
mozilla1.8alpha1
People
(Reporter: matthew, Assigned: bzbarsky)
Details
Attachments
(2 files)
385 bytes, text/html
2.28 KB, patch (jst: review+, jst: superreview+)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8
DOMParser.parseFromString() seems not to take into account the character
encoding present in the XML declaration.
Reproducible: Always
Steps to Reproduce:
1. Load the testcase (to be attached).
2. Select the "test" link.
3. The first 'window.alert' shows a JavaScript string which has been constructed
to be (the serialization of) an XML document containing the 'squared' character
(U+00B2 SUPERSCRIPT TWO in Unicode terms). The string contains an XML declaration
which declares the encoding as ISO-8859-1.
4. After that, an XML document is created using new
DOMParser().parseFromString(xml, "text/xml");, and there is a second
window.alert showing the value of the text node in that document.
Actual Results:
The second window.alert shows that some mangling has gone on, presumably
relating to the character encoding: the string shown appears to be abc[U+00C2
LATIN CAPITAL LETTER A WITH CIRCUMFLEX][U+00B2 SUPERSCRIPT TWO].
Expected Results:
The second window.alert should correctly show the text which was set as the
text node's value, i.e. abc[U+00B2 SUPERSCRIPT TWO].
The behaviour is the same on a nightly build of Firefox less than one week old.
Comment 1 • 21 years ago (Reporter)
Comment 2 • 21 years ago (Assignee)
> DOMParser.parseFromString() seems not to take into account the character
> encoding present in the XML declaration.
On the contrary, it does. When parsing.
What you're doing is passing in Unicode data into the DOMParser. It converts
this into UTF-8 bytes, then feeds them to the XML parser. But the XML parser
sees the encoding decl and parses the bytes as ISO-8859-1. Hence the mangling.
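To make the mangling concrete, here is a minimal sketch (not Gecko code) of the byte-level round trip just described: the string is encoded as UTF-8 and the resulting bytes are then read back as ISO-8859-1. The helper names `encodeUtf8` and `decodeLatin1` are made up for illustration, and surrogate code points are assumed not to occur in the input.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode a sequence of Unicode code points as UTF-8 bytes.
// Assumption: the input contains no surrogate code points.
std::vector<std::uint8_t> encodeUtf8(const std::u32string& text) {
    std::vector<std::uint8_t> bytes;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            bytes.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            bytes.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            bytes.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            bytes.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            bytes.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return bytes;
}

// ISO-8859-1 maps every byte to the code point with the same numeric value,
// so decoding is a plain widening of each byte.
std::u32string decodeLatin1(const std::vector<std::uint8_t>& bytes) {
    std::u32string text;
    for (std::uint8_t b : bytes) {
        text.push_back(static_cast<char32_t>(b));
    }
    return text;
}
```

With the testcase's text, `decodeLatin1(encodeUtf8(U"abc\u00B2"))` gives `abc` followed by U+00C2 and U+00B2, because U+00B2 becomes the two UTF-8 bytes 0xC2 0xB2 and each byte is then read as its own character. That matches the second alert in the report.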
So either the conversion to bytes needs to scan the string for the XML decl
first (ugh!) or the nsDOMParser::ParseFromStream method needs to do something
with the "charset" arg it gets (like set it on the channel so that things don't
break).
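The second option can be sketched roughly as follows, assuming that a charset supplied from outside the document (for example by the channel) takes precedence over the encoding named in the XML declaration. This is a standalone illustration, not the attached patch: `decodeWithHint`, `decodeUtf8`, and `decodeLatin1` are hypothetical names, only two encodings are handled, and an empty hint means none was supplied.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// ISO-8859-1: each byte is the code point with the same value.
std::u32string decodeLatin1(const Bytes& bytes) {
    std::u32string text;
    for (std::uint8_t b : bytes) {
        text.push_back(static_cast<char32_t>(b));
    }
    return text;
}

// Minimal UTF-8 decoder for 1- to 3-byte sequences.
// Anything it cannot handle becomes U+FFFD (sketch only, not a validator).
std::u32string decodeUtf8(const Bytes& bytes) {
    std::u32string text;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t b = bytes[i];
        if (b < 0x80) {
            text.push_back(static_cast<char32_t>(b));
            i += 1;
        } else if ((b & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            text.push_back(static_cast<char32_t>(
                ((b & 0x1F) << 6) | (bytes[i + 1] & 0x3F)));
            i += 2;
        } else if ((b & 0xF0) == 0xE0 && i + 2 < bytes.size()) {
            text.push_back(static_cast<char32_t>(
                ((b & 0x0F) << 12) | ((bytes[i + 1] & 0x3F) << 6) |
                (bytes[i + 2] & 0x3F)));
            i += 3;
        } else {
            text.push_back(static_cast<char32_t>(0xFFFD));
            i += 1;
        }
    }
    return text;
}

// Pick the decoder: an externally supplied charset, if any, overrides the
// encoding declared inside the document.
std::u32string decodeWithHint(const Bytes& bytes,
                              const std::string& declaredEncoding,
                              const std::string& channelCharset) {
    const std::string& effective =
        channelCharset.empty() ? declaredEncoding : channelCharset;
    if (effective == "UTF-8") {
        return decodeUtf8(bytes);
    }
    return decodeLatin1(bytes);  // sketch: everything else is treated as ISO-8859-1
}
```

For the bytes 0x61 0x62 0x63 0xC2 0xB2 with a declared encoding of ISO-8859-1, an empty hint reproduces the mangled text, while a hint of "UTF-8" recovers `abc` followed by U+00B2.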
Comment 3 • 21 years ago (Assignee)
Updated • 21 years ago (Assignee)
Attachment #146284 - Flags: superreview?(jst)
Attachment #146284 - Flags: review?(jst)
Comment 4 • 21 years ago
Comment on attachment 146284
Say like this
r+sr=jst
Attachment #146284 - Flags: superreview?(jst) → superreview+
Attachment #146284 - Flags: review?(jst) → review+
Comment 5 • 21 years ago (Assignee)
Taking.
Assignee: hjtoi-bugzilla → bzbarsky
OS: Windows XP → All
Priority: -- → P3
Hardware: PC → All
Target Milestone: --- → mozilla1.8alpha
Comment 6 • 21 years ago (Assignee)
Checked in.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 7 • 21 years ago
Hmmm... looks like a "return NS_OK;" is needed in SetContentCharset.
Right now, it's:
NS_IMETHODIMP
nsDOMParserChannel::SetContentCharset(const nsACString& aContentCharset)
{
  mContentCharset = aContentCharset;
}
But it probably should read:
NS_IMETHODIMP
nsDOMParserChannel::SetContentCharset(const nsACString& aContentCharset)
{
  mContentCharset = aContentCharset;
  return NS_OK;
}
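For anyone reading along without the tree handy, the pattern can be shown in a standalone sketch. Nothing below is the real Gecko code: `SketchResult`, `kSketchOk`, and `ParserChannelSketch` are stand-ins for `nsresult`, `NS_OK`, and the channel class, and `std::string` replaces `nsACString`. The point is only that a setter declared to return a status code has to return one, since flowing off the end of a non-void function is undefined behavior in C++.

```cpp
#include <string>

// Stand-ins for nsresult / NS_OK (assumptions for this sketch only).
using SketchResult = unsigned int;
constexpr SketchResult kSketchOk = 0;

class ParserChannelSketch {
public:
    // Stores the charset and reports success explicitly.
    SketchResult SetContentCharset(const std::string& aContentCharset) {
        mContentCharset = aContentCharset;
        return kSketchOk;  // omitting this return is the bug described above
    }

    // Read-only accessor so callers can check what was stored.
    const std::string& ContentCharset() const { return mContentCharset; }

private:
    std::string mContentCharset;
};
```

Most compilers will flag the missing return if warnings such as `-Wreturn-type` are enabled.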
Comment 8 • 21 years ago (Assignee)
Doug, thanks for the heads-up, and you're right. Fix checked in.
Comment 9 • 21 years ago (Reporter)
Any chance of getting this fix ported onto the Aviary branch?
Comment 10 • 21 years ago (Assignee)
I have no plans to port this change to any branches. If someone makes an aviary
patch and convinces the aviary maintainers to take it, I don't plan to stop them
(nor could I), though I would appreciate not getting too much bugspam in the
process.