Closed Bug 240717 Opened 21 years ago Closed 21 years ago

DOMParser.parseFromString() confused by character encodings

Status:

RESOLVED FIXED

Milestone:

mozilla1.8alpha1

People

(Reporter: matthew, Assigned: bzbarsky)

Details

Attachments

(2 files)

Test case (21 years ago, Matthew Wilson): 385 bytes, text/html
Say like this (21 years ago, Boris Zbarsky [:bzbarsky]): 2.28 KB, patch; jst: review+, jst: superreview+

Matthew Wilson

Reporter

Description

•

21 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040206 Firefox/0.8

DOMParser.parseFromString() seems not to take into account the character encoding present in the XML declaration.

Reproducible: Always

Steps to Reproduce:
1. Load the testcase (to be attached).
2. Select the "test" link.
3. The first window.alert shows a JavaScript string which has been constructed to be (the serialization of) an XML document containing the 'squared' character (U+00B2 SUPERSCRIPT TWO in Unicode terms). The string contains an XML declaration which declares the encoding as ISO-8859-1.
4. After that, an XML document is created using new DOMParser().parseFromString(xml, "text/xml");, and there is a second window.alert showing the value of the text node in that document.

Actual Results:
The second window.alert shows that some mangling has gone on, presumably relating to the character encoding: the string shown appears to be abc[U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX][U+00B2 SUPERSCRIPT TWO].

Expected Results:
The second window.alert should correctly show the text which was set as the text, i.e. abc[U+00B2 SUPERSCRIPT TWO].

The behaviour is the same on a nightly build of Firefox less than one week old.
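The attached test case is not reproduced in this report. As a rough illustration only, the steps to reproduce might be sketched as follows. The element name `root` and the helper names `buildXml` and `parsedText` are assumptions made for this sketch and are not taken from the attachment; the DOMParser call is the one quoted in step 4.

```javascript
// Hypothetical reconstruction of the steps to reproduce; NOT the attached test case.
// The <root> element and both helper names are assumptions for illustration.

// Build the serialized XML as a JavaScript (Unicode) string. The XML
// declaration claims ISO-8859-1, and the text contains U+00B2 SUPERSCRIPT TWO.
function buildXml() {
  return '<?xml version="1.0" encoding="ISO-8859-1"?>' +
         "<root>abc\u00B2</root>";
}

// Parse the string and return the value of the first text node.
// DOMParser is a browser API, so return null where it is unavailable
// (for example when this file is run under Node.js).
function parsedText(xml) {
  if (typeof DOMParser === "undefined") return null;
  const doc = new DOMParser().parseFromString(xml, "text/xml");
  return doc.documentElement.firstChild.nodeValue;
}

const xml = buildXml();
const text = parsedText(xml);

// Expected result per this report: "abc\u00B2".
// Affected builds reportedly produced "abc\u00C2\u00B2" instead.
console.log(text === null ? "DOMParser not available here" : text);
```

Run in a browser, the logged value corresponds to the second window.alert in the steps above.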

Matthew Wilson

Reporter

Comment 1

•

21 years ago

Attached file: Test case

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

21 years ago

> DOMParser.parseFromString() seems not to take into account the character
> encoding present in the XML declaration.

On the contrary, it does. When parsing.

What you're doing is passing in Unicode data into the DOMParser. It converts this into UTF-8 bytes, then feeds them to the XML parser. But the XML parser sees the encoding decl and parses the bytes as ISO-8859-1. Hence the mangling.

So either the conversion to bytes needs to scan the string for the XML decl first (ugh!) or the nsDOMParser::ParseFromStream method needs to do something with the "charset" arg it gets (like set it on the channel so that things don't break).
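The mangling described above can be reproduced outside the parser. The following is a minimal sketch, not the Gecko code path: it encodes a string as UTF-8 and then decodes those bytes as ISO-8859-1, where every byte value maps to the code point with the same number. The helper name `utf8ThenLatin1` is hypothetical.

```javascript
// Minimal sketch of the double conversion described in this comment.
// utf8ThenLatin1 is a hypothetical helper, not part of any Mozilla API.
function utf8ThenLatin1(text) {
  // Step 1: the Unicode string is converted to UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  // Step 2: those bytes are read back as ISO-8859-1, in which each byte
  // value is the code point of the resulting character.
  return Array.from(bytes, (b) => String.fromCharCode(b)).join("");
}

const original = "abc\u00B2";            // abc + SUPERSCRIPT TWO
const mangled = utf8ThenLatin1(original);

// U+00B2 is encoded in UTF-8 as the two bytes 0xC2 0xB2, so reading them
// as ISO-8859-1 yields U+00C2 followed by U+00B2.
console.log(mangled === "abc\u00C2\u00B2"); // true
```

This matches the string the reporter saw in the second alert: the extra U+00C2 is the lead byte of the UTF-8 encoding of U+00B2.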

Boris Zbarsky [:bzbarsky]

Assignee

Comment 3

•

21 years ago

Attached patch: Say like this

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #146284 - Flags: superreview?(jst)

Attachment #146284 - Flags: review?(jst)

Johnny Stenback (:jst)

Comment 4

•

21 years ago

Comment on attachment 146284: Say like this

r+sr=jst

Attachment #146284 - Flags: superreview?(jst)

Attachment #146284 - Flags: superreview+

Attachment #146284 - Flags: review?(jst)

Attachment #146284 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

21 years ago

Taking.

Assignee: hjtoi-bugzilla → bzbarsky

OS: Windows XP → All

Priority: -- → P3

Hardware: PC → All

Target Milestone: --- → mozilla1.8alpha

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

21 years ago

Checked in.

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Doug Halamay

Comment 7

•

21 years ago

Hmmm...looks like a "return NS_OK;" is needed in SetContentCharset:

Right now, it's:

    NS_IMETHODIMP
    nsDOMParserChannel::SetContentCharset(const nsACString &aContentCharset)
    {
      mContentCharset = aContentCharset;
    }

But it probably should read:

    NS_IMETHODIMP
    nsDOMParserChannel::SetContentCharset(const nsACString &aContentCharset)
    {
      mContentCharset = aContentCharset;
      return NS_OK;
    }

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

21 years ago

Doug, thanks for the heads-up, and you're right. Fix checked in.

Matthew Wilson

Reporter

Comment 9

•

21 years ago

Any chance of getting this fix ported onto the Aviary branch?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 10

•

21 years ago

I have no plans to port this change to any branches. If someone makes an aviary patch and convinces the aviary maintainers to take it, I don't plan to stop them (nor could I), though I would appreciate not getting too much bugspam in the process.


Categories

(Core :: XML, defect, P3)
