Open Bug 560242 Opened 15 years ago Updated 3 years ago

Unknown encoding in XML declaration should be a fatal error

Status:

NEW

People

(Reporter: ap, Unassigned)

Attachments

(1 file)

test case (encoding="default"), 276 bytes, application/xhtml+xml, attached 15 years ago by Alexey Proskuryakov

Alexey Proskuryakov

Reporter

Description

•

15 years ago

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; ru; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

From WebKit bug <https://bugs.webkit.org/show_bug.cgi?id=37629>.

WebKit and Firefox differ here: we report a fatal error for something like <?xml version="1.0" encoding="default"?>, but Firefox seems to use utf-8 whenever it gets an unknown encoding name.

My understanding is that detecting a fatal error is required; see section 4.3.3 of the XML 1.0 spec: "it is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process."

Reproducible: Always

Steps to Reproduce: Open the attached test case. A parsing error should be reported.
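The difference described above can be illustrated with a minimal, standalone sketch. It is not WebKit or Gecko code: the names DeclaredEncoding, ChooseEncoding, Policy, and the short kKnownEncodings list are all hypothetical, and the declaration scanning is far cruder than a real XML processor's. It only shows the two policies side by side: a strict one that treats an unrecognized label as fatal, and a lenient one that falls back to UTF-8.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <set>
#include <string>

// Hypothetical, deliberately tiny list of labels this sketch "supports".
static const std::set<std::string> kKnownEncodings = {
    "utf-8", "utf-16", "iso-8859-1", "windows-1252"};

// Strict: unknown label is a fatal error. Lenient: fall back to UTF-8.
enum class Policy { Strict, Lenient };

struct DecodeChoice {
  bool fatal;            // true when parsing must stop
  std::string encoding;  // encoding to decode with when !fatal
};

// Returns the lowercased value of the encoding pseudo-attribute in a
// leading XML declaration, or nullopt when there is none.
std::optional<std::string> DeclaredEncoding(const std::string& doc) {
  if (doc.compare(0, 5, "<?xml") != 0) return std::nullopt;
  const std::size_t end = doc.find("?>");
  if (end == std::string::npos) return std::nullopt;
  const std::string decl = doc.substr(0, end);
  const std::size_t key = decl.find("encoding");
  if (key == std::string::npos) return std::nullopt;
  const std::size_t open = decl.find_first_of("\"'", key);
  if (open == std::string::npos) return std::nullopt;
  const std::size_t close = decl.find(decl[open], open + 1);
  if (close == std::string::npos) return std::nullopt;
  std::string label = decl.substr(open + 1, close - open - 1);
  std::transform(label.begin(), label.end(), label.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  return label;
}

DecodeChoice ChooseEncoding(const std::string& doc, Policy policy) {
  const std::optional<std::string> label = DeclaredEncoding(doc);
  if (!label) return {false, "utf-8"};  // no declared encoding
  if (kKnownEncodings.count(*label)) return {false, *label};
  if (policy == Policy::Strict) return {true, ""};  // fatal error
  return {false, "utf-8"};  // lenient fallback
}
```

With the test case's declaration, ChooseEncoding(doc, Policy::Strict) reports a fatal error, while Policy::Lenient silently selects UTF-8, which is the behavior this bug asks to change.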

Alexey Proskuryakov

Reporter

Comment 1

•

15 years ago

Attached file: test case (encoding="default")

Boris Zbarsky [:bzbarsky]

Comment 2

•

15 years ago

I'm guessing the issue is this code in ParserWriteFunc:

    if (pws->mParser->DetectMetaTag(buf, theNumRead, guess, guessSource) ||
        ((count >= 4) &&
         DetectByteOrderMark((const unsigned char*)buf,
                             theNumRead, guess, guessSource))) {
      nsCOMPtr<nsICharsetAlias> alias(do_GetService(NS_CHARSETALIAS_CONTRACTID));
      result = alias->GetPreferred(guess, preferred);
      // Only continue if it's a recognized charset and not
      // one of a designated set that we ignore.
      if (NS_SUCCEEDED(result) &&
          ((kCharsetFromByteOrderMark == guessSource) ||
           (!preferred.EqualsLiteral("UTF-16") &&
            !preferred.EqualsLiteral("UTF-16BE") &&
            !preferred.EqualsLiteral("UTF-16LE") &&
            !preferred.EqualsLiteral("UTF-32") &&
            !preferred.EqualsLiteral("UTF-32BE") &&
            !preferred.EqualsLiteral("UTF-32LE")))) {

etc.

DetectByteOrderMark is what seems to deal with <?xml encoding="..."?> stuff. We probably need to make it a fatal error here in XML mode to get back an encoding we don't recognize, right?
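As a rough illustration of the change suggested in this comment, the decision can be modeled as a small pure function. This is a sketch under assumptions, not a patch: Outcome, GuessSource, Decide, and the aliasResolved and xmlMode parameters are hypothetical names that do not exist in Gecko. aliasResolved stands in for NS_SUCCEEDED(result) after the GetPreferred call, and the only new behavior is the FatalError branch for XML mode.

```cpp
#include <string>

// Hypothetical stand-ins for the values used in the quoted snippet.
enum class GuessSource { ByteOrderMark, Other };
enum class Outcome { UseGuess, IgnoreGuess, FatalError };

// Mirrors the list of charsets the quoted condition refuses to accept
// unless the guess came from a byte order mark.
bool IsUtf16Or32Family(const std::string& preferred) {
  return preferred == "UTF-16" || preferred == "UTF-16BE" ||
         preferred == "UTF-16LE" || preferred == "UTF-32" ||
         preferred == "UTF-32BE" || preferred == "UTF-32LE";
}

// aliasResolved: whether the charset alias lookup recognized the guess.
// xmlMode: whether the document is being parsed as XML (assumed flag).
Outcome Decide(bool aliasResolved, const std::string& preferred,
               GuessSource source, bool xmlMode) {
  if (!aliasResolved) {
    // Proposed behavior: an unrecognized encoding is fatal in XML mode.
    // Outside XML mode the guess is dropped, as the quoted code does.
    return xmlMode ? Outcome::FatalError : Outcome::IgnoreGuess;
  }
  if (source == GuessSource::ByteOrderMark ||
      !IsUtf16Or32Family(preferred)) {
    return Outcome::UseGuess;
  }
  return Outcome::IgnoreGuess;
}
```

For the attached test case, the label "default" would presumably fail alias resolution, so Decide(false, "", GuessSource::Other, true) yields Outcome::FatalError instead of the guess being ignored.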

Status: UNCONFIRMED → NEW

Ever confirmed: true

Henri Sivonen (:hsivonen)

Comment 3

•

15 years ago

(In reply to comment #2)
> DetectByteOrderMark is what seems to deal with <?xml encoding="..."?> stuff.
> We probably need to make it a fatal error here in XML mode to get back an
> encoding we don't recognize, right?

So it seems.

(I wonder if we should move towards a model where Gecko provides decoders to expat and expat ingests bytes instead of nsParser performing some of the duties of the XML processor and performing them incorrectly.)

Anne (:annevk)

Updated

•

8 years ago

Priority: -- → P3

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Categories

(Core :: XML, defect, P3)
