Open Bug 560242 Opened 15 years ago Updated 3 years ago

Unknown encoding in XML declaration should be a fatal error

Categories

(Core :: XML, defect, P3)

x86
macOS

People

(Reporter: ap, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; ru; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3

From WebKit bug <https://bugs.webkit.org/show_bug.cgi?id=37629>.

There is a difference between WebKit and Firefox in that we report a fatal error for something like <?xml version="1.0" encoding="default"?>, but Firefox seems to use utf-8 whenever it gets an unknown encoding name. My understanding is that detecting a fatal error is required, see section 4.3.3 of XML 1.0 spec: "it is a fatal error when an XML processor encounters an entity with an encoding that it is unable to process."

Reproducible: Always

Steps to Reproduce:
There should be a parsing error reported when opening the attached test case.
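To make the requested behavior concrete, here is a minimal standalone sketch of the check described above: read the encoding name from the XML declaration and treat an unrecognized name as fatal instead of falling back to utf-8. It is not Gecko or WebKit code. `EncodingCheck`, `DeclaredEncoding`, `CheckDeclaredEncoding`, and the short list of known names are all hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type for this sketch; not a Gecko or WebKit type.
enum class EncodingCheck { Ok, NoEncodingDeclared, FatalUnknownEncoding };

// Extracts the value of the encoding pseudo-attribute from an XML declaration.
// Simplified: assumes an ASCII-compatible prefix and does not validate the
// rest of the declaration.
bool DeclaredEncoding(const std::string& doc, std::string& name) {
  if (doc.compare(0, 5, "<?xml") != 0) return false;
  const std::string::size_type end = doc.find("?>");
  if (end == std::string::npos) return false;
  const std::string decl = doc.substr(0, end);
  const std::string::size_type key = decl.find("encoding");
  if (key == std::string::npos) return false;
  const std::string::size_type open = decl.find_first_of("\"'", key);
  if (open == std::string::npos) return false;
  // The closing quote must match the opening quote character.
  const std::string::size_type close = decl.find(decl[open], open + 1);
  if (close == std::string::npos) return false;
  assert(close > open);
  name = decl.substr(open + 1, close - open - 1);
  return true;
}

// Reports a fatal condition for an unrecognized name rather than silently
// picking a default encoding.
EncodingCheck CheckDeclaredEncoding(const std::string& doc) {
  std::string name;
  if (!DeclaredEncoding(doc, name)) return EncodingCheck::NoEncodingDeclared;
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  // Illustrative list only; a real processor would consult its decoder table.
  static const char* const kKnown[] = {"utf-8", "utf-16", "iso-8859-1",
                                       "us-ascii"};
  for (const char* known : kKnown) {
    if (name == known) return EncodingCheck::Ok;
  }
  return EncodingCheck::FatalUnknownEncoding;
}
```

With the declaration from the report, `CheckDeclaredEncoding("<?xml version=\"1.0\" encoding=\"default\"?><a/>")` yields `EncodingCheck::FatalUnknownEncoding`, which is the outcome section 4.3.3 asks for.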
I'm guessing the issue is this code in ParserWriteFunc:

    if (pws->mParser->DetectMetaTag(buf, theNumRead, guess, guessSource) ||
        ((count >= 4) &&
         DetectByteOrderMark((const unsigned char*)buf,
                             theNumRead, guess, guessSource))) {
      nsCOMPtr<nsICharsetAlias> alias(do_GetService(NS_CHARSETALIAS_CONTRACTID));
      result = alias->GetPreferred(guess, preferred);
      // Only continue if it's a recognized charset and not
      // one of a designated set that we ignore.
      if (NS_SUCCEEDED(result) &&
          ((kCharsetFromByteOrderMark == guessSource) ||
           (!preferred.EqualsLiteral("UTF-16") &&
            !preferred.EqualsLiteral("UTF-16BE") &&
            !preferred.EqualsLiteral("UTF-16LE") &&
            !preferred.EqualsLiteral("UTF-32") &&
            !preferred.EqualsLiteral("UTF-32BE") &&
            !preferred.EqualsLiteral("UTF-32LE")))) {

etc.

DetectByteOrderMark is what seems to deal with <?xml encoding="..."?> stuff. We probably need to make it a fatal error here in XML mode to get back an encoding we don't recognize, right?
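The change suggested here could look roughly like the following standalone sketch. It mirrors the shape of the quoted condition but is not the actual patch. `LookupPreferred` is a hypothetical stand-in for the alias service's `GetPreferred`, and `GuessSource`, `Decision`, `DecideCharset`, the `isXML` flag, and the small alias table are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical types for this sketch; the real code uses integer constants
// such as kCharsetFromByteOrderMark.
enum class GuessSource { MetaTag, ByteOrderMark };
enum class Decision { UseCharset, IgnoreGuess, FatalError };

// Hypothetical stand-in for the alias lookup: returns false when the name is
// not recognized. The table is illustrative only.
bool LookupPreferred(std::string guess, std::string& preferred) {
  std::transform(guess.begin(), guess.end(), guess.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });
  static const char* const kAliases[][2] = {{"utf-8", "UTF-8"},
                                            {"utf8", "UTF-8"},
                                            {"latin1", "ISO-8859-1"},
                                            {"utf-16", "UTF-16"},
                                            {"utf-16le", "UTF-16LE"}};
  for (const auto& entry : kAliases) {
    if (guess == entry[0]) {
      preferred = entry[1];
      return true;
    }
  }
  return false;
}

// The designated set that the quoted code ignores unless the guess came from
// the byte order mark path.
bool IsIgnoredUnlessFromBOM(const std::string& preferred) {
  return preferred == "UTF-16" || preferred == "UTF-16BE" ||
         preferred == "UTF-16LE" || preferred == "UTF-32" ||
         preferred == "UTF-32BE" || preferred == "UTF-32LE";
}

Decision DecideCharset(const std::string& guess, GuessSource source,
                       bool isXML) {
  std::string preferred;
  if (!LookupPreferred(guess, preferred)) {
    // Proposed change: an unrecognized name is fatal in XML mode instead of
    // being dropped silently.
    return isXML ? Decision::FatalError : Decision::IgnoreGuess;
  }
  assert(!preferred.empty());
  if (source == GuessSource::ByteOrderMark) return Decision::UseCharset;
  return IsIgnoredUnlessFromBOM(preferred) ? Decision::IgnoreGuess
                                           : Decision::UseCharset;
}
```

In this sketch only the failed-lookup branch differs from the quoted logic, so non-XML content would keep its current lenient behavior.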
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #2)
> DetectByteOrderMark is what seems to deal with <?xml encoding="..."?> stuff.
> We probably need to make it a fatal error here in XML mode to get back an
> encoding we don't recognize, right?

So it seems.

(I wonder if we should move towards a model where Gecko provides decoders to expat and expat ingests bytes instead of nsParser performing some of the duties of the XML processor and performing them incorrectly.)
Priority: -- → P3
Severity: normal → S3