Open Bug 501837 Opened 11 years ago Updated 8 months ago

Liberalize XML Names and VersionNum to reflect latest XML 1.0 edition (5th)

Categories

(Core :: XML, defect, minor)

People

(Reporter: brettz9, Unassigned)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier: 

The fifth edition of XML, at http://www.w3.org/TR/2008/REC-xml-20081126/ , incorporates errata for edition 4. Among these errata is the significant change to be more permissive for XML Names (i.e., retroactively allowing in XML 1.0 those XML names which XML 1.1 first liberalized): http://www.w3.org/XML/xml-V10-4e-errata#E09

Presently, the XML parser in Firefox rejects these newly permitted characters in element names, attribute names, and other XML Names.

The more permissive names were probably the most important reason people wanted to use XML 1.1, so with the latest errata adjustment XML 1.1 is no longer as necessary, assuming this aspect of Firefox's parsing can be fixed.

(For the record, I personally don't have any compelling need for this, and I am aware that XML as implemented currently allows a full range of XML characters in the character data of an XML document, but I wonder whether this could expand the XML options for others, allowing XML to work as the fully universal semantic and exchange format it was designed to be, especially if it would only require change to a small portion of code where an XML Name was defined.)

Reproducible: Always

Steps to Reproduce:
1. Use a character like 'ϴ' (U+03F4) in an element name
Actual Results:  
Gives "XML Parsing Error: not well-formed"

Expected Results:  
Allow the element and document
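
For reference, the check being requested can be sketched in JavaScript. The ranges are the NameStartChar and NameChar productions as I read them in the fifth edition of XML 1.0; the function names (`isNameStartChar`, `isNameChar`, `isFifthEditionName`) are hypothetical and are not taken from Gecko or expat. This is an illustration of the rule, not a proposed patch.

```javascript
// NameStartChar ranges, XML 1.0 (Fifth Edition), production [4].
const NAME_START_RANGES = [
  [0x3A, 0x3A], // ":"
  [0x41, 0x5A], // A-Z
  [0x5F, 0x5F], // "_"
  [0x61, 0x7A], // a-z
  [0xC0, 0xD6], [0xD8, 0xF6], [0xF8, 0x2FF],
  [0x370, 0x37D], [0x37F, 0x1FFF],
  [0x200C, 0x200D], [0x2070, 0x218F],
  [0x2C00, 0x2FEF], [0x3001, 0xD7FF],
  [0xF900, 0xFDCF], [0xFDF0, 0xFFFD],
  [0x10000, 0xEFFFF],
];

// Additional characters allowed after the first one, production [4a].
const NAME_EXTRA_RANGES = [
  [0x2D, 0x2E], // "-" and "."
  [0x30, 0x39], // 0-9
  [0xB7, 0xB7],
  [0x300, 0x36F],
  [0x203F, 0x2040],
];

function inRanges(cp, ranges) {
  return ranges.some(([lo, hi]) => cp >= lo && cp <= hi);
}

function isNameStartChar(cp) {
  return inRanges(cp, NAME_START_RANGES);
}

function isNameChar(cp) {
  return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA_RANGES);
}

// Hypothetical helper: true if `str` matches the fifth-edition Name production.
function isFifthEditionName(str) {
  // Array.from iterates by code point, so astral characters stay whole.
  const cps = Array.from(str, (ch) => ch.codePointAt(0));
  if (cps.length === 0) return false;
  return isNameStartChar(cps[0]) && cps.slice(1).every((cp) => isNameChar(cp));
}
```

Under these ranges, `isFifthEditionName("\u03F4")` is true because U+03F4 falls in the range U+037F to U+1FFF.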
Component: General → XML
Please consider this bug confirmed.  It was among 200+ bugs planted in my computer.
QA Contact: general → xml
Duplicate of this bug: 804116
After XML 1.1 flopped, the W3C changed the definition of 1.0 to fold in some changes from XML 1.1 while pretending that the result is still XML 1.0. Of course, such changes are a really bad idea in a format that has Draconian error handling.

As far as I can tell, upstream expat doesn’t implement XML 1.0 5th edition. See http://blog.jclark.com/2008/10/xml-10-5th-edition.html for the thoughts of the developer of expat. 

I think we shouldn’t be in any hurry to change our copy of expat here. Files that Gecko rejects are also rejected by a decade of other software written according to pre-5th ed. XML 1.0, so anyone who is serious about XML interoperability cannot use 5th edition Names that are not also earlier edition Names.

I’m inclined to suggest WONTFIXing this on the grounds that the W3C made a big mistake by changing a Draconian format in an “edition”. If we ever make the XML parser in Gecko more permissive, we should just go all the way to XML-ER (a.k.a. XML5).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Duplicate of this bug: 893664
It appears Chrome supports this. The other notable change is that <?xml version="1.x"?> with x being one or more code points in the range 0-9 is no longer an error.
Severity: normal → minor
OS: Windows Vista → All
Hardware: x86 → All
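
The VersionNum change mentioned above can be illustrated with a short JavaScript sketch. The fifth-edition rule follows the description in the comment ("1." followed by one or more digits 0-9); treating the earlier rule as the literal string "1.0" is my assumption about the fourth edition, and both function names are hypothetical.

```javascript
// Fifth edition, per the comment above: "1." followed by one or more digits 0-9.
const VERSION_NUM_5E = /^1\.[0-9]+$/;

function isVersionNumFifthEdition(version) {
  return VERSION_NUM_5E.test(version);
}

// Assumption: the pre-fifth-edition rule accepts only the literal "1.0".
function isVersionNumPreFifthEdition(version) {
  return version === "1.0";
}
```

With these definitions, the "1.2" from a declaration such as `<?xml version="1.2"?>` passes the fifth-edition check and fails the earlier one, which is the difference the comment describes.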
(In reply to Anne (:annevk) from comment #5)
> It appears Chrome supports this.

Test case? Link to their rationale for supporting this?
data:text/xml,<?xml version="1.2"?><x/>

libxml implements the 5th edition and I suspect Chrome uses that library without much scrutiny.
Note that our current parser has bugs here: e.g. it accepts U+00B5 as an element name (data:text/xml,<%C2%B5/>), even though that is not allowed per either the 4th or 5th edition.
Duplicate of this bug: 1262226
Summary: Liberalize XML Names to reflect latest XML 1.0 edition (5th) → Liberalize XML Names and VersionNum to reflect latest XML 1.0 edition (5th)
Duplicate of this bug: 500139
This has visible effects in the implementation of Document#createElementNS (and probably other places).  See https://github.com/web-platform-tests/wpt/pull/12202#issuecomment-411650590

This also appears (??) to have an impact on data-* attributes; for example document.body.dataset["\uAB57"] (which matches the Name production in XML 5th but not 4th) fails in Firefox (but not in Chrome, and should be allowed per the spec).

This will affect anyone trying to create dataset properties in languages which are encoded in Unicode ranges outside those allowed by XML fourth ed.
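
To make the dataset connection concrete, here is a rough JavaScript sketch of how a dataset key maps to an attribute name and where the XML Name check comes in. It follows my understanding of the HTML Standard's dataset setter steps as they read at the time of this discussion; it is not Gecko's implementation, the function name `datasetKeyToAttributeName` is hypothetical, and plain `Error` objects stand in for DOMException.

```javascript
// Fifth-edition Name production written as a Unicode-aware regular expression.
const START =
  ":A-Z_a-z\\u{C0}-\\u{D6}\\u{D8}-\\u{F6}\\u{F8}-\\u{2FF}" +
  "\\u{370}-\\u{37D}\\u{37F}-\\u{1FFF}\\u{200C}-\\u{200D}" +
  "\\u{2070}-\\u{218F}\\u{2C00}-\\u{2FEF}\\u{3001}-\\u{D7FF}" +
  "\\u{F900}-\\u{FDCF}\\u{FDF0}-\\u{FFFD}\\u{10000}-\\u{EFFFF}";
const REST = START + "\\-.0-9\\u{B7}\\u{300}-\\u{36F}\\u{203F}-\\u{2040}";
const XML_NAME_5E = new RegExp(`^[${START}][${REST}]*$`, "u");

// Small helper so the sketch can signal the two distinct failure cases.
function namedError(name, message) {
  const err = new Error(message);
  err.name = name;
  return err;
}

// Hypothetical helper: converts a dataset key to its data-* attribute name.
function datasetKeyToAttributeName(key) {
  // A hyphen followed by a lowercase ASCII letter could not round-trip.
  if (/-[a-z]/.test(key)) {
    throw namedError("SyntaxError", "key contains '-' followed by a-z");
  }
  // Each uppercase ASCII letter becomes "-" plus its lowercase form.
  const name = "data-" + key.replace(/[A-Z]/g, (c) => "-" + c.toLowerCase());
  // The resulting attribute name must match the XML Name production.
  if (!XML_NAME_5E.test(name)) {
    throw namedError("InvalidCharacterError", "not an XML Name: " + name);
  }
  return name;
}
```

In this sketch, `datasetKeyToAttributeName("\uAB57")` returns the name `data-` followed by U+AB57, because U+AB57 lies in the fifth-edition range U+3001 to U+D7FF. An engine that validates against an older character table would reject the same key at the Name check.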
