Closed Bug 1262226 Opened 9 years ago Closed 9 years ago

XML Parsing Error: not well-formed

Categories

(Core :: XML, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 501837

People

(Reporter: achyuthkp94, Unassigned)

Details

(Whiteboard: btpp-backlog)

Attachments

(2 files)

Attached image firefox.png
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36

Steps to reproduce:
I visited the web page: https://pages.lip6.fr/Jean-Francois.Perrot/XML-Int/Session1/ExInter/ExInter.xml

Actual results:
XML Parsing Error: not well-formed
Location: https://pages.lip6.fr/Jean-Francois.Perrot/XML-Int/Session1/ExInter/ExInter.xml
Line Number 13, Column 3:

Expected results:
A proper list of countries with their capital cities in their native languages. Chrome and Safari display it properly.
Attached image Chrome.png
This is the expected result.
OS: Unspecified → All
Hardware: Unspecified → All
Version: 46 Branch → Trunk
Component: Untriaged → XML
Product: Firefox → Core
Whiteboard: btpp-backlog
It looks like that file might really be invalid. I tossed it at a few XML validators and they were unhappy; a few others thought it was fine. You'd probably want to run this by someone with more Unicode experience just in case.

The line it's choking on is:
> <ኢትዮጵያ>አዲስ አበባ</ኢትዮጵያ>

Which I think Python dumps out as:
> <\u12a2\u1275\u12ee\u1335\u12eb>\u12a0\u12f2\u1235 \u12a0\u1260\u1263</\u12a2\u1275\u12ee\u1335\u12eb>

In theory we could be messing up the conversion on our side when we go from UTF-8 to UTF-16.
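
For anyone who wants to reproduce the escape dump above, here is a minimal Python 3 sketch. The helper names `escape_non_ascii` and `survives_utf16_round_trip` are made up for illustration. The round-trip check only exercises Python's own codecs, so it can show that the characters themselves convert cleanly between UTF-8 and UTF-16, but it says nothing about the conversion code in the browser.

```python
# The element quoted in this bug (line 13 of ExInter.xml).
LINE = "<ኢትዮጵያ>አዲስ አበባ</ኢትዮጵያ>"


def escape_non_ascii(text: str) -> str:
    # Render every non-ASCII character as a backslash-u escape,
    # leaving ASCII characters such as '<', '/', '>' and ' ' untouched.
    return text.encode("unicode_escape").decode("ascii")


def survives_utf16_round_trip(text: str) -> bool:
    # Hypothetical check: UTF-8 bytes -> str -> UTF-16 bytes -> str.
    # This uses Python's codecs only, not the browser's converter.
    utf8_bytes = text.encode("utf-8")
    decoded = utf8_bytes.decode("utf-8")
    utf16_bytes = decoded.encode("utf-16-le")
    return utf16_bytes.decode("utf-16-le") == text


print(escape_non_ascii(LINE))
print(survives_utf16_round_trip(LINE))
```

If the round trip succeeds, the input bytes are at least valid UTF-8 and every character fits in UTF-16, which would point away from an encoding problem in the file itself.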
Not sure if this helps, but if you see the attachments you'd notice that other browsers don't seem to be having a problem with it. So wouldn't this bug be browser specific? I'm no expert, but are the validators that different between these browsers?
XML 1.0 had a strict restriction on name characters until the 5th edition [1]. ኢ (U+12A2) was not included in the allowed character list. The 5th edition loosened the restriction so that almost all characters (except PUAs and unpaired surrogates) are allowed [2]. Our XML parser is fairly old and has not caught up with the 5th edition.

[1] https://www.w3.org/TR/2006/REC-xml-20060816/#NT-Letter
[2] https://www.w3.org/TR/xml/#NT-NameStartChar
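
As a rough illustration of the 5th edition rule, the sketch below checks a character against the NameStartChar ranges from [2]. The ranges were transcribed by hand from that production, and `is_name_start_char` is a hypothetical helper, not part of any parser. It does not model the older 4th edition Letter tables from [1].

```python
# NameStartChar ranges, hand-transcribed from XML 1.0 (Fifth Edition).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char(ch: str) -> bool:
    # True if the single character may begin an XML name
    # under the Fifth Edition rules.
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in NAME_START_RANGES)


# U+12A2 falls in the 0x37F-0x1FFF range, so a Fifth Edition
# parser should accept it as the first character of an element name.
print(is_name_start_char("\u12a2"))
# U+E000 is a private-use code point and is outside every range.
print(is_name_start_char("\ue000"))
```

Under these ranges the element name in the reported file is well-formed, which is consistent with the other browsers accepting it.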
I found an existing bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE