Closed
Bug 1262226
Opened 9 years ago
Closed 9 years ago
XML Parsing Error: not well-formed
Categories
(Core :: XML, defect)
RESOLVED
DUPLICATE
of bug 501837
People
(Reporter: achyuthkp94, Unassigned)
Details
(Whiteboard: btpp-backlog)
Attachments
(2 files)
User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36
Steps to reproduce:
I visited the web page: https://pages.lip6.fr/Jean-Francois.Perrot/XML-Int/Session1/ExInter/ExInter.xml
Actual results:
XML Parsing Error: not well-formed
Location: https://pages.lip6.fr/Jean-Francois.Perrot/XML-Int/Session1/ExInter/ExInter.xml
Line Number 13, Column 3:
Expected results:
A proper list of countries with their capital cities in their native languages.
Chrome and Safari display it properly.
Reporter
Comment 1•9 years ago
This is the expected result.
Reporter
Updated•9 years ago
OS: Unspecified → All
Hardware: Unspecified → All
Updated•9 years ago
Version: 46 Branch → Trunk
Updated•9 years ago
Component: Untriaged → XML
Product: Firefox → Core
Updated•9 years ago
Whiteboard: btpp-backlog
Comment 2•9 years ago
It looks like that file might really be invalid. I ran it through a few XML validators: some rejected it, while others thought it was fine. You'd probably want to run this by someone with more Unicode experience, just in case.
The line it's choking on is:
> <ኢትዮጵያ>አዲስ አበባ</ኢትዮጵያ>
In Python, I think that dumps out as:
> <\u12a2\u1275\u12ee\u1335\u12eb>\u12a0\u12f2\u1235 \u12a0\u1260\u1263</\u12a2\u1275\u12ee\u1335\u12eb>
In theory, we could be messing up the conversion on our side when we go from UTF-8 to UTF-16.
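As a sanity check on what a correct conversion should produce, here is a rough sketch done outside the browser. It only shows the expected UTF-8 and UTF-16 forms of the element name and says nothing about what our own conversion code actually does; the names below are made up for the example.

```python
# Element name from the failing line, written as escapes.
name = "\u12a2\u1275\u12ee\u1335\u12eb"

# UTF-8 bytes as they would arrive over the wire.
utf8 = name.encode("utf-8")

# Decode and re-encode as UTF-16 (little endian, no BOM), then split into
# 16-bit code units.
utf16 = utf8.decode("utf-8").encode("utf-16-le")
units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]

# Surrogate code units live in 0xD800..0xDFFF; BMP characters never need them.
has_surrogates = any(0xD800 <= u <= 0xDFFF for u in units)

print(len(utf8), [hex(u) for u in units], has_surrogates)
```

Every character here is in the BMP, so each one is three bytes in UTF-8 and a single code unit in UTF-16, with no surrogate pairs involved. That makes a conversion bug look less likely than a name-character rule, though it does not rule one out.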
Reporter
Comment 3•9 years ago
Not sure if this helps, but if you look at the attachments, you'll notice that other browsers don't seem to have a problem with it. So wouldn't this bug be browser-specific?
I'm no expert, but are the validators really that different between these browsers?
Comment 4•9 years ago
XML 1.0 had a strict restriction on name characters until the Fifth Edition [1]. ኢ (U+12A2) was not in the list of allowed characters. The Fifth Edition loosened the restriction so that almost all characters (except PUAs and unpaired surrogates) are allowed in names [2].
Our XML parser is fairly old and has not caught up with the Fifth Edition.
[1] https://www.w3.org/TR/2006/REC-xml-20060816/#NT-Letter
[2] https://www.w3.org/TR/xml/#NT-NameStartChar
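To make the Fifth Edition rule concrete, here is a small sketch of the NameStartChar check. The ranges are transcribed by hand from the production at [2], so treat them as my reading of the spec rather than an authoritative copy; the function name is made up for the example. I have not tried to reproduce the much longer pre-Fifth-Edition Letter production from [1].

```python
# NameStartChar ranges as transcribed from the XML 1.0 Fifth Edition production.
# Assumption: hand-copied from the spec, so double-check against [2] before relying on it.
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]


def is_name_start_char_5e(ch: str) -> bool:
    """Return True if ch may start an XML name under the Fifth Edition rules."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in NAME_START_RANGES)


# U+12A2 falls inside the 0x37F..0x1FFF range, so it is allowed.
print(is_name_start_char_5e("\u12a2"))
# A private-use character such as U+E000 falls in a gap between ranges.
print(is_name_start_char_5e("\ue000"))
```

Under these ranges the element name in the reporter's file is fine, which is consistent with the browsers and validators that accept the document.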
Comment 5•9 years ago
I found an existing bug.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE