Closed
Bug 88338
Opened 24 years ago
Closed 11 years ago
NEL character not accepted by XML parser as newline
Categories
(Core :: XML, defect)
Tracking
(Status: RESOLVED INVALID, Target Milestone: Future)
People
(Reporter: bugmail, Assigned: hjtoi-bugzilla)
Details
(Keywords: testcase)
Attachments
(1 file, 1 obsolete file)
58 bytes, application/xml
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
BuildID: 2001060712
Mozilla is not properly handling Unicode UTF-8 format XML files which use Unicode-style line endings.
Reproducible: Always
Steps to Reproduce:
1. Access the testcase URL
Actual Results:
Mozilla's XML parser returns the following error:
XML Parsing Error: not well-formed
Location: http://greg.tcp.com/mozilla//Unicode/Unicode%20Line%20Endings/testcase.xml
Line Number 1, Column 39:
<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>
--------------------------------------^
Expected Results:
The line endings should have been properly interpreted and no error displayed; the XML file should have been handled as normal.
HTML files in the same format also display some quirks; in the same directory is a testcase.html file for reference. Extra space is displayed above the text, "foo", which is probably attributable to misinterpretation of the Unicode line endings.
Further research indicates that the problem is that the Unicode 3.0 newline character (NEL) is omitted from the list of line-ending and whitespace characters in XML 1.0. (See "http://www.w3.org/TR/newline".) Perhaps it would be wise to transform NEL characters to LF line endings, as discussed in the aforementioned TR, before passing XML documents to the XML parser.
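For illustration, the pre-parse transform suggested above could look roughly like the following Python sketch. It is not Mozilla code: `normalize_newlines` is a hypothetical helper, and the set of sequences mapped to LF follows the end-of-line handling that XML 1.1 later specified (the W3C Note may differ in detail). Note that it rewrites every NEL, including ones inside character data, where XML 1.0 treats U+0085 as an ordinary legal character.

```python
import re
import xml.etree.ElementTree as ET

# Line-break sequences mapped to a single LF, modeled on XML 1.1
# end-of-line handling: CR LF, CR NEL, lone NEL, LS (U+2028), lone CR.
# Longer sequences come first so that "CR NEL" becomes one LF, not two.
_EOL = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")


def normalize_newlines(text: str) -> str:
    """Hypothetical helper: replace Unicode-style line endings with LF."""
    return _EOL.sub("\n", text)


NEL = "\u0085"

# Assumed shape of the testcase: an XML 1.0 document whose lines end in NEL.
doc = f'<?xml version="1.0" encoding="utf-8"?>{NEL}<foo>bar</foo>{NEL}'

# After normalization, a stock XML 1.0 parser accepts the document.
root = ET.fromstring(normalize_newlines(doc).encode("utf-8"))
print(root.tag, root.text)
```

Doing the replacement on decoded text rather than raw bytes avoids having to special-case the UTF-8 encoding of NEL (C2 85) versus other encodings.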
Summary: Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed". → Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed".
Comment 2•24 years ago (Assignee)
What we are doing currently is correct behaviour: since NEL is not listed in XML 1.0, it is illegal in XML 1.0. MS IE 5 also reports it as an error. This has been discussed extensively on XML-DEV, and I believe the W3C Note you linked to is the result. But that is just a Note. It will most likely be addressed in the next XML revision/version. As the Note says, the NEL (x85) character is used as a newline marker in OS/390. I don't believe Mozilla has been ported to that platform. If it were, I would certainly go after this change. As it is now, you will only experience this problem if you receive documents that were created/edited on that platform and moved to other platforms. With this, and the upcoming XML revision in mind, I am inclined to wait. Confirming and moving to Future. If you can demonstrate this affecting a large number of users/popular websites, I will reconsider.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac System 9.x → All
Hardware: Macintosh → All
Summary: Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed". → NEL character not accepted by XML parser as newline
Target Milestone: --- → Future
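To make the point in comment 2 concrete: XML 1.0 defines white space (production S) as space, tab, CR and LF only. A small Python sketch, using the hypothetical helper name `is_xml10_whitespace`, shows that NEL (U+0085) falls outside that set:

```python
# XML 1.0 production S: (#x20 | #x9 | #xD | #xA)+
XML10_WHITESPACE = frozenset("\x20\x09\x0d\x0a")


def is_xml10_whitespace(ch: str) -> bool:
    """Hypothetical helper: True if ch is white space under XML 1.0."""
    return ch in XML10_WHITESPACE


# LF and CR are white space in XML 1.0; NEL is not.
for name, ch in [("LF", "\n"), ("CR", "\r"), ("NEL", "\u0085")]:
    print(name, is_xml10_whitespace(ch))
```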
Sounds right, Heikki. FYI, the latest BBEdit (6) on Mac OS also makes Unicode line endings an option for the user.
Heikki, what's your opinion on Unicode NELs in HTML? Should I file a separate bug about that? (The problems being that the NEL line endings end up being displayed by the browser, resulting in unwanted whitespace, and that View Source shows the source all on one line.)
Actually, it appears "http://www.w3.org/TR/html4/struct/text.html#didx-white_space-1" is on point. Strictly speaking, HTML doesn't recognize the Unicode NEL as whitespace, either.
Comment 6•24 years ago (Assignee)
Yes, file a separate bug. We have different code for parsing and building XML and HTML documents (layout code is the same).
Comment 9•23 years ago (Assignee)
LATER is deprecated, Future milestone is the same thing.
Comment 10•22 years ago
XML 1.0 Second Edition now treats this as a valid newline character, right? So this bug is now a spec violation, rather than a non-standard extension. :-)
Comment 11•22 years ago
No. It's a valid newline in XML 1.1 documents only. AFAIK, we don't support XML 1.1 yet, although we probably should (I remember hwaara patching expat a few months ago to make sure we reject unknown versions).
Comment 12•22 years ago
Oh, was that a 1.1 change? ok. In that case this bug should be a 1.1 change too.
Updated•22 years ago
QA Contact: petersen → rakeshmishra
Comment 13•22 years ago (Reporter)
Attachment #40780 - Attachment is obsolete: true
Updated•22 years ago
QA Contact: rakeshmishra → ashishbhatt
Comment 14•19 years ago
I suggest closing this as INVALID, because the requested change is a spec violation. (A request for XML 1.1 support would be a wider issue. Personally, I think Gecko should not add support for XML 1.1, because the new features are not needed for XML-based languages that are appropriate for use on the Web.)
Updated•15 years ago
QA Contact: ashshbhatt → xml
Comment 15•11 years ago
See comment 14.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID