Closed Bug 88338 Opened 23 years ago Closed 11 years ago

NEL character not accepted by XML parser as newline

Categories

(Core :: XML, defect)

defect
Not set
normal

RESOLVED INVALID
Future

People

(Reporter: bugmail, Assigned: hjtoi-bugzilla)

Details

(Keywords: testcase)

Attachments

(1 file, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
BuildID:    2001060712

Mozilla is not properly handling Unicode UTF-8 format XML files which use Unicode-style 
line endings.

Reproducible: Always
Steps to Reproduce:
1. Access the testcase URL

Actual Results:  Mozilla's XML parser returns the following error:

XML Parsing Error: not well-formed
Location: http://greg.tcp.com/mozilla//Unicode/Unicode%20Line%20Endings/testcase.xml
Line Number 1, Column 39:
<?xml version="1.0" encoding="utf-8"?><foo>bar</foo>
--------------------------------------^

Expected Results:  The line endings should have been properly interpreted and no error 
displayed; the XML file should have been handled as normal.

HTML files in the same format also display some quirks; in the same directory is a 
testcase.html file for reference. Extra space is displayed above the text, "foo", which is 
probably attributable to misinterpretation of the Unicode line endings.
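
A minimal sketch of the failure outside the browser, assuming the testcase consists of an XML declaration, a NEL (U+0085) line ending, and the root element, as the error output above suggests. It uses Python's bundled expat bindings rather than Mozilla's parser, so the exact message and column may differ; `try_parse` and `DOC` are hypothetical names.

```python
from xml.parsers import expat

# Assumed reconstruction of the testcase: NEL (U+0085) directly after the declaration.
DOC = '<?xml version="1.0" encoding="utf-8"?>\u0085<foo>bar</foo>'.encode("utf-8")


def try_parse(data: bytes):
    """Return None if expat accepts the document, else the error text."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # XML 1.0 does not list NEL as whitespace, so the parser sees it as
    # stray content before the root element and reports an error.
    print(try_parse(DOC))
```

Replacing the NEL with an ordinary LF makes the same document parse cleanly.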
Further research indicates that the problem is that the Unicode 3.0 newline
character is omitted from the list of line-ending and whitespace characters in
XML 1.0. (See "http://www.w3.org/TR/newline").

Perhaps it would be wise to transform NEL characters to LF line endings, as
discussed in the aforementioned TR, before passing XML documents to the XML
parser.
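
The preprocessing suggested here can be sketched as follows: decode the document, map NEL to LF, then hand the result to the parser. This is only an illustration. `nel_to_lf` and `parse_with_nel` are hypothetical helpers, the input is assumed to be UTF-8 as in the report, and Python's ElementTree stands in for Mozilla's parser.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"


def nel_to_lf(text: str) -> str:
    """Hypothetical helper: replace CR NEL pairs and lone NEL with LF."""
    return text.replace("\r" + NEL, "\n").replace(NEL, "\n")


def parse_with_nel(data: bytes) -> ET.Element:
    """Normalize NEL line endings in a UTF-8 document, then parse it."""
    text = data.decode("utf-8")  # assumption: the document is UTF-8
    return ET.fromstring(nel_to_lf(text).encode("utf-8"))


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="utf-8"?>\u0085<foo>bar</foo>'
    root = parse_with_nel(doc.encode("utf-8"))
    print(root.tag, root.text)
```

Note that a blanket replacement like this also rewrites any NEL that appears inside element content, where XML 1.0 treats it as an ordinary character.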
Summary: Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed". → Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed".
What we are doing currently is correct behaviour: since NEL is not listed as a
newline in XML 1.0, it is illegal there. MS IE 5 also reports it as an error.
This has been discussed extensively on XML-DEV, and I believe the W3C Note you
linked to is the result. But that is just a Note. It will most likely be
addressed in the next XML revision/version.

As the Note says, the NEL (x85) character is used as a newline marker in OS/390.
I don't believe Mozilla has been ported to that platform. If it were, I would
certainly go after this change. As it is now, you will only experience this
problem if you receive documents that were created/edited on that platform and
moved to other platforms. With this, and the upcoming XML revision in mind, I am
inclined to wait.

Confirming and moving to Future.

If you can demonstrate this affecting a large number of users/popular websites,
I will reconsider.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Mac System 9.x → All
Hardware: Macintosh → All
Summary: Unicode UTF-8 XML files with Unicode line endings are rejected by Mozilla's XML parser as being "not well formed". → NEL character not accepted by XML parser as newline
Target Milestone: --- → Future
Sounds right, Heikki. FYI, the latest BBEdit (6) on Mac OS also makes Unicode
line endings an option for the user.
Heikki, what's your opinion on Unicode NELs in HTML? Should I file a separate
bug about that? (The problems being that the NEL line endings end up being
displayed by the browser, resulting in unwanted whitespace, and that View Source
shows the source all on one line.)
Actually, it appears
"http://www.w3.org/TR/html4/struct/text.html#didx-white_space-1" is on point.
Strictly speaking, HTML doesn't recognize the Unicode NEL as whitespace, either.
Yes, file a separate bug. We have different code for parsing and building XML
and HTML documents (layout code is the same).
Keywords: testcase
Actually, I'd suggest you LATER this, Heikki.
LATER is deprecated, Future milestone is the same thing.
XML 1.0 Second Edition now treats this as a valid newline character, right? So
this bug is now a spec violation, rather than a non-standard extension. :-)
No. It's a valid newline in XML 1.1 documents only.  AFAIK, we don't 
support XML 1.1 yet, although we probably should (I remember hwaara 
patching expat a few months ago to make sure we reject unknown 
versions).
Oh, was that a 1.1 change? ok. In that case this bug should be a 1.1 change too.
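
To illustrate the version split discussed here, the sketch below applies the XML 1.1 end-of-line normalization (section 2.11 of that spec) only when the declaration says version 1.1, and the XML 1.0 rules otherwise. The regular expression used to detect the version is a simplification, and `normalize_eol` is a hypothetical name, not part of any parser mentioned in this bug.

```python
import re

# XML 1.0: CR LF and lone CR become LF.
EOL_10 = re.compile("\r\n|\r")
# XML 1.1 additionally maps CR NEL, NEL (U+0085) and LINE SEPARATOR (U+2028) to LF.
EOL_11 = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")
# Simplified version sniffing; a real parser reads the declaration properly.
VERSION = re.compile(r"""^<\?xml\s+version\s*=\s*["']([^"']+)["']""")


def normalize_eol(text: str) -> str:
    """Normalize line ends according to the declared XML version."""
    match = VERSION.match(text)
    version = match.group(1) if match else "1.0"  # no declaration means 1.0
    pattern = EOL_11 if version == "1.1" else EOL_10
    return pattern.sub("\n", text)


if __name__ == "__main__":
    print(repr(normalize_eol('<?xml version="1.1"?>\u0085<a/>')))
    print(repr(normalize_eol('<?xml version="1.0"?>\u0085<a/>')))
```

Under these rules the NEL survives untouched in a 1.0 document, which is why a 1.0 parser then rejects it in the prolog.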
QA Contact: petersen → rakeshmishra
Attachment #40780 - Attachment is obsolete: true
QA Contact: rakeshmishra → ashishbhatt
Depends on: 233154
I suggest closing this as INVALID, because the requested change is a spec violation.

(A request for XML 1.1 support would be a wider issue. Personally, I think Gecko should not add support for XML 1.1, because the new features are not needed for XML-based languages that are appropriate for use on the Web.)
QA Contact: ashshbhatt → xml
See comment 14.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID