Closed Bug 187602 Opened 22 years ago Closed 22 years ago

XML Parsing Error if <xml> tag is not at very beginning of xml file (doesn't allow whitespace before tag)

Categories

(Core :: XML, defect)

x86
Windows XP
defect
Not set
normal

VERIFIED INVALID

People

(Reporter: will, Assigned: hjtoi-bugzilla)

References

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130

Any XML file with whitespace before the opening <?xml tag causes the
following parsing error in Mozilla, and the file is not displayed at all.

XML Parsing Error: xml processing instruction not at start of external entity

This is not the standard behavior for other XML viewers, and many of the XML
files used by my company happen to have a space or a carriage return at the
beginning of the files.  Consequently, I have to use Internet Explorer to view
all of these XML files, when I would rather use your program.

Reproducible: Always

Steps to Reproduce:
1. Take a normal XML file that works in Mozilla.
2. Insert a space or carriage return at the very beginning of the file.
3. Try to open that file again in Mozilla.

Actual Results:  
The XML does not display, and I get an error message:

XML Parsing Error: xml processing instruction not at start of external entity

Expected Results:  
Mozilla should ignore the whitespace and display the XML file properly.

n/a
The XML grammar does not allow whitespace before the '<?xml' declaration.  See
http://www.w3.org/TR/REC-xml#NT-document and
http://www.w3.org/TR/REC-xml#NT-prolog for the production definitions.  Note the
absence of a way to have the 'S' production before the '<?xml' production.

Any XML document that has whitespace before the '<?xml' is thus not well-formed
and should trigger a well-formedness error.  Any parser that does not trigger
such an error is buggy, imo.
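
The rule can be checked with a small sketch. It assumes Python and its bundled
expat bindings, neither of which is part of this bug; the helper name
is_well_formed is hypothetical. It parses one document three ways: as is, with
a leading space, and with a leading space but no XML declaration. The last case
is expected to pass, because the prolog allows 'Misc' (which includes 'S')
before the root element once the optional declaration is absent. The error
wording differs between expat versions, so the sketch only checks whether an
error is raised.

```python
from xml.parsers import expat


def is_well_formed(data: bytes) -> bool:
    """Return True if expat parses `data` without a well-formedness error."""
    parser = expat.ParserCreate()
    try:
        # Second argument True marks this chunk as the final one.
        parser.Parse(data, True)
    except expat.ExpatError:
        return False
    return True


good = b'<?xml version="1.0"?><root/>'
bad = b' ' + good          # whitespace before the XML declaration
no_decl = b' <root/>'      # whitespace, but no XML declaration at all

for label, doc in (("good", good), ("bad", bad), ("no_decl", no_decl)):
    print(label, is_well_formed(doc))
```

Only the second document should be rejected. Files that cannot be regenerated
could be repaired by stripping leading whitespace before parsing, which is what
fixing the legacy files amounts to.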
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
Boris is correct, verified invalid.
Status: RESOLVED → VERIFIED
If you want to know why it is important not to have whitespace before the
XML declaration, look at http://www.w3.org/TR/REC-xml#sec-guessing-no-ext-info :
second table (auto-detection without a byte order mark).
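
A rough sketch of that auto-detection step follows, assuming Python; the names
SIGNATURES and guess_encoding_family are hypothetical. It is not how Mozilla or
expat implement detection. The byte patterns are the main rows of the table for
documents without a byte order mark, and the rows for unusual UCS-4 octet
orders are omitted. The point is that the first four bytes are expected to
spell '<?xm' in some encoding, and a single leading space shifts them so that
no signature matches.

```python
# First-four-byte signatures of '<?xm' (or '<' / '<?' for wide encodings)
# when no byte order mark is present.
SIGNATURES = {
    b"\x00\x00\x00\x3c": "UCS-4 big-endian",
    b"\x3c\x00\x00\x00": "UCS-4 little-endian",
    b"\x00\x3c\x00\x3f": "UTF-16 big-endian",
    b"\x3c\x00\x3f\x00": "UTF-16 little-endian",
    b"\x3c\x3f\x78\x6d": "ASCII-compatible (UTF-8, ISO 8859, ...)",
    b"\x4c\x6f\xa7\x94": "EBCDIC",
}


def guess_encoding_family(data: bytes) -> str:
    """Classify a document by its first four bytes (no BOM case only)."""
    return SIGNATURES.get(data[:4], "no match; UTF-8 assumed")


decl = '<?xml version="1.0" encoding="UTF-16LE"?><root/>'
clean = decl.encode("utf-16-le")            # starts 3C 00 3F 00
shifted = (" " + decl).encode("utf-16-le")  # starts 20 00 3C 00

print(guess_encoding_family(clean))
print(guess_encoding_family(shifted))
```

With the leading space, the UTF-16 document no longer matches any signature, so
a parser following this table would fall back to UTF-8 and misread the file.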
Thanks for the correction.  I've raised the issue with my company, and hopefully
they'll make changes to the legacy files to become more standards compliant.  
*** Bug 214569 has been marked as a duplicate of this bug. ***