Open Bug 129392 Opened 23 years ago Updated 2 years ago

XML parser and external entities

Categories

(Core :: XML, defect, P3)

People

(Reporter: hjtoi-bugzilla, Unassigned)

Attachments

(1 file)

Given a document:

<!DOCTYPE doc [
<!ENTITY % entities SYSTEM "entitiesedcopy.dtd" >
%entities;
<!ENTITY status "WORKING!">
]>
<doc>
    <p>We should have some text after this: &status;</p>
</doc>

Mozilla gives a parser error when it sees &status;. For some reason it has not
seen or handled the entity declaration properly. Moving %entities; in the
internal subset to a location after the general entity will make this work as
expected. So it looks like the external entity (which we of course are not
loading) puts our XML parser into a bad state.
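
The two orderings described above can be compared with a small probe. This is only a sketch: it uses Python's `xml.parsers.expat` binding as a stand-in for an expat-based, non-validating parser that does not load external entities. The `probe` helper and the `PE_FIRST`/`PE_LAST` names are made up for illustration, and the binding's defaults are not necessarily the configuration Mozilla uses, so its result for the first ordering may differ from the error reported here.

```python
import xml.parsers.expat


def probe(document):
    """Parse `document`; return (text, None) on success or (None, message) on error."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Collect character data, including the replacement text of expanded internal entities.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError as exc:
        return None, str(exc)
    return "".join(chunks), None


# Ordering from the report: the parameter entity reference comes first.
PE_FIRST = """<!DOCTYPE doc [
<!ENTITY % entities SYSTEM "entitiesedcopy.dtd" >
%entities;
<!ENTITY status "WORKING!">
]>
<doc>
    <p>We should have some text after this: &status;</p>
</doc>"""

# Reordered variant: the general entity is declared before the reference.
PE_LAST = """<!DOCTYPE doc [
<!ENTITY % entities SYSTEM "entitiesedcopy.dtd" >
<!ENTITY status "WORKING!">
%entities;
]>
<doc>
    <p>We should have some text after this: &status;</p>
</doc>"""

if __name__ == "__main__":
    for label, doc in (("PE reference first", PE_FIRST), ("PE reference last", PE_LAST)):
        print(label, "->", probe(doc))
```

In the reordered variant the declaration of `status` precedes the unread parameter entity, so the reference in the document body can be expanded.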
*** Bug 130343 has been marked as a duplicate of this bug. ***
I don't actually think this is a bug, though I raised the duplicate :-)


The specification at http://www.w3.org/TR/2000/REC-xml-20001006#proc-types
states clearly that


"Non-validating processors are required to check only the document entity,
including the entire internal DTD subset, for well-formedness. [Definition:
While they are not required to check the document for validity, they are
required to process all the declarations they read in the internal DTD subset
and in any parameter entity that they read, up to the first reference to a
parameter entity that they do not read; that is to say, they must use the
information in those declarations to normalize attribute values, include the
replacement text of internal entities, and supply default attribute values.]
Except when standalone="yes", they must not process entity declarations or
attribute-list declarations encountered after a reference to a parameter entity
that is not read, since the entity may have contained overriding declarations."


Since Mozilla does not load external entity definitions (see bug #130339 and
duplicates) because it is not a validating XML parser, the last sentence in the
quote above states clearly that Mozilla MUST NOT PROCESS the last entity
declaration.  If Mozilla did expand your &status; without loading the DTD, then
*that* would, IMHO, be against the standard and a bug.
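
To make that reading of the spec concrete, here is a toy model of the rule, not Mozilla's or expat's actual code. The function `processed_entities`, its parameters, and the tuple encoding of the internal subset are all invented for this sketch. It walks the internal subset in order and stops honouring entity declarations after the first reference to a parameter entity that is not read, unless the document is standalone="yes".

```python
def processed_entities(subset, standalone=False, read_external=False):
    """Return the general entities a non-validating processor may use.

    `subset` is a list of items in document order (hypothetical encoding):
      ("pe_ref", name)         a reference to an external parameter entity
      ("entity", name, value)  an internal general entity declaration
    """
    entities = {}
    stopped = False
    for item in subset:
        if item[0] == "pe_ref":
            # An unread parameter entity may have contained overriding
            # declarations, so later declarations must be ignored,
            # except when the document is standalone="yes".
            if not read_external and not standalone:
                stopped = True
        elif item[0] == "entity" and not stopped:
            _, name, value = item
            # The first declaration of an entity is the binding one.
            entities.setdefault(name, value)
    return entities


# The document from the report: parameter entity reference first.
REPORT_ORDER = [("pe_ref", "entities"), ("entity", "status", "WORKING!")]
# The reordered variant: general entity declared first.
REORDERED = [("entity", "status", "WORKING!"), ("pe_ref", "entities")]

if __name__ == "__main__":
    print(processed_entities(REPORT_ORDER))
    print(processed_entities(REORDERED))
    print(processed_entities(REPORT_ORDER, standalone=True))
```

Under this model the reported document leaves `status` undeclared, while the reordered document and the standalone="yes" case both make it available.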


I suggest that we raise a bug for FUTURE that Mozilla should load external
entity declarations (which seems to be legal, if not required, even for
non-validating parsers).  It would get my vote and save me some 30k bytes on
every application/xhtml+xml response from my new server...


Comments?
Thanks for pointing out that part of the spec. That "fixes" part of the
problem.

However, as this attachment shows, we still have a bug when the document is
standalone="yes". As I understand the spec, in that case we MUST read all of
the internal subset, but this test shows we don't.
Priority: -- → P2
Summary: XML error reported although document well-formed → Internal subset not read completely in standalone="yes" if external entities
Target Milestone: --- → mozilla1.0.1
Target Milestone: mozilla1.0.1 → mozilla1.2alpha
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Target Milestone: mozilla1.2beta → mozilla1.3alpha
QA Contact: petersen → rakeshmishra
Target Milestone: mozilla1.3alpha → mozilla1.3beta
Target Milestone: mozilla1.3beta → mozilla1.4alpha
Target Milestone: mozilla1.4alpha → mozilla1.4beta
Target Milestone: mozilla1.4beta → ---
QA Contact: rakeshmishra → ashishbhatt
Assignee: hjtoi-bugzilla → core.xml
Assignee: xml → nobody
QA Contact: ashishbhatt → xml
The attached testcase demonstrates a potential interoperability problem between Firefox and other browsers. Might be nice to address that if/when we switch to a Rust-based parser.
Priority: P2 → P3
Summary: Internal subset not read completely in standalone="yes" if external entities → XML parser and external entities
(In reply to Anne (:annevk) from comment #4)
> The attached testcase demonstrates a potential interoperability problem
> between Firefox and other browsers. Might be nice to address that if/when we
> switch to a Rust-based parser.

Wow - been a long time since we originally discussed this one! :)
Latest version of the spec I quoted 16 years ago is here: https://www.w3.org/TR/REC-xml/#proc-types
The relevant part of the spec has not changed:

"Except when standalone="yes", [Non-validating processors] must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations; when standalone="yes", processors must process these declarations"
Severity: normal → S3