Closed Bug 204102 Opened 17 years ago Closed 3 years ago

Should not report undeclared entities in standalone="no" documents

Categories

(Core :: XML, defect, major)

Platform: x86 Linux
Type: defect
Priority: Not set
Severity: major

Tracking

Status: RESOLVED WONTFIX

People

(Reporter: thomas, Assigned: peterv)

References

(Depends on 1 open bug)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030401
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4a) Gecko/20030401

Mozilla throws a fatal error the moment it hits an undefined entity in any XML
document. While the wording in the XML spec is **** on this issue, I believe
the correct interpretation is that you *should* allow undefined entities: simply
don't replace them with the string they represent. In fact, there is a major
usability issue here, because not only does a single undefined entity prevent
viewing of an entire document, BUT you are also always forcing the user to have
the entities within the XML document replaced. It often happens that someone
editing/viewing an XML document *doesn't want* the entities replaced. I know this
is the case when I edit XML documents myself, since some entities may have the
same replacement string, and it's impossible to see which entity inserted what
into the document once the entity replacement has actually been done by
the parser.

Reproducible: Always

Steps to Reproduce:
1. Load the URL given above.

Actual Results:  
You get this message from the browser:
XML Parsing Error: undefined entity
Location:
ftp://xml.gsfc.nasa.gov/pub/adc/xml_archives/journals/ApJS/126/37/public/J_ApJS_126_37.xml
Line Number 32, Column 14:<email>cordes&commat;spacenet.tn.cornell.edu</email>
-------------^

Expected Results:  
You should have just put the entity in the document as is, e.g. <node>&entity;</node>,
and/or allowed the user to turn off entity replacement in XML documents for
viewing. This last choice not only solves the bug, but is a usability
enhancement for XML parsing.
The entity in question is (as far as I can tell) not defined in the DTD, making
the document invalid per the validity constraint in:

   http://www.w3.org/TR/REC-xml#dt-entref

If we were a validating parser, that would make this bug INVALID.

However, we are supposedly a non-validating parser, so in theory we are only
bound by wellformedness constraints, and the relevant one:

   http://www.w3.org/TR/REC-xml#wf-entdeclared

...specifically says:

# Note that if entities are declared in the external subset or in external 
# parameter entities, a non-validating processor is not obligated to read and 
# process their declarations; for such documents, the rule that an entity must 
# be declared is a well-formedness constraint only if standalone='yes'.

...and since the document says standalone="no", in theory, we shouldn't be
reporting this particular error.
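For comparison, a minimal sketch of the rule quoted above, using Python's xml.parsers.expat (a non-validating parser that, by default, does not read the external DTD subset). The prolog and the system identifier journal.dtd are hypothetical; only the <email> line comes from the failing document. Under these assumptions, with standalone="no" the undeclared &commat; is passed to the application as a skipped entity, while with standalone="yes" it produces the same "undefined entity" error shown above.

```python
from xml.parsers import expat

# Hypothetical prolog around the line that fails in this bug; "journal.dtd"
# is a made-up system identifier that is never actually fetched.
TEMPLATE = b"""<?xml version="1.0" standalone="%s"?>
<!DOCTYPE email SYSTEM "journal.dtd">
<email>cordes&commat;spacenet.tn.cornell.edu</email>"""


def skipped_entities(document: bytes) -> list[str]:
    """Parse without reading the external DTD subset (expat's default).

    Returns the names of entity references that were recognised but not
    read; raises expat.ExpatError on a well-formedness error.
    """
    parser = expat.ParserCreate()
    skipped: list[str] = []
    # Expat calls this instead of failing when the document has an external
    # subset and is not standalone="yes".
    parser.SkippedEntityHandler = lambda name, is_pe: skipped.append(name)
    parser.Parse(document, True)
    return skipped


print(skipped_entities(TEMPLATE % b"no"))  # ['commat']

try:
    skipped_entities(TEMPLATE % b"yes")
except expat.ExpatError as err:
    print(expat.ErrorString(err.code))  # undefined entity
```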

Heikki?
Summary: Cant render document with unknown entities. This breaks XML spec, AND usability of mozilla → Should not report undeclared entities in standalone="no" documents
Hi again,

After a closer reading of the spec (as per your comment), I believe I am
still right. The facts are the following:

1. Mozilla is a non-validating XML parser, so only well-formedness
   matters.

2. The entity in question is external.

3. The document is standalone="no", so according to the spec snippet
   you provided, external entities DON'T fall under that
   well-formedness constraint and should therefore NOT be checked.

So why doesn't Mozilla just pass over the external entities, simply
inserting them as is into the parsed document?

The problem seems to me to be an inconsistency in approach here. On
one hand you are saying that Mozilla is a non-validating parser. Fine.
But then you are quibbling over whether or not the external entity
could invalidate the document, and because you didn't parse it (on
purpose), you won't let the document pass a well-formedness test. The end
result is a nasty catch-22: no XML document with external entities can be
viewed in Mozilla. This is why I filed the bug (and still believe it is a
bug).

                                                =b.t.
Right, that was my understanding too. If the document is standalone="yes", then
the error message is definitely correct, but if the document is standalone="no"
as in this case, then we should not be reporting it as an error.

Exactly what we _should_ do is still unclear to me. I see no justification in
the spec for treating the entity as literal text (what would appear in the
DOM?), but I also see no other text saying what _should_ happen.
Ugh, Konqueror screwed up my comment. My apologies; here it is again in a
more readable format.

> Exactly what we _should_ do is still unclear to me. I see no justification in
> the spec for treating the entity as literal text (what would appear in the 
> DOM?), but I also see no other text saying what _should_ happen.

 Well, I believe that simply leaving the entity unchanged and in place is the
 correct way to go, e.g. nodes with entities in PCDATA, CDATA and attributes
 would look the same if they are external entities. Perhaps you could also
 throw a _warning_ that the document contains external entities but that
 Mozilla is not currently able to read these and will therefore leave them
 in place in the document.

  Please, please change Mozilla. I can't believe we (at NASA) are the only
  ones in the world with external entities in our XML documents. Functionally,
  Mozilla is crippled as it now stands with regard to XML documents
  containing external entities (as I keep repeating: there is essentially NO
  way to load these things into Mozilla right now).

                                     -b.t.   
The issue of us not handling external entities is bug 22942.
Argh!!!! Konqueror, I hate you!! (sorry, just feeling emotional there for a moment)

My last message got all screwed up (again). Here it is in more readable text:

> The issue of us not handling external entities is bug 22942.


No, not really. As I read that bug, it describes the problems that arise
from not loading external DTDs in general. Insofar as it touches on the
external entity issue, one comment has a salient point for us here:

> About validation, being non-validating just says that we are not required to 
> load external DTDs, not that we must not.

     Which means you CAN'T avoid dealing with external entities, even if you
are a non-validating parser that doesn't load external entities.

 There are several possible solutions:
          - make Mozilla load external entities
          - make Mozilla simply pass over the external entities, either
            inserting them as is or dropping them (I don't like that last
            possibility). A warning *might* be given to the user to let
            them know that Mozilla is aware that some external entity
            replacement did not occur.
          - have a default view of XML documents which are NOT rendered at
            all [I don't particularly like this one either]

           Of these solutions, the last is the easiest to implement, the
           second is the most correct, and the first is the best compromise
           between the two.


=b.t.     

(I watch heikki and harishd in lieu of component watching, so I'm removing my CC
for the time being)
Sorry, I meant bug 69799, which is specifically about external entities, and
depends on bug 22942, which is about loading external DTDs in the first place
(whether or not we validate the document using them, which would be bug 196355).

This bug, though, is still valid; even if we try to load external DTDs, we might
not always succeed, so we will always have to be able to deal with recognising
entities that are not declared in documents marked standalone="no".
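One way to follow this advice, sketched with Python's expat bindings: try to load the external DTD, and if that fails, fall back to reporting undeclared entities as skipped instead of failing. FETCHED_DTDS is a hypothetical stand-in for network access, the declaration of commat in it is invented, and parse_text is an illustrative helper, not Mozilla code.

```python
from xml.parsers import expat

# Hypothetical cache of external DTDs that could actually be fetched.
FETCHED_DTDS = {"journal.dtd": b'<!ENTITY commat "@">'}

DOC = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE email SYSTEM "journal.dtd">
<email>cordes&commat;spacenet.tn.cornell.edu</email>"""


def parse_text(document: bytes, dtds: dict) -> tuple[str, list[str]]:
    """Return (character data, names of skipped entities)."""
    parser = expat.ParserCreate()
    # Hand the external DTD subset to our loader unless standalone="yes".
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)
    text: list[str] = []
    skipped: list[str] = []
    parser.CharacterDataHandler = text.append
    parser.SkippedEntityHandler = lambda name, is_pe: skipped.append(name)

    def load_external(context, base, system_id, public_id):
        dtd = dtds.get(system_id)
        if dtd is not None:
            # Parse the DTD into the same document so its declarations apply.
            parser.ExternalEntityParserCreate(context).Parse(dtd, True)
        # Returning 1 means "carry on", whether or not anything was read.
        return 1

    parser.ExternalEntityRefHandler = load_external
    parser.Parse(document, True)
    return "".join(text), skipped


print(parse_text(DOC, FETCHED_DTDS))  # ('cordes@spacenet.tn.cornell.edu', [])
print(parse_text(DOC, {}))            # ('cordesspacenet.tn.cornell.edu', ['commat'])
```

Returning 1 from the loader in the failure case is a design choice of this sketch: returning 0 would make expat abort with an external-entity-handling error, which is exactly the single point of failure discussed in this bug.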
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 263150
This bug bit me again yesterday (as it has several other people, now that XHTML Print is a Recommendation). Three years and still not fixed? PLEASE don't report unrecognised entities as a well-formedness error when standalone="no".
See also bug 35984, which caused this.
Assignee: xml → peterv
Status: NEW → ASSIGNED
Depends on: 346444
Section 4.4.3 of the specification, at http://www.w3.org/TR/REC-xml/#include-if-valid , states what a processor may do when such an entity is not found because the external file was not read:

"When an XML processor recognizes a reference to a parsed entity, in order to validate the document, the processor MUST include its replacement text. If the entity is external, and the processor is not attempting to validate the XML document, the processor MAY, but need not, include the entity's replacement text. If a non-validating processor does not include the replacement text, it MUST inform the application that it recognized, but did not read, the entity.

"This rule is based on the recognition that the automatic inclusion provided by the SGML and XML entity mechanism, primarily designed to support modularity in authoring, is not necessarily appropriate for other applications, in particular document browsing. Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand."

So, to add somewhat to Brian Thomas' points, it seems what can be done to fix the problem is one of the following:
1) Replace the entities (Bug 22942)
2) Provide a visual indication of the entity's presence and retrieve it for display only on demand
3) Or, at the very least, it would seem permissible per the spec to just indicate the entity's presence (Opera does this by showing the entity in the source as though it had been written as an escaped ampersand plus the entity text)

There indeed should be no single point of failure here.
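Option 3 (close to what Brian Thomas suggested earlier: leave the reference in place and warn) could look like this, again as a hypothetical Python expat sketch with the same made-up journal.dtd prolog. The skipped-entity notification that Section 4.4.3 requires is turned into visible &name; text plus a warning; render_with_placeholders is an illustrative name, and this is not a description of Opera's implementation.

```python
from xml.parsers import expat

DOC = b"""<?xml version="1.0" standalone="no"?>
<!DOCTYPE email SYSTEM "journal.dtd">
<email>cordes&commat;spacenet.tn.cornell.edu</email>"""


def render_with_placeholders(document: bytes) -> tuple[str, list[str]]:
    """Flatten character data, keeping unread entity references visible."""
    parser = expat.ParserCreate()  # external DTD is not read (the default)
    out: list[str] = []
    warnings: list[str] = []
    parser.CharacterDataHandler = out.append

    def on_skipped(name, is_parameter_entity):
        if is_parameter_entity:
            return  # only general-entity references in content are rendered
        # Show the reference exactly as it was written in the source.
        out.append(f"&{name};")
        warnings.append(f"entity '{name}' was recognised but not read")

    parser.SkippedEntityHandler = on_skipped
    parser.Parse(document, True)
    return "".join(out), warnings


text, warnings = render_with_placeholders(DOC)
print(text)      # cordes&commat;spacenet.tn.cornell.edu
print(warnings)  # ["entity 'commat' was recognised but not read"]
```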
QA Contact: ashshbhatt → xml
Cross-posting on relevant bug pages:

For this bug, and a number of other associated bugs (Bug 204102, Bug 267350, Bug 22942, and to a lesser extent Bug 196355), I've started a pledge drive at http://pledgie.com/campaigns/7732 to try to hire one or more developers who can work with the Mozilla devs (if they are ineligible themselves) to get these long-standing and niche but important-to-XML-users bugs fixed. Feel free to make a pledge to donate toward these fixes or, if you are a developer, make a bid in the comments there to offer to fix, in conjunction with Mozilla devs, this or any of the other aforementioned XML-related bugs/feature requests.

(If we can get enough momentum, Bug 234485, Bug 98413, Bug 275196, and Bug 94270 might be also nice candidates to get addressed too, but I've started with the (single-point-of-failure-causing) DTD issues.)
No browser supports this feature and it's not worth the added complexity.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX