Closed Bug 69799 Opened 24 years ago Closed 7 years ago

External entities are not included in XML document

Categories

Component: Core :: XML
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED WONTFIX
Target Milestone: Future

People

Reporter: joden
Assignee: peterv

Attachments

1 file, 1 obsolete file

When rendering an XML document (whether with or without a style sheet) that
references external entities as in:

    <!ENTITY blah SYSTEM "blah.ent">

when the browser comes across the entity reference:

    &blah;

it will not include the external entity in the rendered text. For instance, if
you had the XML document:

   <?xml version="1.0"?>
   <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                      "dtd/docbook/docbookx.dtd" [
   <!ENTITY blah SYSTEM "blah.ent">
   ]>

   <article>
       <sect1>
            <para>
             Some text in a paragraph.
            </para>
            &blah;
       </sect1>
   </article>

and the file blah.ent contains:

    <para>
    Blah, blah blah blah...blah blah!
    </para>

Then Mozilla renders:

   Some text in a paragraph.

Instead of:

   Some text in a paragraph. Blah, blah blah blah...blah blah!
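Mozilla's XML parser is built on expat, so the symptom can be reproduced outside the browser with Python's `xml.parsers.expat` module, which wraps the same C library. This is a minimal sketch, not Mozilla's code: with no external-entity handler installed, expat accepts the `blah` declaration but silently skips the `&blah;` reference, leaving only the first paragraph's text.

```python
import xml.parsers.expat

# The reporter's document: "blah" is declared as an external parsed entity.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                  "dtd/docbook/docbookx.dtd" [
<!ENTITY blah SYSTEM "blah.ent">
]>
<article>
  <sect1>
    <para>Some text in a paragraph.</para>
    &blah;
  </sect1>
</article>
"""


def render_text(doc: bytes) -> str:
    """Parse doc without an ExternalEntityRefHandler and return its text."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    # No ExternalEntityRefHandler is set, so expat skips &blah; without error.
    parser.Parse(doc, True)
    # Collapse whitespace so the result resembles rendered text.
    return " ".join("".join(chunks).split())


if __name__ == "__main__":
    print(render_text(DOC))
```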
This is a known issue. Mozilla does not load external DTDs (or fragments of
them, like entities). The exceptions are the DTDs for the user interface, in the
chrome directory.

Moving to Future.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → Future
I propose that this bug duplicates bug 22942.
You are correct, thanks.

*** This bug has been marked as a duplicate of 22942 ***
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
I understand this bug to be about external entities and not external dtds.
The best reason that this is not a dupe of bug 22942 is that this doesn't work
even from file: URLs :-(.
If you want, I can attach a testcase. I came across this while testing the
DocBook XSLT stylesheets, which use external entities for l10n. Anyway, the
external entity references stay in the internalSubset of the doctype, but don't
get loaded or substituted. Even for file:// URLs.
Status: RESOLVED → REOPENED
OS: Windows NT → All
Hardware: PC → All
Resolution: DUPLICATE → ---
Axel, we use the same mechanism for loading external DTDs/entities. There is
special code so that, for file URLs, we look for them under the 'dtd' folder in
the directory of the file that we are loading.
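At the expat level, loading an external entity means installing an external-entity handler, mapping the system identifier to some content, and parsing that content with a child parser created from the opaque `context` argument. A minimal Python sketch of that pattern follows; the `ENTITY_STORE` dictionary is a hypothetical stand-in for whatever file or network lookup a browser would perform, the DocBook external subset is left out for brevity, and none of this is Mozilla's actual implementation.

```python
import xml.parsers.expat

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE article [
<!ENTITY blah SYSTEM "blah.ent">
]>
<article>
  <sect1>
    <para>Some text in a paragraph.</para>
    &blah;
  </sect1>
</article>
"""

# Hypothetical stand-in for the file system: system identifier -> entity bytes.
ENTITY_STORE = {
    "blah.ent": b"<para>Blah, blah blah blah...blah blah!</para>",
}


def parse_with_entities(doc: bytes, store: dict) -> str:
    """Parse doc, expanding external general entities found in store."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_external_entity(context, base, system_id, public_id):
        data = store.get(system_id)
        if data is None:
            return 0  # expat then reports XML_ERROR_EXTERNAL_ENTITY_HANDLING
        # The child parser shares the parent's DTD and entity context.
        child = parser.ExternalEntityParserCreate(context)
        child.CharacterDataHandler = chunks.append
        child.Parse(data, True)
        return 1  # tell expat the entity was handled

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(doc, True)
    return " ".join("".join(chunks).split())
```

With the entity resolved, `parse_with_entities(DOC, ENTITY_STORE)` yields the text the reporter expected to see rendered.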
I changed
Index: nsExpatTokenizer.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsExpatTokenizer.cpp,v
retrieving revision 1.92
diff -u -r1.92 nsExpatTokenizer.cpp
--- nsExpatTokenizer.cpp        2001/11/07 04:12:02     1.92
+++ nsExpatTokenizer.cpp        2001/11/21 16:31:06
@@ -838,7 +838,11 @@
         }
       }
     }
-  }  
+  }
+
+  if (!isLoadable) {
+    res = (*aDTD)->SchemeIs("file", &isLoadable);
+  }
 
   return isLoadable;
 }

and get an
XML_ERROR_EXTERNAL_ENTITY_HANDLING
error in return while loading an XML file with external entities. (I've set the
URL to the file I'm testing.)
Seems like this bug is different from bug 22942 after all. I glanced into expat,
and it looks like some effort is needed if we want this. And my mind isn't
set to effort tonight, so I'll just tell you how far I got without any effort ;-)
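In expat, `XML_ERROR_EXTERNAL_ENTITY_HANDLING` is the error reported when the application's external-entity handler returns 0 (failure), so one plausible reading of the result above is that the handler was reached but the entity could not be loaded. A minimal Python sketch that produces the same error code with a handler that refuses every entity on purpose:

```python
import xml.parsers.expat
from xml.parsers.expat import errors

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
<!ENTITY ext SYSTEM "ext.ent">
]>
<doc>&ext;</doc>
"""


def parse_refusing_entities(doc: bytes) -> int:
    """Return 0 on success, or the expat error code if parsing fails."""
    parser = xml.parsers.expat.ParserCreate()
    # Returning 0 tells expat the external entity could not be handled.
    parser.ExternalEntityRefHandler = lambda context, base, sysid, pubid: 0
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError as exc:
        return exc.code
    return 0


if __name__ == "__main__":
    print(xml.parsers.expat.ErrorString(parse_refusing_entities(DOC)))
```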
Oh, wow.  I was wondering what was wrong with my code, why I couldn't get an
external entity to work in my XHTML document.

Of course, if we ever get around to doing validation by DTDs, this is going to
hurt us severely for XHTML 1.1.
*** Bug 130339 has been marked as a duplicate of this bug. ***
Related to Bug 44458, Mozilla understands XHTML character entities if the FPI for
XHTML 1.0 Strict/Transitional/Frameset, 1.1, Basic or XHTML 1.1 plus MathML 2.0
is specified, but otherwise it reports an XML parsing error for an undefined
entity even if an external DTD subset is present.  Try some test documents linked from:
http://www.w3.org/People/mimasa/test/xhtml/entities/#xhtml-family

This behavior is against "Well-formedness constraint: Entity Declared" of XML 1.0.
See: http://www.w3.org/TR/REC-xml#wf-entdeclared
Could you explain why it is against the WF rules? That part of the XML spec you
pointed to says that non-validating parsers are _not obligated to_ read external
entities. Whether we read them in some cases but not in others shouldn't
matter according to the spec.

Also XHTML spec itself complicates things by suggesting that we should not flag
unknown entities as errors, but show them as &foo; which I think is a big mistake.
The last sentence of "Well-formedness constraint: Entity Declared" says
(emphasis added by me):

for such documents, the rule that an entity must be declared is a
well-formedness constraint *only if standalone='yes'*.

"2.9 Standalone Document Declaration" of XML 1.0 says "If there are external
markup declarations but there is no standalone document declaration, the value
"no" is assumed."
 cf. http://www.w3.org/TR/REC-xml#sec-rmd

Thus, when entities are declared in the external subset through the DOCTYPE
declaration, a non-validating processor is not obligated to read and process
their declarations but MUST NOT report well-formedness   
errors against those entities.

In addition, in the XHTML+MathML+SVG sample, the standalone document declaration
is explicitly declared as "no", yet Mozilla reports a WF error.  It is
acceptable that Mozilla doesn't understand entities in that case, but it is
totally unacceptable to report a WF error and stop normal processing against a
valid document.
  cf. http://www.w3.org/People/mimasa/test/xhtml/entities/entities-math-svg.xhtml
I disagree. If the parser can not resolve an entity reference, it must be an
error. How should the parser replace an entity it cannot understand?
I believe the rule in XML 1.0 is clear on this, but if you are still not sure,
read the rationale behind this described in "Reports from the W3C SGML ERB to
the SGML WG and from the W3C XML ERB to the XML SIG" at:
http://www.w3.org/XML/9712-reports#ID52

For your convenience, I'll copy relevant parts here.

<blockquote cite="http://www.w3.org/XML/9712-reports#ID52">
    S.40 Should Entity Declared be a VC or a WFC?

Decision: In a standalone document (one without a DTD, one with only an internal
subset and no references to external parameter entities, or one with
standalone='yes'"), this constraint should be treated as a WFC: i.e. it must be
checked by all conforming processors. In a document with a DTD and
"standalone='no'", it should be treated as a VC.

Unanimous (MMal and EM abstaining).

Rationale: it cannot be a WFC without serious injury to the notion of Draconian
error handling. As the current draft (97-11-17) makes explicit, a non-validating
processor cannot be expected to know whether an entity declaration for an entity
being referred to does or does not occur in some external parameter entity or
external DTD subset. But if the constraint is a well-formedness constraint, even
a non-validating processor should catch the error. So for "standalone='no'", it
should be a VC -- a constraint enforceable only if one reads the entire DTD.
</blockquote>
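The quoted decision is easy to observe in expat. When a document has an external DTD subset that the parser does not read, an undeclared entity is fatal only with `standalone='yes'`; otherwise expat reports it through a skipped-entity callback and keeps parsing. A minimal Python sketch using `xml.parsers.expat` (no external-entity handler is installed, so `test.dtd` is never fetched):

```python
import xml.parsers.expat

TEMPLATE = """<?xml version="1.0" standalone="{sa}"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&entity;</test>
"""


def check(standalone: str):
    """Parse the template; return ("ok", skipped names) or ("error", message)."""
    skipped = []
    parser = xml.parsers.expat.ParserCreate()
    # Called for references to entities whose declaration was never read.
    parser.SkippedEntityHandler = lambda name, is_pe: skipped.append(name)
    try:
        parser.Parse(TEMPLATE.format(sa=standalone).encode(), True)
    except xml.parsers.expat.ExpatError as exc:
        return ("error", xml.parsers.expat.ErrorString(exc.code))
    return ("ok", skipped)


if __name__ == "__main__":
    print(check("no"))   # undeclared entity reported, parsing continues
    print(check("yes"))  # well-formedness error: undefined entity
```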
*** Bug 150728 has been marked as a duplicate of this bug. ***
*** Bug 151370 has been marked as a duplicate of this bug. ***
*** Bug 178945 has been marked as a duplicate of this bug. ***
I apologize for asking this question here, but...

How does a Bugzilla user determine the release in which a bug will be or has been
fixed?  I am interested in the fix for this problem.
QA Contact: petersen → rakeshmishra
There are several related issues here:

   If the document is standalone="yes", we should be reading the entire internal
   subset, not stopping as we do in standalone="no" mode. This is bug 129392.

   If the document is standalone="no", we shouldn't be reporting undeclared
   entities as a wellformedness error, as we do (and should) in standalone="yes"
   mode. This is bug 204102.

   In either case, we should be reading external entities in order to do basic
   stuff like attribute defaulting, recognising ID attributes, and entity
   expansion. This is bug 22942.

This, therefore, appears to be a subset of bug 22942. Marking dependency.
Depends on: entities
Thanks for updating this. In the meantime, while waiting for a fix, is there
any way to work around this bug?
You could stick the entire list of entities in the internal subset I guess...
(Maybe using server side includes to make life easier.)
It seems to me that this bug is considered an infrequent problem.
However, I want to raise the attention on several use cases that
frequently happen in a professional environment:
-DocBook documents encourage the use of external entities to build
a book. External entities allow several people to
work on the same document at the same time.
-Web server applications can be built in the following way: an XML
page converted to HTML using XSL to display the main layout, and an
external entity loading the core data to be modified using JSP.

In both cases Mozilla cannot display the pages correctly, even though this
is a standards-compliant way of implementing web applications that is
frequently used in enterprises.
QA Contact: rakeshmishra → ashishbhatt
>However, I want to raise the attention on several use cases that
>frequently happen in a professional environment:

And I can add: this bug blocks the creation of simple localisable remote XUL
applications in the same way as is done for chrome XUL (all strings are in an
external file as external entities).
re #22, localisation depends on the chrome protocol inserting the locale.
So this is not gonna enable you to get localisation out of the box.
Of course it is a prerequisite for solving that bug in the first place.
re23: It should. E.g. in http://domain/file.xul you have your XUL application
pointing to http://domain/file.php as its DTD with external entities, and there
a simple PHP script serves the client the appropriate locale, determined by a
cookie or the Accept-Language HTTP request header.

Being able to keep all strings in external files simplifies localization
a lot.
Blocks: 138460
Here's an example that doesn't include the external entity in the document tree:

Open this file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE notebook [
	<!ELEMENT notebook (page|span)*>
	<!ELEMENT page (#PCDATA)>
	<!ELEMENT span  (#PCDATA)>
	<!ENTITY stuff SYSTEM "s1p1.xml">
]>
<notebook>
<span>in the beginning</span>
&stuff;
</notebook>



The external entity s1p1.xml contains three lines:
<page>
this is page one
</page>
There is also another interesting effect when the XHTML 1.1 DTD is
extended...

I tried the following:

index.xml:
--
<?xml version="1.0" encoding="iso-8859-1" ?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ENTITY dasmenu SYSTEM "menux.xml">
<!ENTITY ichglaubsjawohlnicht "bla">
]>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>Einbinden von Menus in xhtml &ichglaubsjawohlnicht;</title>

</head>

<body>

<div id="menu">&dasmenu;</div>

<div id="content">CONTENT &ichglaubsjawohlnicht;</div>

</body>

</html>
--

and menux.xml:
--
<?xml version="1.0" encoding="iso-8859-1" ?>

<p>test mit <b>bold</b> zwischendurch :D</p>
--

Interestingly, Mozilla output the following lines (when I deleted the xmlns=
attribute on <html>, it showed the source):

--
<html>
    <head>
<title>Einbinden von Menus in xhtml bla</title>
</head>
    <body>
    <div id="menu">
    <!--
 
  * The contents of this file are subject to the Mozilla Public
  * License Version 1.1 (the "License"); you may not use this file
  * except in compliance with the License. You may obtain a copy of
  * the License at http://www.mozilla.org/MPL/
  *
  * Software distributed under the License is distributed on an "AS
  * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
  * implied. See the License for the specific language governing
  * rights and limitations under the License.
  *
  * The Original Code is mozilla.org code.
  *
  * The Initial Developer of the Original Code is Netscape
  * Communications Corporation.  Portions created by Netscape are
  * Copyright (C) 2000 Netscape Communications Corporation.  All
  * Rights Reserved.
  *
  * Contributor(s):
  -->
    <!--

  * Predefined HTML entities to be loaded when parsing XHTML documents.
  * The contents match mozilla/htmlparser/src/nsHTMLEntityList.h,
  * except that Navigator entity extensions are not included.
 
-->
<!-- ISO 8859-1 entities -->
<!-- Mathematical symbols and Greek letters -->
    <!--
 Markup-significant and internationalization characters -->
</div>
<div id="content">CONTENT bla</div>
</body>
</html> 
--

It seems as if it simply copied the comments from the DTD into the XML document.
*** Bug 224739 has been marked as a duplicate of this bug. ***
*** Bug 239607 has been marked as a duplicate of this bug. ***
With the emergence of XML this bug is becoming more important but seems to have
been forgotten. Work needs to be done on this bug.

WORKS:

test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE test[
 <!ELEMENT test (#PCDATA)>
 <!ENTITY entity "hello">
]>
<test>&entity;</test>

DOES NOT WORK (unknown entity):

test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&entity;</test>

test.dtd:

<!ELEMENT test (#PCDATA)>
<!ENTITY entity "hello">

ALSO DOES NOT WORK:

test.xml:

<!DOCTYPE test[
 <!ELEMENT test (#PCDATA)>
 <!ENTITY entity SYSTEM "test.dtd">
]>
<test>&entity;</test>

test.dtd:

<!ENTITY entity "hello">
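For comparison, the second test case works once external loading is enabled. Expat only fetches the external DTD subset and external parameter entities when parameter-entity parsing is switched on and an external-entity handler is installed; the Python sketch below does both, with a hypothetical `FILES` dictionary standing in for the disk. The third test case would fail even then: an external general entity referenced in content must itself be well-formed content, so `<!ENTITY ...>` declarations cannot be brought in that way. The usual construct is an external parameter entity, which the sketch uses as a corrected version of that case.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system.
FILES = {"test.dtd": b'<!ELEMENT test (#PCDATA)>\n<!ENTITY entity "hello">\n'}

# Test case 2: declarations come from the external DTD subset.
EXTERNAL_SUBSET = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&entity;</test>
"""

# Corrected test case 3: declarations pulled in through a parameter entity.
PARAMETER_ENTITY = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [
 <!ENTITY % decls SYSTEM "test.dtd">
 %decls;
]>
<test>&entity;</test>
"""


def parse_loading_dtds(doc: bytes) -> str:
    """Parse doc, reading external DTD content from FILES; return its text."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    # Read the external subset and parameter entities unless standalone="yes".
    parser.SetParamEntityParsing(
        xml.parsers.expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
    )

    def on_external_entity(context, base, system_id, public_id):
        if system_id not in FILES:
            return 0
        # For DTD content (external subset or parameter entity) context is None.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FILES[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(doc, True)
    return "".join(chunks)
```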
I would like to mention that http://bugzilla.mozilla.org/show_bug.cgi?id=219958
is related to the same problem of full XML/XSL support.
Flags: blocking1.8a3?
It's going to be extremely difficult to hold any Mozilla milestone for this bug 
if there are no patches for it...
Flags: blocking1.8a3? → blocking1.8a3-
*** Bug 261516 has been marked as a duplicate of this bug. ***
*** Bug 300389 has been marked as a duplicate of this bug. ***
Suggestion: alter function IsLoadableDTD() of
http://lxr.mozilla.org/seamonkey/source/parser/htmlparser/src/nsExpatDriver.cpp
(I believe this function is the only culprit)

Before attempting to load from the read-only directory <mozilla bin>/res/dtd,
attempt to load from the user-writable directory <user's profile>/res/dtd. This
will allow extensions to download DTDs if necessary. It's not a perfect solution
but it's acceptable, isn't it?
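The proposal is essentially a search order: a user-writable directory first, then the read-only one shipped with the application. A minimal Python sketch of such a lookup, assuming (as a simplification) that only the file name of the DTD's system identifier is used as the key; `find_local_dtd` and the directory names are hypothetical and do not reproduce the real `IsLoadableDTD()` logic.

```python
from pathlib import Path
from typing import Iterable, Optional
from urllib.parse import urlsplit


def find_local_dtd(system_id: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Hypothetical lookup: return the first local copy of a DTD, or None.

    Directories are tried in order, so a user-writable directory listed first
    overrides the read-only copy shipped with the application.
    """
    # Simplification: key the lookup on the last path segment only,
    # e.g. "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" -> "xhtml11.dtd".
    name = Path(urlsplit(system_id).path).name
    if not name:
        return None
    for directory in search_dirs:
        candidate = Path(directory) / name
        # Only accept a candidate that really exists as a file.
        if candidate.is_file():
            return candidate
    return None
```

For example, calling it with `[profile / "res" / "dtd", app / "res" / "dtd"]` returns the profile copy whenever one has been downloaded and falls back to the shipped copy otherwise.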

*** Bug 303664 has been marked as a duplicate of this bug. ***
Is there anyone working on this? I've been trying to contact Heikki Toivonen by
mail and IRC (#developers) without success.
Peter, can you take a look at this, including the patch?
Assignee: hjtoi-bugzilla → peterv
Status: REOPENED → NEW
Append calls can fail. If they fail, they will probably leave you with the wrong file reference, which could easily exist and yet not give you the right behavior. Ensure they succeed.
Attachment #188977 - Attachment is obsolete: true
Attachment #208187 - Flags: review-
Fwiw, timeless's patch looks good to me.

Just wondering: shouldn't the user directory be checked first?
How can I use an XML fragment and a stylesheet to generate an HTML fragment?

I have the same problem in my application. I am trying to load an XML fragment and apply the stylesheet with transformToFragment. 
Is there a workaround for this?

Is this bug going to be fixed in the near future? 
timeless: it looks like when you created the patch you marked it r- at the same time; is this correct?
For what it's worth: IE doesn't do much better than Firefox. Still, this should be fixed in both of them. If a DTD doesn't have a public ID, it should be cached in the user directory and loaded from there until it expires (I don't know how that's done for other files, but it could be the same approach).
Just thought this ought to be mentioned: I'm writing my thesis in DocBook, and this bug keeps me from splitting it into separate files.
Is there any way to include data from one XML file in another without using a web server? I'm writing XML files to display experiment results, and I would like to add notes to the results in another file (so that no one changes the experiment results files) and display the notes below the experiment results.
This bug has been around for 9 years. Could somebody give the attached patch a try?

thanks
QA Contact: ashshbhatt → xml
I think this should be WONTFIXed since this can't be implemented without bug 22942, but since this bug has an assignee, not taking the liberty to actually mark as WONTFIX myself.
Status: NEW → RESOLVED
Closed: 23 years ago → 7 years ago
Resolution: --- → WONTFIX