Bug 69799 - External entities are not included in XML document
Status: NEW
:
Product: Core
Classification: Components
Component: XML (show other bugs)
: Trunk
: All All
: -- normal with 54 votes (vote)
: Future
Assigned To: Peter Van der Beken [:peterv]
:
: Andrew Overholt [:overholt]
Mentors:
http://cvs.sourceforge.net/cgi-bin/vi...
: 130339 150728 151370 178945 224739 239607 261516 300389 303664 365214 (view as bug list)
Depends on: entities
Blocks: 138460
Reported: 2001-02-22 08:06 PST by James Olin Oden
Modified: 2016-11-18 18:12 PST (History)
46 users (show)
asa: blocking1.8a3-


Attachments
Proposition of a small modification to nsExpatDriver.cpp (1.78 KB, text/plain)
2005-07-11 14:56 PDT, David Teller [:Yoric] (please use "needinfo")
patch version of proposed modification (4.30 KB, patch)
2006-01-11 02:18 PST, timeless
timeless: review-

Description James Olin Oden 2001-02-22 08:06:07 PST
When rendering an XML document (whether with or without a style sheet) that
references external entities as in:

    <!ENTITY blah SYSTEM "blah.ent">

when the browser comes across the entity reference:

    &blah;

it will not include the external entity in the rendered text.  For instance if
you had the XML document:

   <?xml version="1.0"?>
   <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
                      "dtd/docbook/docbookx.dtd" [
   <!ENTITY blah SYSTEM "blah.ent">
   ]>

   <article>
       <sect1>
            <para>
             Some text in a paragraph.
            </para>
            &blah;
       </sect1>
   </article>

and the file blah.ent contains:

    <para>
    Blah, blah blah blah...blah blah!
    </para>

Then Mozilla renders:

   Some text in a paragraph.

Instead of:

   Some text in a paragraph. Blah, blah blah blah...blah blah!
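
For illustration only (this is not Mozilla's code): Python's standard xml.parsers.expat module wraps the expat library, so it can be used to sketch the behaviour requested above. The in-memory ENTITY_FILES table and the collect_text helper are hypothetical names standing in for real file loading. With no external entity handler installed, expat skips the reference, which matches the rendering reported above; with a handler installed, the entity's content is parsed in place.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system ID -> entity content.
ENTITY_FILES = {
    "blah.ent": "<para>\nBlah, blah blah blah...blah blah!\n</para>\n",
}

DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE article [
<!ENTITY blah SYSTEM "blah.ent">
]>
<article>
    <sect1>
        <para>
         Some text in a paragraph.
        </para>
        &blah;
    </sect1>
</article>
"""


def collect_text(document, resolve_external):
    """Return the whitespace-normalized character data of the document."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def external_entity_ref(context, base, system_id, public_id):
        # Parse the entity content with a child parser; it inherits the
        # handlers (here: CharacterDataHandler) of the main parser.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(ENTITY_FILES[system_id], True)
        return 1  # non-zero: the entity was handled

    if resolve_external:
        parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(document, True)
    return " ".join("".join(chunks).split())


print(collect_text(DOCUMENT, resolve_external=False))
print(collect_text(DOCUMENT, resolve_external=True))
```

The first call should print only the first paragraph's text; the second should print both paragraphs, as the report expects.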
Comment 1 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-02-22 09:00:14 PST
This is a known issue. Mozilla does not load external DTDs (or fragments of
them, such as entities). The exception is the DTDs for the user interface, in
the chrome directory.

Moving to Future.
Comment 2 Greg K. 2001-07-18 14:58:10 PDT
I propose that this bug duplicates bug 22942.
Comment 3 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-07-18 15:31:39 PDT
You are correct, thanks.

*** This bug has been marked as a duplicate of 22942 ***
Comment 4 Axel Hecht 2001-11-13 11:23:57 PST
I understand this bug to be about external entities, not external DTDs.
The best evidence that this is not a dupe of bug 22942 is that this doesn't work
from file either :-(.
If you want, I can attach a testcase. I came across this while testing the
DocBook XSLT stylesheets, which use external entities for l10n. Anyway, the
external entity references stay in the internalSubset of the doctype, but don't
get loaded or substituted. Even for file:// URLs.
Comment 5 Heikki Toivonen (remove -bugzilla when emailing directly) 2001-11-14 09:42:28 PST
Axel, we use the same mechanism for loading external DTDs/entities. There is
special code that for file URLs we look for them under 'dtd' folder from the
directory of the file that we are loading.
Comment 6 Axel Hecht 2001-11-21 08:44:14 PST
I changed
Index: nsExpatTokenizer.cpp
===================================================================
RCS file: /cvsroot/mozilla/htmlparser/src/nsExpatTokenizer.cpp,v
retrieving revision 1.92
diff -u -r1.92 nsExpatTokenizer.cpp
--- nsExpatTokenizer.cpp        2001/11/07 04:12:02     1.92
+++ nsExpatTokenizer.cpp        2001/11/21 16:31:06
@@ -838,7 +838,11 @@
         }
       }
     }
-  }  
+  }
+
+  if (!isLoadable) {
+    res = (*aDTD)->SchemeIs("file", &isLoadable);
+  }
 
   return isLoadable;
 }

and get an
XML_ERROR_EXTERNAL_ENTITY_HANDLING
in return while loading an XML file with external entities. (I've set the URL to the
file I'm testing.)
So this bug seems to be different from bug 22942 after all. I glanced into expat,
and it looks like supporting this would take some real effort. My mind isn't
set on effort tonight, so I'm just reporting how far I got without any ;-)
Comment 7 Alex Vincent [:WeirdAl] 2001-11-27 00:22:44 PST
Oh, wow.  I was wondering what was wrong with my code, why I couldn't get an
external entity to work in my XHTML document.

Of course, if we ever get around to doing validation by DTDs, this is going to
hurt us severely for XHTML 1.1.
Comment 8 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-03-12 11:51:53 PST
*** Bug 130339 has been marked as a duplicate of this bug. ***
Comment 9 Masayasu Ishikawa 2002-05-17 08:21:16 PDT
Related to Bug 44458: Mozilla understands XHTML character entities if the FPI for
XHTML 1.0 Strict/Transitional/Frameset, 1.1, Basic or XHTML 1.1 plus MathML 2.0
is specified, but otherwise it reports an XML parsing error for an undefined entity
even if an external DTD subset is present.  Try some test documents linked from:
http://www.w3.org/People/mimasa/test/xhtml/entities/#xhtml-family

This behavior is against "Well-formedness constraint: Entity Declared" of XML 1.0.
See: http://www.w3.org/TR/REC-xml#wf-entdeclared
Comment 10 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-05-17 10:20:54 PDT
Could you explain why it is against the wf rules? That part of the XML spec you
pointed to says that non-validating parsers are _not obligated to_ read external
entities. If we read them in some cases but don't read them in others shouldn't
matter according to the spec.

Also XHTML spec itself complicates things by suggesting that we should not flag
unknown entities as errors, but show them as &foo; which I think is a big mistake.
Comment 11 Masayasu Ishikawa 2002-05-18 22:57:27 PDT
The last sentence of "Well-formedness constraint: Entity Declared" says
(emphasis added by me):

for such documents, the rule that an entity must be declared is a
well-formedness constraint *only if standalone='yes'*.

"2.9 Standalone Document Declaration" of XML 1.0 says "If there are external
markup declarations but there is no standalone document declaration, the value
"no" is assumed."
 cf. http://www.w3.org/TR/REC-xml#sec-rmd

Thus, when entities are declared in the external subset through the DOCTYPE
declaration, a non-validating processor is not obligated to read and process
their declarations but MUST NOT report well-formedness errors against those
entities.

In addition, in the XHTML+MathML+SVG sample, the standalone document declaration
is explicitly declared as "no", yet Mozilla reports a WF error.  It is
acceptable that Mozilla doesn't understand entities in that case, but it is
totally unacceptable to report a WF error and stop normal processing against a
valid document.
  cf. http://www.w3.org/People/mimasa/test/xhtml/entities/entities-math-svg.xhtml
Comment 12 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-05-20 11:32:58 PDT
I disagree. If the parser can not resolve an entity reference, it must be an
error. How should the parser replace an entity it cannot understand?
Comment 13 Masayasu Ishikawa 2002-05-20 20:15:03 PDT
I believe the rule in XML 1.0 is clear on this, but if you are still not sure,
read the rationale behind this described in "Reports from the W3C SGML ERB to
the SGML WG and from the W3C XML ERB to the XML SIG" at:
http://www.w3.org/XML/9712-reports#ID52

For your convenience, I'll copy relevant parts here.

<blockquote cite="http://www.w3.org/XML/9712-reports#ID52">
    S.40 Should Entity Declared be a VC or a WFC?

Decision: In a standalone document (one without a DTD, one with only an internal
subset and no references to external parameter entities, or one with
"standalone='yes'"), this constraint should be treated as a WFC: i.e. it must be
checked by all conforming processors. In a document with a DTD and
"standalone='no'", it should be treated as a VC.

Unanimous (MMal and EM abstaining).

Rationale: it cannot be a WFC without serious injury to the notion of Draconian
error handling. As the current draft (97-11-17) makes explicit, a non-validating
processor cannot be expected to know whether an entity declaration for an entity
being referred to does or does not occur in some external parameter entity or
external DTD subset. But if the constraint is a well-formedness constraint, even
a non-validating processor should catch the error. So for "standalone='no'", it
should be a VC -- a constraint enforceable only if one reads the entire DTD.
</blockquote>
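
As a reading aid, the decision quoted above can be restated as a small function. This is only a sketch of the rule as worded in the quote, not code from any parser; the function name and its three boolean parameters are made up for illustration.

```python
def entity_declared_constraint(has_external_subset,
                               has_external_pe_refs,
                               standalone_yes):
    """Classify the "Entity Declared" rule for one document.

    Returns "WFC" when every conforming processor must report an
    undeclared entity, and "VC" when only a validating processor
    (one that reads the entire DTD) can be expected to.
    """
    if standalone_yes:
        return "WFC"
    # No DTD at all, or only an internal subset without references to
    # external parameter entities: the processor has seen every declaration.
    if not has_external_subset and not has_external_pe_refs:
        return "WFC"
    return "VC"


# A document with an external DTD subset and no standalone declaration:
print(entity_declared_constraint(True, False, False))
```

Under this reading, a document with an external subset and standalone="no" (explicit or implied) falls in the "VC" branch, which is the case being argued about in the comments above.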
Comment 14 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-06-13 15:39:43 PDT
*** Bug 150728 has been marked as a duplicate of this bug. ***
Comment 15 Heikki Toivonen (remove -bugzilla when emailing directly) 2002-06-13 15:47:58 PDT
*** Bug 151370 has been marked as a duplicate of this bug. ***
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2002-11-07 13:46:06 PST
*** Bug 178945 has been marked as a duplicate of this bug. ***
Comment 17 Jay Lockwood 2002-11-20 07:47:35 PST
I apologize for asking this question here, but...

How does a Bugzilla user determine the release in which a bug will be or has been
fixed?  I am interested in the fix for this problem.
Comment 18 Hixie (not reading bugmail) 2003-05-02 09:41:53 PDT
There are several related issues here:

   If the document is standalone="yes", we should be reading the entire internal
   subset, not stopping as we do in standalone="no" mode. This is bug 129392.

   If the document is standalone="no", we shouldn't be reporting undeclared
   entities as a wellformedness error, as we do (and should) in standalone="yes"
   mode. This is bug 204102.

   In either case, we should be reading external entities in order to do basic
   stuff like attribute defaulting, recognising ID attributes, and entity
   expansion. This is bug 22942.

This, therefore, appears to be a subset of bug 22942. Marking dependency.
Comment 19 Trenton Lipscomb 2003-05-02 14:01:53 PDT
Thanks for updating this. In the meantime, while waiting for a fix, is there
any way to work around this bug?
Comment 20 Hixie (not reading bugmail) 2003-05-02 14:06:18 PDT
You could stick the entire list of entities in the internal subset I guess...
(Maybe using server side includes to make life easier.)
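
One way such a preprocessing step might look for the external parsed entities this bug is about: a rough sketch that rewrites SYSTEM entity declarations into internal ones before the document is served. The helper name inline_external_entities and the load callback are hypothetical. The sketch assumes double-quoted system IDs, ignores PUBLIC identifiers, parameter entities and NDATA entities, and does not treat character references inside the entity content specially.

```python
import re

# Matches declarations such as: <!ENTITY blah SYSTEM "blah.ent">
EXTERNAL_ENTITY = re.compile(r'<!ENTITY\s+([^\s%]+)\s+SYSTEM\s+"([^"]*)"\s*>')

# An external parsed entity may start with a text declaration, which is
# not allowed inside an entity value and therefore has to be dropped.
TEXT_DECLARATION = re.compile(r'^\s*<\?xml\s[^>]*\?>')


def inline_external_entities(document, load):
    """Replace external entity declarations with internal ones.

    `load` is a caller-supplied function mapping a system ID to the
    text of the entity (for example, by reading a file).
    """
    def replace(match):
        name, system_id = match.group(1), match.group(2)
        value = TEXT_DECLARATION.sub("", load(system_id))
        # '%' and '"' are special inside a double-quoted entity value.
        value = value.replace("%", "&#37;").replace('"', "&#34;")
        return '<!ENTITY %s "%s">' % (name, value)

    return EXTERNAL_ENTITY.sub(replace, document)


files = {"blah.ent": "<para>Blah, blah blah blah...blah blah!</para>"}
source = '<!DOCTYPE article [\n<!ENTITY blah SYSTEM "blah.ent">\n]>\n<article>&blah;</article>'
print(inline_external_entities(source, files.__getitem__))
```

The output keeps the &blah; reference but declares the entity inline, so a parser that never fetches external entities can still expand it.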
Comment 21 Arnaud Legout 2003-06-27 05:34:44 PDT
It seems to me that this bug is considered as an infrequent problem. 
However, I want to raise the attention on several use cases that
frequently happen in a professional environment:
- DocBook documents encourage the use of external entities to build
  a book. External entities allow several people to
  work on the same document at the same time.
- Web server applications can be built in the following way: an XML
  page converted to HTML using XSL to display the main layout, and an
  external entity loading the core data, to be modified using JSP.

In both cases Mozilla cannot display the pages correctly, even though this
is a standards-compliant way of implementing web applications that is
frequently used in enterprises.
Comment 22 Met - Martin Hassman 2003-07-22 01:44:26 PDT
>However, I want to raise the attention on several use cases that
>frequently happen in a professional environment:

And I can add: This bug blocks creation of simple localisable remote XUL
application in similar way as this can be for chrome XUL (all strings are in
external file as external entities).
Comment 23 Axel Hecht 2003-07-22 01:55:04 PDT
re #22, localisation depends on the chrome protocol inserting the locale.
So this is not gonna enable you to get localisation out of the box.
Of course it is a prerequisite for solving that bug in the first place.
Comment 24 Met - Martin Hassman 2003-07-22 02:09:02 PDT
re #23: It should be possible. E.g. at http://domain/file.xul you have your XUL application
pointing to http://domain/file.php as a DTD with external entities, and there
a simple PHP script offers the client the appropriate locale, determined by a cookie or by the
Accept-Language HTTP request header.

Being able to keep all strings in external files would simplify localization
very much.
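
The server-side selection described here might look roughly like the following. It is sketched in Python rather than PHP to stay consistent with the other examples in this thread. The function name pick_locale, the default locale, and the list of available locales are all assumptions; a real script would go on to serve the DTD file for the chosen locale.

```python
def pick_locale(accept_language, available, default="en-US"):
    """Pick the best available locale for an Accept-Language header value."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        quality = 1.0
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            ranked.append((-quality, position, tag.lower()))
    for _, _, tag in sorted(ranked):
        for locale in available:
            candidate = locale.lower()
            # Accept an exact match, or "cs" matching "cs-CZ".
            if candidate == tag or candidate.startswith(tag + "-"):
                return locale
    return default


print(pick_locale("fr-FR, cs;q=0.8, en;q=0.5", ["en-US", "cs-CZ"]))
```

Note that this only covers picking the locale on the server; the browser would still need to load the external DTD, which is what this bug is about.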
Comment 25 Mark Goldstein 2003-10-24 12:26:23 PDT
Here's an example that doesn't include the external entity in the document tree:

Open this file:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE notebook [
	<!ELEMENT notebook (page|span)*>
	<!ELEMENT page (#PCDATA)>
	<!ELEMENT span  (#PCDATA)>
	<!ENTITY stuff SYSTEM "s1p1.xml">
]>
<notebook>
<span>in the beginning</span>
&stuff;
</notebook>



The external entity s1p1.xml contains three lines:
<page>
this is page one
</page>
Comment 26 bugzilla 2003-11-05 13:28:27 PST
There is also another interesting effect when the XHTML 1.1 DTD is
extended...

I tried the following:

index.xml:
--
<?xml version="1.0" encoding="iso-8859-1" ?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!ENTITY dasmenu SYSTEM "menux.xml">
<!ENTITY ichglaubsjawohlnicht "bla">
]>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<title>Einbinden von Menus in xhtml &ichglaubsjawohlnicht;</title>

</head>

<body>

<div id="menu">&dasmenu;</div>

<div id="content">CONTENT &ichglaubsjawohlnicht;</div>

</body>

</html>
--

and menux.xml:
--
<?xml version="1.0" encoding="iso-8859-1" ?>

<p>test mit <b>bold</b> zwischendurch :D</p>
--

Interestingly, Mozilla produced the following output (when I deleted the xmlns=
attribute in <html>, it displayed the source):

--
<html>
    <head>
<title>Einbinden von Menus in xhtml bla</title>
</head>
    <body>
    <div id="menu">
    <!--
 
  * The contents of this file are subject to the Mozilla Public
  * License Version 1.1 (the "License"); you may not use this file
  * except in compliance with the License. You may obtain a copy of
  * the License at http://www.mozilla.org/MPL/
  *   * Software distributed under the License is distributed on an "AS
  * IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
  * implied. See the License for the specific language governing
  * rights and limitations under the License.
  *   * The Original Code is mozilla.org code.
  *   * The Initial Developer of the Original Code is Netscape
  * Communications Corporation.  Portions created by Netscape are   * Copyright
(C) 2000 Netscape Communications Corporation.  All
  * Rights Reserved.
  *   * Contributor(s):
  -->
    <!--

  * Predefined HTML entities to be loaded when parsing XHTML documents.
  * The contents match mozilla/htmlparser/src/nsHTMLEntityList.h,
  * except that Navigator entity extensions are not included.
 
-->
<!-- ISO 8859-1 entities -->
<!-- Mathematical symbols and Greek letters -->
    <!--
 Markup-significant and internationalization characters -->
</div>
<div id="content">CONTENT bla</div>
</body>
</html> 
--

It seems as if it simply copied the comments from the DTD into the XML document?
Comment 27 Heikki Toivonen (remove -bugzilla when emailing directly) 2003-12-03 23:12:31 PST
*** Bug 224739 has been marked as a duplicate of this bug. ***
Comment 28 Pascal S. de Kloe 2004-04-04 14:02:08 PDT
*** Bug 239607 has been marked as a duplicate of this bug. ***
Comment 29 Colin Snover 2004-04-15 01:32:19 PDT
With the emergence of XML this bug is becoming more important but seems to have
been forgotten. Work needs to be done on this bug.

WORKS:

test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE test[
 <!ELEMENT test (#PCDATA)>
 <!ENTITY entity "hello">
]>
<test>&entity;</test>

DOES NOT WORK (unknown entity):

test.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&entity;</test>

test.dtd:

<!ELEMENT test (#PCDATA)>
<!ENTITY entity "hello">

ALSO DOES NOT WORK:

test.xml:

<!DOCTYPE test[
 <!ELEMENT test (#PCDATA)>
 <!ENTITY entity SYSTEM "test.dtd">
]>
<test>&entity;</test>

test.dtd:

<!ENTITY entity "hello">
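
For comparison, the second case (an entity declared in an external DTD subset) can be made to work in an expat-based parser only if the application both enables parameter entity parsing and supplies the external subset itself. The sketch below uses Python's xml.parsers.expat binding; the FILES table and the function name are hypothetical, and it assumes the underlying expat was built with DTD support. It says nothing about how Mozilla would have to implement this.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system ID -> file content.
FILES = {
    "test.dtd": '<!ELEMENT test (#PCDATA)>\n<!ENTITY entity "hello">\n',
}

DOCUMENT = """<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE test SYSTEM "test.dtd">
<test>&entity;</test>
"""


def parse_with_external_subset(document):
    """Parse the document, loading the external DTD subset from FILES."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Without this call expat never asks for the external subset.
    parser.SetParamEntityParsing(
        xml.parsers.expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.CharacterDataHandler = chunks.append

    def external_entity_ref(context, base, system_id, public_id):
        # For the external subset, context is None; the child parser
        # shares DTD state with the parent, so declarations read here
        # become visible to the main document.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FILES[system_id], True)
        return 1

    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(document, True)
    return "".join(chunks)


print(parse_with_external_subset(DOCUMENT))
```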
Comment 30 Arnaud Legout 2004-04-15 01:56:05 PDT
I would mention that http://bugzilla.mozilla.org/show_bug.cgi?id=219958
is related to the same problem of XML/XSL full support.
Comment 31 Alex Vincent [:WeirdAl] 2004-08-11 18:52:41 PDT
It's going to be extremely difficult to hold any Mozilla milestone for this bug 
if there are no patches for it...
Comment 32 Christian :Biesinger (don't email me, ping me on IRC) 2004-09-25 14:07:00 PDT
*** Bug 261516 has been marked as a duplicate of this bug. ***
Comment 33 Martijn Wargers [:mwargers] (not working for Mozilla) 2005-07-11 11:06:48 PDT
*** Bug 300389 has been marked as a duplicate of this bug. ***
Comment 34 David Teller [:Yoric] (please use "needinfo") 2005-07-11 14:54:09 PDT
Suggestion: alter function IsLoadableDTD() of
http://lxr.mozilla.org/seamonkey/source/parser/htmlparser/src/nsExpatDriver.cpp
(I believe this function is the only culprit)

Before attempting to load from the read-only directory <mozilla bin>/res/dtd,
attempt to load from the user-writable directory <user's profile>/res/dtd. This
will allow extensions to download DTDs if necessary. It's not a perfect solution
but it's acceptable, isn't it ?
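
The lookup order proposed here could be sketched as follows. This is not the attached patch or the real IsLoadableDTD(); the function name, the directory arguments and the injectable `exists` check are assumptions made so the idea can be shown in isolation.

```python
import os


def find_local_dtd(file_name, search_dirs, exists=os.path.isfile):
    """Return the first <dir>/res/dtd/<file_name> that exists, else None.

    `search_dirs` is ordered by priority, e.g. the user's profile
    directory first and the read-only application directory second.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, "res", "dtd", file_name)
        if exists(candidate):
            return candidate
    return None


# Hypothetical directories, for illustration only.
print(find_local_dtd("docbookx.dtd", ["/home/user/.mozilla/profile", "/usr/lib/mozilla"]))
```

Passing the directories as an ordered list keeps the question of which directory is checked first a one-line decision for the caller.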

Comment 35 David Teller [:Yoric] (please use "needinfo") 2005-07-11 14:56:20 PDT
Created attachment 188977 [details]
Proposition of a small modification to nsExpatDriver.cpp
Comment 36 Erik Fabert 2005-08-06 06:22:21 PDT
*** Bug 303664 has been marked as a duplicate of this bug. ***
Comment 37 David Teller [:Yoric] (please use "needinfo") 2005-09-18 06:47:55 PDT
Is there anyone working on this ? I've been trying to contact Heikki Toivonen by
mail and irc (#developers) without success.
Comment 38 Heikki Toivonen (remove -bugzilla when emailing directly) 2005-09-18 21:37:46 PDT
Peter, can you take a look at this, including the patch?
Comment 39 timeless 2006-01-11 02:18:00 PST
Created attachment 208187 [details] [diff] [review]
patch version of proposed modification

Append calls can fail. If they fail, they will probably leave you with the wrong file reference, which could easily exist and yet will not give you the right behavior. Ensure they succeed.
Comment 40 David Teller [:Yoric] (please use "needinfo") 2006-03-19 09:33:31 PST
Fwiw, timeless's patch looks good to me.

Just wondering: shouldn't the user directory be checked first ?
Comment 41 Praful 2006-07-12 12:57:23 PDT
How can I use an XML fragment and a stylesheet to generate an HTML fragment?

I have the same problem in my application. I am trying to load an XML fragment and apply the stylesheet with transformToFragment.
Is there a workaround for this?

Is this bug going to be fixed in the near future?
Comment 42 Alex Vincent [:WeirdAl] 2006-07-12 13:05:56 PDT
timeless: it looks like when you created the patch you marked it r- at the same time; is this correct?
Comment 43 Phil Ringnalda (:philor) 2006-12-28 08:30:49 PST
*** Bug 365214 has been marked as a duplicate of this bug. ***
Comment 44 Reşat SABIQ (Reshat) 2007-07-15 09:33:01 PDT
For what it's worth: IE doesn't do much better than Firefox. Still, this should be fixed in both of them. If a DTD doesn't have a public ID, it should be cached in the user dir and loaded from there until it expires (I don't know how that's done for other files, but it could be the same approach).
Comment 45 Dominic Fandrey 2009-05-08 05:40:46 PDT
Just thought this ought to be mentioned: I'm writing my thesis in DocBook, and this bug keeps me from structuring it in different files.
Comment 46 Huanlong Liu 2009-05-26 08:49:03 PDT
Is there any way to include data from one XML file in another without using a web server? I'm trying to write XML files to display experiment results, and I would like to add notes to the results in another file (so that no one changes the experiment results files) and also display the notes below the experiment results.
Comment 47 firefox 2009-06-22 08:06:55 PDT
This bug has been around for 9 years. Could somebody try the attached patch?

thanks
Comment 48 Henri Sivonen (:hsivonen) 2011-04-19 02:52:22 PDT
I think this should be WONTFIXed, since it can't be implemented without bug 22942, but since this bug has an assignee, I'm not taking the liberty of marking it WONTFIX myself.
