Closed
Bug 44458
Opened 25 years ago
Closed 23 years ago
HTML entities are not recognized in XHTML documents
Categories: Core :: XML, defect, P2
Tracking: VERIFIED FIXED, mozilla0.9.9
People: Reporter: rbs, Assigned: hjtoi-bugzilla
Details: 4 keywords, Whiteboard: [Hixie-P4]
Attachments (4 files, 3 obsolete files):
462 bytes, application/xhtml+xml
7.63 KB, text/plain
14.98 KB, patch (rbs: review+, jst: superreview+)
1.12 KB, patch (rbs: review+, jst: superreview+)
HTML includes a number of default entities (e.g., &alpha;, &beta;, etc.).
Since the XHTML specification doesn't say that these entities have been
removed, it is expected that they should be available in XHTML as well.
STEPS TO REPRODUCE
==================
Load the attachment (to follow). It contains the following document:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC
"-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>XHTML Document with entities</title>
</head>
<body>
<p>
alpha: &alpha; <br/>
beta: &beta;
</p>
</body>
</html>
EXPECTED RESULTS
================
The browser should display
alpha: [greek alpha here]
beta: [greek beta here]
ACTUAL RESULTS
==============
The browser is displaying
alpha: [nothing here]
beta: [nothing here]
ADDITIONAL DETAILS
==================
My tree is one week old, but looking at the check-ins and their
associated comments, I don't see a change that could have fixed this.
I am yet again having trouble attaching the test case.
If someone can do it please go ahead.
This bug is a severe problem when authoring non-English XHTML documents,
e.g., the entity references for the French accented letters (é, è, û, à, etc.)
are lost.
Comment 4•25 years ago
Yes, this bug is happening because we don't load up the entity sets defined in
the XHTML DTD referenced from the XHTML document. We need to cache the
three XHTML DTDs locally, look at the PUBLIC ID in the DOCTYPE declaration, map
the ID into a XHTML DTD, load that DTD, and pass it to expat so that the entity
declarations in the DTD become available to the XHTML document.
Marking nsbeta3 and adding Heikki to the cc list.
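The approach described in this comment (look at the PUBLIC ID, map it to a locally stored DTD, and hand that DTD to expat) can be sketched with Python's `xml.parsers.expat` bindings. This is only an illustration under stated assumptions, not Mozilla's implementation: the `LOCAL_DTDS` table, its two-entity content, and the `parse_with_local_dtd` helper are made up for the example.

```python
from xml.parsers import expat

# Hypothetical local catalog: PUBLIC ID -> entity declarations kept locally.
# Only two entities are listed here to keep the sketch short.
LOCAL_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN":
        '<!ENTITY alpha "&#945;">\n<!ENTITY beta "&#946;">\n',
}


def parse_with_local_dtd(document):
    """Return the character data of `document`, resolving entities via LOCAL_DTDS."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to report the external DTD subset instead of ignoring it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def resolve(context, base, system_id, public_id):
        dtd_text = LOCAL_DTDS.get(public_id)
        if dtd_text is not None:
            # Feed the local copy to a sub-parser so its entity declarations
            # become visible to the main document; nothing is fetched.
            sub_parser = parser.ExternalEntityParserCreate(context)
            sub_parser.Parse(dtd_text, True)
        return 1  # non-zero tells expat to continue parsing

    parser.ExternalEntityRefHandler = resolve
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)
```

The system identifier is deliberately ignored in this sketch, so a relative `"DTD/xhtml1-strict.dtd"` never triggers a network or file lookup.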
If the attachment is saved in the "bin/dtd" directory as "htmlEntities.dtd",
then no missing entities arise when passing it as the SYSTEM ID:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "htmlEntities.dtd">
But as nisheeth pointed out, the right way to go about this is to load up
the list based on the PUBLIC ID. Fortunately, we only need one list (that of
HTML4) because XHTML relies on HTML4.
(The attached htmlEntities.dtd file is a replica in DTD form of
mozilla/htmlparser/src/nsHTMLEntityList.h)
Perf: would it be possible to cache/share the list? Otherwise, it is discarded
only to be re-loaded when navigating from document to document, and there could
even be several copies at the same time in different windows/frames (we have
the same problem with the over 2000 MathML entities, BTW). On the other hand, if
they are shared and users' scripts alter them (i.e., if the spec allows entities
to be writable?), there could be another can of worms that doesn't warrant such
trouble at this stage.
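One way to picture the sharing idea raised here is a loader that reads the entity list once and hands every document the same read-only view. The sketch below is an assumption-laden illustration: `_read_entity_file` is a hypothetical stand-in for parsing the entity file, and nothing here reflects how Mozilla actually stores entities.

```python
from functools import lru_cache
from types import MappingProxyType


def _read_entity_file(name):
    # Hypothetical stand-in for reading and parsing the entity list from disk.
    return {"alpha": "\u03b1", "beta": "\u03b2"}


@lru_cache(maxsize=None)
def shared_entities(name):
    """Load the entity list once; every caller gets the same read-only view."""
    # MappingProxyType rejects item assignment, so one document cannot alter
    # the list that the other documents are sharing.
    return MappingProxyType(_read_entity_file(name))
```

The read-only wrapper is one possible answer to the "writable entities" worry: the list is shared, but a caller that needs to change it would have to make its own copy.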
Updated•25 years ago
Keywords: correctness
Comment 8•25 years ago
Considerations:
1) If we don't fix this, there's no way to use HTML entities in XHTML documents,
essentially limiting the text content of XHTML docs to straight Unicode text
only, so we would like to fix this if possible.
2) However, there's no commitment to any XHTML support at all in the first
release, so we're not required to fix this. We could choose just to not support
XHTML at all in 6.0 and delay support to 6.01, for example.
3) Definitely don't implement a fix that will blow out our memory footprint on a
per-frame or per-window basis. If we don't have time to do a memory-efficient
fix, Future this.
4) How long do you think it will take to fix this?
I'm going to ask for a new xhtml keyword so we can track issues related to xhtml
support.
Keywords: xhtml
Comment 9•25 years ago
This bug has been marked nsbeta3- because the original netscape engineer working
on this is over-burdened. If you feel this is an error, that you or another
known resource will be working on this bug, or if it blocks your work in some way
-- please attach your concern to the bug for reconsideration, but do not clear
the nsbeta3- nomination.
Whiteboard: [nsbeta3-]
Target Milestone: M18 → Future
| Reporter | ||
Comment 10•24 years ago
*** Bug 68202 has been marked as a duplicate of this bug. ***
Comment 11•24 years ago
Nominating for beta1. HTML entities need to be supported in XHTML since they are
commonly used by popular sites.
Keywords: nsbeta1
Comment 13•24 years ago
Setting target milestone to 0.9.1...
Target Milestone: Future → mozilla0.9.1
Comment 14•24 years ago
Nisheeth: This relates strongly to bug 74172. We want to be able to include
XHTML1 entities in XUL files in the chrome. The simple solution I had is to put
the entity files in the chrome (xpfe/global/resources/content) but maybe there's
a better place to put them...?
OS: Windows 98 → All
Hardware: PC → All
| Reporter | ||
Comment 15•24 years ago
dr, the real challenge is to avoid unnecessarily cluttering the memory with the
same list of entities. For example, if not done properly, all the numerous XUL
fragments will be trailing their own copies of the same thing.
| Assignee | ||
Updated•24 years ago
Compatibility consideration:
The test case has a relative URL to the DTD. Some XML user agents (eg. IE 5 for
Mac*) might attempt to fetch the DTD and display an alert if the DTD isn't
found. Since the examples in the XHTML spec use relative URLs (a Bad Thing, IMO)
it is likely that authors will include those same relative URLs in their
documents without actually providing the DTD in the corresponding location.
If Mozilla doesn't check the existence of the DTD, a compatibility problem will
be introduced. (Author uses Mozilla but the site visitor has a browser that
attempts to fetch the DTD.)
However, generating unnecessary network traffic would be bad, too. One way to
solve this would be *not fetching* the DTD if the system identifier is an
absolute URL to the DTD hosted at the W3C and checking for existence on the
document's server in the case of a relative URL. (OK, I know this isn't very
likely to be implemented, but this really is a potential compatibility problem,
because validating parsers will have problems if they can't find the DTD.)
* IE 5 for Mac OS Classic actually tries to fetch the DTD and signals an error,
if the DTD is not found. However, the XHTML features of that browser are
otherwise too broken to be of any use.
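The fetching policy proposed in this comment can be written down as a small decision function. This is a hedged sketch of the proposal only: the `dtd_fetch_policy` name and its three return labels are invented for illustration, and the comment itself does not say what to do with absolute URLs on other hosts.

```python
from urllib.parse import urlparse


def dtd_fetch_policy(system_id):
    """Classify a DOCTYPE system identifier according to the proposal above."""
    parts = urlparse(system_id)
    if parts.scheme and parts.netloc:
        if parts.netloc.lower() == "www.w3.org":
            # Absolute URL to a DTD hosted at the W3C: do not fetch it.
            return "skip"
        # Other absolute URLs are not covered by the proposal.
        return "unspecified"
    # Relative URL: check that the DTD exists on the document's server.
    return "check-exists"
```

For the test case in this bug, whose system identifier is the relative `DTD/xhtml1-strict.dtd`, this policy would ask for an existence check rather than a silent skip.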
Comment 17•24 years ago
moving to TM of 0.9.2 per PDT triage (you can check it into 0.9.1 until Friday,
18/May/01 or into 0.9.2 after the tree opens)
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Updated•24 years ago
Priority: P3 → P2
Comment 18•24 years ago
Moving P2 and P3 bugs over to 0.9.3...
Target Milestone: mozilla0.9.2 → mozilla0.9.3
| Reporter | ||
Comment 19•24 years ago
Seems like this bug would require some deeper thoughts, along the lines of
what was done for the new image lib. A DTD manager could keep remote DTDs in
a disk DTD cache in necko, while selected local DTDs could stay in the memory
DTD cache. The manager could then be a memory pressure listener and such. Looks like
the same issues that were addressed in the new image lib may arise here.
Comment 21•24 years ago
Bulk re-assign of my 0.9.4 bugs to Heikki. I will not have the cycles to work
on these bugs while Clayton is on sabbatical for the next six weeks.
Assignee: nisheeth → heikki
Status: ASSIGNED → NEW
| Assignee | ||
Updated•24 years ago
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Comment 22•24 years ago
Using Mozilla 0.9.4 on RedHat Linux 7.1, the testcase is displayed correctly. Is
this bug fixed?
| Assignee | ||
Comment 23•24 years ago
| Assignee | ||
Updated•24 years ago
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Comment 24•24 years ago
All entities for XHTML 1.0 are defined here:
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
For XHTML Basic and XHTML 1.1 here:
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-symbol.ent
Comment 25•24 years ago
Aleksey Nogin, you must have tried the text/html testcase which works, not the
xhtml one, which still fails. Since msn.com is attempting to be xhtml compliant,
doesn't this mean top100? Adding another bug as being blocked by this too.
Blocks: 95770
This doesn't really block bug 95770. (I wouldn't consider this as a blocker for
bug 15391, either.) Numeric character references and UTF-8 work.
MSN isn't serving real XHTML using a real XML content type. (It wouldn't work in
IE if they did.) This bug has nothing to do with MSN.
Does the XML spec really require non-validating parsers to support external
character entities?
Comment 27•24 years ago
What do you mean by "external"? External files that are linked from the main DTD? Well,
XHTML 1.1 is entirely built on that. Nothing is defined in its main DTD and it
has multiple levels of referencing. So I don't see how this could be optional
for XHTML support.
Updated•24 years ago
| Assignee | ||
Updated•24 years ago
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Comment 28•24 years ago
*** Bug 107736 has been marked as a duplicate of this bug. ***
By "external" I mean the XML meaning: not in the same storage object as the main
document. So a separate DTD file is external to an XML document.
Updated•24 years ago
Attachment #10952 -
Attachment mime type: text/html → application/xhtml+xml
Updated•24 years ago
Attachment #49600 -
Attachment is obsolete: true
Comment 30•24 years ago
*** Bug 108079 has been marked as a duplicate of this bug. ***
Comment 31•24 years ago
A preliminary workaround that works with the current Mozilla browser could be:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
<!ENTITY copy "©"> ]>
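The workaround relies on the internal DTD subset, which even a non-validating parser that never loads the external DTD must read. As a rough check outside the browser, Python's `xml.etree.ElementTree` (an expat-based parser that does not load external DTDs) can stand in for such a parser. The document text and the `body_text` helper below are assumptions for the demonstration.

```python
import xml.etree.ElementTree as ET

# Same DOCTYPE shape as the workaround: the entity is declared in the
# internal subset, so the external DTD never needs to be loaded.
WORKAROUND_DOC = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [\n'
    ' <!ENTITY copy "&#169;"> ]>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><p>&copy; Example</p></body></html>'
)


def body_text(document):
    """Parse the document and return all of its text content."""
    return "".join(ET.fromstring(document).itertext())
```

An entity that is not declared in the internal subset, such as `&alpha;` here, is still rejected by this parser, so every entity used in the document has to be declared this way.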
| Assignee | ||
Updated•24 years ago
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Updated•23 years ago
Whiteboard: [Hixie-P4]
| Assignee | ||
Comment 32•23 years ago
For Mozilla 1.0 I think it will be good enough if we just do the same thing our
MathML implementation is doing, basically adding the XHTML public ids to the
catalog table and creating xhtml10.dtd.
rbs, could you break the XHTML-only entities away from mathml.dtd and have
mathml.dtd include xhtml10.dtd entities as an external entity file?
Status: NEW → ASSIGNED
Comment 33•23 years ago
It probably should be xhtml11.dtd, although probably not much has changed since
XHTML1.0.
| Reporter | ||
Comment 34•23 years ago
>rbs, could you break the XHTML-only entities away from mathml.dtd and have
>mathml.dtd include xhtml10.dtd entities as an external entity file?
Is this level of granularity worth the trouble at this stage? Since there are
over 2000 MathML entities, removing a couple of hundred wouldn't make much
difference (except... the tediousness of trying to figure out the duplicates
with XHTML and weeding them out :-)
It would be speedier to just save attachment 11622 [details] (which already has
everything, I think) as xhtml11.dtd and then adding two entries in the catalog
table for: "-//W3C//DTD XHTML 1.0 Strict//EN" and "-//W3C//DTD XHTML 1.1//EN".
With that, we will be done, and gzip will do the rest.
Comment 35•23 years ago
or use exact W3C entity DTD definitions for XHTML.
I've provided the URLs for them above in comment #24.
| Reporter | ||
Comment 36•23 years ago
The interest is about efficiency. Only the entity definitions are of interest.
The comments are going to slow the parsing unnecessarily. When trimmed down, the
result is the lightweight version in attachment 11622 [details] (which could be compiled,
a la fastload, if there was a way to do that).
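The trimming described here (keep only the entity definitions, drop the comments, merge several files into one) could be done with a small script along the following lines. It is a sketch under simplifying assumptions: it only understands double-quoted general entity declarations, and the `compact_entities` name and regular expressions are not taken from any Mozilla tool.

```python
import re

# Matches declarations such as: <!ENTITY nbsp   "&#160;">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+"([^"]*)"\s*>')
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def compact_entities(*dtd_texts):
    """Merge entity files into one comment-free list, first declaration wins."""
    seen = {}
    for text in dtd_texts:
        # Remove comments first so commented-out declarations are not picked up.
        for name, value in ENTITY_DECL.findall(COMMENT.sub("", text)):
            seen.setdefault(name, value)
    return "".join(f'<!ENTITY {name} "{value}">\n' for name, value in seen.items())
```

Keeping the first declaration of a duplicated name mirrors XML's own rule that the first binding of an entity is the one that counts.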
| Assignee | ||
Comment 37•23 years ago
I can easily strip out the comments from the versions W3C supplies, and combine
them into one file to speed loading. The reason I have not done that yet is
because I was wondering about a comment in mathml.dtd:
This is a *customized* list for Mozilla: characters originally specified
as combined pairs and plane 1 characters have been remapped to internal
code points within the Unicode's Private Use Area (PUA).
If this does not apply (or I don't need to do anything special) to the XHTML
1.0/1.1 entities, I am all set and can finish this soon. rbs, could you explain
the above comment to us (me) non-Unicode experts ;) ?
Or does attachment 11622 [details] list every entity the W3C lists include so that I
wouldn't need to do anything, basically? I am slightly concerned about just
accepting a list compiled from our HTML code...
| Reporter | ||
Comment 38•23 years ago
The remapping is about plane-1 MathML characters (5-digit code points) which
don't work in any application yet -- although some work is in progress to
eventually support them in Mozilla (bug 118000). Since HTML/XHTML code points
are only 4-digit, the comments about the remapping don't apply there.
The list from which attachment 11622 [details] was compiled is meant to be identical to
the expected list (otherwise something would be out of sync and need fixing). It
seems to be complete to me. (The other bit is to add 'dtd/*' in the packages as
I am doing over at bug 109826.)
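The "4-digit versus 5-digit" distinction amounts to asking whether a code point lies outside the Basic Multilingual Plane. A tiny sketch follows; the `needs_remapping` name is hypothetical, and the two sample code points are chosen here for illustration rather than taken from the entity lists.

```python
def needs_remapping(code_point):
    """True for code points that need more than four hex digits (beyond U+FFFF)."""
    return code_point > 0xFFFF


# U+03B1 GREEK SMALL LETTER ALPHA fits in four hex digits.
ALPHA = 0x03B1
# U+1D504 MATHEMATICAL FRAKTUR CAPITAL A is a plane-1 character.
FRAKTUR_A = 0x1D504
```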
| Assignee | ||
Comment 39•23 years ago
Using the attachment 11622 [details] as the xhtml11.dtd entity list in this patch. I
would assume we need to add mappings for 4 new public IDs: 3 for XHTML 1.0
(Strict, Transitional and Frameset) and 1 for XHTML 1.1. I also made packaging
changes but I am not sure how to test them yet (MOZILLA_OFFICIAL?). This patch
is missing a (one-liner?) change to make the Mac build system copy the DTD. I
think I have a fix but I want to test it first. Haven't tested on Linux yet.
| Reporter | ||
Comment 40•23 years ago
Looking good. You might want to get rid of this one since it is a built-in
entity in the XML spec & parser, etc (right?)
+
+<!-- Navigator entity extensions; apos is from XML -->
+<!ENTITY apos "'">
Also, the ordering of the catalog table could be revamped to put the most
frequent stuff first.
| Assignee | ||
Comment 41•23 years ago
Doh, forgot the apos entity; you are right! The most frequently used order would
probably just translate to putting the XHTML ids first, or do you have other
opinions?
Also this seems to work on Linux, now compiling on Mac.
| Reporter | ||
Comment 42•23 years ago
Possible re-ordering from:
+ {"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd" },
+ {"-//W3C//DTD SVG 20001102//EN", "svg.dtd" },
+ {"-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd" },
To:
+ {"-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd" },
+ {"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd" },
+ {"-//W3C//DTD SVG 20001102//EN", "svg.dtd" },
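To see why the ordering matters, assume the catalog is scanned linearly from the top (the thread does not show the lookup code, so this is an assumption). A Python rendering of the re-ordered table with such a first-match lookup might look like this; `local_dtd_for` is a hypothetical helper name.

```python
# The re-ordered table from this comment, most frequent public IDs first.
CATALOG = [
    ("-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd"),
    ("-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd"),
    ("-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd"),
    ("-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd"),
    ("-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd"),
    ("-//W3C//DTD SVG 20001102//EN", "svg.dtd"),
]


def local_dtd_for(public_id):
    """Return the local DTD file for a public ID, or None if it is not listed."""
    for known_id, file_name in CATALOG:
        # Linear scan: entries near the top are found after fewer comparisons.
        if known_id == public_id:
            return file_name
    return None
```

With only six entries the gain from re-ordering is small, but the scan does stop sooner for the document types expected to be most common.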
Comment 43•23 years ago
XHTML Basic is missing?
"-//W3C//DTD XHTML Basic 1.0//EN" "xhtml-basic10.dtd"
I was also wondering (probably a stupid question) whether the current syntax covers
declarations with a URL such as
"-//W3C//DTD XHTML Basic 1.0//EN"
"http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"
Is there a difference in how Mozilla treats that?
| Assignee | ||
Comment 44•23 years ago
I would think that people that author XHTML from scratch would use Strict... but
on the other hand I prefer using Transitional because I am lazy, and that would
seem the best choice if you want to do a minimal job of converting your old HTML to
XHTML. I am fine with your suggestion as well.
And it seems we are also missing Basic.
And we are also missing Modularization of XHTML public IDs. They are a bit more
problematic. Basically what is required is that they conform to the Formal
Public Identifier specification and contain 'XHTML' in the description section.
For example, this would be a correct public ID: "-//Heikki Toivonen//DTD XHTML
Programming Extensions//EN". But currently some validating XML parsers seem to
have difficulty handling that DTD, and apparently there are no XML editors that
can produce documents in that specification. Based on that, I would be happy to
leave this problem until XML Catalogs are implemented.
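The rule quoted for Modularization of XHTML public IDs (a well-formed Formal Public Identifier whose description contains 'XHTML') can be approximated by a simple check. The sketch below simplifies the FPI syntax to the `prefix//owner//class description//language` shape seen in this thread; the `looks_like_xhtml_family` name is made up for the example.

```python
def looks_like_xhtml_family(public_id):
    """Rough test: FPI-shaped public ID of class DTD with 'XHTML' in its description."""
    parts = public_id.split("//")
    if len(parts) != 4:
        return False
    registration, owner, text, language = parts
    # The third field is "<class> <description>", e.g. "DTD XHTML 1.1".
    text_class, _, description = text.partition(" ")
    return (
        registration in ("-", "+")
        and bool(owner)
        and text_class == "DTD"
        and "XHTML" in description
        and bool(language)
    )
```

A pattern test like this cannot say which entity set a custom DTD actually uses, which is one reason to defer the problem to a real XML Catalogs implementation.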
| Assignee | ||
Comment 45•23 years ago
| Assignee | ||
Comment 46•23 years ago
Attachment #66202 -
Attachment is obsolete: true
Attachment #66334 -
Attachment is obsolete: true
| Assignee | ||
Comment 47•23 years ago
| Reporter | ||
Comment 48•23 years ago
Comment on attachment 66355 [details] [diff] [review]
New combined patch
r=rbs
Attachment #66355 -
Flags: review+
| Reporter | ||
Comment 49•23 years ago
Comment on attachment 66359 [details] [diff] [review]
Embedding packager changes, want these?
r=rbs
Attachment #66359 -
Flags: review+
Comment 50•23 years ago
Comment on attachment 66359 [details] [diff] [review]
Embedding packager changes, want these?
r=adamlock
Comment 51•23 years ago
Comment on attachment 66355 [details] [diff] [review]
New combined patch
sr=jst
Attachment #66355 -
Flags: superreview+
Comment 52•23 years ago
Comment on attachment 66359 [details] [diff] [review]
Embedding packager changes, want these?
sr=jst
Attachment #66359 -
Flags: superreview+
| Assignee | ||
Comment 53•23 years ago
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 54•23 years ago
*** Bug 121808 has been marked as a duplicate of this bug. ***
Comment 55•23 years ago
verified on Win2K
works with:
XHTML 1.0 Strict
XHTML 1.0 Transitional
XHTML 1.0 Frameset
XHTML Basic 1.0
XHTML 1.1
as both application/xhtml+xml and text/xml
Status: RESOLVED → VERIFIED
Comment 56•23 years ago
+
+ MakeAlias(":mozilla:content:xml:content:src:xhtml11.dtd", "$dist_dir"."dtd:");
+
This makes a 'dtd' folder next to the application. On Mac OS, this adds to the
clutter that users see when they view the Mozilla/Netscape folder in the Finder,
which we really try to avoid.
I filed bug 122710 on this issue.