Bug 44458: HTML entities are not recognized in XHTML documents
Status: VERIFIED FIXED (Closed)
Opened 24 years ago; closed 23 years ago
Categories: Core :: XML, defect, P2
Tracking: mozilla0.9.9
People: Reporter: rbs; Assigned: hjtoi-bugzilla
Details: 4 keywords; Whiteboard: [Hixie-P4]
Attachments: 4 files, 3 obsolete files
462 bytes, application/xhtml+xml
7.63 KB, text/plain
14.98 KB, patch (rbs: review+; jst: superreview+)
1.12 KB, patch (rbs: review+; jst: superreview+)
HTML includes a number of default entities (e.g., &alpha;, &beta;, etc.). Since the XHTML specification doesn't say that these entities have been removed, it is expected that they should be available in XHTML as well.

STEPS TO REPRODUCE
==================
Load the attachment (to follow). It contains the following document:

    <?xml version="1.0"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>XHTML Document with entities</title>
    </head>
    <body>
    <p>
    alpha: &alpha; <br/>
    beta: &beta;
    </p>
    </body>
    </html>

EXPECTED RESULTS
================
The browser should display
  alpha: [greek alpha here]
  beta: [greek beta here]

ACTUAL RESULTS
==============
The browser is displaying
  alpha: [nothing here]
  beta: [nothing here]

ADDITIONAL DETAILS
==================
My tree is one week old, but looking at the check-ins and their associated comments, I don't see a change that could have fixed this.
I am yet again having trouble attaching the test case. If someone can do it, please go ahead.
This bug is a severe problem when authoring non-English XHTML documents. For example, French accented letters (é, è, û, à, etc.) are lost.
Comment 4•24 years ago
Yes, this bug is happening because we don't load the entity sets defined in the XHTML DTD referenced from the XHTML document. We need to cache the three XHTML DTDs locally, look at the PUBLIC ID in the DOCTYPE declaration, map the ID to an XHTML DTD, load that DTD, and pass it to expat so that the entity declarations in the DTD become available to the XHTML document. Marking nsbeta3 and adding Heikki to the cc list.
If the attachment is saved in the "bin/dtd" directory as "htmlEntities.dtd", then no entities are missing when it is passed as the SYSTEM ID:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "htmlEntities.dtd">
But, as nisheeth pointed out, the right way to go about this is to load the list based on the PUBLIC ID. Fortunately, we only need one list (that of HTML4), because XHTML relies on HTML4. (The attached htmlEntities.dtd file is a replica, in DTD form, of mozilla/htmlparser/src/nsHTMLEntityList.h.)
Perf: would it be possible to cache/share the list? Otherwise, it is discarded only to be reloaded when navigating from document to document, and there could even be several copies at the same time in different windows/frames. (We have the same problem with the over 2000 MathML entities, BTW.) On the other hand, if they are shared and users' scripts alter them (i.e., if the spec allows entities to be writable?), there could be another can of worms that doesn't warrant such trouble at this stage.
Updated•24 years ago
Keywords: correctness
Comment 8•24 years ago
Considerations:
1) If we don't fix this, there's no way to use HTML entities in XHTML documents, which essentially limits the text content of XHTML docs to straight Unicode text, so we would like to fix this if possible.
2) However, there's no commitment to any XHTML support at all in the first release, so we're not required to fix this. We could choose not to support XHTML at all in 6.0 and delay support to 6.01, for example.
3) Definitely don't implement a fix that will blow out our memory footprint on a per-frame or per-window basis. If we don't have time to do a memory-efficient fix, Future this.
4) How long do you think it will take to fix this?
I'm going to ask for a new xhtml keyword so we can track issues related to XHTML support.
Keywords: xhtml
Comment 9•24 years ago
This bug has been marked nsbeta3- because the original Netscape engineer working on it is overburdened. If you feel this is an error, that you or another known resource will be working on this bug, or if it blocks your work in some way, please attach your concern to the bug for reconsideration, but do not clear the nsbeta3- nomination.
Whiteboard: [nsbeta3-]
Target Milestone: M18 → Future
Reporter
Comment 10•24 years ago
*** Bug 68202 has been marked as a duplicate of this bug. ***
Comment 11•24 years ago
Nominating for beta1. HTML entities need to be supported in XHTML since they are commonly used by popular sites.
Keywords: nsbeta1
Comment 13•24 years ago
Setting target milestone to 0.9.1...
Target Milestone: Future → mozilla0.9.1
Comment 14•23 years ago
Nisheeth: This relates strongly to bug 74172. We want to be able to include XHTML1 entities in XUL files in the chrome. The simple solution I had is to put the entity files in the chrome (xpfe/global/resources/content), but maybe there's a better place to put them?
OS: Windows 98 → All
Hardware: PC → All
Reporter
Comment 15•23 years ago
dr, the real challenge is to avoid unnecessarily cluttering memory with the same list of entities. For example, if this is not done properly, each of the numerous XUL fragments will carry its own copy of the same list.
Assignee
Updated•23 years ago
Comment 16•23 years ago
Compatibility consideration: The test case has a relative URL to the DTD. Some XML user agents (e.g., IE 5 for Mac*) might attempt to fetch the DTD and display an alert if the DTD isn't found. Since the examples in the XHTML spec use relative URLs (a Bad Thing, IMO), it is likely that authors will include those same relative URLs in their documents without actually providing the DTD at the corresponding location. If Mozilla doesn't check that the DTD exists, a compatibility problem will be introduced. (The author uses Mozilla, but the site visitor has a browser that attempts to fetch the DTD.) However, generating unnecessary network traffic would be bad, too. One way to solve this would be *not* to fetch the DTD if the system identifier is an absolute URL pointing to the DTD hosted at the W3C, and to check for its existence on the document's server in the case of a relative URL. (OK, I know this isn't very likely to be implemented, but it really is a potential compatibility problem, because validating parsers will have problems if they can't find the DTD.)

* IE 5 for Mac OS Classic actually tries to fetch the DTD and signals an error if the DTD is not found. However, the XHTML features of that browser are otherwise too broken to be of any use.
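The fetch policy proposed in this comment can be sketched as a small decision function. This is a hypothetical illustration, not Mozilla code: the names `DtdFetchPolicy` and `chooseDtdFetchPolicy`, the simplified scheme test, the `http://www.w3.org/TR/` prefix check, and the choice to ignore other absolute URLs (a case the comment leaves open) are all assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical outcome of inspecting a DOCTYPE system identifier.
enum class DtdFetchPolicy {
  UseLocalCopy,    // absolute URL of a W3C-hosted DTD: do not touch the network
  CheckExistence,  // relative URL: probe the document's own server for the DTD
  Ignore           // any other absolute URL (not covered by the comment; assumption)
};

// Returns true if `uri` begins with a URI scheme such as "http:".
// Simplified rule: a letter, then letters/digits/'+'/'-'/'.', then ':'.
static bool hasScheme(const std::string& uri) {
  if (uri.empty() || !std::isalpha(static_cast<unsigned char>(uri[0]))) return false;
  for (std::string::size_type i = 1; i < uri.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(uri[i]);
    if (c == ':') return true;
    if (!std::isalnum(c) && c != '+' && c != '-' && c != '.') return false;
  }
  return false;
}

DtdFetchPolicy chooseDtdFetchPolicy(const std::string& systemId) {
  const std::string w3cPrefix = "http://www.w3.org/TR/";
  if (systemId.compare(0, w3cPrefix.size(), w3cPrefix) == 0) {
    return DtdFetchPolicy::UseLocalCopy;
  }
  if (!hasScheme(systemId)) {
    return DtdFetchPolicy::CheckExistence;
  }
  return DtdFetchPolicy::Ignore;
}
```

With this split, the W3C case never generates network traffic, and only relative identifiers such as the test case's "DTD/xhtml1-strict.dtd" trigger an existence check.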
Comment 17•23 years ago
moving to TM of 0.9.2 per PDT triage (you can check it into 0.9.1 until Friday, 18/May/01 or into 0.9.2 after the tree opens)
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Updated•23 years ago
Priority: P3 → P2
Comment 18•23 years ago
Moving P2 and P3 bugs over to 0.9.3...
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Reporter
Comment 19•23 years ago
This bug seems to require some deeper thought, along the lines of what was done for the new image lib. A DTD manager could keep remote DTDs in a disk DTD cache in necko, while selected local DTDs could stay in a memory DTD cache. The manager could then be a memory-pressure listener, and so on. The same issues that were addressed in the new image lib may well arise here.
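A minimal sketch of the memory-cache half of such a DTD manager, assuming a parsed DTD can be represented as a map from entity name to replacement text. `DtdCache`, `EntityTable`, and `onMemoryPressure` are hypothetical names; a real implementation would hook into Mozilla's memory-pressure notifications and necko's disk cache, neither of which is modeled here.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Parsed entity declarations: entity name -> replacement text (assumption).
using EntityTable = std::map<std::string, std::string>;

// Hypothetical in-memory DTD cache: each DTD is parsed once and shared by
// every document that references it, instead of one copy per window/frame.
class DtdCache {
 public:
  using Loader = std::function<EntityTable(const std::string& dtdName)>;

  explicit DtdCache(Loader loader) : loader_(std::move(loader)) {}

  // Returns the shared table for `dtdName`, loading it on first use only.
  std::shared_ptr<const EntityTable> get(const std::string& dtdName) {
    auto it = cache_.find(dtdName);
    if (it != cache_.end()) return it->second;
    std::shared_ptr<const EntityTable> table =
        std::make_shared<EntityTable>(loader_(dtdName));
    cache_.emplace(dtdName, table);
    return table;
  }

  // Memory-pressure hook: drop the cache's references. Documents still
  // holding a shared_ptr keep their table alive until they release it.
  void onMemoryPressure() { cache_.clear(); }

 private:
  Loader loader_;
  std::map<std::string, std::shared_ptr<const EntityTable>> cache_;
};
```

Handing out `const` shared pointers also sidesteps the writable-entities concern raised earlier in the thread, since no document can modify the shared copy.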
Comment 21•23 years ago
Bulk re-assign of my 0.9.4 bugs to Heikki. I will not have the cycles to work on these bugs while Clayton is on sabbatical for the next six weeks.
Assignee: nisheeth → heikki
Status: ASSIGNED → NEW
Assignee
Updated•23 years ago
Target Milestone: mozilla0.9.4 → mozilla0.9.5
Comment 22•23 years ago
Using Mozilla 0.9.4 on RedHat Linux 7.1, the testcase is displayed correctly. Is this bug fixed?
Assignee
Comment 23•23 years ago
Assignee
Updated•23 years ago
Target Milestone: mozilla0.9.5 → mozilla0.9.6
Comment 24•23 years ago
All entities for XHTML 1.0 are defined here:
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
For XHTML Basic and XHTML 1.1, here:
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-symbol.ent
Comment 25•23 years ago
Aleksey Nogin, you must have tried the text/html testcase, which works, not the XHTML one, which still fails. Since msn.com is attempting to be XHTML compliant, doesn't this mean top100? Also marking another bug as blocked by this one.
Blocks: 95770
Comment 26•23 years ago
This doesn't really block bug 95770. (I wouldn't consider this a blocker for bug 15391, either.) Numeric character references and UTF-8 work. MSN isn't serving real XHTML using a real XML content type. (It wouldn't work in IE if they did.) This bug has nothing to do with MSN. Does the XML spec really require non-validating parsers to support external character entities?
Comment 27•23 years ago
What do you mean by "external"? External files that are linked in the main DTD? Well, XHTML 1.1 is entirely built on that. Nothing is defined in its main DTD, and it has multiple levels of referencing. So I don't see how this could be optional for XHTML support.
Updated•23 years ago
Assignee
Updated•23 years ago
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Comment 28•23 years ago
*** Bug 107736 has been marked as a duplicate of this bug. ***
Comment 29•23 years ago
By "external" I mean the XML meaning: not in the same storage object as the main document. So a separate DTD file is external to an XML document.
Updated•23 years ago
Attachment #10952 -
Attachment mime type: text/html → application/xhtml+xml
Updated•23 years ago
Attachment #49600 -
Attachment is obsolete: true
Comment 30•23 years ago
*** Bug 108079 has been marked as a duplicate of this bug. ***
Comment 31•23 years ago
A preliminary workaround that works with the current Mozilla browser could be:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
      <!ENTITY copy "©">
    ]>
Assignee
Updated•23 years ago
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Updated•23 years ago
Whiteboard: [Hixie-P4]
Assignee
Comment 32•23 years ago
For Mozilla 1.0 I think it will be good enough if we just do the same thing our MathML implementation is doing, basically adding the XHTML public ids to the catalog table and creating xhtml10.dtd. rbs, could you break the XHTML-only entities away from mathml.dtd and have mathml.dtd include xhtml10.dtd entities as an external entity file?
Status: NEW → ASSIGNED
Comment 33•23 years ago
It should probably be xhtml11.dtd, although probably not much has changed since XHTML 1.0.
Reporter
Comment 34•23 years ago
> rbs, could you break the XHTML-only entities away from mathml.dtd and have
> mathml.dtd include xhtml10.dtd entities as an external entity file?

Is this level of granularity worth the trouble at this stage? Since there are over 2000 MathML entities, removing a couple of hundred wouldn't make much difference (except... the tediousness of trying to figure out the duplicates with XHTML and weeding them out :-). It would be quicker to just save attachment 11622 (which already has everything, I think) as xhtml11.dtd and then add two entries in the catalog table for "-//W3C//DTD XHTML 1.0 Strict//EN" and "-//W3C//DTD XHTML 1.1//EN". With that, we will be done, and gzip will do the rest.
Comment 35•23 years ago
Or use the exact W3C entity DTD definitions for XHTML. I've provided the URLs for them above in comment #24.
Reporter
Comment 36•23 years ago
The point is efficiency: only the entity definitions are of interest, and the
comments would slow the parsing unnecessarily. When trimmed down, the result is
the lightweight version in attachment 11622 (which could be compiled, a la
fastload, if there were a way to do that).
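Trimming the comments out of the W3C entity files is mechanical. The following C++ sketch removes every `<!-- ... -->` block from DTD text; `stripDtdComments` is a hypothetical helper, not part of any Mozilla build step, and it assumes no `<!--` appears inside a quoted literal.

```cpp
#include <cassert>
#include <string>

// Removes every <!-- ... --> comment from DTD text so the parser only sees
// declarations. Simplifications: "<!--" inside quoted literals is not
// special-cased, and an unterminated comment drops the rest of the input.
std::string stripDtdComments(const std::string& dtd) {
  std::string out;
  out.reserve(dtd.size());
  std::string::size_type pos = 0;
  while (pos < dtd.size()) {
    const std::string::size_type start = dtd.find("<!--", pos);
    if (start == std::string::npos) {
      out.append(dtd, pos, std::string::npos);  // no more comments: copy the tail
      break;
    }
    out.append(dtd, pos, start - pos);  // keep text before the comment
    const std::string::size_type end = dtd.find("-->", start + 4);
    if (end == std::string::npos) break;  // unterminated comment
    pos = end + 3;                        // resume after "-->"
  }
  return out;
}
```

Combining the three files into one, as also suggested, would then just be concatenating the stripped outputs.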
Assignee
Comment 37•23 years ago
I can easily strip out the comments from the versions the W3C supplies and combine
them into one file to speed loading. The reason I have not done that yet is
that I was wondering about a comment in mathml.dtd:
  This is a *customized* list for Mozilla: characters originally specified
  as combined pairs and plane 1 characters have been remapped to internal
  code points within the Unicode's Private Use Area (PUA).
If this does not apply to the XHTML 1.0/1.1 entities (or I don't need to do
anything special for them), I am all set and can finish this soon. rbs, could
you explain the above comment to us (me) non-Unicode experts ;) ?
Or does attachment 11622 list every entity that the W3C lists include, so that
I basically wouldn't need to do anything? I am slightly concerned about just
accepting a list compiled from our HTML code...
Reporter
Comment 38•23 years ago
The remapping is about plane-1 MathML characters (5-digit code points), which don't work in any application yet, although some work is in progress to eventually support them in Mozilla (bug 118000). Since HTML/XHTML code points are only 4-digit, the comments about the remapping don't apply there. The list from which attachment 11622 was compiled is meant to be identical to the expected list (otherwise something would be out of sync and need fixing). It seems complete to me. (The other bit is to add 'dtd/*' to the packages, as I am doing over at bug 109826.)
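To illustrate the 4-digit versus 5-digit distinction: a BMP code point (U+0000 to U+FFFF) fits in one UTF-16 code unit, while a plane-1 code point needs a surrogate pair. The sketch below is a plain UTF-16 encoder (`toUtf16` is a hypothetical name); it does not show Mozilla's PUA remapping. The plane-1 example, U+1D504 (MATHEMATICAL FRAKTUR CAPITAL A, `&Afr;` in MathML), is chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes one Unicode code point as UTF-16. BMP characters ("4-digit" hex)
// take one 16-bit unit; plane 1 and above ("5-digit") need a surrogate pair.
// Assumes `cp` is a valid scalar value (not a surrogate, at most U+10FFFF).
std::vector<std::uint16_t> toUtf16(std::uint32_t cp) {
  if (cp <= 0xFFFF) {
    return {static_cast<std::uint16_t>(cp)};
  }
  cp -= 0x10000;  // 20 bits remain, split into two 10-bit halves
  return {static_cast<std::uint16_t>(0xD800 + (cp >> 10)),     // high surrogate
          static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))};  // low surrogate
}
```

Every XHTML 1.0 entity (e.g., `&alpha;` at U+03B1, `&euro;` at U+20AC) takes the single-unit branch, which is why the remapping note does not apply to them.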
Assignee
Comment 39•23 years ago
Using attachment 11622 as the xhtml11.dtd entity list in this patch. I
assume we need to add mappings for 4 new public IDs: 3 for XHTML 1.0
(Strict, Transitional and Frameset) and 1 for XHTML 1.1. I also made packaging
changes, but I am not sure how to test them yet (MOZILLA_OFFICIAL?). This patch
is missing a (one-liner?) change to make the Mac build system copy the DTD. I
think I have a fix, but I want to test it first. Haven't tested on Linux yet.
Reporter
Comment 40•23 years ago
Looking good. You might want to get rid of this one, since it is a built-in entity in the XML spec & parser, etc. (right?)

    +
    +<!-- Navigator entity extensions; apos is from XML -->
    +<!ENTITY apos "'">

Also, the ordering of the catalog table could be revamped to put the most frequent stuff first.
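The point about `apos` generalizes: XML 1.0 predefines `lt`, `gt`, `amp`, `apos`, and `quot`, so a DTD generated from the HTML entity list (htmlEntities.dtd was described earlier as a DTD replica of nsHTMLEntityList.h) can simply skip them. A hypothetical generator, assuming the source list is available as (name, code point) pairs; `makeEntityDtd` is an illustrative name, not a Mozilla function.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// The five entities every XML processor already recognizes.
static const std::set<std::string> kXmlPredefined = {"lt", "gt", "amp", "apos", "quot"};

// Emits one <!ENTITY name "&#N;"> line per (name, code point) pair,
// using decimal character references and skipping the XML built-ins.
std::string makeEntityDtd(
    const std::vector<std::pair<std::string, std::uint32_t>>& entities) {
  std::string dtd;
  for (const auto& entity : entities) {
    if (kXmlPredefined.count(entity.first) != 0) continue;  // parser knows it
    dtd += "<!ENTITY " + entity.first + " \"&#" +
           std::to_string(entity.second) + ";\">\n";
  }
  return dtd;
}
```

For example, the pairs (alpha, 945), (apos, 39), (beta, 946) produce declarations for `alpha` and `beta` only.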
Assignee
Comment 41•23 years ago
Doh, forgot the apos entity; you are right! Ordering by frequency of use would probably just translate to putting the XHTML IDs first, or do you have other opinions? Also, this seems to work on Linux; now compiling on Mac.
Reporter
Comment 42•23 years ago
Possible re-ordering from:
    + {"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd" },
    + {"-//W3C//DTD SVG 20001102//EN", "svg.dtd" },
    + {"-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd" },
To:
    + {"-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd" },
    + {"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd" },
    + {"-//W3C//DTD SVG 20001102//EN", "svg.dtd" },
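The ordering matters if the catalog is searched linearly. A sketch of such a lookup using the entries from the proposed ordering; the `CatalogEntry` struct, `lookupLocalDtd`, its comparison counter, and the linear-scan assumption are illustrative and not taken from the actual patch.

```cpp
#include <cassert>
#include <cstring>

// Public ID -> local DTD file, in the proposed "most frequent first" order.
struct CatalogEntry {
  const char* publicId;
  const char* localDtd;
};

static const CatalogEntry kCatalog[] = {
    {"-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml11.dtd"},
    {"-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd"},
    {"-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml11.dtd"},
    {"-//W3C//DTD XHTML 1.0 Frameset//EN", "xhtml11.dtd"},
    {"-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN", "mathml.dtd"},
    {"-//W3C//DTD SVG 20001102//EN", "svg.dtd"},
};

// Linear scan: returns the local DTD for `publicId`, or nullptr if unknown.
// If `comparisons` is non-null, it receives the number of entries examined,
// showing why common DOCTYPEs should sit near the top.
const char* lookupLocalDtd(const char* publicId, int* comparisons = nullptr) {
  int n = 0;
  for (const CatalogEntry& entry : kCatalog) {
    ++n;
    if (std::strcmp(entry.publicId, publicId) == 0) {
      if (comparisons) *comparisons = n;
      return entry.localDtd;
    }
  }
  if (comparisons) *comparisons = n;
  return nullptr;
}
```

With six entries the savings are tiny, so the ordering is mostly a matter of tidiness unless the table grows.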
Comment 43•23 years ago
XHTML Basic is missing?
    "-//W3C//DTD XHTML Basic 1.0//EN" "xhtml-basic10.dtd"
I was also wondering (probably a stupid question) whether the current syntax covers declarations with a URL, such as
    "-//W3C//DTD XHTML Basic 1.0//EN" "http://www.w3.org/TR/xhtml-basic/xhtml-basic10.dtd"
Is there a difference in how Mozilla treats that?
Assignee
Comment 44•23 years ago
I would think that people who author XHTML from scratch would use Strict... but on the other hand, I prefer using Transitional because I am lazy, and that would seem the best choice if you want to do a minimal job of converting your old HTML to XHTML. I am fine with your suggestion as well. It also seems we are missing Basic. And we are also missing the Modularization of XHTML public IDs, which are a bit more problematic. Basically, what is required is that they conform to the Formal Public Identifier specification and contain 'XHTML' in the description section. For example, this would be a correct public ID: "-//Heikki Toivonen//DTD XHTML Programming Extensions//EN". But currently some validating XML parsers seem to have difficulty handling that DTD, and apparently there are no XML editors that can produce documents conforming to that specification. Based on that, I would be happy to leave this problem until XML Catalogs are implemented.
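The Modularization rule described here (a formal public identifier with 'XHTML' in the description) could be approximated by a string check like the following. This is a rough, hypothetical sketch: it assumes the `-//Owner//DTD Description//Language` shape, accepts only the `DTD` text class, and does not implement the full ISO 8879 public identifier grammar.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits `s` on every occurrence of "//".
static std::vector<std::string> splitOnDoubleSlash(const std::string& s) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  while (true) {
    const std::string::size_type pos = s.find("//", start);
    if (pos == std::string::npos) {
      parts.push_back(s.substr(start));
      return parts;
    }
    parts.push_back(s.substr(start, pos - start));
    start = pos + 2;
  }
}

// Rough check: unregistered ("-") or registered ("+") owner, a "DTD " text
// class, and "XHTML" somewhere in the description that follows it.
bool looksLikeXhtmlFpi(const std::string& publicId) {
  const std::vector<std::string> parts = splitOnDoubleSlash(publicId);
  if (parts.size() != 4) return false;  // prefix, owner, text, language
  if (parts[0] != "-" && parts[0] != "+") return false;
  const std::string& text = parts[2];
  const std::string textClass = "DTD ";
  if (text.compare(0, textClass.size(), textClass) != 0) return false;
  return text.find("XHTML", textClass.size()) != std::string::npos;
}
```

A check like this would only decide *whether* an ID belongs to the XHTML family; which entity file to load for it would still need a catalog decision.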
Assignee
Comment 45•23 years ago
Assignee
Comment 46•23 years ago
Attachment #66202 -
Attachment is obsolete: true
Attachment #66334 -
Attachment is obsolete: true
Assignee
Comment 47•23 years ago
Reporter
Comment 48•23 years ago
Comment on attachment 66355 (New combined patch)
r=rbs
Attachment #66355 -
Flags: review+
Reporter
Comment 49•23 years ago
Comment on attachment 66359 (Embedding packager changes, want these?)
r=rbs
Attachment #66359 -
Flags: review+
Comment 50•23 years ago
Comment on attachment 66359 (Embedding packager changes, want these?)
r=adamlock
Comment 51•23 years ago
Comment on attachment 66355 (New combined patch)
sr=jst
Attachment #66355 -
Flags: superreview+
Comment 52•23 years ago
Comment on attachment 66359 (Embedding packager changes, want these?)
sr=jst
Attachment #66359 -
Flags: superreview+
Assignee
Comment 53•23 years ago
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 54•23 years ago
*** Bug 121808 has been marked as a duplicate of this bug. ***
Comment 55•23 years ago
Verified on Win2K. Works with:
  XHTML 1.0 Strict
  XHTML 1.0 Transitional
  XHTML 1.0 Frameset
  XHTML Basic 1.0
  XHTML 1.1
as both application/xhtml+xml and text/xml.
Status: RESOLVED → VERIFIED
Comment 56•23 years ago
    +
    +  MakeAlias(":mozilla:content:xml:content:src:xhtml11.dtd", "$dist_dir"."dtd:");
    +
This makes a 'dtd' folder next to the application. On Mac OS, this adds to the clutter that users see when they view the Mozilla/Netscape folder in the Finder, which we really try to avoid. I filed bug 122710 on this issue.