Closed
Bug 600198
Opened 14 years ago
Closed 14 years ago
Unicode font mapping fails for Egyptian Hieroglyphs
Categories
(Core :: Internationalization, defect)
RESOLVED
INVALID
People
(Reporter: saqqara, Assigned: smontagu)
Attachments
(2 files)
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Build Identifier: Mozilla/5.0 (Windows NT 6.0; rv:2.0b6) Gecko/20100101 Firefox/4.0b6
Web page text that contains Unicode 5.2 Egyptian Hieroglyphs, but does not name a specific font (e.g. via CSS), displays the unknown-character glyph instead of a glyph from a suitable font installed on the host system. This looks like a character-to-font mapping bug.
Reproducible: Always
Steps to Reproduce:
1. Install an Egyptian font (such as Aegyptus, http://users.teilar.gr/~g1951d/).
2. Open http://jtotobsc.blogspot.com/2010/09/quick-test-for-ancient-egyptian-in-web.html
3. Note that the hieroglyphs are not displayed.
Expected Results:
The 'unknown characters' are mapped to a font that supports the Unicode 5.2 Egyptian Hieroglyphs.
Curiously, if the hieroglyph characters are pasted into an edit box, or into the Navigation or Search toolbar, they display correctly; only web pages fail. Also note that Wikipedia entries name Aegyptus in their CSS, which can fool the casual observer into thinking hieroglyphs work (they only work if that specific font is installed).
Updated•14 years ago
Assignee: nobody → smontagu
Component: General → Internationalization
Product: Firefox → Core
QA Contact: general → i18n
Comment 1•14 years ago
You can't use surrogate pairs in character references. Encode non-BMP code points directly. For example, the two surrogate references that render here as �� should be a single reference to the code point, which renders as 𓄿. Interestingly, the RSS feed correctly uses raw UTF-8 bytes instead of character references: http://jtotobsc.blogspot.com/feeds/posts/default?alt=rss
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
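The distinction drawn in Comment 1 can be made concrete with a small Python sketch (not part of the original thread). It uses U+1313F, an arbitrary code point from the Egyptian Hieroglyphs block (U+13000 to U+1342F), and contrasts the invalid form (one reference per UTF-16 surrogate) with the valid form (one reference per code point). The helper names `surrogate_pair`, `surrogate_refs` and `code_point_refs` are hypothetical.

```python
def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def surrogate_refs(text: str) -> str:
    """Invalid form: one numeric reference per UTF-16 code unit."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            high, low = surrogate_pair(cp)
            out.append(f"&#{high};&#{low};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


def code_point_refs(text: str) -> str:
    """Valid form: one numeric reference per Unicode code point."""
    return "".join(f"&#{ord(ch)};" for ch in text)


glyph = "\U0001313F"  # a code point in the Egyptian Hieroglyphs block
print(surrogate_refs(glyph))   # &#55308;&#56639;  (each becomes U+FFFD under HTML5)
print(code_point_refs(glyph))  # &#78143;
# The standard library's xmlcharrefreplace error handler also yields the valid form.
print(glyph.encode("ascii", "xmlcharrefreplace"))  # b'&#78143;'
```

Inside a Python `str` a non-BMP character is already a single code point, so the valid form falls out naturally; the surrogate form only appears when a tool works on UTF-16 code units, as the Blogger output apparently did.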
Comment 2•14 years ago (Assignee)
Note that, contrary to what you wrote in an update to http://jtotobsc.blogspot.com/2010/09/quick-test-for-ancient-egyptian-in-web.html, this is not a Firefox bug in handling entities for Unicode SMP characters. The bug is in Blogger, which emits the wrong values for the entities, as Kimura-san already explained. This testcase uses the correct values and displays fine. There is a useful tool for converting Unicode characters to different forms at http://rishida.net/tools/conversion/
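A converter like the one linked in Comment 2 can be approximated in a few lines of Python. This is a hypothetical sketch, not that tool's code: `unicode_forms` is an illustrative name, and it shows one character as a code point, as UTF-8 bytes (the form the RSS feed uses), as UTF-16 code units, and as decimal and hexadecimal numeric references.

```python
def unicode_forms(ch: str) -> dict[str, str]:
    """Show one character in several common notations."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")
    # Each UTF-16 code unit is two bytes; non-BMP characters yield two units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code point": f"U+{cp:04X}",
        "UTF-8 bytes": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "UTF-16 code units": " ".join(f"{u:04X}" for u in units),
        "decimal reference": f"&#{cp};",
        "hex reference": f"&#x{cp:X};",
    }


for name, value in unicode_forms("\U0001313F").items():
    print(f"{name:>18}: {value}")
```

The UTF-16 row is where surrogate values come from; only the code-point values belong inside a numeric character reference.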
Comment 3
I'll take your word for it that some specification somewhere says this construction is illegal. Where can I read this for myself? I was fooled by the fact that Safari, Chrome and Internet Explorer all support the use of surrogate pairs for character references, so apparently I was not alone in assuming the construction is valid. I suggest Mozilla submit this to the W3C test suite, if not done already, in the interests of standards conformance.
Comment 4•14 years ago
(In reply to comment #3)
> I'll take your word for it that some specification somewhere says this
> construction is illegal. Where can I read this for myself?

HTML5: 8.2.4.70 Tokenizing character references
http://www.w3.org/TR/html5/tokenization.html#tokenizing-character-references
> Otherwise, if the number is in the range 0xD800 to 0xDFFF or is greater than
> 0x10FFFF, then this is a parse error. Return a U+FFFD REPLACEMENT CHARACTER.

XML: 4.1 Character and Entity References
http://www.w3.org/TR/xml/#sec-references
> [66] CharRef ::= '&#' [0-9]+ ';'
>                | '&#x' [0-9a-fA-F]+ ';' [WFC: Legal Character]

and the definition of the well-formedness constraint "Legal Character":
http://www.w3.org/TR/xml/#wf-Legalchar
> Characters referred to using character references MUST match the production
> for Char.

and the definition of the Char production:
http://www.w3.org/TR/xml/#NT-Char
> [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
>              [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding
> the surrogate blocks, FFFE, and FFFF. */

WFC violations are fatal errors.
http://www.w3.org/TR/xml/#dt-wfc
An XML processor must stop normal parsing when it encounters a fatal error.
http://www.w3.org/TR/xml/#dt-fatal

HTML 4: 20 SGML Declaration of HTML 4
http://www.w3.org/TR/html4/sgml/sgmldecl.html
> 55296 2048 UNUSED -- SURROGATES --
Character numbers from 55296 (D800 in hex) to 57343 (DFFF in hex) are not used in HTML 4 documents.

> I was fooled by the fact that Safari, Chrome and Internet Explorer all support
> the use of surrogate pairs for character references so apparently I was not
> alone making the assumption the construction is valid - suggest Mozilla submit
> to W3C test suite etc. if not done already in interests of standards
> conformance.

WebKit has also recently implemented this rule.
http://trac.webkit.org/changeset/61234
> fast/parser/entity-surrogate-pairs-expected.txt:
> * HTML5 doesn't allow entities to create surrogate pairs.
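The rules quoted in Comment 4 can be sketched as two checks in Python. This is a simplified, hypothetical illustration: `decode_numeric_refs` implements only the quoted HTML5 rule (the full tokenizer has further cases, for example references without a trailing semicolon, U+0000, and the remapping of 0x80 to 0x9F), and `is_xml_char` transcribes XML production [2] Char.

```python
import re

# Decimal (&#123;) or hexadecimal (&#x7B;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Apply the quoted HTML5 rule: surrogates and out-of-range numbers become U+FFFD."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        n = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0xD800 <= n <= 0xDFFF or n > 0x10FFFF:
            return "\uFFFD"  # parse error: U+FFFD REPLACEMENT CHARACTER
        return chr(n)
    return NUMERIC_REF.sub(replace, text)


def is_xml_char(cp: int) -> bool:
    """XML 1.0 production [2] Char: excludes surrogates, U+FFFE and U+FFFF."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


print(ascii(decode_numeric_refs("&#55308;&#56639;")))   # '\ufffd\ufffd'
print(decode_numeric_refs("&#78143;") == "\U0001313F")  # True
print(is_xml_char(55308), is_xml_char(78143))            # False True
```

Under these rules a surrogate-pair reference never recombines into one character: in HTML each half becomes U+FFFD, and in XML each half fails the Char production, which is a fatal error.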
Comment 5
It worked fine with an earlier version in 2020, but since the latest update the Egyptian Hieroglyphs are no longer shown in Firefox!
Test it here: http://www.alanwood.net/unicode/egyptian-hieroglyphs.html
Comment 6•3 years ago
Please file a new bug instead of commenting on a closed, archaic bug. Since alanwood.net does not use surrogate character references, your issue is unrelated to this bug.
It works for me, by the way.