Closed Bug 600198 Opened 14 years ago Closed 14 years ago

Unicode font mapping fails for Egyptian Hieroglyphs

Categories

(Core :: Internationalization, defect)

x86
Windows Vista
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: saqqara, Assigned: smontagu)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)
Build Identifier: Mozilla/5.0 (Windows NT 6.0; rv:2.0b6) Gecko/20100101 Firefox/4.0b6

Web page text that contains Unicode 5.2 Egyptian Hieroglyphs (but does not name a specific font, e.g. via CSS) displays the unknown-character glyph rather than picking a glyph from a suitable font installed on the host system. This looks like a character mapping bug.

Reproducible: Always

Steps to Reproduce:
1. Install an Egyptian font (such as Aegyptus, http://users.teilar.gr/~g1951d/).
2. Read page http://jtotobsc.blogspot.com/2010/09/quick-test-for-ancient-egyptian-in-web.html
3. Note the hieroglyphs are not displayed.


Expected Results:  
The 'unknown characters' should be mapped to an installed font that supports the Unicode 5.2 Egyptian Hieroglyphs.

Curiously, if you paste the hieroglyph characters into an edit box, or into the Navigation or Search toolbars, they display correctly. Only web pages fail. Also note that Wikipedia entries name Aegyptus in their CSS styling, which can fool the casual observer into thinking hieroglyphs work (they only work if that specific font is installed).
Assignee: nobody → smontagu
Component: General → Internationalization
Product: Firefox → Core
QA Contact: general → i18n
You can't use surrogate pairs for character references. Encode non-BMP code points directly.
For example, &#xD80C;&#xDD3F; should be &#x1313F; (𓄿)
Interestingly, the RSS feed uses the raw UTF-8 bytes correctly instead of character references.
http://jtotobsc.blogspot.com/feeds/posts/default?alt=rss
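
For illustration only, here is a minimal Python sketch of the repair described above: it finds a high-surrogate numeric reference immediately followed by a low-surrogate one and rewrites the pair as a single reference to the real code point. The function name `fix_surrogate_refs` and its helpers are hypothetical, not part of Blogger or Firefox.

```python
import re

# One numeric character reference, decimal (&#78143;) or hex (&#x1313F;).
_REF = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")


def _value(body):
    """Parse the part between '&#' and ';' into an integer."""
    return int(body[1:], 16) if body[0] in "xX" else int(body)


def fix_surrogate_refs(markup):
    """Hypothetical helper: merge surrogate-pair references into one reference."""
    out = []
    pos = 0
    while True:
        m = _REF.search(markup, pos)
        if m is None:
            out.append(markup[pos:])
            break
        out.append(markup[pos:m.start()])
        hi = _value(m.group(1))
        nxt = _REF.match(markup, m.end())  # must start right after the first one
        if 0xD800 <= hi <= 0xDBFF and nxt is not None:
            lo = _value(nxt.group(1))
            if 0xDC00 <= lo <= 0xDFFF:
                # Standard UTF-16 decoding of a surrogate pair.
                cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
                out.append(f"&#x{cp:X};")
                pos = nxt.end()
                continue
        out.append(m.group(0))  # not a pair: keep the reference unchanged
        pos = m.end()
    return "".join(out)
```

Lone surrogate references are deliberately left untouched, since there is no code point to recover from them.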
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → INVALID
Note that this is not a bug in Firefox when entities are used for Unicode SMP, as you wrote in an update to http://jtotobsc.blogspot.com/2010/09/quick-test-for-ancient-egyptian-in-web.html. The bug is in Blogger, which uses the wrong values for the entities, as Kimura-san already explained. This testcase uses the correct values and displays fine.

There is a useful tool for converting Unicode characters to different forms at http://rishida.net/tools/conversion/
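
As a rough idea of the forms involved, the following Python sketch prints a few representations of a single character. The function name `describe` and the dictionary keys are made up for this example and are not taken from the tool linked above.

```python
def describe(ch):
    """Hypothetical helper: show several encodings of one character."""
    cp = ord(ch)
    utf16 = ch.encode("utf-16-be")
    # Split the UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code_point": f"U+{cp:04X}",
        "html_hex": f"&#x{cp:X};",
        "html_decimal": f"&#{cp};",
        "utf8_bytes": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
        "utf16_units": [f"{u:04X}" for u in units],
    }


if __name__ == "__main__":
    for name, value in describe(chr(0x1313F)).items():
        print(name, value)
```

The UTF-16 code units are what must not be written as separate character references; the `html_hex` or `html_decimal` form is the one to use in markup.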
I'll take your word for it that some specification somewhere says this construction is illegal. Where can I read this for myself? 

I was fooled by the fact that Safari, Chrome and Internet Explorer all support the use of surrogate pairs for character references, so apparently I was not alone in assuming the construction is valid. I suggest Mozilla submit this to the W3C test suite etc., if not done already, in the interests of standards conformance.
(In reply to comment #3)
> I'll take your word for it that some specification somewhere says this
> construction is illegal. Where can I read this for myself? 

HTML5: 8.2.4.70 Tokenizing character references
http://www.w3.org/TR/html5/tokenization.html#tokenizing-character-references
> Otherwise, if the number is in the range 0xD800 to 0xDFFF or is greater than 
> 0x10FFFF, then this is a parse error. Return a U+FFFD REPLACEMENT CHARACTER.
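
A minimal Python sketch of just the rule quoted above (the full HTML5 algorithm has further cases that are not modelled here). The name `resolve_numeric_reference` is hypothetical.

```python
REPLACEMENT_CHARACTER = "\uFFFD"


def resolve_numeric_reference(number):
    """Return (character, is_parse_error) for the quoted rule only.

    Assumption: `number` is the non-negative value already parsed from
    the digits of the reference.
    """
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return REPLACEMENT_CHARACTER, True
    return chr(number), False
```

Under this rule each half of a surrogate pair written as a character reference resolves to its own U+FFFD, which would be consistent with two replacement characters appearing in place of one hieroglyph.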

XML: 4.1 Character and Entity References
http://www.w3.org/TR/xml/#sec-references
> [66]       CharRef       ::=       '&#' [0-9]+ ';'
>             | '&#x' [0-9a-fA-F]+ ';'    [WFC: Legal Character]
and the definition of Well-formedness constraint: Legal Character,
http://www.w3.org/TR/xml/#wf-Legalchar
> Characters referred to using character references MUST match the production
> for Char.
and the definition of Char production.
http://www.w3.org/TR/xml/#NT-Char
> [2]       Char       ::=       #x9 | #xA | #xD | [#x20-#xD7FF] | 
> [#xE000-#xFFFD] | [#x10000-#x10FFFF]    /* any Unicode character, excluding
> the surrogate blocks, FFFE, and FFFF. */
WFC violations are fatal errors.
http://www.w3.org/TR/xml/#dt-wfc
An XML processor must stop normal processing when it encounters a fatal error.
http://www.w3.org/TR/xml/#dt-fatal
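
The XML rules quoted above can be sketched in Python as follows. The names `is_xml_char`, `check_char_ref` and `NotWellFormed` are hypothetical, and raising an exception is only a stand-in for the fatal error a real XML processor would report.

```python
import re

# Production [66] CharRef: decimal or lowercase-'x' hex form.
_CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


class NotWellFormed(Exception):
    """Stand-in for an XML fatal error (well-formedness violation)."""


def is_xml_char(cp):
    """Production [2] Char from XML 1.0, as quoted above."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def check_char_ref(ref):
    """Return the referenced character, or raise if WFC Legal Character fails."""
    m = _CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a character reference: {ref!r}")
    cp = int(m.group(1), 16) if m.group(1) is not None else int(m.group(2))
    if not is_xml_char(cp):
        raise NotWellFormed(f"{ref} does not match the Char production")
    return chr(cp)
```

With this sketch, `check_char_ref("&#x1313F;")` returns the hieroglyph, while `check_char_ref("&#xD80C;")` raises `NotWellFormed`.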

HTML 4: 20 SGML Declaration of HTML 4
http://www.w3.org/TR/html4/sgml/sgmldecl.html
> 55296   2048    UNUSED  -- SURROGATES --
Character numbers from 55296 (D800 in hex) to 57343 (DFFF in hex) are not used in HTML 4 documents.

> I was fooled by the fact that Safari, Chrome and Internet Explorer all support
> the use of surrogate pairs for character references so apparently I was not
> alone making the assumption the construction is valid - suggest Mozilla submit
> to W3C test suite etc. if not done already in interests of standards
> conformance.

Recently WebKit also implemented this rule.
http://trac.webkit.org/changeset/61234
> fast/parser/entity-surrogate-pairs-expected.txt:
>     * HTML5 doesn't allow entities to create surrogate pairs.

It worked fine with an earlier version in 2020, but with the latest update the Egyptian Hieroglyphs are no longer shown in Firefox!
Test it here: http://www.alanwood.net/unicode/egyptian-hieroglyphs.html

Please file a new bug instead of commenting on a closed, archaic bug. Since alanwood.net does not use surrogate character references, your issue is unrelated to this bug.

It works for me, by the way.
