Closed Bug 232657 Opened 21 years ago Closed 13 years ago

Some Unicode Plane 1 characters are not displayed, others are displayed at random

Categories: Core :: Graphics, defect
Platform: x86, Windows 2000
Priority: Not set
Severity: normal
Status: RESOLVED WORKSFORME
People: Reporter: peter, Unassigned
Keywords: intl, regression

User-Agent: Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113

On these pages, which contain Unicode Plane 1 characters (Ugaritic, Old Italic), some of the characters are displayed correctly (apparently with font Code2001) but others are displayed as a ? in a box. The same characters are consistently displayed or not displayed in the browser and in Composer normal view and preview. But almost all of these characters are displayed correctly in Composer HTML Source view (the exception is U+10302). This implies that the problem is not (only) in the font, but is in Mozilla. Note that in the source the characters are encoded like 𐎀 and there is no format difference between the lines that display correctly and those that do not.

I note that among the other Plane 1 character sets viewable from http://www.unicode.org/charts/collation/, the same problem is found with a few Gothic characters but not with the other sets.

Code2001 is of course a font substituted by the system. I cannot specify this as one of the fonts in Preferences because it supports only Plane 1.

Reproducible: Always

Steps to Reproduce:
1. Install font Code2001
2. View one of the above pages
3. Click File - Edit Page and compare Composer HTML source view

Actual Results:
The following characters are replaced by ? in a square:
Ugaritic: 10385, 10386, 10387, 1038C, 1038D etc.
Old Italic: 10302, 10304, 10305, 10307 etc.

Expected Results:
The correct characters are displayed with the glyphs from Code2001.

This seems to be a symptom of an apparent randomness in the selection of fallback glyphs when there is no glyph in the listed fonts. I noted in bug 231889 that the substitute character for U+03DF in Composer seems to depend on the character size chosen.
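For reference, every code point named in the report lies in Plane 1 (U+10000 to U+1FFFF), so each one is a surrogate pair in UTF-16. The following is a minimal Python sketch, not taken from Mozilla's code, that prints the plane, the surrogate pair, and an HTML numeric character reference for the reported code points. The helper name `describe` is hypothetical, and the reference form shown is only one way such a character can be written in page source.

```python
def describe(cp: int) -> dict:
    """Return plane, UTF-16 surrogate pair, and HTML reference for a code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("expected a supplementary-plane code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # trail (low) surrogate
    return {
        "plane": cp >> 16,
        "surrogates": (high, low),
        "ncr": f"&#x{cp:X};",
    }


# Code points listed under "Actual Results" in the report.
reported = [0x10385, 0x10386, 0x10387, 0x1038C, 0x1038D,
            0x10302, 0x10304, 0x10305, 0x10307]

for cp in reported:
    info = describe(cp)
    hi, lo = info["surrogates"]
    print(f"U+{cp:04X}  plane {info['plane']}  "
          f"UTF-16 {hi:04X} {lo:04X}  {info['ncr']}")
```

Any code path that handles the two surrogates separately, or that asks a font about only one of them, cannot find a glyph for such a character, which is one reason surrogate handling is worth checking when Plane 1 text renders inconsistently.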
Can you try the following?

1. Type 'about:config' in the location bar
2. Type 'name-list' to list only pref entries with 'name-list'
3. _Add_ 'Code2001' to
   font.name-list.serif.x-western
   font.name-list.sans-serif.x-western
   font.name-list.monospace.x-western
Keywords: intl
(In reply to comment #1)
> Can you try the following?
>
> 1. Type 'about:config' in the location bar
> 2. Type 'name-list' to list only pref.entries with 'name-list'
> 3. _Add_ 'Code2001' to
> font.name-list.serif.x-western
> font.name-list.sans-serif.x-western
> font.name-list.monospace.x-western

This has fixed the problem for me. Actually I had to add new strings to about:config for the first two; the third one already existed and included Code2001. But this is not a resolution of the bug, just a workaround, as ordinary users cannot be expected to do something this complex, especially if it is not documented.
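The workaround amounts to appending Code2001 to each of three comma-separated font name lists, creating the pref where it does not yet exist. A hedged Python sketch of that edit follows; `append_font` is a hypothetical helper and not a Mozilla API, and the starting value "Courier New, Code2001" for the monospace pref is an illustrative assumption, since the thread does not give the actual values.

```python
def append_font(name_list: str, font: str) -> str:
    """Append `font` to a comma-separated font list unless already present."""
    fonts = [f.strip() for f in name_list.split(",") if f.strip()]
    if font.lower() not in (f.lower() for f in fonts):
        fonts.append(font)
    return ", ".join(fonts)


PREFS = [
    "font.name-list.serif.x-western",
    "font.name-list.sans-serif.x-western",
    "font.name-list.monospace.x-western",
]

# Assumed current values (illustrative only); a missing pref is an empty list.
current = {"font.name-list.monospace.x-western": "Courier New, Code2001"}

updated = {p: append_font(current.get(p, ""), "Code2001") for p in PREFS}
for pref, value in updated.items():
    print(f"{pref} = {value}")
```

The membership check keeps the edit idempotent, so a pref that already lists Code2001 is left unchanged.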
Peter, thanks for testing. You're right that there's something strange going on. rbs, do you have any idea why we have this inconsistency (some characters being displayed while others are not)? If none of them were displayed because Code2001 is not looked into for glyphs (on Linux, Mozilla-Xft does exactly that because fontconfig doesn't return Code2001 in the list of fonts to look into for glyphs [1]), it'd make sense.

[1] We may have to consider 'fixing' this in Gfx:Xft.
Assignee: nobody → jshin
Status: UNCONFIRMED → NEW
Ever confirmed: true
Sorry for spamming. I forgot to reset the component to Gfx:Win. I have to try a debug build with some diagnostic output turned on.
Component: Layout: Fonts and Text → GFX: Win32
As far as Old Italic and Ugaritic are concerned, I can't reproduce this bug on Win2k. Gothic, Deseret, Shavian and Osmanya work fine as well with the Andagii font installed (http://www.i18nguy.com/unicode/unicode-font.html). With only Code2001, Old Italic, Gothic, and Deseret worked fine.
I just hit upon a strange behavior of Mozilla. When I moved back and forth between various Plane 1 script pages at http://www.unicode.org/charts/collation/, some characters turned to question marks. Reloading the page turns them back to their glyphs. This seems related to the 'randomness' Peter observed.
Those plane1 characters, Ugaritic, Old Italic, etc, are behaving strangely on my system (Win2K - EN locale, with Code2001 installed). All I get with a plane1 character is a glyph that looks like "c c" (double c), or a question mark (?). No, it doesn't look alright for me at all, even with code2001 installed, making the bug all the more suspicious.
This is a severe regression. Gone are all the 'proof' screenshots in the original bug 118000 where the support of Plane 1 characters was added.
Blocks: 118000
Keywords: regression
The regression is most likely in platform-specific code (you probably already suspect that), since the test pages display fine for me with GTK2+Xft (and I don't see the problem mentioned in comment 3). Any idea when the regression happened?
I went back to my m1.0 corner and I am still seeing the problem. That's very strange because m1.0 came soon after bug 118000 was resolved. In fact, I hadn't tested plane 1 characters before. With my present misgivings, if it wasn't for the verified/fixed bug 118000, I would have a hard time believing that they have ever worked.
This bug is really 'elusive'. All but Shavian (supported by Code2001) characters are rendered correctly at http://www.i18nguy.com/unicode-plane1-utf8.html. James Kass' example pages work well for me. I really have no clue why all of them are broken for rbs. Neither can I understand why only some of the characters are not rendered correctly on my Win2k. Could it be due to some changes made to Windows 2000 recently? I went up to Win2k SP4, but fell back to SP3. How about you, rbs?
It is SP4 for me. I hope it isn't working (partially) for you because you and maybe shanjian and the other intl guys from bug 118000 are/were on a non-EN locale.
Yeah, that's another difference. I'm gonna test it after switching to EN and various other locales. My Win2k is EN, but at the moment the system default locale is set to KO. I'll test it on Win XP(KO. with other locales) later. BTW, what's your version of usp10.dll (WINNT\System\usp10.dll)? Mine is 1.0405.2416.1 (I thought I had updated it manually to a more recent one in 2002, but it's rather old).
C:\WINNT\system32\usp10.dll : version 1.325.2195.6692
In http://home.att.net/~jameskass/gothictest.htm, I get five boxed question marks in the first line, other characters look like correct Gothic. http://home.att.net/~jameskass/deserettest.htm and http://home.att.net/~jameskass/keybgoth.htm are OK. I have Win2K SP4 English locale, usp10.dll 1.405.2416.1, and Mozilla 1.6 as in the original report of this bug. Maybe the issue is that charset is x-user-defined in http://home.att.net/~jameskass/gothictest.htm, but UTF-8 in http://home.att.net/~jameskass/deserettest.htm. See the explanation at the bottom of the latter page. I did manage to fix the Gothic display by setting font.name-list.serif.x-user-def to Code2000, Code2001, similarly for monospace and sans-serif. Jungshik and rbs, try that.
rbs, your problem (of not being able to see any Plane 1 characters) is almost certainly because you haven't activated Uniscribe on your Win2k. People like James, Peter and me activated it so long ago that we sometimes forget to tell others that it has to be activated. See Tex's write-up on the topic at http://www.i18nguy.com/surrogates.html. The easiest way to do that is to install a language support pack for any of the complex scripts such as Devanagari, Thai, or Tamil (in Control Panel, Regional Options). Of course, you can edit the registry directly if you like.

The default system locale was zh-CN when I wrote comment #13 (I forgot I had switched to zh-CN before). I tested it with the default system locale KO and EN-US and got exactly the same result. All of James' pages (Gothic, Deseret and Gothic keyboard) work perfectly well (x-user-defined or not), but Shavian in Tex's page is still problematic and the Unicode collation tables get me really confused. It may change with a new font. I have yet to add a few debug statements to see what's really going on.

OT: as for the difference between my experience and dbaron's with Mozilla-Xft on Linux (comment #3 and comment #9), it comes from the difference in locales. He ran Mozilla under the en_US(.UTF-8) locale while I ran it under the ko_KR.UTF-8 locale. fontconfig's font search/match depends on 'lang'. For UTF-8 encoded pages like the Unicode collation table, Mozilla falls back to the locale if 'lang' is not explicitly specified. Because Code2001 covers US-ASCII characters, fontconfig regards it as supporting en-US and returns it to Mozilla along with other fonts. However, by no stretch can Code2001 be considered to support Korean, so it's not included in the list of fonts returned by fontconfig, and I get question marks for Plane 1 characters because no font in the list covers them. (See bug 232716 and other bugs mentioned there.)
> certainly due to that you haven't activated Uniscribe on your Win2k

Yeah, that was the problem. I edited the registry and I can now see those non-BMP characters.
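The locale-dependent behaviour described for Mozilla-Xft can be illustrated with a toy model: the font list is first filtered by language, and only the surviving fonts are searched for a glyph. This Python sketch is a deliberate simplification and not fontconfig's actual matching algorithm. The coverage ranges, the language sets, and the font name `HypotheticalKoreanFont` are invented for illustration.

```python
# Toy model only: real fontconfig matching is considerably more involved.
FONTS = {
    # font name: (languages assumed supported, code point ranges assumed covered)
    "Code2001": ({"en"}, [(0x20, 0x7E), (0x10300, 0x1039F)]),
    "HypotheticalKoreanFont": ({"en", "ko"}, [(0x20, 0x7E), (0xAC00, 0xD7A3)]),
}


def candidate_fonts(lang: str) -> list:
    """Fonts returned for a language; others are never searched for glyphs."""
    return [name for name, (langs, _) in FONTS.items() if lang in langs]


def find_glyph_font(cp: int, lang: str):
    """Return the first candidate font covering `cp`, or None (question mark)."""
    for name in candidate_fonts(lang):
        ranges = FONTS[name][1]
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


for lang in ("en", "ko"):
    print(lang, find_glyph_font(0x10385, lang))
```

In this model the same Ugaritic code point resolves to Code2001 under an English locale and to nothing under a Korean one, which matches the difference between the two Linux observations in the thread.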
*** Bug 316408 has been marked as a duplicate of this bug. ***
Should this be resolved INVALID (not a bug in Mozilla)?
I bet it is WORKSFORME on trunk anyway, even without registry changes.
As the original reporter, I am trying to see if it now WORKSFORME on my current Windows XP system with Firefox trunk 2.0.0.11 and Uniscribe enabled. The trouble is that the original test pages have disappeared. But the Unicode Plane 1 test pages linked to at http://www.i18nguy.com/surrogates.html seem to display OK for me. At http://homepage.mac.com/thgewecke/BeyondBMP.html I get question marks for Old Persian, but I guess that is because my (newer?) version of Code2001 has Old Persian encoded at its Unicode code points and not in the Plane 15 private use area.
You can also use http://alanwood.net/unicode/linear_b_syllabary.html and onwards for testing.
(In reply to comment #22)
> You can also use http://alanwood.net/unicode/linear_b_syllabary.html and
> onwards for testing.

Thanks, Simon. On my system all characters from Linear B to Phoenician display OK, from Kharoshthi to Musical Symbols don't (mostly question marks displayed), from Ancient Greek Musical Notation to CJK Unified Ideographs Extension B are OK, CJK Compatibility Ideographs Supplement are not. I think this largely corresponds to the coverage in Code2001, although that does not have the CJK glyphs which are presumably being picked up from another font.
(In reply to comment #21)
> Trying to see if it now WORKSFORME, the original reporter, on my current
> Windows XP system with Firefox trunk 2.0.0.11 and Uniscribe enabled.

2.0.0.11 is not trunk. What I would really like to know is whether it works on a nightly build or Firefox 3 Beta on Windows XP *without* Uniscribe enabled.
Product: Core → Core Graveyard
I suspect this bug has long since been fixed, but I do not have a sufficiently old Windows machine to test it on. Relabeling Core:Graphics for triage.
Assignee: jshin1987 → nobody
Status: NEW → UNCONFIRMED
Component: GFX: Win32 → Graphics
Ever confirmed: false
Product: Core Graveyard → Core
QA Contact: layout.fonts-and-text → thebes
Works for me on trunk.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME