Closed Bug 127604 Opened 23 years ago Closed 16 years ago

Seventh letter in Russian alphabet displayed incorrectly

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

RESOLVED WONTFIX
mozilla1.2alpha

People

(Reporter: bzbarsky, Assigned: roland.mainz)

Details

(Keywords: intl)

Attachments

(4 files, 1 obsolete file)

BUILD: Linux 2002-02-24-06 nightly

STEPS TO REPRODUCE:
1)  Load http://www.google.com.ru/
2)  Look at the next-to-last letter on the right-hand button below the textfield

ACTUAL RESULTS:  Letter looks like a pound sign (as in the British currency)

EXPECTED RESULTS:  Letter looks like an "e" with an umlaut over it

Same problem happens on
http://www.auburn.edu/academic/liberal_arts/foreign/russian/RWT-audio/alphabet/read-russian.html
Here you need to scroll down to the text that says "Now that you have learned
all of the letters in the Russian alphabet, you should be ready to try these
listening exercises".  Above this, on the left, there is a pink area that lists
the letters one by one.  The third-to-last letter shows a superscript 3 and a
pound sign instead of E and e with umlauts over them.
Keywords: intl
QA Contact: ruixu → ylong
ylong: is this regression?
I should note that this worksforme when running the same exact mozilla build on
a different X server (one that does not have decent cyrillic fonts installed). 
So this could be a font-specific issue...
> ylong: is this regression?

Seems not - I saw same problem on N6.2.1.
related to X font server
over to shanjian, cc bstell
Assignee: yokoyama → shanjian
This is broken for me in builds going back to 2001-05-15-08 (the oldest I have
on hand). So certainly not a recent regression.
accept. 
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.2
In attachment 71273 [details] the circle includes "superscript 3" glyph, a comma glyph, 
and the "Pound sign" glyph.

Are the first two glyphs expected?

Assuming the superscript-3 and comma are unexpected:

Can they be selected/highlighted separately?

If you copy-n-paste them into a page where the encoding is set to western,
does the source show these as separate chars and/or NCRs?

What are the values of the chars / NCR(s)?
The expected glyphs in that screen shot would look like:

1) E with an umlaut over it
2) comma
3) e with an umlaut over it

So the unexpected glyphs are the superscript-3 and pound-sign.  The comma is
expected.  All three glyphs can be selected/highlighted/copied separately.

The source has these as separate chars. The values are:

superscript-3 -- 0xB3
pound sign    -- 0xA3
(kindly forgive my bumbling here, I'm trying to determine if this is a 
converter problem or a font problem)

I see that http://www.google.com.ru/ has this:
META HTTP-EQUIV="content-type" CONTENT="text/html; charset=windows-1251"

Shanjian: are these correct for cp1251?
  superscript-3 -- 0xB3
  pound sign    -- 0xA3

Boris: can you set the environment variable NS_FONT_DEBUG=3D, display
http://www.google.com.ru/, and attach the output to this bug?
Can you also confirm that the encoding menu shows cp1251?
The "View" menu has no character coding marked in it at all (and never does,
these last few weeks, for pages that use a <meta> charset), but page info shows
that the google page is indeed being treated as windows-1251.

Note that the other page I cite (the one with the pink background) is in KOI-8,
not Windows-1251.
There were 18 fonts loaded: 
(I do not show the 10 iso8859-1 fonts):

loaded -cronyx-helvetica-medium-r-normal--17-120-100-100-p-67-koi8-r
loaded -cronyx-helvetica-bold-r-normal--17-120-100-100-p-67-koi8-r
loaded -cronyx-helvetica-bold-r-normal--14-100-100-100-p-56-koi8-r
loaded -cronyx-helvetica-medium-r-normal--14-100-100-100-p-56-koi8-r
loaded -cronyx-helvetica-medium-r-normal--17-120-100-100-p-67-koi8-r
loaded -cronyx-helvetica-medium-r-normal--11-80-100-100-p-46-koi8-r
loaded -misc-fixed-medium-r-normal--13-120-75-75-c-70-iso8859-13
loaded -adobe-symbol-medium-r-normal--10-100-75-75-p-61-adobe-fontspecific

This seems a bit too much data.

Boris: could you make a simple page that just has the problematic chars,
then attach the NS_FONT_DEBUG=3D output? (Perhaps you could start with the
google page and strip it down to the bare minimum.) It would reduce the
output if you specified the page on the command line something like this:

  ./mozilla file:///some_dir/testfile.html > output


Re: comment #11:

The google site has an 0xB8 for the pound-sign (windows-1251).
The other site has an 0xA3 for the pound-sign (KOI8-R).

Looks like whatever unicode char we get out misrenders the same way in both
cases....
This log was done by turning off all toolbars and the sidebar and then loading
a page that was encoded in Windows-1251 and had just a single 0xB8 character.
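
A minimal sketch of how such a single-character test page could be generated, assuming Python 3 is at hand. The file name `testfile.html` follows the command-line suggestion earlier in this bug; the function name `build_test_page` is hypothetical and not part of any Mozilla tooling.

```python
from pathlib import Path


def build_test_page() -> bytes:
    """Return a minimal HTML page declared as windows-1251 whose body is the single byte 0xB8."""
    head = (
        b"<html><head>"
        b'<meta http-equiv="content-type" '
        b'content="text/html; charset=windows-1251">'
        b"</head><body>"
    )
    # The raw byte is written directly so that no editor or locale can re-encode it.
    return head + b"\xb8" + b"</body></html>\n"


if __name__ == "__main__":
    # Writes the page into the current directory (illustrative file name).
    Path("testfile.html").write_bytes(build_test_page())
```

The resulting file can then be passed to the browser as a `file:///` URL so that the debug log only covers the one problematic character.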
Attachment #80033 - Attachment is obsolete: true
it looks like the font of interest is:
loaded -cronyx-helvetica-medium-r-normal--17-120-100-100-p-67-koi8-r

> The google site has an 0xB8 for the pound-sign (windows-1251).
> The other site has an 0xA3 for the pound-sign (KOI8-R).

Did you get these by pasting them into a moz page set to iso8859-1?
(I'm trying to find what moz thought the character was).

If the unicode value in the doc is a 0xA3 moz should display as a pound-sign.
If the unicode value in the doc is a 0xB8 moz should display as a cedilla.

If we have the wrong unicode character value in the doc we should look at the 
input converter.

If we have the right character but wrong glyph then we need to look at the
font converter and/or the font.
> Did you get these by pasting them into a moz page set to iso8859-1?

No, that's what they are in the raw source of the relevant pages.  Sorry about
that...

If I take the characters from the
http://www.auburn.edu/academic/liberal_arts/foreign/russian/RWT-audio/alphabet/read-russian.html,
highlight it, open this bug page in Editor, and then paste, I get the following:

superscript-3 gets pasted as "&#1025;"
pound-sign gets pasted as "&#1105;"

They also paste as the correct glyphs.  That is, I highlight the pound sign,
copy, paste it into editor, and it pastes as an "e" with an umlaut over it.

Sounds like the problem is in the font converter or the font (and also sounds
like the glyph is very obviously available _somewhere_ since editor shows it...)
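
For reference, the two pasted NCRs can be resolved to their Unicode names with a short Python sketch (standard library only; the helper name `describe_ncr` is hypothetical). It suggests that the document really does hold capital and small IO, i.e. the expected letters, which points away from the input converter.

```python
import html
import unicodedata


def describe_ncr(ncr: str) -> tuple[str, str]:
    """Resolve a numeric character reference to (code point, Unicode name)."""
    ch = html.unescape(ncr)
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


if __name__ == "__main__":
    for ncr in ("&#1025;", "&#1105;"):
        print(ncr, *describe_ncr(ncr))
```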
Re comment #11:
These are the values for A3 and B3 for CP1251:
=A3	U+0408	CYRILLIC CAPITAL LETTER JE
=B3	U+0456	CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I

CP1251 doesn't seem to support the superscript 3, and as for pound, do you
mean the British currency or the pound/hash/sharp sign?  The latter is an ASCII
character so it's 23 hex.
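
The byte values discussed in this bug can be cross-checked against Python's built-in codecs. This is only a sketch: it assumes Python's `cp1251`, `koi8_r` and `latin_1` tables match the encodings in question, and the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(byte_value: int, encoding: str) -> str:
    """Decode one byte in the given encoding and return 'U+XXXX NAME'."""
    ch = bytes([byte_value]).decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for encoding in ("cp1251", "koi8_r", "latin_1"):
        for byte_value in (0xA3, 0xB3, 0xB8):
            print(f"{encoding:8} 0x{byte_value:02X} -> {describe(byte_value, encoding)}")
```

Under these tables, 0xB8 in windows-1251 and 0xA3 in KOI8-R both decode to the small IO letter, while 0xA3 and 0xB3 read as Latin-1 give the pound sign and the superscript three seen on screen.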
It appears to be the font/font-converter.
Can we get a screen shot of xfd displaying
-cronyx-helvetica-medium-r-normal--17-120-100-100-p-67-koi8-r?
Attached image screenshot
Wow.  Didn't know this program existed.  :)

Looks like the 0xA3 and 0xB3 chars are precisely what's being shown in this
case... So is this a font bug or are those not the right unicode values? 
(Note: this font does not seem to have the correct chars in it at all).  If
it's just the font, I'm still a little confused by why copying the "wrong" char
and pasting into composer pastes the "right" char (which must be coming from a
different font, I guess).
So this is a font bug. And yes, composer must use a different font. I will close
it as invalid. 
(We can't do much about such junk fonts. They should be removed from the user's
system.)
Thanks Brian for helping me resolve this bug. Thanks Boris for your effort too. 
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → INVALID
> So is this a font bug or are those not the right unicode values? 

For the xfd screen shot Unicode was never used and so is not part of the issue.

I am not an expert on Cyrillic encodings but this looks like a font bug since 
several sources show KOI8-R has an "'e' with a double dot above" at 0xA3:
http://koi8.pp.ru/main.html / http://koi8.pp.ru/koi8-r.gif
http://czyborra.com/charsets/cyrillic.html

Ftang: would you kindly comment on this?

> I'm still a little confused by why copying the "wrong" char and pasting into 
> composer pastes the "right" char (which must be coming from a different font, 
> I guess)

Exactly, inside moz it is always the same Unicode character. When displaying it
moz trusts that the registry-encoding in the font's XLFD is accurate and moz 
converts the Unicode value to the equivalent value for the font's 
registry-encoding. If the font has glyphs that do not match the 
registry-encoding moz has no way to detect this. It appears that the font 
selected in the editor is different from the font in the page. Do you get the
same result if you set the editor encoding to cp1251 before pasting?
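
The mechanism described above can be modelled roughly in Python. This is a sketch, not Mozilla's actual font code: the names `registry_encoding`, `glyph_shown` and the `CODECS` table are hypothetical, and the bad font is approximated, as a simplification, by assuming its glyph slots follow the Latin-1 layout even though its XLFD claims koi8-r.

```python
# Hypothetical mapping from XLFD registry-encoding to a Python codec name.
CODECS = {"koi8-r": "koi8_r", "iso8859-1": "latin_1"}

FONT = "-cronyx-helvetica-medium-r-normal--17-120-100-100-p-67-koi8-r"


def registry_encoding(xlfd: str) -> str:
    """Return the last two XLFD fields, e.g. 'koi8-r'."""
    fields = xlfd.split("-")
    return f"{fields[-2]}-{fields[-1]}"


def glyph_shown(ch: str, xlfd: str, actual_layout: str) -> str:
    """Model what appears on screen for one character.

    The renderer trusts the XLFD and converts the Unicode character to the
    claimed encoding; the font then draws whatever glyph it really has at
    that byte position (modelled by decoding with the font's actual layout).
    """
    code = ch.encode(CODECS[registry_encoding(xlfd)])
    return code.decode(CODECS[actual_layout])


if __name__ == "__main__":
    for ch in ("\u0401", "\u0451"):  # capital and small IO
        print(ch, "->", glyph_shown(ch, FONT, "iso8859-1"))
```

With a font whose glyphs really match koi8-r (`actual_layout="koi8-r"`), the same call returns the original character, which is why another font can display the letter correctly.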

It is true that the font is bad.

It is also true that mozilla's display is incorrect.

Perhaps we should reopen this bug, make it dependent on the "invalid font" code
bug 117877, and when that is working then mark this -font- as invalid.
> Do you get the same result if you set the editor encoding to cp1251 before
> pasting?

Aha!  If I do that, the pasted char is a pound-sign. So it's definitely a font bug.
this actually is a bug so I'm reopening
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
this bug depends on bug 117877 which is owned by Roland
Assignee: shanjian → Roland.Mainz
Status: REOPENED → NEW
Depends on: 117877
(In reply to comment #23)
> It is true that the font is bad.

The Cronyx font family originally did not have CAPITAL YO/SMALL YO glyphs; they were added later, and XFree86 now ships the "correct" ones.  They are, however, still incomplete -- compare the xfd screenshot with the official rendering: http://koi8.pp.ru/koi8-r.gif
WONTFIX obsolete X core fonts bug.
Status: NEW → RESOLVED
Closed: 22 years ago → 16 years ago
Resolution: --- → WONTFIX