Closed Bug 111728 (tec-osx) Opened 23 years ago Closed 22 years ago

TEC causes problems with Greek, Cyrillic and some Latin chars (Icelandic, Polish, Czech)

Categories

(Core :: Internationalization, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: hsivonen, Assigned: ftang)

Details

(Keywords: intl)

Attachments

(8 files, 3 obsolete files)

Build ID: 2001112020 FizzillaCFM

Steps to reproduce:
1) Load a page that contains eth, thorn, or l with stroke, e.g.
http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html

Actual results:
Even if the surrounding text is Times, some letters with stroke, eth and thorn
are rendered using another font. The other font has over-wide kerning for Latin
text, which suggests its characters were designed to match the width of
Han/Hangul/Kanji blocks. I don't remember seeing this with Mac OS X 10.0.x which
had Japanese fonts but no Korean or Chinese fonts. Also, the problematic chars
don't look like chars from the Hiragino fonts. That's why I suspect the
characters come from Hei, AppleGothic or Apple LiGothic Medium that came with
Mac OS X 10.1.

Expected results:
Since Times, Helvetica etc. contain glyphs for the characters in question,
expected the usual Latin fonts to be used.

Additional information:
Affected languages include Icelandic, Polish and Old English.
MacOSX->nhotta
cc'ing shanjian
Assignee: yokoyama → nhotta
I am not sure if Frank wants to do this after he comes back.
Accept for now and set 0.9.9.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
Depends on: 105137
No longer depends on: 105137
Maybe we should disable TECFallback and let ATSUI fallback kick in directly. Not
sure about the performance impact. See also bug 111731, which is the same issue.
Keywords: intl
QA Contact: teruko → ylong
Target Milestone: mozilla0.9.9 → mozilla1.2
*** Bug 149689 has been marked as a duplicate of this bug. ***
See also bug 148361 and bug 150485 on the Mac Mozilla product.
As ftang suspected, Apple's legacy Text Encoding Converter is the culprit. When
TEC is disabled, the right fonts are used for Icelandic, Old English, Central
European, Greek and Cyrillic characters. However, for some reason the Latin
chars in question don't get CoreGraphics anti-aliasing.

Attaching a patch that turns off TEC.

Problem: With the patch, I'm seeing severe problems with wrong glyphs showing
up semi-randomly at http://www.cs.tut.fi/~jkorpela/html/guide/entities.html
*** Bug 148361 has been marked as a duplicate of this bug. ***
*** Bug 150485 has been marked as a duplicate of this bug. ***
*** Bug 111731 has been marked as a duplicate of this bug. ***
Alias: tec-osx
Summary: Chinese or Korean font used for some Latin chars → TEC causes problems with Greek, Cyrillic and some Latin chars (Icelandic, Polish, Czech)
I suspect Mozilla is treating pages encoded as ISO-8859-1 somehow differently
from pages using another encoding. Or at least I can't come up with another
explanation for the problems I'm seeing with
http://www.cs.tut.fi/~jkorpela/html/guide/entities.html

Can anyone more familiar with the code confirm whether my assumption is correct?
Keywords: review
It's probably not about any special treatment of ISO-8859-1, but about
non-MacRoman chars getting cached somewhere in such a way that the
first-rendered char gets repeated when Mozilla is supposed to draw the next
char. After scrolling, the right glyph appears when I select a char. Then that
char is repeated when I select other chars until I scroll again.

The patch isn't ready for review until the problems with
http://www.cs.tut.fi/~jkorpela/html/guide/entities.html are solved.
Keywords: review
The ATSUStyle needs to be re-applied to an ATSUTextLayout after the text
pointer has been updated. Attaching a new patch that fixes the issue.

Looking for r=.
Attachment #93014 - Attachment is obsolete: true
Keywords: review
Blocks: 159809
The ATSUTextLayout fix looks good. It might address one of the big
problems I have faced recently. I don't think we can just turn TEC fallback off
this way.
I spun off the ATSUI issue into bug 160001.
Henri Sivonen:
1. I don't want to take the patch that disables TEC fallback. It would slow down
Japanese/Chinese/Korean display.
2. I want to take the ATSUI fix. I filed bug 160001 for that.
3. I think we should skip TEC fallback for some characters to meet your need.
How about this: I will add a fast check and skip TEC fallback if the character
is Latin, Cyrillic or Greek. Will that solve your problem?
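
A minimal sketch of what such a fast check could look like, assuming the test
is done per UTF-16 code unit against Unicode block ranges. The name
ShouldSkipTECFallback and the exact set of ranges are hypothetical and are not
taken from the patch attached to this bug. ASCII is left out on the assumption
that it is always encodable in MacRoman and never reaches the fallback path.

```cpp
#include <cstdint>

// Hypothetical helper, not the code from the actual patch.
// A closed range of UTF-16 code units belonging to one or more Unicode blocks.
struct UnicodeRange {
  std::uint16_t first;
  std::uint16_t last;
};

// Blocks for which TEC fallback would be bypassed (illustrative selection).
static const UnicodeRange kSkipTECRanges[] = {
  {0x0080, 0x024F},  // Latin-1 Supplement, Latin Extended-A, Latin Extended-B
  {0x0370, 0x03FF},  // Greek and Coptic
  {0x0400, 0x04FF},  // Cyrillic
  {0x1E00, 0x1EFF},  // Latin Extended Additional
  {0x1F00, 0x1FFF},  // Greek Extended (polytonic Greek)
};

// Returns true when the code unit is Latin, Greek or Cyrillic, so the caller
// can skip TEC fallback and let ATSUI choose the glyph instead.
inline bool ShouldSkipTECFallback(std::uint16_t ch) {
  for (const UnicodeRange& r : kSkipTECRanges) {
    if (ch >= r.first && ch <= r.last) {
      return true;
    }
  }
  return false;
}
```

With these ranges, thorn (U+00FE), l with stroke (U+0142), Greek alpha (U+03B1)
and Cyrillic zhe (U+0416) skip TEC, while a Han character such as U+4E2D still
goes through it.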
reassign to ftang
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Attachment #93064 - Attachment is obsolete: true
Blocks: 157673
Status: NEW → ASSIGNED
Keywords: nsbeta1+
Comment on attachment 93221 [details] [diff] [review]
patch with fix in bug 160001 obsolete last patch

r=nhotta

Please fix indentation.
Attachment #93221 - Flags: review+
Do we know _why_ TEC has issues with Greek, Latin and Cyrillic? This code is
complex enough as it is. Adding yet more weird and wonderful special cases
scares me.
> How about this: I will add a fast check and skip TEC fallback if the
> character is Latin, Cyrillic or Greek. Will that solve your problem?

Yes, that will solve the problem with the duplicates of this bug. However, would
it be better to skip TEC for anything except Chinese, Japanese and Korean
instead of skipping TEC for Latin, Greek and Cyrillic?
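
For comparison, the alternative raised in the question above (keep TEC only for
CJK and skip it for everything else) could be sketched as follows. The function
names and the choice of blocks are assumptions made for illustration; nothing
here comes from the Mozilla source.

```cpp
#include <cstdint>

// Hypothetical helper: true for code units in the main CJK-related blocks.
inline bool IsCJKCodeUnit(std::uint16_t ch) {
  return (ch >= 0x3000 && ch <= 0x30FF)    // CJK Symbols and Punctuation,
                                           // Hiragana, Katakana
      || (ch >= 0x4E00 && ch <= 0x9FFF)    // CJK Unified Ideographs
      || (ch >= 0xAC00 && ch <= 0xD7AF)    // Hangul Syllables
      || (ch >= 0xFF00 && ch <= 0xFFEF);   // Halfwidth and Fullwidth Forms
}

// Inverse policy to a Latin/Greek/Cyrillic skip list: TEC fallback is used
// only for CJK, and every other script goes straight to ATSUI.
inline bool ShouldUseTECFallback(std::uint16_t ch) {
  return IsCJKCodeUnit(ch);
}
```

The trade-off is that this version also changes behaviour for scripts nobody in
this bug has tested, such as Thai (for example U+0E01), which would no longer
go through TEC.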
sfraser wrote:
>Do we know _why_ TEC has issues with Greek, Latin and Cyrillic?
Yes, we know the reasons:
1. TEC fallback will pick the Greek, Latin or Cyrillic glyph from a Japanese,
Chinese or Korean font, since CJK fonts contain some Greek, Latin and Cyrillic
glyphs. So if the font face is "Times New Roman" and a character is in
"Times New Roman" but cannot be encoded in MacRoman (such characters are part
of WGL4 but not part of MacRoman), then, if a CJK script is installed, we
will use a CJK font to render it. With the patch, we skip the CJK fonts and
let ATSUI fallback use the "Times New Roman" font to render it.
2. Starting from a certain version, TEC converts some characters using
compatibility mappings. For example, ISO-8859-1 has three characters: 1/4, 1/2
and 3/4. Starting from a particular version of TEC (1.5?), it converts each of
them to three characters: '1' + '/' + '4', '1' + '/' + '2', '3' + '/' + '4'.
But since "Times New Roman" has a '1/4' glyph, it is better to render it with
1 glyph instead of 3 glyphs.
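
The compatibility mapping in point 2 can be illustrated with a small sketch.
The decomposition table below follows the Unicode compatibility decompositions
of the three fractions, which use U+2044 FRACTION SLASH where the comment
writes '/'. Whether TEC emits U+2044 or a plain slash is not stated in this
bug. The function names and the fontHasGlyph parameter are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Unicode compatibility decompositions of the three Latin-1 vulgar fractions.
// Any other code unit is returned unchanged.
inline std::vector<std::uint16_t> CompatDecompose(std::uint16_t ch) {
  switch (ch) {
    case 0x00BC: return {0x0031, 0x2044, 0x0034};  // one quarter
    case 0x00BD: return {0x0031, 0x2044, 0x0032};  // one half
    case 0x00BE: return {0x0033, 0x2044, 0x0034};  // three quarters
    default:     return {ch};
  }
}

// Hypothetical policy matching the reasoning above: if the requested font has
// a glyph for the precomposed character, draw that single character and only
// fall back to the three-character sequence when it does not.
inline std::vector<std::uint16_t> CharsToDraw(std::uint16_t ch,
                                              bool fontHasGlyph) {
  if (fontHasGlyph) {
    return {ch};
  }
  return CompatDecompose(ch);
}
```

With a font that has the one-quarter glyph, CharsToDraw(0x00BC, true) yields one
character to draw; without it, the result is three.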

>However, would it be better to skip TEC for anything except Chinese, 
>Japanese and Korean instead of skipping TEC for Latin, Greek and Cyrillic?
Why would it be better? Please list the reasons.
Why is skipping TEC and using ATSUI better for Thai?
Why is skipping TEC and using ATSUI better for Devanagari?
Why is skipping TEC and using ATSUI better for other Mac scripts?
Why is skipping TEC and using ATSUI better for punctuation marks?
Why Times New Roman? I can't think of a *worse* font. Lucida Grande is far more
suited for such a replacement, because 1) it is the default Mac OS X font, and
2) it is a linear font that displays very well on computer screens (and is also
more in line with CJK calligraphy than copies of roman stone engravings (serif
fonts)).

On the other hand, I don't even understand why this is still discussed. Using
ATSUI would solve a bunch of outstanding bugs right away. Use of ATSUI is really
to be expected from serious Mac OS X applications. It is currently the number
one Mozilla drawback.

Performance hit? Jaguar.
Re: Comment #21
> Why would it be better? Please list the reasons.

I don't know whether ATSUI one char at a time would be better for all non-CJK,
which is why I asked. I thought that perhaps Latin, Greek and Cyrillic aren't
the only scripts TEC causes trouble with. (I suppose one-char-at-a-time ATSUI
won't work with scripts that require contextual glyph selection or something
similar.)

Re: Comment #22
> Why Times New Roman?

Times, actually. Or Helvetica. Or any Latin font that has glyphs for chars that
aren't in the MacRoman repertoire. As far as the Latin script is concerned this
bug is about avoiding the font change when the font that is used for a-z also
has the other required chars.

> Lucida Grande is far more suited for such a replacement,

If the suggested font doesn't have the required glyphs, ATSUI will use Lucida
Grande if it has the glyphs.

> On the other hand, I don't even understand why this is still discussed. 
> Using ATSUI would solve a bunch of outstanding bugs right away. 

It would, yes. And I hope Mozilla will move to full throttle ATSUI. However,
fixing this bug makes sense in the interim, because we can have this fix now,
and moving to full throttle ATSUI will take more time.

> Use of ATSUI is really to be expected from serious Mac OS X applications.

Indeed.
> 1. TEC fallback will pick the Greek, Latin or Cyrillic glyph from a Japanese,
> Chinese or Korean font, since CJK fonts contain some Greek, Latin and Cyrillic
> glyphs. So if the font face is "Times New Roman" and a character is in
> "Times New Roman" but cannot be encoded in MacRoman (such characters are part
> of WGL4 but not part of MacRoman), then, if a CJK script is installed, we
> will use a CJK font to render it. With the patch, we skip the CJK fonts and
> let ATSUI fallback use the "Times New Roman" font to render it.

Thank you for the explanation. This kind of information needs to go into
comments in the code, for every bit of special-casing that's in there. If
someone wants to come along and clean that code up, they'll need all this
information. Indeed, I'll need it for the ATSUI patch, at some point.


> 2. Starting from a certain version, TEC converts some characters using
> compatibility mappings. For example, ISO-8859-1 has three characters: 1/4, 1/2
> and 3/4. Starting from a particular version of TEC (1.5?), it converts each of
> them to three characters: '1' + '/' + '4', '1' + '/' + '2', '3' + '/' + '4'.
> But since "Times New Roman" has a '1/4' glyph, it is better to render it with
> 1 glyph instead of 3 glyphs.
>

Can you get this information by calling TECGetInfo() ?
Blocks: 160317
I agree with you that now seems a good time to go full ATSUI. When we wrote
that part of the code, we tried ATSUI first, and the performance and quality
were very, very, very bad in 1999. I think that with today's hardware and
Mac OS X ATSUI, things have changed a lot.
I think sfraser is working on that ATSUI issue. So... please focus; this is a
minor tweak until that is ready.
Having worked a lot on some of the problematic encodings (MacIcelandic,
MacCroatian, MacTurkish etc.), I would like to recommend that you go for a full ATSUI
implementation.

Covering all those special cases is very messy and, believe me, it will never look right.
Trying to find fallback glyphs from various fonts is not easy, and in the end it will result in
more work than just deciding to drop it and go for ATSUI.

As a side note, Apple will be introducing Unicode-only keyboards in Jaguar, so forms
must also support Unicode input so that users from countries like Iceland and Turkey
can use Google etc.
>I would like to recommend that you go for a full ATSUI
implementation.
That won't have any impact on this micro fix.

>As a side note, Apple will be introducing Unicode-only keyboards in Jaguar,
> so forms must also support Unicode input so that users from countries
> like Iceland and Turkey can use Google etc.
This issue is not related to this bug at all.
We already support Unicode-only keyboard input today.
This issue won't impact this fix at all.

here is sfraser's comment:

Please revise the patch by adding a comment explaining these
special-case exceptions (as in your bugzilla comment).

Thanks
Simon


This patch needs to be updated after merging with the MathML landing.
*** Bug 161979 has been marked as a duplicate of this bug. ***
*** Bug 109461 has been marked as a duplicate of this bug. ***
sfraser, please sr=
Attachment #93221 - Attachment is obsolete: true
Comment on attachment 95131 [details] [diff] [review]
updated atch v1.1

r=nhotta

typo?
"have houndred more glyph"
Attachment #95131 - Flags: review+
Comment on attachment 95131 [details] [diff] [review]
updated atch v1.1

sr=sfraser. Now if only all the other special-casing had the same level of
comments.  :-)
Attachment #95131 - Flags: superreview+
Keywords: reviewapproval
Fixed and landed on the trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
ftang: Thank you. 

I verified that the right fonts are now used for alphabetic characters in the
cases described in this bug and in the duplicates. Even polytonic Greek now
works given a proper font. However, the display of character U+21B5 seems to
have regressed at some point since Chimera branched. (Probably unrelated.) 

I filed bug 163085 about the use of QD rasterizing I mentioned in comment #6.
Attached image screenshot #1
Attached image screenshot 2
OK, don't have any experience in building, patching etc., but if I got it
correctly, then this should be fixed in the nightlies. To me it is _improved_,
but definitely not fixed. There still seems to be some sort of problem with
Polish glyphs (looks to me more like normal vs. bold difference, than wrong font
used, though I might be wrong). Please have a look at the following attachments
(filed a minute ago):

1. http://www.linuxnews.pl - the first column of the page renders wrong
characters (in terms of size); marked them with red circles. BUT the second
column of the same page renders them perfectly! (marked with green circles).

2. http://www.google.com.pl/language_tools?hl=pl - as in the previous one, the
Polish glyphs seem to be too big (and too thin).

3. (not attached) have a look at the originally mentioned unicode testpage at
http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html - it looks like
glyphs are for some reason bigger/bolder than regular latin letters.
lehu@it.pl, there are two issues, both of which are out of scope of this bug report.

First, the Polish characters that aren't in the MacRoman repertoire are being
rendered using the old QuickDraw rasterizer instead of the new Core Graphics
rasterizer (163085).

The other problem is with the choice of font. Linux News and Google suggest
Arial as their first font choice. However, the version of Arial that comes with
OS X isn't the same version of Arial that comes with Windows XP. The version
that comes with OS X is limited to MacRoman while the version that comes with
Windows XP has glyphs for virtually every language that uses Latin characters.
The ATSUI fallback code seems to get only the first font choice and if the
required glyph isn't in that font, ATSUI will do the fallback on its own without
paying attention to the secondary font specified in CSS. (The general fallback
font is Lucida Grande which comes only as regular and bold--not italic.) So the
pages would look better if they had specified Helvetica as the first choice or
if Mozilla could pass all the CSS font suggestions to ATSUI.

I don't think a bug has been filed yet on this latter issue.
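
The font-choice problem described above can be modelled with a short sketch
that compares "try only the first CSS family" with "walk the whole CSS list".
This is a toy model, not Mozilla or ATSUI code: the Coverage map, the function
names and the coverage data used in the usage note are all made up for
illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: font family name -> set of characters the font has glyphs for.
using Coverage = std::map<std::string, std::set<char32_t>>;

// True if the named family is known and has a glyph for ch.
inline bool Covers(const Coverage& fonts, const std::string& family,
                   char32_t ch) {
  auto it = fonts.find(family);
  return it != fonts.end() && it->second.count(ch) > 0;
}

// Behaviour described in the comment: only the first CSS family is tried,
// then the system-wide fallback font takes over.
inline std::string PickFirstOnly(const Coverage& fonts,
                                 const std::vector<std::string>& cssFamilies,
                                 char32_t ch,
                                 const std::string& systemFallback) {
  if (!cssFamilies.empty() && Covers(fonts, cssFamilies.front(), ch)) {
    return cssFamilies.front();
  }
  return systemFallback;
}

// Suggested behaviour: try every CSS family in order before falling back.
inline std::string PickFromWholeList(const Coverage& fonts,
                                     const std::vector<std::string>& cssFamilies,
                                     char32_t ch,
                                     const std::string& systemFallback) {
  for (const std::string& family : cssFamilies) {
    if (Covers(fonts, family, ch)) {
      return family;
    }
  }
  return systemFallback;
}
```

Given a hypothetical Arial that lacks l with stroke (U+0142) and a Helvetica
that has it, a style of "Arial, Helvetica" ends up in Lucida Grande with the
first function and in Helvetica with the second.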
Attached image Bug persists...
Dunno why this bug is considered fixed. It is not, not in any way.

The attachment shows the word "fuÞarker" (from the page
http://homepage.mac.com/nikd/dvd/lotr_fellowship_of_the_ring/), and it is still
rendered in obsolete MacRoman fashion. The first font choice in the style sheet
is indeed Lucida Grande, which is also the default font I use for browsing.
Non-4GL fonts from M$ do not exist on my system.

It is also the roman font variant, neither bold nor italic.

So what gives? No matter how superior Mozilla is to Omniweb in most regards,
this is exactly the kind of stuff that makes Omniweb users laugh their butts
off...

(Btw, naming the attached file "fuÞarker.png" made it impossible to upload, so
I had to rename it to "futharker.png"... *so* trapped in legacy MacRoman.)
nikd@mac.com, what build did you take that screenshot in? Using
FizzillaCFM/2002081803, I view the code:

<p style="font: 12pt 'Lucida Grande';">fu&thorn;arker</p>

rendered apparently all in Lucida Grande and without the spacing bug shown by
yours and the old builds that lack this patch, but it does show the thorn
character in the old-style antialiasing.
2002-08-14-16, i.e. 1.1 fc2. Bug was marked fixed 08-13.

Will take an ultra-fresh gecko now and see if it is still there. If so, I'll
scream some more.
Howdy.  Got here from bug 150485 (basically: &sup2; and a couple other
entities had huge whitespace after them), which they said was really this bug,
and I believed them.  Anyway, I'm on a fresh 2002081803 build now, and &sup2;
looks great (good work!) but some other characters like &#x2081; (subscript 1)
still have that whitespace.  I'd guess it's just another special case to add.
I'll attach a testcase and screenshot, if you want.  (If you tell me this is
another bug / a new bug / etc., I'll probably believe you again, because I'm
not a Mozilla font guru, I just know when it looks ugly. :-)
macron-vowels displaying in a different font than the rest of the text.
This new testcase demonstrates that this bug is not fixed.
Reopening per my comment 47.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
WFM 2002-08-18-03, although the "e" has a different x-height...

Why not redefine this whole bug to "Mozilla doesn't use correct fonts (per
style sheet, per preference, per everything) in correct size, correct
antialiasing, correct style, and correct metrics".
The ugliness of macron chars at http://www.olelo.hawaii.edu/haw/ is bug 163085.
Re: comment 49, that's completely bizarre. This is a shot of the same testcase,
in the same build, on my machine, and it doesn't work. nikd@mac.com, your font
file is called "TITUSCBZ.TTF", right?
Right. Checksum:

> cksum TITUSCBZ.TTF 
3238980406 1900232 TITUSCBZ.TTF
Blocks: 116990
It seems the problem in my case is that Mozilla won't display the Cyrillic text
in the Unicode face, instead reverting to the *Cyrillic* font of Times CY, and
then falling back to the first face in which the yat is present; in my case,
Arial Unicode MS.

The same thing is happening when I change the CSS to Code2000 instead; all the
non-Cyrillic text is drawn in that face, but the Cyrillic text isn't. Freakin'
bizarre. Why in the world would it be doing this?
Niklas, do you have the Cyrillic Language Kit and fonts installed?
Okay, I've tracked down the problem to an old Cyrillic font I had installed,
which Mozilla really, really, really doesn't like; ER Bukinist KOI8. I've filed
bug 164488 about it.

I'll re-resolve this as fixed.
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Greg, nope. I don't have any language kits installed. I think there's native
Cyrillic, Indic etc. support in Jaguar (haven't bought a copy yet), so I am
basically just waiting for Mac OS 9, MacRoman and all that is associated with it
to die.

As argued in bug 164488, I still think this is a dupe of this bug, and that it
is thus not fixed. This new problem is just an extension of it. Does installing
language kits modify the TEC in Mac OS X?
nikd, Cyrillic chars are no longer rendered through TEC on OS X. So whatever the
problem with that particular font is, it is not a TEC problem.

TEC is no longer used for Latin, Cyrillic and Greek. This bug is fixed. Further
short term deuglification is bug 163085. The longer term way to go is bug 121540.

And, yes, among other things Jaguar comes with Cyrillic and Indic support.
However, Mozilla on Jaguar doesn't appear to support the Indic scripts. (Again,
bug 121540 is the reasonable way to go, IMO.)

The way I see it, bug 121540 is mainly held back by a fear of performance issues
and a fear of text rendering regressions in some limited cases.
Bug 121540 is the way to go. It seems like the developers will be just endlessly
addressing various little problems otherwise in old undocumented code that's due
to be replaced on a large scale anyway.

I know, it's easy for me to say it, since I'm not the one writing it. I just
think it will save more work than it will make in the end.
Regression: http://homepage.mac.com/nikd/test/test.html

Þ and ð are again displayed much too small and with CJK spacing.

Mozilla 1.1 release.
I've seen nothing regarding my comment 45.  I would guess it's part of this
bug, but if it's going to be fixed by bug 121540 that's fine, too; I'm not
picky about which bug report fixes it.  :-)

Should this be duped or dependent on that bug, then?  From reading the comments,
I *think* the state of affairs is "This was a problem; we fixed several cases,
but we're not going to waste more time chasing down every last Unicode character
because a fix to bug 121540 will solve them all." -- I'm not sure how to flag
a bug like that, but I guess I'd just like that to be on the record for people
searching for bugs.  (Is that a reasonably accurate summary, or did I interpret
it wrong?)
Re: comment 59, this patch is post-1.1.

Re: comment 60, Ken, try a recent nightly trunk build. If your problem still
exhibits itself, please file a new bug with a testcase for further investigation.
Alright, this patch is post-1.1 (although appearing in Mozilla 1.1b and 1.1fc)
with target 1.2a, while bug 121540, which should obsolete this bug completely,
has target 1.1a, although it did not make it in the 1.1 release.

Am I the only one lost here?
Bug 121540 just needs to be retargeted, that's all.
Okay, have a look at [http://greg.tcp.com/mozilla/Unicode/Numero/Test.xml].
Still bad using FizzillaCFM/2002082909. Anyone else?

(Yours may not look like the second, but it almost certainly won't look like the
first, which it should.)
Yeah, very bad. Wrong x-height (hilight the text), wrong shading (too black).
Same with Lucida Grande, so it is not a problem with Windows TrueType fonts.
Strictly speaking, numero isn't Greek, Cyrillic, or Latin. It's in Letterlike
Symbols, so what's the suggestion, guys? Reopen this or file a new one?
I'd prefer new bug reports instead of keeping reopening this one.
Filed bug 165878 about the problem that only the primary font suggestion is
passed to ATSUI (comment #41).
*** Bug 116990 has been marked as a duplicate of this bug. ***
Depends on: 180372
No longer blocks: 157673
Somehow the arguments here and in similar bugs don't make sense. Consider
http://bugzilla.mozilla.org/attachment.cgi?id=106786&action=view and look at the
HTML source. The preferred font in the CSS is STKaiti, which _does_ contain all
the Pinyin characters used in the sample text (check with TextEdit, although
those accented chars have a completely different style). STKaiti is certainly
not encoded in MacRoman, and it does contain all necessary glyphs, but yet
Lucida Grande is used for all accented Latin chars. The result is of course
pathetic.

Why is the fallback kicking in here? Because characters out of range for a
certain (Mac OS TEC) script (Chinese) are used? And how is bug 165878 gonna take
care of this?

The essence of this bug seems to be that characters out of range for a certain
limited Mac OS script (MacRoman, Simplified Chinese, KOI8-R, ...) will be screwed
up, effectively shattering multilingual Unicode pages. But then again, adding
Chinese signs to a Latin-based text doesn't have this weird effect.

Since Latin chars are not displayed with the font defined in the stylesheet in
this example, and since this font contains all the necessary glyphs, this bug
cannot be considered fixed. Right?
This bug is fixed. The symptom of this bug was that certain Latin/Greek/Cyrillic
chars were rendered using too wide glyphs from CJK fonts. You are seeing Lucida
Grande instead, so you are seeing another bug. 

As usual, that sort of thing will continue to show up until the gfx font
functionality on OS X is implemented using full-throttle ATSUI instead of
patching the pre-OS X code.
The initial bug report concerned Kanji spaces, but this bug eventually evolved:

Comment 6 (Henri Sivonen):

"As ftang suspected, Apple's legacy Text Encoding Converter is the culprit. When
TEC is disabled, the right fonts are used for Icelandic, Old English, Central
European, Greek and Cyrillic characters. However, for some reason the Latin
chars in question don't get CoreGraphics anti-aliasing."

Comment 37 (Henri Sivonen):

"I verified that the right fonts are now used for alphabetic characters in the
cases described in this bug and in the duplicates. Even polytonic Greek now
works given a proper font. However, the display of character U+21B5 seems to
have regressed at some point since Chimera branched. (Probably unrelated.)"

However, as hath been demonstrated, the right font is not at all used. TEC still
hooks in where it shouldn't for some Latin characters, only now in a Chinese
context (which hasn't been considered before). I haven't found another bug about
this.

Fact remains: Mozilla should be avoided in a Pinyin context.
> TEC still hooks in where it shouldn't for some Latin characters, only now 
> in a Chinese context (which hasn't been considered before).

Is there any evidence that the particular problem is caused by TEC and that it
isn't a different bug? Does the problem go away, if you recompile Mozilla with
TEC disabled for everything (including CJK)?
Mark as verified per comment #71, please open new bug for any remaining problem.
Status: RESOLVED → VERIFIED
*** Bug 136597 has been marked as a duplicate of this bug. ***