Closed Bug 111728 (tec-osx) Opened 23 years ago Closed 22 years ago

TEC causes problems with Greek, Cyrillic and some Latin chars (Icelandic, Polish, Czech)

Categories

(Core :: Internationalization, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: hsivonen, Assigned: ftang)

References

()

Details

(Keywords: intl)

Attachments

(8 files, 3 obsolete files)

Build ID: 2001112020 FizzillaCFM
Steps to reproduce: 1) Load a page that contains eth, thorn, or l with stroke, e.g. http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html
Actual results: Even if the surrounding text is Times, some letters with stroke, eth and thorn are rendered using another font. The other font has over-wide kerning for Latin text, which suggests the characters have been designed to match the width of Han/Hangul/Kanji blocks. I don't remember seeing this with Mac OS X 10.0.x, which had Japanese fonts but no Korean or Chinese fonts. Also, the problematic chars don't look like chars from the Hiragino fonts. That's why I suspect the characters come from Hei, AppleGothic or Apple LiGothic Medium, which came with Mac OS X 10.1.
Expected results: Since Times, Helvetica etc. contain glyphs for the characters in question, I expected the usual Latin fonts to be used.
Additional information: Affected languages include Icelandic, Polish and Old English.
MacOSX->nhotta cc'ing shanjian
Assignee: yokoyama → nhotta
I am not sure if Frank wants to do this after he comes back. Accept for now and set 0.9.9.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.9
Depends on: 105137
No longer depends on: 105137
Maybe we should disable TECFallback and let ATSUI fallback kick in directly. Not sure about the performance impact. See also bug 111731; same issue.
Keywords: intl
QA Contact: teruko → ylong
Target Milestone: mozilla0.9.9 → mozilla1.2
*** Bug 149689 has been marked as a duplicate of this bug. ***
See also bug 148361 and bug 150485 on the Mac Mozilla product.
As ftang suspected, Apple's legacy Text Encoding Converter is the culprit. When TEC is disabled, the right fonts are used for Icelandic, Old English, Central European, Greek and Cyrillic characters. However, for some reason the Latin chars in question don't get CoreGraphics anti-aliasing. Attaching a patch that turns off TEC. Problem: With the patch, I'm seeing severe problems with wrong glyphs showing up semi-randomly at http://www.cs.tut.fi/~jkorpela/html/guide/entities.html
*** Bug 148361 has been marked as a duplicate of this bug. ***
*** Bug 150485 has been marked as a duplicate of this bug. ***
*** Bug 111731 has been marked as a duplicate of this bug. ***
Alias: tec-osx
Summary: Chinese or Korean font used for some Latin chars → TEC causes problems with Greek, Cyrillic and some Latin chars (Icelandic, Polish, Czech)
I suspect Mozilla is treating pages encoded as ISO-8859-1 somehow differently from pages using another encoding. Or at least I can't come up with another explanation for the problems I'm seeing with http://www.cs.tut.fi/~jkorpela/html/guide/entities.html Can anyone more familiar with the code confirm whether my assumption is correct.
Keywords: review
It's probably not about any special treatment of ISO-8859-1, but about non-MacRoman chars getting cached somewhere in such a way that the first-rendered char gets repeated when Mozilla is supposed to draw the next char. After scrolling, the right glyph appears when I select a char. Then that char is repeated when I select other chars until I scroll again. The patch isn't ready for review until the problems with http://www.cs.tut.fi/~jkorpela/html/guide/entities.html are solved.
Keywords: review
The ATSUStyle needs to be re-applied to an ATSUTextLayout after the text pointer has been updated. Attaching a new patch that fixes the issue. Looking for r=.
Attachment #93014 - Attachment is obsolete: true
Keywords: review
Blocks: 159809
The ATSUITextLayout fix looks good. It might address one of the big problems I have faced recently. I don't think we can just turn TEC fallback off this way.
Spun off the ATSUI issue into bug 160001.
Henri Sivonen:
1. I don't want to take the patch that disables TEC fallback. It would slow down Japanese/Chinese/Korean display.
2. I want to take the ATSUI fix. I filed bug 160001 for that.
3. I think we should skip TEC fallback for some characters to meet your need. How about this: I will add a fast check and skip TEC fallback if the character is Latin, Cyrillic or Greek. Will that solve your problem?
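The "fast check" proposed here could look roughly like the sketch below. It is written in Python rather than the C++ of the actual gfx code, the name `should_skip_tec_fallback` is hypothetical, and the ranges are plain Unicode block boundaries chosen for illustration, not the ranges from any patch attached to this bug.

```python
# Hypothetical sketch of a fast per-character check that decides whether to
# bypass TEC fallback. The ranges are Unicode block boundaries picked for
# illustration; the real patch may cover a different set.
_SKIP_TEC_RANGES = (
    (0x0080, 0x024F),  # Latin-1 Supplement, Latin Extended-A and -B
    (0x0370, 0x03FF),  # Greek and Coptic
    (0x0400, 0x04FF),  # Cyrillic
    (0x1E00, 0x1EFF),  # Latin Extended Additional
    (0x1F00, 0x1FFF),  # Greek Extended (polytonic Greek)
)


def should_skip_tec_fallback(ch: str) -> bool:
    """Return True if ch falls in a Latin, Greek or Cyrillic block above ASCII.

    ASCII is deliberately not listed: it is encodable in MacRoman, so it
    would not reach the fallback path in the first place.
    """
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _SKIP_TEC_RANGES)


if __name__ == "__main__":
    for ch in "þłяα日":
        print(f"U+{ord(ch):04X}", should_skip_tec_fallback(ch))
```

A range table keeps the check cheap, which matters because the stated objection to disabling TEC outright was the slowdown for CJK text; CJK characters fall through this check and keep the existing path.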
reassign to ftang
Assignee: nhotta → ftang
Status: ASSIGNED → NEW
Attachment #93064 - Attachment is obsolete: true
Blocks: 157673
Status: NEW → ASSIGNED
Keywords: nsbeta1+
Comment on attachment 93221 [details] [diff] [review] patch with fix in bug 160001 obsolete last patch r=nhotta Please fix indentation.
Attachment #93221 - Flags: review+
Do we know _why_ TEC has issues with Greek, Latin and Cyrillic? This code is complex enough as it is. Adding yet more weird and wonderful special cases scares me.
> How about this. I will add a fast check and skip TEC fallback if the characters is Latin, Cyrillic or Greek. ? Will that solve your problem ?
Yes, that will solve the problem with the duplicates of this bug. However, would it be better to skip TEC for anything except Chinese, Japanese and Korean instead of skipping TEC for Latin, Greek and Cyrillic?
sfraser wrote:
> Do we know _why_ TEC has issues with Greek, Latin and Cyrillic?
Yes, we know of the following reasons:
1. TEC fallback will pick the Greek, Latin or Cyrillic glyph from a Japanese, Chinese or Korean font, since CJK fonts have some Greek, Latin and Cyrillic glyphs. So if the font face is "Times New Roman" and some characters are in "Times New Roman" but cannot be encoded in MacRoman (those characters are part of WGL4 but not part of MacRoman), then if a CJK script is installed, we will use a CJK font to render them. With the patch, we skip the CJK fonts and let ATSUI fallback use the "Times New Roman" font to render them.
2. Starting from a certain version of TEC, it began converting some characters using compatibility mappings. For example, ISO-8859-1 has three characters: 1/4, 1/2, and 3/4. Starting from a particular version of TEC (1.5 ?), it converts each of them to three characters: '1' + '/' + '4', '1' + '/' + '2', '3' + '/' + '4'. But since "Times New Roman" has a '1/4' glyph, it is better to render it with 1 glyph instead of 3 glyphs.
> However, would it be better to skip TEC for anything except Chinese, Japanese and Korean instead of skipping TEC for Latin, Greek and Cyrillic?
Why would it be better? Please list the reasons. Why is skipping TEC and using ATSUI better for Thai? Why is skipping TEC and using ATSUI better for Devanagari? Why is skipping TEC and using ATSUI better for other Mac scripts? Why is skipping TEC and using ATSUI better for punctuation marks?
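Both points can be poked at with Python's standard library, as a rough illustration only. The sketch assumes that Python's `mac_roman` codec is a fair stand-in for the MacRoman repertoire and that Unicode NFKD normalization is a fair stand-in for the compatibility mapping TEC applies; neither is TEC itself. Note that NFKD uses U+2044 FRACTION SLASH rather than the ASCII '/' mentioned above.

```python
import unicodedata


def encodable_in_macroman(ch: str) -> bool:
    """True if ch has a code point in Python's mac_roman codec."""
    try:
        ch.encode("mac_roman")
        return True
    except UnicodeEncodeError:
        return False


def compat_decompose(ch: str) -> str:
    """Apply Unicode compatibility decomposition (NFKD) to ch."""
    return unicodedata.normalize("NFKD", ch)


if __name__ == "__main__":
    # Point 1: characters a Latin font may carry but MacRoman cannot encode.
    for ch in "éþðł":
        print(ch, "MacRoman" if encodable_in_macroman(ch) else "not MacRoman")

    # Point 2: one precomposed fraction becomes three code points.
    for ch in "¼½¾":
        parts = compat_decompose(ch)
        print(ch, "->", [f"U+{ord(p):04X}" for p in parts])
```

Characters that fail the first check are the ones that used to go through TEC fallback, and the second loop shows why a single fraction glyph could end up drawn as three separate glyphs.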
Why Times New Roman? I can't think of a *worse* font. Lucida Grande is far more suited for such a replacement, because 1) it is the default Mac OS X font, and 2) it is a linear font that displays very well on computer screens (and is also more in line with CJK calligraphy than copies of roman stone engravings (serif fonts)). On the other hand, I don't even understand why this is still discussed. Using ATSUI would solve a bunch of outstanding bugs right away. Use of ATSUI is really to be expected from serious Mac OS X applications. It is currently the number one Mozilla drawback. Performance hit? Jaguar.
Re: Comment #21
> Why it will be better? Please list the reason.
I don't know whether ATSUI one char at a time would be better for all non-CJK, which is why I asked. I thought that perhaps Latin, Greek and Cyrillic aren't the only scripts TEC causes trouble with. (I suppose one-char-at-a-time ATSUI won't work with scripts that require contextual glyph selection or something similar.)
Re: Comment #22
> Why Times New Roman?
Times, actually. Or Helvetica. Or any Latin font that has glyphs for chars that aren't in the MacRoman repertoire. As far as the Latin script is concerned this bug is about avoiding the font change when the font that is used for a-z also has the other required chars.
> Lucida Grande is far more suited for such a replacement,
If the suggested font doesn't have the required glyphs, ATSUI will use Lucida Grande if it has the glyphs.
> On the other hand, I don't even understand why this is still discussed. Using ATSUI would solve a bunch of outstanding bugs right away.
It would, yes. And I hope Mozilla will move to full throttle ATSUI. However, fixing this bug makes sense in the interim, because we can have this fix now, and moving to full throttle ATSUI will take more time.
> Use of ATSUI is really to be expected from serious Mac OS X applications.
Indeed.
> 1. TEC fallback will pick a the Greek, Latin or Cyrillic from a Japanese, Chinese or Korean font since CJK font have some Greek, Latin and Cyrillic glyphs. so if the font face is "Times New Roman" and some characters is in "Times New Roman" but cannot be encoded in MacRoman (Those characters part of the WGL4 but not part of MacRoman), then if the CJK script is installed, we will use CJK font to render it. With the patch, it will skip the CJK fonts and let ATSUI fallback to use "Times New Roman" font to render it.
Thank you for the explanation. This kind of information needs to go into comments in the code, for every bit of special-casing that's in there. If someone wants to come along and clean that code up, they'll need all this information. Indeed, I'll need it for the ATSUI patch, at some point.
> 2. The problem is start from a certain version of TEC, it start to convert some character with compatability mapping. For example, in the ISO-8859-1 there are three character- 1/4, 1/2, and 3/4. Somehow start from a particular version of TEC (1.5 ?), it convert it to three characters for each one of them. - '1' + '/' + '4' , '1' + '/' + '2', '3' + '/' + '4'. But since the "Times New Roman" have a '1/4' glyph, it is better to render it with 1 glyph instead of 3 glyphs.
Can you get this information by calling TECGetInfo() ?
Blocks: 160317
I agree with you that now seems a good time to go full ATSUI. When we wrote that part of the code, we first tried ATSUI, and basically the performance and quality were very, very bad in 1999. I think with today's hardware and Mac OS X ATSUI, things have changed a lot. I think sfraser is working on that ATSUI issue. So... please focus; this is a minor tweak before that is ready.
Having worked a lot on some of the problematic encodings (MacIcelandic, MacCroatian, MacTurkish etc.), I would like to recommend that you go for a full ATSUI implementation. Covering all those special cases is very messy and, believe me, it will never look right. Trying to find fallback glyphs from various fonts is not easy, and in the end it will result in more work than just deciding to drop it and go for ATSUI. On a side note, Apple will be introducing Unicode-only keyboards in Jaguar, so forms must also support Unicode input so that users from countries like Iceland and Turkey can use Google etc.
> I would like to recommend that you go or a full ATSUI implementation.
That won't have any impact on this micro fix.
> On the sidenote, Apple will be introducing Unicode-only keyboards in Jaguar so forms must also support Unicode input so that the users from countries like Iceland and Turkey gan use Google etc.
This issue is not related to this bug at all. We already support Unicode-only keyboard input as of today. This issue won't impact this fix at all.
here is sfraser's comment: Please revise the patch by adding a comment explaining these special-case exceptions (as in your bugzilla comment). Thanks Simon
This patch needs to be updated after merging with the MathML landing.
*** Bug 161979 has been marked as a duplicate of this bug. ***
*** Bug 109461 has been marked as a duplicate of this bug. ***
sfraser, please sr=
Attachment #93221 - Attachment is obsolete: true
Comment on attachment 95131 [details] [diff] [review] updated atch v1.1 r=nhotta typo? "have houndred more glyph"
Attachment #95131 - Flags: review+
Comment on attachment 95131 [details] [diff] [review] updated atch v1.1 sr=sfraser. Now if only all the other special-casing had the same level of comments. :-)
Attachment #95131 - Flags: superreview+
Keywords: reviewapproval
Fixed and landed on the trunk.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
ftang: Thank you. I verified that the right fonts are now used for alphabetic characters in the cases described in this bug and in the duplicates. Even polytonic Greek now works given a proper font. However, the display of character U+21B5 seems to have regressed at some point since Chimera branched. (Probably unrelated.) I filed bug 163085 about the use of QD rasterizing I mentioned in comment #6.
Attached image screenshot #1
Attached image screenshot 2
OK, I don't have any experience in building, patching etc., but if I got it correctly, then this should be fixed in the nightlies. To me it is _improved_, but definitely not fixed. There still seems to be some sort of problem with Polish glyphs (it looks to me more like a normal vs. bold difference than a wrong font being used, though I might be wrong). Please have a look at the following attachments (filed a minute ago):
1. http://www.linuxnews.pl - the first column of the page renders wrong characters (in terms of size); I marked them with red circles. BUT the second column of the same page renders them perfectly! (marked with green circles).
2. http://www.google.com.pl/language_tools?hl=pl - as in the previous one, the Polish glyphs seem to be too big (and too thin).
3. (not attached) Have a look at the originally mentioned Unicode test page at http://www.unics.uni-hannover.de/nhtcapri/multilingual1.html - it looks like the glyphs are for some reason bigger/bolder than regular Latin letters.
lehu@it.pl, there are two issues, both of which are out of scope of this bug report. First, the Polish characters that aren't in the MacRoman repertoire are being rendered using the old QuickDraw rasterizer instead of the new Core Graphics rasterizer (163085). The other problem is with the choice of font. Linux News and Google suggest Arial as their first font choice. However, the version of Arial that comes with OS X isn't the same version of Arial that comes with Windows XP. The version that comes with OS X is limited to MacRoman while the version that comes with Windows XP has glyphs for virtually every language that uses Latin characters. The ATSUI fallback code seems to get only the first font choice and if the required glyph isn't in that font, ATSUI will do the fallback on its own without paying attention to the secondary font specified in CSS. (The general fallback font is Lucida Grande which comes only as regular and bold--not italic.) So the pages would look better if they had specified Helvetica as the first choice or if Mozilla could pass all the CSS font suggestions to ATSUI. I don't think a bug has been filed yet on this latter issue.
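The difference between handing ATSUI only the first font choice and walking the whole CSS list can be sketched as follows. This is an illustrative Python model, not Mozilla or ATSUI code: the function names are hypothetical, and `TOY_COVERAGE` is a made-up coverage table that only mirrors the claim above that the OS X Arial lacks the Polish letters while Helvetica and Lucida Grande have them.

```python
from typing import Dict, List, Set

_ASCII_LOWER = set("abcdefghijklmnopqrstuvwxyz")

# Made-up coverage data for illustration only; not measured from real fonts.
TOY_COVERAGE: Dict[str, Set[str]] = {
    "Arial": set(_ASCII_LOWER),
    "Helvetica": _ASCII_LOWER | {"ł", "ą", "ę"},
    "Lucida Grande": _ASCII_LOWER | {"ł", "ą", "ę"},
}
SYSTEM_FALLBACK = "Lucida Grande"


def pick_first_choice_only(ch: str, families: List[str]) -> str:
    """Model of the behaviour described above: only families[0] is tried."""
    if families and ch in TOY_COVERAGE.get(families[0], set()):
        return families[0]
    return SYSTEM_FALLBACK


def pick_from_full_list(ch: str, families: List[str]) -> str:
    """Model of the suggested behaviour: try every CSS family in order."""
    for family in families:
        if ch in TOY_COVERAGE.get(family, set()):
            return family
    return SYSTEM_FALLBACK


if __name__ == "__main__":
    css = ["Arial", "Helvetica"]
    for ch in "ał":
        print(ch, pick_first_choice_only(ch, css), pick_from_full_list(ch, css))
```

With the toy data, "ł" lands in the system fallback under the first model and in Helvetica under the second, which is the mismatch with the surrounding text that the screenshots show.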
Attached image Bug persists...
Dunno why this bug is considered fixed. It is not, not in any way. The attachment shows the word "fuÞarker" (from the page http://homepage.mac.com/nikd/dvd/lotr_fellowship_of_the_ring/), and it is still rendered in obsolete MacRoman fashion. The first font choice in the style sheet is indeed Lucida Grande, which is also the default font I use for browsing. Non-4GL fonts from M$ do not exist on my system. It is also the roman font variant, neither bold nor italic. So what gives? No matter how superior Mozilla is to Omniweb in most regards, this is exactly the kind of stuff that makes Omniweb users laugh their butts off... (Btw, naming the attached file "fuÞarker.png" made it impossible to upload, so I had to rename it to "futharker.png"... *so* trapped in legacy MacRoman.)
nikd@mac.com, what build did you take that screenshot in? Using FizzillaCFM/2002081803, I view the code: <p style="font: 12pt 'Lucida Grande';">fu&thorn;arker</p> rendered apparently all in Lucida Grande and without the spacing bug shown by yours and the old builds that lack this patch, but it does show the thorn character in the old-style antialiasing.
2002-08-14-16, i.e. 1.1 fc2. Bug was marked fixed 08-13. Will take an ultra-fresh gecko now and see if it is still there. If so, I'll scream some more.
Howdy. Got here from bug 150485 (basically: &sup2; and a couple other entities had huge whitespace after them), which they said was really this bug, and I believed them. Anyway, I'm on a fresh 2002081803 build now, and &sup2; looks great (good work!) but some other characters like &#x2081; (subscript 1) still have that whitespace. I'd guess it's just another special case to add. I'll attach a testcase and screenshot, if you want. (If you tell me this is another bug / a new bug / etc., I'll probably believe you again, because I'm not a Mozilla font guru, I just know when it looks ugly. :-)
Macron vowels displaying in a different font than the rest of the text.
This new testcase demonstrates that this bug is not fixed.
Reopening per my comment 47.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
WFM 2002-08-18-03, although the "e" has a different x-height... Why not redefine this whole bug to "Mozilla doesn't use correct fonts (per style sheet, per preference, per everything) in correct size, correct antialiasing, correct style, and correct metrics".
The ugliness of macron chars at http://www.olelo.hawaii.edu/haw/ is bug 163085.
Re: comment 49, that's completely bizarre. This is a shot of the same testcase, in the same build, on my machine, and it doesn't work. nikd@mac.com, your font file is called "TITUSCBZ.TTF", right?
Right. Checksum: > cksum TITUSCBZ.TTF 3238980406 1900232 TITUSCBZ.TTF
Blocks: 116990
It seems the problem in my case is that Mozilla won't display the Cyrillic text in the Unicode face, instead reverting to the *Cyrillic* font of Times CY, and then falling back to the first face in which the yat is present; in my case, Arial Unicode MS. The same thing is happening when I change the CSS to Code2000 instead; all the non-Cyrillic text is drawn in that face, but the Cyrillic text isn't. Freakin' bizarre. Why in the world would it be doing this?
Niklas, do you have the Cyrillic Language Kit and fonts installed?
Okay, I've tracked down the problem to an old Cyrillic font I had installed, which Mozilla really, really, really doesn't like: ER Bukinist KOI8. I've filed bug 164488 about it. I'll re-resolve this as fixed.
Status: REOPENED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Greg, nope. I don't have any language kits installed. I think there's native Cyrillic, Indic etc. support in Jaguar (haven't bought a copy yet), so I am basically just waiting for Mac OS 9, MacRoman and all that is associated with it to die. As argued in bug 164488, I still think this is a dupe of this bug, and that it is thus not fixed. This new problem is just an extension of it. Does installing language kits modify the TEC in Mac OS X?
nikd, Cyrillic chars are no longer rendered through TEC on OS X. So whatever the problem with that particular font is, it is not a TEC problem. TEC is no longer used for Latin, Cyrillic and Greek. This bug is fixed. Further short term deuglification is bug 163085. The longer term way to go is bug 121540. And, yes, among other things Jaguar comes with Cyrillic and Indic support. However, Mozilla on Jaguar doesn't appear to support the Indic scripts. (Again, bug 121540 is the reasonable way to go, IMO.) The way I see it, bug 121540 is mainly held back by a fear of performance issues and a fear of text rendering regressions in some limited cases.
Bug 121540 is the way to go. It seems like the developers will be just endlessly addressing various little problems otherwise in old undocumented code that's due to be replaced on a large scale anyway. I know, it's easy for me to say it, since I'm not the one writing it. I just think it will save more work than it will make in the end.
Regression: http://homepage.mac.com/nikd/test/test.html Þ and ð are again displayed much too small and with CJK spacing. Mozilla 1.1 release.
I've seen nothing regarding my comment 45. I would guess it's part of this bug, but if it's going to be fixed by bug 121540 that's fine, too; I'm not picky about which bug report fixes it. :-) Should this be duped or dependent on that bug, then? From reading the comments, I *think* the state of affairs is "This was a problem; we fixed several cases, but we're not going to waste more time chasing down every last Unicode character because a fix to bug 121540 will solve them all." -- I'm not sure how to flag a bug like that, but I guess I'd just like that to be on the record for people searching for bugs. (Is that a reasonably accurate summary, or did I interpret it wrong?)
Re: comment 59, this patch is post-1.1. Re: comment 60, Ken, try a recent nightly trunk build. If your problem still exhibits itself, please file a new bug with a testcase for further investigation.
Alright, this patch is post-1.1 (although appearing in Mozilla 1.1b and 1.1fc) with target 1.2a, while bug 121540, which should obsolete this bug completely, has target 1.1a, although it did not make it in the 1.1 release. Am I the only one lost here?
Bug 121540 just needs to be retargeted, that's all.
Okay, have a look at [http://greg.tcp.com/mozilla/Unicode/Numero/Test.xml]. Still bad using FizzillaCFM/2002082909. Anyone else? (Yours may not look like the second, but it almost certainly won't look like the first, which it should.)
Yeah, very bad. Wrong x-height (hilight the text), wrong shading (too black). Same with Lucida Grande, so it is not a problem with Windows TrueType fonts.
Strictly speaking, numero isn't Greek, Cyrillic, or Latin. It's in Letterlike Symbols, so what's the suggestion, guys? Reopen this or file a new one?
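The block membership mentioned here is easy to confirm with Python's standard library. The helper name `in_letterlike_symbols` is hypothetical; the range U+2100 to U+214F is the Letterlike Symbols block from the Unicode block chart.

```python
import unicodedata

# Letterlike Symbols block: U+2100 through U+214F inclusive.
LETTERLIKE_SYMBOLS = range(0x2100, 0x2150)


def in_letterlike_symbols(ch: str) -> bool:
    """True if ch lies in the Unicode Letterlike Symbols block."""
    return ord(ch) in LETTERLIKE_SYMBOLS


if __name__ == "__main__":
    numero = "\u2116"
    print(unicodedata.name(numero), in_letterlike_symbols(numero))
    # A Cyrillic letter for contrast: it is outside the block.
    print(unicodedata.name("\u0463"), in_letterlike_symbols("\u0463"))
```

So a check limited to the Latin, Greek and Cyrillic blocks would not cover the numero sign, which fits the suggestion that it belongs in a separate report.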
I'd prefer new bug reports instead of keeping reopening this one.
Filed bug 165878 about the problem that only the primary font suggestion is passed to ATSUI (comment #41).
*** Bug 116990 has been marked as a duplicate of this bug. ***
Depends on: 180372
No longer blocks: 157673
Somehow the arguments here and in similar bugs don't make sense. Consider http://bugzilla.mozilla.org/attachment.cgi?id=106786&action=view and look at the HTML source. The preferred font in the CSS is STKaiti, which _does_ contain all the Pinyin characters used in the sample text (check with TextEdit, although those accented chars have a completely different style). STKaiti is certainly not encoded in MacRoman, and it does contain all necessary glyphs, but yet Lucida Grande is used for all accented Latin chars. The result is of course pathetic. Why is the fallback kicking in here? Because characters out of range for a certain (Mac OS TEC) script (Chinese) are used? And how is bug 165878 gonna take care of this? The essence of this bug seems to be that characters out of range for a certain limited Mac OS script (MacRoman, Simplified Chinese, KOI8-R, ...) will be screwed up, effectively shattering multilingual Unicode pages. But then again, adding Chinese signs to a Latin-based text doesn't have this weird effect. Since Latin chars are not displayed with the font defined in the stylesheet in this example, and since this font contains all the necessary glyphs, this bug cannot be considered fixed. Right?
This bug is fixed. The symptom of this bug was that certain Latin/Greek/Cyrillic chars were rendered using too wide glyphs from CJK fonts. You are seeing Lucida Grande instead, so you are seeing another bug. As usual, that sort of things will continue to show up until the gfx font functionality on OS X is implemented using full-throttle ATSUI instead of patching the pre-OS X code.
The initial bug report concerned Kanji spaces, but this bug eventually evolved: Comment 6 (Henri Sivonen): "As ftang suspected, Apple's legacy Text Encoding Converter is the culprit. When TEC is disabled, the right fonts are used for Icelandic, Old English, Central European, Greek and Cyrillic characters. However, for some reason the Latin chars in question don't get CoreGraphics anti-aliasing." Comment 37 (Henri Sivonen): "I verified that the right fonts are now used for alphabetic characters in the cases described in this bug and in the duplicates. Even polytonic Greek now works given a proper font. However, the display of character U+21B5 seems to have regressed at some point since Chimera branched. (Probably unrelated.)" However, as hath been demonstrated, the right font is not at all used. TEC still hooks in where it shouldn't for some Latin characters, only now in a Chinese context (which hasn't been considered before). I haven't found another bug about this. Fact remains: Mozilla should be avoided in a Pinyin context.
> TEC still hooks in where it shouldn't for some Latin characters, only now > in a Chinese context (which hasn't been considered before). Is there any evidence that the particular problem is caused by TEC and that it isn't a different bug? Does the problem go away, if you recompile Mozilla with TEC disabled for everything (including CJK)?
Mark as verified per comment #71, please open new bug for any remaining problem.
Status: RESOLVED → VERIFIED
*** Bug 136597 has been marked as a duplicate of this bug. ***