Closed Bug 85373 Opened 24 years ago Closed 10 years ago

combining characters / combining mark not supported

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: Martin.T.Kutschker, Unassigned)

Details

(Keywords: fonts)

Attachments

(2 files)

From Bugzilla Helper:
BuildID:    2001050515 and 2001060703 (milestones 0.9/0.9.1)

In pre-0.9.x milestones the combining overline was displayed correctly above the 
previous character. Since 0.9 this is broken (you get a separate and rather 
long overscore), while it still works for other combining characters (e.g. 
combining double overline).

Reproducible: Always
Steps to Reproduce:
E.g. "√x̅" as a button label (square root of X)

Actual Results:  The square root character, the X and an overscore, from left to 
right.

Expected Results:  The square root character, the X with the overscore ABOVE the 
X.

See the effect by installing mozCalc (http://mozcalc.mozdev.org).
These two testcases work correctly for me on win2k CVS build from today.
But they don't work on Linux trunk 2001061121, or Mac 2001060708; on those
systems, the 773 displays as "?".

To my knowledge, xptoolkit does not deal with character-to-glyph mapping. 
It just uses the underlying modules.

-> i18n

Status: UNCONFIRMED → NEW
Ever confirmed: true
Sigh. Didn't add the testcases or reassign. Let's try that again.

These two testcases work correctly for me on win2k CVS build from today.
But they don't work on Linux trunk 2001061121, or Mac 2001060708; on those
systems, the 773 displays as "?".

<?xml version="1.0"?>
<?xml-stylesheet href="chrome://global/skin" type="text/css"?>
<window
  xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"
  xmlns:html="http://www.w3.org/1999/xhtml">
    <button label="&#8730;x&#773;"/>
</window>


<html>
<body>
  <form>
    <input type='button' value="&#8730;x&#773;"/>
  </form>
</body>
</html>
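
For reference, the numeric character references used in both testcases decode to three code points, the last of which is a nonspacing combining mark. The following is a minimal Python sketch (Python is used here only for illustration and is not part of the attached testcases) that relies on the standard `html` and `unicodedata` modules; the helper name `describe` is made up for this example.

```python
import html
import unicodedata

# The label/value string used in both testcases above.
label = html.unescape("&#8730;x&#773;")

def describe(text):
    """Return (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]

for row in describe(label):
    print(row)
```

A nonzero combining class in the third field identifies U+0305 as a mark that is expected to attach to the preceding base character rather than be drawn on its own.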


To my knowledge, xptoolkit does not deal with character-to-glyph mapping. 
It just uses the underlying modules.

-> i18n
Assignee: trudelle → nhotta
Component: XP Toolkit/Widgets: XUL → Internationalization
QA Contact: jrgm → andreasb
I tried again using Windows build 2001061204 (German Win98 SE), but it did not
work either.
Martin, could you attach screenshots of both the actual and the expected results? Thanks.
Attached image expected behaviour
Attached image actual behaviour
We have no plan to support combining marks yet. Marking it as Future. 
Assignee: nhotta → ftang
Target Milestone: --- → Future
Hang on! It did work for 0.7 to 0.8.1 and it still works for other combining
characters. I remember claims that Mozilla has the best Unicode support to be
found. "Future" is really disappointing.
I think we have never supported combining marks correctly. You may accidentally get 
correct behavior if the font you choose contains both x and the overline. For 
example, if you use
<div style="font-family: 'Lucida Sans Unicode';">&#8730;x&#773;</div>
in HTML, it may still accidentally work. 
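
One plausible way to picture why the result depends on the font (this reading is an assumption, not something the thread confirms): if text is split into runs by which font covers each character, a combining mark missing from the base character's font lands in a separate run and is drawn on its own. The sketch below is a toy model in Python, not Mozilla's actual font-selection code; the font names, their coverage sets, and the `split_into_font_runs` helper are all hypothetical.

```python
from itertools import groupby

# Hypothetical coverage tables: which characters each font has glyphs for.
COVERAGE = {
    "FontWithMark": set("\u221ax\u0305"),
    "FontWithoutMark": set("\u221ax"),
    "FallbackFont": set("\u0305"),
}

def pick_font(ch, preferred):
    """Return the first font in the preference list that covers ch, else None."""
    for font in preferred:
        if ch in COVERAGE[font]:
            return font
    return None

def split_into_font_runs(text, preferred):
    """Group consecutive characters that resolve to the same font."""
    tagged = [(pick_font(ch, preferred), ch) for ch in text]
    return [
        (font, "".join(ch for _, ch in group))
        for font, group in groupby(tagged, key=lambda pair: pair[0])
    ]

text = "\u221ax\u0305"
# Base and mark share one font: a single run, so the mark can sit on the x.
same_run = split_into_font_runs(text, ["FontWithMark"])
# Mark comes from a fallback font: it ends up in a run of its own.
split_run = split_into_font_runs(text, ["FontWithoutMark", "FallbackFont"])
print(same_run)
print(split_run)
```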
Status: NEW → ASSIGNED
QA Contact: andreasb → ylong
Changed the subject to reflect the real issue and added "fonts" 
and "correctness" as keywords, though would have entered "unicode" if it had 
been allowed.
Keywords: correctness, fonts
Summary: combining overline (0x0305) not combining → combining characters / combining mark not supported
>  To my knowledge, xptoolkit does not deal with character-to-glyph mapping. 
> It just uses the underlying modules.

I guess this is the case, and the underlying modules don't do much for
Latin combining characters and some other combining characters. 

Whether combining characters are rendered
as 'spacing' or 'combining' (non-spacing) seems to depend completely on
which font is used to render the page, both under Windows 2k/XP
and Linux/Unix/X11. For instance, 
http://www.columbia.edu/kermit/st-erkenwald.html
is rendered as expected when the CODE2000 font by James Kass is
used, but not when Arial Unicode MS is used under MS Windows XP/2k.
 
Another example is http://jshin.net/i18n/korean/hunmin.html.
It's only displayed correctly when fonts are used in which the
glyphs for conjoining Hangul vowels and final consonants have zero width.

It's not clear what Mozilla has to do in this case. It can be
argued that Mozilla should delegate this task
to the underlying rendering engines available in the OS
(Uniscribe, Pango, QuickDraw/AAT) along
with fonts with advanced features like OpenType. Others may
think that Mozilla should do more for these cases, as
it does for Thai (and partly for U+1100 Hangul Jamo under X11). 
Sorry for spamming. I didn't realize that the URL field is empty. Because the
Middle English sample on the Kermit web page seems as good as any other for
demonstrating the issue at hand, it'd be nice if somebody with the privilege
to do so would add it to the URL field. 
 
> It's not clear what Mozilla has to do in this case. It can be
> argued that Mozilla should delegate this task
> to the underlying rendering engines available in the OS
> (Uniscribe, Pango, QuickDraw/AAT) along
> with fonts with advanced features like OpenType. Others may
> think that Mozilla should do more for these cases, as
> it does for Thai (and partly for U+1100 Hangul Jamo under X11). 

I was wrong in thinking that the _entire_ task of rendering Latin combining
characters (for that matter, any combining characters) can be
delegated to the rendering engines offered by the OS. Mozilla still has
to do some work. The following is what James Kass (the designer
of the CODE2000 Unicode font) wrote to the Unicode list:

 Code2000 has only minimal tables for Latin OpenType pending system
support from Microsoft needed for testing purposes.  For glyph
positioning, only 3 kerning pairs are included.  For glyph substitution,
only 10 discretionary ligatures plus several 'enclosed' glyph substitutions.

Whatever is happening in Mozilla to improve the appearance of Latin
text with combiners must be an innovation of the good folks at Mozilla.

Lacking Latin OpenType support in Uniscribe, many "core" fonts
available through Microsoft may well have chosen to use dotted
circle glyphs as the default display glyphs while awaiting system
support.  This is probably a good approach, because fonts like
Code2000, Cardo, et cetera, still have to rely on default glyph
positioning in most products, which can result in undesirable
overstriking.  Where they don't overstrike, these combiners
are as often as not poorly aligned.  There's very little font
developers can do to correct this without OpenType support
being enabled for Latin script.

Microsoft is working on adding Latin OpenType support to
Uniscribe.  Meanwhile, some browsers seem to be attempting
to display, for example, the string "a" + "combining acute" as
"aacute" if the glyph is available in the font as a precomposed
glyph and the character is available in the Standard as a
precomposed character.  This is why some combiners may
seem to work in certain cases, and not others.

For instance, in Outlook Express on Windows 9x, the following
alphabet + combining acutes is entered as the single letter
followed by the combining acute mark.  In the default font
here, the expected result is that every single combining acute
glyph would overstrike its corresponding base letter.  This is
because the default combining glyph position in this font is
designed for lower case letters (with no ascenders) heights...:
(the following line is in UTF-8. set CharacterCoding to UTF-8)

ĀB̄C̄D̄ĒF̄ḠH̄ĪJ̄K̄L̄M̄N̄ŌP̄Q̄R̄S̄T̄ŪV̄W̄X̄ȲZ̄

...but, on this system, the capital letters A, E, G, I, O, and U are
all getting a combining macron at the correct caps height.  This
is not happening based upon any instructions within the font.
Rather, the system appears to be making these substitutions
based on some system table, which in turn is based on TUS.
(The rest of the capital letters look awful, they're being
overstriked (overstricken?) by the combiners.)

Anyway, hope this info is helpful.
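
The precomposed-substitution behaviour described in the message above resembles Unicode canonical composition (NFC). As an illustrative sketch only (it is an assumption that the system table mentioned there behaves like NFC), Python's `unicodedata.normalize` can show which capital letter plus combining macron (U+0304) sequences have a precomposed character in the Standard. The helper name `composable_with` is hypothetical.

```python
import string
import unicodedata

COMBINING_MACRON = "\u0304"

def composable_with(mark, letters=string.ascii_uppercase):
    """Return the letters whose base+mark sequence NFC-composes to one code point."""
    return "".join(
        letter
        for letter in letters
        if len(unicodedata.normalize("NFC", letter + mark)) == 1
    )

print(composable_with(COMBINING_MACRON))
```

Letters outside the returned set have no precomposed character, so they can only be displayed by positioning the mark glyph over the base glyph, which is where the font and the rendering engine come in.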
This problem seems to affect only some combining characters. For instance, this
URI shows that combining key caps combine, but not the combining circle:

http://www.cadenceweb.com:8080/newsletter/sheerin/test/index.html#ExpertSet

This page provides a more complete test of all combining characters:
http://www.cadenceweb.com:8080/newsletter/sheerin/test/ExpertCharacterSet.html
I suspect that the characters that appear to combine do so because the font
provides incorrect metrics for the combining glyph.
> I suspect that the characters that appear to combine do so because the font
> provides incorrect metrics for the combining glyph

Certainly it's font-dependent as well as platform/toolkit-dependent. However,
I would not say a font provides "incorrect" metrics if it assigns
zero advance width to non-spacing combining characters. Given the fact that
neither Uniscribe (at least the released version) nor Pango (for that
matter, the ICU layout part is no exception; ATSUI may do better, but
I have no info) provides support for combining marks used with
Latin/Cyrillic/Greek letters, it's a reasonable fallback to assign
zero advance width to non-spacing combining marks to make them work
in quite a lot of cases (by simple overstriking). It may not even be a 
fallback but is arguably the 'right thing' to do.
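
The zero-advance fallback can be illustrated with a toy pen-position model. This is a sketch under stated assumptions, not the logic of any real rendering engine: the glyph metrics are invented, and the mark glyph is assumed to reach back over the previous glyph through a negative left bearing.

```python
# Hypothetical metrics in font units: (advance width, left bearing).
ZERO_ADVANCE_FONT = {"x": (500, 0), "\u0305": (0, -500)}
SPACING_FONT = {"x": (500, 0), "\u0305": (500, 0)}

def place_glyphs(text, metrics):
    """Return the x position at which each glyph's ink starts."""
    pen = 0
    positions = []
    for ch in text:
        advance, bearing = metrics[ch]
        positions.append(pen + bearing)  # where this glyph is drawn
        pen += advance                   # where the next glyph starts
    return positions

overstruck = place_glyphs("x\u0305", ZERO_ADVANCE_FONT)
separate = place_glyphs("x\u0305", SPACING_FONT)
print(overstruck, separate)
```

With the zero-advance metrics the overline starts at the same x position as the base letter (simple overstriking); with the spacing metrics it trails the letter as a separate glyph, which is the kind of result reported in this bug.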

As for the platform-dependent part... Mozilla-Win uses the standard text APIs
as opposed to the Uniscribe APIs used by MS IE. One of the largest differences
between the two[1] is that the former supports complex script handling
on Win2k/XP only, while the latter supports it on any language version
of Win9x/ME as well as on Win2k/XP. [2]  At the moment, there's little
difference between the two approaches in terms of Latin combining mark
handling because MS has just begun to implement it in Uniscribe.
However, when it becomes available, Mozilla-Win will not be able to
handle it on Win9x/ME, while MS IE will be able to across Win32 platforms. 

On other platforms, none of X11/gtk (x11core/FT, Xft) supports it.
I'm not sure of the situation on Mac, but Mozilla doesn't use
Mac's native ATSUI, so I guess it's similar on Mac. 


[1] Another difference is that caret movement/positioning/selection
doesn't work for complex scripts if the standard text APIs are used.

[2] Even the standard text APIs provide support for Hebrew on Hebrew Win9x/ME, 
Arabic on Arabic Win9x/ME and Thai on Thai Win9x/ME. However, Indic scripts 
are not so lucky because MS never supported Indic scripts with the Win32 'A' APIs 
that are used on Win9x/ME. 
What a hack. I have not touched Mozilla code for 2 years. I didn't read these bugs
for 2 years. And they are still there. Just close them as won't fix to clean up.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Mass re-open of Frank Tang's Won't Fix debacle. Spam is his responsibility, not my own.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass re-assigning Frank Tang's old bugs that he closed as Won't Fix and that had to be
re-opened. Spam is his fault, not my own.
Assignee: ftang → nobody
Status: REOPENED → NEW
Looks similar to bug 197649.
Do we still (in 2008, with Firefox 3 coming) support Windows 98?

Can I close this as Won't Fix?
QA Contact: amyy → i18n
Gecko has supported combining characters since at least the Firefox 1.5 timeframe. Reopen if you think there is a specific case that doesn't work correctly.
Status: NEW → RESOLVED
Closed: 20 years ago → 10 years ago
Resolution: --- → WORKSFORME