Open Bug 40882 Opened 24 years ago Updated 2 years ago

should combine characters (diacritic and base) separated by element (frame) boundaries

Categories

(Core :: Layout: Text and Fonts, defect, P3)

Tracking

()

Future

People

(Reporter: lkemmel, Unassigned)

References

Details

(Keywords: intl)

From Bugzilla Helper:
User-Agent: Mozilla/4.61 [en] (WinNT; I)
BuildID:    2000030708

A diacritic character should be displayed under its base character, not after it.

The HTML below demonstrates the problem: the diacritic (Hebrew vowel QAMATS) is 
displayed next to the base letter (Hebrew letter ALEF) instead of under it:

<HTML>
  <HEAD></HEAD>
  <BODY>
    <P><font face="Arial"><B>&#x05D0;</B>&#x05B8;</font>
    </P>
  </BODY>
</HTML>

When the vowel and the letter are in the same tag, they are rendered 
correctly.


Reproducible: Always

I expect similar results for any set of characters that forms a 
ligature.
Moving to Internationalization, although this could be a purely layout problem.
Assignee: asadotzler → ftang
Component: Browser-General → Internationalization
QA Contact: jelwell → teruko
This issue is about rendering a combining mark across a frame boundary. 
Since this particular case is related to Hebrew, reassigning to mkaply@us.ibm.com.
Assignee: ftang → mkaply
Status: UNCONFIRMED → NEW
Ever confirmed: true
Sorry everyone is getting this again - I wanted Lina's comments in the bug.


One remark: this issue doesn't seem to be specific to Hebrew. 

Some quotes from the Unicode Standard: 
"Some scripts, such as Hebrew, Arabic, and the scripts of India and Southeast 
Asia, have combining characters indicated in the charts in relation to dotted 
circles to show their position relative to the base character."
"Diacritics are the principal class of combining characters used with European 
alphabets."

I tried the following test case:

<p>&#x0061;&#x030B;</p>
<p><b>&#x0061;</b>&#x030B;</p>

Each paragraph contains the identical sequence of 2 characters (the Latin letter 
"a" and the diacritic "double acute accent", used in Hungarian), but in the 1st 
paragraph they appear in the same token and in the 2nd in separate tokens. 
Although I don't know whether this combination is correct linguistically, it is 
reasonable to expect that it should form the same shape in both. However, 
these 2 paragraphs were rendered differently.

Also, this problem is not specific to combining characters. Ligatures behave 
similarly; for example, the Arabic ligature LamAlef:

<p>&#x0644;&#x0627;</p>
<p><b>&#x0644;</b>&#x0627;</p>

(Only the 1st paragraph is displayed properly.)
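
For reference, the properties of the characters in the test cases above can be checked against the Unicode character database. This is only a sketch, not part of the original report or of Gecko: it uses Python's standard unicodedata module, and the helper name describe is made up for illustration. It shows that QAMATS and the double acute accent are nonspacing marks with a non-zero combining class, while LAM and ALEF are ordinary letters, so the LamAlef case is a shaping (ligature) issue rather than a combining-mark one.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, canonical combining class) for one character.

    'describe' is a hypothetical helper used only for this illustration.
    """
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# Characters from the test cases: ALEF + QAMATS, "a" + double acute, LAM + ALEF.
for ch in "\u05D0\u05B8\u0061\u030B\u0644\u0627":
    name, category, combining_class = describe(ch)
    print("U+%04X %-32s category=%s combining_class=%d"
          % (ord(ch), name, category, combining_class))
```

A category starting with "M" together with a non-zero combining class identifies the characters that have to be positioned relative to a preceding base, wherever that base sits in the markup.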
Status: NEW → ASSIGNED
Changed QA contact to andreasb@netscape.com.
QA Contact: teruko → andreasb
This bug belongs in layout based on the new testcase from Lina.

Layout is not combining characters across frames.

Again, here is a non-Hebrew testcase:

<p>&#x0061;&#x030B;</p>
<p><b>&#x0061;</b>&#x030B;</p>
Assignee: mkaply → karnaze
Status: ASSIGNED → NEW
Component: Internationalization → Layout
QA Contact: andreasb → petersen
This bug is fixed for Hebrew combining marks (the fix is marked with 
#ifdef FIX_FOR_BUG_40882 in layout/html/base/src/nsLineLayout.cpp and 
layout/base/src/nsBidiPresUtils.cpp).
However, it could be preferable to use OpenType tables, as Erik suggested, 
which would at the same time kill another bird: wrong positioning of 
combining marks (Hebrew, at least) on a non-bidi platform.
Not a table bug. Reassigning.
Assignee: karnaze → attinasi
Target Milestone: --- → Future
->fonts & text
Assignee: attinasi → font
Component: Layout → Layout: Fonts and Text
QA Contact: petersen → ian
Summary: Diacritic character and its base character, when contained in separate HTML tags, are positioned incorrectly → Need to combine characters across frames
Summary: Need to combine characters across frames → should combine characters (diacritic and base) separated by element (frame) boundaries
We need a generic grapheme cluster breaker/iterator that works across 'frames'. 
Depends on: grapheme-breaker
Keywords: intl
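
To illustrate what a cluster iterator that works across 'frames' could look like, here is a minimal sketch in Python. It is not the Gecko implementation and does not follow the full Unicode grapheme cluster rules: as a stated simplification, any character in a Mark category (Mn, Mc, Me) is attached to the preceding character, and each 'frame' is modelled as a plain string in a list. The names is_extender, clusters_across_runs and spans_boundary are hypothetical.

```python
import unicodedata

def is_extender(ch):
    # Assumption for this sketch: every Mark-category character (Mn, Mc, Me)
    # extends the previous cluster. The real grapheme cluster rules are richer.
    return unicodedata.category(ch).startswith("M")

def clusters_across_runs(runs):
    """Yield clusters as lists of (run_index, char) pairs.

    'runs' is a list of strings, one per text run (frame). A cluster is allowed
    to continue from one run into the next.
    """
    current = []
    for run_index, text in enumerate(runs):
        for ch in text:
            if current and is_extender(ch):
                current.append((run_index, ch))
            else:
                if current:
                    yield current
                current = [(run_index, ch)]
    if current:
        yield current

def spans_boundary(cluster):
    """True if the cluster contains characters from more than one run."""
    return len({run_index for run_index, _ in cluster}) > 1

# <b>&#x05D0;</b>&#x05B8; modelled as two runs: the base in one, the mark in the next.
for cluster in clusters_across_runs(["\u05D0", "\u05B8"]):
    print(["U+%04X" % ord(ch) for _, ch in cluster], spans_boundary(cluster))
```

With such an iterator, layout could detect clusters that cross a run boundary and handle them as a unit. Ligatures such as LamAlef are not covered by this sketch, since both characters are base letters.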
*** Bug 297707 has been marked as a duplicate of this bug. ***
Gecko 1.9 has some improvements. The testcases from comment 0 and the duped bug work for me. The testcases in comment 10 mostly work, apart from the black points in the Hebrew tests. The testcases in comment 5 are still broken.
In Gecko 1.9 I think there are two cases that don't work:
1) when the base character and the diacritics are in different fonts (including bold and non-bold). There might be room for improvement here, but perfection is unattainable.
2) if the base character and the diacritic have different colors, the diacritic is forced to the color of the base character.
Assignee: layout.fonts-and-text → nobody
QA Contact: ian → layout.fonts-and-text
Severity: minor → S4