Closed Bug 553981 Opened 15 years ago Closed 14 years ago

first-letter splits up Hangul Jamo sequences

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla2.0b8

People

(Reporter: smontagu, Assigned: jfkthame)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 3 obsolete files)

The failing reftest (layout/reftests/first-letter/329069-2.html) 15 years ago Simon Montagu :smontagu 167 bytes, text/html		Details
Reference rendering (layout/reftests/first-letter/329069-2-ref.html) 15 years ago Simon Montagu :smontagu 198 bytes, text/html		Details
patch, enhance gfxPlatform::SetupClusterBoundaries to support Hangul, etc 14 years ago Jonathan Kew [:jfkthame] 24.48 KB, patch	karlt : review+	Details \| Diff \| Splinter Review
patch for checkin 14 years ago Karl Tomlinson (:karlt) 25.43 KB, patch		Details \| Diff \| Splinter Review
patch, updated to fix the textrun assertion 14 years ago Jonathan Kew [:jfkthame] 25.79 KB, patch	karlt : review- karlt : feedback+	Details \| Diff \| Splinter Review
patch, updated properly this time! 14 years ago Jonathan Kew [:jfkthame] 26.75 KB, patch	karlt : review+ roc : approval2.0+	Details \| Diff \| Splinter Review

Simon Montagu :smontagu

Reporter

Description

•

15 years ago

Spun off from bug 553834. See also attachment 213727 [details] in bug 329069.

Simon Montagu :smontagu

Reporter

Comment 1

•

15 years ago

Attached file The failing reftest (layout/reftests/first-letter/329069-2.html) — Details

Simon Montagu :smontagu

Reporter

Comment 2

•

15 years ago

Attached file Reference rendering (layout/reftests/first-letter/329069-2-ref.html) — Details

Karl Tomlinson (:karlt)

Comment 3

•

14 years ago

Correcting the styles in the test makes it pass on Linux (only) on m-c. (It doesn't pass on 1.9.2, so something has been fixed.) http://hg.mozilla.org/mozilla-central/rev/7b49ee09259b

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Blocks: 329069

Jonathan Kew [:jfkthame]

Assignee

Comment 4

•

14 years ago

This should be fixed by enhancing gfxPlatform::SetupClusterBoundaries() to handle Hangul syllables. Starting on a patch...

Jonathan Kew [:jfkthame]

Assignee

Comment 5

•

14 years ago

Attached patch patch, enhance gfxPlatform::SetupClusterBoundaries to support Hangul, etc (obsolete) — Details — Splinter Review

This patch should also fix bug 614468.

Assignee: nobody → jfkthame

Attachment #495020 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Blocks: 614468

Karl Tomlinson (:karlt)

Comment 6

•

14 years ago

Comment on attachment 495020 [details] [diff] [review] patch, enhance gfxPlatform::SetupClusterBoundaries to support Hangul, etc > # read Scripts.txt >-my %scriptAliases = ( >- 'MEETEI_MAYEK' => 'MEITEI_MAYEK' >-); >- > open FH, "< $ARGV[1]/Scripts.txt" or die "can't open UCD file Scripts.txt\n"; > while (<FH>) { > if (m/([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))*\s+;\s+([^ ]+)/) { > my $script = uc($3); >- $script = $scriptAliases{$script} if exists $scriptAliases{$script}; Can you leave a note in this bug, please, explaining why this is no longer needed? > // 8-bit text doesn't have clusters. > // XXX is this true in all languages??? > // behdad: don't think so. Czech for example IIRC has a > // 'ch' grapheme. >+ // jfkthame: but that's not expected to behave as a grapheme cluster >+ // for selection/editing/etc. FWIW "Slovak ch digraph" seems to be grouped with other clusters that I expect would normally behave as a cluster for selection and first-letter. http://www.unicode.org/reports/tr29/tr29-17.html#Grapheme_Cluster_Boundaries Could we get "control" (incl Cf) characters here? GB4 (and GB5 wrt Prepend if the Thai/Lao code were enabled) are not handled. Those are probably unusual situations so no need to address that here. >+ PRUint8 category = gfxUnicodeProperties::GetGeneralCategory(ch); I would suggest "hb_category_t category =" for consistency in the comparisons below. Logically the relational operators will perform integral promotion of operands anyway as part of "usual arithmetic conversions". How much difference this makes to compiled code and warnings about comparison between different types, I don't know. >+ if ((ch & ~0xff) == 0x1100 || >+ (ch >= 0xa960 && ch <= 0xa97f) || >+ (ch >= 0xac00 && ch <= 0xd7ff)) Indentation. >+ if ( (hangulState & gfxUnicodeProperties::LV) && >+ !(hangulState & gfxUnicodeProperties::T)) >+ { >+ aTextRun->SetGlyphs(i, extendCluster, nsnull); >+ hangulState = gfxUnicodeProperties::V; >+ } if (hangulState && !(hangulState & gfxUnicodeProperties::T)) is a little simpler, or perhaps with hangulState != NONE for clarity. IIU GB6 - 8 correctly, hangulState should be set here regardless of the old hangulState. >+ if (hangulState & (gfxUnicodeProperties::V | >+ gfxUnicodeProperties::T)) >+ { >+ aTextRun->SetGlyphs(i, extendCluster, nsnull); >+ hangulState = gfxUnicodeProperties::T; >+ } Similarly for hangulState here. It may be simplest to always initialize hangulType to NONE at the start of the loop. That way hangulState can be set to hangulType at the end of the loop and we don't need to make sure we have a "hangulState = gfxUnicodeProperties::NONE" for every case. >+ typedef enum { >+ NONE = 0x00, >+ L = 0x01, >+ V = 0x02, >+ T = 0x04, >+ LV = 0x03, >+ LVT = 0x07 >+ } HSType; The normal C++ idiom is enum HSType { ... }. That might lead to more helpful compiler messages than a typedef for an anonymous enum (but I'm don't know for certain). This claims the identifiers gfxUnicodeProperties::NONE, gfxUnicodeProperties::L, etc, which may lead to conflicts in the future. How about HST_NONE/HST_L/etc.? r+ if you explain MEETEI_MAYEK, fix up the indentation, always set hangulState for each character, and are happy with HST_*. I'll leave the other things to your discretion.

Attachment #495020 - Flags: review?(karlt) → review+

Jonathan Kew [:jfkthame]

Assignee

Comment 7

•

14 years ago

(In reply to comment #6) > > # read Scripts.txt > >-my %scriptAliases = ( > >- 'MEETEI_MAYEK' => 'MEITEI_MAYEK' > >-); > >- > > open FH, "< $ARGV[1]/Scripts.txt" or die "can't open UCD file Scripts.txt\n"; > > while (<FH>) { > > if (m/([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))*\s+;\s+([^ ]+)/) { > > my $script = uc($3); > >- $script = $scriptAliases{$script} if exists $scriptAliases{$script}; > > Can you leave a note in this bug, please, explaining why this is no longer > needed? The name of the constant defined in hb-unicode.h was updated to match the spelling used in the Scripts.txt data file, so we no longer need to treat it as an "alias" when reading the Unicode data and relating this to the values in the harfbuzz header file.

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Blocks: 614476

Karl Tomlinson (:karlt)

Comment 8

•

14 years ago

Attached patch patch for checkin (obsolete) — Details — Splinter Review

From http://hg.mozilla.org/try/rev/ec6f06e5e2c3 which addresses review comments. Also marked newly passing reftests. Requesting approval2.0 so that bug 569770 can land without regression from bug 614468.

Attachment #496069 - Flags: approval2.0?

Jonathan Kew [:jfkthame]

Assignee

Comment 9

•

14 years ago

(In reply to comment #8) > Created attachment 496069 [details] [diff] [review] > patch for checkin > > From http://hg.mozilla.org/try/rev/ec6f06e5e2c3 which addresses review > comments. Note that on tryserver, this caused crashtest orange on Linux debug (only) - an assertion "Glyphruns not coalesced" in gfx/tests/crashtests/419255-1.html. I've been looking into that, and believe it is (a) harmless, and (b) caused by a separate underlying bug in textrun construction, which I'll file shortly. We should still land this, but be aware that we'll need to mark the assertion in the crashtest manifest until the issue causing it is fixed.

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Attachment #496069 - Flags: approval2.0?

Karl Tomlinson (:karlt)

Comment 10

•

14 years ago

Comment on attachment 496069 [details] [diff] [review] patch for checkin >@@ -2377,16 +2377,20 @@ gfxFontGroup::InitTextRun(gfxContext *aC > gfxScriptItemizer scriptRuns(aString, aLength); > > PRUint32 runStart = 0, runLimit = aLength; > PRInt32 runScript = HB_SCRIPT_LATIN; > while (scriptRuns.Next(runStart, runLimit, runScript)) { > InitTextRun(aContext, aTextRun, aString, aLength, > runStart, runLimit, runScript); > } >+ >+ // Is this actually necessary? Without it, gfxTextRun::CopyGlyphDataFrom may assert >+ // "Glyphruns not coalesced", but does that matter? >+ aTextRun->SortGlyphRuns(); >@@ -2438,20 +2442,16 @@ gfxFontGroup::InitTextRun(gfxContext *aC > } > > // It's possible for CoreText to omit glyph runs if it decides they contain > // only invisibles (e.g., U+FEFF, see reftest 474417-1). In this case, we > // need to eliminate them from the glyph run array to avoid drawing "partial > // ligatures" with the wrong font. > aTextRun->SanitizeGlyphRuns(); > >- // Is this actually necessary? Without it, gfxTextRun::CopyGlyphDataFrom may assert >- // "Glyphruns not coalesced", but does that matter? >- aTextRun->SortGlyphRuns(); Sounds like we should remove this part of the patch. This is not needed for these bugs, right? It looked right to me, but sounds like it is causing enough unexpected consequences that it should be dealt with separately.

Jonathan Kew [:jfkthame]

Assignee

Comment 11

•

14 years ago

(In reply to comment #10) > Comment on attachment 496069 [details] [diff] [review] > patch for checkin > > >@@ -2377,16 +2377,20 @@ gfxFontGroup::InitTextRun(gfxContext *aC > > gfxScriptItemizer scriptRuns(aString, aLength); > > > > PRUint32 runStart = 0, runLimit = aLength; > > PRInt32 runScript = HB_SCRIPT_LATIN; > > while (scriptRuns.Next(runStart, runLimit, runScript)) { > > InitTextRun(aContext, aTextRun, aString, aLength, > > runStart, runLimit, runScript); > > } > >+ > >+ // Is this actually necessary? Without it, gfxTextRun::CopyGlyphDataFrom may assert > >+ // "Glyphruns not coalesced", but does that matter? > >+ aTextRun->SortGlyphRuns(); > > >@@ -2438,20 +2442,16 @@ gfxFontGroup::InitTextRun(gfxContext *aC > > } > > > > // It's possible for CoreText to omit glyph runs if it decides they contain > > // only invisibles (e.g., U+FEFF, see reftest 474417-1). In this case, we > > // need to eliminate them from the glyph run array to avoid drawing "partial > > // ligatures" with the wrong font. > > aTextRun->SanitizeGlyphRuns(); > > > >- // Is this actually necessary? Without it, gfxTextRun::CopyGlyphDataFrom may assert > >- // "Glyphruns not coalesced", but does that matter? > >- aTextRun->SortGlyphRuns(); > > Sounds like we should remove this part of the patch. > This is not needed for these bugs, right? > It looked right to me, but sounds like it is causing enough unexpected > consequences that it should be dealt with separately. I don't think the movement of that code is the reason for the assertion showing up - moving it up from the individual script-run method to the top-level gfxFontGroup::InitTextRun merely avoids redundantly re-processing the runs as each new script run is handled. The "Glyphruns not coalesced" assertion on Linux crashtest, however, comes from a textrun that is being created by nsTransformingTextRunFactory; the problem is that this may assemble a textrun by copying from several sources, and fails to coalesce adjacent sub-runs using the same font in this situation. The change to cluster-marking here must be tickling run-construction in such a way that this starts to show up; I'm going to try and reproduce it consistently with a simpler testcase (and without the clustering patch), so that we can file, fix, and verify it reliably.

Jonathan Kew [:jfkthame]

Assignee

Comment 12

•

14 years ago

OK, I've figured out why we're getting the assertion. A testcase that will reproduce it *without* the SetupClusterBoundaries patch here, at least on my Ubuntu system, is: data:text/html,<div style="font-family:OpenSymbol;text-transform:uppercase">x̃ ̃ The contributing factors here are: - the use of text-transform, so that we go through an nsTransformingTextRunFactory that rebuilds textruns - the presence *after* the space of a combining mark which is NOT supported in the style's (first-listed) font family - the last character *before* the space falls back to the same font as the combining mark after it This situation arises in crashtests/419255-1.html after patching the cluster code because we then (correctly) treat the U+200C character similarly to a combining mark for clustering purposes (UAX#29, GB9; ZWNJ and ZWJ are Grapheme_Extend). This triggers the code path at http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxTextRunWordCache.cpp#514 that "backs up" by one character to include the space when copying words, and that leads to the assertion due to a bug in gfxTextRun::AddGlyphRun. The fix is straightforward; I'll post a new patch here that includes it.

Jonathan Kew [:jfkthame]

Assignee

Comment 13

•

14 years ago

Attached patch patch, updated to fix the textrun assertion (obsolete) — Details — Splinter Review

Marking for re-review so you can check the fix in gfxTextRun::AddGlyphRun. The rest is unchanged.

Attachment #496069 - Attachment is obsolete: true

Attachment #496160 - Flags: review?(karlt)

Attachment #496160 - Flags: approval2.0?

Karl Tomlinson (:karlt)

Comment 14

•

14 years ago

Comment on attachment 496160 [details] [diff] [review] patch, updated to fix the textrun assertion This version of the patch has lost the changes addressing previous review comments. > if (lastGlyphRun->mCharacterOffset == aUTF16Offset) { >+ // avoid leaving a zero-length run if the offset hasn't changed >+ // by overwriting the last entry instead of appending >+ if (numGlyphRuns > 1 && >+ mGlyphRuns[numGlyphRuns - 2].mFont == aFont) >+ { >+ // if the (last-1) run had the same font as the new one wants, >+ // merge with it instead of creating adjacent runs with the >+ // same font >+ mGlyphRuns.TruncateLength(numGlyphRuns - 1); >+ return NS_OK; This code looks good, but I got a bit confused reading the comments. The "if the offset hasn't changed" condition has already been satisfied at the point of the first comment. Can we either move the comment to before the conditional that it applies to, or change it to "The offset hasn't changed. Avoid leaving a zero-length run by ..." Similarly for the "(last-1)" comment.

Attachment #496160 - Flags: review?(karlt)

Attachment #496160 - Flags: review-

Attachment #496160 - Flags: feedback+

Attachment #496160 - Flags: approval2.0?

Jonathan Kew [:jfkthame]

Assignee

Comment 15

•

14 years ago

(In reply to comment #14) > This version of the patch has lost the changes addressing previous review > comments. Argh, how did I do that? :( Thought I started from the latest one..... will re-post shortly. Thanks for catching it. Will update comments as well.

Jonathan Kew [:jfkthame]

Assignee

Comment 16

•

14 years ago

Attached patch patch, updated properly this time! — Details — Splinter Review

I believe this includes the proper updates this time - sorry about the bad version!

Attachment #495020 - Attachment is obsolete: true

Attachment #496160 - Attachment is obsolete: true

Attachment #496341 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Comment 17

•

14 years ago

Comment on attachment 496341 [details] [diff] [review] patch, updated properly this time! Excellent, thanks.

Attachment #496341 - Flags: review?(karlt)

Attachment #496341 - Flags: review+

Attachment #496341 - Flags: approval2.0?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 18

•

14 years ago

Comment on attachment 496341 [details] [diff] [review] patch, updated properly this time! Seems like the logic in gfxTextRun::AddGlyphRun could be simplified by first checking whether we should wipe out the current last glyph run (because it's zero-length), then checking whether we should merge the new glyph run with the new last glyph run.

Attachment #496341 - Flags: approval2.0? → approval2.0+

Karl Tomlinson (:karlt)

Updated

•

14 years ago

Blocks: 617869

Karl Tomlinson (:karlt)

Comment 19

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/4861b72cc66d Filed bug 617869 for the WINNT failure.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla2.0b8

Jonathan Kew [:jfkthame]

Assignee

Comment 20

•

14 years ago

(In reply to comment #18) > Seems like the logic in gfxTextRun::AddGlyphRun could be simplified by first > checking whether we should wipe out the current last glyph run (because it's > zero-length), then checking whether we should merge the new glyph run with the > new last glyph run. Yes, it could be a bit simpler, though we'd need to watch out for the case where the "current last glyph run" that got wiped out was in fact the first (only) glyph run, so that we're left without a "new last glyph run" to compare with. OTOH, the current code is arranged so that the common case (where "lastGlyphRun->mFont == aFont" is true) is checked first; a quick test with an instrumented browser shows that we get an early return from here about 95% of the time. So I'm inclined to leave it this way.

Alice0775 White

Updated

•

14 years ago

Depends on: 658881

Alice0775 White

Updated

•

14 years ago

Depends on: 658912

Alice0775 White

Updated

•

14 years ago

Depends on: 680402

You need to log in before you can comment on or make changes to this bug.