1056516 - presence of explicit soft-hyphen in a word should override automatic hyphenation

Reporter

Description

•

10 years ago

:dbaron reports from this week's CSSWG telcon: > The CSS WG just agreed to accept > http://dev.w3.org/csswg/css-text/issues-lc-2013#issue-21 > which says that if a word has soft hyphens, automatic hyphenation is > disabled for that word. Currently, when (-moz-)hyphens:auto is set, an explicit  does nothing to prevent us finding and using automatic hyphenation points. > (There's some discussion about then re-allowing automatic > hyphenation if one of the remaining pieces of the word after > breaking at soft hyphens is still too long for the line.)

Jonathan Kew [:jfkthame]

Reporter

Updated

•

8 years ago

Blocks: css-text-3

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Assignee: nobody → jeremychen

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

URL: https://drafts.csswg.org/css-text-3/#...

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 2

•

8 years ago

Looks like our current implementation treats the explicitly suggested hyphenation opportunities (U+00AD and U+2010) and automatic hyphenation opportunities as the same when (-moz-)hyphens:auto is set. So, take the example borrowed from Bug 1323218: data:text/html,<div lang="en" style="border:1px solid blue; width:5em; hyphens:auto;">supercalifragilistic It can be seen there's an automatic hyphenation opportunity between char 'l' and char 'i', which is between two explicitly hyphenation opportunities (&shy). Since we set the width to be 5em, which is somewhere between the automatic hyphenation opportunity and the second &shy, the automatic hyphenation opportunity gets the win. If we change the width to be 7em, which is some where right after the second &shy, then we shall see the second &shy gets the win. I think the point is that we don't have a way to tell whether the last hyphenation break opportunity (see lastBreakUsedHyphenation in [1]) is explicitly suggested or determined automatically by a language resource. To implement what the spec says, we may need a way to tell these two kinds of hyphenation break opportunities. [1] http://searchfox.org/mozilla-central/rev/568e68ade5c6e3d29abba1d1bc25fe87fa1da962/gfx/thebes/gfxTextRun.cpp#928

Jonathan Kew [:jfkthame]

Reporter

Comment 3

•

8 years ago

Remember that it won't be sufficient to keep track of whether the *preceding* break opportunity was explicit or automatic; we also need to know whether there is any *future* explicit hyphenation break within the word, because if there is, then we should ignore a potential automatic break. Testcase: data:text/html,<tt lang="en" style="hyphens:auto;display:block;width:17ch"> supercali fragilisticexpiali docious This should render as supercali fragilistic- expiali docious because the explicit  in "fragilisticexpiali" takes precedence over automatic hyphen positions; but currently we render it as supercali frag- ilisticexpiali docious instead.

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 4

•

8 years ago

Yes. My idea is that we cache the last explicit hyphenation break. So, whenever a substring is at an automatic hyphenation break, we can check whether the cached explicit hyphenation break is within the same word. If they're within the same word, we use the cached break instead of the automatic one. To implement this algorithm, we might need a way to tell whether the two break positions are within the same word. Maybe there exists an API already? Jonathan, does this algorithm make sense to you?

Flags: needinfo?(jfkthame)

Khaled Hosny

Comment 5

•

8 years ago

Wouldn’t it be possible to just check if the word has any soft hyphens and if so don’t do automatic hyphenation on it at all? E.g. nsHyphenator::Hyphenate() keep track if any soft hyphen have been found and check for that before passing the word to hnj_hyphen_hyphenate2().

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 6

•

8 years ago

Quote from specification, "Automatic hyphenation opportunities within a word must be ignored if the word contains a conditional hyphen ( or U+00AD), in favor of the conditional hyphen(s). However, if, even after breaking at such opportunities, a portion of that word is is still too long to fit on one line, an automatic hyphenation opportunity may be used." This means, we should not ignore automatic hyphenations at all if there has any soft hyphens within the word. Instead, we might want to keep the information of automatic hyphenations. Then, we can decide whether we should ignore/keep the automatic hyphenations depending on the context while checking hyphenation breaks.

Jonathan Kew [:jfkthame]

Reporter

Comment 7

•

8 years ago

Right, in extreme cases we may need to use automatic hyphenation positions *in addition to* an explicit . The actual call to libhyphen (to find automatic hyphenation points) happens in nsLineBreaker::FindHyphenationPoints, called from FlushCurrentWord (or directly from AppendText). But at this point, I think any explicit soft hyphens (or other ignorables) have already been filtered out of the text presented for hyphenation (may be worth checking this?). Also, the chunks of text (informally considered "words", as in FlushCurrentWord) that are presented here may not always correspond to "words" for the purpose of auto vs manual hyphens. IIRC, they're really just space-delimited runs of text, but there may be multiple words (separated by non-space punctuation characters) in a single call to FindHyphenationPoints. I suspect the easiest way forward, given how the line-breaking code currently works, may be as you suggest -- distinguishing between automatic hyphen positions and manually-inserted ones. The information about possible break positions is stored in the gfxTextRun as part of the CompressedGlyph record for each character. As it happens, break information goes into a two-bit field (see gfxShapedText::CompressedGlyph::FLAGS_CAN_BREAK_BEFORE), and right now we only use three values (FLAG_BREAK_TYPE_NONE / NORMAL / HYPHEN). So we should be able to split FLAG_BREAK_TYPE_HYPHEN into two values, *_MANUAL and *_AUTO, without disrupting the CompressedGlyph storage; and then gfxTextRun::BreakAndMeasureText will be able to tell the difference and choose the right breakpoint (I think...). Basically, when BreakAndMeasureText is looking for a break, it'll need to know the type of the last candidate break it has passed. If that was NORMAL (i.e. not a hyphen at all), then either type of hyphen is acceptable as a new break position; but if the last candidate was a HYPHEN_MANUAL, then we should not accept an AUTO one as a new break possibility; only a later MANUAL hyphen or a NORMAL break is allowed. Once we reach another NORMAL break, however, we're into a new word, and once again either type of hyphen can be considered. I haven't totally thought through all this, but it seems to me the approach should be able to work.

Flags: needinfo?(jfkthame)

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 9

•

8 years ago

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. This is the implementation that I was thinking of while I was proposing my idea. In this way, we don't need to split FLAG_BREAK_TYPE_HYPHEN into two values, *_MANUAL and *_AUTO. Does this make sense to you? Or, is there any case that I might miss?

Attachment #8825284 - Flags: feedback?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 10

•

8 years ago

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. I seem misunderstand the code in GetHyphenationBreaks(). Will try the approach suggested in comment 7. Clear request.

Attachment #8825284 - Flags: feedback?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 11

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #7) > Right, in extreme cases we may need to use automatic hyphenation positions > *in addition to* an explicit . > > The actual call to libhyphen (to find automatic hyphenation points) happens > in nsLineBreaker::FindHyphenationPoints, called from FlushCurrentWord (or > directly from AppendText). But at this point, I think any explicit soft > hyphens (or other ignorables) have already been filtered out of the text > presented for hyphenation (may be worth checking this?). Also, the chunks of > text (informally considered "words", as in FlushCurrentWord) that are > presented here may not always correspond to "words" for the purpose of auto > vs manual hyphens. IIRC, they're really just space-delimited runs of text, > but there may be multiple words (separated by non-space punctuation > characters) in a single call to FindHyphenationPoints. > > I suspect the easiest way forward, given how the line-breaking code > currently works, may be as you suggest -- distinguishing between automatic > hyphen positions and manually-inserted ones. The information about possible > break positions is stored in the gfxTextRun as part of the CompressedGlyph > record for each character. > > As it happens, break information goes into a two-bit field (see > gfxShapedText::CompressedGlyph::FLAGS_CAN_BREAK_BEFORE), and right now we > only use three values (FLAG_BREAK_TYPE_NONE / NORMAL / HYPHEN). So we should > be able to split FLAG_BREAK_TYPE_HYPHEN into two values, *_MANUAL and > *_AUTO, without disrupting the CompressedGlyph storage; and then > gfxTextRun::BreakAndMeasureText will be able to tell the difference and > choose the right breakpoint (I think...). > > Basically, when BreakAndMeasureText is looking for a break, it'll need to > know the type of the last candidate break it has passed. If that was NORMAL > (i.e. not a hyphen at all), then either type of hyphen is acceptable as a > new break position; but if the last candidate was a HYPHEN_MANUAL, then we > should not accept an AUTO one as a new break possibility; only a later > MANUAL hyphen or a NORMAL break is allowed. Once we reach another NORMAL > break, however, we're into a new word, and once again either type of hyphen > can be considered. > > I haven't totally thought through all this, but it seems to me the approach > should be able to work. After I implemented this algorithm and did some experiments, the algorithm is not good enough to make the example in Comment 3 work correctly. If we reach another NORMAL break, and accept either type of hyphen as future candidate break, the example in Comment 3 would be rendered as supercali frag- ilisticexpiali docious because there's a NORMAL break before "fragilisticexpiali", which allows the HYPHEN_AUTO as a next candidate break. Looks like recording the last candidate break is not enough, I think we still need a way to be able to check if there's a future candidate HYPHEN_MANUAL break in the same word. So we could always honor HYPHEN_MANUAL breaks if needed. Hope we could come up with an implementation that doesn't increase the complexity of BreakAndMeasureText().

Jonathan Kew [:jfkthame]

Reporter

Comment 12

•

8 years ago

Looks like you're right, that won't fully work as it stands. Are you working on (or planning to try) an alternative here, or should I try to re-think it and see if I can come up with a better approach?

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 13

•

8 years ago

Although I'm on the Chinese New Year holiday and may response slower than usual, but yes, I'm planning to try an alternative. Actually I was just about to needinfo you for feedback. So far, I only came up with a concept/idea (please see below). If you have any suggestions/feedback, please let me know. Thank you. I am wondering if we could save an information, something like "AUTO_HYPHEN_WITH_A_MANUAL_HYPHEN_IN_THE_SAME_WORD", in gfxShapedText::CompressedGlyph. The idea is, when hyphens:auto is set, while in BreakAndMeasureText, if this is a AUTO_HYPHEN_WITH_A_MANUAL_HYPHEN_IN_THE_SAME_WORD, only take this as a break candidate if this is the only candidate. Otherwise, always discard AUTO_HYPHEN_WITH_A_MANUAL_HYPHEN_IN_THE_SAME_WORD. To get this information, we need at least extra O(N) time complexity to do the detection, where 'N' is the length of the texts. I think we might want to avoid doing this check in BreakAndMeasureText, since we may call BreakAndMeasureText multiple times while doing reflow/restyle (due to line break change or something). Moreover, in the current logic, we may break in the middle of a word, which makes it impossible to do the check for every single word in BreakAndMeasureText. It would be great if we could detect this only one time, and store it in a place that we can reuse it whenever we don't rebuild text runs (maybe somewhere in gfxShapedText::CompressedGlyph?). Perhaps we can do a quick detection once we know the positions of MANUAL_HYPHEN and AUTO_HYPHEN in a text run. I haven't checked thoroughly about which place we should put this detection logic.

Flags: needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Reporter

Comment 14

•

8 years ago

You can squeeze an extra kind of break info into CompressedGlyph (as noted earlier), as the FLAGS_CAN_BREAK_BEFORE field is two bits wide but only has three distinct values at present. So I think this is probably workable, though I'm not sure exactly where the scan to check for manual hyphens in the word will need to go.

Flags: needinfo?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 15

•

8 years ago

Attached patch wip. (obsolete) — Details — Splinter Review

MozReview-Commit-ID: q7Y0n3pLJi

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Attachment #8825284 - Attachment is obsolete: true

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Attachment #8833912 - Attachment is obsolete: true

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 17

•

8 years ago

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. We want HYPHEN_AUTO to honer HYPHEN_MANUAL only if HYPHEN_MANUAL does exist in the current text run. Since we determine whether a HYPHEN_MANUAL exists in GetHyphenationBreaks(), we should probably keep the detection logic in GetHyphenationBreaks() as well. Which means, we don't need to squeeze an extra bit to store any extra information. Hi Jonathan, does this implementation make sense to you?

Attachment #8825284 - Flags: feedback?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Attachment #8825284 - Flags: feedback?(jfkthame)

Comment hidden (mozreview-request)

Jonathan Kew [:jfkthame]

Reporter

Comment 19

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review112040 I haven't been through the code here in detail, but I tried applying the patch and experimenting locally and I don't think the behavior that results is what we want. Try a testcase like data:text/html,<code lang="en" style="display:block;hyphens:auto;width:15ch">supercalifragilisticexpialidocious Without the patch, we hyphenate as supercalifrag- ilisticexpiali- docious (allowing auto hyphens to override the manually-inserted , which is the bug we want to fix here); but with the patch I get su- per- cal- ifragilistic- expialidocious which does indeed use the  position, but it also inserts far too many auto hyphens in the first part of the word. :(

Attachment #8825284 - Flags: review?(jfkthame) → review-

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 20

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #19) > Comment on attachment 8825284 [details] > Bug 1056516 - let auto hyphen honer soft hyphen when hyphens:auto is set. > > https://reviewboard.mozilla.org/r/103452/#review112040 > > I haven't been through the code here in detail, but I tried applying the > patch and experimenting locally and I don't think the behavior that results > is what we want. Try a testcase like > > data:text/html,<code lang="en" > style="display:block;hyphens:auto;width:15ch">supercalifragilistic > expialidocious > > Without the patch, we hyphenate as > > supercalifrag- > ilisticexpiali- > docious > > (allowing auto hyphens to override the manually-inserted , which is the > bug we want to fix here); but with the patch I get > > su- > per- > cal- > ifragilistic- > expialidocious > > which does indeed use the  position, but it also inserts far too many > auto hyphens in the first part of the word. :( Ok, I know what went wrong in the proposed implementation. We already have the info that we need, so just a little fixup in the BreakAndMeasureText() should be enough. Also got an idea to make some renaming stuff in my patch, so it may help you to better understand it. This case should be covered by the test part of the patch (the second one.). Just realized that I mis-labeled the hyphenation int the reference file. :( I'll fix this in the next version as well.

Comment hidden (mozreview-request)

Jonathan Kew [:jfkthame]

Reporter

Comment 22

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review112836 Cool - this definitely shows much better behavior! One factor that it doesn't yet address is that there are actually two kinds of "manual hyphen breaks" we need to be aware of: there's the soft-hyphen (U+00AD or ), which this patch seems to handle, but there's also the case of explicit hyphen (U+2010) or hyphen-minus (U+002D). If we have an already-hyphenated word (like that!), we should use the explicit hyphen in preference to auto-hyphen positions, just like we do (with this patch) for  breaks. Testcase: data:text/html,<code lang="en" style="display:block;hyphens:auto;width:15ch">Supercali-fragilistic-expialidocious should break only at the explicit hyphens, as: Supercali- fragilistic- expialidocious but what we actually get is: Supercali-frag- ilistic-expi- alidocious I realize the focus so far has been on the  positions, but I think we really need to fix both cases here. ::: gfx/src/nsFontMetrics.cpp:86 (Diff revision 4) > virtual void GetHyphenationBreaks(gfxTextRun::Range aRange, > - bool* aBreakBefore) { > + bool* aBreakBefore, > + bool* aIsAutoWithSoftInSameWord) { Rather than two separate `bool*` parameters (i.e. output arrays of booleans), I think you should use a single `uint8_t*` param (or maybe a specific HyphenFlags type), and define a couple of flag bits for BREAK_BEFORE and IS_AUTO_... (This becomes particularly significant for the call site in nsTextFrame::AddInlineMinISizeForFlow, where the array may be quite large. Having the separate params will likely lead to worse CPU cache behavior, in addition to increasing memory use.) ::: layout/generic/nsTextFrame.cpp:3605 (Diff revision 4) > + nsTArray<bool> isWordBoundary; > + isWordBoundary.SetLength(aRange.Length()); > + memset(isWordBoundary.Elements(), false, aRange.Length() * sizeof(bool)); > + // Get info about boundarirs between words > for (uint32_t i = 0; i < aRange.Length(); ++i) { > + int32_t fragIndex = mFrag->GetLength() > aRange.end ? > + aRange.start + i : i; > + if (mFrag->CharAt(fragIndex) == ' ') { > + isWordBoundary[i] = true; > + } > + } It shouldn't be necessary to create a new isWordBoundary array here, and do a separate loop over the text to populate it. I think a possible approach would be to remember the most recent word-boundary position, so that the first time you encounter a soft-hyphen in a word, you can update the status of the positions between word-boundary and soft hyphen; and when soft hyphen has been encountered, set a flag to keep marking the "word has a soft hyphen" until the next word boundary, when the state is cleared and the new boundary position is remembered. So a single loop over the text should be sufficient, with an extra internal pass just over the subrange from most-recent-boundary to the first soft hyphen in a word, but a full second pass isn't needed.

Attachment #8825284 - Flags: review?(jfkthame)

Jonathan Kew [:jfkthame]

Reporter

Comment 23

•

8 years ago

There's also an issue of some kind with this patch when the text being processed exceeds the MEASUREMENT_BUFFER_SIZE (100 chars) used in gfxTextRun::BreakAndMeasureText. Testcase: data:text/html,<code lang="en" style="display:block;hyphens:auto;width:100ch"> ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZsupercalifragilisticexpialidociousABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ (if bugzilla line-wraps that, remove the newlines and any resulting spaces from the test string) Here, there are a number of explicit  positions in the middle section of the text, but they are ignored and auto hyphen positions are used instead: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTU- VWXYZsupercalifragilisticexpialidociousABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDE- FGHIJKLMNOPQRSTUVWXYZ However, if we increase the MEASUREMENT_BUFFER_SIZE in gfxTextRun.cpp to 1000, then a  position gets correctly used: ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTU- VWXYZsupercalifragilisticexpialidocious- ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ This seems to suggest that the case where the text crosses a measurement-buffer boundary is not being handled properly -- state is lost at that point, or something. (While natural-language words won't normally be so long, there are exceptional cases such as extremely long names of chemical compounds[1] where it could become an issue. So we should try to handle the cross-buffer-boundary case correctly if possible.) [1] https://en.wikipedia.org/wiki/Longest_word_in_English

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 24

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #22) > Comment on attachment 8825284 [details] > Bug 1056516 - let auto hyphen honor soft hyphen when hyphens:auto is set. > > https://reviewboard.mozilla.org/r/103452/#review112836 > > Cool - this definitely shows much better behavior! > > One factor that it doesn't yet address is that there are actually two kinds > of "manual hyphen breaks" we need to be aware of: there's the soft-hyphen > (U+00AD or ), which this patch seems to handle, but there's also the > case of explicit hyphen (U+2010) or hyphen-minus (U+002D). If we have an > already-hyphenated word (like that!), we should use the explicit hyphen in > preference to auto-hyphen positions, just like we do (with this patch) for >  breaks. > > Testcase: > > data:text/html,<code lang="en" > style="display:block;hyphens:auto;width:15ch">Supercali-fragilistic- > expialidocious > > should break only at the explicit hyphens, as: > > Supercali- > fragilistic- > expialidocious > > but what we actually get is: > > Supercali-frag- > ilistic-expi- > alidocious > > I realize the focus so far has been on the  positions, but I think we > really need to fix both cases here. Fair enough. > ::: gfx/src/nsFontMetrics.cpp:86 > (Diff revision 4) > > virtual void GetHyphenationBreaks(gfxTextRun::Range aRange, > > - bool* aBreakBefore) { > > + bool* aBreakBefore, > > + bool* aIsAutoWithSoftInSameWord) { > > Rather than two separate `bool*` parameters (i.e. output arrays of > booleans), I think you should use a single `uint8_t*` param (or maybe a > specific HyphenFlags type), and define a couple of flag bits for > BREAK_BEFORE and IS_AUTO_... > > (This becomes particularly significant for the call site in > nsTextFrame::AddInlineMinISizeForFlow, where the array may be quite large. > Having the separate params will likely lead to worse CPU cache behavior, in > addition to increasing memory use.) Sounds good. > ::: layout/generic/nsTextFrame.cpp:3605 > (Diff revision 4) > > + nsTArray<bool> isWordBoundary; > > + isWordBoundary.SetLength(aRange.Length()); > > + memset(isWordBoundary.Elements(), false, aRange.Length() * sizeof(bool)); > > + // Get info about boundarirs between words > > for (uint32_t i = 0; i < aRange.Length(); ++i) { > > + int32_t fragIndex = mFrag->GetLength() > aRange.end ? > > + aRange.start + i : i; > > + if (mFrag->CharAt(fragIndex) == ' ') { > > + isWordBoundary[i] = true; > > + } > > + } > > It shouldn't be necessary to create a new isWordBoundary array here, and do > a separate loop over the text to populate it. > > I think a possible approach would be to remember the most recent > word-boundary position, so that the first time you encounter a soft-hyphen > in a word, you can update the status of the positions between word-boundary > and soft hyphen; and when soft hyphen has been encountered, set a flag to > keep marking the "word has a soft hyphen" until the next word boundary, when > the state is cleared and the new boundary position is remembered. > > So a single loop over the text should be sufficient, with an extra internal > pass just over the subrange from most-recent-boundary to the first soft > hyphen in a word, but a full second pass isn't needed. Yes, you're right. Thank you for the feedback. Will fix these in the next version.

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 26

•

8 years ago

mozreview-review-reply

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review112836 Just re-checked the specification and realized that It'd be better to file an issue for this. https://github.com/w3c/csswg-drafts/issues/1040

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 27

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #23) > There's also an issue of some kind with this patch when the text being > processed exceeds the MEASUREMENT_BUFFER_SIZE (100 chars) used in > gfxTextRun::BreakAndMeasureText. Testcase: > > data:text/html,<code lang="en" > style="display:block;hyphens:auto;width:100ch"> > ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXY > ZABCDEFGHIJKLMNOPQRSTUVWXYZsupercalifragilistic > expialidocious > ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXY > Z > > (if bugzilla line-wraps that, remove the newlines and any resulting spaces > from the test string) > > Here, there are a number of explicit  positions in the middle section > of the text, but they are ignored and auto hyphen positions are used instead: > > ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWX > YZABCDEFGHIJKLMNOPQRSTU- > VWXYZsupercalifragilisticexpialidociousABCDEFGHIJKLMNOPQRSTUVWXYZABCDE > FGHIJKLMNOPQRSTUVWXYZABCDE- > FGHIJKLMNOPQRSTUVWXYZ > > However, if we increase the MEASUREMENT_BUFFER_SIZE in gfxTextRun.cpp to > 1000, then a  position gets correctly used: > > ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWX > YZABCDEFGHIJKLMNOPQRSTU- > VWXYZsupercalifragilisticexpialidocious- > ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWX > YZ > > This seems to suggest that the case where the text crosses a > measurement-buffer boundary is not being handled properly -- state is lost > at that point, or something. (While natural-language words won't normally be > so long, there are exceptional cases such as extremely long names of > chemical compounds[1] where it could become an issue. So we should try to > handle the cross-buffer-boundary case correctly if possible.) > > [1] https://en.wikipedia.org/wiki/Longest_word_in_English To maintain extra info in the caller of GetHyphenationBreaks(), and to switch back-and-forth between different hyphenBuffers seem complicate the codes a lot. One thing that we can do is, eliminate the MEASUREMENT_BUFFER_SIZE buffer size limitation, and use a AutoTArray<HyphenType, 100> instead. This would just like what we've already done in [1], which we even use such a much larger number (BIG_TEXT_NODE_SIZE = 4096) in AutoTArray's default allocation size. In this way, we don't need to complicate our current approach to just for this extreme case. Jonathan, what do you think? [1] http://searchfox.org/mozilla-central/rev/cac6cb6a10afb8ebb2ecfbeeedaff7c66f57dd75/layout/generic/nsTextFrame.cpp#8371

Flags: needinfo?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 28

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #26) > Comment on attachment 8825284 [details] > Bug 1056516 - (wip) let auto hyphen honor manual hyphen when hyphens:auto is > set. > > https://reviewboard.mozilla.org/r/103452/#review112836 > > Just re-checked the specification and realized that It'd be better to file > an issue for this. > https://github.com/w3c/csswg-drafts/issues/1040 Looks like this is a duplicate of https://github.com/w3c/csswg-drafts/issues/618. Quote from the issue, ``` The resolution is to leave this and questions of hyphenation prioritization undefined and up to the UA. ``` So, I guess we should be fine. Let's fix both cases (soft hyphen and explicit hyphen) in this bug.

Jonathan Kew [:jfkthame]

Reporter

Comment 29

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #28) > (In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #26) > > Comment on attachment 8825284 [details] > > Bug 1056516 - (wip) let auto hyphen honor manual hyphen when hyphens:auto is > > set. > > > > https://reviewboard.mozilla.org/r/103452/#review112836 > > > > Just re-checked the specification and realized that It'd be better to file > > an issue for this. > > https://github.com/w3c/csswg-drafts/issues/1040 > > Looks like this is a duplicate of > https://github.com/w3c/csswg-drafts/issues/618. > Quote from the issue, > ``` > The resolution is to leave this and questions of hyphenation prioritization > undefined and up to the UA. > ``` > > So, I guess we should be fine. Let's fix both cases (soft hyphen and > explicit hyphen) in this bug. I agree, having just re-read the discussion in that issue. While it's true that treating them both in this way will not always be the perfect behavior, it's an improvement over what we have. To do better, we'd need a more complex model with support for additional levels or priorities of breakpoints, and line-breaking rules that take account of a number of variously-weighted factors. (In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #27) > One thing that we can do is, eliminate the MEASUREMENT_BUFFER_SIZE buffer > size limitation, and use a AutoTArray<HyphenType, 100> instead. This would > just like what we've already done in [1], which we even use such a much > larger number (BIG_TEXT_NODE_SIZE = 4096) in AutoTArray's default allocation > size. In this way, we don't need to complicate our current approach to just > for this extreme case. Yes, that's probably a reasonable approach here.

Flags: needinfo?(jfkthame)

Comment hidden (mozreview-request)

Jonathan Kew [:jfkthame]

Reporter

Comment 38

•

8 years ago

mozreview-review

Comment on attachment 8838630 [details] Bug 1056516 - add tests to our local reftests folder. https://reviewboard.mozilla.org/r/113472/#review115244 I'm not sure if we should be submitting all this to w3c-css, or only a more limited set. The problem is that CSS Text leaves the behavior partly up to the implementation; in particular, for the case where a manual hyphen position is present, but one of the resulting word fragments is still too long to fit on a line, it reads to me like the UA could either let the long fragment overflow the line OR it could use auto-hyphen positions to break it further: > However, if, even after breaking at such opportunities, a portion of that word is is still too long to fit on one line, an automatic hyphenation opportunity may be used.[1] This means that there is not a single correct reference for the first three examples in test 1, nor for the ultra-long example in test 3; the behavior in these cases is dependent on a "may" rather than a "must" (or even "should"). [1] https://drafts.csswg.org/css-text/#valdef-hyphens-auto ::: layout/reftests/w3c-css/submitted/text3/hyphenation-control-1-ref.html:28 (Diff revision 2) > + > + In this test, we set width to somewhere between every pair of adjacent hyphenations, > + then we see if lines break correctly. > +--> > +<code style="width:4ch;"> > +fragilisticexpiali For the reference file, instead of putting in all the  points and letting line-breaking do its thing, I think you should explicitly hyphenate and break, thus: frag-<br>ilis-<br>tic-<br>ex-<br>pi-<br>ali to make it absolutely clear what the expected result is. ::: layout/reftests/w3c-css/submitted/text3/hyphenation-control-2-ref.html:18 (Diff revision 2) > +} > +</style> > +</head> > +<body lang="en-us"> > +<code style="width:15ch;"> > +Supercalifragilisticexpialidocious As above, use explicit hyphens and `<br>` breaks in the reference file. ::: layout/reftests/w3c-css/submitted/text3/hyphenation-control-2.html:21 (Diff revision 2) > +<code style="width:15ch;"> > +Supercali-fragilistic-expialidocious > +</code> It would be good to have a second test element using ‐ in place of the ASCII hyphen-minus. The assert above mentions both, so let's test both. ::: layout/reftests/w3c-css/submitted/text3/hyphenation-control-3-ref.html:18 (Diff revision 2) > +} > +</style> > +</head> > +<body lang="en-us"> > +<code style="width:100ch;"> > +ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZsupercalifragilisticexpialidociousABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ As above, use `-<br>` for the expected breaks.

Jonathan Kew [:jfkthame]

Reporter

Comment 39

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115246 In general, this looks like a useful change; I think there are a few things that can be tightened up, see below. ::: gfx/thebes/gfxTextRun.cpp:857 (Diff revision 7) > > NS_ASSERTION(aStart + aMaxLength <= GetLength(), "Substring out of range"); > > - Range bufferRange(aStart, aStart + > - std::min<uint32_t>(aMaxLength, MEASUREMENT_BUFFER_SIZE)); > - PropertyProvider::Spacing spacingBuffer[MEASUREMENT_BUFFER_SIZE]; > + Range bufferRange(aStart, aStart + aMaxLength); > + AutoTArray<PropertyProvider::Spacing, 100> spacingBuffer; > + spacingBuffer.AppendElements(bufferRange.Length()); As we're no longer working with a limited buffer size, I think we should make the allocations here (both this `AppendElements`, and the one for `hyphenBuffer` below) fallible, and fall back to just ignoring spacing and/or hyphenation if they fail. Oh, and move the `AppendElements` inside the `if (haveSpacing)` test; there's no need to allocate (and initialize) the elements at all if that's not true. ::: layout/generic/nsTextFrame.cpp:3582 (Diff revision 7) > -PropertyProvider::GetHyphenationBreaks(Range aRange, bool* aBreakBefore) > +PropertyProvider::GetHyphenationBreaks(Range aRange, nsTArray<bool>& aBreakBefore) > { > NS_PRECONDITION(IsInBounds(mStart, mLength, aRange), "Range out of bounds"); > NS_PRECONDITION(mLength != INT32_MAX, "Can't call this with undefined length"); To make this actually benefit from `aBreakBefore` being an array rather than a raw pointer to a buffer, add an assertion here: MOZ_ASSERT(aBreakBefore.Length() == aRange.Length()); (or maybe `>=`, which would still be safe, if we ever want to call this for a range that is smaller than the buffer we're providing). ::: layout/generic/nsTextFrame.cpp:3590 (Diff revision 7) > NS_PRECONDITION(mLength != INT32_MAX, "Can't call this with undefined length"); > > if (!mTextStyle->WhiteSpaceCanWrap(mFrame) || > mTextStyle->mHyphens == StyleHyphens::None) > { > - memset(aBreakBefore, false, aRange.Length() * sizeof(bool)); > + memset(aBreakBefore.Elements(), false, aRange.Length() * sizeof(bool)); I think this may be redundant now, because with `aBreakBefore` being an `nsTArray` instead of just a block of `malloc`'d memory, its elements will have been default-initialized to `false` already, won't they? In fact, perhaps we should even have a (debug-build) assertion at the top of this method to check that the passed-in array is properly initialized to all-`false`: MOZ_ASSERT(!aBreakBefore.Contains(true)); ::: layout/generic/nsTextFrame.cpp:3617 (Diff revision 7) > - memset(aBreakBefore + runOffsetInSubstring, false, run.GetRunLength()*sizeof(bool)); > + memset(aBreakBefore.Elements() + runOffsetInSubstring, false, > + run.GetRunLength() * sizeof(bool)); As above, is this redundant?

Attachment #8825284 - Flags: review?(jfkthame)

Jonathan Kew [:jfkthame]

Reporter

Comment 40

•

8 years ago

mozreview-review

Comment on attachment 8838628 [details] Bug 1056516 - use HyphenType to store different types of hyphenations. https://reviewboard.mozilla.org/r/113468/#review115270 This will need an update following the comments on the preceding patch, so I'll read it again after that. (Looks good so far, though.)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 41

•

8 years ago

mozreview-review-reply

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115246 > As we're no longer working with a limited buffer size, I think we should make the allocations here (both this `AppendElements`, and the one for `hyphenBuffer` below) fallible, and fall back to just ignoring spacing and/or hyphenation if they fail. > > Oh, and move the `AppendElements` inside the `if (haveSpacing)` test; there's no need to allocate (and initialize) the elements at all if that's not true. Will do. > To make this actually benefit from `aBreakBefore` being an array rather than a raw pointer to a buffer, add an assertion here: > > MOZ_ASSERT(aBreakBefore.Length() == aRange.Length()); > > (or maybe `>=`, which would still be safe, if we ever want to call this for a range that is smaller than the buffer we're providing). Okay. > I think this may be redundant now, because with `aBreakBefore` being an `nsTArray` instead of just a block of `malloc`'d memory, its elements will have been default-initialized to `false` already, won't they? > > In fact, perhaps we should even have a (debug-build) assertion at the top of this method to check that the passed-in array is properly initialized to all-`false`: > > MOZ_ASSERT(!aBreakBefore.Contains(true)); Hmmm, supposely `AppendElements` should've handled the default-initialization steps already. However, after adding an debug-only assertion, it turns out that `aBreakBefore` is not initialized to all-false. Seems we intentionally DON'T call C++ default initialization for the primative types. See http://searchfox.org/mozilla-central/rev/12cf11303392edac9f1da0c02e3d9ad2ecc8f4d3/xpcom/ds/nsTArray.h#658-663. > As above, is this redundant? Like I replied above, this is unavoidable. :(

Jonathan Kew [:jfkthame]

Reporter

Comment 42

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #41) > Seems we intentionally DON'T call C++ default initialization for the > primative types. See > http://searchfox.org/mozilla-central/rev/ > 12cf11303392edac9f1da0c02e3d9ad2ecc8f4d3/xpcom/ds/nsTArray.h#658-663. OK, so we do need the initialization here after all - thanks for checking.

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 43

•

8 years ago

mozreview-review-reply

Comment on attachment 8838630 [details] Bug 1056516 - add tests to our local reftests folder. https://reviewboard.mozilla.org/r/113472/#review115244 Nice catch! I agree that whenever we met a word like "may" appearing in the web standards, we should think twice about whether or not putting the tests into WPT or CSSWG. Lesson learned. :) I'll move all these tests to our local reftests folder in the next version. I think submiting a very limited subset of the tests into CSSWG is still something worth doing, so I'll split a separated patch for this. > For the reference file, instead of putting in all the  points and letting line-breaking do its thing, I think you should explicitly hyphenate and break, thus: > > frag-<br>ilis-<br>tic-<br>ex-<br>pi-<br>ali > > to make it absolutely clear what the expected result is. Fair enough. > As above, use explicit hyphens and `<br>` breaks in the reference file. Okay. > It would be good to have a second test element using ‐ in place of the ASCII hyphen-minus. The assert above mentions both, so let's test both. Will do. > As above, use `-<br>` for the expected breaks. Okay.

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Attachment #8839080 - Flags: review?(jfkthame)

Jonathan Kew [:jfkthame]

Reporter

Comment 49

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115394 Suggested a couple small tweaks to the code below; but more important, on thinking about this again I'm not so sure it's a good idea. My concern is that with the switch from the fixed-size "sliding window" buffer for spacing and hyphenation to a single buffer that is sized for the entire text, we will now be retrieving the spacing and hyphenation for the entire text passed to BreakAndMeasureText, even though the loop may end up exiting early, after processing only a small amount of the text. (In most cases, we'll then call BreakAndMeasureText again to find the next line-break, starting from the previously-found break, which means we'll retrieve the spacing and hyphenation for the rest of the text all over again. And again, and again.) So I think with the existing code, we'll usually just call GetAdjustedSpacing and GetHyphenationBreaks once for the first 100-char chunk of the text; in most cases, that'll be enough to find the line-break we need. Then on the next call to BreakAndMeasureText, we'll get data for 100 chars starting from the previous break, and so on. Whereas after the patch here, if we're working on a 1000-line paragraph, we'll get spacing/hyphenation data for the entire 1000 chars; then we find a break after, say, 70; so next time, we get the data again for the remaining 930 chars, then 860, etc. Ouch. Let's take a step back and reconsider this; while it looks great (and simplifies things nicely) for the common case of small text runs, it could significantly hurt line-breaking perf on large blocks. We should try to avoid that. ::: gfx/thebes/gfxTextRun.cpp:856 (Diff revision 8) > aMaxLength = std::min(aMaxLength, GetLength() - aStart); > > NS_ASSERTION(aStart + aMaxLength <= GetLength(), "Substring out of range"); > > - Range bufferRange(aStart, aStart + > - std::min<uint32_t>(aMaxLength, MEASUREMENT_BUFFER_SIZE)); > + Range bufferRange(aStart, aStart + aMaxLength); > + AutoTArray<PropertyProvider::Spacing, 100> spacingBuffer; This is a pretty 'hot' part of text layout, so I'd like to make sure that heap allocation is almost vanishingly rare here for real content, except for pathological cases where the site is basically trying to DoS the browser with massive chunks of text. So it might be worth instrumenting the code to record buffer lengths and running something like talos Tp to see what shows up. I suspect lengths of several hundred are not uncommon, and maybe an auto-buffer size more like 4K (compare BIG_TEXT_NODE_SIZE) might be worthwhile. ::: gfx/thebes/gfxTextRun.cpp:859 (Diff revision 8) > - GetAdjustedSpacing(this, bufferRange, aProvider, spacingBuffer); > + if (spacingBuffer.AppendElements(bufferRange.Length())) { > + GetAdjustedSpacing(this, bufferRange, aProvider, spacingBuffer.Elements()); > + } If the allocation fails here, I think you can just reset `haveSpacing` to false (and then won't need the `!spacingBuffer.IsEmpty()` check later). ::: gfx/thebes/gfxTextRun.cpp:863 (Diff revision 8) > if (haveSpacing) { > - GetAdjustedSpacing(this, bufferRange, aProvider, spacingBuffer); > + if (spacingBuffer.AppendElements(bufferRange.Length())) { > + GetAdjustedSpacing(this, bufferRange, aProvider, spacingBuffer.Elements()); > + } > } > - bool hyphenBuffer[MEASUREMENT_BUFFER_SIZE]; > + AutoTArray<bool, 100> hyphenBuffer; Again, maybe a larger auto-buffer. ::: gfx/thebes/gfxTextRun.cpp:869 (Diff revision 8) > bool haveHyphenation = aProvider && > (aProvider->GetHyphensOption() == StyleHyphens::Auto || > (aProvider->GetHyphensOption() == StyleHyphens::Manual && > (mFlags & gfxTextRunFactory::TEXT_ENABLE_HYPHEN_BREAKS) != 0)); > if (haveHyphenation) { > + if (hyphenBuffer.AppendElements(bufferRange.Length())) { Similarly, turn off `haveHyphenation` if allocation fails.

Attachment #8825284 - Flags: review?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 50

•

8 years ago

mozreview-review-reply

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115394 I agree this might hurt line-breaking perf significantly under certain cases. It also makes no sense to always store unnecessary information in every round of BreakAndMeasureText. The current method is proposed to solve the extremely long word case. To do so, we could either (1) maintain/store extra information between neighboring buffers, and access them back and forth if needed, or (2) use an AutoTArray to store hyphen-breaks information of the whole text run, so boundaries of the neighboring buffers are gone, no extra information needed, no additional overhead for accessing neighboring buffers, problem solved. From Comment 49 (and along with the Talos Tp test results [1]), it seems (2) is not an option anymore. I'm not sure (1) is still something worth doing for now, since that could hurt the performance as well. Maybe or maybe not there's an option (3), which might need more time to figure out. What I'm tring to ask is: 1. Do we still want to take extremely long words into consideration here? I suggest we leave this rare case in the real world as it is for now. Maybe file a separate follow-up bug to track this issue, and move the test for extremely-long-word to there as well. 2. Do we want to enlarge the size of MEASUREMENT_BUFFER_SIZE? It seems to me the MEASUREMENT_BUFFER_SIZE is representing a tradeoff of `mem usage` and a guess about `normally how many chars it would take to find a line break`. Enlarging the size would further make the extremely-long-word cases even more rare, since the current implementation could well handle cases for words that have lengths <= MEASUREMENT_BUFFER_SIZE. A larger MEASUREMENT_BUFFER_SIZE would definitely take more longer words into consideration. 3. Even if we're going to enlarge the size, what buffer size we'd prefer? I'm changing the MEASUREMENT_BUFFER_SIZE to 4k and send a Talos Tp test for it [2]. Hope this could help us decide a reasonable larger buffersize if we decide to go this way. [1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=584c35d5187b&newProject=try&newRevision=2ec2cba9908af440969d4ac356add7fdf300aa26&framework=1&showOnlyImportant=0 [2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=584c35d5187b&newProject=try&newRevision=c97a8a03d53623425f3f4cfb6fac47230fbf61a6&framework=1&showOnlyImportant=0 > This is a pretty 'hot' part of text layout, so I'd like to make sure that heap allocation is almost vanishingly rare here for real content, except for pathological cases where the site is basically trying to DoS the browser with massive chunks of text. > > So it might be worth instrumenting the code to record buffer lengths and running something like talos Tp to see what shows up. I suspect lengths of several hundred are not uncommon, and maybe an auto-buffer size more like 4K (compare BIG_TEXT_NODE_SIZE) might be worthwhile. Comparing the original unchanged codes as a baseline (using MEASUREMENT_BUFFER_SIZE = 100) to the current implementation (using AutoTArray<4096>), the result is shown in [1]. Looks like there's a serious regression on `tp5o XRes opt e10s`. :( [1] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=584c35d5187b&newProject=try&newRevision=2ec2cba9908af440969d4ac356add7fdf300aa26&framework=1&showOnlyImportant=0 > If the allocation fails here, I think you can just reset `haveSpacing` to false (and then won't need the `!spacingBuffer.IsEmpty()` check later). Okay. > Similarly, turn off `haveHyphenation` if allocation fails. Okay.

Xidorn Quan [:xidorn] UTC+10

Comment 51

•

8 years ago

The number of runs for tp5o XRes is too few to be meaningful.

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 52

•

8 years ago

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #51) > The number of runs for tp5o XRes is too few to be meaningful. Oops, not sure if this is related to the Talos bug (Bug 1317189). Tests were rebuilt successfully on Windows and OSX, but stuck on Linus. :( I'd better to send another round for Linux platform only, since this is the platform that the regression shows.

Jonathan Kew [:jfkthame]

Reporter

Comment 53

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #50) > 1. Do we still want to take extremely long words into consideration here? > I suggest we leave this rare case in the real world as it is for now. Maybe > file a separate follow-up bug to track this issue, and move the test for > extremely-long-word to there as well. Yes, we could do that if we don't see a better solution here at the moment. > 2. Do we want to enlarge the size of MEASUREMENT_BUFFER_SIZE? > It seems to me the MEASUREMENT_BUFFER_SIZE is representing a tradeoff of > `mem usage` and a guess about `normally how many chars it would take to find > a line break`. Enlarging the size would further make the extremely-long-word > cases even more rare, since the current implementation could well handle > cases for words that have lengths <= MEASUREMENT_BUFFER_SIZE. A larger > MEASUREMENT_BUFFER_SIZE would definitely take more longer words into > consideration. It would, but on the other hand it would increase the overhead on short lines: whenever the buffer extends beyond the actual line-break that we find, it means we have fetched more data than we need, and we'll re-fetch that same data when we come back to get the next line-break. (If we use a buffer of 100 chars, but end up breaking lines after every 50 chars, we're throwing away - and then re-fetching - data for 50 chars each time; increasing the buffer size multiplies that effect.) So let's not do that.

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Blocks: 1341231

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 54

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #53) > (In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #50) > > 1. Do we still want to take extremely long words into consideration here? > > I suggest we leave this rare case in the real world as it is for now. Maybe > > file a separate follow-up bug to track this issue, and move the test for > > extremely-long-word to there as well. > > Yes, we could do that if we don't see a better solution here at the moment. Filed Bug 1341231. > > 2. Do we want to enlarge the size of MEASUREMENT_BUFFER_SIZE? > > It seems to me the MEASUREMENT_BUFFER_SIZE is representing a tradeoff of > > `mem usage` and a guess about `normally how many chars it would take to find > > a line break`. Enlarging the size would further make the extremely-long-word > > cases even more rare, since the current implementation could well handle > > cases for words that have lengths <= MEASUREMENT_BUFFER_SIZE. A larger > > MEASUREMENT_BUFFER_SIZE would definitely take more longer words into > > consideration. > > It would, but on the other hand it would increase the overhead on short > lines: whenever the buffer extends beyond the actual line-break that we > find, it means we have fetched more data than we need, and we'll re-fetch > that same data when we come back to get the next line-break. (If we use a > buffer of 100 chars, but end up breaking lines after every 50 chars, we're > throwing away - and then re-fetching - data for 50 chars each time; > increasing the buffer size multiplies that effect.) So let's not do that. Okay. Thank you for the feedback. I'll update the patch set accordingly.

Jonathan Kew [:jfkthame]

Reporter

Comment 55

•

8 years ago

I wonder.... could we get the "best of both worlds" by using the existing behavior of fetching spacing/hyphenation data in 100-character chunks, to avoid fetching lots of data we don't need for the current line; but instead of *moving* the buffer in the case where we end up needing more, we'd *extend* it at that point and append the next chunk to the existing data? That way, we'd still be able to look back to the beginning of the string if necessary. We could do this if we changed the measurement buffers from fixed-size arrays to auto-arrays, but instead of sizing them for the full length of the text right away, we'd grow them only as needed to fetch additional data. They could have an auto-buffer size of 4K, for example, so that we could extend them to that size without spilling over to heap allocation, even though we'd only use 100 chars for the initial chunk.

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 56

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #55) > I wonder.... could we get the "best of both worlds" by using the existing > behavior of fetching spacing/hyphenation data in 100-character chunks, to > avoid fetching lots of data we don't need for the current line; but instead > of *moving* the buffer in the case where we end up needing more, we'd > *extend* it at that point and append the next chunk to the existing data? > That way, we'd still be able to look back to the beginning of the string if > necessary. > > We could do this if we changed the measurement buffers from fixed-size > arrays to auto-arrays, but instead of sizing them for the full length of the > text right away, we'd grow them only as needed to fetch additional data. > They could have an auto-buffer size of 4K, for example, so that we could > extend them to that size without spilling over to heap allocation, even > though we'd only use 100 chars for the initial chunk. Great point!!! This sounds promising.

Comment hidden (mozreview-request)

Jonathan Kew [:jfkthame]

Reporter

Comment 62

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115582 This looks really promising, I think we're close to being ready here. A few suggestions to consider; in particular, can we avoid re-fetching data from the beginning of the buffers when we extend them? ::: gfx/thebes/gfxTextRun.cpp:901 (Diff revision 9) > bufferRange.start = i; > bufferRange.end = std::min(aStart + aMaxLength, > i + MEASUREMENT_BUFFER_SIZE); With the change to how the buffer works (it grows as needed rather than being moved), I think it would make things clearer to leave `bufferRange.start` untouched here, and just update the `.end` field; that way, `bufferRange` will continue to correctly reflect the range of data present in the buffers. That means instead of `AppendElements(bufferRange.Length()` below, you'll want something like `SetLength(bufferRange.Length())` to update the buffer size. Also, probably good to add a `fallible` parameter. Looking at the nsTArray header, it kinda seems like this `AppendElements` method (and the `SetLength` that could replace it) is currently always fallible, but being explicit is kinder to future readers of the code (and we might change the default behavior some day). ::: gfx/thebes/gfxTextRun.cpp:906 (Diff revision 9) > + GetAdjustedSpacing(this, Range(aStart, bufferRange.end), > + aProvider, spacingBuffer.Elements()); This re-fetches spacing for the entire range from `aStart`, but we already have the data from `aStart` to the previous `bufferRange.end` in the buffer, don't we? So can we call something like GetAdjustedSpacing(this, Range(prevEnd, bufferRange.end), aProvider, spacingBuffer.Elements() + prevEnd - bufferRange.start); here, and avoid the repeated work? (Assumes a local `prevEnd` was set before updating `bufferRange.end` above.) ::: gfx/thebes/gfxTextRun.cpp:914 (Diff revision 9) > + aProvider->GetHyphenationBreaks(Range(aStart, bufferRange.end), > + hyphenBuffer); Same question, can we get data only for the newly-added range of the buffer? Or do we depend on getting the data in a single call that spans the boundary where the buffer-extension happened? ::: gfx/thebes/gfxTextRun.cpp:929 (Diff revision 9) > // would trigger an infinite loop. > if (aSuppressBreak != eSuppressAllBreaks && > (aSuppressBreak != eSuppressInitialBreak || i > aStart)) { > bool atNaturalBreak = mCharacterGlyphs[i].CanBreakBefore() == 1; > - bool atHyphenationBreak = !atNaturalBreak && > - haveHyphenation && hyphenBuffer[i - bufferRange.start]; > + bool atHyphenationBreak = !atNaturalBreak && haveHyphenation && > + hyphenBuffer[i - aStart]; With the change suggested above, `bufferRange.start` will stay equal to `aStart`, and only `.end` will move; so this change should no longer be needed. ::: gfx/thebes/gfxTextRun.cpp:965 (Diff revision 9) > - PropertyProvider::Spacing *space = > - &spacingBuffer[i - bufferRange.start]; > + PropertyProvider::Spacing space = > + spacingBuffer[i - aStart]; As above, keep `bufferRange.start` fixed at the true start, so this change is unneeded. If you want to get rid of the pointer usage, I'd go for a const refererence rather than copying the value to a local var.

Attachment #8825284 - Flags: review?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 63

•

8 years ago

mozreview-review-reply

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review115582 Per discussion on IRC, I'll extend hyphenBuffer only for the newly-added range of the buffer. GetHyphenationBreaks would return the hyphen types {None, Manual, Auto}. The responsibility of telling whether an auto hyphen is AutoWithManualInSameWord or AutoWithoutManualInSameWord is left to BreakAndMeasureText then. The only overhead could be, if we extend the hyphenBuffer, and some auto hyphen positioned in somewhere previous to our current position, we might need to get back to the latest word boundary and re-start the BreakAndMeasureText from there. The overhead highly depends on how long could a word be and how soon can we detect a manual hyphen in a word if there exists one. So, the cost could be significant only if we meet some extremely long word and there's a manual hyphen in the very late position of the word. Hmm... shouldd be rare I think. > This re-fetches spacing for the entire range from `aStart`, but we already have the data from `aStart` to the previous `bufferRange.end` in the buffer, don't we? So can we call something like > > GetAdjustedSpacing(this, Range(prevEnd, bufferRange.end), aProvider, > spacingBuffer.Elements() + prevEnd - bufferRange.start); > > here, and avoid the repeated work? (Assumes a local `prevEnd` was set before updating `bufferRange.end` above.) Second thoughts, I think there might be not much good for us to use AutoTArray for spacingBuffer. Compared with hyphenBuffer, spacingBuffer is more context-free and independent. I think the refactoring work could be left to Bug 1340852. So we can focus on the current bug first. > Same question, can we get data only for the newly-added range of the buffer? Or do we depend on getting the data in a single call that spans the boundary where the buffer-extension happened? Yes, we can. I'll update this in the next version.

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 69

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103450/#review115888 Couple changes in this version: 1. move the logic of auto-hyphen-classification to gfxTextRun. 2. to achieve (1), we need to export the ability for gfxTextRun to get the style of text, so I add a getter to PropertyProvider. 3. hyphenation-control-2-ref.html has been changed to fix the failures on windows platform. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1136c30686f2f0012da954a28e500ae0d91c74f&selectedJob=79042339

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 75

•

8 years ago

mozreview-review

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review115990 Fix a wrong array index which causes intermittents on my own tests. https://treeherder.mozilla.org/#/jobs?repo=try&revision=24fbc3cde23e95603da9557725241155648ed3a3

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 81

•

8 years ago

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8cceb01ed6b5298f42d279bdac069a04fe48aa53

Jonathan Kew [:jfkthame]

Reporter

Comment 82

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review117378 OK, looks good. One minor suggestion for a tweak that I think will help future readers understand the code, without affecting behavior. ::: gfx/thebes/gfxTextRun.cpp:897 (Diff revision 12) > > - uint32_t i; > + uint32_t i, prevEnd; > for (i = aStart; i < end; ++i) { > if (i >= bufferRange.end) { > // Fetch more spacing and hyphenation data > + prevEnd = bufferRange.end; Actually, I think the code below will be easier to understand if we do uint32_t oldHyphenBufferLength = hyphenBuffer.Length(); to remember the old length instead of end position; that will simplify the expression needed when calling GetHyphenationBreaks. ::: gfx/thebes/gfxTextRun.cpp:905 (Diff revision 12) > i + MEASUREMENT_BUFFER_SIZE); > + // For spacing, we always overwrite the old data with the newly > + // fetched one. However, for hyphenation, hyphenation data sometimes > + // depends on the context in every word (if "hyphens: auto" is set). > + // To ensure we get enough information between neighboring buffers, > + // we grow the hyphenBuffer instead of overwrite it. I think it'd be good to add to this comment something like "Note that this means bufferRange does not correspond to the entire hyphenBuffer, but only to the most recently added portion. Therefore, we need to add the old length to hyphenBuffer.Elements() when getting more data." ::: gfx/thebes/gfxTextRun.cpp:912 (Diff revision 12) > GetAdjustedSpacing(this, bufferRange, aProvider, spacingBuffer); > } > if (haveHyphenation) { > - aProvider->GetHyphenationBreaks(bufferRange, hyphenBuffer); > + if (hyphenBuffer.AppendElements(bufferRange.Length(), fallible)) { > + aProvider->GetHyphenationBreaks(bufferRange, > + hyphenBuffer.Elements() + prevEnd - aStart); With the suggestion above, this becomes hyphenBuffer.Elements() + oldHyphenBufferLength which I think is clearer.

Attachment #8825284 - Flags: review?(jfkthame) → review+

Jonathan Kew [:jfkthame]

Reporter

Comment 83

•

8 years ago

mozreview-review

Comment on attachment 8838628 [details] Bug 1056516 - use HyphenType to store different types of hyphenations. https://reviewboard.mozilla.org/r/113468/#review117384 ::: layout/generic/nsTextFrame.cpp:3619 (Diff revision 7) > // by other skipped characters, we won't use them. > allowHyphenBreakBeforeNextChar = > mFrag->CharAt(run.GetOriginalOffset() + run.GetRunLength() - 1) == CH_SHY; > } else { > int32_t runOffsetInSubstring = run.GetSkippedOffset() - aRange.start; > - memset(aBreakBefore + runOffsetInSubstring, false, > + memset(aBreakBefore + runOffsetInSubstring, clean up the trailing space ::: layout/generic/nsTextFrame.cpp:3626 (Diff revision 7) > - run.GetSkippedOffset() > mStart.GetSkippedOffset()); > + run.GetSkippedOffset() > mStart.GetSkippedOffset()) ? > + HyphenType::Manual : HyphenType::None; The line break should go before the ?, according to our usual style. And it might be more readable to separate the two results as well, even though line-length doesn't require it: aBreakBefore[...] = ....... ? HyphenType::Manual : HyphenType::None; IMO that makes the parts of the conditional operator much easier to spot.

Attachment #8838628 - Flags: review?(jfkthame) → review+

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 84

•

8 years ago

mozreview-review-reply

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103452/#review117378 Yeah, the following suggestions indeed improves the readability of the code. Thanks for the comment.

Comment hidden (mozreview-request)

Jonathan Kew [:jfkthame]

Reporter

Comment 90

•

8 years ago

mozreview-review

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review117388 A few minor details noted below, but more importantly, with this patch applied I'm seeing some weird behavior in edge cases that needs to be investigated/fixed. Try examples like data:text/html,<div lang="en" style="font-family:monospace;hyphens:auto;width:0ch;">tick-tock or the even more extreme data:text/html,<div lang="en" style="font-family:monospace;hyphens:auto;width:0ch;">a-b and observe the spurious hyphens we get. :( ::: gfx/thebes/gfxTextRun.h:23 (Diff revision 7) > #include "mozilla/MemoryReporting.h" > #include "DrawMode.h" > #include "harfbuzz/hb.h" > #include "nsUnicodeScriptCodes.h" > #include "nsColor.h" > +#include "nsStyleStruct.h" // for nsStyleText Not needed, I think, see below. (Actually, even if you did need to add the StyleText() method, which I doubt, you'd only need a forward declaration of nsStyleText, not the whole #include.) ::: gfx/thebes/gfxTextRun.h:203 (Diff revision 7) > + // Get style of text that we are actually looking at. > + virtual const nsStyleText* StyleText() = 0; I don't think you need to add this; there's already a GetHyphensOption() method that returns the only field you need from the text styles. This prompted me to look at PropertyProvider and its implementations, and it's pretty messy. It would be nice to annotate the overrides to make it clear what they are, remove any redundant methods (e.g. there's already a StyleText() accessor in the nsTextFrame implementation that nobody currently uses, afaics), make methods and data members 'const' where possible, declare the implementations 'final', etc. Maybe worth a followup bug to do some cleanup? ::: gfx/thebes/gfxTextRun.h:354 (Diff revision 7) > + bool& aHasManualHyphenInSameWord, > + uint32_t& aMostRecentWordBoundary); Probably better to pass these as pointers, so that it's clear at the call site that they may be modified. ::: gfx/thebes/gfxTextRun.cpp:847 (Diff revision 7) > + uint32_t start = aRange.start == aStart ? > + aRange.start : std::min<uint32_t>(aRange.start, aMostRecentWordBoundary); Is the `aRange.start == aStart` test really needed here? It looks to me like just doing the `std::min` should be enough. (And maybe add an assertion that `aMostRecentWordBoundary >= aStart`, which I think is always true.) ::: gfx/thebes/gfxTextRun.cpp:851 (Diff revision 7) > + > + uint32_t start = aRange.start == aStart ? > + aRange.start : std::min<uint32_t>(aRange.start, aMostRecentWordBoundary); > + > + for (uint32_t i = start; i < aRange.end; ++i) { > + if (mCharacterGlyphs[i].CharIsSpace()) { We should probably care about CharIsTab and CharIsNewline here as well.

Attachment #8838629 - Flags: review?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 91

•

8 years ago

mozreview-review-reply

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review117388 Okay, I see. I think this is a side effect of introducing explicit hyphen as a break opportunity. However, the explicit hyphen is used only if there exists at least one auto hyphen in the same word, right? > I don't think you need to add this; there's already a GetHyphensOption() method that returns the only field you need from the text styles. > > This prompted me to look at PropertyProvider and its implementations, and it's pretty messy. It would be nice to annotate the overrides to make it clear what they are, remove any redundant methods (e.g. there's already a StyleText() accessor in the nsTextFrame implementation that nobody currently uses, afaics), make methods and data members 'const' where possible, declare the implementations 'final', etc. Maybe worth a followup bug to do some cleanup? Good idea. Filed Bug 1343516. > Probably better to pass these as pointers, so that it's clear at the call site that they may be modified. Okay. > Is the `aRange.start == aStart` test really needed here? It looks to me like just doing the `std::min` should be enough. > > (And maybe add an assertion that `aMostRecentWordBoundary >= aStart`, which I think is always true.) Okay. > We should probably care about CharIsTab and CharIsNewline here as well. Okay.

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 97

•

8 years ago

mozreview-review

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review118188 Add an extra check to ensure that we only raise explicit hyphen breaks for those words that have at least one auto hyphen breaks within them. This is definitely adding a potential performance risk for extreme cases, such like an extreme long word with `NO` auto hyphen in the same word, which seems pretty rare to me.

Jonathan Kew [:jfkthame]

Reporter

Comment 98

•

8 years ago

mozreview-review

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review118194 ::: gfx/thebes/gfxTextRun.h:351 (Diff revisions 8 - 9) > - bool& aHasManualHyphenInSameWord, > - uint32_t& aMostRecentWordBoundary); > + uint32_t* aMostRecentWordBoundary, > + bool* aHasManualHyphenInSameWord, > + bool* aHasExplicitHyphenInSameWord, > + bool* aHasAutoHyphenInSameWord); With the number of in/out parameters used here to maintain state, I'm starting to think we should instead create a struct, something like struct HyphenationState { uint32_t mostRecentBoundary; bool hasManualHyphen; bool hasExplicitHyphen; bool hasAutoHyphen; }; to collect all these flags and just pass a pointer to a single `wordState` variable when calling Classify. (I haven't studied and understood the rest of the changes here yet, but this struck me on first looking at the patch, so I'll go ahead and post it before reviewing further...)

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Status: NEW → ASSIGNED

Jonathan Kew [:jfkthame]

Reporter

Comment 104

•

8 years ago

mozreview-review

Comment on attachment 8838629 [details] Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. https://reviewboard.mozilla.org/r/113470/#review120478 Looks good, thanks (and sorry for the delay here).

Attachment #8838629 - Flags: review?(jfkthame) → review+

Jonathan Kew [:jfkthame]

Reporter

Comment 105

•

8 years ago

mozreview-review

Comment on attachment 8838630 [details] Bug 1056516 - add tests to our local reftests folder. https://reviewboard.mozilla.org/r/113472/#review120480

Attachment #8838630 - Flags: review?(jfkthame) → review+

Jonathan Kew [:jfkthame]

Reporter

Comment 106

•

8 years ago

mozreview-review

Comment on attachment 8839080 [details] Bug 1056516 - submit Hyphenation Control reftests to CSSWG. https://reviewboard.mozilla.org/r/113832/#review120482

Attachment #8839080 - Flags: review?(jfkthame) → review+

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 107

•

8 years ago

Thank you for the review!!! :) Since it has been a while, I'll re-base the patchset and send another try run before landing.

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 113

•

8 years ago

Try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4ef9642d75987ad05786a17718123eb65439724 except for the known issue on Windows 2012 platform (tracked in Bug 1300466).

Pulsebot

Comment 114

•

8 years ago

Pushed by jichen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/aaec04f76481 use AutoTArray for hyphenBuffer in BreakAndMeasureText. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/41d39e1e440b use HyphenType to store different types of hyphenations. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/0bcf5f91b014 let auto hyphen honor manual hyphen when hyphens:auto is set. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/753067cc11f7 add tests to our local reftests folder. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/70586dbb509c submit Hyphenation Control reftests to CSSWG. r=jfkthame

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 116

•

8 years ago

I think this bug fixes/changes the behavior of "hyphens: auto" under certain situations, might worth some documentation work.

Keywords: dev-doc-needed

Carsten Book [:Tomcat]

Comment 117

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/aaec04f76481 https://hg.mozilla.org/mozilla-central/rev/41d39e1e440b https://hg.mozilla.org/mozilla-central/rev/0bcf5f91b014 https://hg.mozilla.org/mozilla-central/rev/753067cc11f7 https://hg.mozilla.org/mozilla-central/rev/70586dbb509c

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 118

•

8 years ago

When importing into https://hg.csswg.org/test/ , I got: remote: vendor-imports/mozilla/mozilla-central-reftests/text3/hyphenation-control-1.html status changed to 'Needs Work' due to error: remote: Not linked to a specification. It looks like the test is missing a link rel="help" linking it to a specification.

Flags: needinfo?(jeremychen)

Ryan VanderMeulen [:RyanVM]

Comment 119

•

8 years ago

Linux/OSX nightlies today are seeing topcrashes with a [@ gfxTextRun::BreakAndMeasureText ] signature that look highly likely to be from this bug. Backed out so we can respin hopefully less-crashy nightlies. https://crash-stats.mozilla.com/report/index/9fa0418b-3824-467b-abd2-b93d12170310 https://hg.mozilla.org/mozilla-central/rev/a8d497b09753c91783b68c5805c64f34a2f39629

Status: RESOLVED → REOPENED

status-firefox55: fixed → ---

Resolution: FIXED → ---

Target Milestone: mozilla55 → ---

Jonathan Kew [:jfkthame]

Reporter

Updated

•

8 years ago

Depends on: 1346248

Jonathan Kew [:jfkthame]

Reporter

Comment 120

•

8 years ago

Attached patch followup, don't re-accumulate character widths when we back up in BreakAndMeasureText to re-scan from start of word for hyphenation purposes — Details — Splinter Review

When we need to collect a new chunk of spacing and hyphenation data in BreakAndMeasureText, and then potentially move skip backwards in the textrun to re-scan from start of word (to handle hyphen types properly), we should _not_ re-process the widths of the rescanned characters, because they have already been accumulated into the total advance. So we need to keep track of when we're doing such a re-scan, and skip the width accumulation in that case. Here's a proposed patch (on top of the patches that originally landed here) to address this problem.

Attachment #8846104 - Flags: review?(jeremychen)

Ryan VanderMeulen [:RyanVM]

Comment 121

•

8 years ago

We didn't see any crashes in automation with the signature in question, which makes me sad. Any chance we can add some test coverage for the scenario that was causing us to crash here?

Flags: needinfo?(jfkthame)

Mats Palmgren (inactive)

Comment 122

•

8 years ago

I'm a bit concerned about this change: >- bool hyphenBuffer[MEASUREMENT_BUFFER_SIZE]; >+ AutoTArray<bool, 4096> hyphenBuffer; That makes 'hyphenBuffer' 4112 *bytes* in a 64-bit build. That's rather hefty for a stack variable that's only rarely needed. Can we use nsTArray instead and only call SetCapacity() on it if we actually need it? (i.e. when 'haveHyphenation' is true)

Jonathan Kew [:jfkthame]

Reporter

Comment 123

•

8 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #121) > We didn't see any crashes in automation with the signature in question, > which makes me sad. Any chance we can add some test coverage for the > scenario that was causing us to crash here? Yes, we can add the testcase from bug 1346248 as a crashtest here. (In reply to Mats Palmgren (:mats) from comment #122) > I'm a bit concerned about this change: > > >- bool hyphenBuffer[MEASUREMENT_BUFFER_SIZE]; > >+ AutoTArray<bool, 4096> hyphenBuffer; > > That makes 'hyphenBuffer' 4112 *bytes* in a 64-bit build. > That's rather hefty for a stack variable that's only rarely needed. > > Can we use nsTArray instead and only call SetCapacity() on it if we > actually need it? (i.e. when 'haveHyphenation' is true) I guess that's probably reasonable; haveHyphenation is likely to stay false in the great majority of cases. (FWIW, the size here was based on BIG_TEXT_NODE_SIZE as used in nsTextFrame for a number of AutoTArray sizes, including a hyphenation buffer in nsTextFrame::AddInlineMinISizeForFlow that is likewise probably unused most of the time; perhaps we should switch that to nsTArray as well, for the same reason?)

Flags: needinfo?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 124

•

8 years ago

Comment on attachment 8846104 [details] [diff] [review] followup, don't re-accumulate character widths when we back up in BreakAndMeasureText to re-scan from start of word for hyphenation purposes Review of attachment 8846104 [details] [diff] [review]: ----------------------------------------------------------------- Looks good. Thank you.

Attachment #8846104 - Flags: review?(jeremychen) → review+

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 125

•

8 years ago

(In reply to David Baron :dbaron: ⌚️UTC-8 from comment #118) > When importing into https://hg.csswg.org/test/ , I got: > > remote: > vendor-imports/mozilla/mozilla-central-reftests/text3/hyphenation-control-1. > html status changed to 'Needs Work' due to error: > remote: Not linked to a specification. > > It looks like the test is missing a link rel="help" linking it to a > specification. Okay. Will add this before re-landing.

Flags: needinfo?(jeremychen)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 126

•

8 years ago

Attached patch import crashtest. — Details — Splinter Review

This test is imported from the fuzzy-test in Bug 1346248. MozReview-Commit-ID: 806wV6pWWaz

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 127

•

8 years ago

Comment on attachment 8846249 [details] [diff] [review] import crashtest. This test is imported from the fuzzy-test in Bug 1346248. Not sure if I added the right authorship (just refer to the original patch in Bug 1346248).

Attachment #8846249 - Flags: review?(jfkthame)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 128

•

8 years ago

Comment on attachment 8846249 [details] [diff] [review] import crashtest. Hi Jason, could you help feedback the authorship of this patch? Thanks.

Attachment #8846249 - Flags: feedback?(jkratzer)

Jonathan Kew [:jfkthame]

Reporter

Comment 129

•

8 years ago

Comment on attachment 8846249 [details] [diff] [review] import crashtest. Review of attachment 8846249 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/tests/crashtests/crashtests.list @@ +133,5 @@ > load 1308394.html > load 1317403-1.html # bug 1331533 > load 1325159-1.html > load 1331683.html > +load 1056516.html Please insert the new entry in its properly-sorted place in the list.

Attachment #8846249 - Flags: review?(jfkthame) → review+

Jonathan Kew [:jfkthame]

Reporter

Comment 130

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #124) > Comment on attachment 8846104 [details] [diff] [review] > followup, don't re-accumulate character widths when we back up in > BreakAndMeasureText to re-scan from start of word for hyphenation purposes > > Review of attachment 8846104 [details] [diff] [review]: > ----------------------------------------------------------------- > > Looks good. Thank you. BTW, I think you should fold this into the part-3 patch before re-landing.

Calixte Denizet (:calixte)

Updated

•

8 years ago

Depends on: 1346520

Comment hidden (mozreview-request)

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 137

•

8 years ago

mozreview-review

Comment on attachment 8825284 [details] Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. https://reviewboard.mozilla.org/r/103450/#review121314 Address Jonathan's comment (comment 129) and David's comment (comment 118). It's kind of sad that MozReview can't carry r+....:(

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 138

•

8 years ago

Try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc94503d8562d3f7317bf208917dfb03d3b6cf1f

Pulsebot

Comment 139

•

8 years ago

Pushed by jichen@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d9721bdd58d5 use AutoTArray for hyphenBuffer in BreakAndMeasureText. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/35671fd90af7 use HyphenType to store different types of hyphenations. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/31ceca5176da let auto hyphen honor manual hyphen when hyphens:auto is set. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/f50781ae2a85 add tests to our local reftests folder. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/76193c32cfa5 submit Hyphenation Control reftests to CSSWG. r=jfkthame https://hg.mozilla.org/integration/mozilla-inbound/rev/a197cc799cdf add crashtest. r=jfkthame

Carsten Book [:Tomcat]

Comment 140

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d9721bdd58d5 https://hg.mozilla.org/mozilla-central/rev/35671fd90af7 https://hg.mozilla.org/mozilla-central/rev/31ceca5176da https://hg.mozilla.org/mozilla-central/rev/f50781ae2a85 https://hg.mozilla.org/mozilla-central/rev/76193c32cfa5 https://hg.mozilla.org/mozilla-central/rev/a197cc799cdf

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Jeremy Chen (chenpighead@gmail.com)

Assignee

Updated

•

8 years ago

Attachment #8846249 - Flags: feedback?(jkratzer)

Chris Mills [:cmills]

Comment 141

•

8 years ago

(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #116) > I think this bug fixes/changes the behavior of "hyphens: auto" under certain > situations, might worth some documentation work. Hi Jeremy, I've had a look at the documentation for the hyphens property, and updated both the auto value description and the line break opportunities description to make the behaviour more explicit: https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens#Values https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens#Suggesting_line_break_opportunities I've not changed it much — what this big fix describes is really what the page was already saying. - I just made it more explicit. I've also not added a note to the Fx55 release notes, as this is a bug fix, rather than a new feature, removal, etc. Let me know if that sounds OK to you, or if you feel that more is needed. Thanks!

Flags: needinfo?(jeremychen)

Keywords: dev-doc-needed → dev-doc-complete

Jeremy Chen (chenpighead@gmail.com)

Assignee

Comment 142

•

8 years ago

(In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #141) > Hi Jeremy, > > I've had a look at the documentation for the hyphens property, and updated > both the auto value description and the line break opportunities description > to make the behaviour more explicit: > > https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens#Values > https://developer.mozilla.org/en-US/docs/Web/CSS/ > hyphens#Suggesting_line_break_opportunities > > I've not changed it much — what this big fix describes is really what the > page was already saying. - I just made it more explicit. I've also not added > a note to the Fx55 release notes, as this is a bug fix, rather than a new > feature, removal, etc. > > Let me know if that sounds OK to you, or if you feel that more is needed. > Thanks! The MDN change looks good to me. I agree that this is more like a bug fix, so not adding this to the release notes is fine. Thank you for doing this. :)

Flags: needinfo?(jeremychen)

Jonathan Kew [:jfkthame]

Reporter

Updated

•

7 years ago

Depends on: 1415581

Jonathan Kew [:jfkthame]

Reporter

Updated

•

6 years ago

Depends on: 1476437

Jonathan Kew [:jfkthame]

Reporter

Updated

•

6 years ago

Depends on: 1478574

Bug 1056516 - use AutoTArray for hyphenBuffer in BreakAndMeasureText. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request	jfkthame : review+	Details
wip. 8 years ago Jeremy Chen (chenpighead@gmail.com) 13.39 KB, patch		Details \| Diff \| Splinter Review
Bug 1056516 - use HyphenType to store different types of hyphenations. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request	jfkthame : review+	Details
Bug 1056516 - let auto hyphen honor manual hyphen when hyphens:auto is set. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request	jfkthame : review+	Details
Bug 1056516 - add tests to our local reftests folder. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request	jfkthame : review+	Details
Bug 1056516 - submit Hyphenation Control reftests to CSSWG. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request	jfkthame : review+	Details
followup, don't re-accumulate character widths when we back up in BreakAndMeasureText to re-scan from start of word for hyphenation purposes 8 years ago Jonathan Kew [:jfkthame] 3.76 KB, patch	chenpighead : review+	Details \| Diff \| Splinter Review
import crashtest. 8 years ago Jeremy Chen (chenpighead@gmail.com) 1.59 KB, patch	jfkthame : review+	Details \| Diff \| Splinter Review
Bug 1056516 - add crashtest. 8 years ago Jeremy Chen (chenpighead@gmail.com) 59 bytes, text/x-review-board-request		Details