Whitespace around the segment break should be removed before segment break transformation

RESOLVED FIXED in Firefox 53

Status

()

Core
Layout: Text
P2
enhancement
RESOLVED FIXED
7 months ago
5 months ago

People

(Reporter: kanru, Assigned: jeremychen)

Tracking

(Depends on: 1 bug)

unspecified
mozilla53
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox53 fixed)

Details

(URL)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

7 months ago
Quote https://bugzilla.mozilla.org/show_bug.cgi?id=1081858#c37

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #37)
> Created attachment 8809207 [details]
> testcase
> 
> It seems to me this fix is not complete. See the testcase. If there is any
> whitespace around the segment break, there will still be a whitespace left.
> 
> Per CSS Text 3, those whitespaces should have been removed in the first step
> of Phase I, so existence of them should not affect the result of the segment
> break transformation applied in the second step.
(Assignee)

Comment 1

6 months ago
Start to work on this.

Quote from the spec. (https://drafts.csswg.org/css-text/#white-space-phase-1), "All spaces and tabs immediately preceding or following a segment break are removed". Thanks to Kan-Ru's work in Bug 1081858, a segment break is removed if the character immediately before or immediately after the segment break is the zero-width space character (U+200B).

So, after a quick peek, I think we shall make sure that all spaces and tabs immediately preceding or following a segment break should turn into zero-width space character (U+200B) before segment break transformation. Haven't thought through the whole implementation design.
Assignee: nobody → jeremychen
(Assignee)

Updated

6 months ago
(Assignee)

Comment 2

6 months ago
Maybe we shall patch these IsTrimmableSpace functions [1], so they can treat "spaces and tabs immediately preceding or following a segment break" as trimmable.


[1] http://searchfox.org/mozilla-central/rev/ef3cede9a5b050fffade9055ca274654f2bc719e/layout/generic/nsTextFrame.cpp#804-838
(Assignee)

Comment 3

6 months ago
(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #2)
> Maybe we shall patch these IsTrimmableSpace functions [1], so they can treat
> "spaces and tabs immediately preceding or following a segment break" as
> trimmable.
> 
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> ef3cede9a5b050fffade9055ca274654f2bc719e/layout/generic/nsTextFrame.cpp#804-
> 838

IsTrimmableSpace functions seems aim to be used for trimming spaces preceding/tailing spaces, not for spaces between words. Maybe let text transformation, i.e., TransformText(), do some tricks in here http://searchfox.org/mozilla-central/rev/0c055ccbcf96846044fc9a71421bd9b7978686f7/layout/generic/nsTextFrameUtils.cpp#126-146 could fix this.
Comment hidden (mozreview-request)
(Assignee)

Updated

6 months ago
Attachment #8816402 - Flags: feedback?(xidorn+moz)

Comment 5

6 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review97478

::: layout/generic/nsTextFrameUtils.h:105
(Diff revision 1)
> +  static bool
> +  IsSpaceBeforeOrAfterSegmentBreak(const char16_t* aChars,
> +                                   uint32_t aPos, uint32_t aLength)
> +  {
> +    MOZ_ASSERT(aLength > 0,
> +               "Why do we call this for a zero-length text run?");
> +    MOZ_ASSERT(IsSpaceOrTab(aChars[aPos]),
> +               "Why do we call this for a non-space character?");
> +    uint32_t i = aPos - 1;
> +    while (IsSpaceOrTab(aChars[i]) && i > 0) {
> +      i--;
> +    }
> +    if (IsSegmentBreak(aChars[i])) {
> +      return true;
> +    }
> +    i = aPos + 1;
> +    while (IsSpaceOrTab(aChars[i]) && i < aLength - 1) {
> +      i++;
> +    }
> +    if (IsSegmentBreak(aChars[i])) {
> +      return true;
> +    }
> +    return false;
> +  }

This function is too long to be put in header... It doesn't seem to me it needs to be here.

::: layout/generic/nsTextFrameUtils.cpp:143
(Diff revision 1)
> -        if (inWhitespace) {
> +        if (inWhitespace ||
> +            // According to CSS Text 3 - 4.1.1. Phase I: Collapsing and
> +            // Transformation, all spaces/tabs immediately preceding or
> +            // following a segment break should be removed if white space
> +            // characters are considered collapsible.
> +            IsSpaceBeforeOrAfterSegmentBreak(&aText[i], i, aLength)) {

This seems to be an O(n^2) algorithm... if you have 1000 whitespace before a segment break, you would check every whitespace 500 times in average.

It occurs to me that we actually can have a nice two phases impl follows the spec with O(n) algorithm. We can probably do a linear scan when we meet a whitespace and record skippable whitespaces, then check things based on that.

Actually how does the code you added work? It seems to me IsSegmentBreakSkipChar above is still checking content in aText which is the input string?

::: layout/reftests/text/segment-break-transformation-2.html:10
(Diff revision 1)
> +<p>一些    
> +中文</p>

Probably add a comment above indicating that this tailing spaces is by intend for testing the transformation rule, so that people wouldn't accidently trim it.

Actually I was thinking about adding a comment at the end of line so that they wouldn't be recognized as tailing spaces at all. But that would break this text node into two, which may make it test something different.

And please add another line to check what happens if both sides of the segment break have whitespaces.

Oh and probably you can also add tests to check what whould happen if some comment is added at different places, e.g. between the space and Chinese character, between two spaces, between space and the segment break.

We could have several test file which shares the same reference file.

In addition, I think the tests should probably go to reftests/w3c-css/submitted, so that we submit them to w3c's test repo.
Attachment #8816402 - Flags: feedback?(xidorn+moz) → feedback-
(Assignee)

Comment 6

6 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review97478

> This seems to be an O(n^2) algorithm... if you have 1000 whitespace before a segment break, you would check every whitespace 500 times in average.
> 
> It occurs to me that we actually can have a nice two phases impl follows the spec with O(n) algorithm. We can probably do a linear scan when we meet a whitespace and record skippable whitespaces, then check things based on that.
> 
> Actually how does the code you added work? It seems to me IsSegmentBreakSkipChar above is still checking content in aText which is the input string?

IsSegmentBreakSkipChar above can take care of '\n' with a ZERO_WIDTH_SPACE right preceding/following it. But the problem here is that if there're consecutive white spaces preceding/following these ZERO_WIDTH_SPACEs. So, what I was trying to do is skip those consecutive white spaces before/after we check for '\n'. I think that's the reason why this patch could work.

Hmm... you're right about the complexity. We probably should refactor the code into a two phases impl as the spec. says.
(Assignee)

Comment 7

6 months ago
(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #6)
> Comment on attachment 8816402 [details]
> Bug 1316482 - remove collapsible white spaces as CSS Text Module Level 3
> specifies.
> 
> https://reviewboard.mozilla.org/r/97160/#review97478
> 
> > This seems to be an O(n^2) algorithm... if you have 1000 whitespace before a segment break, you would check every whitespace 500 times in average.
> > 
> > It occurs to me that we actually can have a nice two phases impl follows the spec with O(n) algorithm. We can probably do a linear scan when we meet a whitespace and record skippable whitespaces, then check things based on that.
> > 
> > Actually how does the code you added work? It seems to me IsSegmentBreakSkipChar above is still checking content in aText which is the input string?
> 
> IsSegmentBreakSkipChar above can take care of '\n' with a ZERO_WIDTH_SPACE
> right preceding/following it. But the problem here is that if there're
> consecutive white spaces preceding/following these ZERO_WIDTH_SPACEs. So,
> what I was trying to do is skip those consecutive white spaces before/after
> we check for '\n'. I think that's the reason why this patch could work.
> 
> Hmm... you're right about the complexity. We probably should refactor the
> code into a two phases impl as the spec. says.

This morning, I implemented a O(n) algorithm and found it didn't work. So, I double checked the implementation and found that there's a mistake in my earlier patch (the one got f-). I think that's the reason why that implementation works (which it shouldn't)...

Go back to do the refactoring approach.
(Assignee)

Comment 8

6 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review97478

> This function is too long to be put in header... It doesn't seem to me it needs to be here.

You're right. This has been removed in the next version.
Comment hidden (mozreview-request)
(Assignee)

Comment 10

6 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

Remove the test part in this patch.
I'll check the feedback in comment 5 and write another patch for the tests.

Hi xidorn, please take a look again, thanks.
Attachment #8816402 - Flags: feedback?(xidorn+moz)
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

This algorithm wouldn't work. aSkipChars tracks each character and records whether one character should be skipped. Doing things in two passes and calling aSkipChars.SkipChar() in both passes would mess up everything.

Also, it seems to me the code you added at the beginning of nsTextFrameUtils::TransformText can completely be put in the else block below, since the "aCompression == COMPRESS_WHITESPACE || aCompression == COMPRESS_WHITESPACE_NEWLINE" block has nothing to do with "aCompression == COMPRESS_NONE || aCompression == COMPRESS_NONE_TRANSFORM_TO_SPACE", right?

I also don't think we are allowed to use "std::vector<bool>". In general, we should use our internal equivalents rather than the standard containers. I guess you probably want to take advantage of its storage efficiency, but I would suggest you don't rely on that, because optimizing "std::vector<bool>" in that way has been considered (and proven) to be a wrong decision, and the C++ committee is always looking forward to killing it. (There has been three papers in the past tried to, though unsuccessfully, deprecate and remove it.)

Actually you don't need that one.

I suggest an algorithm this way: when you find a space or tab, scan text following it and collect necessary information, until you find a normal character (not space, tab, or segment break), and then process this range of text according to the transformation rules (calls aSkipChars.SkipChar() and .KeepChar() accordingly), and advance the loop index to the normal character you find.

This way, we would have one pass for majority of text, with one additional pass for spaces, and it only needs constant additional storage (compare to a vector<bool> in your current code), and we would call aSkipChars in the correct order.
Attachment #8816402 - Flags: feedback?(xidorn+moz) → feedback-
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 14

5 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

Implement the O(n) algorithm as discussed offline.

A little refactoring has be done by separating the text transformation logic into three pieces: one for white spaces, one for segment breaks, and one for other normal characters.
Attachment #8816402 - Flags: feedback?(xidorn+moz)

Comment 15

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review99696

Ah, it's a bit hard to follow the algorithm. Could you probably split the handling of range of whitespaces into a separate static function, which guarantees that every character in the given range has been marked keep or skip properly? It can probably call something like TransformWhiteSpaces.

There are several small issues in addition:

::: layout/generic/nsTextFrameUtils.cpp:99
(Diff revision 4)
> +      // white spaces to proceed the white space processing, a consecutive run
> +      // of spaces/tabs/segment breaks is collected in a first pass loop, then
> +      // we apply the collapsing and transformation rules to this run in a
> +      // second pass loop.
> +      if (IsSpaceOrTabOrSegmentBreak(ch)) {
> +        bool isRangeHasSegmentBreak = IsSegmentBreak(ch);

bool hasSegmentBreak;

::: layout/generic/nsTextFrameUtils.cpp:100
(Diff revision 4)
> +        uint32_t j = i;
> +        while (j < aLength - 1 && IsSpaceOrTabOrSegmentBreak(aText[j + 1])) {
> +          j++;
> +          isRangeHasSegmentBreak |= IsSegmentBreak(aText[j]);
> +        }

This look can be a for-loop, which should make it clearer to understand.
Attachment #8816402 - Flags: feedback?(xidorn+moz)
(Assignee)

Comment 16

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review99696

Thank you for the feedback. Will split the handling to a separate static funciton.

> bool hasSegmentBreak;

Will do.

> This look can be a for-loop, which should make it clearer to understand.

Will do.
Comment hidden (mozreview-request)
Comment hidden (obsolete)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 21

5 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

Hi xidorn, while I was writing this patch, I found another issue.

For a document like this:

<p>一些\n\n\n中文</p>

Is it correct to transform the document like following?

<p>一些   中文</p>


According to the spec (https://drafts.csswg.org/css-text/#line-break-transform), if a segment break is not removable, it should be converted to a space (U+0020). I can't see any of these '\n' could fit the removing rules, so the current implementation would transform all three of them to white spaces. Not sure if I mis-read the spec.... Could you give me some feedback at this as well? Thank you.
Attachment #8816402 - Flags: feedback?(xidorn+moz)
Comment hidden (mozreview-request)
(Assignee)

Updated

5 months ago
Attachment #8816402 - Flags: feedback?(xidorn+moz)

Comment 23

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review100908

::: layout/generic/nsTextFrameUtils.cpp:52
(Diff revision 8)
> +  bool hasCharPrev = aBegin > 0;
> +  bool hasCharNext = aEnd + 1 < aLength;
> +  bool hasCharPrevPrev = aBegin > 1;
> +  bool hasCharNextNext = aEnd + 2 < aLength;
> +  char16_t charPrev, charNext, charPrevPrev, charNextNext;

You can just use `Maybe<char16_t>` for them, which combines a boolean and the value.

::: layout/generic/nsTextFrameUtils.cpp:254
(Diff revision 8)
> +        for (j = i; j + 1 < aLength && IsSpaceOrTabOrSegmentBreak(aText[j + 1]);
> +             j++) {

Nit: probably better `j = i + 1; j < aLength && Is...(aText[j]); j++`.

::: layout/generic/nsTextFrameUtils.cpp:262
(Diff revision 8)
> -          ucs4after = SURROGATE_TO_UCS4(aText[i + 1], aText[i + 2]);
> -        } else if (i + 1 < aLength) {
> -          ucs4after = aText[i + 1];
>          }
> -        if (i > 0 && IsSegmentBreakSkipChar(ucs4before) &&
> -            i + 1 < aLength && IsSegmentBreakSkipChar(ucs4after)) {
> +
> +        aOutput = TransformWhiteSpaces(aText, aLength, i, j, hasSegmentBreak,

"End" usually means one position after the last element. It seems to me using this definition could make you code here simpler.

Comment 24

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review100910

::: layout/generic/nsTextFrameUtils.h:91
(Diff revision 8)
> +  static bool IsSegmentBreak(char16_t aCh)
> +  {
> +    return aCh == '\n' || aCh == '\r' || aCh == '\f';
> +  }
> +
> +  static bool IsSpaceOrTab(char16_t aCh)
> +  {
> +    return aCh == ' ' || aCh == '\t';
> +  }
> +
> +  static bool IsSpaceOrTabOrSegmentBreak(char16_t aCh)
> +  {
> +    return IsSpaceOrTab(aCh) || IsSegmentBreak(aCh);
> +  }

It doesn't seem to be necessary to put these functions in the header. They could just be static functions in the cpp file.

But it seems to me this header isn't included in too many places, so it probably doesn't matter that much.
(Assignee)

Comment 25

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review100908

> You can just use `Maybe<char16_t>` for them, which combines a boolean and the value.

Good idea! Thanks.

> "End" usually means one position after the last element. It seems to me using this definition could make you code here simpler.

Agree! Thanks for the feedback.
(Assignee)

Comment 26

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review100910

> It doesn't seem to be necessary to put these functions in the header. They could just be static functions in the cpp file.
> 
> But it seems to me this header isn't included in too many places, so it probably doesn't matter that much.

Agree. TransformText() and TransformWhiteSpaces() are the only user for now, so it'll be better to move them in the cpp file. We could consider to put them in the header if they have more callers in the future.
Comment hidden (mozreview-request)
(Assignee)

Comment 28

5 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

Address feedback from comment 23 and comment 24.

Per the discussion on IRC,

<p>一些\n\n\n中文</p>

should be converted to

<p>一些 中文</p>

This issue (mentioned in comment 21) has been solved in this version as well.
Attachment #8816402 - Flags: feedback?(xidorn+moz)
(Assignee)

Updated

5 months ago
Status: NEW → ASSIGNED

Comment 29

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101016

::: layout/generic/nsTextFrameUtils.h:99
(Diff revision 9)
> +  static char16_t* TransformWhiteSpaces(const char16_t* aText, uint32_t aLength,
> +                                        uint32_t aBegin, uint32_t aEnd,
> +                                        bool aHasSegmentBreak,
> +                                        uint32_t aCountSegmentBreak,
> +                                        bool& aInWhitespace,
> +                                        char16_t* aOutput,
> +                                        uint32_t& aFlags,
> +                                        bool& aLastCharArabic,
> +                                        CompressionMode aCompression,
> +                                        gfxSkipChars* aSkipChars);

I think this function can be a static function just in the source file as well. It doesn't seem to me any other place would need this function.

::: layout/generic/nsTextFrameUtils.cpp:41
(Diff revision 9)
>      return true;
>    }
>    return false;
>  }
>  
> +static bool IsSegmentBreak(char16_t aCh)

`static bool` should be in a separate line.

::: layout/generic/nsTextFrameUtils.cpp:46
(Diff revision 9)
> +static bool IsSegmentBreak(char16_t aCh)
> +{
> +  return aCh == '\n' || aCh == '\r' || aCh == '\f';
> +}
> +
> +static bool IsSpaceOrTab(char16_t aCh)

Ditto.

::: layout/generic/nsTextFrameUtils.cpp:51
(Diff revision 9)
> +static bool IsSpaceOrTab(char16_t aCh)
> +{
> +  return aCh == ' ' || aCh == '\t';
> +}
> +
> +static bool IsSpaceOrTabOrSegmentBreak(char16_t aCh)

Ditto.

::: layout/generic/nsTextFrameUtils.cpp:59
(Diff revision 9)
> +                                       bool aHasSegmentBreak,
> +                                       uint32_t aCountSegmentBreak,

If you can always give `aCountSegmentBreak` a meaningful value, it doesn't seem to me having an addition `aHasSegmentBreak` makes any sense.

::: layout/generic/nsTextFrameUtils.cpp:69
(Diff revision 9)
> +  Maybe<char16_t> charPrev = aBegin > 0 ? Some(aText[aBegin - 1])
> +                                        : Maybe<char16_t>(Nothing());
> +  Maybe<char16_t> charNext = aEnd < aLength ? Some(aText[aEnd])
> +                                            : Maybe<char16_t>(Nothing());
> +  Maybe<char16_t> charPrevPrev = aBegin > 1 ? Some(aText[aBegin - 2])
> +                                            : Maybe<char16_t>(Nothing());
> +  Maybe<char16_t> charNextNext = aEnd + 1 < aLength ? Some(aText[aEnd + 1])
> +                                                    : Maybe<char16_t>(Nothing());

As far as I can see, all compilers we require support writing `condition ? Some(value) : Nothing()`. The additional `Maybe<char16_t>` doesn't seem necessary to me.

::: layout/generic/nsTextFrameUtils.cpp:102
(Diff revision 9)
> +      } else {
> +        if (aInWhitespace) {

Nit: I think you can do `else if` here.

::: layout/generic/nsTextFrameUtils.cpp:114
(Diff revision 9)
> +          }
> +          *aOutput++ = ' ';
> +          aSkipChars->KeepChar();
> +        }
> +      }
> +      aLastCharArabic = IS_ARABIC_CHAR(' ');

Whitespace is not arabic char.

::: layout/generic/nsTextFrameUtils.cpp:115
(Diff revision 9)
> +      charPrevPrev = charPrev;
> +      charPrev = Some(char16_t(' '));
> +      aInWhitespace = true;

It would probably less duplicate (and easier to follow) if you just change `ch` to the transformed character and move all this logic (as well as `*aOutput++ = ch;` and `aSkipChars->KeepChar()`) to the end of the loop, so that we have the pattern that, if the char is kept, we run to the end of the loop and handle everything necessary there, otherwise, we skip the char and call `continue` to jump to next one.

::: layout/generic/nsTextFrameUtils.cpp:125
(Diff revision 9)
> +      // segment break characters
> +      if (aCompression == COMPRESS_WHITESPACE) {
> +        *aOutput++ = ch;
> +        aSkipChars->KeepChar();
> +
> +        aLastCharArabic = IS_ARABIC_CHAR(ch);

I don't think any of segment break or whitespace is arabic char. You shouldn't need to touch `aLastCharArabic` at all in this function (and thus it should be removed from parameters).

::: layout/generic/nsTextFrameUtils.cpp:191
(Diff revision 9)
> +        } else {
> +          aFlags |= TEXT_WAS_TRANSFORMED;
> +          *aOutput++ = ' ';
> +          aSkipChars->KeepChar();
> +
> +          aLastCharArabic = IS_ARABIC_CHAR(' ');

As mentioned before, this should be removed as well.

::: layout/generic/nsTextFrameUtils.cpp:266
(Diff revision 9)
> +        uint32_t countSegmentBreak = hasSegmentBreak ? 1 : 0;
> +        uint32_t j;
> +        for (j = i + 1; j < aLength && IsSpaceOrTabOrSegmentBreak(aText[j]);
> +             j++) {
> +          if (IsSegmentBreak(aText[j])) {
> +            hasSegmentBreak |= IsSegmentBreak(aText[j]);

This should be assigning `true` directly... Actually I don't think you need the variable `hasSegmentBreak` at all.
Comment hidden (mozreview-request)
(Assignee)

Comment 31

5 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

Address all the feedback in comment 29.
Attachment #8816402 - Flags: feedback?(xidorn+moz)

Comment 32

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101162

::: layout/generic/nsTextFrameUtils.cpp:96
(Diff revision 10)
> +                 (k + 1 >= aLength ||
> +                  !nsTextFrameUtils::IsSpaceCombiningSequenceTail(&aText[k + 1],
> +                                                                  aLength - (k + 1)))) {

This actually should happen before the `aCountSegmentBreak` check. And I think it is probably better leave this logic in TransformText. If IsSpaceCombiningSequenceTail returns true, this character is not considered as a whitespace.

You can just check the last character in the range, and exclude it if the following character is this.

You may need to take care not to pass in an empty range, nor making it an infinite loop around that character.

::: layout/generic/nsTextFrameUtils.cpp:130
(Diff revision 10)
> +          if ((charPrev.isSome() && IS_ZERO_WIDTH_SPACE(*charPrev)) ||
> +              (charNext.isSome() && IS_ZERO_WIDTH_SPACE(*charNext))) {

I think this is actually a constant once you know the range. It doesn't seem necessary to compute it for each character.

::: layout/generic/nsTextFrameUtils.cpp:151
(Diff revision 10)
> +          if (charPrev.isSome() && IsSegmentBreakSkipChar(ucs4before) &&
> +              charNext.isSome() && IsSegmentBreakSkipChar(ucs4after)) {

Ditto.

::: layout/generic/nsTextFrameUtils.cpp:183
(Diff revision 10)
> +      charPrevPrev = charPrev;
> +      charPrev = Some(ch);

I wonder why do you need to advance charPrev actually? It seems to me we only need to check the codepoints before and after the whole whitespace range. Do we need to know anything about them in the middle of the whitespace range?

::: layout/generic/nsTextFrameUtils.cpp:192
(Diff revision 10)
> +      charPrevPrev = charPrev;
> +      charPrev = Some(char16_t(' '));

Ditto.
Actually I guess you don't even need to keep the four char16_t variables. You can compute the context based on the characters, and just keep the context info. The context info should probably just:
1. whether the characters before and after are both cjk
2. whether the character before is a zero-width whitespace
3. whether the character after is a zero-width whitespace

Then do the transformation accordingly.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 36

5 months ago
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

1. Simplify the logic for segment breaks.
2. Move logic for white space with IsSpaceCombiningSequenceTail back to TransformText main function.
Attachment #8816402 - Flags: feedback?(xidorn+moz)

Comment 37

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101278

It looks much better now. Thanks. There are still some issues, though. After fixing them and add the tests, you can request a review.

::: layout/generic/nsTextFrameUtils.cpp:16
(Diff revision 12)
>  #include "nsStyleStruct.h"
>  #include "nsTextFragment.h"
>  #include "nsUnicharUtils.h"
>  #include <algorithm>
>  
> +#include "mozilla/Maybe.h"

It seems you don't need `Maybe` anymore.

::: layout/generic/nsTextFrameUtils.cpp:46
(Diff revision 12)
>  }
>  
> +static bool
> +IsSegmentBreak(char16_t aCh)
> +{
> +  return aCh == '\n' || aCh == '\r' || aCh == '\f';

It doesn't seem to me segment break includes `'\f'`, and it doesn't exist in the original code.

Oh, also, when I check the definition of "segment break", it seems `\r\n` is counted as **one** segment break, rather than two. This may needs some special handling in `TransformWhiteSpaces`.

::: layout/generic/nsTextFrameUtils.cpp:70
(Diff revision 12)
> +                     bool& aInWhitespace,
> +                     char16_t* aOutput,
> +                     uint32_t& aFlags,
> +                     nsTextFrameUtils::CompressionMode aCompression,
> +                     gfxSkipChars* aSkipChars)
> +{

Please add an assertion at the beginning of the function to ensure that `aCompression` is `COMPRESS_WHITESPACE` or `COMPRESS_WHITESPACE_NEWLINE`, where whitespaces are skippable.

::: layout/generic/nsTextFrameUtils.cpp:93
(Diff revision 12)
> +  if (aBegin > 0 && IsSegmentBreakSkipChar(ucs4before) &&
> +      aEnd < aLength && IsSegmentBreakSkipChar(ucs4after)) {
> +    // Discard newlines between characters that have F, W, or H
> +    // EastAsianWidth property and neither side is Hangul.
> +    isSegmentBreakSkipChar = true;
> +  } else {
> +    isSegmentBreakSkipChar = false;
> +  }

You can make the variable be initialized to the expression directly.

And rather than calling the variable `isSegmentBreakSkipChar`, I guess something like `betweenSegmentBreakSkipChars` is probably better?

::: layout/generic/nsTextFrameUtils.cpp:116
(Diff revision 12)
> +        continue;
> +      } else {

`continue` ends the control flow. The `else` here is not necessary. Please remove this `else` and deindent its content.

::: layout/generic/nsTextFrameUtils.cpp:139
(Diff revision 12)
> +      } else {
> +        // aCompression == COMPRESS_WHITESPACE_NEWLINE
> +        if (aCountSegmentBreak == 1 &&
> +            (isPrevZeroWidthSpace || isNextZeroWidthSpace ||
> +             isSegmentBreakSkipChar)) {
> +          sbIndex++;

Rather than increasing `sbIndex` in every if-branch, what about initializing it to zero, and increase it at the start of the branch for segment break?

::: layout/generic/nsTextFrameUtils.cpp:141
(Diff revision 12)
> +          continue;
> +        } else if (sbIndex == 1 && isPrevZeroWidthSpace) {

This `else` is unnecessary.

::: layout/generic/nsTextFrameUtils.cpp:147
(Diff revision 12)
> +          continue;
> +        } else if (sbIndex > 1) {

Ditto.

::: layout/generic/nsTextFrameUtils.cpp:158
(Diff revision 12)
> +          continue;
> +        } else {

Ditto.
Attachment #8816402 - Flags: feedback?(xidorn+moz) → feedback+
(Assignee)

Comment 38

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101278

Thank you for the feedback. Will ask for another round of feedback after I finish the tests.

> It doesn't seem to me segment break includes `'\f'`, and it doesn't exist in the original code.
> 
> Oh, also, when I check the definition of "segment break", it seems `\r\n` is counted as **one** segment break, rather than two. This may needs some special handling in `TransformWhiteSpaces`.

Yes, just checked the spec and you're right. Will handlw this in the next version.

> You can make the variable be initialized to the expression directly.
> 
> And rather than calling the variable `isSegmentBreakSkipChar`, I guess something like `betweenSegmentBreakSkipChars` is probably better?

You're right. Will use betweenSegmentBreakSkipChars instead.

> Rather than increasing `sbIndex` in every if-branch, what about initializing it to zero, and increase it at the start of the branch for segment break?

Sounds good. Will do.
Comment hidden (mozreview-request)

Comment 40

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101528

::: layout/generic/nsTextFrameUtils.cpp:73
(Diff revision 13)
> +  bool isPrevZeroWidthSpace = aBegin > 0 &&
> +                              IS_ZERO_WIDTH_SPACE(aText[aBegin - 1]);

You can probably use "ZWSP" and state in the comment that it is zero-width space, rather than keeping such long names. ZWSP is a common abbreviation for zero-width space, so it should be fine. But keeping this as-is is also fine.

::: layout/generic/nsTextFrameUtils.cpp:99
(Diff revision 13)
> +  // EastAsianWidth property and neither side is Hangul.
> +  bool betweenSegmentBreakSkipChars =
> +    aBegin > 0 && IsSegmentBreakSkipChar(ucs4before) &&
> +    aEnd < aLength && IsSegmentBreakSkipChar(ucs4after);
> +
> +  // sbIndex is a segment break index between [1, aCountSegmentBreak].

This comment is not incorrect now. It should be [0, aCountSegmentBreak].

::: layout/generic/nsTextFrameUtils.cpp:105
(Diff revision 13)
> +    bool isCRLFSequence =
> +      ch == '\r' && k + 1 < aEnd && aText[k + 1] == '\n';

Viewing your code below, I think it would be easier to record "segment break length", which is two for crlf and one elsewise, instead of "is crlf sequence". Then you can do `sbIndex += sbLength;` and `aSkipChars->SkipChars(sbLength);` and `aCountSegmentBreak == sbLength` instead of lots of additional condition branches.

::: layout/generic/nsTextFrameUtils.cpp:287
(Diff revision 13)
>          if (IsDiscardable(ch, &flags)) {
>            aSkipChars->SkipChar();

Ah, you probably need to consider discardables in space range as well. It is probably not important, though.
Comment hidden (mozreview-request)
OK, now the spec has been updated to handle all segment breaks together [1], so I guess you can significantly simplify your code. You don't need to know the total number of segment breaks, and you don't need to handle crlf sequence.

[1] https://github.com/w3c/csswg-drafts/issues/836
(Assignee)

Comment 43

5 months ago
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #42)
> OK, now the spec has been updated to handle all segment breaks together [1],
> so I guess you can significantly simplify your code. You don't need to know
> the total number of segment breaks, and you don't need to handle crlf
> sequence.
> 
> [1] https://github.com/w3c/csswg-drafts/issues/836

Nice!! Thank you for driving this spec update. :)
Comment hidden (mozreview-request)

Comment 45

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101570

::: layout/generic/nsTextFrameUtils.cpp:133
(Diff revision 15)
> +        if (isFollowingSB ||
> +            isPrevZWSP || isNextZWSP || betweenSegmentBreakSkipChars) {

I think, you can use a single boolean flag `isSegmentBreakSkippable` which is initialized by `isPrevZWSP || isNextZWSP || betweenSegmentBreakSkipChars`, and assigned to `true` after the first segment break.

With the single boolean variable, you can probably do something like
```c++
bool isSegmentBreakSkippable =
  (aBegin > 0 && IS_ZERO_WIDTH_SPACE(aText[aBegin - 1])) ||
  (aEnd < aLength && IS_ZERO_WIDTH_SPACE(aText[aEnd]));
if (!isSegmentBreakSkippable) {
  // ... Compute ucs4before and ucs4after, then
  isSegmentBreakSkippable =
    aBegin > 0 && IsSegmentBreakSkipChar(ucs4before) &&
    aEnd < aLength && IsSegmentBreakSkipChar(ucs4after);
}
```

::: layout/generic/nsTextFrameUtils.cpp:135
(Diff revision 15)
> +        // collapsible segment break is removed.  Then the remaining
> +        // segment breaks are either transformed into a space (U+0020)
> +        // or removed depending on the context before and after the break.
> +        if (isFollowingSB ||
> +            isPrevZWSP || isNextZWSP || betweenSegmentBreakSkipChars) {
> +          isFollowingSB = true;

It doesn't seem to me you need to assign it here.

::: layout/generic/nsTextFrameUtils.cpp:139
(Diff revision 15)
> +            isPrevZWSP || isNextZWSP || betweenSegmentBreakSkipChars) {
> +          isFollowingSB = true;
> +          aSkipChars->SkipChar();
> +          continue;
> +        }
> +        keepTransformedWhiteSpace = isFollowingSB = true;

I'd prefer splitting them into two lines.

::: layout/generic/nsTextFrameUtils.cpp:143
(Diff revision 15)
> +    if (IsDiscardable(ch, &aFlags)) {
> +      aSkipChars->SkipChar();
> +      continue;

Discardable should be checked before space and segment break, otherwise the logic above is wrong.

::: layout/generic/nsTextFrameUtils.cpp:225
(Diff revision 15)
> -            (i + 1 < aLength && IS_ZERO_WIDTH_SPACE(aText[i + 1]))) {
> -          aSkipChars->SkipChar();
> -          continue;
> +      // second pass loop.
> +      if (IsSpaceOrTabOrSegmentBreak(ch)) {
> +        bool keepLastSpace = false;
> +        uint32_t countSegmentBreak = IsSegmentBreak(ch) ? 1 : 0;
> +        uint32_t j;
> +        for (j = i + 1; j < aLength && IsSpaceOrTabOrSegmentBreak(aText[j]);

The condition should be `IsSpaceOrTabOrSegmentBreak(aText[j]) || IsDiscardable(aText[j])` now.

Remember to pop out all tailing discardables before checking combining sequence tail.
(Assignee)

Comment 46

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101570

> It doesn't seem to me you need to assign it here.

With the change you suggested above, I think we can remove this bool now.

> Discardable should be checked before space and segment break, otherwise the logic above is wrong.

Yes, you're right. I'm moving it to the top of the whole for-loop in TransformWhiteSpaces(), right before the IsSpaceOrTab(ch). So, we can ensure we skip all discardables.

> The condition should be `IsSpaceOrTabOrSegmentBreak(aText[j]) || IsDiscardable(aText[j])` now.
> 
> Remember to pop out all tailing discardables before checking combining sequence tail.

I'm not sure if I understand you correctly.

The reason why we should include IsDiscardable in the TransformWhiteSpaces is that some of the segment breaks might also be discardables, right?
If so, the discardables should be included in the IsSpaceOrTabOrSegmentBreak condition already, why bother adding IsDiscardable in the counting for-loop?
I thought we could just leave the counting-loop as it was, and just move
```c++
if (IsDiscardable(ch, &aFlags)) {
  aSkipChars->SkipChar();
  continue;
}
```
to be the first condition in the for-loop in TransformWhiteSpaces, no?
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

5 months ago
Attachment #8821977 - Flags: feedback?(xidorn+moz)

Comment 49

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101592

::: layout/generic/nsTextFrameUtils.cpp:106
(Diff revision 16)
> +      continue;
> +    } else if (IsSpaceOrTab(ch)) {

`else` is not needed after `continue`.

::: layout/generic/nsTextFrameUtils.cpp:222
(Diff revision 16)
> -      } else if (ch == '\n' && aCompression == COMPRESS_WHITESPACE_NEWLINE) {
> -        if ((i > 0 && IS_ZERO_WIDTH_SPACE(aText[i - 1])) ||
> -            (i + 1 < aLength && IS_ZERO_WIDTH_SPACE(aText[i + 1]))) {
> -          aSkipChars->SkipChar();
> -          continue;
> +      // of spaces/tabs/segment breaks is collected in a first pass loop, then
> +      // we apply the collapsing and transformation rules to this run in a
> +      // second pass loop.
> +      if (IsSpaceOrTabOrSegmentBreak(ch)) {
> +        bool keepLastSpace = false;
> +        uint32_t countSegmentBreak = IsSegmentBreak(ch) ? 1 : 0;

You don't need to count segment break anymore, justa boolean variable `hasSegmentBreak` is enough.

::: layout/generic/nsTextFrameUtils.cpp:251
(Diff revision 16)
> -          // EastAsianWidth property and neither side is Hangul.
> -          aSkipChars->SkipChar();
> -          continue;
> +        continue;
> -        }
> -        nowInWhitespace = true;
>        } else {

And this `else` is no longer needed anymore given it immediately follows a `continue`.

Comment 50

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101570

> I'm not sure if I understand you correctly.
> 
> The reason why we should include IsDiscardable in the TransformWhiteSpaces is that some of the segment breaks might also be discardables, right?
> If so, the discardables should be included in the IsSpaceOrTabOrSegmentBreak condition already, why bother adding IsDiscardable in the counting for-loop?
> I thought we could just leave the counting-loop as it was, and just move
> ```c++
> if (IsDiscardable(ch, &aFlags)) {
>   aSkipChars->SkipChar();
>   continue;
> }
> ```
> to be the first condition in the for-loop in TransformWhiteSpaces, no?

No, segment breaks would never be discardables. Discardables are always discarded, and thus not considered as part of the text at all. But you need to skip them correctly, and handle the rest of the text as if those characters do not exist at all.

e.g. if I have `ZWSP LF RLO LF`, in which `RLO` is a discardable (bidi control character). In that case, both line feeds should be discarded because of the preceding zero-width space, becuase text should be handled as if `RLO` doesn't exist.

If you don't include discardables in the range, you would fail to handle this case correctly. Similiarly, if I have `CJK LF RLO LF CJK`, the line feeds need to be removed as well.

I said that you need to remove tailing discardables before checking space combining sequence tail because there can be cases like `WS RLO RLO RLO [combining char]`, and if you use `IsSpaceOrTabOrSegmentBreak || IsDiscardable`, you would incorrectly check the last `RLO` for space combining sequence tail rather than the last whitespace.

Comment 51

5 months ago
mozreview-review
Comment on attachment 8821977 [details]
Bug 1316482 - add reftests part 1.

https://reviewboard.mozilla.org/r/101056/#review101596

::: layout/reftests/w3c-css/submitted/text3/text-segment-break-transformation-1.html:26
(Diff revision 1)
> +<div>Test passes if there is <b>one</b> white space between 2nd and 3rd CJK character.
> +<!--Some[LF]Hangul-->
> +<p>&#x4e00;&#x4e9b;&#x000a;&#xc5b8;&#xbb38;</p>

Please split this into an individual test.
And to make the whitespace more obvious, you can consider using Ahem as the first font family in your tests.
Comment on attachment 8821977 [details]
Bug 1316482 - add reftests part 1.

I think there are much more cases that you can test.

There could be two sets of tests: 

The first sets test different combinations of characters at two sides of segment breaks. More specifically, there are seven types of characters involve in these rules:
1. East Asian Full-width (F)
2. East Asian Half-width (H)
3. East Asian Wide (W) except Hangul
4. East Asian Narrow (Na)
5. East Asian Ambiguous (A)
6. Not East Asian (Neutral)
7. Hangul

So there are 49 different combinations. You can write a script to generate those tests and their reference files.

The seond sets test different sequence of sequence breaks / whitespaces / zero-width spaces between two characters. It seems your existing tests are mostly in this set. But I hope you can describe in "assert" of each test what the test is supposed to check exactly, e.g. "This test checks that multiple segment breaks should be removed altogether when both sides of them are Chinese characters", rather than a general sentence for all tests.
Attachment #8821977 - Flags: feedback?(xidorn+moz)
(Assignee)

Comment 54

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review101570

> No, segment breaks would never be discardables. Discardables are always discarded, and thus not considered as part of the text at all. But you need to skip them correctly, and handle the rest of the text as if those characters do not exist at all.
> 
> e.g. if I have `ZWSP LF RLO LF`, in which `RLO` is a discardable (bidi control character). In that case, both line feeds should be discarded because of the preceding zero-width space, becuase text should be handled as if `RLO` doesn't exist.
> 
> If you don't include discardables in the range, you would fail to handle this case correctly. Similiarly, if I have `CJK LF RLO LF CJK`, the line feeds need to be removed as well.
> 
> I said that you need to remove tailing discardables before checking space combining sequence tail because there can be cases like `WS RLO RLO RLO [combining char]`, and if you use `IsSpaceOrTabOrSegmentBreak || IsDiscardable`, you would incorrectly check the last `RLO` for space combining sequence tail rather than the last whitespace.

Thank you for the explaination. Will address this in the next version.
(Assignee)

Comment 55

5 months ago
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #53)
> Comment on attachment 8821977 [details]
> Bug 1316482 - add tests for CSS Text Segment Break Transformation Rules to
> w3c-css reftests.
> 
> I think there are much more cases that you can test.
> 
> There could be two sets of tests: 
> 
> The first sets test different combinations of characters at two sides of
> segment breaks. More specifically, there are seven types of characters
> involve in these rules:
> 1. East Asian Full-width (F)
> 2. East Asian Half-width (H)
> 3. East Asian Wide (W) except Hangul
> 4. East Asian Narrow (Na)
> 5. East Asian Ambiguous (A)
> 6. Not East Asian (Neutral)
> 7. Hangul
> 
> So there are 49 different combinations. You can write a script to generate
> those tests and their reference files.
> 
> The seond sets test different sequence of sequence breaks / whitespaces /
> zero-width spaces between two characters. It seems your existing tests are
> mostly in this set. But I hope you can describe in "assert" of each test
> what the test is supposed to check exactly, e.g. "This test checks that
> multiple segment breaks should be removed altogether when both sides of them
> are Chinese characters", rather than a general sentence for all tests.

Fair enough. I'll write a script to generate the first sets you mentioned above.

As to the second set, according to your feedback, I'm planning to separate them into two sub-sets.
Both sub-sets consist of sequence of [segment breaks / whitespaces / zero-width spaces] between two characters. One sub-set is for removable segment breaks, in which the segment breaks between two characters should be removed. The other sub-set is for unremovable segment breaks, in which the segment breaks between two characters should be converted to a space (U+0020). And, of course, the assert description should be more specific to represent the test.
(Assignee)

Comment 56

5 months ago
Created attachment 8823165 [details] [diff] [review]
(wip) use script to generate tests for segment break rules.

MozReview-Commit-ID: Lh1w5ZDQfN1
(Assignee)

Comment 57

5 months ago
Comment on attachment 8823165 [details] [diff] [review]
(wip) use script to generate tests for segment break rules.

Hi xidorn, please take a look at this script. Thanks.

I'm not sure if I'm choosing the right characters for all the seven types of East Asian Characters.

Also, I've met two problems with this script:

1. the 'narrow' is not shown correct on the test page, try 004.html, not sure if I'm using the wrong unicode or something.

2. with fullwidth characters previous to the segment break, the segment break seem to be converted to a fullwidth white space, which may be different with that of the reference page, a halfwidth space.
Attachment #8823165 - Flags: feedback?(xidorn+moz)
Comment on attachment 8823165 [details] [diff] [review]
(wip) use script to generate tests for segment break rules.

Review of attachment 8823165 [details] [diff] [review]:
-----------------------------------------------------------------

Overall it looks good. One issue is that the testcases say "between the two characters", but your CHAR_SET actually contains multiple characters for each character set. You can either adjust the statement in the testcases, or trim the characters of each set into just one.

::: layout/reftests/w3c-css/submitted/text3/generate-segment-break-rule-tests.py
@@ +30,5 @@
> +<meta name="assert" content="'segment-break-transformation-rules: with {prev}/{next} in front/back of the semgment break.">
> +<link rel="stylesheet" type="text/css" href="support/ahem.css" />
> +<link rel="match" href="segment-break-transformation-rules-{index:03}-ref.html">
> +<style> p {{ font-family: ahem; }} </style>
> +<div>Pass if there is '{char}' white space between the two characters below.

Why do you need quotes around the word here? Also calling it "char" sounds weird. probably something like "result" or "expect"?

@@ +60,5 @@
> +
> +CHAR_SET = [
> +        ('East Asian Full-width (F)',         'FULLWIDTH'),
> +        ('East Asian Half-width (H)',         'テスト'),
> +        ('East Asian Wide (W) except Hangul', '測試'), 

Tailing whitespace here.
Attachment #8823165 - Flags: feedback?(xidorn+moz) → feedback+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 62

5 months ago
(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #58)
> Comment on attachment 8823165 [details] [diff] [review]
> (wip) use script to generate tests for segment break rules.
> 
> Review of attachment 8823165 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Overall it looks good. One issue is that the testcases say "between the two
> characters", but your CHAR_SET actually contains multiple characters for
> each character set. You can either adjust the statement in the testcases, or
> trim the characters of each set into just one.

Adjust the statement in the testcases.

> :::
> layout/reftests/w3c-css/submitted/text3/generate-segment-break-rule-tests.py
> @@ +30,5 @@
> > +<meta name="assert" content="'segment-break-transformation-rules: with {prev}/{next} in front/back of the semgment break.">
> > +<link rel="stylesheet" type="text/css" href="support/ahem.css" />
> > +<link rel="match" href="segment-break-transformation-rules-{index:03}-ref.html">
> > +<style> p {{ font-family: ahem; }} </style>
> > +<div>Pass if there is '{char}' white space between the two characters below.
> 
> Why do you need quotes around the word here? Also calling it "char" sounds
> weird. probably something like "result" or "expect"?

Remove the quotes and use "expect" instead of "char".

> @@ +60,5 @@
> > +
> > +CHAR_SET = [
> > +        ('East Asian Full-width (F)',         'FULLWIDTH'),
> > +        ('East Asian Half-width (H)',         'テスト'),
> > +        ('East Asian Wide (W) except Hangul', '測試'), 
> 
> Tailing whitespace here.

Okay.

Thank you for the feedback. :)
I was going to review the patch here, but it looks like the try run https://treeherder.mozilla.org/#/jobs?repo=try&revision=19232472c75a shows a bunch of failures on existing reftests, which suggests something isn't behaving quite right.... are you working on an updated version to address this?

Comment 64

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review102668

A few small nits based on reading through the current patch; however, the reftest failures are the real problem at the moment, I guess. Clearing r? flag for now, but please re-request when ready.

::: layout/generic/nsTextFrameUtils.cpp:100
(Diff revision 17)
> +    isSegmentBreakSkippable =
> +      aBegin > 0 && IsSegmentBreakSkipChar(ucs4before) &&
> +      aEnd < aLength && IsSegmentBreakSkipChar(ucs4after);
> +  }
> +
> +  for (uint32_t k = aBegin; k < aEnd; ++k) {

A small nit here - please use `i` rather than `k` as the loop index variable, just because it's more conventional.

::: layout/generic/nsTextFrameUtils.cpp:234
(Diff revision 17)
> -            NS_IS_HIGH_SURROGATE(aText[i + 1]) &&
> -            NS_IS_LOW_SURROGATE(aText[i + 2])) {
> +        // Exclude tailing discardables before checking space combining
> +        // sequence tail.

Call them "trailing" (not "tailing") discardables, please, in this comment and in the name of the `count...` variable (about 4 occurrences).

::: layout/generic/nsTextFrameUtils.cpp:239
(Diff revision 17)
> -            NS_IS_HIGH_SURROGATE(aText[i + 1]) &&
> -            NS_IS_LOW_SURROGATE(aText[i + 2])) {
> -          ucs4after = SURROGATE_TO_UCS4(aText[i + 1], aText[i + 2]);
> -        } else if (i + 1 < aLength) {
> -          ucs4after = aText[i + 1];
> +        // Exclude tailing discardables before checking space combining
> +        // sequence tail.
> +        for (; IsDiscardable(aText[j - 1], &flags); j--) {
> +          countTailingDiscardables++;
> +        }
> +        // If the last white space is combining a sequence tail, exclude it

I think this should read something like

    // If the last white space is followed by a combining sequence tail, ...

::: layout/generic/nsTextFrameUtils.cpp:252
(Diff revision 17)
> +          aOutput = TransformWhiteSpaces(aText, aLength, i, j, hasSegmentBreak,
> +                                         inWhitespace, aOutput, flags,
> +                                         aCompression, aSkipChars);
> +        }
> +        // We need to keep KeepChar()/SkipChar() in order, so process the
> +        // last white space first, then process the tailing discardables.

s/tailing/trailing/
Attachment #8816402 - Flags: review?(jfkthame)
(Assignee)

Comment 65

5 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #63)
> I was going to review the patch here, but it looks like the try run
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=19232472c75a shows a
> bunch of failures on existing reftests, which suggests something isn't
> behaving quite right.... are you working on an updated version to address
> this?

Oops, thank you for reminding me this. I haven't checked the try result yet. I'll check these failures today.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 69

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review102924

With all your comments been addressed. From the latest try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d56a60743517 this patch doesn't regress any previous test now. However, it seems the two tests sets that I have added are failed. But, they all passed on my local. Not sure if this is because I've installed ahem font on my local, but not on try machine? Please enlighten me a bit if I do something wrong while writing these tests. Thank you.
(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #69)
> Not
> sure if this is because I've installed ahem font on my local, but not on try
> machine? Please enlighten me a bit if I do something wrong while writing
> these tests. Thank you.

It looks like a bunch of the reference files are lacking the link

  <link rel="stylesheet" type="text/css" href="support/ahem.css" />

that would load the Ahem font from the reftest support directory, whereas it is present in the corresponding testcases. So then you get Ahem used in the test but not the reference, resulting in a mismatch.

(After fixing this, you might want to uninstall your local copy of Ahem and check that the tests still pass for you locally.)
(Assignee)

Comment 71

5 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #70)
> (In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #69)
> > Not
> > sure if this is because I've installed ahem font on my local, but not on try
> > machine? Please enlighten me a bit if I do something wrong while writing
> > these tests. Thank you.
> 
> It looks like a bunch of the reference files are lacking the link
> 
>   <link rel="stylesheet" type="text/css" href="support/ahem.css" />
> 
> that would load the Ahem font from the reftest support directory, whereas it
> is present in the corresponding testcases. So then you get Ahem used in the
> test but not the reference, resulting in a mismatch.
> 
> (After fixing this, you might want to uninstall your local copy of Ahem and
> check that the tests still pass for you locally.)

Thank you!!! You're right.
After I uninstalled my local copy of Ahem, the tests failed just like try.
Then, I added the link, the tests all passed.
Updated patch is on the way.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
The current patch changes existing behavior in a way that concerns me a bit: it causes an isolated CR character (U+000D) in 16-bit text to be treated as a space, whereas currently a lone CR is discarded.

Testcase:

  data:text/html;charset=utf-8,%D1%84%D0%BE%D0%BE&%23xd;%D0%B1%D0%B0%D1%80

Currently, this will render as "фообар" with no visible space, but with the patch, it will become "фоо бар".

AFAICS from https://drafts.csswg.org/css-text-3/#white-space-processing, the change is correct (in the sense that it matches the spec draft): CR is treated as a segment break, just like LF or CRLF, and therefore transformed to <space>.

But I also notice that webkit/blink/edge all appear to match gecko's current behavior (lone CR is discarded), which makes me wonder about web compatibility, browser interop, etc.

I've filed https://github.com/w3c/csswg-drafts/issues/855 to see if I can get some clarity here.
One concern I have is that this patch extensively reworks the 16-bit version of TransformText, but leaves the 8-bit version untouched. In principle, they should behave exactly the same, except that the 8-bit version can skip checking for a bunch of things (ZWSP, CJK characters, Arabic characters, etc.) that can't occur in 8-bit text. But that's purely an optimization.

However, with the refactored 16-bit method here, it becomes really hard to tell by reading the code whether they do in fact share the same behavior (disregarding the features that can only occur in 16-bit text). And as it turns out, they don’t: the behavior of a lone CR is changed in the 16-bit version, as noted above.

What I'd suggest, therefore, is that we add a further patch that converts the refactored 16-bit method here into a template method that can be used for both char16_t and uint8_t text, and drop the old 8-bit method altogether. It should be possible for the compiler to optimize away the extra 16-bit-only stuff that can never be true in the 8-bit version of the method, and this will mean we only have one version of the TransformText logic to maintain in future.

I pushed a try job https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f588382f702f3d2bbb901643fa37827ce63fbdc that includes these patches, plus a followup to do what I am suggesting here. Because this makes the 8-bit code behave like the 16-bit code, it caused an existing reftest (941940-1.html) to fail, as that test specifically includes the lone-CR case (but only in an 8-bit context). So this highlights that we need to confirm whether the change is in fact desired, and harmonise the 8- and 16-bit code paths.
(Assignee)

Comment 77

5 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #76)
> One concern I have is that this patch extensively reworks the 16-bit version
> of TransformText, but leaves the 8-bit version untouched. In principle, they
> should behave exactly the same, except that the 8-bit version can skip
> checking for a bunch of things (ZWSP, CJK characters, Arabic characters,
> etc.) that can't occur in 8-bit text. But that's purely an optimization.
> 
> However, with the refactored 16-bit method here, it becomes really hard to
> tell by reading the code whether they do in fact share the same behavior
> (disregarding the features that can only occur in 16-bit text). And as it
> turns out, they don’t: the behavior of a lone CR is changed in the 16-bit
> version, as noted above.
> 
> What I'd suggest, therefore, is that we add a further patch that converts
> the refactored 16-bit method here into a template method that can be used
> for both char16_t and uint8_t text, and drop the old 8-bit method
> altogether. It should be possible for the compiler to optimize away the
> extra 16-bit-only stuff that can never be true in the 8-bit version of the
> method, and this will mean we only have one version of the TransformText
> logic to maintain in future.
> 
> I pushed a try job
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=7f588382f702f3d2bbb901643fa37827ce63fbdc that
> includes these patches, plus a followup to do what I am suggesting here.
> Because this makes the 8-bit code behave like the 16-bit code, it caused an
> existing reftest (941940-1.html) to fail, as that test specifically includes
> the lone-CR case (but only in an 8-bit context). So this highlights that we
> need to confirm whether the change is in fact desired, and harmonise the 8-
> and 16-bit code paths.

Sounds like a good idea. I'll upload a followup patch that uses the refactored method as a template function for both char16_t and uint8_t text. Then, we shall wait for https://github.com/w3c/csswg-drafts/issues/855 (thank you for filing this.)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 82

5 months ago
Comment on attachment 8823165 [details] [diff] [review]
(wip) use script to generate tests for segment break rules.

Thanks to xidorn's feedback. This patch has been included in the ready-for-review patch set.
Attachment #8823165 - Attachment is obsolete: true
(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #77)
> Then, we shall wait for https://github.com/w3c/csswg-drafts/issues/855
> (thank you for filing this.)

We could still move forward here, while we wait for feedback/resolution from the CSSWG. What I'd suggest is that we proceed with the refactoring etc here but ensure that we don't change existing behavior for lone CR (so that will require a minor change to the patch, though I haven't analyzed exactly where it needs to happen). This means we should still pass the existing 941940-1.html reftest (which failed on my try push), and we'll continue to match the behavior of other browsers.

Then, if the WG really wants us to change behavior to match the spec text (rather than change the spec to match existing reality) we can deal with that in a separate bug.
(Assignee)

Comment 84

5 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #83)
> (In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #77)
> > Then, we shall wait for https://github.com/w3c/csswg-drafts/issues/855
> > (thank you for filing this.)
> 
> We could still move forward here, while we wait for feedback/resolution from
> the CSSWG. What I'd suggest is that we proceed with the refactoring etc here
> but ensure that we don't change existing behavior for lone CR (so that will
> require a minor change to the patch, though I haven't analyzed exactly where
> it needs to happen). This means we should still pass the existing
> 941940-1.html reftest (which failed on my try push), and we'll continue to
> match the behavior of other browsers.
> 
> Then, if the WG really wants us to change behavior to match the spec text
> (rather than change the spec to match existing reality) we can deal with
> that in a separate bug.

Sounds good to me.

I retraced the original logic and found that a lone CR is not discarded but kept instead. It seems that the kept lone CR is rendered as an empty string, which is just like it is discarded. Anyway, I put the logic back and leave some comment. Please see the updated version.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(In reply to Jeremy Chen [:jeremychen] UTC+8 from comment #84)
> I retraced the original logic and found that a lone CR is not discarded but
> kept instead. It seems that the kept lone CR is rendered as an empty string,
> which is just like it is discarded.

OK, that seems reasonable; CR is one of the handful of control characters that is NOT required to be rendered as a hexbox but as zero-width invisible, assuming it is not present as an actual glyph in the font.

Comment 90

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review103520

LGTM. (As Xidorn has been providing great feedback and guidance throughout this bug -- thanks! -- I'm basically just rubber-stamping his review here.)
Attachment #8816402 - Flags: review?(jfkthame) → review+

Comment 91

5 months ago
mozreview-review
Comment on attachment 8824330 [details]
Bug 1316482 - use the refactored TransformText as a template function for both char16_t and uint8_t text.

https://reviewboard.mozilla.org/r/102856/#review103522

This is basically the patch I originally wrote in comment 76, and I shouldn't review my own code, so bouncing this one to Xidorn.
Attachment #8824330 - Flags: review?(xidorn+moz)

Comment 92

5 months ago
mozreview-review
Comment on attachment 8824330 [details]
Bug 1316482 - use the refactored TransformText as a template function for both char16_t and uint8_t text.

https://reviewboard.mozilla.org/r/102856/#review103524
Attachment #8824330 - Flags: review?(jfkthame)

Comment 93

5 months ago
mozreview-review
Comment on attachment 8821977 [details]
Bug 1316482 - add reftests part 1.

https://reviewboard.mozilla.org/r/101056/#review103528
Attachment #8821977 - Flags: review?(jfkthame) → review+

Comment 94

5 months ago
mozreview-review
Comment on attachment 8823229 [details]
Bug 1316482 - add reftests part 2.

https://reviewboard.mozilla.org/r/101798/#review103530
Attachment #8823229 - Flags: review?(jfkthame) → review+
(In reply to Jonathan Kew (:jfkthame) from comment #90)
> LGTM. (As Xidorn has been providing great feedback and guidance throughout
> this bug -- thanks! -- I'm basically just rubber-stamping his review here.)

Wait... I didn't take the same level of carefulness for responsing feedbacks as I do for reviewing, so it may probably be better that you check the patch again rather than rubber-stamping my f+ as r+.
Flags: needinfo?(jfkthame)

Comment 96

5 months ago
mozreview-review
Comment on attachment 8824330 [details]
Bug 1316482 - use the refactored TransformText as a template function for both char16_t and uint8_t text.

https://reviewboard.mozilla.org/r/102856/#review103622

The main overhead added by unifying these two functions is scanning whitespaces twice (even three times at worst), which is not necessary for `uint8_t*` because neither ZWSP nor CJK characters can present in an ASCII or Latin-1 string. But I guess that overhead may be hard to avoid... It's probably okay since the first loop should be fast, and it is not a time-complexity regression.

r=me with the following issues addressed.

::: layout/generic/nsTextFrameUtils.cpp:74
(Diff revision 2)
> +  // For 8-bit text (sizeof CharT == 1), the checks here should get optimized
> +  // out, and isSegmentBreakSkippable will always be false.

This is misleading. `isSegmentBreakSkippable` will be set to true after the first segment break.

::: layout/generic/nsTextFrameUtils.cpp:175
(Diff revision 2)
> -char16_t*
> -nsTextFrameUtils::TransformText(const char16_t* aText, uint32_t aLength,
> -                                char16_t* aOutput,
> +template<class CharT>
> +CharT*
> +nsTextFrameUtils::TransformText(const CharT* aText, uint32_t aLength,
> +                                CharT* aOutput,
>                                  CompressionMode aCompression,
>                                  uint8_t* aIncomingFlags,
>                                  gfxSkipChars* aSkipChars,
>                                  uint32_t* aAnalysisFlags)

`nsTextFrameUtils::TransformText` template is part of public API of `nsTextFrameUtils`, while its function body is not available in the header. It may stop working (fail to resolve symbol in link time) once its callsites are moved to a different translation unit (e.g. moved to a different unified source file).

It is probably better explicit instantiating this function template with `uint8_t` and `char16_t` via adding the following lines after the function definition:
```c++
template uint8_t*
nsTextFrameUtils::TransformText(const uint8_t* aText, uint32_t aLength,
                                uint8_t* aOutput,
                                CompressionMode aCompression,
                                uint8_t* aIncomingFlags,
                                gfxSkipChars* aSkipChars,
                                uint32_t* aAnalysisFlags);
template char16_t*
nsTextFrameUtils::TransformText(const char16_t* aText, uint32_t aLength,
                                char16_t* aOutput,
                                CompressionMode aCompression,
                                uint8_t* aIncomingFlags,
                                gfxSkipChars* aSkipChars,
                                uint32_t* aAnalysisFlags);                        
```

::: layout/generic/nsTextFrameUtils.cpp:254
(Diff revision 2)
>            countTrailingDiscardables++;
>          }
>          // If the last white space is followed by a combining sequence tail,
>          // exclude it from the range of TransformWhiteSpaces.
> -        if (aText[j - 1] == ' ' && j < aLength &&
> -            IsSpaceCombiningSequenceTail(&aText[j], aLength - j)) {
> +        if (sizeof(CharT) > 1 && aText[j - 1] == ' ' && j < aLength &&
> +            IsSpaceCombiningSequenceTail((const char16_t*)&aText[j],

Rather than doing the casting, I'd prefer adding a constexpr `IsSpaceCombiningSequenceTail` overload for `const uint8_t*` which returns false directly. I hope we can avoid casting as much as possible when there is any other trivial alternative.
Attachment #8824330 - Flags: review?(xidorn+moz) → review+
(Assignee)

Comment 97

5 months ago
mozreview-review-reply
Comment on attachment 8824330 [details]
Bug 1316482 - use the refactored TransformText as a template function for both char16_t and uint8_t text.

https://reviewboard.mozilla.org/r/102856/#review103622

> This is misleading. `isSegmentBreakSkippable` will be set to true after the first segment break.

You're right. This comment should be corrected.

> Rather than doing the casting, I'd prefer adding a constexpr `IsSpaceCombiningSequenceTail` overload for `const uint8_t*` which returns false directly. I hope we can avoid casting as much as possible when there is any other trivial alternative.

Will do. Thank you for the review.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 102

5 months ago
Try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aef9da2174d9

Will not land this until get another round of r+ from Jonathan.

Comment 103

5 months ago
mozreview-review
Comment on attachment 8824330 [details]
Bug 1316482 - use the refactored TransformText as a template function for both char16_t and uint8_t text.

https://reviewboard.mozilla.org/r/102856/#review103630

::: layout/generic/nsTextFrameUtils.h:130
(Diff revision 3)
> -
> -  static uint8_t* TransformText(const uint8_t* aText, uint32_t aLength,
> +  template uint8_t*
> +  TransformText(const uint8_t* aText, uint32_t aLength,
> -                                uint8_t* aOutput,
> +                uint8_t* aOutput,
> -                                CompressionMode aCompression,
> +                CompressionMode aCompression,
> -                                uint8_t* aIncomingFlags,
> +                uint8_t* aIncomingFlags,
> -                                gfxSkipChars* aSkipChars,
> +                gfxSkipChars* aSkipChars,
> -                                uint32_t* aAnalysisFlags);
> +                uint32_t* aAnalysisFlags);
> +  template char16_t*
> +  TransformText(const char16_t* aText, uint32_t aLength,
> +                char16_t* aOutput,
> +                CompressionMode aCompression,
> +                uint8_t* aIncomingFlags,
> +                gfxSkipChars* aSkipChars,
> +                uint32_t* aAnalysisFlags);

Add them after the function body in .cpp file is enough. No need to add them here.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 108

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review103794

OK, I looked through this again and am happy to r+ it, with one minor suggestion for a tweak.

::: layout/generic/nsTextFrameUtils.cpp:95
(Diff revision 23)
> +    isSegmentBreakSkippable =
> +      aBegin > 0 && IsSegmentBreakSkipChar(ucs4before) &&
> +      aEnd < aLength && IsSegmentBreakSkipChar(ucs4after);

It's a minor thing, and maybe the compiler would figure it out and optimize anyway, but... the multiple checks for `aBegin > 0` and `aEnd < aLength` in this block seem unfortunate.

If either of these conditions fails, we aren't going to touch `isSegmentBreakSkippable` at all; therefore, we could move these tests into the initial `if (...)` so that we ignore the whole block if they're not true. That will allow you to remove these checks within the block.
(Assignee)

Comment 109

5 months ago
mozreview-review-reply
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97160/#review103794

> It's a minor thing, and maybe the compiler would figure it out and optimize anyway, but... the multiple checks for `aBegin > 0` and `aEnd < aLength` in this block seem unfortunate.
> 
> If either of these conditions fails, we aren't going to touch `isSegmentBreakSkippable` at all; therefore, we could move these tests into the initial `if (...)` so that we ignore the whole block if they're not true. That will allow you to remove these checks within the block.

Sounds good. Will do.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 114

5 months ago
try looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b94781367b1f

Jonathan, thank you for the review.
Xidorn, thank you for providing lots of great feedback and guidance.
Let's ship it!!! \ o /
Flags: needinfo?(jfkthame)

Comment 115

5 months ago
Pushed by jichen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ebd0c6c8b783
remove collapsible white spaces according to the White Space Processing Rules. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/b54126cc63d4
add reftests part 1. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/2a28dc0a75d3
add reftests part 2. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/ad208e73ab6c
use the refactored TransformText as a template function for both char16_t and uint8_t text. r=xidorn

Comment 116

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/ebd0c6c8b783
https://hg.mozilla.org/mozilla-central/rev/b54126cc63d4
https://hg.mozilla.org/mozilla-central/rev/2a28dc0a75d3
https://hg.mozilla.org/mozilla-central/rev/ad208e73ab6c
Status: ASSIGNED → RESOLVED
Last Resolved: 5 months ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
When imported to the csswg-test repository, all of the tests gave the error:

remote: vendor-imports/mozilla/mozilla-central-reftests/text3/segment-break-transformation-removable-1.html status changed to 'Needs Work' due to error:
remote: Linked to unversioned specification URL https://drafts.csswg.org/css-text/#line-break-transform. Use https://drafts.csswg.org/css-text-4/... instead.

although I suspect you want css-text-3 instead of css-text-4.

Would you mind fixing this?
Flags: needinfo?(jeremychen)
(Assignee)

Comment 118

5 months ago
(In reply to David Baron :dbaron: ⌚️UTC-8 from comment #117)
> When imported to the csswg-test repository, all of the tests gave the error:
> 
> remote:
> vendor-imports/mozilla/mozilla-central-reftests/text3/segment-break-
> transformation-removable-1.html status changed to 'Needs Work' due to error:
> remote: Linked to unversioned specification URL
> https://drafts.csswg.org/css-text/#line-break-transform. Use
> https://drafts.csswg.org/css-text-4/... instead.
> 
> although I suspect you want css-text-3 instead of css-text-4.
> 
> Would you mind fixing this?

Sure. I'll push a followup to fix this by using css-text-3 tomorrow.
Flags: needinfo?(jeremychen)
I had to back this out for frequent reftest failures on win7debug like https://treeherder.mozilla.org/logviewer.html#?job_id=67741450&repo=autoland

https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=10841f767eaf3f85cd1804016553fdeffc07b82e&group_state=expanded&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=success&filter-searchStr=3fd70802c3f5b4e1bc35a83215b433bd10b2bae0&selectedJob=67741450

https://hg.mozilla.org/mozilla-central/rev/2963cf6be7f8
Flags: needinfo?(jeremychen)
(Assignee)

Comment 120

5 months ago
I suspect it's the same thing as mentioned in bug 1311244 comment 43. It seems that adding large set of tests (~50 in this bug and ~20 in bug 1311244) could be an indirect cause.... not sure.
Depends on: 1300355
Flags: needinfo?(jeremychen)
Given this is backed out, reopen.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 126

5 months ago
mozreview-review
Comment on attachment 8816402 [details]
Bug 1316482 - remove collapsible white spaces according to the White Space Processing Rules.

https://reviewboard.mozilla.org/r/97158/#review104488

Update patch set according to David's feedback in comment 117.
(Assignee)

Comment 127

5 months ago
(In reply to Wes Kocher (:KWierso) from comment #119)
> I had to back this out for frequent reftest failures on win7debug like
> https://treeherder.mozilla.org/logviewer.html#?job_id=67741450&repo=autoland
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=autoland&fromchange=10841f767eaf3f85cd1804016553fdeffc07b82e&group_
> state=expanded&filter-resultStatus=testfailed&filter-
> resultStatus=busted&filter-resultStatus=exception&filter-
> resultStatus=success&filter-
> searchStr=3fd70802c3f5b4e1bc35a83215b433bd10b2bae0&selectedJob=67741450
> 
> https://hg.mozilla.org/mozilla-central/rev/2963cf6be7f8

The frequent reftest failures seem on win7 VM debug only, not on win7 opt or win7 debug.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5ca9fb1a91ee
(Assignee)

Comment 128

5 months ago
If I skip the whole text3 folder (where I intent to add lots of tests in this bug), try looks all green. :-/
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7d2a4dc44899
The situation with those tests is pretty sad, but I don't think it's the fault of anything in this bug, and I don't think it should block progress here.

So I'd be in favor of re-landing with that folder skipped on win7 debug (and a comment added in the reftest.list, pointing back to this issue).
(Assignee)

Comment 130

5 months ago
Sounds good to me. Please review the comment I added in the reftest.list.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 136

5 months ago
mozreview-review
Comment on attachment 8825852 [details]
Bug 1316482 - disable CSS Text 3 reftests on Windows7 debug build.

https://reviewboard.mozilla.org/r/103920/#review104638

LGTM. Suggested a couple of wording tweaks.

::: layout/reftests/w3c-css/submitted/reftest.list:59
(Diff revision 1)
>  
>  # Shapes Level 1
>  include shapes1/reftest.list
>  
>  # Text Level 3
> -include text3/reftest.list
> +# After Bug 1316482, we've added large amount of tests in this folder, which

s/After/In/

s/large amount/a large number/ (reads better, IMO)

::: layout/reftests/w3c-css/submitted/reftest.list:62
(Diff revision 1)
>  
>  # Text Level 3
> -include text3/reftest.list
> +# After Bug 1316482, we've added large amount of tests in this folder, which
> +# indirectly causes frequent reftest failures on win7 debug. (see Bug 1300355)
> +# Since the root cause is not these tests, skip them on win7 debug for now.
> +# Once we come up a better solution, we should resume this folder back on

s/come up a/come up with a/
s/resume this folder back/re-enable this folder/
Attachment #8825852 - Flags: review?(jfkthame) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 142

5 months ago
Pushed by jichen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6719fbff8975
remove collapsible white spaces according to the White Space Processing Rules. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/5e0232da8e0b
add reftests part 1. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/189473cbd70d
add reftests part 2. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/f3defbd3081a
use the refactored TransformText as a template function for both char16_t and uint8_t text. r=xidorn
https://hg.mozilla.org/integration/autoland/rev/eb990ca47d6d
disable CSS Text 3 reftests on Windows7 debug build. r=jfkthame
(Assignee)

Comment 143

5 months ago
(In reply to Jonathan Kew (:jfkthame) from comment #136)
> Comment on attachment 8825852 [details]
> Bug 1316482 - disable CSS Text 3 reftests on Windows7 debug build.
> 
> https://reviewboard.mozilla.org/r/103920/#review104638
> 
> LGTM. Suggested a couple of wording tweaks.
> 
> ::: layout/reftests/w3c-css/submitted/reftest.list:59
> (Diff revision 1)
> >  
> >  # Shapes Level 1
> >  include shapes1/reftest.list
> >  
> >  # Text Level 3
> > -include text3/reftest.list
> > +# After Bug 1316482, we've added large amount of tests in this folder, which
> 
> s/After/In/

Miss this wording tweak. Will push a followup once these patches being merged to m-c, because so far we can't push a followup to autoland.

Comment 144

5 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6719fbff8975
https://hg.mozilla.org/mozilla-central/rev/5e0232da8e0b
https://hg.mozilla.org/mozilla-central/rev/189473cbd70d
https://hg.mozilla.org/mozilla-central/rev/f3defbd3081a
https://hg.mozilla.org/mozilla-central/rev/eb990ca47d6d
Status: REOPENED → RESOLVED
Last Resolved: 5 months ago5 months ago
Resolution: --- → FIXED

Comment 145

5 months ago
Pushed by jichen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab1b798234c5
followup - fix a wording tweak. comment-only, no review, DONTBUILD
https://hg.mozilla.org/mozilla-central/rev/ab1b798234c5
You need to log in before you can comment on or make changes to this bug.