Closed Bug 898984 Opened 11 years ago Closed 8 years ago

Characters outside BMP in XUL label might be cut off

Categories

(Core :: XUL, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: xidorn, Assigned: arai)

References

(Depends on 1 open bug)

Details

Attachments

(7 files, 4 obsolete files)

Attached image screenshot
When the title of a page contains characters outside Basic Multilingual Plane, the width of title text in tab is incorrect, and those characters may be incorrectly cut.
Attached file testcase
Manually change of charset is required to open the testcase. It is encoded in UTF-8 but Bugzilla gives the Content-Type header "text/html; charset=windows-1252".
Attachment #782456 - Attachment mime type: text/html → text/html; charset=utf-8
Component: Tabbed Browser → XUL
Product: Firefox → Core
Attached file xul testcase
Assignee: nobody → quanxunzhen
Attached patch patch part1 (obsolete) — Splinter Review
This part of patchset adds an iterator for Unicode cluster.
Attachment #788673 - Flags: review?(jfkthame)
Attached patch patch part2 (obsolete) — Splinter Review
This part of patchset uses iterators in nsUnicodeProperties to take surrogates and clusters into account.
Attachment #788674 - Flags: review?(smontagu)
Status: NEW → ASSIGNED
Attachment #788674 - Attachment description: bug898984-2.patch → patch part2
(In reply to Xidorn Quan from comment #5)
> Created attachment 788674 [details] [diff] [review]
> patch part2
> 
> This part of patchset uses iterators in nsUnicodeProperties to take
> surrogates and clusters into account.

While the patches here look like they'd fix the specific problem with surrogates, they don't address the deeper problems with nsTextBoxFrame; the basic approach to "cropping" there (iterating through the text one character - or cluster - at a time, measuring each and accumulating a "total width") is broken, and only works for simple scripts and fonts without complex shaping behavior, contextual forms that vary in width, etc.

This is in addition to the fact that its implementation is horrendously inefficient, because it repeatedly re-creates and discards the same textruns. See discussion in bug 834609 and bug 837765.

Rather than adding more code into a fundamentally broken design, I think it would be better to fix these issues "properly" by replacing the guts of nsTextBoxFrame with an implementation based on nsTextFrame, as proposed in bug 837765 comment 1. Is this something you'd be interested to try?
(In reply to Jonathan Kew (:jfkthame) from comment #6)
> Rather than adding more code into a fundamentally broken design, I think it
> would be better to fix these issues "properly" by replacing the guts of
> nsTextBoxFrame with an implementation based on nsTextFrame, as proposed in
> bug 837765 comment 1. Is this something you'd be interested to try?

It sounds good. I'd like to have a try.
Depends on: 837765
Comment on attachment 788673 [details] [diff] [review]
patch part1

Clearing the r? flag for now, in the hope that we'll be able to go with a better solution (as per comments above) instead of this patch. If you have a chance to return to this bug sometime, that would be great.
Attachment #788673 - Flags: review?(jfkthame)
Attachment #788674 - Flags: review?(smontagu)
See Also: → 1164759
Unassign myself as I'm not actively working on this.

Once we have bug 658467 fixed, we would no longer need to fix this.
Assignee: xidorn+moz → nobody
Status: ASSIGNED → NEW
See Also: 1164759658467
I had a working patch in bug 1277894.
taking this over.
Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED
Summary: Characters outside BMP in title of tab might be cut off → Characters outside BMP in XUL label might be cut off
should we really try to replace whole nsTextBoxFrame to address this?
I don't see any reason to keep this not-fixed for 3 years, while the fix is very small.
based on the fact that all duped bugs are about simple case with emoji at the end, and it can be fixed by the patch (either xidorn's or mine),
I believe landing the patch should be better than waiting for the perfect solution.

or perhaps will bug 658467 be fixed shortly?
if the size of the change matters, this patch from bug 1277894 may be smaller.
(I'm fine with either tho)
Attachment #8792387 - Flags: review?(jfkthame)
(In reply to Tooru Fujisawa [:arai] from comment #14)
> or perhaps will bug 658467 be fixed shortly?

It might be but we'll still use 'crop' in other places.
(In reply to Tooru Fujisawa [:arai] from comment #14)
> based on the fact that all duped bugs are about simple case with emoji at
> the end, and it can be fixed by the patch (either xidorn's or mine),
> I believe landing the patch should be better than waiting for the perfect
> solution.

OK, I guess we could take a patch here to fix the broken-emoji examples that are becoming more common, even though the code here also has more fundamental problems.

I think I'd prefer to do something along the lines of Xidorn's patches, using iterators from mozilla::unicode to encapsulate the surrogate handling. However, they'll need some rebasing/updating to apply to current trunk code. Would you care to do that?

It would be good to have some automated tests to cover this, too.
Flags: needinfo?(arai.unmht)
Comment on attachment 8792387 [details] [diff] [review]
Support surrogate pair when cropping XUL label.

Thanks :)

Rebased xidorn's patches and added testcase with in-tree FiraSans' emoij.
will post them here after try run (at least it's passed locally)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=76e2b1553137
Attachment #8792387 - Attachment is obsolete: true
Flags: needinfo?(arai.unmht)
Attachment #8792387 - Flags: review?(jfkthame)
simply rebased and fixed an assertion condition in ClusterReverseIterator::Next.
Attachment #788673 - Attachment is obsolete: true
Attachment #8794043 - Flags: review?(smontagu)
also simply rebased.
Attachment #8794044 - Flags: review?(jfkthame)
Added a testcase that uses in-tree FiraSans-Regular U+1F310 (GLOBE WITH MERIDIANS) and tests crop start, end, center.
this testcase fails with unpatched build.
Attachment #788674 - Attachment is obsolete: true
Attachment #8794045 - Flags: review?(jfkthame)
Comment on attachment 8794043 [details] [diff] [review]
Part 1: Add ClusterReverseIterator in nsUnicodeProperties.

Review of attachment 8794043 [details] [diff] [review]:
-----------------------------------------------------------------

(stealing the review here, as I'm not sure smontagu is around much)

::: intl/unicharutil/util/nsUnicodeProperties.cpp
@@ +525,5 @@
> +            break;
> +        }
> +    } while (mPos > mLimit);
> +
> +    // XXX May need to handle conjoining Jamo

Yes, for proper cluster recognition we should do this. But I guess it's OK to leave that for a possible future enhancement.
Attachment #8794043 - Flags: review?(smontagu) → review+
Comment on attachment 8794044 [details] [diff] [review]
Part 2: Support surrogate pair in XUL cropped element.

Review of attachment 8794044 [details] [diff] [review]:
-----------------------------------------------------------------

While you're touching this code, there are some style-guide issues that we might as well tidy up in the parts you're changing anyway.

One substantive thing: the check for bidi characters here is incomplete, so let's fix that.

::: layout/xul/nsTextBoxFrame.cpp
@@ +687,5 @@
>          case CropRight:
>          {
> +            ClusterIterator iter(mTitle.Data(), mTitle.Length());
> +            const char16_t *dataBegin = iter;
> +            const char16_t *pos = dataBegin;

style nit: please move the "*" over next to "char16_t", here and in all the other examples below.

@@ +703,1 @@
>                      break;

style nit: while you're here, please add the missing braces.

@@ +703,4 @@
>                      break;
>  
> +                if (UCS2_CHAR_IS_BIDI(*pos) ) {
> +                    mState |= NS_FRAME_IS_BIDI;

Let's fix this while we're here. There are now RTL characters that are not in Plane 0, so UCS2_CHAR_IS_BIDI() is not a sufficient check. (Probably when this was originally written, those characters hadn't been added to Unicode.)

Two possible fixes: One, we could check

    if (length > 1 &&
        ...the usual surrogate stuff to compute ch... &&
        UTF32_CHAR_IS_BIDI(ch)) {
        mState |= NS_FRAME_IS_BIDI;
    }

But that's a pretty verbose mess, and we'd need to do it several times here.

The other option would be to take advantage of the fact that all the Plane 1 RTL characters are in a known, contiguous set of Unicode blocks, with the result that their UTF16 representation will always begin with one of a couple of specific high surrogate code units. So we can just check for those surrogate values and set the flag if they're present. In theory, that's an imperfect check -- we could end up setting the bidi flag as a result of an isolated surrogate, for example -- but that's unimportant.

Turns out we already have an example of code where we use this trick:
https://dxr.mozilla.org/mozilla-central/rev/f0e6cc6360213ba21fd98c887b55fce5c680df68/widget/cocoa/TextInputHandler.mm#701

So what I suggest we do is to add this to the UCS2_CHAR_IS_BIDI macro itself, so that it checks for high surrogates that indicate an RTL supplementary-plane block; then we'll automatically get this support in all the (few) places where UCS2_CHAR_IS_BIDI is used, and can drop the extra check from TextInputHandler.mm.

@@ +713,1 @@
>                  return titleWidth;

braces

@@ +738,1 @@
>                      break;

braces

@@ +748,1 @@
>                  return titleWidth;

braces

@@ +788,1 @@
>                      break;

braces (a bunch of times in this chunk)
Comment on attachment 8794045 [details] [diff] [review]
Part 3: Add testcase for surrogate pair in XUL cropped element.

Review of attachment 8794045 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/reftests/xul/reftest.list
@@ +6,5 @@
>  # accesskeys are not normally displayed on Mac, so skip this test
>  skip-if(cocoaWidget) skip-if((B2G&&browserIsRemote)||Mulet) == accesskey.xul accesskey-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop
>  fails-if(cocoaWidget) fails-if(browserIsRemote&&dwrite) fuzzy-if(xulRuntime.widgetToolkit=="gtk3",1,11) skip-if((B2G&&browserIsRemote)||Mulet) == tree-row-outline-1.xul tree-row-outline-1-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop, win8: bug 1254832
>  skip-if((B2G&&browserIsRemote)||Mulet) != tree-row-outline-1.xul tree-row-outline-1-notref.xul # Initial mulet triage: parity with B2G/B2G Desktop
> +skip-if((B2G&&browserIsRemote)||Mulet) == text-crop.xul text-crop-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop

We normally need HTTP(..) for tests that load fonts from locations like (../fonts/....), otherwise same-origin restrictions will prevent them loading. Did you check whether this actually loads the intended font?

(Note that the test _might_ work even if the @font-face rule fails, but that would depend what fallback happens to be found. So we shouldn't rely on it.)
(In reply to Jonathan Kew (:jfkthame) from comment #25)
> Comment on attachment 8794045 [details] [diff] [review]
> Part 3: Add testcase for surrogate pair in XUL cropped element.
> 
> Review of attachment 8794045 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: layout/reftests/xul/reftest.list
> @@ +6,5 @@
> >  # accesskeys are not normally displayed on Mac, so skip this test
> >  skip-if(cocoaWidget) skip-if((B2G&&browserIsRemote)||Mulet) == accesskey.xul accesskey-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop
> >  fails-if(cocoaWidget) fails-if(browserIsRemote&&dwrite) fuzzy-if(xulRuntime.widgetToolkit=="gtk3",1,11) skip-if((B2G&&browserIsRemote)||Mulet) == tree-row-outline-1.xul tree-row-outline-1-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop, win8: bug 1254832
> >  skip-if((B2G&&browserIsRemote)||Mulet) != tree-row-outline-1.xul tree-row-outline-1-notref.xul # Initial mulet triage: parity with B2G/B2G Desktop
> > +skip-if((B2G&&browserIsRemote)||Mulet) == text-crop.xul text-crop-ref.xul # Initial mulet triage: parity with B2G/B2G Desktop
> 
> We normally need HTTP(..) for tests that load fonts from locations like
> (../fonts/....), otherwise same-origin restrictions will prevent them
> loading. Did you check whether this actually loads the intended font?
> 
> (Note that the test _might_ work even if the @font-face rule fails, but that
> would depend what fallback happens to be found. So we shouldn't rely on it.)

I saw FiraSans' emoji (black line globe) on rendering result, that is different than system default one (blue line globe).
will double check if the existence of the target font file surely affects the result.
Here's comparison of testcase rendering on unpatched build,
between the case it points existing FiraSans font and it points non-existing file.
So it looks like the font file is loaded.
just to be clear, they're screenshot of reftest on OSX, with commenting out |document.documentElement.className = "";| to stop the execution.
addressed review comments.
thank you for reviewing :)
Attachment #8794044 - Attachment is obsolete: true
Attachment #8794044 - Flags: review?(jfkthame)
Attachment #8794170 - Flags: review?(jfkthame)
(In reply to Tooru Fujisawa [:arai] from comment #27)
> Created attachment 8794164 [details]
> FiraSans rendering result
> 
> Here's comparison of testcase rendering on unpatched build,
> between the case it points existing FiraSans font and it points non-existing
> file.
> So it looks like the font file is loaded.

Yes, looks fine. I guess the way we load XUL testcases makes the extra annotation unnecessary.
Attachment #8794045 - Flags: review?(jfkthame) → review+
Attachment #8794170 - Flags: review?(jfkthame) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/00a900605c52a89d263683f8a1edb2dcb34b37cb
Bug 898984 - Part 1: Add ClusterReverseIterator in nsUnicodeProperties. r=jfkthame

https://hg.mozilla.org/integration/mozilla-inbound/rev/9a0a1b105571f3f29af09cd686ce88ca76d6f676
Bug 898984 - Part 2: Support surrogate pair in XUL cropped element. r=jfkthame

https://hg.mozilla.org/integration/mozilla-inbound/rev/0be22294cfd825fb949866185169621f500110ee
Bug 898984 - Part 3: Add testcase for surrogate pair in XUL cropped element. r=jfkthame
https://hg.mozilla.org/mozilla-central/rev/00a900605c52
https://hg.mozilla.org/mozilla-central/rev/9a0a1b105571
https://hg.mozilla.org/mozilla-central/rev/0be22294cfd8
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Duplicate of this bug: 404856
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: