XUL textboxes containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair

NEW
Unassigned

Status

Core
XUL
--
minor
10 years ago
3 months ago

People

(Reporter: maix, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: See comment 14 for repro)

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

10 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)

I had a webpage with the following title: "
(Reporter)

Comment 1

10 years ago
it cut off the text :-/ next bug :)

Can be found here: I had a webpage with the following title: "
(Reporter)

Comment 2

10 years ago
hrhr: http://paste.pocoo.org/show/11677/
From the link in comment 2:

I had a webpage with the following title: "
Whiteboard: CLOSEME - 12/05
Version: unspecified → 2.0 Branch
Err... woah. That's not good...

After the weirdness, this is the text:

('\ud835\udd80\ud835\udd93\ud835\udd8e\ud835\udd88\ud835\udd94\ud835\udd89\ud835\udd8a \ud835\udd8e\ud835\udd98 \ud835\udd88\ud835\udd94\ud835\udd94\ud835\udd91.' in case it is not displayed correctly) (yes, I admit, it was just for playing around :))
Anyway, if there are many tabs, the title is cut off (of course), but sometimes (depending on the space available) it is split not between characters but between the bytes of a single character.
Image 1, wrong: http://img88.imageshack.us/img88/3997/ffunicode1yt1.png
Image 2, correct: http://img88.imageshack.us/img88/658/ffunicode2qx8.png
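
To make the distinction between "characters" and their halves concrete, here is a minimal Python sketch using the escaped text quoted above. It assumes the escapes are UTF-16 surrogate halves, which matches the \ud835\uddxx pattern; `join_surrogates` is a hypothetical helper name, not anything from the Mozilla code base.

```python
# Escaped text copied from the comment above. In a Python literal each \uXXXX
# escape is a separate surrogate code point, i.e. one UTF-16 code unit.
ESCAPED = (
    "\ud835\udd80\ud835\udd93\ud835\udd8e\ud835\udd88\ud835\udd94\ud835\udd89\ud835\udd8a"
    " \ud835\udd8e\ud835\udd98"
    " \ud835\udd88\ud835\udd94\ud835\udd94\ud835\udd91."
)


def join_surrogates(units: str) -> str:
    """Combine adjacent surrogate halves into real supplementary characters."""
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


title = join_surrogates(ESCAPED)

code_units = len(ESCAPED)              # UTF-16 code units: what a naive crop counts
characters = len(title)                # Unicode characters: what the reader sees
first_code_point = hex(ord(title[0]))  # first character, outside the BMP
```

The title is 16 characters long but occupies 29 UTF-16 code units, so a crop position chosen by counting code units can easily land inside one of the 13 two-unit characters.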


You're using an old version of Firefox. Do you see this issue using Firefox 2.0.0.9 with a clean profile? How about with Firefox 3 Beta 1? 

http://support.mozilla.com/kb/Profiles
I thought that this was a dupe, but I can't find another bug report about it. It is at least mentioned in a code comment at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/xul/base/src/nsTextBoxFrame.cpp&rev=1.123&mark=651#650
Status: UNCONFIRMED → NEW
Component: Tabbed Browser → Layout: Fonts and Text
Ever confirmed: true
Product: Firefox → Core
Version: 2.0 Branch → unspecified
Maix, can you attach the webpage to this bug report?

(In reply to comment #1)
> it cut off the text :-/ next bug :)

If you report this bug or have already done so, please cc me.

Updated

10 years ago
OS: Linux → All
Hardware: PC → All
Summary: Unicode characters get split if tab is too small → Titles containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair

Updated

10 years ago
QA Contact: tabbed.browser → layout.fonts-and-text
Whiteboard: CLOSEME - 12/05
(Reporter)

Comment 7

10 years ago
Created attachment 289802 [details]
A file with that title
(Reporter)

Comment 8

10 years ago
> If you report this bug or have already done so, please cc me.
But I don't know if it's a bug in Firefox or in Bugzilla. I tried it with a
form: the output was not displayed, but it did appear in the source. Here it
doesn't even appear in the source.
(In reply to comment #8)
> But I don't know if it's a bug in firefox or in bugzilla.

Probably in Bugzilla. I've entered supplementary characters in forms quite often on other sites. Anyway, both Firefox and Bugzilla are products in bugzilla.mozilla.org, so it can get assigned to the right product later.

Duplicate of this bug: 821647
Duplicate of this bug: 857913

Comment 12

4 years ago
Created attachment 739756 [details]
Screenshot, Linux: Works when there is enough space

Comment 13

4 years ago
This bug got some attention through the LWN article <http://lwn.net/Articles/545741/> about the Fedora release name. The article has the title "Schrödinger's 
Severity: normal → minor

Updated

4 years ago
Attachment #739756 - Attachment description: Screenshot, Linux: WORKSFORME → Screenshot, Linux: Works when there is enough space

Comment 14

4 years ago
Created attachment 739762 [details]
Testcase

Reproduction:
1. Open the testcase (the LWN article, saved as HTML file) in a tab
2. Open a few other webpages in tabs in the same window.
3. Check that you see the cat character in the testpage's tab
4. Slowly resize the browser window, reducing width.
5. Stop shortly after the tab shows the cat as its last character.

Actual result:
Tab title is "Schrödinger's [D83D]"

Expected result:
Tab title is "Schrödinger's " or "Schrödinger's
Attachment #289802 - Attachment is obsolete: true

Comment 15

4 years ago
(again, Bugzilla cuts the special "cat" character.)
Whiteboard: See comment 14 for repro
I assume the tab title ends up getting drawn by nsTextBoxFrame. The text-cropping code there is not particularly Unicode- or international-aware (as noted in bug 837765 comment 2); in addition to cropping within a surrogate pair, it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I believe nsTextFrame would handle the truncation better, if we could switch nsTextBoxFrame over to use that internally instead of its own hacky truncation code.
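
For illustration only, the following Python sketch contrasts a crop that counts UTF-16 code units blindly with one that backs off when the cut would land inside a surrogate pair. It is not the nsTextBoxFrame or nsTextFrame logic; all function names are hypothetical, and it deliberately ignores width measurement, shaping and the ellipsis.

```python
def to_units(text):
    """Return the UTF-16 code units of a Python string."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low (trail) surrogate
        else:
            units.append(cp)
    return units


def from_units(units):
    """Rebuild a string; an unpaired surrogate stays a lone code point."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)))
            i += 2
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


def crop_naive(text, max_units):
    """Cut after max_units code units, even in the middle of a pair."""
    return from_units(to_units(text)[:max_units])


def crop_safe(text, max_units):
    """Cut after at most max_units code units without splitting a pair."""
    units = to_units(text)
    if len(units) <= max_units:
        return text
    cut = max(max_units, 0)
    if cut > 0 and 0xD800 <= units[cut - 1] <= 0xDBFF:
        cut -= 1  # last kept unit is a high surrogate: drop it as well
    return from_units(units[:cut])


title = "ab\U0001D580\U0001D593"     # two BMP letters, two supplementary ones
broken = crop_naive(title, 3)        # ends in the lone surrogate U+D835
intact = crop_safe(title, 3)         # backs off to "ab"
```

With a budget of three code units the naive crop keeps a lone U+D835, which is the kind of hex box shown in the screenshots, while the surrogate-aware crop returns "ab".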

Comment 17

4 years ago
> it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I assume that it also either doesn't consider combining characters to be zero-width, or doesn't consider the fact that ï or ĩ (when decomposed) may well be wider than i.
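
A small Python sketch of the combining-character concern: it groups each base character with the combining marks that follow it, so a crop cannot separate "i" from its diaeresis. This is only an approximation of grapheme clusters based on `unicodedata.combining` (the full rules are in UAX #29), it says nothing about rendered width, and the function names are hypothetical.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch      # attach the mark to the preceding base
        else:
            groups.append(ch)     # start a new cluster
    return groups


def crop_clusters(text, max_clusters):
    """Keep at most max_clusters whole clusters."""
    return "".join(clusters(text)[:max_clusters])


# NFD decomposition turns U+00EF into 'i' followed by U+0308.
decomposed = unicodedata.normalize("NFD", "na\u00efve")

by_code_point = decomposed[:3]             # 'nai': the diaeresis is lost
by_cluster = crop_clusters(decomposed, 3)  # 'nai' + U+0308: mark kept
```

Cropping by code points silently turns the third letter into a plain "i", whereas cropping by clusters keeps the mark attached to its base.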
Duplicate of this bug: 941465

Comment 19

4 years ago
Can we just replace this buggy implementation with xul.css:
  *[crop] { text-overflow: ellipsis; }
?
(In reply to Jesse Ruderman from comment #19)
> Can we just replace this buggy implementation with xul.css:
>   *[crop] { text-overflow: ellipsis; }
> ?

The behaviour of text-overflow: ellipsis used to be different to xul cropping, but bug 883884 may soon change that, so that would probably allow such a solution.
Duplicate of this bug: 460441

Comment 22

a year ago
Created attachment 8719500 [details]
Testcase with Consecutive SMP Characters

I've added another testcase. Unlike the existing testcase, this one has a page title containing multiple consecutive Supplementary Multilingual Plane (SMP, Plane 1) characters, forming words built from the Mathematical Alphanumeric Symbols block.

You can test it by filling the tab bar with tabs until the testcase tab is the last one in the tab bar, then downsizing the window on the x-axis to force the tab width to decrease.

Not only are surrogate pairs getting split at the overflow ellipsis -- causing characters to render as a box containing the remaining surrogate's code point -- but the text gets truncated well ahead of where it should be given the tab's size. I end up seeing:

* one to three SMP characters
* a surrogate pair sometimes but not always
* an ellipsis
* wasted empty space at the end of the tab where title text should be

This bug doesn't seem so bad when you're rendering a superfluous emoji, but it's more serious when you're rendering a page title and most of it vanishes.
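
One way to check a cropped label for this symptom is to scan its UTF-16 code units for surrogates that have lost their partner. The sketch below is a hypothetical checker written for this report, not code from Gecko; the sample input mimics the pattern described above (one SMP character, a stray high surrogate, then an ellipsis).

```python
def unpaired_surrogates(units):
    """Return the indices of UTF-16 code units that are unpaired surrogates."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no high surrogate before it.
            bad.append(i)
        i += 1
    return bad


ELLIPSIS = 0x2026

# One intact SMP character, half of the next one, then the ellipsis.
cropped = [0xD835, 0xDD80, 0xD835, ELLIPSIS]
# The same label cut one unit earlier, on a character boundary.
clean = [0xD835, 0xDD80, ELLIPSIS]

broken_at = unpaired_surrogates(cropped)
```

For the sample above the checker reports index 2, the stray high surrogate directly before the ellipsis; the cleanly cut label produces an empty list.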

Updated

a year ago
Attachment #8719500 - Attachment description: test.xhtml → Testcase with Consecutive SMP Characters

Comment 23

a year ago
Created attachment 8719505 [details]
Multiple Tabs Titles Improperly Truncated on Windows 10

The attached image shows that tabs behave differently based on whether they're the selected tab or not.

Updated

a year ago
Attachment #8719505 - Attachment description: Multiple Tabs Titles Improperly Truncated → Multiple Tabs Titles Improperly Truncated on Windows 10
Duplicate of this bug: 1262132
This issue is in XUL code (layout/xul/base/src/nsTextBoxFrame.cpp), not text layout.
Component: Layout: Text → XUL

Updated

a year ago
Duplicate of this bug: 1280268

Updated

a year ago
Duplicate of this bug: 1280372

Comment 28

3 months ago
This bug appears to have been indirectly fixed or invalidated by Bug 658467 ("Fade out tab label on overflow instead of ellipsis"), which landed in Firefox 53, planned for release on 2017-04-18.

It's a good thing too, since it's breaking Facebook pages in the release version of Firefox, which are now using the initial text of posts as tab titles when posts are displayed in isolation. This initial text often contains emoji and is routinely triggering this bug.
OK, so the problem no longer shows up in tab titles with the new styling; great. But the underlying bug in XUL textbox truncation still remains, and will no doubt be reproducible in other situations.

Updated

3 months ago
Summary: Titles containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair → XUL textboxes containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair
For example, visit the testcases in this bug, attachment 739762 [details] and attachment 8719500 [details]; then choose Show All History to open the Library window and view the entries in the recent history, and resize the Name column (or the overall window) so that truncation occurs.

Updated

3 months ago
Duplicate of this bug: 1344009