Closed Bug 404856 Opened 17 years ago Closed 2 years ago

XUL textboxes containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair

Categories

(Core :: XUL, defect)

RESOLVED DUPLICATE of bug 898984

People

(Reporter: maix42, Unassigned)

Details

(Whiteboard: See comment 30 for repro)

Attachments

(4 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)

I had a webpage with the following title: "
it cut off the text :-/ next bug :)

Can be found here: I had a webpage with the following title: "
hrhr: http://paste.pocoo.org/show/11677/
From the link in comment 2:

I had a webpage with the following title: "
Whiteboard: CLOSEME - 12/05
Version: unspecified → 2.0 Branch
Err... woah. That's not good...

After the weirdness, this is the text:

('\ud835\udd80\ud835\udd93\ud835\udd8e\ud835\udd88\ud835\udd94\ud835\udd89\ud835\udd8a \ud835\udd8e\ud835\udd98 \ud835\udd88\ud835\udd94\ud835\udd94\ud835\udd91.' in case it is not displayed correctly) (yes, I admit, it was just for playing around :))
Anyway, if there are many tabs, the title is cut off (of course), but sometimes (depending on the space available) it is split not between the characters but in the middle of one, between the two halves of a surrogate pair.
Image 1, wrong: http://img88.imageshack.us/img88/3997/ffunicode1yt1.png
Image 2, correct: http://img88.imageshack.us/img88/658/ffunicode2qx8.png
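
For readers unfamiliar with the escapes quoted above: each `\ud835\udd..` pair is one UTF-16 surrogate pair that encodes a single character outside the BMP. A minimal C++ sketch of the standard decoding arithmetic follows; the function names are illustrative and are not taken from Gecko.

```cpp
// True if `u` is a high (leading) surrogate, U+D800..U+DBFF.
constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if `u` is a low (trailing) surrogate, U+DC00..U+DFFF.
constexpr bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Combine a valid high/low surrogate pair into the code point it encodes.
// The caller is expected to have checked both halves first.
constexpr char32_t decodeSurrogatePair(char16_t high, char16_t low) {
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

With this, the first pair of the title, 0xD835 0xDD80, decodes to U+1D580. Either half on its own is not a character, which is why a crop that falls between the two halves shows up as a placeholder box.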


You're using an old version of Firefox. Do you see this issue using Firefox 2.0.0.9 with a clean profile? How about with Firefox 3 Beta 1? 

http://support.mozilla.com/kb/Profiles
I thought that this was a dupe, but I can't find another bug report about it. It is at least mentioned in a code comment at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/xul/base/src/nsTextBoxFrame.cpp&rev=1.123&mark=651#650
Status: UNCONFIRMED → NEW
Component: Tabbed Browser → Layout: Fonts and Text
Ever confirmed: true
Product: Firefox → Core
Version: 2.0 Branch → unspecified
Maix, can you attach the webpage to this bug report?

(In reply to comment #1)
> it cut off the text :-/ next bug :)

If you report this bug or have already done so, please cc me.
OS: Linux → All
Hardware: PC → All
Summary: Unicode characters get split if tab is too small → Titles containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair
QA Contact: tabbed.browser → layout.fonts-and-text
Whiteboard: CLOSEME - 12/05
Attached file A file with that title (obsolete)
> If you report this bug or have already done so, please cc me.
But I don't know if it's a bug in Firefox or in Bugzilla. I tried it with a
form: the output was not displayed, but it did appear in the source. Here it
doesn't even appear in the source.
(In reply to comment #8)
> But I don't know if it's a bug in firefox or in bugzilla.

Probably in bugzilla. I've entered supplementary characters in forms quite often in other sites. Anyway, both firefox and bugzilla are products in bugzilla.mozilla.org, so it can get assigned to the right product later.

This bug got some attention from the LWN article <http://lwn.net/Articles/545741/> about the Fedora release name. The article has the title "Schrödinger's 
Severity: normal → minor
Attachment #739756 - Attachment description: Screenshot, Linux: WORKSFORME → Screenshot, Linux: Works when there is enough space
Attached file Testcase
Reproduction:
1. Open the testcase (the LWN article, saved as HTML file) in a tab
2. Open a few other webpages in tabs in the same window.
3. Check that you see the cat character in the testpage's tab
4. Slowly resize the browser window, reducing width.
5. Stop shortly after the tab shows the cat as the last character.

Actual result:
Tab title is "Schrödinger's [D83D]"

Expected result:
Tab title is "Schrödinger's " or "Schrödinger's
Attachment #289802 - Attachment is obsolete: true
(again, Bugzilla cuts the special "cat" character.)
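
The actual result in the steps above is what a crop at an arbitrary UTF-16 code unit index produces. Here is a small C++ sketch of that behaviour. It assumes the cat is U+1F431; the report only shows the high surrogate, D83D, so the exact character is a guess. `naiveCrop` is a hypothetical helper, not Gecko code.

```cpp
#include <cstddef>
#include <string>

// Crop to at most `maxUnits` UTF-16 code units, ignoring surrogate pairs.
// This mirrors the buggy behaviour described in the report, not a fix.
std::u16string naiveCrop(const std::u16string& text, std::size_t maxUnits) {
    return text.substr(0, maxUnits);
}

// True if the string ends in a high surrogate with no low surrogate after it.
bool endsWithLoneHighSurrogate(const std::u16string& text) {
    return !text.empty() && text.back() >= 0xD800 && text.back() <= 0xDBFF;
}

// Assumed title: "Schrödinger's " followed by U+1F431 (encoded as D83D DC31).
const std::u16string kTitle = u"Schr\u00F6dinger's \U0001F431";
```

Cropping `kTitle` to 15 code units leaves 0xD83D as the last unit, which matches the "[D83D]" box in the actual result.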
Whiteboard: See comment 14 for repro
I assume the tab title ends up getting drawn by nsTextBoxFrame. The text-cropping code there is not particularly Unicode- or international-aware (as noted in bug 837765 comment 2); in addition to cropping within a surrogate pair, it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I believe nsTextFrame would handle the truncation better, if we could switch nsTextBoxFrame over to use that internally instead of its own hacky truncation code.
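
As a much narrower stopgap than moving to nsTextFrame, the mid-pair crop alone could be avoided by checking the cut position. This is a minimal sketch with a hypothetical `cropAtCodePointBoundary` helper. It is not the actual nsTextBoxFrame code, and it does nothing for the shaping and measurement problems mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Crop to at most `maxUnits` UTF-16 code units without splitting a
// surrogate pair: if the cut would fall between a high and a low
// surrogate, back up one unit so the whole pair is dropped.
std::u16string cropAtCodePointBoundary(const std::u16string& text,
                                       std::size_t maxUnits) {
    std::size_t cut = std::min(maxUnits, text.size());
    if (cut > 0 && cut < text.size()) {
        const char16_t before = text[cut - 1];
        const char16_t after = text[cut];
        const bool splitsPair = before >= 0xD800 && before <= 0xDBFF &&
                                after >= 0xDC00 && after <= 0xDFFF;
        if (splitsPair) {
            --cut;  // drop the high surrogate too
        }
    }
    return text.substr(0, cut);
}
```

The check only looks at the two code units around the cut, so it costs the same regardless of string length. A lone surrogate that is already present in the input is left untouched.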
> it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I assume that it also either doesn't consider combining characters to be zero-width, or doesn't consider the fact that ï or ĩ (when decomposed) may well be wider than i.
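
A related boundary concern applies to combining characters: a crop between a base letter and its combining mark changes what is displayed. Below is a rough C++ sketch that assumes only the Combining Diacritical Marks block (U+0300..U+036F) matters. Real code would need full grapheme cluster segmentation (UAX #29), and `cropBeforeCombiningMark` is a made-up name. It does not address the width measurement itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Simplification: only U+0300..U+036F are treated as combining marks.
bool isCombiningMark(char16_t u) { return u >= 0x0300 && u <= 0x036F; }

// Crop to at most `maxUnits` code units, moving the cut back so it never
// lands directly before a combining mark. The base character is dropped
// together with its marks instead of being left bare.
std::u16string cropBeforeCombiningMark(const std::u16string& text,
                                       std::size_t maxUnits) {
    std::size_t cut = std::min(maxUnits, text.size());
    while (cut > 0 && cut < text.size() && isCombiningMark(text[cut])) {
        --cut;
    }
    return text.substr(0, cut);
}
```

For the decomposed string "nai" + U+0308 + "ve", a limit of 3 code units yields "na" rather than "nai" with the diaeresis stripped.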
Can we just replace this buggy implementation with xul.css:
  *[crop] { text-overflow: ellipsis; }
?
(In reply to Jesse Ruderman from comment #19)
> Can we just replace this buggy implementation with xul.css:
>   *[crop] { text-overflow: ellipsis; }
> ?

The behaviour of text-overflow: ellipsis used to be different to xul cropping, but bug 883884 may soon change that, so that would probably allow such a solution.
I've added another testcase. Unlike the existing testcase, this one contains multiple, consecutive supplementary multilingual plane (SMP) (Plane 1) characters in the page title featuring words constructed from the Mathematical Alphanumeric Symbols character block.

You can test it by filling the tab bar with tabs until the testcase tab is the last one in the tab bar, then downsizing the window on the x-axis to force the tab width to decrease.

Not only are surrogate pairs getting split at the overflow ellipsis -- causing characters to render as a box containing the remaining surrogate's code point -- but the text also gets truncated well ahead of where it should be, given the tab's size. I end up seeing:

* one to three SMP characters
* a split surrogate pair, sometimes but not always
* an ellipsis
* wasted empty space at the end of the tab where title text should be

This bug doesn't seem so bad when you're rendering a superfluous emoji, but it's more serious when you're rendering a page title and most of it vanishes.
Attachment #8719500 - Attachment description: test.xhtml → Testcase with Consecutive SMP Characters
The attached image shows that tabs behave differently based on whether they're the selected tab or not.
Attachment #8719505 - Attachment description: Multiple Tabs Titles Improperly Truncated → Multiple Tabs Titles Improperly Truncated on Windows 10
This issue is XUL code (layout/xul/base/src/nsTextBoxFrame.cpp), not text layout.
Component: Layout: Text → XUL
This bug appears to have been indirectly fixed or invalidated by Bug 658467: Fade out tab label on overflow instead of ellipsis in Firefox 53, which is planned for release on 2017-04-18.

It's a good thing too, since it's breaking Facebook pages in the release version of Firefox, which are now using the initial text of posts as tab titles when posts are displayed in isolation. This initial text often contains emoji and is routinely triggering this bug.
OK, so the problem no longer shows up in tab titles with the new styling; great. But the underlying bug in XUL textbox truncation still remains, and will no doubt be reproducible in other situations.
Summary: Titles containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair → XUL textboxes containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair
For example, visit the testcases in this bug, attachment 739762 [details] and attachment 8719500 [details]; then choose Show All History to open the Library window and view the entries in the recent history, and resize the Name column (or the overall window) so that truncation occurs.
Severity: minor → S4

The severity field for this bug is relatively low, S4. However, the bug has 12 duplicates.
:enndeakin, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(enndeakin)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(enndeakin)

Looks like this was fixed for XUL textboxes (nsTextBoxFrame.cpp) in bug 898984.

However, the issue still exists in XUL treeviews, and shows up e.g. in the History window (comment 30). I'll file a separate bug to fix that.

Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: 898984
Resolution: --- → DUPLICATE
Whiteboard: See comment 14 for repro → See comment 30 for repro
Blocks: 1799093