Bug 404856 - Titles containing supplementary characters (outside BMP) are cropped in mid-surrogate-pair
Status: NEW
See comment 14 for repro
Product: Core
Classification: Components
Component: XUL (show other bugs)
Version: unspecified
Platform: All All
Importance: -- minor with 3 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Mentors:
Duplicates: 460441 821647 857913 941465 1262132 1280268 1280372
Depends on:
Blocks:
Reported: 2007-11-21 13:27 PST by maix
Modified: 2016-06-16 02:24 PDT
CC List: 20 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
A file with that title (307 bytes, text/html)
2007-11-22 04:17 PST, maix
Screenshot, Linux: Works when there is enough space (10.56 KB, image/png)
2013-04-19 12:53 PDT, Ben Bucksch (:BenB)
Testcase (64.11 KB, text/html)
2013-04-19 13:05 PDT, Ben Bucksch (:BenB)
Testcase with Consecutive SMP Characters (176 bytes, application/xhtml+xml)
2016-02-15 07:08 PST, Patrick Dark
Multiple Tabs Titles Improperly Truncated on Windows 10 (10.26 KB, image/png)
2016-02-15 07:22 PST, Patrick Dark

Description maix 2007-11-21 13:27:15 PST
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.8.1.8) Gecko/20061201 Firefox/2.0.0.4  (Ubuntu-feisty)

I had a webpage with the following title: "
Comment 1 maix 2007-11-21 13:30:16 PST
it cut off the text :-/ next bug :)

Can be found here: I had a webpage with the following title: "
Comment 2 maix 2007-11-21 13:34:58 PST
hrhr: http://paste.pocoo.org/show/11677/
Comment 3 Samuel Sidler (old account; do not CC) 2007-11-21 19:58:18 PST
From the link in comment 2:

I had a webpage with the following title: "
Comment 4 Samuel Sidler (old account; do not CC) 2007-11-21 20:00:48 PST
Err... woah. That's not good...

After the weirdness, this is the text:

('\ud835\udd80\ud835\udd93\ud835\udd8e\ud835\udd88\ud835\udd94\ud835\udd89\ud835\udd8a \ud835\udd8e\ud835\udd98 \ud835\udd88\ud835\udd94\ud835\udd94\ud835\udd91.' in case it is not displayed correctly) (yes, I admit, it was just for playing around :))
Anyway, if there are many tabs, the title is cut off (of course), but sometimes (depending on the space available) it is not split between characters but in the middle of one, between the two halves (UTF-16 code units) of a surrogate pair.
Image 1, wrong: http://img88.imageshack.us/img88/3997/ffunicode1yt1.png
Image 2, correctly: http://img88.imageshack.us/img88/658/ffunicode2qx8.png


You're using an old version of Firefox. Do you see this issue using Firefox 2.0.0.9 with a clean profile? How about with Firefox 3 Beta 1? 

http://support.mozilla.com/kb/Profiles
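The split described in this comment can be reproduced outside the browser. The sketch below is a minimal illustration, assuming a crop that counts UTF-16 code units (the representation Gecko strings use) without checking for surrogate pairs; the helper names are hypothetical and are not Gecko's. The title is the reporter's string, which decodes to "Unicode is cool." in Mathematical Bold Fraktur, where every letter needs two code units.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return s as a list of UTF-16 code units (16-bit values)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_high_surrogate(unit: int) -> bool:
    """True for the first half of a surrogate pair (U+D800..U+DBFF)."""
    return 0xD800 <= unit <= 0xDBFF


def naive_crop(s: str, max_units: int) -> list[int]:
    """Keep the first max_units code units, ignoring surrogate pairs (the buggy approach)."""
    return utf16_code_units(s)[:max_units]


# The reporter's title: "Unicode is cool." in Mathematical Bold Fraktur (U+1D580 etc.).
TITLE = ("\U0001D580\U0001D593\U0001D58E\U0001D588\U0001D594\U0001D589\U0001D58A "
         "\U0001D58E\U0001D598 \U0001D588\U0001D594\U0001D594\U0001D591.")

print(len(TITLE))                      # 16 code points
print(len(utf16_code_units(TITLE)))    # 29 code units
cropped = naive_crop(TITLE, 3)
print([f"{u:04X}" for u in cropped])   # ['D835', 'DD80', 'D835']
print(is_high_surrogate(cropped[-1]))  # True: the crop ends on a lone high surrogate
```

A width limit that happens to land after an odd number of code units leaves half a character behind, which is what the screenshots in this comment show.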
Comment 5 Simon Montagu :smontagu 2007-11-21 20:35:08 PST
I thought that this was a dupe, but I can't find another bug report about it. It is at least mentioned in a code comment at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/xul/base/src/nsTextBoxFrame.cpp&rev=1.123&mark=651#650
Comment 6 Simon Montagu :smontagu 2007-11-21 20:38:39 PST
Maix, can you attach the webpage to this bug report?

(In reply to comment #1)
> it cut off the text :-/ next bug :)

If you report this bug or have already done so, please cc me.
Comment 7 maix 2007-11-22 04:17:30 PST
Created attachment 289802 [details]
A file with that title
Comment 8 maix 2007-11-22 10:44:06 PST
> If you report this bug or have already done so, please cc me.
But I don't know if it's a bug in firefox or in bugzilla. I tried it with a
form and the output was not displayed, but it appeared in the source; here it
doesn't appear in the source??
Comment 9 Simon Montagu :smontagu 2007-11-22 12:17:30 PST
(In reply to comment #8)
> But I don't know if it's a bug in firefox or in bugzilla.

Probably in Bugzilla. I've entered supplementary characters in forms quite often on other sites. Anyway, both Firefox and Bugzilla are products in bugzilla.mozilla.org, so it can get assigned to the right product later.

Comment 10 Simon Montagu :smontagu 2012-12-15 20:56:28 PST
*** Bug 821647 has been marked as a duplicate of this bug. ***
Comment 11 Simon Montagu :smontagu 2013-04-06 22:30:01 PDT
*** Bug 857913 has been marked as a duplicate of this bug. ***
Comment 12 Ben Bucksch (:BenB) 2013-04-19 12:53:11 PDT
Created attachment 739756 [details]
Screenshot, Linux: Works when there is enough space
Comment 13 Ben Bucksch (:BenB) 2013-04-19 12:58:15 PDT
This bug got some attention from the LWN article <http://lwn.net/Articles/545741/> about the Fedora release name. The article has the title "Schrödinger's 
Comment 14 Ben Bucksch (:BenB) 2013-04-19 13:05:04 PDT
Created attachment 739762 [details]
Testcase

Reproduction:
1. Open the testcase (the LWN article, saved as an HTML file) in a tab.
2. Open a few other webpages in tabs in the same window.
3. Check that you see the cat character in the test page's tab.
4. Slowly resize the browser window, reducing its width.
5. Stop shortly after the tab shows the cat as the last character.

Actual result:
Tab title is "Schrödinger's [D83D]"

Expected result:
Tab title is "Schrödinger's " or "Schrödinger's
Comment 15 Ben Bucksch (:BenB) 2013-04-19 13:06:38 PDT
(again, Bugzilla cuts the special "cat" character.)
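Producing the expected result only requires that the crop never end between a high surrogate and the low surrogate after it. The sketch below backs off by one code unit when the cut would split a pair, so the whole character is dropped instead. Assumptions: the cat is taken to be U+1F408 CAT (the thread only shows its high surrogate, D83D), the apostrophe is ASCII, and `crop_utf16` is a hypothetical helper, not the nsTextBoxFrame code.

```python
def to_units(s: str) -> list[int]:
    """Encode s as UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_units(units: list[int]) -> str:
    """Decode UTF-16 code units; strict decoding raises on a lone surrogate."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def crop_utf16(s: str, max_units: int) -> str:
    """Crop s to at most max_units code units without splitting a surrogate pair."""
    units = to_units(s)
    end = max(0, min(max_units, len(units)))
    # A cut between a high surrogate and the low surrogate after it would strand
    # half a character, so back off by one unit and drop the whole character.
    if (0 < end < len(units)
            and 0xD800 <= units[end - 1] <= 0xDBFF
            and 0xDC00 <= units[end] <= 0xDFFF):
        end -= 1
    return from_units(units[:end])


title = "Schr\u00f6dinger's \U0001F408"  # assumed: U+1F408 CAT
print(repr(crop_utf16(title, 15)))       # "Schrödinger's " rather than a dangling D83D
print(crop_utf16(title, 16) == title)    # True
```

Because the result is decoded strictly, any crop that did leave a lone surrogate would raise instead of silently rendering a hex box.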
Comment 16 Jonathan Kew (:jfkthame) 2013-04-19 13:29:50 PDT
I assume the tab title ends up getting drawn by nsTextBoxFrame. The text-cropping code there is not particularly Unicode- or international-aware (as noted in bug 837765 comment 2); in addition to cropping within a surrogate pair, it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I believe nsTextFrame would handle the truncation better, if we could switch nsTextBoxFrame over to use that internally instead of its own hacky truncation code.
Comment 17 Matthias Urlichs 2013-04-19 13:46:19 PDT
> it's liable to mis-measure in cases with contextual shaping, such as Arabic or Indic scripts.

I assume that it also either doesn't consider combining characters to be zero-width, or doesn't consider the fact that ï or ĩ (when decomposed) may well be wider than i.
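The combining-character case needs a second rule: never cut between a base character and the marks that follow it. A minimal sketch, working in code points and using Python's `unicodedata.combining` (nonzero canonical combining class) as a simplified stand-in for full grapheme-cluster segmentation per UAX #29, which it does not replace. It addresses only where the cut falls, not the width measurement or contextual shaping discussed above; the function name is hypothetical.

```python
import unicodedata


def crop_keep_combining(s: str, max_chars: int) -> str:
    """Crop s to at most max_chars code points without separating a base
    character from the combining marks that follow it.

    Simplification: only marks with a nonzero canonical combining class are
    treated as attached; UAX #29 grapheme clusters cover more cases.
    """
    end = max(0, min(max_chars, len(s)))
    # If the first dropped code point is a combining mark, the cut falls inside a
    # cluster: back off past the marks so their base is dropped as well.
    while 0 < end < len(s) and unicodedata.combining(s[end]):
        end -= 1
    return s[:end]


word = "ai\u0308x"  # decomposed: i followed by U+0308 COMBINING DIAERESIS
print(crop_keep_combining(word, 2) == "a")         # True: 'ai' would strip the diaeresis
print(crop_keep_combining(word, 3) == "ai\u0308")  # True: the whole cluster fits
```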
Comment 18 Simon Montagu :smontagu 2013-11-20 22:48:16 PST
*** Bug 941465 has been marked as a duplicate of this bug. ***
Comment 19 Jesse Ruderman 2013-11-21 01:57:59 PST
Can we just replace this buggy implementation with xul.css:
  *[crop] { text-overflow: ellipsis; }
?
Comment 20 Simon Montagu :smontagu 2015-07-13 10:34:29 PDT
(In reply to Jesse Ruderman from comment #19)
> Can we just replace this buggy implementation with xul.css:
>   *[crop] { text-overflow: ellipsis; }
> ?

The behaviour of text-overflow: ellipsis used to be different to xul cropping, but bug 883884 may soon change that, so that would probably allow such a solution.
Comment 21 Simon Montagu :smontagu 2015-07-15 08:38:49 PDT
*** Bug 460441 has been marked as a duplicate of this bug. ***
Comment 22 Patrick Dark 2016-02-15 07:08:24 PST
Created attachment 8719500 [details]
Testcase with Consecutive SMP Characters

I've added another testcase. Unlike the existing testcase, this one contains multiple consecutive Supplementary Multilingual Plane (SMP, Plane 1) characters in the page title, forming words built from the Mathematical Alphanumeric Symbols block.

You can test it by filling the tab bar with tabs until the testcase tab is the last one in the tab bar, then downsizing the window on the x-axis to force the tab width to decrease.

Not only are surrogate pairs getting split at the overflow ellipsis, causing characters to render as a box containing the remaining surrogate's code point, but the text also gets truncated well ahead of where it should be given the tab's size. I end up seeing:

* one to three SMP characters
* half of a split surrogate pair (rendered as a box), sometimes but not always
* an ellipsis
* wasted empty space at the end of the tab where title text should be

This bug doesn't seem so bad when you're rendering a superfluous emoji, but it's more serious when you're rendering a page title and most of it vanishes.
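The thread does not identify why the title is cut so early, so the following is only an illustration of end-cropping with an ellipsis at code-point granularity, not a diagnosis. Each whole character is measured and kept only while it still fits, with room reserved for the ellipsis, so a supplementary character is never split and no budget is left unused beyond one character's width. `char_width` is a hypothetical stand-in for real font metrics, and summing per-character widths ignores the contextual shaping raised in comment 16.

```python
from typing import Callable

ELLIPSIS = "\u2026"


def crop_end_with_ellipsis(title: str, max_width: float,
                           char_width: Callable[[str], float]) -> str:
    """Fit title into max_width, appending an ellipsis if anything is cut.

    Iterates over code points (Python str elements), so a supplementary
    character is measured and kept or dropped as one unit.
    """
    if sum(char_width(c) for c in title) <= max_width:
        return title
    budget = max_width - char_width(ELLIPSIS)  # reserve room for the ellipsis
    used = 0.0
    kept = []
    for ch in title:
        w = char_width(ch)
        if used + w > budget:
            break
        kept.append(ch)
        used += w
    return "".join(kept) + ELLIPSIS


def unit_width(ch: str) -> float:
    """Hypothetical monospace metric: every character is one unit wide."""
    return 1.0


title = "\U0001D580\U0001D593\U0001D58E\U0001D588\U0001D594"  # five Fraktur letters
print(crop_end_with_ellipsis(title, 3.0, unit_width) == title[:2] + ELLIPSIS)  # True
```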
Comment 23 Patrick Dark 2016-02-15 07:22:58 PST
Created attachment 8719505 [details]
Multiple Tabs Titles Improperly Truncated on Windows 10

The attached image shows that tabs behave differently based on whether they're the selected tab or not.
Comment 24 Makoto Kato [:m_kato] (PTO 6/20-21, 6/24) 2016-04-05 19:12:49 PDT
*** Bug 1262132 has been marked as a duplicate of this bug. ***
Comment 25 Makoto Kato [:m_kato] (PTO 6/20-21, 6/24) 2016-04-05 19:15:52 PDT
This issue is XUL code (layout/xul/base/src/nsTextBoxFrame.cpp), not text layout.
Comment 26 :Gijs Kruitbosch 2016-06-16 02:23:57 PDT
*** Bug 1280268 has been marked as a duplicate of this bug. ***
Comment 27 :Gijs Kruitbosch 2016-06-16 02:24:23 PDT
*** Bug 1280372 has been marked as a duplicate of this bug. ***
