Closed
Bug 991427
Opened 10 years ago
Closed 10 years ago
formatSearchContextItem does not handle surrogate pairs correctly
Categories
(Firefox :: Menus, defect)
Tracking
()
People
(Reporter: masanori.ogino, Assigned: andreaio.it, Mentored)
Details
Attachments
(2 files, 3 obsolete files)
80 bytes,
text/html
|
Details | |
3.26 KB,
patch
|
Gavin
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0 (Beta/Release) Build ID: 20140331100457 Steps to reproduce: 1. Open attached test.html, 2. Select all, and 3. Do right-click. Actual results: In the context menu, <selection text> in 'Search <engine> for "<selection text>"' item contains seven fires and a "D83D in a box" character (depending fonts installed to the machine; the test file uses an Emoji character but any character beyond BMP causes similar problem) before an elipsis. Expected results: Selection text is trancated by a correct boundary (hopefully grapheme, at least code point).
Comment 1•10 years ago
|
||
Reproducible on: Mozilla/5.0 (X11; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0 A character that is not part of the text on the page is displayed before the ellipsis.
Status: UNCONFIRMED → NEW
Component: Untriaged → Menus
Ever confirmed: true
Updated•10 years ago
|
Flags: firefox-backlog+
Updated•10 years ago
|
Whiteboard: [mentor] p=5
Comment 2•10 years ago
|
||
Marco: You're mentoring this?
Comment 3•10 years ago
|
||
I think Marco just marked this to be mentored by someone TBD
Updated•10 years ago
|
Whiteboard: [mentor] p=5 → [mentor wanted] p=5
Assignee | ||
Comment 4•10 years ago
|
||
I think that truncating the selected characters to an even number in the formatSearchContextItem function will solve the problem.
Assignee | ||
Comment 5•10 years ago
|
||
Can someone assign me to this bug?
Comment 6•10 years ago
|
||
Yes, I can. Thanks for taking this up, Andrea! Please ask any questions you have here in the bug.
Assignee: nobody → andreaio.it
Assignee | ||
Comment 7•10 years ago
|
||
Attachment #8469319 -
Flags: review?(gavin.sharp)
Comment 8•10 years ago
|
||
Unfortunately there's no guarantee that truncating at 16 "JS characters" is safer - it might fix that particular test case, but it's still possible for there to be a UTF-16 surrogate pair boundary at the 16 "JS character" point. Here's an example: data:text/html,<p>aaaaaaaaaaaaaaa%26%23x1f525%3B</p> I'm not actually sure how to best fix this "the right way". What we want is to truncate by code points rather than JS characters, so we need a way to map "15 Unicode code points of string X" to "X JS characters of string X", but I'm not sure how to do that via the XPCOM tools we have offhand.
Comment 9•10 years ago
|
||
Comment on attachment 8469319 [details] [diff] [review] Changed the number of characters displayed from 15 to 16 to make UTF 16 work correctly thanks for writing the patch, though! Let me see if I can provide better advice for an alternative approach.
Attachment #8469319 -
Flags: review?(gavin.sharp) → review-
Assignee | ||
Comment 10•10 years ago
|
||
Attachment #8469319 -
Attachment is obsolete: true
Attachment #8470465 -
Flags: review?(gavin.sharp)
Comment 11•10 years ago
|
||
I did not know codePointAt existed! Neat. I might suggest an alternate approach, though: how about just checking whether the 16th character in the string is a lead surrogate, and if so just including the next character? I.e.: function truncateText(str) { let truncLength = 15; if (str.length > 15) { // Don't chop a surrogate pair in half let truncChar = str[15].charCodeAt(0); if (truncChar >= 0xD800 && truncChar <= 0xDBFF) truncLength++; } return str.substr(0, truncLength); }
Assignee | ||
Comment 12•10 years ago
|
||
Because a surrogate pair is one character. If you just check the last the string would be a character shorter for every surrogate pair.
Comment 13•10 years ago
|
||
A surrogate pair is two "JS characters", though (that's why codePointAt and charCodeAt both exist). So the code I mocked up should work (though I haven't tested it!).
Comment 14•10 years ago
|
||
Essentially, I'm suggesting truncating at 15 "JS characters", like the current code, but change that to truncating to 16 characters if the 15th and 16th JS characters are a surrogate pair (you can tell whether they are by just looking for the "lead surrogate" in the 15th character).
Comment 15•10 years ago
|
||
Comment on attachment 8470465 [details] [diff] [review] Now the length of text showed is right I think my suggested fix is simpler and should work equally well, but let me know if I'm missing something!
Attachment #8470465 -
Flags: review?(gavin.sharp) → feedback-
Assignee | ||
Comment 16•10 years ago
|
||
Attachment #8470465 -
Attachment is obsolete: true
Attachment #8471683 -
Flags: review?(gavin.sharp)
Comment 17•10 years ago
|
||
Comment on attachment 8471683 [details] [diff] [review] Check if the last character of the string is part of a surrogate pair Thanks Andrea! I would just elaborate in the comment: // If the JS character after our truncation point is a trail surrogate, // include it in the truncated string to avoid splitting a surrogate pair. if (truncChar >= 0xDC00 && truncChar <= 0xDFFF) [...] We should get an automated test for this feature added as well. It should be relatively simple to edit browser_bug970746.js (and browser_bug970746.xhtml) to add coverage for this case. The details on how to run the browser chrome test suite (of which browser_bug970746 is part) are at https://developer.mozilla.org/en-US/docs/Browser_chrome_tests . So the next steps would be to adjust the patch to include the new comment, as well as the test changes I propose above. Feel free to let me know (here, or via IRC or email) if you have other questions, or if you'd like me to write the test instead.
Attachment #8471683 -
Flags: review?(gavin.sharp) → review+
Updated•10 years ago
|
Mentor: gavin.sharp
Assignee | ||
Comment 18•10 years ago
|
||
Attachment #8471683 -
Attachment is obsolete: true
Comment 19•10 years ago
|
||
Comment on attachment 8474156 [details] [diff] [review] Check if the last character of the string is part of a surrogate pair and added the test Thanks so much Andrea! I pushed this to fx-team with a couple of minor tweaks (removed a now-redundant comment, fixed whitespace in browser_bug970746.xhtml).
Attachment #8474156 -
Flags: review+
Comment 20•10 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/82722ad59732
Points: --- → 5
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: [mentor wanted] p=5
Target Milestone: --- → Firefox 34
Comment 21•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/82722ad59732
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Updated•10 years ago
|
Iteration: --- → 34.2
Flags: qe-verify?
Updated•10 years ago
|
Flags: qe-verify? → qe-verify+
Comment 22•10 years ago
|
||
Steps for verification: - run the STR in comment 0 both with the testcase attached here (attachment 8401038 [details]) and the URL from comment 8: data:text/html,<p>aaaaaaaaaaaaaaa%26%23x1f525%3B</p>
Comment 23•10 years ago
|
||
Reproduced the initial issue on Firefox 28 (build ID: 20140314220517) Confirming the fix on Firefox 34 beta 10 (build ID: 20141117202603) using the STR from comment 0 and the URL from comment 8.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•