Closed
Bug 1260825
Opened 9 years ago
Closed 9 years ago
utf16 surrogates causes graphite shaper to produce incorrect cluster information.
Categories
(Core :: Graphics: Text, defect)
Tracking
()
RESOLVED
FIXED
mozilla49
Tracking | Status | |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: firefox, Assigned: jfkthame, NeedInfo)
Details
(Whiteboard: gfx-noted)
Attachments
(7 files)
204.80 KB,
application/x-font-ttf
|
Details | |
6.93 KB,
patch
|
Details | Diff | Splinter Review | |
871 bytes,
patch
|
Details | Diff | Splinter Review | |
1.90 KB,
patch
|
Details | Diff | Splinter Review | |
442 bytes,
text/html
|
Details | |
1.30 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
116.36 KB,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.116 Chrome/48.0.2564.116 Safari/537.36
Steps to reproduce:
Install attached font
<html>
<body contenteditable='true'>
𖽐
<BR>
𖽐𖽐<BR>
𖽐𖽐𖽐<BR>
𖽐𖽐𖽐𖽐<BR>
aaaa<BR>
𖽐𖼊<BR>
𖽐𖼊𖽪<BR>
</body>
</html>
Actual results:
Strange selection behaviour due to Firefox treating non ligature glyphs as ligatures dues to UTF16 surrogates.
Expected results:
A single non Ligature glyph should not be treated as a ligature just because its character is encoded as UTF16 surrogates.
Reporter | ||
Comment 1•9 years ago
|
||
Unittest showing problem with graphite shaper with UTF16 surrogates.
Test case uses font "Miao Unicode"
(For my build I had to copy gfxGraphiteShaper.h, gfxHarfBuzzShaper.h, gfxPlatformFontList.h
to obj*/dist/include)
Run test command : "./mach gtest Shaper.*"
Reporter | ||
Comment 2•9 years ago
|
||
Proof of Concept patch.
Makes unittest pass + fixes Firefox selection behaviour.
Comment 3•9 years ago
|
||
I think the problem lies in this line (268?):
if (clusters[cIndex].baseChar + clusters[cIndex].nChars < after + 1) {
clusters[cIndex].nChars = after + 1 - clusters[cIndex].baseChar;
with the after + 1. That needs to take into account the increased size. I think we need something along the lines of this monster:
const int nAfter = gr_cinfo_base(gr_seg_cinfo(aSegment, gr_slot_after(slot)) + 1);
if (clusters[cIndex].baseChar + clusters[cIndex].nChars < nAfter) {
clusters[cIndex].nChars = nAfter - clusters[cIndex].baseChar;
A bit more of a mouthful, but hopefully fixes the problem cleanly for all sizes of a codepoint.
Comment 4•9 years ago
|
||
Eat your own dogfood Martin. OK here's the updated section of the example/clusters.c
unsigned int nAfter = gr_slot_after(is) + 1;
unsigned int cAfter = nAfter < numCodePoints ?
gr_cinfo_base(gr_seg_cinfo(seg, nAfter)) : lenstr;
if (clusters[ci].base_char + clusters[ci].num_chars < cAfter)
clusters[ci].num_chars = cAfter - clusters[ci].base_char;
where lenstr is the length of the string in code units (not characters, which is what numCodePoints is).
See: 2e8e41c1606633b0f02b0e6634e886c80496bc31
Reporter | ||
Comment 5•9 years ago
|
||
Thanks for comments.
I'm assuming "Eat you own dogfood" comment in the next reply - means that this isn't expected to work.
But just to confirm this: the following code doesn't fix the the unittest fail nor fix the selection in the test case example. (The code is slightly modified to fix what I assumed to be a typo - so the +1 doesn't fail at incrementing incomplete ptr type.)
const int nAfter = gr_cinfo_base(gr_seg_cinfo(aSegment, gr_slot_after(slot))) + 1;
if (clusters[cIndex].baseChar + clusters[cIndex].nChars < nAfter) {
clusters[cIndex].nChars = nAfter - clusters[cIndex].baseChar;
(In reply to martin_hosken from comment #3)
> I think the problem lies in this line (268?):
>
> if (clusters[cIndex].baseChar + clusters[cIndex].nChars < after + 1) {
> clusters[cIndex].nChars = after + 1 - clusters[cIndex].baseChar;
>
> with the after + 1. That needs to take into account the increased size. I
> think we need something along the lines of this monster:
>
> const int nAfter = gr_cinfo_base(gr_seg_cinfo(aSegment,
> gr_slot_after(slot)) + 1);
> if (clusters[cIndex].baseChar + clusters[cIndex].nChars < nAfter) {
> clusters[cIndex].nChars = nAfter - clusters[cIndex].baseChar;
>
> A bit more of a mouthful, but hopefully fixes the problem cleanly for all
> sizes of a codepoint.
Reporter | ||
Comment 6•9 years ago
|
||
Assuming this was a pattern I should follow, I produced this (inefficient proof of concept) code.
uint32_t numCodePoints = aLength;
for(uint32_t i = 0; i < aLength; i++)
if (NS_IS_LOW_SURROGATE(aText[i])) numCodePoints--;
unsigned int nAfter = gr_slot_after(slot) + 1;
unsigned int cAfter = nAfter < numCodePoints ?
gr_cinfo_base(gr_seg_cinfo(aSegment, nAfter)) : aLength;
if (clusters[cIndex].baseChar + clusters[cIndex].nChars < cAfter)
clusters[cIndex].nChars = cAfter - clusters[cIndex].baseChar;
which replaced:
// extend cluster if necessary to reach the glyph's "after" index
if (clusters[cIndex].baseChar + clusters[cIndex].nChars < after + 1) {
clusters[cIndex].nChars = after + 1 - clusters[cIndex].baseChar;
With this new code, the unittest passes and the selection is fixed in the test case example.
(In reply to martin_hosken from comment #4)
> Eat your own dogfood Martin. OK here's the updated section of the
> example/clusters.c
>
> unsigned int nAfter = gr_slot_after(is) + 1;
> unsigned int cAfter = nAfter < numCodePoints ?
> gr_cinfo_base(gr_seg_cinfo(seg, nAfter)) : lenstr;
> if (clusters[ci].base_char + clusters[ci].num_chars < cAfter)
> clusters[ci].num_chars = cAfter - clusters[ci].base_char;
>
> where lenstr is the length of the string in code units (not characters,
> which is what numCodePoints is).
>
> See: 2e8e41c1606633b0f02b0e6634e886c80496bc31
Reporter | ||
Comment 7•9 years ago
|
||
Comment 8•9 years ago
|
||
By 'eat my own dogfood' I was meaning that the first code fragment I gave in comment #3 turned out to be faulty when I added it to the cluster.c example code in the graphite codebase, but that the code in comment #4 seems to at least not make things worse, but is taken straight from the cluster.c.
You say that that code from comment #3 didn't fix the bug. I agree. It probably didn't. But did adapted code from comment $4 without the loop scanning surrogates, work? I'm intrigued.
TIA.
Reporter | ||
Comment 9•9 years ago
|
||
I did try and send it to the try server - so you could see for your self (and my result would be verified)
https://hg.mozilla.org/try/rev/c736647d5f10924561e3ee0589f91be3a3cbbe59
but the try server seems to be currently broken.
(In reply to martin_hosken from comment #8)
> By 'eat my own dogfood' I was meaning that the first code fragment I gave in
> comment #3 turned out to be faulty when I added it to the cluster.c example
> code in the graphite codebase, but that the code in comment #4 seems to at
> least not make things worse, but is taken straight from the cluster.c.
>
> You say that that code from comment #3 didn't fix the bug. I agree. It
> probably didn't. But did adapted code from comment $4 without the loop
> scanning surrogates, work? I'm intrigued.
I'm guess I'm not understand you correctly. Yes the latest patch I posted based on your code example in comment #4 worked. (unittest passed and selection is test case was no longer broken) The surrogate scanning was done in the calculation of numCodePoints. (Since I couldn't find any readily available function to tell me the number of codepoints in char16_t* str)
>
> TIA.
Reporter | ||
Comment 10•9 years ago
|
||
Updated•9 years ago
|
Whiteboard: gfx-noted
Reporter | ||
Updated•9 years ago
|
Attachment #8736755 -
Flags: review?(jfkthame)
Assignee | ||
Comment 11•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Attachment #8748553 -
Attachment mime type: text/plain → text/html
Assignee | ||
Comment 12•9 years ago
|
||
I'm not sure how well I understand this yet, but can we do a simpler fix that just adds a check for surrogate pairs where needed, without adding a whole new loop over the text? AFAICS, this fixes the problematic behavior; are there other cases where issues would still arise?
Attachment #8748564 -
Flags: feedback?(firefox)
Reporter | ||
Comment 13•9 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #12)
> Created attachment 8748564 [details] [diff] [review]
> Alternative patch to add check for surrogate pairs
>
> I'm not sure how well I understand this yet, but can we do a simpler fix
> that just adds a check for surrogate pairs where needed, without adding a
> whole new loop over the text? AFAICS, this fixes the problematic behavior;
> are there other cases where issues would still arise?
This seems very similar to the original proof of concept patch I submitted:
https://bug1260825.bmoattachments.org/attachment.cgi?id=8736402
So from my perspective, it's likely to fix all the issues I'm aware of.
However I understood (maybe wrongly) from martin's comment #4 that a different fix approach was required. (which is what I tried to do with:
https://bugzilla.mozilla.org/attachment.cgi?id=8736755&action=edit
)
Thanks!
Flags: needinfo?(martin_hosken)
Assignee | ||
Comment 14•9 years ago
|
||
Note that in gecko we're dealing with UTF-16, and therefore the only case where code units ≠ characters is when we have a surrogate pair; we don't need to deal with the more complex variable-length characters that you'd find in UTF-8. I think Martin's example code aims to support any of the encoding forms, but we don't actually need that, so the simpler check for surrogate pairs should be sufficient.
Assignee | ||
Updated•9 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reporter | ||
Comment 15•9 years ago
|
||
To be clear - the patch from comment #14 solves this issue for me.
What needs to be done to get this patch into firefox?
Does the unittest from comment #1 need cleaned up and included in the patch?
Thanks.
Assignee | ||
Comment 16•9 years ago
|
||
I was hoping Martin would take another look, too, and either confirm that for our purposes this patch should be fine, or explain why we need to do something more complex.
I've just pinged him by email to see if he has any comments to offer.
Comment 17•9 years ago
|
||
Shooting from the hip. Yes the SURROGATE test would do the job just fine. I'm just wondering about variation selectors. In Graphite they are treated as full characters, so shouldn't be an issue. But always worth a 'hmm must think that through'. Will take a closer look tomorrow, if you want.
Assignee | ||
Updated•9 years ago
|
Attachment #8736755 -
Flags: review?(jfkthame)
Assignee | ||
Updated•9 years ago
|
Attachment #8748564 -
Flags: feedback?(firefox) → review?(jmuizelaar)
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Tom Hindle from comment #15)
> Does the unittest from comment #1 need cleaned up and included in the patch?
Given that the test requires installing a specific font on the test system, I'm not sure it's worth trying to include at this point; the patch here is simple enough that I think we can take it based on manual verification that it fixes the selection issues.
I guess it may be possible to create a reftest testcase for the selection highlighting, which could use @font-face to load the required test font...
Assignee | ||
Comment 19•9 years ago
|
||
Here's a reftest based on the strings from comment 0; with current code, it fails because we measure the <span> with the cyan background incorrectly and highlight varying widths depending on the following text. The patch here fixes this so that we correctly identify the extent of the span in all cases.
Attachment #8753789 -
Flags: review?(jmuizelaar)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Updated•9 years ago
|
Attachment #8748564 -
Flags: review?(jmuizelaar) → review+
Updated•9 years ago
|
Attachment #8753789 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Comment 20•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a43b0bd1d0807baf826aa2ac8adb3eaee3447e94
Bug 1260825 - Check for surrogate pairs when processing clusters in graphite-shaped text. r=jrmuizel
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bcf7733815714ed9dfa5675b99c12369c5dfbd3
Bug 1260825 - Reftest for highlighting of text involving surrogate pairs with graphite shaping. r=jrmuizel
Comment 21•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a43b0bd1d080
https://hg.mozilla.org/mozilla-central/rev/1bcf77338157
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox49:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
You need to log in
before you can comment on or make changes to this bug.
Description
•