Closed Bug 260663 Opened 21 years ago Closed 20 years ago

Highlighting text with ASCII char causes line width for selection to be incorrect and bouncing text

Categories

(Core Graveyard :: GFX: Gtk, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: netdragon, Assigned: blizzard)

References

(Blocks 1 open bug, )

Details

Attachments

(5 files, 4 obsolete files)

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040808 Firefox/0.9.3 Fedora Core 2 Linux Go to URL http://www.jjj.de/shell/colors.sh.txt Highlight all text with this character in it: Keep moving the mouse up and down and watch the text bounce. Try to select that character individually, and see that it isn't fully selected. The comments here really move around: UNDERSCORE='[04m' # only works in xterms BLINK='[05m' # doesn't work in xterms Could the width of that character is incorrectly reported?
Attached image Screen shot of page
Here is a screenshot of how the page looks while selecting text.
Attached file The text file
Here is the file, in case it disappears.
This affects 1.0 PR on Windoze too. Although, in this case, you can select the entire character, but the whole line can't be selected. This character also wreaks havok in textareas on Windows it seems.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
is that an xft build or x11core? this component is definitely wrong, trying gfx
Assignee: guifeatures → blizzard
Component: XP Apps: GUI Features → GFX: Gtk
QA Contact: ian
I'm not sure. The Linux version was whatever rpm was available with Fedora 2 (Christopher Blizzard should know since he's at Redhat) :-)
This appears to be ascii char 27, escape character (Now it's so obvious since it's an escape sequence)
Summary: Highlighting text with Unicode char? causes line width for selection to be incorrect and bouncing text → Highlighting text with ASCII char causes line width for selection to be incorrect and bouncing text
The screenshot was taken with an Xft build. It's odd that there is a problem with our fallback rendering and selection (I thought we did the right thing with reporting the width.) Ccing rbs to see what Gfx:Win does. Remotely related to this bug (if related at all) is bug 229896. More closely related is bug 205387 and bugs referred to there.
I am seeing that Seamonkey "jiggles" with Gfx:Win too (as per comment 3). When you select, the text gets slightly shorter, when you deselect, it returns to its width. Result: a "jiggling" (or "bouncing") effect. > I thought we did the right thing with reporting the width. It is intriguing indeed.
The only think that I can think of is that we've got some rounding errors and/or we do extra text measurement when we shouldn't be. For the pango work that I'm doing I've been thinking about how to fix selection. It's just completely unusable right now with shaped languages like arabic. We're doing some Very Wrong Things.
(In reply to comment #9) > It's just completely > unusable right now with shaped languages like arabic. We're doing some Very > Wrong Things. We already know what's wrong there. See bug 229896 and the international release notes (http://www.mozilla.org/releases/mozilla1.7/known-issues-int.html) <quote> The cursor movement and selection/highlighting don't work as expected, yet. It takes multiple keystrokes to delete or move by what's usually perceived as a unit in complex scripts because Mozilla works in terms of Unicode characters instead of graphemes. For the same reason, the boundaries of the text selection/highlighting can fall between Unicode characters making up a single grapheme. (Bugs 79286, 85420, 100173, 122552, 157534, 229896 ) </quote>
Blocks: 214715
This patch implements proper text selection using grapheme cluster boundaries, including keyboard and mouse selection and updates the text frame code to properly display selection using that information when it's available.
Attachment #166371 - Flags: review?(jshin)
Updated patch.
Attachment #166371 - Attachment is obsolete: true
Attachment #166371 - Flags: review?(jshin)
Attachment #166797 - Flags: superreview?(roc)
Attachment #166797 - Flags: review?(jshin)
I have a question about graphemes. What are we supposed to do if someone writes <span class="foo">X</span>Y where XY together form a grapheme, and we have style rules for .foo? Say, the text-decoration or text color is set specially on X?
That's been discussed extensively on the Unicode list and hadn't been yet resolved/settled at least by the last time I checked. Sorry for the delay in review. I'll get to this sometime this week.
We need it to be settled, because the answer will have a big impact on how the text code evolves going forward.
Jungshik, if there is no settled answer yet, it would still be useful to know what you think the answer will be, or what it *should* be.
It would also be useful to know what other software does in this kind of situation.
Comment on attachment 166797 [details] [diff] [review] patch that fixes problems with keyboard nav at end of line in textareas Overall, it looks nice. Here are a few points: 1. layout part needs smontagu to take a second look. it's a bit hard to review with too little context ('-p -u8' would have helped). 2. Nit : in various places, 'language(s)' is used, but 'script(s)' would make more sense. Hindi written in Devanagari takes a great care to avoid splittng grapheme cluster apart, but Hindi written in Latin letters takes less care (assuming it doesn't use any combining diacritic marks) (In mozilla source tree, we fail to make that distinction in some places - e.g. langGroup had better be script(Group) in most cases) 3. Nit2: 'aInx' for 'index' seems a bit odd. How about spelling it out ('aIndex') or using a more common 'aIdx'? 4. Surrogate check can/should be more robust. Does Pango's utf16->utf8 conversion API return null if the input string has an orphaned (unpaired) surrogate code point? If so, the current check would be fine. Otherwise, in nsFontMetricsPango, it has to be checked whether a high surrogate is followed by a low surrogate. In layout (it's not a part of what you changed), this check is for sure necessary. 5. We have to think a bit more as to how this fits in with a grand scheme of things (bug 276079, bug 229896, bug 205476, bug 121540, bug 157967, bug 218887 and other bugs listed at http://www.mozilla.org/releases/mozilla1.7.5/known-issues-int.html#display ). I'm not 100% sure, but I guess this approach can coexist with what we'll have to resolve bug 229896. So, I'm putting r, but Simon's second look would be nice.
Attachment #166797 - Flags: review?(jshin) → review+
(In reply to comment #16) > Jungshik, if there is no settled answer yet, it would still be useful to know > what you think the answer will be, or what it *should* be. In most cases, I think '<span>c1</span>c2' should be treated as two separate graphemes even though 'c1' and 'c2' (when not separated by <span>) form a single grapheme. By enclosing 'c1' with <span> and separating from 'c2', the author of a document must mean that. An often-cited case against this interpretation is '<span style="color: blue">c1</span><span style="color: red">c2</span>' where the author of a document wants to color different parts of a single grapheme with different colors. This may be useful in some language teaching material. Currently, the font technologies (Pango/Uniscribe + Opentype-GSUB/GPOS and ATSUI+AAT font) don't make it easy (if not impossible) to do this kind of rendering, I believe. I'll check what Safari and MSIE do.
I'll get to this in a day or two
(In reply to comment #19) > An often-cited case against this interpretation is '<span style="color: > blue">c1</span><span style="color: red">c2</span>' where the author of a > document wants to color different parts of a single grapheme with different > colors. This may be useful in some language teaching material. Currently, the > font technologies (Pango/Uniscribe + Opentype-GSUB/GPOS and ATSUI+AAT font) > don't make it easy (if not impossible) to do this kind of rendering, I > believe. They can always use images. I don't think this should determine our strategy. I like your position.
> 4. Surrogate check can/should be more robust. Does Pango's utf16->utf8 > conversion API return null if the input string has an orphaned (unpaired) > surrogate code point? If so, the current check would be fine. Yes. Pango will return null if you pass an invalid unicode string (including an unpaired pair.) Layout used to do this more often than it does now, which is why I discovered that we needed that check at all.
jshin, did you happen to figure out what MSIE and safari do with this?
Status: NEW → ASSIGNED
This is a very simple test case (I'll make a more sophisticated test case later). There are two lines of text written in Devanagari. The first line is U+0915 (Devanagari Letter KA) followed by U+094D (Devanagari Virma) both of which are in a single span. In the second line, they're enclosed in two separate <span></span> with inline styles (color: blue and color: red). To my great surprise, Safari renders both lines identically in terms of the shape while in the second line U+0915 is rendered in blue and U+094D is rendered in red. So, I was wrong to say that AAT/ATSUI can't do this. Apparently, they can.... Whether Safari is right is another matter. To check MS IE, I have to reboot my machine to Windows. I'll do that later.
MSIE renders both lines with the same positioning. In the second line the base character and the diacritic are both colored blue. The behaviour is much the same in http://www.qsm.co.il/Hebrew/HebrewTest/color.htm
What if the spans use a different size, or a different font? Is this Devanagari pair a true ligature in the font, or is the renderer just composing a base character with a diacritic programmatically? I wonder what should happen (or does happen in IE) if you have spans with different fonts and the characters would be a ligature in the first font but not in the second font.
(In reply to comment #26) > What if the spans use a different size, or a different font? I'll try that. Another test we can do is how line breaking is handled for cases like '<span>verylong</span><span>word</span>' ('verylongword' is just made of Latin letters in the ASCII range) in MS IE and Safari. > Is this Devanagari pair a true ligature in the font, or is the renderer just > composing a base character with a diacritic programmatically? It depends. 'End-users'(?) are not supposed to know about that details ....Apparently, ATSUI has 'attributed/annotated' strings (Pango has something similar). > I wonder what should happen (or does happen in IE) if you have spans with > different fonts and the characters would be a ligature in the first font but not > in the second font. This test is a bit hard to do unless I have the latest version of Uniscribe DLL (which may only be available with the purchase of MS Office/Word 2005?) that supports opentype GSUB/GPOS for Latin. I might try something similar with Devanagari. See also bug 157967
I think we should do what IE seems to be doing, and treat each grapheme as belonging to the span that contains its first character.
This patch finishes support for using pango for rendering and adds support for grapheme-based mouse and keyboard selection. Also contains some other random fixes which happen to be in my tree. Nothing too scary.
Attachment #166797 - Attachment is obsolete: true
Attachment #174072 - Flags: review?(roc)
+ * This will fill in the aClusterStarts array with information about + * what characters are the start of clusters for display. Could you say what the information *is*? It appears to be just an array of booleans. In that case you should probably make it a PRUint8*. And please document that the array length must be the same as the length of the string, aLength. + PRUint32 &aInx) = 0; Just return the value from the function, and -1 if it failed. You might want to say whether coordinate 0 is counted as part of the first character or to the left of the first character. + if (IS_HIGH_SURROGATE(aText[pos])) { + aClusterStarts[pos] = 1; Is this correct? There are no clusters that contain plane-1 characters after the first position? + layoutRun->glyphs->glyphs[i].geometry.width = mPangoSpaceWidth; Is Owen happy about code like this? + found = pango_layout_xy_to_index(layout, localCoord, 0, + &inx, &trailing); So Pango might take the y-coordinate into account. I assume there are funky scripts, or kerning situations, where characters overlap horizontally and the y coordinate might be relevant. So why don't we pass a Y coordinate into GetPosition? + if (!found) { + if (inx = 0) + aInx = 0; So we don't distinguish between "off the left of the string" and "in the first character". I think we should. I guess we should return -1 for "off the left" and something else for failure. Overall this code looks slow, lots of dynamic allocation and presumably recomputation of layout in Pango. Let's not optimize it now, but keep it in mind. + // Convert the width into app units + for (int i = 0; i < n_ranges; i++) { + aWidth += (ranges[(i * 2) + 1] - ranges[(i * 2)]) / PANGO_SCALE; + } I don't understand why this is the right thing to do. Isn't the width just the difference between the leftmost range edge and the rightmost range edge? Can't ranges overlap? I guess you're not really drawing using the range positions, so I don't know how much of the Pango layout we're really using. > Index: gfx/src/gtk/nsGCCache.cpp What are these changes all about? + // Fill in the cluster hint information, if it's available. + nsCOMPtr<nsIRenderingContext> acx; + PRUint32 clusterHint = 0; + + nsIPresShell *shell = aPresContext->GetPresShell(); + if (shell) { + shell->CreateRenderingContext(this, getter_AddRefs(acx)); + + // Find the font metrics for this text + SetFontFromStyle(acx, mStyleContext); + + if (acx) + acx->GetHints(clusterHint); + clusterHint &= NS_RENDERING_HINT_TEXT_CLUSTERS; + } + + if (clusterHint) { + acx->GetClusterInfo(paintBuffer.mBuffer, (PRUint32)textLength, (PRUint32 *)clusterBuffer.mBuffer); + } + else { + memset(clusterBuffer.mBuffer, 1, sizeof(PRInt32) * textLength); + } + Please move this to its own function. Overall I like this patch a lot.
(In reply to comment #30) > + * This will fill in the aClusterStarts array with information about > + * what characters are the start of clusters for display. > > Could you say what the information *is*? It appears to be just an array of > booleans. In that case you should probably make it a PRUint8*. And please > document that the array length must be the same as the length of the string, > aLength. OK, I've added more information in the description. > > + PRUint32 &aInx) = 0; > > Just return the value from the function, and -1 if it failed. You might want to > say whether coordinate 0 is counted as part of the first character or to the > left of the first character. OK, yeah, that makes sense. Added. > > + if (IS_HIGH_SURROGATE(aText[pos])) { > + aClusterStarts[pos] = 1; > > Is this correct? There are no clusters that contain plane-1 characters after the > first position? That's just dealing with one-half of a surrogate pair while walking the original UTF-16 string. Code that uses the return should actually ignore that value, assuming it's walking the UTF-16 string in the same way. > > + layoutRun->glyphs->glyphs[i].geometry.width = mPangoSpaceWidth; > > Is Owen happy about code like this? I wouldn't say "happy." But he's OK with it. Until we actually fix Mozilla's layout code to handle spaces better it's what we're going to have to use. It doesn't create pango-compatible layout, but it does mean that everyone agrees about how spaces are used. > > + found = pango_layout_xy_to_index(layout, localCoord, 0, > + &inx, &trailing); > > So Pango might take the y-coordinate into account. I assume there are funky > scripts, or kerning situations, where characters overlap horizontally and the y > coordinate might be relevant. So why don't we pass a Y coordinate into GetPosition? I don't think that's an issue, actually. Should give you the info in any case. > > + if (!found) { > + if (inx = 0) > + aInx = 0; > > So we don't distinguish between "off the left of the string" and "in the first > character". I think we should. I guess we should return -1 for "off the left" > and something else for failure. Actually, we do handle both of those cases. We're only interested in what cluster position the cursor is closest to. The upstream code doesn't care if it's off the left or over the left hand side of the cluster. It doesn't draw a difference (or doesn't appear to, anyway.) So '0' means that it's off the left hand side, Length + 1 means it's off the right hand side. > > Overall this code looks slow, lots of dynamic allocation and presumably > recomputation of layout in Pango. Let's not optimize it now, but keep it in mind. Yeah, it's slow. Noticably slow. But it works in all the languages I've tested except Etruscan (which I suspect is a problem in Mozilla, not pango.) > > + // Convert the width into app units > + for (int i = 0; i < n_ranges; i++) { > + aWidth += (ranges[(i * 2) + 1] - ranges[(i * 2)]) / PANGO_SCALE; > + } > > I don't understand why this is the right thing to do. Isn't the width just the > difference between the leftmost range edge and the rightmost range edge? Can't > ranges overlap? I guess you're not really drawing using the range positions, so > I don't know how much of the Pango layout we're really using. I checked and ranges don't overlap, so that's safe but it's a good thought. I'll do some testing with changing that as you suggest - I suspect that that will work (that might have been grafted from an older version of the code that found the beginning and end of the string in that loop instead of letting get_x_ranges() do the hard work.) > > > Index: gfx/src/gtk/nsGCCache.cpp > > What are these changes all about? Oh, yeah. Those are pretty important. Gdk keeps a copy of the GC state around so it can be applied to RENDER operations, which don't use the standard GC; Changing the GC underneath the Gdk object can cause a lot of subtle problems. > > + // Fill in the cluster hint information, if it's available. > + nsCOMPtr<nsIRenderingContext> acx; [..] > > Please move this to its own function. Done. > > Overall I like this patch a lot. I'll upload a new version for your review shortly.
Attached patch patch #2 after roc's comments (obsolete) — Splinter Review
Yeah, you were right - you can do that range calculation in one pass. Updated patch attached.
Attachment #174072 - Attachment is obsolete: true
Attachment #175455 - Flags: superreview?(roc)
Attachment #175455 - Flags: review?(roc)
Attachment #174072 - Flags: review?(roc)
(In reply to comment #31) > (In reply to comment #30) > > + * This will fill in the aClusterStarts array with information about > > + * what characters are the start of clusters for display. > > > > Could you say what the information *is*? It appears to be just an array of > > booleans. In that case you should probably make it a PRUint8*. And please > > document that the array length must be the same as the length of the string, > > aLength. > > OK, I've added more information in the description. Can you make it a PRUint8* too please? > > + found = pango_layout_xy_to_index(layout, localCoord, 0, > > + &inx, &trailing); > > > > So Pango might take the y-coordinate into account. I assume there are funky > > scripts, or kerning situations, where characters overlap horizontally and > > the y coordinate might be relevant. So why don't we pass a Y coordinate into > > GetPosition? > > I don't think that's an issue, actually. Should give you the info in any > case. Please pass y into GetPosition and pass it through to pango_layout_xy, even if we don't currently know that it matters. > > + if (!found) { > > + if (inx = 0) > > + aInx = 0; > > > > So we don't distinguish between "off the left of the string" and "in the > > first character". I think we should. I guess we should return -1 for "off > > the left" and something else for failure. > > Actually, we do handle both of those cases. We're only interested in what > cluster position the cursor is closest to. The upstream code doesn't care if > it's off the left or over the left hand side of the cluster. It doesn't draw > a difference (or doesn't appear to, anyway.) So '0' means that it's off the > left hand side, Length + 1 means it's off the right hand side. Alright, document this in the comment for nsIRenderingContext::GetPosition(). I don't like saying "the upstream code doesn't care", since that might change, but I guess the textframe itself (or its parent) is responsible for detecting when the point is in the bounds of the textframe, so this is OK.
Comment on attachment 175455 [details] [diff] [review] patch #2 after roc's comments minusing for one more iteration
Attachment #175455 - Flags: superreview?(roc)
Attachment #175455 - Flags: superreview-
Attachment #175455 - Flags: review?(roc)
Attachment #175455 - Flags: review-
Attachment #175455 - Attachment is obsolete: true
Attachment #175869 - Flags: superreview?(roc)
Attachment #175869 - Flags: review?(roc)
Comment on attachment 175869 [details] [diff] [review] patch addressing roc's comments > (PRUint8 *)clusterBuffer.mBuffer Lose the unnecessary cast. + nscoord aX, + nscoord aY Make this "nsPoint aPt".
Attachment #175869 - Flags: superreview?(roc)
Attachment #175869 - Flags: superreview+
Attachment #175869 - Flags: review?(roc)
Attachment #175869 - Flags: review+
Blocks: Persian
Attached patch updated patchSplinter Review
This patch fixes a couple of problems. Some ascii text was being mis-rendered with pango enabled. There were some rounding errors in the code.
Attachment #181543 - Flags: superreview?(roc)
Attachment #181543 - Flags: review?(roc)
Attachment #181543 - Flags: superreview?(roc)
Attachment #181543 - Flags: superreview+
Attachment #181543 - Flags: review?(roc)
Attachment #181543 - Flags: review+
Comment on attachment 181543 [details] [diff] [review] updated patch Pretty low risk. Largely affects pango-only builds.
Attachment #181543 - Flags: approval1.8b2?
Comment on attachment 181543 [details] [diff] [review] updated patch a=asa
Attachment #181543 - Flags: approval1.8b2? → approval1.8b2+
Checked in. Marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
This caused bug 307537
Depends on: 307537
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: