Highlighting text with ASCII char causes line width for selection to be incorrect and bouncing text

RESOLVED FIXED

Status

Core Graveyard
GFX: Gtk
RESOLVED FIXED
14 years ago
10 years ago

People

(Reporter: Brian 'netdragon' Bober, Assigned: blizzard)

Tracking

(Blocks: 1 bug)

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(5 attachments, 4 obsolete attachments)

(Reporter)

Description

14 years ago
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040808 Firefox/0.9.3

Fedora Core 2 Linux

Go to URL http://www.jjj.de/shell/colors.sh.txt
Highlight all text with this character in it: 
Keep moving the mouse up and down and watch the text bounce.
Try to select that character individually, and see that it isn't fully selected.

The comments here really move around:
UNDERSCORE='[04m'  # only works in xterms
BLINK='[05m'       # doesn't work in xterms

Could the width of that character is incorrectly reported?
(Reporter)

Comment 1

14 years ago
Created attachment 159531 [details]
Screen shot of page

Here is a screenshot of how the page looks while selecting text.
(Reporter)

Comment 2

14 years ago
Created attachment 159532 [details]
The text file

Here is the file, in case it disappears.
(Reporter)

Comment 3

14 years ago
This affects 1.0 PR on Windoze too. Although, in this case, you can select the
entire character, but the whole line can't be selected. This character also
wreaks havok in textareas on Windows it seems.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
is that an xft build or x11core?

this component is definitely wrong, trying gfx
Assignee: guifeatures → blizzard
Component: XP Apps: GUI Features → GFX: Gtk
QA Contact: ian
(Reporter)

Comment 5

14 years ago
I'm not sure. The Linux version was whatever rpm was available with Fedora 2
(Christopher Blizzard should know since he's at Redhat) :-)
(Reporter)

Comment 6

14 years ago
This appears to be ascii char 27, escape character (Now it's so obvious since
it's an escape sequence)
Summary: Highlighting text with Unicode char? causes line width for selection to be incorrect and bouncing text → Highlighting text with ASCII char causes line width for selection to be incorrect and bouncing text

Comment 7

14 years ago
The screenshot was taken with an Xft build. It's odd that there is a problem
with our fallback rendering and selection (I thought we did the right thing with
reporting the width.)
 Ccing rbs to see what Gfx:Win does.

Remotely related to this bug (if related at all) is bug 229896. More closely
related is bug 205387 and bugs referred to there. 

Comment 8

14 years ago
I am seeing that Seamonkey "jiggles" with Gfx:Win too (as per comment 3). When 
you select, the text gets slightly shorter, when you deselect, it returns to 
its width. Result: a "jiggling" (or "bouncing") effect.

> I thought we did the right thing with reporting the width.
It is intriguing indeed.
(Assignee)

Comment 9

14 years ago
The only think that I can think of is that we've got some rounding errors and/or
we do extra text measurement when we shouldn't be.  For the pango work that I'm
doing I've been thinking about how to fix selection.  It's just completely
unusable right now with shaped languages like arabic.  We're doing some Very
Wrong Things.

Comment 10

14 years ago
(In reply to comment #9)
> It's just completely
> unusable right now with shaped languages like arabic.  We're doing some Very
> Wrong Things.

We already know what's wrong there. See bug 229896 and the international release
notes (http://www.mozilla.org/releases/mozilla1.7/known-issues-int.html)

<quote>
The cursor movement and selection/highlighting don't work as expected, yet. It
takes multiple keystrokes to delete or move by what's usually perceived as a
unit in complex scripts because Mozilla works in terms of Unicode characters
instead of graphemes. For the same reason, the boundaries of the text
selection/highlighting can fall between Unicode characters making up a single
grapheme. (Bugs 79286, 85420, 100173, 122552, 157534, 229896 )
</quote>
(Assignee)

Updated

14 years ago
Blocks: 214715
(Assignee)

Comment 11

14 years ago
Created attachment 166371 [details] [diff] [review]
patch that implements proper text selection using pango

This patch implements proper text selection using grapheme cluster boundaries,
including keyboard and mouse selection and updates the text frame code to
properly display selection using that information when it's available.
(Assignee)

Updated

14 years ago
Attachment #166371 - Flags: review?(jshin)
(Assignee)

Comment 12

14 years ago
Created attachment 166797 [details] [diff] [review]
patch that fixes problems with keyboard nav at end of line in textareas

Updated patch.
Attachment #166371 - Attachment is obsolete: true
(Assignee)

Updated

14 years ago
Attachment #166371 - Flags: review?(jshin)
(Assignee)

Updated

14 years ago
Attachment #166797 - Flags: superreview?(roc)
Attachment #166797 - Flags: review?(jshin)
I have a question about graphemes. What are we supposed to do if someone writes
  <span class="foo">X</span>Y
where XY together form a grapheme, and we have style rules for .foo? Say, the
text-decoration or text color is set specially on X?

Comment 14

14 years ago
That's been discussed extensively on the Unicode list and hadn't been yet
resolved/settled at least by the last time I checked.

Sorry for the delay in review. I'll get to this sometime this week.
We need it to be settled, because the answer will have a big impact on how the
text code evolves going forward.
Jungshik, if there is no settled answer yet, it would still be useful to know
what you think the answer will be, or what it *should* be.
It would also be useful to know what other software does in this kind of situation.

Comment 18

14 years ago
Comment on attachment 166797 [details] [diff] [review]
patch that fixes problems with keyboard nav at end of line in textareas

Overall, it looks nice. Here are a few points:

1. layout part needs smontagu to take a second look. it's a bit hard to review
with too little context ('-p -u8' would have helped). 

2. Nit : in various places, 'language(s)' is used, but 'script(s)' would make
more sense. Hindi written in Devanagari takes a great care to avoid splittng
grapheme cluster apart, but Hindi written in Latin letters takes less care
(assuming it doesn't use any combining diacritic marks)
(In mozilla source tree, we fail to make that distinction in some places - e.g.
langGroup had better be script(Group) in most cases)

3. Nit2: 'aInx' for 'index' seems a bit odd. How about spelling it out
('aIndex') or using a more common 'aIdx'?

4. Surrogate check can/should be more robust. Does Pango's utf16->utf8
conversion API return null if the input string has an orphaned (unpaired)
surrogate code point? If so, the current check would be fine. Otherwise, in
nsFontMetricsPango, it has to be checked whether a high surrogate is followed
by a low surrogate. In layout (it's not a part of what you changed), this check
is for sure necessary. 

5. We have to think a bit more as to how this fits in with a grand scheme of
things (bug 276079, bug 229896, bug 205476, bug 121540, bug 157967, bug 218887
and other bugs listed at
http://www.mozilla.org/releases/mozilla1.7.5/known-issues-int.html#display ).
I'm not 100% sure, but I guess this approach can coexist with what we'll have
to resolve bug 229896. So, I'm putting r, but Simon's second look would be
nice.
Attachment #166797 - Flags: review?(jshin) → review+

Comment 19

14 years ago
(In reply to comment #16)
> Jungshik, if there is no settled answer yet, it would still be useful to know
> what you think the answer will be, or what it *should* be.

In most cases, I think '<span>c1</span>c2' should be treated as two separate
graphemes even though 'c1' and 'c2' (when not separated by <span>) form a single
grapheme. By enclosing 'c1' with <span> and separating from 'c2', the author of
a document must mean that. 

An often-cited case against this interpretation is '<span style="color:
blue">c1</span><span style="color: red">c2</span>' where the author of a
document wants to color different parts of a single grapheme with different
colors. This may be useful in some language teaching material. Currently, the
font technologies (Pango/Uniscribe + Opentype-GSUB/GPOS and ATSUI+AAT font)
don't make it easy (if not impossible) to do this kind of rendering, I believe. 
 
I'll check what Safari and MSIE do. 
I'll get to this in a day or two
(In reply to comment #19)
> An often-cited case against this interpretation is '<span style="color:
> blue">c1</span><span style="color: red">c2</span>' where the author of a
> document wants to color different parts of a single grapheme with different
> colors. This may be useful in some language teaching material. Currently, the
> font technologies (Pango/Uniscribe + Opentype-GSUB/GPOS and ATSUI+AAT font)
> don't make it easy (if not impossible) to do this kind of rendering, I
> believe. 

They can always use images. I don't think this should determine our strategy. I
like your position.
(Assignee)

Comment 22

14 years ago
> 4. Surrogate check can/should be more robust. Does Pango's utf16->utf8
> conversion API return null if the input string has an orphaned (unpaired)
> surrogate code point? If so, the current check would be fine.

Yes.  Pango will return null if you pass an invalid unicode string (including an
unpaired pair.)  Layout used to do this more often than it does now, which is
why I discovered that we needed that check at all.
(Assignee)

Comment 23

14 years ago
jshin, did you happen to figure out what MSIE and safari do with this?
Status: NEW → ASSIGNED

Comment 24

14 years ago
Created attachment 170734 [details]
a very simple test case

This is a very simple test case (I'll make a more sophisticated test case
later). There are two lines of text written in Devanagari. The first line is
U+0915 (Devanagari Letter KA) followed by U+094D (Devanagari Virma) both of
which are in a single span. In the second line, they're enclosed in two
separate <span></span> with inline styles (color: blue and color: red).

To my great surprise, Safari renders both lines identically in terms of the
shape while in the second line U+0915 is rendered in blue and U+094D is
rendered in red.  So, I was wrong to say that AAT/ATSUI can't do this.
Apparently, they can....  Whether Safari is right is another matter.

To check MS IE, I have to reboot my machine to Windows. I'll do that later.
MSIE renders both lines with the same positioning. In the second line the base
character and the diacritic are both colored blue. The behaviour is much the
same in http://www.qsm.co.il/Hebrew/HebrewTest/color.htm
What if the spans use a different size, or a different font?

Is this Devanagari pair a true ligature in the font, or is the renderer just
composing a base character with a diacritic programmatically?

I wonder what should happen (or does happen in IE) if you have spans with
different fonts and the characters would be a ligature in the first font but not
in the second font.

Comment 27

14 years ago
(In reply to comment #26)
> What if the spans use a different size, or a different font?

I'll try that. 

Another test we can do is how line breaking is handled for cases like 
'<span>verylong</span><span>word</span>' ('verylongword' is just made of Latin
letters in the ASCII range) in MS IE and Safari. 

> Is this Devanagari pair a true ligature in the font, or is the renderer just
> composing a base character with a diacritic programmatically?

It depends. 'End-users'(?) are not supposed to know about that details
....Apparently, ATSUI has 'attributed/annotated' strings (Pango has something
similar).
 
> I wonder what should happen (or does happen in IE) if you have spans with
> different fonts and the characters would be a ligature in the first font but not
> in the second font.

This test is a bit hard to do unless I have the latest version of Uniscribe DLL
(which may only be available with the purchase of MS Office/Word 2005?) that
supports opentype GSUB/GPOS for Latin. I might try something similar with
Devanagari.  

See also bug 157967
I think we should do what IE seems to be doing, and treat each grapheme as
belonging to the span that contains its first character.
(Assignee)

Comment 29

14 years ago
Created attachment 174072 [details] [diff] [review]
patch updated to trunk including selection and other fixes

This patch finishes support for using pango for rendering and adds support for
grapheme-based mouse and keyboard selection.  Also contains some other random
fixes which happen to be in my tree.  Nothing too scary.
Attachment #166797 - Attachment is obsolete: true
Attachment #174072 - Flags: review?(roc)
+   * This will fill in the aClusterStarts array with information about
+   * what characters are the start of clusters for display.

Could you say what the information *is*? It appears to be just an array of
booleans. In that case you should probably make it a PRUint8*. And please
document that the array length must be the same as the length of the string,
aLength.

+                         PRUint32 &aInx) = 0;

Just return the value from the function, and -1 if it failed. You might want to
say whether coordinate 0 is counted as part of the first character or to the
left of the first character.

+        if (IS_HIGH_SURROGATE(aText[pos])) {
+            aClusterStarts[pos] = 1;

Is this correct? There are no clusters that contain plane-1 characters after the
first position?

+                layoutRun->glyphs->glyphs[i].geometry.width = mPangoSpaceWidth;

Is Owen happy about code like this?

+    found = pango_layout_xy_to_index(layout, localCoord, 0,
+                                     &inx, &trailing);

So Pango might take the y-coordinate into account. I assume there are funky
scripts, or kerning situations, where characters overlap horizontally and the y
coordinate might be relevant. So why don't we pass a Y coordinate into GetPosition?

+    if (!found) {
+        if (inx = 0)
+            aInx = 0;

So we don't distinguish between "off the left of the string" and "in the first
character". I think we should. I guess we should return -1 for "off the left"
and something else for failure.

Overall this code looks slow, lots of dynamic allocation and presumably
recomputation of layout in Pango. Let's not optimize it now, but keep it in mind.

+    // Convert the width into app units
+    for (int i = 0; i < n_ranges; i++) {
+        aWidth += (ranges[(i * 2) + 1] - ranges[(i * 2)]) / PANGO_SCALE;
+    }

I don't understand why this is the right thing to do. Isn't the width just the
difference between the leftmost range edge and the rightmost range edge? Can't
ranges overlap? I guess you're not really drawing using the range positions, so
I don't know how much of the Pango layout we're really using.

> Index: gfx/src/gtk/nsGCCache.cpp

What are these changes all about?

+          // Fill in the cluster hint information, if it's available.
+          nsCOMPtr<nsIRenderingContext> acx;
+          PRUint32 clusterHint = 0;
+
+          nsIPresShell *shell = aPresContext->GetPresShell();
+          if (shell) {
+            shell->CreateRenderingContext(this, getter_AddRefs(acx));
+
+            // Find the font metrics for this text
+            SetFontFromStyle(acx, mStyleContext);
+
+            if (acx)
+              acx->GetHints(clusterHint);
+            clusterHint &= NS_RENDERING_HINT_TEXT_CLUSTERS;
+          }
+
+          if (clusterHint) {
+            acx->GetClusterInfo(paintBuffer.mBuffer, (PRUint32)textLength,
(PRUint32 *)clusterBuffer.mBuffer);
+          }
+          else {
+            memset(clusterBuffer.mBuffer, 1, sizeof(PRInt32) * textLength);
+          }
+

Please move this to its own function.

Overall I like this patch a lot.
(Assignee)

Comment 31

14 years ago
(In reply to comment #30)
> +   * This will fill in the aClusterStarts array with information about
> +   * what characters are the start of clusters for display.
> 
> Could you say what the information *is*? It appears to be just an array of
> booleans. In that case you should probably make it a PRUint8*. And please
> document that the array length must be the same as the length of the string,
> aLength.

OK, I've added more information in the description.

> 
> +                         PRUint32 &aInx) = 0;
> 
> Just return the value from the function, and -1 if it failed. You might want to
> say whether coordinate 0 is counted as part of the first character or to the
> left of the first character.

OK, yeah, that makes sense.  Added.

> 
> +        if (IS_HIGH_SURROGATE(aText[pos])) {
> +            aClusterStarts[pos] = 1;
> 
> Is this correct? There are no clusters that contain plane-1 characters after the
> first position?

That's just dealing with one-half of a surrogate pair while walking the original
UTF-16 string.  Code that uses the return should actually ignore that value,
assuming it's walking the UTF-16 string in the same way.

> 
> +                layoutRun->glyphs->glyphs[i].geometry.width = mPangoSpaceWidth;
> 
> Is Owen happy about code like this?

I wouldn't say "happy."  But he's OK with it.  Until we actually fix Mozilla's
layout code to handle spaces better it's what we're going to have to use.  It
doesn't create pango-compatible layout, but it does mean that everyone agrees
about how spaces are used.

> 
> +    found = pango_layout_xy_to_index(layout, localCoord, 0,
> +                                     &inx, &trailing);
> 
> So Pango might take the y-coordinate into account. I assume there are funky
> scripts, or kerning situations, where characters overlap horizontally and the y
> coordinate might be relevant. So why don't we pass a Y coordinate into
GetPosition?

I don't think that's an issue, actually.  Should give you the info in any case.

> 
> +    if (!found) {
> +        if (inx = 0)
> +            aInx = 0;
> 
> So we don't distinguish between "off the left of the string" and "in the first
> character". I think we should. I guess we should return -1 for "off the left"
> and something else for failure.

Actually, we do handle both of those cases.  We're only interested in what
cluster position the cursor is closest to.  The upstream code doesn't care if
it's off the left or over the left hand side of the cluster.  It doesn't draw a
difference (or doesn't appear to, anyway.)  So '0' means that it's off the left
hand side, Length + 1 means it's off the right hand side.

> 
> Overall this code looks slow, lots of dynamic allocation and presumably
> recomputation of layout in Pango. Let's not optimize it now, but keep it in mind.

Yeah, it's slow.  Noticably slow.  But it works in all the languages I've tested
except Etruscan (which I suspect is a problem in Mozilla, not pango.)

> 
> +    // Convert the width into app units
> +    for (int i = 0; i < n_ranges; i++) {
> +        aWidth += (ranges[(i * 2) + 1] - ranges[(i * 2)]) / PANGO_SCALE;
> +    }
> 
> I don't understand why this is the right thing to do. Isn't the width just the
> difference between the leftmost range edge and the rightmost range edge? Can't
> ranges overlap? I guess you're not really drawing using the range positions, so
> I don't know how much of the Pango layout we're really using.

I checked and ranges don't overlap, so that's safe but it's a good thought. 
I'll do some testing with changing that as you suggest - I suspect that that
will work (that might have been grafted from an older version of the code that
found the beginning and end of the string in that loop instead of letting
get_x_ranges() do the hard work.)

> 
> > Index: gfx/src/gtk/nsGCCache.cpp
> 
> What are these changes all about?

Oh, yeah.  Those are pretty important.  Gdk keeps a copy of the GC state around
so it can be applied to RENDER operations, which don't use the standard GC;
Changing the GC underneath the Gdk object can cause a lot of subtle problems.

> 
> +          // Fill in the cluster hint information, if it's available.
> +          nsCOMPtr<nsIRenderingContext> acx;
[..]
> 
> Please move this to its own function.

Done.

> 
> Overall I like this patch a lot.

I'll upload a new version for your review shortly.
(Assignee)

Comment 32

14 years ago
Created attachment 175455 [details] [diff] [review]
patch #2 after roc's comments

Yeah, you were right - you can do that range calculation in one pass.  Updated
patch attached.
Attachment #174072 - Attachment is obsolete: true
Attachment #175455 - Flags: superreview?(roc)
Attachment #175455 - Flags: review?(roc)
(Assignee)

Updated

14 years ago
Attachment #174072 - Flags: review?(roc)
(In reply to comment #31)
> (In reply to comment #30)
> > +   * This will fill in the aClusterStarts array with information about
> > +   * what characters are the start of clusters for display.
> > 
> > Could you say what the information *is*? It appears to be just an array of
> > booleans. In that case you should probably make it a PRUint8*. And please
> > document that the array length must be the same as the length of the string,
> > aLength.
> 
> OK, I've added more information in the description.

Can you make it a PRUint8* too please?

> > +    found = pango_layout_xy_to_index(layout, localCoord, 0,
> > +                                     &inx, &trailing);
> > 
> > So Pango might take the y-coordinate into account. I assume there are funky
> > scripts, or kerning situations, where characters overlap horizontally and
> > the y coordinate might be relevant. So why don't we pass a Y coordinate into
> > GetPosition?
> 
> I don't think that's an issue, actually.  Should give you the info in any
> case.

Please pass y into GetPosition and pass it through to pango_layout_xy, even if
we don't currently know that it matters.

> > +    if (!found) {
> > +        if (inx = 0)
> > +            aInx = 0;
> > 
> > So we don't distinguish between "off the left of the string" and "in the
> > first character". I think we should. I guess we should return -1 for "off
> > the left" and something else for failure.
> 
> Actually, we do handle both of those cases.  We're only interested in what
> cluster position the cursor is closest to.  The upstream code doesn't care if
> it's off the left or over the left hand side of the cluster.  It doesn't draw
> a difference (or doesn't appear to, anyway.)  So '0' means that it's off the
> left hand side, Length + 1 means it's off the right hand side.

Alright, document this in the comment for nsIRenderingContext::GetPosition(). I
don't like saying "the upstream code doesn't care", since that might change, but
I guess the textframe itself (or its parent) is responsible for detecting when
the point is in the bounds of the textframe, so this is OK.
Comment on attachment 175455 [details] [diff] [review]
patch #2 after roc's comments

minusing for one more iteration
Attachment #175455 - Flags: superreview?(roc)
Attachment #175455 - Flags: superreview-
Attachment #175455 - Flags: review?(roc)
Attachment #175455 - Flags: review-
(Assignee)

Comment 35

14 years ago
Created attachment 175869 [details] [diff] [review]
patch addressing roc's comments
Attachment #175455 - Attachment is obsolete: true
Attachment #175869 - Flags: superreview?(roc)
Attachment #175869 - Flags: review?(roc)
Comment on attachment 175869 [details] [diff] [review]
patch addressing roc's comments

> (PRUint8 *)clusterBuffer.mBuffer

Lose the unnecessary cast.

+ nscoord aX,
+ nscoord aY

Make this "nsPoint aPt".
Attachment #175869 - Flags: superreview?(roc)
Attachment #175869 - Flags: superreview+
Attachment #175869 - Flags: review?(roc)
Attachment #175869 - Flags: review+

Updated

14 years ago
Blocks: 285718
(Assignee)

Comment 37

13 years ago
Created attachment 181543 [details] [diff] [review]
updated patch

This patch fixes a couple of problems.	Some ascii text was being mis-rendered
with pango enabled.  There were some rounding errors in the code.
(Assignee)

Updated

13 years ago
Attachment #181543 - Flags: superreview?(roc)
Attachment #181543 - Flags: review?(roc)
Attachment #181543 - Flags: superreview?(roc)
Attachment #181543 - Flags: superreview+
Attachment #181543 - Flags: review?(roc)
Attachment #181543 - Flags: review+
(Assignee)

Comment 38

13 years ago
Comment on attachment 181543 [details] [diff] [review]
updated patch

Pretty low risk.  Largely affects pango-only builds.
Attachment #181543 - Flags: approval1.8b2?

Comment 39

13 years ago
Comment on attachment 181543 [details] [diff] [review]
updated patch

a=asa
Attachment #181543 - Flags: approval1.8b2? → approval1.8b2+
(Assignee)

Comment 40

13 years ago
Checked in.  Marking FIXED.
Status: ASSIGNED → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
This caused bug 307537
Depends on: 307537
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.