Closed Bug 961737 Opened 10 years ago Closed 10 years ago

Regression: atk_text_get_text() is broken w.r.t. embedded object characters

Categories

(Core :: Disability Access APIs, defect)

x86_64
Linux
defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: jdiggs, Assigned: surkov)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 2 obsolete files)

Steps to reproduce:
1. Load the attached accessible-event listener in a terminal
2. (Re)load the test case (attachment 827653 [details])
3. Compare the results from Firefox 24 and nightly.

Things that implement AtkText and claim to have text (string, character count) used to report that text when using atk_get_text_at_offset(). That seems to no longer be the case when the text consists of one or more embedded object characters. Only getting a character works. See results below.

I am admittedly not a fan of digging down via the accessible text interface trying to piece together lines and implementing an Orca-controlled caret for Gecko content. ;) BUT, given that this seems necessary, whatever change is responsible for this new behavior, it breaks stuff for Orca. And even if, as part of the Great Orca Rewrite I'm doing, I solve this in Orca, that's not going to solve things for users of previous versions of Orca. So I'm going to assume this is an accidental regression, but if it is by design I still think it needs to be reversed for those "long term stable release" folks.

==================
Firefox 24 results
==================
-> [document frame | List item bug] (0, 137, 1080, 1756)
   textinfo: offset: 0 length: 1 string: [OBJ]
   lineinfo: <[OBJ]> (0, 1) (8, 145, 1064, 52)
   wordinfo: <[OBJ]> (0, 1) (8, 145, 1064, 52)
   charinfo: <[OBJ]> (0, 1) (8, 145, 1064, 52)
    -> [list | ] (8, 145, 1064, 52)
       textinfo: offset: -1 length: 2 string: [OBJ]       
       lineinfo: <[OBJ]> (0, 1) (48, 145, 1024, 26)
       wordinfo: <[OBJ]> (0, 1) (48, 145, 1024, 26)
       charinfo: <[OBJ]> (0, 1) (48, 145, 1024, 26)
        -> [list item | • Item 1] (32, 145, 1040, 26)
           textinfo: offset: -1 length: 7 string: •Item 1
           lineinfo: <•Item 1> (0, 7) (32, 145, 65, 26)
           wordinfo: <1> (6, 7) (88, 145, 9, 26)
           charinfo: <•> (0, 1) (32, 155, 8, 10)
        -> [list item | • Item 2] (32, 171, 1040, 26)
           textinfo: offset: -1 length: 7 string: •Item 2
           lineinfo: <•Item 2> (0, 7) (32, 171, 71, 26)
           wordinfo: <2> (6, 7) (93, 171, 10, 26)
           charinfo: <•> (0, 1) (32, 181, 8, 10)

===============
Nightly results
===============
-> [document frame | List item bug] (0, 110, 1080, 1783)
   textinfo: offset: 0 length: 1 string: [OBJ]
   lineinfo: <> (-1, -1) (0, 0, 0, 0)
   wordinfo: <> (0, 0) (0, 0, 0, 0)
   charinfo: <[OBJ]> (0, 1) (8, 118, 1064, 52)
    -> [list | ] (8, 118, 1064, 52)
       textinfo: offset: -1 length: 2 string: [OBJ]
       lineinfo: <> (-1, -1) (0, 0, 0, 0)
       wordinfo: <> (1, 1) (0, 0, 0, 0)
       charinfo: <[OBJ]> (0, 1) (48, 118, 1024, 26)
        -> [list item | • Item 1] (32, 118, 1040, 26)
           textinfo: offset: -1 length: 7 string: •Item 1
           lineinfo: <•Item 1> (0, 7) (32, 118, 65, 26)
           wordinfo: <> (1, 1) (0, 0, 0, 0)
           charinfo: <•> (0, 1) (32, 128, 16, 10)
        -> [list item | • Item 2] (32, 144, 1040, 26)
           textinfo: offset: -1 length: 7 string: •Item 2
           lineinfo: <•Item 2> (0, 7) (32, 144, 71, 26)
           wordinfo: <•> (0, 1) (32, 154, 16, 10)
           charinfo: <•> (0, 1) (32, 154, 16, 10)
I think it's reasonable if text consisting of embedded char returns the char when line is requested
Not sure what wanted behavior is though. Say if we have

<body>
  <p>first paragraph</p>
  <p>second paragrahp</p>
</body>

so the document text is two embed characters. Say if the first paragraph takes multiple lines then what is getTextAtOffset(0, LINE_BOUNDARY) supposed to return? Should it be (0, 1) or (0, 0)?
Flags: needinfo?(jamie)
(In reply to alexander :surkov from comment #2)
> so the document text is two embed characters. Say if the first paragraph
> takes multiple lines then what is getTextAtOffset(0, LINE_BOUNDARY) supposed
> to return? Should it be (0, 1) or (0, 0)?
It should be (0, 1), which indicates that the embedded object consumes at least one entire line. (0, 0) is a collapsed range suggesting that the line contains nothing, which is incorrect.
Flags: needinfo?(jamie)
Attached patch part1: patch1 (obsolete) — Splinter Review
part1: fix a generic (non list) case
Assignee: nobody → surkov.alexander
Attachment #8363303 - Flags: review?(trev.saunders)
Attached patch part1:patch2Splinter Review
Attachment #8363303 - Attachment is obsolete: true
Attachment #8363303 - Flags: review?(trev.saunders)
Attachment #8363305 - Flags: review?(trev.saunders)
Comment on attachment 8363305 [details] [diff] [review]
part1:patch2

>+    addTextOffset = static_cast<uint32_t>(
>+      aIsEndOffset && (addTextOffset > 0 || descendantAcc->IndexInParent() > 0));

imho that's a really hard way to read that.

if (aIsEndOffset)
  addOffset = addOffset > 0 || child->IndexInParent() > 0 ? 1 : 0;
else
  addOffset = 0

is a lot clearer even though its longer.  It also seems like it would be nice if addOffset could be a bool if we should add one or not, not a number to add.

Also that's going to make that loop a good bit slower,  maybe check if its the first content child? I guess that's not a great idea.  Maybe we should just finally devirtualize GetChildAt() and then use FirstChild()

>+      testTextAtOffset(["ht_5" ], BOUNDARY_LINE_START,
>+                       [ [ 0, 0, kEmbedChar, 0, 1 ] ]);

why not test BOUNDARY_LINE_END isn't that what you fix?
Attachment #8363305 - Flags: review?(trev.saunders) → review+
(In reply to Trevor Saunders (:tbsaunde) from comment #6)
> Comment on attachment 8363305 [details] [diff] [review]
> part1:patch2
> 
> >+    addTextOffset = static_cast<uint32_t>(
> >+      aIsEndOffset && (addTextOffset > 0 || descendantAcc->IndexInParent() > 0));
> 
> imho that's a really hard way to read that.
> 
> if (aIsEndOffset)
>   addOffset = addOffset > 0 || child->IndexInParent() > 0 ? 1 : 0;
> else
>   addOffset = 0
> 
> is a lot clearer even though its longer.

ok if you want

>  It also seems like it would be
> nice if addOffset could be a bool if we should add one or not, not a number
> to add.

addTextOffset may also be used to keep an offset in text node

> Also that's going to make that loop a good bit slower,  maybe check if its
> the first content child? I guess that's not a great idea.  Maybe we should
> just finally devirtualize GetChildAt() and then use FirstChild()

instead IndexInParent? not sure

> >+      testTextAtOffset(["ht_5" ], BOUNDARY_LINE_START,
> >+                       [ [ 0, 0, kEmbedChar, 0, 1 ] ]);
> 
> why not test BOUNDARY_LINE_END isn't that what you fix?

original test is LINE_START, iirc _END consts were finally deprecated so I don't longer worry about it
(In reply to alexander :surkov from comment #7)
> (In reply to Trevor Saunders (:tbsaunde) from comment #6)
> > Comment on attachment 8363305 [details] [diff] [review]
> > part1:patch2
> > 
> > >+    addTextOffset = static_cast<uint32_t>(
> > >+      aIsEndOffset && (addTextOffset > 0 || descendantAcc->IndexInParent() > 0));
> > 
> > imho that's a really hard way to read that.
> > 
> > if (aIsEndOffset)
> >   addOffset = addOffset > 0 || child->IndexInParent() > 0 ? 1 : 0;
> > else
> >   addOffset = 0
> > 
> > is a lot clearer even though its longer.
> 
> ok if you want

thx, my eyes bleed on the other form ;)

> >  It also seems like it would be
> > nice if addOffset could be a bool if we should add one or not, not a number
> > to add.
> 
> addTextOffset may also be used to keep an offset in text node

yeah, I guess this is text stuff its doomed to being hard to read.

> > Also that's going to make that loop a good bit slower,  maybe check if its
> > the first content child? I guess that's not a great idea.  Maybe we should
> > just finally devirtualize GetChildAt() and then use FirstChild()
> 
> instead IndexInParent? not sure

yeah, just trying to think of way to make this less slow.

> > >+      testTextAtOffset(["ht_5" ], BOUNDARY_LINE_START,
> > >+                       [ [ 0, 0, kEmbedChar, 0, 1 ] ]);
> > 
> > why not test BOUNDARY_LINE_END isn't that what you fix?
> 
> original test is LINE_START, iirc _END consts were finally deprecated so I
> don't longer worry about it

hm, and windows / jsat don't use them? I guess that's fine, just looks kind of unbalanced.
(In reply to Trevor Saunders (:tbsaunde) from comment #8)

> > ok if you want
> 
> thx, my eyes bleed on the other form ;)

I cannot say I disliked the previous form :)

> > >  It also seems like it would be
> > > nice if addOffset could be a bool if we should add one or not, not a number
> > > to add.
> > 
> > addTextOffset may also be used to keep an offset in text node
> 
> yeah, I guess this is text stuff its doomed to being hard to read.

heritage :)

> > instead IndexInParent? not sure
> 
> yeah, just trying to think of way to make this less slow.

yep, shouldn't be bottleneck though

> > original test is LINE_START, iirc _END consts were finally deprecated so I
> > don't longer worry about it
> 
> hm, and windows / jsat don't use them? I guess that's fine, just looks kind
> of unbalanced.

agree. windows yes, jsat is quite likely too, just patching saving some time for urgent stuff
Whiteboard: [leave open]
Attached patch part2_cleanup:patch1 (obsolete) — Splinter Review
make cleaning up of DOMPointToOffset, the result code is not really equivalent to old one but mochitest suite passes. CharacterCount() as fallback value in DOMPointToOffset is primarily needed for "stranges" of PeekOffset work.
Attachment #8365545 - Flags: review?(trev.saunders)
Attachment #8365545 - Attachment is obsolete: true
Attachment #8365545 - Flags: review?(trev.saunders)
Attachment #8366114 - Flags: review?(trev.saunders)
Attachment #8368950 - Flags: review?(trev.saunders)
Comment on attachment 8366114 [details] [diff] [review]
part2_cleanup:patch2

>+HyperTextAccessible::DOMPointToOffset(nsINode* aNode, int32_t aNodeOffset,
>+                                      bool aIsEndOffset) const
> {
>-  if (!aHyperTextOffset)
>-    return nullptr;
>-  *aHyperTextOffset = 0;
>-
>   if (!aNode)
>-    return nullptr;
>+    return 0;

when can it happen? can we just ban it?

A>+  // If the given DOM point cannot be mapped into offset relative this hypertext
>+  // offset then return length as fallback value.

when exactly is this needed?

Its really hard to say if this is exactly correct, but it doesn't seem particularly unreasonable.
Attachment #8366114 - Flags: review?(trev.saunders) → review+
(In reply to Trevor Saunders (:tbsaunde) from comment #15)
> >+HyperTextAccessible::DOMPointToOffset(nsINode* aNode, int32_t aNodeOffset,
> >+                                      bool aIsEndOffset) const
> >   if (!aNode)
> >-    return nullptr;
> >+    return 0;
> 
> when can it happen? can we just ban it?

somebody relies on it, I saw crashes

> 
> A>+  // If the given DOM point cannot be mapped into offset relative this
> hypertext
> >+  // offset then return length as fallback value.
> 
> when exactly is this needed?

when you do PeekOffset on input element then you can get outside input element, that was a case iirc

> Its really hard to say if this is exactly correct, but it doesn't seem
> particularly unreasonable.

the thesis is common for text interface stuff, we increase test coverage and one day it will guarantee the correct work
Comment on attachment 8368950 [details] [diff] [review]
part3 (list items):patch1

>+  int32_t offset = aOffset;
>+  Accessible* descendant = aDescendant;
>+  while (descendant && !descendant->IsDoc()) {

shouldn't we always find this before we get to a document? why can't we just assert that?

>+    // HTML list items needs special processing because PeekOffset doesn't work
>+    // with list bullets.

please file a bug to fix it

>+            (aAmount == eSelectBeginLine || aAmount == eSelectLine)) {
>+          if (text != this)
>+            return TransformOffset(text, 0, false);
>+
>+          // If we are in list item text then fall down into original procedure
>+          // since text may be multiline.

ending the block with a comment seems weird on top of this already being ubber ugly

>+        }
>+

extra blank line

>+      } else {
>+        // We assume that bullet is one single char.

can we assert that?

>+          // Ask a text leaf next to the bullet for an offset since list item
>+          // may be multiline.
>+          return aOffset + 1 < CharacterCount() ?

when is this not true?

>-  if (hyperTextOffset == CharacterCount() && aDirection == eDirPrevious)
>-    return 0;
>+  if (aDirection == eDirPrevious) {
>+    // If we reached the end during search, this means we didn't find the DOM point
>+    // and we're actually at the start of the paragraph

when is this ok? if there's text there shouldn't we have accessibles?


can you make sure we test list items with no bullet and test stuff with :before and :after?
Attachment #8368950 - Flags: review?(trev.saunders) → review+
(In reply to Trevor Saunders (:tbsaunde) from comment #17)

> >+  int32_t offset = aOffset;
> >+  Accessible* descendant = aDescendant;
> >+  while (descendant && !descendant->IsDoc()) {
> 
> shouldn't we always find this before we get to a document? why can't we just
> assert that?

it's sounds good

> >+    // HTML list items needs special processing because PeekOffset doesn't work
> >+    // with list bullets.
> 
> please file a bug to fix it

I'm not sure this a bug since keyboard skips list bullets from navigation

> >+            (aAmount == eSelectBeginLine || aAmount == eSelectLine)) {
> >+          if (text != this)
> >+            return TransformOffset(text, 0, false);
> >+
> >+          // If we are in list item text then fall down into original procedure
> >+          // since text may be multiline.
> 
> ending the block with a comment seems weird on top of this already being
> ubber ugly

ok, I'll remove the comment since it doesn't really seem useful


> >+      } else {
> >+        // We assume that bullet is one single char.
> 
> can we assert that?

assert is not really needed since this is obviously wrong assumption (for example in case of list-style-type:upper-roman style). I changed it to comment: "It works only when the bullet is one single char." I will file a bug.

> >+          // Ask a text leaf next to the bullet for an offset since list item
> >+          // may be multiline.
> >+          return aOffset + 1 < CharacterCount() ?
> 
> when is this not true?

no text list items, I added "(if not empty)" after "Ask a text leaf next"

> >-  if (hyperTextOffset == CharacterCount() && aDirection == eDirPrevious)
> >-    return 0;
> >+  if (aDirection == eDirPrevious) {
> >+    // If we reached the end during search, this means we didn't find the DOM point
> >+    // and we're actually at the start of the paragraph
> 
> when is this ok? if there's text there shouldn't we have accessibles?

I guess it's something related with PeekOffset work, it's full of surprises

> can you make sure we test list items with no bullet and test stuff with
> :before and :after?

:before and :after will be broken since PeekOffset presumably don't work with them. I could have a test for no bullet case but it rather belongs to list bullets bug
Whiteboard: [leave open]
https://hg.mozilla.org/mozilla-central/rev/826d951bd6fa
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: