Closed Bug 384101 Opened 18 years ago Closed 8 years ago

text.getTextAtOffset broken for TEXT_BOUNDARY_LINE_START

Categories

(Core :: Disability Access APIs, defect)

defect
Not set
normal

RESOLVED INCOMPLETE

People

(Reporter: wwalker, Assigned: ginnchen+exoracle)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: access, Whiteboard: [auto-closed:inactivity])

In GNOME bugzilla bug http://bugzilla.gnome.org/show_bug.cgi?id=355525, I was tracking down why a certain feature of Orca wasn't working correctly.  It turns out that the URL I was using as a test case (http://bugzilla.gnome.org/attachment.cgi?id=83911) contained a document whose text included embedded object and newline characters.

To help debug this in Orca, I added the following code to examine the text of the document frame.  This code simply goes character by character through the text of the document frame, calling getTextAtOffset for each character position:

        if accessible.role == rolenames.ROLE_DOCUMENT_FRAME:
            for i in range(0, length):
                character = self.script.getText(accessible, i, i + 1)
                if character == self.script.EMBEDDED_OBJECT_CHARACTER:
                    character = "EMBEDDED_OBJECT_CHARACTER"
                elif character == "\n":
                    character = "\\n"
                print "%d. '%s'" % (i, character)
                [string, startOffset, endOffset] = text.getTextAtOffset(
                    i,
                    atspi.Accessibility.TEXT_BOUNDARY_LINE_START)
                print "  line(%d, %d) = '%s'" \
                      % (startOffset, endOffset, string)

For each character in the text for the document frame, the output tells us what
the index of the character is, what the character itself is, and what Gecko
thinks the line is for that character, including the start and end offset for
the line.  Things seem to start failing around character 19, which is the 'T' that begins the line "This sentence is bold." Instead of failing as it did, I would expect getTextAtOffset with a boundary type of TEXT_BOUNDARY_LINE_START to return the entire line.  Here's the sample output:

0. 'EMBEDDED_OBJECT_CHARACTER'
  line(0, 2) = '
'
1. '\n'
  line(0, 2) = '
'
2. '\n'
  line(2, 3) = '
'
3. 'EMBEDDED_OBJECT_CHARACTER'
  line(3, 4) = ''
4. 'EMBEDDED_OBJECT_CHARACTER'
  line(5, 18) = 'Text Formats
'
5. 'T'
  line(5, 18) = 'Text Formats
'
6. 'e'
  line(5, 18) = 'Text Formats
'
7. 'x'
  line(5, 18) = 'Text Formats
'
8. 't'
  line(5, 18) = 'Text Formats
'
9. ' '
  line(5, 18) = 'Text Formats
'
10. 'F'
  line(5, 18) = 'Text Formats
'
11. 'o'
  line(5, 18) = 'Text Formats
'
12. 'r'
  line(5, 18) = 'Text Formats
'
13. 'm'
  line(5, 18) = 'Text Formats
'
14. 'a'
  line(5, 18) = 'Text Formats
'
15. 't'
  line(5, 18) = 'Text Formats
'
16. 's'
  line(5, 18) = 'Text Formats
'
17. '\n'
  line(5, 18) = 'Text Formats
'
18. '\n'
  line(18, 19) = '
'
19. 'T'
  line(18, 20) = '
T'
20. 'h'
  line(18, 19) = '
'
21. 'i'
  line(18, 19) = '
'
22. 's'
  line(18, 19) = '
'
23. ' '
  line(18, 19) = '
'
24. 's'
  line(18, 19) = '
'
25. 'e'
  line(18, 19) = '
'
26. 'n'
  line(18, 19) = '
'
27. 't'
  line(18, 19) = '
'
28. 'e'
  line(18, 19) = '
'
29. 'n'
  line(18, 19) = '
'
30. 'c'
  line(18, 19) = '
'
31. 'e'
  line(18, 19) = '
'
32. ' '
  line(18, 19) = '
'
33. 'i'
  line(18, 19) = '
'
34. 's'
  line(18, 19) = '
'
35. ' '
  line(18, 19) = '
'
36. 'b'
  line(18, 19) = '
'
37. 'o'
  line(18, 19) = '
'
38. 'l'
  line(18, 19) = '
'
39. 'd'
  line(18, 19) = '
'
40. '.'
  line(18, 19) = '
'
41. 'EMBEDDED_OBJECT_CHARACTER'
  line(41, 42) = ''
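For comparison with the output above, here is a minimal sketch of the ranges one might expect for each offset. It works on a plain string only and assumes that lines end solely at "\n" and that the newline belongs to the line it terminates (as in the 'Text Formats' result above). Real line boundaries also depend on layout and on embedded objects, which this ignores. The name line_at_offset and the sample string are illustrative, not part of Orca or AT-SPI.

```python
def line_at_offset(text, offset):
    # Return (line, start, end) for the line containing `offset`.
    # Assumption: a line runs from just after the previous "\n" up to and
    # including the next "\n" (or to the end of the text if there is none).
    if not 0 <= offset < len(text):
        raise IndexError("offset out of range")
    start = text.rfind("\n", 0, offset) + 1
    newline = text.find("\n", offset)
    end = len(text) if newline == -1 else newline + 1
    return text[start:end], start, end


# Simplified stand-in for the test document, without embedded objects.
sample = "Text Formats\n\nThis sentence is bold."

for i in range(len(sample)):
    line, start, end = line_at_offset(sample, i)
    print("%d. line(%d, %d) = %r" % (i, start, end, line))
```

Under these assumptions every offset inside "This sentence is bold." maps to the same range covering the whole sentence, which is the behavior I would expect from Gecko.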
Assignee: nobody → aaronleventhal
Component: Disability Access → Disability Access APIs
Product: Firefox → Core
QA Contact: disability.access → accessibility-apis
An oddity can be seen using Accerciser on the test page (2nd link in opening comment).  The accessible at 0 4 8 0 0 2 is a ghost accessible (not a link) with no role or text.  In addition, the second and third lines of text are not shown in the accessible tree.
After examining the markup, I suspect the nasty <br> bug is to blame.
Here's a JS version of the problem with some of the unnecessary stuff stripped out:
http://www.mozilla.org/access/qa/linestest/jstest
Correctly spelled:
http://www.mozilla.org/access/qa/linetest/jstest

I'm working on this bug now, but if you want to look at the JS testcase, wait until the mozilla.org website updates (about 45 minutes).
Interesting, I didn't think newlines in the source affect anything unless it's a <pre> or styled with white-space: pre.

This has the problem:
<b><br>
</b><b>123</b>

This doesn't:
<b><br></b><b>123</b>

The top one has an extra newline.
In fact the <b> around 123 doesn't matter, so this has the problem too:

<b><br>
</b>123

That's the smallest testcase I can find which still has the problem.
The problem also exists if there is a space instead of a newline, after the <br>.
<b><br> </b>123
I think the issue is in DOMPointToOffset()

We're giving it 0 for content offset.
Broken case: it returns 0 for hyperTextOffset
Working case: it returns 1 for hyperTextOffset
In other words, we're not getting past the <br> in the hypertext.
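To illustrate the expected value, here is a toy model of the offset computation. It is not Gecko's DOMPointToOffset; it only assumes that each text run contributes its length to the hypertext and that a <br> or embedded object contributes exactly one character. The names hypertext_offset and BR are hypothetical.

```python
# Marker for a child that occupies one hypertext character (e.g. a <br>).
BR = object()


def hypertext_offset(children, child_index, content_offset):
    # Sum the hypertext length of every child before `child_index`, then
    # add the offset within that child. Strings count by length; anything
    # else is assumed to count as one character.
    preceding = sum(
        len(child) if isinstance(child, str) else 1
        for child in children[:child_index]
    )
    return preceding + content_offset


# Hypertext for <br> followed by the text "123".
children = [BR, "123"]
print(hypertext_offset(children, 1, 0))
```

In this model, content offset 0 in the "123" text maps to hypertext offset 1, the value seen in the working case.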
Yikes! Gecko thinks the start of the line is before the <br>.

That causes another problem -- if you go into the middle of the line and hit Home, it jumps up a line!
Filed bug 384452 on the core Gecko problem causing this.
Depends on: 384452
Assignee: aaronleventhal → ginn.chen
If we're going to fix this in Firefox 3 we might need to find a way around bug 384452 -- e.g. look for the incorrect results and fix them in our a11y module.
Whiteboard: orca:normal
Aaron - as part of our performance improvement work for Orca, we were considering using the getTextAtOffset for TEXT_BOUNDARY_LINE_START.  I think I recall you saying at the Boston 2007 summit that getTextAtOffset was now working.

Has this bug been fixed?
This is the last known case that is broken, and it's broken because caret navigation is also broken for this. Apparently it's not easy to fix. Still, I think it would be far better for Orca to use getTextAtOffset() and add error checking, because the code is stable now, and much faster than what you're currently doing with character bounds.

If you want you can make it an experimental pref at first, and keep your current code around.
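One possible shape for that error checking is sketched below. The names are hypothetical: get_text_at_offset stands for a callable that wraps text.getTextAtOffset(offset, TEXT_BOUNDARY_LINE_START) and returns (string, startOffset, endOffset), and fallback stands for the existing character-bounds code. The only check made is that the returned range actually contains the requested offset.

```python
def checked_line_at_offset(get_text_at_offset, offset, fallback):
    # Ask the toolkit for the line first (assumed to be the fast path).
    string, start, end = get_text_at_offset(offset)
    # A plausible result must cover the offset that was asked about.
    if start <= offset < end:
        return string, start, end
    # Otherwise fall back to the slower, existing approach.
    return fallback(offset)
```

Against the output in the opening comment, this check would reject the results for offsets 20 through 40, where line(18, 19) does not contain the offset. It would not catch offset 19, where line(18, 20) contains the offset but is still not the full line, so it is a partial safeguard at best.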
I'm changing the whiteboard status from orca:normal to orca:urgent.  I'm in the midst of converting Orca away from using extents to obtain line contents, and it would be nice to minimize (eliminate?) the need for special-casing things.  

Thanks in advance!
Whiteboard: orca:normal → orca:urgent
Aaron, are you working on this bug?
Ginn, actually no. I think I told Will that I could but I will not deal with core selection internals right now.
I'm not sure if the following is another instance of the same bug or something different, so I'm starting here. :-)

If you go to: https://buy.garmin.com/shop/shop.do?cID=134 and examine any of the table cells that contain checkboxes using Accerciser, you should find the following to be true for TEXT_BOUNDARY_LINE_START:

* getTextAtOffset() works as expected given:
    * an offset of 0 (link that's the image)
    * an offset of 1 (link that contains a section)

* getTextAtOffset() with an offset of 2 (the checkbox) causes the
  full cell contents to be returned.
Friendly ping. :-)
Blocks: texta11y
OS: Linux → All
Hardware: PC → All
Flags: in-testsuite?
AUTO-CLOSED. This bug untouched for over 2000 days. Please reopen if you can confirm the bug and help it progress.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Whiteboard: orca:urgent → [auto-closed:inactivity]