Open Bug 453605 Opened 14 years ago Updated 1 month ago

text.getAttributes() can take an extraordinarily long time to return given a sufficiently large text object.

Categories

(Core :: Disability Access APIs, defect, P3)

People

(Reporter: jdiggs, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: access, perf)

Attachments

(2 files)

Steps to reproduce:

1. Load the test case
2. Launch Accerciser
3. In Accerciser's tree of accessibles, highlight the document
   frame (note, Accerciser might take a few seconds to respond)
4. In Accerciser's iPython console type the following:

    text = acc.queryText()<Return>
    text.getAttributes(0)<Return>

Expected results: getAttributes() would return instantly.

Actual results: getAttributes() takes at least 6 seconds and sometimes 8 seconds to return.
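
For anyone who wants to put a number on the delay rather than count seconds by hand, here is a minimal timing sketch. It assumes a Python 3 console; `time_call` and `fake_get_attributes` are hypothetical names made up for illustration, and the stand-in function only simulates a slow call so that the snippet runs outside Accerciser.

```python
import time

def time_call(func, *args, repeats=3):
    """Call func(*args) `repeats` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

# Hypothetical stand-in for a slow accessibility call, so the sketch
# can run anywhere. It is not the real AT-SPI method.
def fake_get_attributes(offset):
    time.sleep(0.01)
    return ("", offset, offset + 1)

seconds, result = time_call(fake_get_attributes, 0)
print(f"best of 3: {seconds:.3f}s")
```

In the Accerciser console the same helper could presumably be used as `time_call(text.getAttributes, 0)`, with `text` obtained as in step 4.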

Specs of the box where I tried this: AMD Athlon 64 X2 (Dual Core) 4200+ with 4GB RAM.

Granted, to highlight the issue I did place all of the text from "War and Peace" in the document frame object which is rather extreme. :-) But please bear with me BECAUSE:

1. There aren't even any text attributes present other than the
   default. All I did was copy and paste the contents and add
   <br /> tags. It's taking 6-8 seconds on a box with a dual
   core processor and 4 GB RAM to find out that, sure enough,
   there are no non-default text attributes. :-)

2. If you paste all of the content into a Gedit document and
   repeat the test, getAttributes() returns *instantly.*

3. If you open the test case in Gedit (with syntax highlighting
   enabled) and spot-check different offsets, getAttributes()
   returns *instantly* (correctly reporting the attributes).

Thus it would seem to be a Firefox/Mozilla bug. That leaves the question of, does this actually occur "in the wild"? The answer seems to be "yes and no." Michael Pedersen (who I believe is on the browser-china-atf alias, and who should feel free to chime in) has pointed out examples of eBooks (Bookshare as I recall) and also certain search results from the Library of Congress' National Library Service (aka the Braille and Talking Book folks) digital download catalog.

Due to various and sundry issues with passwords, copyright, and the like, I couldn't make a test case based on those. However, it seems that there are at least a healthy number of eBooks with sufficiently large text objects to make this a problem. Not a 6-8 second problem, but, say, a 1-2 second problem. Still too long for getAttributes() to return, I would argue.

In addition, considering real-world cases, given a line with a single bold word in the middle, there are three segments of text for which attributes must be obtained rather than one. Add another bold word somewhere else in the middle of that line, and now we're up to five segments. If getAttributes() took only a quarter of a second to return, that's still over a full second to get the text attributes for one line of text with a couple of bold words. :-(
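
The arithmetic above can be written out as a small sketch. It assumes each bold word sits strictly inside the line and no two bold words are adjacent, so every bold word adds two segments; the function names are made up for illustration.

```python
def segment_count(bold_words):
    """Segments on one line: each non-adjacent bold word inside the
    line splits the surrounding text, adding two segments."""
    return 2 * bold_words + 1

def line_cost_seconds(bold_words, seconds_per_call):
    """Total time if getAttributes() is called once per segment."""
    return segment_count(bold_words) * seconds_per_call

for bold_words in (0, 1, 2):
    print(bold_words, segment_count(bold_words),
          line_cost_seconds(bold_words, 0.25))
```

With two bold words and a quarter of a second per call, this gives five segments and 1.25 seconds for the line.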

Sorry to be long-winded in my report, and thanks VERY much in advance for taking a look at it!!
Heh. Bugzilla wouldn't let me upload it as html -- or even zipped up -- due to size. :-) So I put it on my server instead. Your test case choices are:

* http://grain-of-salt.com/foo/wnp.html.zip
* http://grain-of-salt.com/foo/wnp.html

Thanks again!!
No longer blocks: textattra11y
Keywords: perf
Thanks Alexander. My plan was to comment on those bugs instead of filing this one. But I had initially changed the 274 instances of the string "war" to "<b>war</b>" and in doing so managed to hang Accerciser, hang Firefox, and cause the AT-SPI registry daemon to segfault by asking for the text attributes. :-) After that, I figured it seemed worthy of a dedicated bug. ;-)
Keywords: perf
Confirmed on Windows as well. It takes 4 to 5 seconds on my Intel Core 2 Duo at 2.67 GHz with 3 GB of RAM, on Windows XP, using AccProbe.
OS: Linux → All
Hardware: PC → All
Do I understand correctly that you call text.getAttributes on the document accessible?
Yes and no.... We call text.getAttributes on the accessible which contains the text of interest. Often that's a paragraph or a section or a heading or .... However, we still find plenty of pages where the text of interest is not inside any such object but instead directly in the page's body, i.e. document frame. In that case, yes indeed we'd wind up calling text.getAttributes on the document accessible.
OK. So:
1) Run through the accessible tree rather than the DOM tree (this may help a bit because the a11y tree is a subset of the DOM tree)
2) Do not walk into embedded character objects (this should help a lot, but see bug 445677 comment #5)
3) Do not walk the whole subtree again for every text attribute (I expect this to bring the main performance win, by a factor of several)
Flags: blocking1.9.1?
Keywords: perf
I don't see this as blocking 1.9.1.  Please re-nom if you disagree.  1.9.1 is a time-based release with a clear set of features defining the release.  This seems like an edge case that wouldn't hold back the release.
Flags: blocking1.9.1? → blocking1.9.1-
Joanie, can you explain again why you have to consume all text attributes at the doc level for support of text attributes in Orca?
What's the status of this bug, since bug 475522 was fixed?
(In reply to comment #10)
> What's the status of this bug, since bug 475522 was fixed?

NEW; we need to fix bug 445677 first.
(In reply to comment #11)
> (In reply to comment #10)
> > What's the status of this bug, since bug 475522 was fixed?
> 
> NEW; we need to fix bug 445677 first.

The patch for bug 445677 is ready, and it makes this test case 8 times faster. Originally it took 4 seconds; with the patch it takes 0.5 seconds on my machine.
It still takes ~4 seconds with today's trunk. I did some simple profiling with sysprof, and most of the Firefox 4 CPU time is spent in calls to nsTextAttrsMgr::GetRange. Internally, that function seems to spend most of its CPU time in the Equal and TextLength functions.

I'm attaching the sysprof XML log and a quick screenshot.
(In reply to comment #15)
> Created attachment 505071 [details]
> Screenshot of profiling data for quick view

Note that those numbers are relative to total CPU time; Firefox's share of CPU time is 87.19.
Fer, did you profile on a non-debug build?
Good catch. Repeating with an optimized non-debug build.

It appears that Accerciser is no longer a usable application, but I imagine that the original issue relates to how long Orca takes to process a very large text page before starting to read it. Am I right?

How do you think it would be best to confirm (or verify) it? Thanks, James!

Flags: needinfo?(jteh)

Accerciser should still be usable. That said, you could also test this with NVDA by doing the following:

  1. Load a large document containing only text and <br> tags. (The original test case from comment 2 doesn't appear to be available any more, unfortunately.)
  2. Focus the document.
  3. Press NVDA+control+z to open the NVDA Python console.
  4. Enter this command:
    focus.IAccessibleTextObject.attributes(0)
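
Since the original test case is gone, a comparable page can be generated locally. The following is only a sketch under stated assumptions: `build_test_page`, the filler sentence, the line count, and the output filename are all arbitrary illustrative choices, not a reconstruction of the original file.

```python
import html

def build_test_page(line, line_count):
    """Return an HTML document whose body holds only text and <br> tags."""
    escaped = html.escape(line)
    body = "<br>\n".join(escaped for _ in range(line_count))
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        "<title>Large text test</title></head>\n"
        "<body>\n" + body + "\n</body></html>\n"
    )

# Arbitrary filler text and size; adjust line_count to taste.
page = build_test_page("The quick brown fox jumps over the lazy dog.", 60000)

if __name__ == "__main__":
    # Hypothetical output filename.
    with open("large_text_test.html", "w", encoding="utf-8") as out:
        out.write(page)
    print(f"wrote {len(page)} characters")
```

Because the body contains no block elements, all of the text should end up directly in the document accessible, which is the case this bug describes.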

Earlier comments in this bug suggest that this should have improved as compared to the original state, but it might still be a problem. I can understand why that might be the case; when an Accessible contains a lot of text nodes, we have to scan all the text nodes to figure out where the attributes change. The caching project we're working on (bug 1694563) may improve the responsiveness of these calls, but it might also cause some content process jank during page load.

Flags: needinfo?(jteh)

(In reply to Joanmarie Diggs from comment #0)

> In addition, considering real-world cases, given a line with a single bold
> word in the middle, there are three segments of text for which attributes
> must be obtained rather than one. Add another bold word somewhere else in
> the middle of that line, and now we're up to five segments. If
> getAttributes() took only a quarter of a second to return, that's still over
> a full second to get the text attributes for one line of text with a couple
> of bold words. :-(

This issue would only occur where there are a lot of consecutive text nodes with the exact same formatting. As soon as there is a node with different formatting (e.g. bold), we can stop comparing nodes for formatting equality. A further mitigating factor here is that in most real world cases of formatted text (including bold), the text would usually be split into paragraphs or the like as well. Paragraphs means separate Accessibles, which means attributes won't have to scan text outside the paragraph in question.
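
To illustrate why the scan is only expensive for long runs of identically formatted text, here is a simplified sketch. It is not Gecko's implementation: text nodes are modelled as hypothetical `(text, attrs)` tuples, and `attribute_range` is a made-up name. It grows the range outward from the node containing the offset and stops at the first neighbour whose formatting differs.

```python
def attribute_range(nodes, offset):
    """nodes: list of (text, attrs) tuples in document order.
    Return (start, end, attrs, comparisons) for the run at `offset`."""
    pos = 0
    for index, (text, attrs) in enumerate(nodes):
        if pos <= offset < pos + len(text):
            break
        pos += len(text)
    else:
        raise IndexError("offset is outside the text")

    start, end = pos, pos + len(text)
    comparisons = 0

    # Grow left until a node with different formatting is found.
    i = index
    while i > 0:
        comparisons += 1
        if nodes[i - 1][1] != attrs:
            break
        i -= 1
        start -= len(nodes[i][0])

    # Grow right in the same way.
    i = index
    while i < len(nodes) - 1:
        comparisons += 1
        if nodes[i + 1][1] != attrs:
            break
        i += 1
        end += len(nodes[i][0])

    return start, end, attrs, comparisons

plain = [("line", {})] * 1000
mixed = [("aaa ", {}), ("bold", {"font-weight": "bold"}), (" bbb", {})]
print(attribute_range(plain, 0)[3])   # comparisons for 1000 identical nodes
print(attribute_range(mixed, 5)[3])   # comparisons when neighbours differ
```

In this model, 1000 identically formatted nodes need 999 comparisons, while the bold word between two plain neighbours needs only two.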

Wouldn't know where to get a "proper" test page from, but fortunately, there's another attachment in comment 14.
On the other hand, it does not seem to cover the same case that you've described because it does not contain <br> tags.

The page is a roughly 3 MB XML document that would take forever to scroll to the bottom.
It contains a lot of objects, like this:

<objects>
    <object id="5776">
        <name>"In file /lib/libdbus-1.so.3.5.2"</name>
        <total>1</total>
        <self>0</self>
    </object>

And also lots of nodes:

<nodes>
    <node id="1">
        <object>2</object>
        <siblings>0</siblings>
        <children>3</children>
        <parent>0</parent>
        <total>5910</total>
        <self>0</self>
        <toplevel>1</toplevel>
    </node>

If I perform the steps you wrote in comment 20 for the attachment described above:

  1. Launch NVDA and the browser.
  2. Load the large XML file in the browser.
    Notice issue: Upon focus, NVDA reads the local address in the address bar and then says "loading document", but the browser then becomes unresponsive for a long time (~15 minutes or more!!!). In the same situation on Ubuntu with Orca, the browser remains unresponsive for a shorter time (under 1 minute).
  3. Press the hotkey NVDA + CTRL + Z.
    Notice: The NVDA Python console opens.
  4. Paste this command ( focus.IAccessibleTextObject.attributes(0) ) into the console.
    Notice: It gives the following response instantly:
    (0, 1, None)

In conclusion, assuming that the original issue was that the call in the NVDA Python console took a long time to return a response, it appears that this issue does not occur with NVDA and Nightly v97.0a1 or ESR v91.3esr.

Questions:

  1. Is the XML file described above valid to test with?
  2. Do you think the issue with NVDA taking a long time to load is properly addressed already in another bug? Do you know which one that is?
  3. Is the console response a correct one?
  4. How can I test the same case for ORCA (Ubuntu) and/or VoiceOver (MacOS)?
  5. How do you think this bug should be updated?

Sorry for the wall of text, James. Please help.

Flags: needinfo?(jteh)

(In reply to Bodea Daniel [:danibodea] from comment #22)

> Wouldn't know where to get a "proper" test page from, but fortunately, there's another attachment in comment 14.
> On the other hand, it does not seem to cover the same case that you've described because it does not contain <br> tags.

> ...

> 1. Is the XML file described above valid to test with?

I don't believe that document is a valid test case for this bug as described. It may have been once - we might have changed the way we view XML documents - but even if that were true, it isn't any longer.

> 2. Do you think the issue with NVDA taking a long time to load is properly addressed already in another bug? Do you know which one that is?

There's no exact match, but issues like this will hopefully be addressed by our Cache the World project (bug 1694563). Performance bugs that will probably be improved as a result of that work are tracked as blocked by bug 1737192.

> 3. Is the console response a correct one?

For this document, yes.

> 4. How can I test the same case for ORCA (Ubuntu) and/or VoiceOver (MacOS)?

For Orca, you'd need to use Accerciser. I'm not sure why that wasn't usable for you; I haven't tried it in a while, but it did work for me a couple of years ago. For VoiceOver, I'm not sure.

> 5. How do you think this bug should be updated?

Given comment 21, I've dropped the severity and priority of this bug. If there's any impact on real world usage, I think it'd be extremely rare.

Severity: major → S4
Flags: needinfo?(jteh)
Priority: -- → P3
Flags: needinfo?(daniel.bodea)

Based on the last comment, I will retire from this thread.

Flags: needinfo?(daniel.bodea)