Open Bug 1141181 Opened 9 years ago Updated 2 months ago

Implement support for the sentence boundary/granularity in AtkText.

Categories

(Core :: Disability Access APIs, task)

x86_64
Linux
task

People

(Reporter: jdiggs, Unassigned)

References

(Blocks 2 open bugs)

Details

Currently atk_text_get_text_at_offset returns '', 0, 0 for ATK_TEXT_BOUNDARY_SENTENCE_START, and presumably the newer AtkText granularity API (atk_text_get_string_at_offset with ATK_TEXT_GRANULARITY_SENTENCE) does or will as well. It would be helpful to have this support implemented for the purposes of SayAll by sentence and navigation by sentence in Orca.

Thanks!
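To make the request concrete: the ATK documentation describes the result for ATK_TEXT_BOUNDARY_SENTENCE_START as the text from the sentence start at or before the offset to the sentence start after the offset. The Python sketch below models only that expected (string, start, end) contract, assuming the sentence start offsets are already known. The function name text_at_offset_sentence_start is hypothetical, and this is neither Gecko nor ATK code.

```python
from bisect import bisect_right


def text_at_offset_sentence_start(text, sentence_starts, offset):
    """Hypothetical model of ATK_TEXT_BOUNDARY_SENTENCE_START semantics.

    Returns (string, start, end), where the string runs from the sentence
    start at or before `offset` up to the next sentence start (or the end
    of the text). `sentence_starts` is assumed to be a sorted list of
    offsets where sentences begin, always including 0.
    """
    if not text or not (0 <= offset <= len(text)):
        # What Gecko currently returns for every offset.
        return ("", 0, 0)
    # Index of the last sentence start that is <= offset.
    i = bisect_right(sentence_starts, offset) - 1
    start = sentence_starts[i]
    end = sentence_starts[i + 1] if i + 1 < len(sentence_starts) else len(text)
    return (text[start:end], start, end)
```

For "Hi. This is a test." with sentence starts [0, 4], offset 6 yields ("This is a test.", 4, 19), and offset 1 yields ("Hi. ", 0, 4), with trailing whitespace included up to the next sentence start.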
See Also: → 1667219
Type: defect → task
Severity: normal → S3

Jamie: This bug is almost nine years old. I'd like to remove Orca's hackaround to piece together sentences for Gecko. Any chance this could be prioritized? Thanks in advance!

As far as I know, Gecko's text navigation framework doesn't have a concept of sentences at all, since there's no standard command to navigate by sentence, select by sentence, etc. If we added something to the accessibility API for that, we'd effectively be making it up just for accessibility. I also have no idea how we'd do that in a way which is correct across languages.

Out of interest, I looked at how Chrome does this. It looks like it does have a way of iterating sentences. I'm not sure if it internally uses ICU or not. However, it's fairly easy to break this:
data:text/html,Hi. This <span>is a test.</span>
Even though the HyperTextAccessible contains "Hi. This is a test." with no embedded objects, Chrome breaks this into three sentences: "Hi.", "This ", "is a test.".
This happens because it only looks at the text of the current node, whereas sentence iterators need the full block of text. If we're going to do this, we'll either need a framework which can handle partial sentences or we'll need to pass the entire text between block boundaries. The latter is pretty scary for performance; we might end up passing millions of lines of text.
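The effect described above can be reproduced with a toy model. The Python sketch below uses a naive regex splitter as a stand-in for a real sentence iterator. It says nothing about what Chrome or ICU do internally. The sketch compares splitting each text node on its own with splitting the concatenated block text. The node list is assumed to be the two text nodes produced by `Hi. This <span>is a test.</span>`, and all function names are hypothetical.

```python
import re

# Naive stand-in for a real break iterator (assumption, illustration only):
# a sentence ends after '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, keeping trailing whitespace attached."""
    pieces = []
    start = 0
    for m in _SENTENCE_END.finditer(text):
        pieces.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        pieces.append(text[start:])
    return pieces


def split_per_node(nodes):
    """Run the splitter on each text node separately."""
    result = []
    for node_text in nodes:
        result.extend(split_sentences(node_text))
    return result


def split_whole_block(nodes):
    """Concatenate all text in the block first, then split once."""
    return split_sentences("".join(nodes))


nodes = ["Hi. This ", "is a test."]
print(split_per_node(nodes))     # ['Hi. ', 'This ', 'is a test.']
print(split_whole_block(nodes))  # ['Hi. ', 'This is a test.']
```

Per-node splitting produces three pieces because the span boundary looks like the end of a sentence. Splitting the whole block gives the expected two, at the cost of having to build the full block text first.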

I hadn't realized that sentences were being broken at a span. That said, I don't know how much it bothers me, or how difficult it would be to fix in Chromium without impacting performance. I'll try to find some time to dig into that.

In the meantime, if I said I would be happy with something which matches what Chromium does currently, would you be ok with that?

At this stage, I don't really understand what Chrome's sentence iterator does internally; i.e. whether it just wraps ICU or whether it does other stuff too. So, I don't know how feasible this is in Gecko. As I said, we don't have a sentence iterator, since we don't use it for anything else.

We'd most likely need to implement this in core code, rather than just in our ATK layer, so I'm not super happy with this being a half-baked implementation that only considers the current text node. A lot of web content is split over many text nodes, so I'm not sure how useful that would be anyway. I guess we could try to find some way of doing this in an ATK-specific way, maybe an ifdef or something.

That's fair regarding not wanting to do something half-baked. And I've waited this long, so I can certainly wait a bit longer for a proper solution that you all feel good about.

Related aside: I noticed that this bug cites a related Mac bug, namely bug 1667219.

Related question: Do Windows screen readers not do a SayAll by sentence, or do they all just figure out the sentence boundaries themselves?

Because sentence boundaries tend to be inconsistently supported on Windows (I'm looking at Firefox but also other things here :) ), NVDA has a hacky sentence boundary implementation which is only used for say all. It is heavily biased towards languages written in Latin-based alphabets. However, because it's only used for say all, it doesn't matter much if it's wrong. It just means there might be some slightly unnatural breaks during say all, since NVDA will break after a certain number of lines (10, I think?) if it hasn't found a sentence break before then.

I'm not sure what JAWS does, but it's probably similar.
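The NVDA fallback described above can be modeled roughly as follows. This Python sketch is not NVDA's code. The punctuation regex, the function name say_all_chunks, and the default max_lines=10 (taken from the "10, I think?" guess) are all assumptions. The sketch groups lines into say-all chunks at Latin-style sentence ends. If no sentence end turns up within max_lines lines, it cuts the chunk anyway.

```python
import re

# Latin-centric guess at a sentence end: '.', '!' or '?' followed by
# whitespace or the end of the line. Wrong for many scripts and for
# abbreviations such as "e.g."; tolerable only because say all is forgiving.
_SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def _join(fragments):
    """Join line fragments into one chunk of speech."""
    return " ".join(f.strip() for f in fragments if f.strip())


def say_all_chunks(lines, max_lines=10):
    """Split `lines` into chunks to speak, roughly one sentence at a time.

    If no sentence end is seen within `max_lines` lines, the chunk is cut
    there so speech never waits on an unbounded amount of text.
    """
    chunks = []
    pending = []  # at most one fragment per line since the last sentence end
    for line in lines:
        rest = line
        m = _SENTENCE_END.search(rest)
        while m:
            # Everything up to and including the punctuation ends a sentence.
            pending.append(rest[: m.end()])
            chunks.append(_join(pending))
            pending = []
            rest = rest[m.end():]
            m = _SENTENCE_END.search(rest)
        if rest.strip():
            pending.append(rest)
        if len(pending) >= max_lines:
            # Forced break: no sentence end found within max_lines lines.
            chunks.append(_join(pending))
            pending = []
    if pending:
        chunks.append(_join(pending))
    return chunks


print(say_all_chunks(["Hello there. This sentence", "spans two lines. Done"]))
# ['Hello there.', 'This sentence spans two lines.', 'Done']
```

The forced break is what produces the "slightly unnatural breaks" mentioned above. A misdetected sentence end only shortens a chunk and never loses text.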

I'll try to look into ICU at some point to see how feasible that would be for this purpose.

We have mozilla::intl::SentenceBreakIteratorUtf16. It does need a complete string though, so we'll need to work out how to make it work across Accessibles. The easiest solution would be to pass the entire paragraph, but I do worry about performance.
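One way to read the "pass the entire paragraph" idea is to gather the text of every Accessible between block boundaries, run the break iterator once over the combined string, and then map each boundary back to the Accessible and local offset it falls in. The Python sketch below assumes the Accessibles' texts are given as a plain list of strings. It uses a naive regex in place of mozilla::intl::SentenceBreakIteratorUtf16, and the function names are hypothetical. It also makes the performance concern visible, since the whole paragraph must be joined before any boundary can be found.

```python
import re
from bisect import bisect_right

# Naive stand-in for a real sentence break iterator (assumption): a new
# sentence starts after '.', '!' or '?' plus following whitespace.
_BREAK = re.compile(r"(?<=[.!?])\s+")


def sentence_breaks(text):
    """Offsets in `text` where a new sentence starts (excluding 0 and the end)."""
    return [m.end() for m in _BREAK.finditer(text) if m.end() < len(text)]


def map_to_accessibles(texts, global_offsets):
    """Map offsets in the joined text to (accessible index, local offset).

    A boundary that coincides with the start of an Accessible is reported as
    offset 0 of that Accessible rather than the end of the previous one.
    """
    starts = []  # global offset at which each Accessible's text begins
    pos = 0
    for t in texts:
        starts.append(pos)
        pos += len(t)
    result = []
    for off in global_offsets:
        i = bisect_right(starts, off) - 1
        result.append((i, off - starts[i]))
    return result


texts = ["Hi. This ", "is a test. Bye."]  # e.g. a text leaf and a <span>
joined = "".join(texts)
print(map_to_accessibles(texts, sentence_breaks(joined)))
# [(0, 4), (1, 11)]
```

The result says that sentences start at offset 4 of the first Accessible ("This") and offset 11 of the second ("Bye."), which is the kind of mapping an ATK sentence query would need across Accessibles.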
