Incorrect results when get_text_at_offset() is used to get the last word on the line in wrapped text
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox67 | --- | fixed |
People
(Reporter: jdiggs, Assigned: samuel.thibault)
References
(Blocks 1 open bug)
Details
Attachments
(6 files, 10 obsolete files)
857 bytes,
text/plain
|
Details | |
283 bytes,
text/html
|
Details | |
2.30 KB,
patch
|
Details | Diff | Splinter Review | |
15.67 KB,
patch
|
samuel.thibault
:
review+
|
Details | Diff | Splinter Review |
17.40 KB,
patch
|
samuel.thibault
:
review+
|
Details | Diff | Splinter Review |
4.82 KB,
patch
|
samuel.thibault
:
review+
|
Details | Diff | Splinter Review |
1. Resize the Firefox window so that the attached test case's text wraps into the following four lines: View this text in a Firefox window so that the text wraps. Having done so, use get_text_at_offset() to get the word at the offset at the beginning of a line. Then repeat the test for the offset immediately before that word's start offset. 2. Start the attached event listener in a terminal. 3. Enable caret navigation in Firefox (F7). 4. Position the caret at the start of line 2, line 3, and line 4. Expected results: The terminal output would show the current word and the previous word. Actual results: The terminal output shows the current word twice. Impact: Orca, which has to implement its own caret navigation for Gecko, gets stuck at the beginning of each line of wrapped text when moving backwards word by word. Notes: 1. I repeated the test in Gedit and Epiphany/WebKitGtk. Both of them work as expected. Output for all three tests provided below. 2. This bug is present in at least versions 23 and 27.0a1 (2013-09-23). Results from performing the steps described above: ============================================== Firefox: Word at offset 59: <Having > Start: 59 End: 66 Word at offset 58: <Having > Start: 59 End: 66 Word at offset 119: <the > Start: 119 End: 123 Word at offset 118: <the > Start: 119 End: 123 Word at offset 179: <for > Start: 179 End: 183 Word at offset 178: <for > Start: 179 End: 183 ============================================== Gedit: Word at offset 59: <Having > Start: 59 End: 66 Word at offset 58: <wraps. > Start: 52 End: 59 Word at offset 119: <the > Start: 119 End: 123 Word at offset 118: <at > Start: 116 End: 119 Word at offset 179: <for > Start: 179 End: 183 Word at offset 178: <test > Start: 174 End: 179 ============================================== Epiphany: Word at offset 59: <Having > Start: 59 End: 66 Word at offset 58: <wraps. > Start: 52 End: 59 Word at offset 119: <the > Start: 119 End: 123 Word at offset 118: <at > Start: 116 End: 119 Word at offset 179: <for > Start: 179 End: 183 Word at offset 178: <test > Start: 174 End: 179 ==============================================
Reporter | ||
Comment 1•10 years ago
|
||
Comment 2•10 years ago
|
||
must be a dupe of bug 880159
Comment 3•10 years ago
|
||
one thing, Joanie, can you answer bug 880159 comment #10 please? It doesn't seem like your example relies on -2 magic offset.
Comment 4•10 years ago
|
||
I'm not convinced this is a duplicate unless I'm misunderstanding something. Bug 880159 covers the case where the caret is at the end of a wrapping line, so the caret seems to be at the start of the word. Joanie is retrieving the word for the offset *before* the start of the word, so it should return the previous word, no questions asked. Strangely, in Windows, this one seems to work as expected.
Comment 5•10 years ago
|
||
right, I missed that given offsets are line end and line end - 1 offsets
Assignee | ||
Comment 6•5 years ago
|
||
I have dived into this one, here is what happens, taking the case when we have the text wrapped this way:
Having done so, use get_text_at_offset() to get the word at
the offset at the beginning of a line. Then repeat the test
and we request the word at offset of the space just after "at", with the start-of-word boundary:
In nsIFrame::PeekOffset's eSelectWord case, a first call is made to current->PeekOffsetWord (i.e. nsTextFrame::PeekOffsetWord) with the content of the first line. This terminates immediately because we are already at the end of line. nsIFrame::PeekOffset then calls it a second time with the content of the second line, after setting state.SetSawBeforeType(); to account for the line jump equivalent to a whitespace. In nsTextFrame::PeekOffsetWord, aState->mAtStart is still true, and thus it believes it should continue within word "the". This is because state.Update() was in the end never called.
In the attached patch, I make nsIFrame::PeekOffset call state.Update() along state.SetSawBeforeType() to account for the fact that it is like we had processed a whitespace, and we should thus now stop at the next start of word.
It is being tested on https://treeherder.mozilla.org/#/jobs?repo=try&revision=cafad117118bba9e432f3b221fc4490ee1cec266
Assignee | ||
Comment 7•5 years ago
|
||
So it unbreaks some cases, but breaks others, notably words which get split because they just can't fit on a single line. In that case we indeed do not want to have such a linebreak to be taken as a whitespace. But do we have a way to know, from nsIFrame::GetFrameFromDirection, to know whether the line jump is from a split word (thus not whitespace), or from an expected line break (thus whitespace)? Will we just always have a frame with IsBrFrame()?
Actually, looking more into details in the "at" word case with the requested offset on the 'a', in nsTextFrame::PeekOffsetWord it looks at 'a', then 'n', then gets out of the loop without looking at the space after "at". when requesting from offset of the space character after "at", the loop exits immediately (like mentioned in previous comment) because of the same reason, and the next call directly starts with the next line, and thus the space was missed. In the case of "from" the space character is looked at. I'd say to have proper behavior the space character should also be looked at, because the linebreak only comes from lack of space, and in the split word case we don't actually want to consider it as a whitespace. So the ClusterIterator in that case shouldn't actually trim the whitespaces when used from nsIFrame::PeekOffset through PeekOffsetWord, perhaps we can just add a parameter to nsPeekOffsetStruct for HyperTextAccessible::FindOffset to specify it?
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 8•5 years ago
|
||
Jamie, do you have any opinion on my latest comment?
Comment 9•5 years ago
|
||
(In reply to Samuel Thibault from comment #7)
Actually, looking more into details in the "at" word case with the requested offset on the 'a', in nsTextFrame::PeekOffsetWord it looks at 'a', then 'n', then gets out of the loop without looking at the space after "at". when requesting from offset of the space character after "at", the loop exits immediately (like mentioned in previous comment) because of the same reason, and the next call directly starts with the next line, and thus the space was missed. In the case of "from" the space character is looked at. I'd say to have proper behavior the space character should also be looked at, because the linebreak only comes from lack of space, and in the split word case we don't actually want to consider it as a whitespace. So the ClusterIterator in that case shouldn't actually trim the whitespaces when used from nsIFrame::PeekOffset through PeekOffsetWord, perhaps we can just add a parameter to nsPeekOffsetStruct for HyperTextAccessible::FindOffset to specify it?
Daniel, how does the suggestion look ok from layout perspective? It seems it has no effect on keyboard navigation but unblocks a11y on text extraction issue.
Comment 10•5 years ago
|
||
This is the first I've seen this PeekOffset code, so I don't really know what its invariants are & can't comment on the correctness. (Looking at the hg blame surrounding this line, it looks pretty old, though it seems jfkthame has reviewed at least one patch that touched it in the past couple years (Bug 1375825) -- so I'm going to punt the needinfo to him in the hopes that he's got a bit more orientation for this code than I do.)
Assignee | ||
Comment 11•5 years ago
|
||
Mmm, Jamie, Joanmarie, in the case where a word gets split at end of line because it cannot fit in one line anyway, do we want to make the accessibility layer think that this is a word break, or not? I'd tend to say not since that's just a visual rendering, but the tests in accessible/tests/mochitest/text/test_atcaretoffset.html say yes (but we can change that).
Reporter | ||
Comment 12•5 years ago
|
||
(In reply to Samuel Thibault from comment #11)
Mmm, Jamie, Joanmarie, in the case where a word gets split at end of line because it cannot fit in one line anyway, do we want to make the accessibility layer think that this is a word break, or not? I'd tend to say not since that's just a visual rendering, but the tests in accessible/tests/mochitest/text/test_atcaretoffset.html say yes (but we can change that).
I would tend to lean toward yes BECAUSE it's a visual rendering. That said, ask your colleagues who are blind, and perhaps on the Orca list too, what they want and expect. Then, if you make Gecko do that, and Orca just trusts Gecko, everyone will be happy. :)
Assignee | ||
Comment 13•5 years ago
|
||
Here is what it looks like, https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c64b47fd864c8ee90aa614ee174642a192b6292 shows no regression, only unrelated intermitent-marked failures.
Comment 14•5 years ago
|
||
We really should have test coverage for this, so that we can verify the behavior (and detect any future regressions). Could you add appropriate tests (that fail currently, and pass with the fix) in a second patch?
Assignee | ||
Comment 15•5 years ago
|
||
Here is a test case, being tested with the fix on https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffa51d121c32acf66c71221e52307359ed866054
Assignee | ||
Comment 16•5 years ago
|
||
The try is positive.
Comment 17•5 years ago
|
||
Comment on attachment 9043360 [details] [diff] [review] patch Review of attachment 9043360 [details] [diff] [review]: ----------------------------------------------------------------- This is probably OK, but I'd be a lot happier with it if we could update the GetTrimmedOffsets method as suggested below. Also, AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6": <div id="d6a"> hello all </div> and then set up the corresponding tests, this patch will alter the results for it even though it's a single-line test. That may be fine -- I'm not sure the current behavior here is particularly intended or necessary -- but please look into this example and confirm whether the changed results are acceptable (and if so, let's add it to the test as well). ::: layout/generic/nsFrameSelection.h @@ +121,5 @@ > bool mJumpLines; > > + // mJumpLinesBreakWord: Whether jumping across line is to be considered as a > + // word break > + bool mJumpLinesBreakWord; What does "be considered as a word break" mean here? It's not clear to me from this comment how `mJumpLinesBreakWord == true` would differ from `mJumpLines == false` when using eSelectWord; either one of them sounds like it would make the line-boundary act as a break. Would it make sense to give this a name more closely linked to the space-trimming behavior that it affects? `bool mTrimSpaces` or something like that? The connection between "jumping across lines" and the flags eventually passed to GetTrimmedOffsets isn't obvious at the moment. ::: layout/generic/nsTextFrame.h @@ +575,5 @@ > int32_t GetEnd() const { return mStart + mLength; } > }; > TrimmedOffsets GetTrimmedOffsets(const nsTextFragment* aFrag, bool aTrimAfter, > + bool aPostReflow = true, > + bool aTrimBefore = true) const; This GetTrimmedOffsets API is kinda questionable already, having two trailing bool parameters (one of them with a default); it's awfully easy to get parameters mixed up and inadvertently pass the wrong bools to a method like this. Adding a third bool param, again with a default, makes it even worse. :( So although it makes for a bit more work, I'd be much happier if we revise this to take strongly-typed flags instead of multiple "interchangeable" bools. Rather than three separate typed flags, I'd be inclined to go with a single TrimmedOffsetFlags enum that can hold the three flags here, with suitably named values; and update all the callsites to clearly spell out which flags they're passing. (Looks like there are about 15 call sites to be updated, so this is fairly manageable. In fact, what I'd probably suggest would be to start with a patch that converts the existing two flags to a single typed-enum flags param, and update callers accordingly, with no change in behavior; and then a new version of this patch can add its flag to that enum.) Given that 'true' seems to be the more common value for all these flags, and in particular the new flag wants to default to true, you could choose to define them inversely in the flags enum, something like enum class TrimmedOffsetFlags : uint8_t { kDefaultTrimFlags = 0, kNoPostReflow = 1 << 0, kNoTrimAfter = 1 << 1, kNoTrimBefore = 1 << 2 }; so that the new flag for 'before' doesn't have to be explicitly added to all the existing callsites in order to maintain their behavior when it is introduced.
Assignee | ||
Comment 18•5 years ago
|
||
I'd be much happier if we revise this to take strongly-typed flags instead of multiple "interchangeable" bools.
Right, I was wondering about it too. I have made such a rework in the attached patch, successfully tested on https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b86dad68ba187bf56fecd4ec2f1436cd5907a3f&selectedJob=229258587
Assignee | ||
Comment 19•5 years ago
|
||
AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6":
<div id="d6a"> hello all </div>
My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.
I am currently testing by changes with the attached test changes.
::: layout/generic/nsFrameSelection.h
- // mJumpLinesBreakWord: Whether jumping across line is to be considered as a
- // word break
- bool mJumpLinesBreakWord;
What does "be considered as a word break" mean here?
[...]
Would it make sense to give this a name more closely linked to the space-trimming behavior that it affects?bool mTrimSpaces
or something like that?
Mmm, this sounds like a remnant of my previous thinking. Like Joanmarie said, my change lets the linebreak act as a word break, it just doesn't eat spaces. I'll fix that once we have the flag rework in.
Comment 20•5 years ago
|
||
Comment on attachment 9045159 [details] [diff] [review] patch Review of attachment 9045159 [details] [diff] [review]: ----------------------------------------------------------------- Thanks; although it's a bit verbose, I think this will help us keep track of what those flags mean much better than just true/false, especially once you add another flag in the main patch here. ::: layout/generic/nsTextFrame.h @@ +578,5 @@ > + kNoPostReflow = 1 << 0, > + kNoTrimAfter = 1 << 1 > + }; > + TrimmedOffsets GetTrimmedOffsets(const nsTextFragment* aFrag, > + TrimmedOffsetFlags aFlags) const; At this point, what do you think about giving aFlags a default value of kDefaultTrimFlags? I count around 6 callsites where we unconditionally pass kDefaultTrimFlags, so making it a default value here would allow those places to be more concise, while having explicit flag names wherever we want non-default behavior.
Updated•5 years ago
|
Comment 21•5 years ago
|
||
(In reply to Samuel Thibault from comment #19)
AFAICT this patch will change behavior in cases that don't even involve any line-breaks. If you add a test element to test_wordboundary.html that has extra spaces that will be collapsed by normal white-space processing, e.g. as a variation on id="d6":
<div id="d6a"> hello all </div>
My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.
Are you sure about this? I tried adding this version of test d6 with extra (collapsible) spaces. It passes as written with current code, but if I apply your patch, it starts failing, which indicates that behavior does change. I didn't look into it deeply but I think the handling of the trailing space must have changed.
Assignee | ||
Comment 22•5 years ago
|
||
Here is an updated version. The try server seems to have a hard time finding a host to run a few of the tests, but the result seems positive:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=10ac0f920f34be53f6298ecabe1c563d8082f1bc
Updated•5 years ago
|
Assignee | ||
Comment 23•5 years ago
|
||
My patch does not actually change the behavior here. With the current firefox, a trailing space in d6 would already show up in the gettext results. The heading space is eaten both by the current firefox and with my change. And intermediate spaces are collapsed in both cases.
Are you sure about this? I tried adding this version of test d6 with extra (collapsible) spaces. It passes as written with current code, but if I apply your patch, it starts failing, which indicates that behavior does change. I didn't look into it deeply but I think the handling of the trailing space must have changed.
Ah, indeed.
So the change is when the passed offset is beyond the end of the last word, i.e. after the space after the last word. My change makes getTextAtOffset return the space in that case, while it was returning the last word previously. Actually it wasn't just the last word, but " all ", which looks very odd for an end-of-word boundary. I'd say the new behavior is more coherent, actually. I have here attached the resulting testcases.
Comment 24•5 years ago
|
||
Yes, this seems reasonable to me, I just wanted to be sure we're aware of what's changing (and add test coverage so that any future changes in this area are more likely to be noticed, too).
Comment 25•5 years ago
|
||
Comment on attachment 9045255 [details] [diff] [review] flags-patch Review of attachment 9045255 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, thanks. Re-reading the patch, I did wonder if it would be clearer to rename kNoPostReflow as kNotPostReflow? If you think that makes sense, feel free to rename it and carry over the r+.
Assignee | ||
Comment 26•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 27•5 years ago
|
||
Thanks!
Ah, I can't add the checkin-needed keyword myself, because the bug is not assigned to me.
Comment 28•5 years ago
|
||
That we can fix!
However, while we could safely check in the first patch by itself, I'd actually suggest waiting until the full set is ready (which looks like it shouldn't take much now), and landing all together. It tends to complicate things if we land patches one by one within a single bug, especially if any uplifts or backouts start to get involved.
Assignee | ||
Comment 29•5 years ago
|
||
Here is the reworked patch on top of the flags-patch
Assignee | ||
Comment 30•5 years ago
|
||
Here are the reworked testcases.
The patch is tested (without the additional testcases) on
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1950b6ca60d02a114052cb257d7ec7a46521634c&selectedJob=229876841
With the additional testcases:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3232620dfc3de9767cc9763a2d8bd21a3b69f944
Assignee | ||
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 31•5 years ago
|
||
While trying to apply any of the two patches in my console to land in on the integration trees i get this error:
applying patch1
patching file layout/generic/nsTextFrame.cpp
Hunk #1 FAILED at 2960
Hunk #3 FAILED at 7950
2 out of 4 hunks FAILED -- saving rejects to file layout/generic/nsTextFrame.cpp.rej
patching file layout/generic/nsTextFrame.h
Hunk #2 FAILED at 571
1 out of 2 hunks FAILED -- saving rejects to file layout/generic/nsTextFrame.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh patch1
Can you take a look into it?
Assignee | ||
Comment 32•5 years ago
|
||
Sorry, I forgot to mention that flags-patch (patch0) needs to be applied first, then patch (patch1), then testcases (patch2)
Comment 33•5 years ago
|
||
Hi Samuel. I've tried every combination to land these patches, always get hunks failed because they are not applied in their order. Please take a look and try to fix it and then re-add chekin-needed.
Examples: https://irccloud.mozilla.com/file/uZNuG6OG/image.png
https://irccloud.mozilla.com/file/CKVeQIl0/image.png
https://irccloud.mozilla.com/file/w1a7kEUY/image.png
Also, please update the commit messages for each patch so we know which one is which.
Assignee | ||
Comment 34•5 years ago
|
||
Assignee | ||
Comment 35•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 36•5 years ago
|
||
Assignee | ||
Comment 37•5 years ago
|
||
Ok, here it is respinned:
- first flags-patch
- then patch
- then testcases
Comment 38•5 years ago
|
||
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a2d905ec0a76
nsTextFrame::GetTrimmedOffsets: Rework flag parameters r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c9191097079
layout: Do not trim spaces when inspected from accessibility layer r=jfkthame
https://hg.mozilla.org/integration/mozilla-inbound/rev/4dee747dc404
layout: add more tests for text inspection from accessibility layer r=jfkthame
Comment 39•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a2d905ec0a76
https://hg.mozilla.org/mozilla-central/rev/6c9191097079
https://hg.mozilla.org/mozilla-central/rev/4dee747dc404
Updated•4 years ago
|
Description
•