1346535 - atk_text_get_get() return value incorrect after multibyte character

Reporter

Description

•

9 years ago

Steps to reproduce: 1. Load data:text/html;charset=utf-8,<div>Hello %F0%9F%92%97 world</div> 2. Use Accerciser to find the accessible object corresponding with the <div> element. 3. In the ipython console, type the following: ----- text = acc.queryText() allText = text.getText(0,-1) ["%s %s" % (allText[i], text.getText(i, i+1)) for i in range(len(allText))] ----- Expected results: The values would match. Actual results: The values match only until the multibyte character is reached. It seems like the get text implementation is assuming single-byte chars (see example output below). Also note that performing the same test using the Gedit text editor works as expected. ['H H', 'e e', 'l l', 'l l', 'o o', ' ', '

Joanmarie Diggs

Reporter

Comment 1

•

9 years ago

Interesting. It appears bugzilla doesn't like multibyte characters either as I had pasted more output than the above. Trying again (the full output shows up in preview.) ==== Firefox ==== ['H H', 'e e', 'l l', 'l l', 'o o', ' ', '

Joanmarie Diggs

Reporter

Comment 2

•

9 years ago

Attached file broken-multibyte.txt - Full output of performing steps in opening report — Details

(Attaching output because Bugzilla doesn't want to display it, contrary to what the Preview suggests.)

alexander :surkov (:asurkov)

Updated

•

9 years ago

Blocks: texta11y

David Bolter [:davidb] (NeedInfo me for attention)

Comment 3

•

8 years ago

Joanie what is the user impact?

Flags: needinfo?(jdiggs)

Priority: -- → P3

Joanmarie Diggs

Reporter

Comment 4

•

8 years ago

The user impact is that Orca will present the wrong character to the user if it doesn't hack around this.

Flags: needinfo?(jdiggs)

Joanmarie Diggs

Reporter

Comment 5

•

8 years ago

Make that the wrong substring, which might be a character or might not. Related: Orca would be attempting to access content in this fashion as a result of encountering a bogus return value when trying to get text by char, word, line, or sentence at offset. So this bug here boils down to: When hacking around some other text implementation bug, Orca must also beware the presence of multibyte chars.

Samuel Thibault

Assignee

Comment 6

•

8 years ago

It seems it happens only for characters whose utf-16 encoding needs more more than one 16bit value: U+FFFD works fine, but U+10000 doesn't work. That's coherent with the shift that shows up in Joanmarie's test: the character counts as two, and for both U+FFFD is returned to accerciser. I guess HyperTextAccessible::TextSubstring etc. offsets are utf-16 offsets then? (I guess that is coming from the Windows world) That's a real concern because that shifts everything as soon as on web pages there are characters not in unicode16, e.g. very notably emojis...

alexander :surkov (:asurkov)

Comment 7

•

8 years ago

It appears Gecko indeed stores strings as double-bytes arrays [1], so layout we query for rendered text also operates with XPCOM strings [2]. Daniel, do you know if there are any issues with non unicode16 characters support in layout (RenderedText in particular)? [1] https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Glue_classes/nsAString [2] https://dxr.mozilla.org/mozilla-central/source/layout/generic/nsIFrame.h?q=nsIFrame%3A%3ARenderedText&redirect_type=direct#2622

Flags: needinfo?(dholbert)

Daniel Holbert [:dholbert]

Comment 8

•

8 years ago

I don't know - punting my needinfo to jfkthame who's more likely to know.

Flags: needinfo?(dholbert) → needinfo?(jfkthame)

Jonathan Kew [:jfkthame]

Comment 9

•

8 years ago

(In reply to Samuel Thibault from comment #6) > I guess HyperTextAccessible::TextSubstring etc. offsets are utf-16 offsets > then? (I guess that is coming from the Windows world) That's a real concern > because that shifts everything as soon as on web pages there are characters > not in unicode16, e.g. very notably emojis... I don't think this comes from the Windows world in particular; it's simply that the DOM interfaces are entirely built around 16-bit character codes (i.e. UTF-16 code units, NOT necessarily Unicode characters). So if HyperTextAccessible is dealing with DOM offsets and ranges, those will indeed be based on UTF-16. Perhaps this needs to be clarified, e.g. in the nsIAccessibleText interface, which seems to refer to "offset" and "count" values without being explicit about what they're counting. I would expect an element containing a single emoji character to report a length of 2, for example. (Or even more: e.g. there could be a Fitzpatrick skin-tone modifier, resulting in a length of 4 for the single emoji. DOM interfaces don't abstract this away, and I don't expect the a11y interfaces do so either.) So in the example from comment 0, the problem (I think) is that: text.getText(i, i+1) is explicitly reading a single *code unit* from a UTF-16 string, which results in an unpaired surrogate.

Flags: needinfo?(jfkthame)

Samuel Thibault

Assignee

Comment 10

•

8 years ago

I referred to the Windows world in the sense that there, wchar_t is 16bit, while on Unix wchar_t is 32bit and thus gets completely away with the need to care about surrogates. The example of code is at-spi-based, which manipulates unicode characters in utf-8, and so getText(i, i+1) does make sense there (and how would it know that it's a surrogate without reading it?). In at-spi, offsets etc. count unicode characters, simply. With a skin-tone modifier, that would get 2 unicode characters. It happens that accessible/atk/nsMaiInterfaceText.cpp's getTextCB() gets only one surrogate, and when it calls NS_ConvertUTF16toUTF8 that returns U+FFFD. To make conversions properly, everything there should take care whether there are non-unicode16 charaacters, and fix lengths and offsets appropriately :/

Jonathan Kew [:jfkthame]

Comment 11

•

8 years ago

(In reply to Samuel Thibault from comment #10) > The example of code is at-spi-based, which manipulates unicode characters in > utf-8, and so getText(i, i+1) does make sense there (and how would it know > that it's a surrogate without reading it?). In at-spi, offsets etc. count > unicode characters, simply. With a skin-tone modifier, that would get 2 > unicode characters. It sounds like there's a mismatch, then, between the interpretation of offsets/lengths in nsIAccessibleText and the expectations of the at-spi interface. Maybe we should resolve that within HyperTextAccessible? It'll presumably come with some performance cost, though, as string indexing will no longer be straightforward. And I don't know whether it would break anything that's depending on the current behavior, where offsets in that interface relate directly to DOM offsets.

Samuel Thibault

Assignee

Comment 12

•

8 years ago

I guess it would break the iaccessible2 interface on Windows where unicode is encoded with 16bit values, yes. DocAccessibleChild.cpp also returns nsString. It should rather be handled in nsMailInterfaceText.cpp: that's where NS_ConvertUTF16toUTF8 is called just before handing the text to atk. I'm thinking that one way to avoid the indexing shifts issues would be to use a UTF16toUTF8 converter which converts surrogate pairs into two characters: the non-unicode16 character and a BOM (U+FEFF). And in the cases where nsMailInterfaceText.cpp gets from TextSubstring some text starting with a trailing surrogate, it would replace it with U+FEFF, and in the cases where it ends with a heading surrogate, it would call TextSubstring again with increased length to get the non-unicode16 character (or just always do it and ignore the additional character if not needed). Other alternatives look quite invasive to me: parsing the whole text each time to compute the length? adding to nsString a variant of the Length method to avoid recomputing the size each time? And adding to it an Append variant that indexes unicode characters, etc. etc.

James Teh [:Jamie]

Comment 13

•

8 years ago

(In reply to Samuel Thibault from comment #12) > I'm thinking that one way to avoid the indexing shifts issues would be to > use a UTF16toUTF8 converter which converts surrogate pairs into two > characters: the non-unicode16 character and a BOM (U+FEFF). And in the cases > where nsMailInterfaceText.cpp gets from TextSubstring some text starting > with a trailing surrogate, it would replace it with U+FEFF, and in the cases > where it ends with a heading surrogate, it would call TextSubstring again > with increased length to get the non-unicode16 character (or just always do > it and ignore the additional character if not needed). That's an interesting and creative idea. :) So to clarify, for "Hello U+1F497 world": Retrieving the entire string would return: "Hello U+1F497U+FEFF world" Retrieving (6, 7) would return U+1F497 Retrieving (7, 8) would return U+FEFF That could make things a bit ugly for braille routing, though. We also have another bug here which is that the character boundary doesn't handle characters which deal with multiple Unicode code points. For example, the sequence U+091C U+094D U+091E is actually a single "character" semantically, even though it consists of 3 Unicode code points (and this doesn't even have any surrogates). However, we should file a separate bug for that.

Samuel Thibault

Assignee

Comment 14

•

8 years ago

(In reply to James Teh [:Jamie] from comment #13) > So to clarify, for "Hello U+1F497 world": > Retrieving the entire string would return: "Hello U+1F497U+FEFF world" > Retrieving (6, 7) would return U+1F497 > Retrieving (7, 8) would return U+FEFF Yes. > That could make things a bit ugly for braille routing, though. How so? I mean, braille already has issues, since transcription is not 1to1 anyway. U+FEFF can be transcribed into an empty braille cell set (like other invisible zero-width characters), conversely to U+1F497 already being transcribed into several braille cells. I don't see where the problem is worse than what we already have. > We also have another bug here which is that the character boundary doesn't > handle characters which deal with multiple Unicode code points. For example, > the sequence U+091C U+094D U+091E is actually a single "character" > semantically, even though it consists of 3 Unicode code points (and this > doesn't even have any surrogates). However, we should file a separate bug > for that. Well, character-encoding-wise, they are indeed 3 characters, and string functions treat them as 3 separate characters, this is coherent, I'd say we only need to keep that coherency. Then there is the actual rendering which is a complex thing indeed, but that can be all handled between liblouis/orca/nvda, firefox does not have to be involved. My concern is that if we try to do anything else than sticking to unicode code points, we will get a lot of troubles (double-width characters, zero-width characters, etc.) which should really matter only for rendering, and not for transmission of the text.

James Teh [:Jamie]

Comment 15

•

8 years ago

(In reply to Samuel Thibault from comment #14) > > That could make things a bit ugly for braille routing, though. > How so? I mean, braille already has issues, since transcription is not 1to1 > anyway. U+FEFF can be transcribed into an empty braille cell set (like other > invisible zero-width characters), That requires the translator or tables know about U+FEFF, and I don't know that they currently do. However, I guess you could just use lou_compileString (I think that's what it's called) to insert a rule at runtime, so that should work. > > We also have another bug here which is that the character boundary doesn't > > handle characters which deal with multiple Unicode code points. > Well, character-encoding-wise, they are indeed 3 characters, and string > functions treat them as 3 separate characters, this is coherent, I'd say we > only need to keep that coherency. I mean the character "boundary", not the offsets. The example should consume 3 offsets, yes, since it consists of 3 code points. However, it is semantically a single character and should be considered as such when cursoring, for example. So, IAccessibleText::textAtOffset(0, IA2_TEXT_BOUNDARY_CHAR) should return offsets (0, 3). I'm not sure what the ATK equivalent is, but I imagine there would be one.

Jonathan Kew [:jfkthame]

Comment 16

•

8 years ago

(In reply to James Teh [:Jamie] from comment #13) > That's an interesting and creative idea. :) So to clarify, for "Hello > U+1F497 world": > Retrieving the entire string would return: "Hello U+1F497U+FEFF world" > Retrieving (6, 7) would return U+1F497 > Retrieving (7, 8) would return U+FEFF Given an element with the text "Hello U+1F497 world", wouldn't a user of the at-spi interface expect that retrieving (8, 13) should return "world"? But under this scheme it will instead return " worl" because of the "fake" character that's inserted, shifting the offsets of everything that follows.

James Teh [:Jamie]

Comment 17

•

8 years ago

(In reply to Jonathan Kew (:jfkthame) from comment #16) > Given an element with the text "Hello U+1F497 world", wouldn't a user of the > at-spi interface expect that retrieving (8, 13) should return "world"? The client can only know what to "expect" based on something they retrieved earlier. In this scheme, the client never actually gets "Hello U+1F497 world". Instead, we lie to the client and they get "Hello U+1F497U+FEFF world" when they retrieve the entire string. After that, any offsets to which the client refers will be based on this string (the one with the U+FEFF we lied about).

Samuel Thibault

Assignee

Comment 18

•

8 years ago

(In reply to James Teh [:Jamie] from comment #15) > (In reply to Samuel Thibault from comment #14) > > > That could make things a bit ugly for braille routing, though. > > How so? I mean, braille already has issues, since transcription is not 1to1 > > anyway. U+FEFF can be transcribed into an empty braille cell set (like other > > invisible zero-width characters), > > That requires the translator or tables know about U+FEFF, and I don't know > that they currently do. Well, not a problem that can't be solved :) They really have to care anyway, since Unicode text files do often include the BOM at the beginning. > However, I guess you could just use lou_compileString (I think that's what it's called) to insert a rule at runtime, so that should work. Until that gets fixed in liblouis, indeed. > > > We also have another bug here which is that the character boundary doesn't > > > handle characters which deal with multiple Unicode code points. > > Well, character-encoding-wise, they are indeed 3 characters, and string > > functions treat them as 3 separate characters, this is coherent, I'd say we > > only need to keep that coherency. > > I mean the character "boundary", not the offsets. The example should consume > 3 offsets, yes, since it consists of 3 code points. However, it is > semantically a single character and should be considered as such when > cursoring, for example. So, IAccessibleText::textAtOffset(0, > IA2_TEXT_BOUNDARY_CHAR) should return offsets (0, 3). I'm not sure what the > ATK equivalent is, but I imagine there would be one. There is an ATSPI_TEXT_BOUNDARY_CHAR, but it sticks to the unicode notion of code points: « @ATSPI_TEXT_BOUNDARY_CHAR: An #AtspiText instance is bounded by this character only. Start and end offsets differ by one, by definition, for this value. » So it will be coherent for at-spi (and the change I propose is only for at-spi, not ia2).

Jonathan Kew [:jfkthame]

Comment 19

•

8 years ago

(In reply to James Teh [:Jamie] from comment #17) > (In reply to Jonathan Kew (:jfkthame) from comment #16) > > Given an element with the text "Hello U+1F497 world", wouldn't a user of the > > at-spi interface expect that retrieving (8, 13) should return "world"? > > The client can only know what to "expect" based on something they retrieved > earlier. In this scheme, the client never actually gets "Hello U+1F497 > world". Instead, we lie to the client and they get "Hello U+1F497U+FEFF > world" when they retrieve the entire string. After that, any offsets to > which the client refers will be based on this string (the one with the > U+FEFF we lied about). I don't know much about these interfaces, or how they're used, but isn't there still potential for confusion/problems if the client itself has previously set the text of an element -- so it knows exactly what it "expects" to be there -- but then we map offsets differently than it expects? So, client sets text of an editable element to "Hello U+1F497 world", and then wants to set the selection to the word "world", so it sets it to (8, 13)... and misses its target.

Samuel Thibault

Assignee

Comment 20

•

8 years ago

In at-spi2 there is actually no way for clients to set the text of an element. Element modifications go through the usual input methods, with all the potential for conflicts with other input sources, and thus the screen reader already has to care for what actual text insertion events happen.

Jonathan Kew [:jfkthame]

Comment 21

•

8 years ago

OK, clearly I don't understand the pieces involved here. Looking at https://github.com/GNOME/at-spi2-core/tree/master/xml, for example, I see an EditableText interface that looks like it can set/modify text, with methods like SetTextContents, InsertText, CutText, etc. Is that not potentially an issue?

Samuel Thibault

Assignee

Comment 22

•

8 years ago

Oh, sorry, that's right. It seems it has been too long a time since I actually looked at the whole at-spi interface, and Orca does not actually make any use of it, so to my mind it just didn't exist at all :) The conflicting thing still applies: a screen reader should not expect the resulting text to be exactly what it set, since other kinds of input could intervene in the meanwhile.

James Teh [:Jamie]

Comment 23

•

8 years ago

(In reply to Samuel Thibault from comment #18) > There is an ATSPI_TEXT_BOUNDARY_CHAR, but it sticks to the unicode notion of > code points: Fair enough; if that's what the spec says, that's what we should do. Still, this seems flawed. That makes it impossible for a screen reader to report the semantic character correctly when cursoring through text, since pressing right/left arrow will consider all 3 code points as just one position in a well behaved editor. There are also other cases of this; e.g. a word processor where a date field occupies a single "character" stop when cursoring.

Alex ARNAUD

Comment 24

•

8 years ago

Hello all, I've found a real life example to test this issue. Steps to reproduce: 1) Open Firefox without e10s (maybe the bug exists with e10s but I don't use e10s for now) 2) Go to https://blog.nightly.mozilla.org/2018/01/18/these-weeks-in-firefox-issue-30/ 3) Go to the heading 2 "Project Updates" 4) Press up arrow Result: Orca reads "New contributors (

Alex ARNAUD

Comment 25

•

8 years ago

Maybe there is a mistake with bugzilla, it seems that Unicode characters break the end of my message. Result: Orca reads "New contributors ( = First Patch!)" Expected result: Orca should read "Steve Pfister fixed a bug where the SafeBrowsing warning notification bar wouldn’t go away after a page navigation, and made sure that the tab title was properly set in about:url-classifier" Remove the star character solves the issue. Best regards.

Joanmarie Diggs

Reporter

Comment 26

•

8 years ago

Unfortunately, Orca has to hack around various and sundry bugs in the AtkText implementation. If I don't get the hack quite right, things like skipping text or getting stuck can occur. I'll see if something like that is going on and try to work around this particular instance. That said, fixing all the AtkText bugs in Gecko would be super appreciated! :)

Alex ARNAUD

Comment 27

•

8 years ago

(In reply to Joanmarie Diggs from comment #26) > Unfortunately, Orca has to hack around various and sundry bugs in the > AtkText implementation. If I don't get the hack quite right, things like > skipping text or getting stuck can occur. I'll see if something like that is > going on and try to work around this particular instance. In the context of Hypra, Samuel are working on this bug. So I hope you'll be able to remove your workaround in the near future. > That said, fixing all the AtkText bugs in Gecko would be super appreciated! > :) Right, could you point us what bugs are the most problematic ? To be honest we've difficulties to understand if those bugs have a user impact until we are ourselves impacted by the bugs. You've provided lot of good workaround to make our lives better. As we're overloaded we focus on most important bugs. You maintain the screen reader, we (hypra) maintain the screen magnifier (ezoom on Compiz), it's also a really huge work. You could find the bug we plan to work on here : https://bugs.debian.org/cgi-bin/pkgreport.cgi?users=bugs@hypra.fr;tag=hypra Best regards.

Samuel Thibault

Assignee

Comment 28

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is what it would look like: the UTF16toUTF8 converter is added an aSurrBOM parameter which defaults to false, that the atk layer can use to make it add the BOMs along non-unicode16 values. atk can then request one more character on each side to be sure to get the paired surrogates, and remove them before pushing to at-spi.

Attachment #8947407 - Flags: review?(surkov.alexander)

Jonathan Kew [:jfkthame]

Comment 29

•

8 years ago

Drive-by comment: I don't feel good about pushing this hack all the way into xpcom/string code; if it's necessary to do something like this, I think it should be kept local to the a11y code and not added to core string APIs.

Samuel Thibault

Assignee

Comment 30

•

8 years ago

Ok, we can probably insert the BOMs along the non-unicode16 after conversion, I'll have a look.

Alex ARNAUD

Comment 31

•

8 years ago

(In reply to Alex ARNAUD from comment #24) > Hello all, > > I've found a real life example to test this issue. > > Steps to reproduce: > 1) Open Firefox without e10s (maybe the bug exists with e10s but I don't use > e10s for now) > 2) Go to > https://blog.nightly.mozilla.org/2018/01/18/these-weeks-in-firefox-issue-30/ > 3) Go to the heading 2 "Project Updates" > 4) Press up arrow > > Result: Orca reads "New contributors ( The Samuel's patch resolves the issue with this test case. Best regards.

alexander :surkov (:asurkov)

Comment 32

•

8 years ago

Comment on attachment 8947407 [details] [diff] [review] multibyte Review of attachment 8947407 [details] [diff] [review]: ----------------------------------------------------------------- r? to david baron, the xpcom strings owner

Attachment #8947407 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 33

•

8 years ago

Comment on attachment 8947407 [details] [diff] [review] multibyte This isn't possible to review without code comments explaining what this new boolean parameter does. Please either: * add them and request review again * refactor along the lines of comment 29. I probably prefer the latter, since I think this adds overhead for everybody else using these APIs (APIs that were previously quite straightforward).

Attachment #8947407 - Flags: review?(dbaron) → review-

alexander :surkov (:asurkov)

Comment 34

•

8 years ago

Comment on attachment 8947407 [details] [diff] [review] multibyte Samuel, will you redesgin the patch to address David's comment, right?

Attachment #8947407 - Flags: review?(surkov.alexander)

Samuel Thibault

Assignee

Comment 35

•

8 years ago

I will, yes, I'm just currently busy right now by other things :)

alexander :surkov (:asurkov)

Comment 36

•

8 years ago

(In reply to Samuel Thibault from comment #35) > I will, yes, I'm just currently busy right now by other things :) sure, please take your time, and thank you for doing this. just review request was bumping me daily :)

Samuel Thibault

Assignee

Comment 37

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is a reworked patch, which adds the BOMs after translating to UTF-8.

Attachment #8947407 - Attachment is obsolete: true

Attachment #8952348 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 38

•

8 years ago

Comment on attachment 8952348 [details] [diff] [review] multibyte Review of attachment 8952348 [details] [diff] [review]: ----------------------------------------------------------------- nits and styling issues ::: accessible/atk/nsMaiInterfaceText.cpp @@ +205,5 @@ > + > + return true; > +} > + > +// Add a BOM after each non-unicode16 character of str nit: it'd be nice have a comment explaining what BOM is nit: dot in the end of the sentence @@ +212,5 @@ > +{ > + uint32_t n; > + > + // First compute how much room we will need > + n = 0; nit: uint32_t n = 0; or I guess you can use auto n = 0U; @@ +261,5 @@ > + // Unicode16, just copy the byte. > + ret.Replace(reti++, 1, str[i]); > + } > + } > + NS_ASSERTION(reti == n, "Bogus conversion!?"); MOZ_ASSERT @@ +274,4 @@ > { > AccessibleWrap* accWrap = GetAccessibleWrap(ATK_OBJECT(aText)); > nsAutoString autoStr; > + bool aStartShifted = false, aEndShifted = false; nit: local variables should not have 'a' prefix; it is used for arguments @@ +282,5 @@ > return nullptr; > > + PrepareUTF16toUTF8(&aStartOffset, &aEndOffset, > + &aStartShifted, &aEndShifted, > + text->IsDefunct() ? 0 : it'd say to return early if an accessible is defunct, since we cannot guarantee any text for a defunct accessible. @@ +304,5 @@ > + } > + nsAutoCString cautoStrBOM = AddBOMs(cautoStr); > + if (!FinishUTF16toUTF8(cautoStrBOM, aStartShifted, aEndShifted)) { > + return nullptr; > + } I would really prefer to wrap these by class similar to NS_ConvertUTF16toUTF8 which takes string, offsets and character count

Samuel Thibault

Assignee

Comment 39

•

8 years ago

> I would really prefer to wrap these by class similar to NS_ConvertUTF16toUTF8 which takes string, offsets and character count Mmm, not sure about the details. - AIUI, by "wrap these by class", you mean like nsString.h's class NS_ConvertUTF16toUTF8 which is a trivial wrapper around the AppendUTF16toUTF8 function which does the actual work of copying from one string to another, right? I.e. something like class NS_AddBOMs : public nsAutoCString { public: explicit NS_AddBOMS(const nsAString& aString) { AddBOMs(aString, *this); } } - Should this still be put within nsMaiInterfaceText.cpp, not to be stuffed into nsString.h? - what do you mean by "offsets and character count"? I don't see them in the NS_ConvertUTF16toUTF-8, and for AddBOMs, it just needs the string anyway, no other information is needed. For FinishUTF16toUTF8, the additional needed information beyond the string is whether the heading and/or the trailing characters should be removed, I didn't really plan to implement generic substringing.

Flags: needinfo?(surkov.alexander)

Samuel Thibault

Assignee

Comment 40

•

8 years ago

Actually, while I see how to add a wrapper for AddBOMs, I'm not sure about FinishUTF16toUTF8: what should I do on failure? Just assertions? I'm a bit afraid of depending on TextSubstring() providing expected data.

alexander :surkov (:asurkov)

Comment 41

•

8 years ago

(In reply to Samuel Thibault from comment #39) > > I would really prefer to wrap these by class similar to NS_ConvertUTF16toUTF8 which takes string, offsets and character count > > Mmm, not sure about the details. > > - AIUI, by "wrap these by class", you mean like nsString.h's class > NS_ConvertUTF16toUTF8 which is a trivial wrapper around the > AppendUTF16toUTF8 function which does the actual work of copying from one > string to another, right? I.e. something like > > class NS_AddBOMs : public nsAutoCString > { > public: > explicit NS_AddBOMS(const nsAString& aString) > { > AddBOMs(aString, *this); > } > } > I meant it'd be nice to wrap PrepareUTF16toUTF8/AddBOMs/FinishUTF16toUTF8 by a single call, it could be: class UTIF16ToMultibyteUTF8 { bool UTIF16ToMultibyteUTF8(const gint aStartOffset, const gint aEndOffset, gint aCount, nsAutoString &aStr) { NS_ConvertUTF16toUTF8 str(aStr); auto startOffset = aStartOffset; auto endOffset = aEndOffset; auto startShifted = ... PrepareUTF16toUTF8(&startOffset, &endOffset, ..) nsAutoCString cautoStrBOM = AddBOMs(cautoStr); if (!FinishUTF16toUTF8(cautoStrBOM, aStartShifted, aEndShifted)) { } } // Returns a string if the conversion was successful. char* get() {} } Actually I'm flexible how it could be implemented, it may be a class or namespace, the point is it'd be good to scope PrepareUTF16toUTF8/AddBOMs/FinishUTF16toUTF8, so the caller has zero chance for error. > - Should this still be put within nsMaiInterfaceText.cpp, not to be stuffed > into nsString.h? David said in comment #33 that his preference would be not to add this into string API. So it could be either nsMaiInterfaceText.cpp or a new file in atk folder. > - what do you mean by "offsets and character count"? just arguments of a new function/ctor

Flags: needinfo?(surkov.alexander)

Samuel Thibault

Assignee

Comment 42

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Oh, I see the idea now, notably it's the . Here is a worked patch. I still had to split the NS_DOMtoATK work in two pieces, since there's an interlacing between getting the HyperTextAccessible or the ProxyAccessible and its size, the preparation, the TextSubstring call, and the eventual Conversion. The addition of a class however makes it look nice indeed.

Attachment #8952348 - Attachment is obsolete: true

Attachment #8952348 - Flags: review?(surkov.alexander)

Attachment #8953378 - Flags: review?(surkov.alexander)

Samuel Thibault

Assignee

Comment 43

•

8 years ago

"it's the" -> it's the keeping of startshifted/endshifted which is useful to wrap.

alexander :surkov (:asurkov)

Comment 44

•

8 years ago

Comment on attachment 8953378 [details] [diff] [review] multibyte Review of attachment 8953378 [details] [diff] [review]: ----------------------------------------------------------------- ::: accessible/atk/nsMaiInterfaceText.cpp @@ +270,5 @@ > + } > + MOZ_ASSERT(desti == n); > +} > + > +class NS_DOMtoATK : public nsAutoCString I lean towards to move all of this code into a separate file: 300 new lines into 600 lines file, which are not closely related to each other. Sounds good with you? and please could you add a comment to the class describing what it's about? @@ +276,5 @@ > +public: > + void Prepare(gint *aStartOffset, gint *aEndOffset, gint count) > + { > + PrepareUTF16toUTF8(aStartOffset, aEndOffset, > + &StartShifted, &EndShifted, count); wouldn't be nicer to keep get rid of PrepareUTF16toUTF8 and move its content here? @@ +319,5 @@ > ConvertTexttoAsterisks(accWrap, autoStr); > } else if (ProxyAccessible* proxy = GetProxy(ATK_OBJECT(aText))) { > + > + AtkStr.Prepare(&aStartOffset, &aEndOffset, > + proxy->CharacterCount()); if you had a variable to keep character count like uint32_t charCount = 0; if (accWrap) { charCount = text->CharacterCount(); } else { charCount = proxy->CharacterCount(); } then you could move Prepare and NS_ConvertUTF16toUTF inside of AtkString by AtkString cautoStr(aStartOffset, aEndOffset, charCount); return (cautoStr.get()) ? g_strdup(cautoStr.get()) : nullptr;

Samuel Thibault

Assignee

Comment 45

•

8 years ago

> I lean towards to move all of this code into a separate file Why not indeed :) > wouldn't be nicer to keep get rid of PrepareUTF16toUTF8 and move its content here? I was wondering it indeed. Keeping it as separate function like the two others looked simpler to grasp when reading: three functions which work together, and a class which wraps the way to call them. > you could move Prepare and NS_ConvertUTF16toUTF inside of AtkString by No, because of the interleaving: one has to call Prepare to get Start/EndOffset modified, then use one or the other TextString, before converting to UTF8, adding BOMs and finishing. At best we could have HyperTextAccessible* text ProxyAccessible* proxy gint count; if (accWrap) { text = accWrap->AsHyperText(); count = static_cast<gint>(text->CharacterCount()); } else if (proxy = GetProxy(ATK_OBJECT(aText))) { count = proxy->CharacterCount(); } else { return nullptr; } AtkString cautoStr(&aStartOffset, &aEndOffset, charCount); if (accWrap) { text->TextSubstring(aStartOffset, aEndOffset, autoStr); } else { proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); } if (!cautoStr.Convert(autoStr)) { return nullptr; } return g_strdup(cautoStr.get()); that doesn't look very appealing to me, compared to DOMtoATK cautoStr; if (accWrap) { HyperTextAccessible* text = accWrap->AsHyperText(); cautoStr->Prepare(&aStartOffset, &aEndOffset, static_cast<gint>(text->CharacterCount())); text->TextSubstring(aStartOffset, aEndOffset, autoStr); } else if (ProxyAccessible* proxy = GetProxy(ATK_OBJECT(aText))) { cautoStr->Prepare(&aStartOffset, &aEndOffset, proxy->CharacterCount()); proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); } if (!cautoStr.Convert(autoStr)) { return nullptr; } return g_strdup(cautoStr.get());

alexander :surkov (:asurkov)

Comment 46

•

8 years ago

(In reply to Samuel Thibault from comment #45) > > I lean towards to move all of this code into a separate file > > Why not indeed :) good :) > > wouldn't be nicer to keep get rid of PrepareUTF16toUTF8 and move its content here? > > I was wondering it indeed. Keeping it as separate function like the two > others looked simpler to grasp when reading: three functions which work > together, and a class which wraps the way to call them. if you don't need to access to those functions outside a class, then it makes sense to put them into the class, just to hide implementation details. Technically it looks pretty much same visually, just functions are prefixed by a class name. Right? > > you could move Prepare and NS_ConvertUTF16toUTF inside of AtkString by > > No, because of the interleaving: one has to call Prepare to get > Start/EndOffset modified, then use one or the other TextString, before > converting to UTF8, adding BOMs and finishing. At best we could have I see > HyperTextAccessible* text > ProxyAccessible* proxy > gint count; > if (accWrap) { > text = accWrap->AsHyperText(); > count = static_cast<gint>(text->CharacterCount()); > } else if (proxy = GetProxy(ATK_OBJECT(aText))) { > count = proxy->CharacterCount(); > } else { > return nullptr; > } > AtkString cautoStr(&aStartOffset, &aEndOffset, charCount); > if (accWrap) { > text->TextSubstring(aStartOffset, aEndOffset, autoStr); > } else { > proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); > } > if (!cautoStr.Convert(autoStr)) { > return nullptr; > } > return g_strdup(cautoStr.get()); > > that doesn't look very appealing to me, compared to > > DOMtoATK cautoStr; > if (accWrap) { > HyperTextAccessible* text = accWrap->AsHyperText(); > cautoStr->Prepare(&aStartOffset, &aEndOffset, > static_cast<gint>(text->CharacterCount())); > text->TextSubstring(aStartOffset, aEndOffset, autoStr); > } else if (ProxyAccessible* proxy = GetProxy(ATK_OBJECT(aText))) { > cautoStr->Prepare(&aStartOffset, &aEndOffset, proxy->CharacterCount()); > proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); > } > if (!cautoStr.Convert(autoStr)) { > return nullptr; > } > return g_strdup(cautoStr.get()); String class changing offsets right before grabbing a text look a bit weird. So you could move those outside DOMtoATK class like auto startOffset = aStartOffset > 0 ? aStartOffset - 1 : 0; auto endOffset = aEndOffset; and then if (accWrap) { auto charCount = text->CharactedCount(); endOffset = endOffset == -1 || endOffset == charCount ? charCount : endOffset + 1; } else { .. } or rename 'cautoStr' to something more generic like 'converter'? and rename Prepare method to something more explicit like AdjustOffsets(). Also could you please give more info (sorry, but I missed this part in the bug discussion) on non-unicode16 values in UTF16 string, and why we need to adjust offsets to handle those characters. Are those characters are bigger than 16bits?

Samuel Thibault

Assignee

Comment 47

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> it makes sense to put them into the class Sure, that makes sense. > String class changing offsets right before grabbing a text look a bit weird. So you could move those outside DOMtoATK class like Well, to me _that_ would look odd :) The whole thing works correctly because we change the offsets to grab one character before and one character after the requested selection, and the finish operation needs to know whether one more character were grabbed on either end to properly drop them, thus that information needs to be transmitted along. It's really better to stuff that knowledge inside the class, so the outside just needs to know that it has to prepare the conversion, get the text with possibly shifted offsets, and call the conversion. > or [...] rename Prepare method to something more explicit like AdjustOffsets(). Yes, that makes sense > non-unicode16 values in UTF16 string, and why we need to adjust offsets to handle those characters. Are those characters are bigger than 16bits? Yes, they are bigger than 16bit, and are thus represented with two UTF-16 code units (a pair of surrogates), while in unicode they are just one character. When the selection is in the middle of a non-unicode16 character, we need to get both surrogates, not only the one that falls within the selection, otherwise we can't convert that into the unicode character. I have attached a reworked patch.

Attachment #8953378 - Attachment is obsolete: true

Attachment #8953378 - Flags: review?(surkov.alexander)

Attachment #8953660 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 48

•

8 years ago

Comment on attachment 8953660 [details] [diff] [review] multibyte Review of attachment 8953660 [details] [diff] [review]: ----------------------------------------------------------------- another portions of comments, that's a complicated thing :) ::: accessible/atk/nsMaiInterfaceText.cpp @@ +164,5 @@ > proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); > } > > NS_ConvertUTF16toUTF8 cautoStr(autoStr); > + if (!AtkStr.Convert(cautoStr)) { it looks like you have double NS_ConvertUTF16toUTF8 conversion now ::: accessible/atk/nsDOMtoATK.cpp @@ +8,5 @@ > +#include "nsUTF8Utils.h" > + > +// In order to properly get non-unicode16 values, change offsets to get one more > +// character on each end, so that ConvertUTF16toUTF8 can convert surrogates even > +// if the originally requested offsets fall between them. usually a method comments are in headers @@ +32,5 @@ > + if (StartShifted) { > + // PrepareUTF16toUTF8 added a heading character > + > + if (Length() == 0) { > + // Odd, there should have been a character... shouldn't you assert there? and everywhere where odd is used @@ +36,5 @@ > + // Odd, there should have been a character... > + return false; > + } > + > + if ((int) Length() < UTF8traits::bytes(CharAt(0))) { nit: static_cast<int> @@ +82,5 @@ > +NS_DOMtoATK::AddBOMs(const nsAutoCString& aSource) > +{ > + uint32_t n = 0; > + > + // First compute how much room we will need nit: dot in the end @@ +91,5 @@ > + } > + n++; > + } > + > + uint32_t desti; // Index within this nit: initialize it @@ +94,5 @@ > + > + uint32_t desti; // Index within this > + > + // Add BOMs after non-unicode16 characters > + Truncate(n); Truncate cannot make string longer? Did you mean SetLength()? @@ +95,5 @@ > + uint32_t desti; // Index within this > + > + // Add BOMs after non-unicode16 characters > + Truncate(n); > + desti = 0; yep ^ :) @@ +129,5 @@ > + } > + MOZ_ASSERT(desti == n); > +} > + > +bool NS_DOMtoATK::Convert(nsAutoString aAutoStr) nit: bool NS_DOMToATK::Convert also public methods usually go before priviate methods used for their implementation @@ +132,5 @@ > + > +bool NS_DOMtoATK::Convert(nsAutoString aAutoStr) > +{ > + NS_ConvertUTF16toUTF8 cautoStr(aAutoStr); > + if (!(cautoStr.get())) { nit: no () around str.get() are needed, right? @@ +134,5 @@ > +{ > + NS_ConvertUTF16toUTF8 cautoStr(aAutoStr); > + if (!(cautoStr.get())) { > + return false; > + } since AdjustOffset and Convert has to go paired, then it'd be good to assert if the user forgets to call AdjustOffsets ::: accessible/atk/nsDOMtoATK.h @@ +28,5 @@ > +// AdjustOffsets before peeking the DOM string, and only after call Convert to > +// get the proper string with non-unicode16 characters expressed in just one > +// unicode character followed by a BOM. > + > +class NS_DOMtoATK : public nsAutoCString it's sort of weird to have nsAutoCString as a base class and use no constructor (because AdjustOffset has to be called after creation), and have Convert instead Assign. It would be probably nicer to remove inheritance from nsAutoCString. Convert could return a pointer to a converted string, to avoid having extra get() method. @@ +31,5 @@ > + > +class NS_DOMtoATK : public nsAutoCString > +{ > +public: > + void AdjustOffsets(gint *aStartOffset, gint *aEndOffset, gint count); nit: type* name @@ +38,5 @@ > +private: > + void AddBOMs(const nsAutoCString& aSource); > + bool FinishUTF16toUTF8(void); > + > + bool StartShifted; nit members should be prefixed by 'm'

Samuel Thibault

Assignee

Comment 49

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> it looks like you have double NS_ConvertUTF16toUTF8 conversion now Ah, sorry, that part of the patch was not refreshed. > shouldn't you assert there? and everywhere where odd is used Right. Since the string surely comes from UTF16toUTF8 it is really supposed to be alright. Here is a reworked patch.

Attachment #8953660 - Attachment is obsolete: true

Attachment #8953660 - Flags: review?(surkov.alexander)

Attachment #8954429 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 50

•

8 years ago

Comment on attachment 8954429 [details] [diff] [review] multibyte Review of attachment 8954429 [details] [diff] [review]: ----------------------------------------------------------------- mostly nits, could you please update the patch one more time so I can do extra run? ::: accessible/atk/nsMaiInterfaceText.cpp @@ +164,4 @@ > proxy->TextSubstring(aStartOffset, aEndOffset, autoStr); > } > > + //libspi will free it. nit: space after // ::: accessible/atk/nsDOMtoATK.cpp @@ +7,5 @@ > +#include "nsDOMtoATK.h" > +#include "nsUTF8Utils.h" > + > +void > +NS_DOMtoATK::AdjustOffsets(gint *aStartOffset, gint *aEndOffset, nit: type* name @@ +26,5 @@ > + > +gchar* > +NS_DOMtoATK::Convert(nsAutoString aAutoStr) > +{ > + MOZ_ASSERT(mAdjusted); nit: nice to have an assertion message that offsets has to be adjusted at this point @@ +47,5 @@ > + if (mStartShifted) { > + // PrepareUTF16toUTF8 added a heading character > + > + // There should be a character > + MOZ_ASSERT(aStr.Length() > 0); nit: it makes sense to turn these comments into assertion messages @@ +49,5 @@ > + > + // There should be a character > + MOZ_ASSERT(aStr.Length() > 0); > + // It should be complete > + MOZ_ASSERT (static_cast<int>(aStr.Length()) nit: extra space before ( @@ +50,5 @@ > + // There should be a character > + MOZ_ASSERT(aStr.Length() > 0); > + // It should be complete > + MOZ_ASSERT (static_cast<int>(aStr.Length()) > + >= UTF8traits::bytes(aStr.CharAt(0))); nit: no operator on a new line, I guess it should be fine to keep all this on same line @@ +57,5 @@ > + aStr.Cut(0, UTF8traits::bytes(aStr.CharAt(0))); > + } > + > + if (mEndShifted) { > + // PrepareUTF16toUTF8 added a trailing character nit: pls dot in the end (here and everywhere) @@ +59,5 @@ > + > + if (mEndShifted) { > + // PrepareUTF16toUTF8 added a trailing character > + > + int len = aStr.Length(); len is used in assertsions, you could use aStr.Lenght() directly @@ +63,5 @@ > + int len = aStr.Length(); > + // There should be a character > + MOZ_ASSERT(len > 0); > + > + int trail; nit: should be initialized @@ +81,5 @@ > + } > +} > + > +void > +NS_DOMtoATK::AddBOMs(nsAutoCString &aDest, const nsAutoCString& aSource) nit: nsAutoCString& aDest @@ +94,5 @@ > + } > + n++; > + } > + > + uint32_t desti; // Index within this uint32_t desti = 0; comment is confusing, it was probably left when NS_DOMtoATK was a string @@ +105,5 @@ > + > + if (bytes >= 4) { > + // Non-unicode16 character, add a BOM after it. > + > + if (bytes > (aSource.Length() - i)) { nit: you don't need braces around aSource.Lenght() - i @@ +107,5 @@ > + // Non-unicode16 character, add a BOM after it. > + > + if (bytes > (aSource.Length() - i)) { > + // We don't have the whole sequence, copy what we have > + for ( ; i < aSource.Length(); i++) { nit: no space before ; @@ +118,5 @@ > + // Copy whole sequence. > + for (uint32_t j = 0; j < bytes; j++) { > + aDest.Replace(desti++, 1, aSource[i + j]); > + } > + i += bytes-1; nit: space around - @@ +123,5 @@ > + > + // And add a BOM after it. > + aDest.Replace(desti++, 1, '\xEF'); > + aDest.Replace(desti++, 1, '\xBB'); > + aDest.Replace(desti++, 1, '\xBF'); won't it be equivalent to aDest.Repalce(dest, 3, "\xEF\xBB\xBF"); ::: accessible/atk/nsDOMtoATK.h @@ +6,5 @@ > + > +#include "nsString.h" > +#include "nsMai.h" > + > +// ATK offsets are counted in unicode codepoints, while DOM offsets are counted nit: usually class and method comments look this way: /** * Blabla */ @@ +28,5 @@ > +// AdjustOffsets before peeking the DOM string, and only after call Convert to > +// get the proper string with non-unicode16 characters expressed in just one > +// unicode character followed by a BOM. > + > +class NS_DOMtoATK nit: netscape still reminds about itself :) no NS is needed, just DOMToATK should be fine @@ +32,5 @@ > +class NS_DOMtoATK > +{ > +public: > + NS_DOMtoATK(void) { > + mAdjusted = false; should be wrapped by DEBUG

Samuel Thibault

Assignee

Comment 51

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is a reworked patch.

Attachment #8954429 - Attachment is obsolete: true

Attachment #8954429 - Flags: review?(surkov.alexander)

Attachment #8954683 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 52

•

8 years ago

Comment on attachment 8954683 [details] [diff] [review] multibyte Review of attachment 8954683 [details] [diff] [review]: ----------------------------------------------------------------- more nits and more questions, and thank you for being patient about my review style :) ::: accessible/atk/DOMtoATK.cpp @@ +8,5 @@ > +#include "nsUTF8Utils.h" > + > +void > +DOMtoATK::AdjustOffsets(gint* aStartOffset, gint* aEndOffset, > + gint count) nit: wrong offset @@ +20,5 @@ > + (*aEndOffset)++; > + mEndShifted = true; > + } > + > + mAdjusted = true; wrap it by DEBUG @@ +41,5 @@ > + return g_strdup(cautoStrBOMs.get()); > +} > + > +void > +DOMtoATK::FinishUTF16toUTF8(nsAutoCString &aStr) nit: type& name @@ +97,5 @@ > + aDest.SetLength(n); > + for (uint32_t i = 0; i < aSource.Length(); i++) { > + uint32_t bytes = UTF8traits::bytes(aSource[i]); > + > + if (bytes >= 4) { wonder whether 4 should be a constant of UTF8traits or String, because it has to stay in sync with aSource.Length type btw, I miss why >= 4 not > 4, since 4 bytes should a typical character, and you shouldn't add BOM, no? @@ +101,5 @@ > + if (bytes >= 4) { > + // Non-unicode16 character, add a BOM after it. > + > + if (bytes > aSource.Length() - i) { > + // We don't have the whole sequence, copy what we have. when it happens, could you add a more detailed comment please? @@ +112,5 @@ > + > + // Copy whole sequence. > + for (uint32_t j = 0; j < bytes; j++) { > + aDest.Replace(desti++, 1, aSource[i + j]); > + } it can be replaced on a single call, something like: aDest.Replace(desti++, bytes, Substring(aSource, i, bytes) @@ +113,5 @@ > + // Copy whole sequence. > + for (uint32_t j = 0; j < bytes; j++) { > + aDest.Replace(desti++, 1, aSource[i + j]); > + } > + i += bytes - 1; could you please double check this logic, i++ means 4 bytes shift, so if 'bytes' are 6, then you increment i at 2, which equivalent to 8 bytes if I don't miss anything @@ +123,5 @@ > + // Unicode16, just copy the byte. > + aDest.Replace(desti++, 1, aSource[i]); > + } > + } > + MOZ_ASSERT(desti == n, "incoherency between computed length" nit: i -> I (uppercase)

Samuel Thibault

Assignee

Comment 53

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> > + uint32_t bytes = UTF8traits::bytes(aSource[i]); > > + > > + if (bytes >= 4) { > > wonder whether 4 should be a constant of UTF8traits or String, because it has to stay in sync with aSource.Length type > > btw, I miss why >= 4 not > 4, since 4 bytes should a typical character, and you shouldn't add BOM, no? I have added a comment: 4 here is just the exact number of bytes that UTF-8 starts using for encoding non-unicode16 characters (yes, we are lucky that it's exactly that boundary: 3 bytes in utf8 can code up to U+FFFF, and U+10000 needs 4 bytes.) > > + if (bytes >= 4) { > > + // non-unicode16 character, add a bom after it. > > + > > + if (bytes > asource.length() - i) { > > + // we don't have the whole sequence, copy what we have. > > when it happens, could you add a more detailed comment please? Ah, this is actually again defensive code that I was putting. Since the utf-8 string comes from the UTF16toUTF8 converter, this is really not supposed to happen, so I have put a MOZ_ASSERT instead. > > + // Copy whole sequence. > > + for (uint32_t j = 0; j < bytes; j++) { > > + aDest.Replace(desti++, 1, aSource[i + j]); > > + } > > + i += bytes - 1; > > could you please double check this logic, i++ means 4 bytes shift, No, aSource is a one-byte-one-char string. We are here just eating the bytes that we have copied into aDest, minus the one byte that the for loop will count. Here is a reworked patch.

Attachment #8954683 - Attachment is obsolete: true

Attachment #8954683 - Flags: review?(surkov.alexander)

Attachment #8954802 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Updated

•

8 years ago

Attachment #8954802 - Attachment is patch: true

alexander :surkov (:asurkov)

Comment 54

•

8 years ago

Comment on attachment 8954802 [details] [diff] [review] multibyte Review of attachment 8954802 [details] [diff] [review]: ----------------------------------------------------------------- I'm not sure I can testify that I 100% understand all logic behind adding a BOM. I don't have a11y-related concerns, so I'm tempting to set r+, but it'd be nice to have somebody more experienced than myself to look at this. Since the patch contains xpcom string related changes, then could you, David, take a look at conversion part as well? ::: accessible/atk/DOMtoATK.cpp @@ +100,5 @@ > + > + if (bytes >= 4) { > + // Non-unicode16 character, add a BOM after it. > + > + if (bytes > aSource.Length() - i) { assert as you suggested? @@ +112,5 @@ > + > + // Copy whole sequence. > + for (uint32_t j = 0; j < bytes; j++) { > + aDest.Replace(desti++, 1, aSource[i + j]); > + } still, could be replaces by a single call? ::: accessible/atk/DOMtoATK.h @@ +30,5 @@ > + * get the proper string with non-unicode16 characters expressed in just one > + * unicode character followed by a BOM. > + */ > + > +class DOMtoATK nit no new line between class and comment @@ +47,5 @@ > + * them. > + */ > + void AdjustOffsets(gint* aStartOffset, gint* aEndOffset, gint count); > + > + gchar* Convert(nsAutoString aAutoStr); nit: a small comment wouldn't harm @@ +62,5 @@ > + void FinishUTF16toUTF8(nsAutoCString& aStr); > + > + bool mAdjusted; > + bool mStartShifted; > + bool mEndShifted; please initialize those in constructor too

Attachment #8954802 - Flags: review?(surkov.alexander)

Attachment #8954802 - Flags: review?(dbaron)

Attachment #8954802 - Flags: review+

Samuel Thibault

Assignee

Comment 55

•

8 years ago

Errr, sorry, it seems the patch didn't get refreshed actually...

Samuel Thibault

Assignee

Comment 56

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is the fixed really refreshed patch, sorry for the previous one.

Attachment #8954802 - Attachment is obsolete: true

Attachment #8954802 - Flags: review?(dbaron)

Attachment #8954825 - Flags: review?(surkov.alexander)

Attachment #8954825 - Flags: review?(dbaron)

Samuel Thibault

Assignee

Comment 57

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is also the few additional comments you made addressed.

Attachment #8954825 - Attachment is obsolete: true

Attachment #8954825 - Flags: review?(surkov.alexander)

Attachment #8954825 - Flags: review?(dbaron)

Attachment #8954827 - Flags: review?(surkov.alexander)

Attachment #8954827 - Flags: review?(dbaron)

alexander :surkov (:asurkov)

Comment 58

•

8 years ago

Comment on attachment 8954827 [details] [diff] [review] multibyte Review of attachment 8954827 [details] [diff] [review]: ----------------------------------------------------------------- ::: accessible/atk/nsMaiInterfaceText.cpp @@ +244,5 @@ > + // Heading surrogate, get the trailing surrogate and combine them. > + gunichar characterLow = static_cast<gunichar>(text->CharAt(aOffset + 1)); > + > + if (!NS_IS_LOW_SURROGATE(characterLow)) { > + // It should have been a trailing surrogate... Flag the error. btw, should we assert here? or such strings exist? ::: accessible/atk/DOMtoATK.h @@ +37,5 @@ > +#ifdef DEBUG > + mAdjusted = false; > +#endif > + mStartShifted = false; > + mEndShifted = false; it's better to put these into initialization list @@ +51,5 @@ > + > + /** > + * After calling AdjustOffsets and obtained the increased text portion, this > + * can be called to convert to ATK properly (add BOMs and remove the > + * additional characters requested by AdjustOffsets). maybe: something simpler: Converts a string of accessible text into ATK gchar* string (add ...). Note, AdjustOffsets has to be called in advance. Another option to consider. If you enclosed most of atk's getText into this class, then you would avoid mAdjusted and stuff, i.e. that could look like getTextCB(AtkText *aText, gint aStartOff, ..) { return TextToATKString(ATK_OBJECT(aText), gint* aStartOffset, gint* aEndOffset); } class TextToATKString { TextToATKString(AtkObject* aObject, gint* aStartOffset, gint* aEndOffset) { } operator gchar*() { // converts to accessible text to atk string } };

Samuel Thibault

Assignee

Comment 59

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> > + if (!NS_IS_LOW_SURROGATE(characterLow)) { > > + // It should have been a trailing surrogate... Flag the error. > > btw, should we assert here? or such strings exist? In principle it shouldn't happen, but I wouldn't be surprised that somehow some web pages could manage to sneak in such cases. That's different from the UTF-8 checks which should be alright already since it's UTF16toUTF8 output. > If you enclosed most of atk's getText into this class Actually I was surprised that only getText needs the conversion. It could very well happen that some newer ATK method would need it too, so stuffing getText into the class would then not make sense.

Attachment #8954827 - Attachment is obsolete: true

Attachment #8954827 - Flags: review?(surkov.alexander)

Attachment #8954827 - Flags: review?(dbaron)

Attachment #8954839 - Flags: review?(surkov.alexander)

Attachment #8954839 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Updated

•

8 years ago

Summary: atk_text_get_get() return value incorrect after multbyte character → atk_text_get_get() return value incorrect after multibyte character

alexander :surkov (:asurkov)

Comment 60

•

8 years ago

(In reply to Samuel Thibault from comment #59) > Created attachment 8954839 [details] [diff] [review] > multibyte > > > > + if (!NS_IS_LOW_SURROGATE(characterLow)) { > > > + // It should have been a trailing surrogate... Flag the error. > > > > btw, should we assert here? or such strings exist? > > In principle it shouldn't happen, but I wouldn't be surprised that somehow > some web pages could manage to sneak in such cases. That's different from > the UTF-8 checks which should be alright already since it's UTF16toUTF8 > output. > > > If you enclosed most of atk's getText into this class > > Actually I was surprised that only getText needs the conversion. It could > very well happen that some newer ATK method would need it too, so stuffing > getText into the class would then not make sense. fair enough

alexander :surkov (:asurkov)

Updated

•

8 years ago

Attachment #8954839 - Flags: review?(surkov.alexander) → review+

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 61

•

8 years ago

Comment on attachment 8954839 [details] [diff] [review] multibyte I'm assuming you're asking me to review only the part in xpcom/string/. >--- a/xpcom/string/nsUTF8Utils.h >+++ b/xpcom/string/nsUTF8Utils.h >@@ -54,6 +54,23 @@ public: > { > return (aChar & 0xFE) == 0xFC; > } >+ static int bytes(char aChar) Please add a comment above this saying something like: // return the number of bytes in a sequence beginning with aChar >+ { >+ if (isASCII(aChar)) >+ return 1; >+ if (is2byte(aChar)) >+ return 2; >+ if (is3byte(aChar)) >+ return 3; >+ if (is4byte(aChar)) >+ return 4; >+ if (is5byte(aChar)) >+ return 5; >+ if (is6byte(aChar)) >+ return 6; Please brace each of these if statements, with the style: if (isASCII(aChar)) { return 1; } (I also wonder if there's a faster way to do this.) >+ // Not supposed to be used for in-sequence, but return 1 Please replace this comment with MOZ_ASSERT_UNREACHABLE("should not be used for in-sequence characters"); r=dbaron on this part of the patch, with those changes

Attachment #8954839 - Flags: review?(dbaron) → review+

Samuel Thibault

Assignee

Comment 62

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> (I also wonder if there's a faster way to do this.) You mean, doing it with a bit of computation? Well, strictly speaking it could be a matter of counting the number of heading 1 bits, with special casing 0 (ascii) and 1 (in-sequence). In the end the most common case (ascii) will not be faster anyway, and we'll get a very subtle-to-understand formula for non-common cases :) > Please replace this comment with MOZ_ASSERT_UNREACHABLE("should not be used for in-sequence characters"); Right :)

Attachment #8954839 - Attachment is obsolete: true

Attachment #8954962 - Flags: review?(surkov.alexander)

Attachment #8954962 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 63

•

8 years ago

Comment on attachment 8954962 [details] [diff] [review] multibyte >+ MOZ_ASSERT_UNREACHABLE("should not be used for in-sequence characters"); You should still have a return after this. (Probably stick to 1?) r=dbaron on this file, with that change

Attachment #8954962 - Flags: review?(dbaron) → review+

Samuel Thibault

Assignee

Comment 64

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Ah, ok :)

Attachment #8954962 - Attachment is obsolete: true

Attachment #8954962 - Flags: review?(surkov.alexander)

Attachment #8954964 - Flags: review?(surkov.alexander)

Attachment #8954964 - Flags: review?(dbaron)

alexander :surkov (:asurkov)

Comment 65

•

8 years ago

(In reply to David Baron :dbaron: ⌚️UTC-8 from comment #61) > Comment on attachment 8954839 [details] [diff] [review] > multibyte > > I'm assuming you're asking me to review only the part in xpcom/string/. David, could you please take a look at logic of DOMToATK class as well? It'd be nice to have someone to check it, who knows strings internals.

Flags: needinfo?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 66

•

8 years ago

Comment on attachment 8954964 [details] [diff] [review] multibyte OK, additional review per comment 65: Throughout, in comments, you should replace the term "non-unicode16" with "non-BMP". >+ * BOMs (Byte Order Marks, U+FEFF) normally only appear at the beginning of >+ * unicode files, but their occurrence within text (notably after cut&paste) is >+ * not uncommon, and are thus considered as non-text. It's worth noting here that the BOM is also known as a "ZERO WIDTH NO-BREAK SPACE". >+ * The DOMtoATK class handles the conversion. Since the selection requested >+ * through ATK may not contain both surrogates at the ends of the selection, we >+ * need to fetch one UTF-16 code point more on both side, and get rid of it >+ * before returning the string to ATK. The caller thus needs to call >+ * AdjustOffsets before peeking the DOM string, and only after call Convert to replace "before peeking the DOM string, and only after" with ", then use the possibly-adjusted offsets to get the data from the DOM string, and then" >+ * get the proper string with non-unicode16 characters expressed in just one "expressed in" -> "expressed as". >+ * unicode character followed by a BOM. That said, this two-step API is confusing and seems likely to be difficult to maintain. If it weren't for the other callers of ConvertTexttoAsterisks, I'd suggest that you instead: * replace ConvertTexttoAsterisks with a boolean indicating whether to do the conversion * replace the three-step API described above with a one-step API (templatized over something that can be AccessibleWrap or ProxyAccessible) that takes the whole thing in a single call, taking a boolean argument for whether to do the conversion to asterisks (This makes me wonder, though, why you're not changing the other callers of ConvertTexttoAsterisks? Do they not require the same change?) (Also, why is the "to" in ConvertTexttoAsterisks not capitalized like it should be?) DOMtoATK::AdjustOffsets should MOZ_ASSERT(!mAdjusted, ...) to enforce the proper way of calling the API. In DOMtoATK::FinishUTF16toUTF8, all 3 occurrences of "heading character" should be "leading character". A number of comments refer to PrepareUTF16toUTF8, which doesn't exist. I think they should refer to AdjustOffsets. >+ // drop first character >+ aStr.Cut(0, UTF8traits::bytes(aStr.CharAt(0))); This is somewhat inefficient since it shifts all of the bytes in the character. It would be better if: * the g_strdup were inside this function * this function maintained an offset to use for the start However, the order of operations of AddBOMs and the stripping of the leading and trailing characters seems backwards. It seems like this will behave incorrectly if you add a BOM for the first or last character, and mStartShifted/mEndShifted is true. >- //copy and return, libspi will free it. This comment should probably be put in DOMtoAtk::Convert rather than just removed. In the first loop in DOMtoATK::AddBOMs, don't call UTF8traits::bytes twice on the same character; put the result in a variable. The first loop in DOMtoATK::AddBOMs also seems to compute n incorrectly for any multi-byte character, since it only adds 1 to n rather than the number of bytes needed. This makes me wonder how you managed not to hit the MOZ_ASSERT at the end of the function. Are you testing this patch with a debug build? I think the code would be clearer if you renamed variables: i -> srci n -> destlength I think it would also be clearer if you took the i++ out of the for() expressions and just did that increment in the loop (dropping the "- 1" from the places you increment i inside the loop). Some of these concerns require some significant work, and I'm also not sure if I reviewed the whole thing given how much I was jumping around, so marking review-, since I'd like to review it again after these revisions.

Flags: needinfo?(dbaron)

Attachment #8954964 - Flags: review?(dbaron) → review-

Samuel Thibault

Assignee

Comment 67

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

> If it weren't for the other callers of ConvertTexttoAsterisks, I'd suggest that you instead: * replace ConvertTexttoAsterisks ? This patch has nothing to do with ConvertTextToAsterisks > However, the order of operations of AddBOMs and the stripping of the > leading and trailing characters seems backwards. It seems like this > will behave incorrectly if you add a BOM for the first or last > character, and mStartShifted/mEndShifted is true. On the contrary, it'll work fine, precisely thanks to the BOM. Say text is just one non-BMP character, thus two UTF-16 codepoints in the DOM. From the point of view of ATK we want to make it appear as that character plus the BOM (to get two unicode characters). If ATK asks for Start/End as 0/1, AdjustOffsets will make that 0/2, we will properly get the two UTF-16 codepoints, convert to utf-8, add the BOM, and drop the BOM. If ATK asks for Start/End as 1/2, AdjustOffsets will make that 0/2, we will properly get the two UTF-16 codepoints, convert to utf-8, add the BOM, and drop the non-BMP character. > The first loop in DOMtoATK::AddBOMs also seems to compute n incorrectly for any multi-byte character, since it only adds 1 to n rather than the number of bytes needed. Urgl, indeed. It was to be expected after all these rewrites of this patch, that I'd let a bug slip in and miss checking it with DEBUG. Here is a reworked patch.

Attachment #8954964 - Attachment is obsolete: true

Attachment #8954964 - Flags: review?(surkov.alexander)

Attachment #8954985 - Flags: review?(surkov.alexander)

Attachment #8954985 - Flags: review?(dbaron)

Samuel Thibault

Assignee

Comment 68

•

8 years ago

Ergl, I actually again failed to build with proper DEBUG being enabled of course, let me fix that......

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 69

•

8 years ago

(In reply to Samuel Thibault from comment #67) > > If it weren't for the other callers of ConvertTexttoAsterisks, I'd suggest that you instead: * replace ConvertTexttoAsterisks > > ? This patch has nothing to do with ConvertTextToAsterisks It absolutely does, because that's the only thing blocking a much cleaner API rather than this confusing 3-step thing.

Samuel Thibault

Assignee

Comment 70

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

There it is.

Attachment #8954985 - Attachment is obsolete: true

Attachment #8954985 - Flags: review?(surkov.alexander)

Attachment #8954985 - Flags: review?(dbaron)

Attachment #8954991 - Flags: review?(surkov.alexander)

Attachment #8954991 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 71

•

8 years ago

(And I'm still curious why you need to modify getTextCB, but not getTextAfterOffsetCB, getTextAtOffsetCB, and getTextBeforeOffsetCB.)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 72

•

8 years ago

(In reply to Samuel Thibault from comment #67) > > However, the order of operations of AddBOMs and the stripping of the > > leading and trailing characters seems backwards. It seems like this > > will behave incorrectly if you add a BOM for the first or last > > character, and mStartShifted/mEndShifted is true. > > On the contrary, it'll work fine, precisely thanks to the BOM. Say text is > just one non-BMP character, thus two UTF-16 codepoints in the DOM. From the > point of view of ATK we want to make it appear as that character plus the > BOM (to get two unicode characters). If ATK asks for Start/End as 0/1, > AdjustOffsets will make that 0/2, we will properly get the two UTF-16 > codepoints, convert to utf-8, add the BOM, and drop the BOM. If ATK asks for > Start/End as 1/2, AdjustOffsets will make that 0/2, we will properly get the > two UTF-16 codepoints, convert to utf-8, add the BOM, and drop the non-BMP > character. OK, I guess I'd forgotten that the cases where you add the BOM and the cases where you deal with surrogates are exactly the same. However, the above behavior still seems surprising. You should at least have comments explaining that the intent is that: * if the start is between the high and low surrogates, the UTF-8 result includes a BOM from it but not the character * if the end is between the high and low surrogates, the UTF-8 result includes the character but *not* the BOM * all non-BMP characters that are fully in the string are in the UTF-8 result as character followed by BOM It's not obvious that that's a desirable behavior, so if it's what you want, you should explain it.

Samuel Thibault

Assignee

Comment 73

•

8 years ago

> > > If it weren't for the other callers of ConvertTexttoAsterisks, I'd suggest that you instead: * replace ConvertTexttoAsterisks > > > > ? This patch has nothing to do with ConvertTextToAsterisks > > It absolutely does, because that's the only thing blocking a much cleaner API rather than this confusing 3-step thing. > (templatized over something that can be AccessibleWrap or ProxyAccessible) Ah, now I see. Uh, I'd have to learn how to do such templatization in C++ then. > why you're not changing the other callers of ConvertTexttoAsterisks? Do they not require the same change?) In the other cases, the offsets are rounded by the Accessible objects, so they can't fall between surrogate pairs and we thus don't need AdjustOffsets/Finish. That being said, we do indeed need addBOMs for them for coherency of UTF-8 string lengths.

Samuel Thibault

Assignee

Comment 74

•

8 years ago

(about convertTexttoAsterisk) > that's the only thing blocking a much cleaner API rather than this confusing 3-step thing There is also the getText() call which is in between.

Samuel Thibault

Assignee

Comment 75

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is a fixed patch with addition of BOMs for the other gettext* methods. Concerning having to use two steps, yes, I agree this is really non-usual. But as explained previously, there is no way around it because there is fundamental interleaving between having to shift the offsets before calling the Accessible method with the shifted offsets, and only after, finishing the resulting string. The class is made to keep this as just one class with two calls, instead of exposing the state of the conversion etc. Yes, we could embed the asterisk thing, it's not really a problem (actually that's why I didn't even understand why you were mentioning it, for me it was out of the issues at stake), but there remains the getText() call. We could also embed the call to getText() into the class but that changes the meaning of the class a lot (it does not just convert) and then the getTextCB function basically only makes one call to class and nothing else, and there is no other caller (or else the class would have to learn which other getText* it should call), i.e. we'd have just moved the content of getTextCB into the class, what becomes the point of defining a class at all then?

Attachment #8954991 - Attachment is obsolete: true

Attachment #8954991 - Flags: review?(surkov.alexander)

Attachment #8954991 - Flags: review?(dbaron)

Attachment #8955198 - Flags: review?(surkov.alexander)

Attachment #8955198 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 76

•

8 years ago

I'd note that getCharacterAtOffsetCB looks like it can read a character past the end of the buffer if the last character in the buffer is an unpaired high surrogate. That's a potential crash that should be fixed. (In reply to Samuel Thibault from comment #75) > Concerning having to use two steps, yes, I agree this is really non-usual. > But as explained previously, there is no way around it because there is > fundamental interleaving between having to shift the offsets before calling > the Accessible method with the shifted offsets, and only after, finishing > the resulting string. The class is made to keep this as just one class with > two calls, instead of exposing the state of the conversion etc. I think there is a way around it: a function that's templatized over the underlying class of the accessible. (The templatized method should just have the three things in it so that you don't end up generating the code multiple times for a large thing -- the underlying pieces that AdjustOffsets and ConvertAdjusted do can remain non-templatized.) In other words, you can do something like: enum class AtkStringConvertFlags : uint32_t { ConvertTextToAsterisks = 1 << 0, }; MOZ_MAKE_ENUM_CLASS_BITWISE_OPERATORS(AtkStringConvertFlags) template <class AccessibleOrProxy> gchar* StringToATKString(AccesibleOrProxy* aAccessible, gint aStartOffset, gint aEndOffset, AtkStringConvertFlags aFlags) { gint startOffset = aStartOffset, endOffset = aEndOffset; StringToATKStringPrivate atkStringConverter; /* was DOMtoATK */ atkStringConverter.AdjustOffsets(&startOffset, &endOffset, gint(aAccessible->CharacterCount())); nsAutoString str; aAccessible->TextSubstring(startOffset, endOffset, str); return atkStringConverter.ConvertAdjusted(str, aFlags); } > Yes, we could embed the asterisk thing, it's not really a problem (actually > that's why I didn't even understand why you were mentioning it, for me it > was out of the issues at stake), but there remains the getText() call. We > could also embed the call to getText() into the class but that changes the > meaning of the class a lot (it does not just convert) and then the getTextCB > function basically only makes one call to class and nothing else, and there > is no other caller (or else the class would have to learn which other > getText* it should call), i.e. we'd have just moved the content of getTextCB > into the class, what becomes the point of defining a class at all then? getTextCB still needs to handle accessibles and proxies somewhat differently. However, the point of defining a class is to define an API that's a lot harder for the caller to mess up. Instead of having an API that must be called using an exact sequence of three things in sequence, the API is a single function call, which is a much more reasonable thing to reason about when looking at and thinking about the calling code.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 77

•

8 years ago

And a version with probably better naming suggestions: template <class AccessibleOrProxy> gchar* NewATKString(AccesibleOrProxy* aAccessible, gint aStartOffset, gint aEndOffset, AtkStringConvertFlags aFlags) { gint startOffset = aStartOffset, endOffset = aEndOffset; ATKStringConverterPrivate converter; /* was DOMtoATK */ converter.AdjustOffsets(&startOffset, &endOffset, gint(aAccessible->CharacterCount())); nsAutoString str; aAccessible->TextSubstring(startOffset, endOffset, str); return converter.ConvertAdjusted(str, aFlags); }

Samuel Thibault

Assignee

Comment 78

•

8 years ago

> I'd note that getCharacterAtOffsetCB looks like it can read a character past the end of the buffer if the last character in the buffer is an unpaired high surrogate. That's a potential crash that should be fixed. No, Accessible's CharAt already returns '\0' for out-of-bound accesses (otherwise a screen reader could crash firefox). > I think there is a way around it: a function that's templatized over the underlying class of the accessible. Ok, I see the idea. It's a bit more complex than that, because one doesn't necessarily want to call TextSubstring (that's only for getTextCB, getTextAtOffsetCB will need TextAtOffset, ditto for Before/After, but I think with a bit of more templating that should be workable). I'll have a look.

Samuel Thibault

Assignee

Comment 79

•

8 years ago

> However, the point of defining a class is to define an API that's a lot harder for the caller to mess up. Sure, I agree in general. But if we are *only* moving the *whole* content of getTextCB into a class, and that can't be used for anything else, there is no point: we'd have just moved code which can't be reused, and added complexity.

Samuel Thibault

Assignee

Comment 80

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is a reworked patch. It is reworked as you proposed, and for getCharacterAtOffsetCB too. Hoping that we start converging to something :)

Attachment #8955198 - Attachment is obsolete: true

Attachment #8955198 - Flags: review?(surkov.alexander)

Attachment #8955198 - Flags: review?(dbaron)

Attachment #8956057 - Flags: review?(surkov.alexander)

Attachment #8956057 - Flags: review?(dbaron)

alexander :surkov (:asurkov)

Comment 81

•

8 years ago

David, is it still on your review queue, right? Just pinging to make sure we don't wait each other or something. Technically I r+'ed the patch already, but I would take another run after you're good with the patch (the changes were rather significant).

Flags: needinfo?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 82

•

8 years ago

So I don't understand why getTextCB needs to deal with the possibility that the boundaries might fall between surrogates, but getTextAfterOffsetCB and getTextBeforeOffsetCB (and maybe some of the other callbacks too) don't need to deal with that possibility. At the callsites of NewATKString, you shouldn't need template syntax; the compiler should deduce the correct template parameter from the arguments provided. I think beyond that, I think it's probably better if you don't wait on me for finishing this -- Surkov should review the new patch, and then if he thinks it's good, it can land.

Flags: needinfo?(dbaron)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Updated

•

8 years ago

Attachment #8956057 - Flags: review?(dbaron)

Samuel Thibault

Assignee

Comment 83

•

8 years ago

(In reply to David Baron :dbaron: ⌚️UTC-8 from comment #82) > So I don't understand why getTextCB needs to deal with the possibility that > the boundaries might fall between surrogates, but getTextAfterOffsetCB and > getTextBeforeOffsetCB (and maybe some of the other callbacks too) don't need > to deal with that possibility. That is because in the getTextCB case, it's the screen reader which decides the precise offsets. In the other getText* cases, it's the Accessible object which rounds offsets to start or end of word/sentence/line/etc.

Samuel Thibault

Assignee

Comment 84

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

In this version I have just removed the template syntax as advise by David.

Attachment #8956057 - Attachment is obsolete: true

Attachment #8956057 - Flags: review?(surkov.alexander)

Attachment #8958751 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 85

•

8 years ago

(In reply to Samuel Thibault from comment #83) > (In reply to David Baron :dbaron: ⌚️UTC-8 from comment #82) > > So I don't understand why getTextCB needs to deal with the possibility that > > the boundaries might fall between surrogates, but getTextAfterOffsetCB and > > getTextBeforeOffsetCB (and maybe some of the other callbacks too) don't need > > to deal with that possibility. > > That is because in the getTextCB case, it's the screen reader which decides > the precise offsets. In the other getText* cases, it's the Accessible object > which rounds offsets to start or end of word/sentence/line/etc. ATK_TEXT_BOUNDARY_CHAR might be special though? Anyway, we can catch these in followups if necessary, I don't think it can create any new problems.

Samuel Thibault

Assignee

Comment 86

•

8 years ago

I was about to say that ATK_TEXT_BOUNDARY_CHAR is supposed to return exactly one UTF-8 character, and not one UTF-16 codepoint, and thus again offsets should be rounded to UTF-8 characters boundaries, and not fall between UTF-16 surrogates, but looking again at the at-spi definition: "Start and end offsets differ by one, by definition, for this value." which of course makes sense in the at-spi world, but here we'd be tempted to return two UTF-8 characters, but that won't be fine. So again we have to return separate characters: the real character and the BOM. Actually, testing DOM's behavior, nsIAccessibleText::BOUNDARY_CHAR does separate surrogates, so it already has the behavior of returning just one UTF-16 surrogate. The only issue is that the conversion to UTF-8 will them turn that into U+FFFD. I would say that to avoid making things even more complex we can special-case that: the ATK_TEXT_BOUNDARY_CHAR case really is almost very much like a getCharacterAtOffset. Otherwise, a generic version would have to make the Accessible call once only to get the resulting offsets, then shift them, then make the Accessible call again to get the text, and then fix the result. While in the ATK_TEXT_BOUNDARY_CHAR the resulting offsets are trivial to compute (-1/+0/+1) and we can just then call getTextCB as it is already. I'd indeed rather do that in a separate patch :)

Samuel Thibault

Assignee

Comment 87

•

8 years ago

Attached patch getchar (obsolete) — Details — Splinter Review

Here is what it looks like.

Attachment #8958897 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 88

•

8 years ago

Comment on attachment 8958751 [details] [diff] [review] multibyte Review of attachment 8958751 [details] [diff] [review]: ----------------------------------------------------------------- I didn't look closely at strings conversion logic, it seems David was ok with it. I would use base classes in arguments, for example, nsAString instead nsAutoString, use references at strings to avoid copy, and use const where possible. Otherwise, it looks good. Please let keep an eye on regressions, the code is complicated. ::: accessible/atk/nsMaiInterfaceText.cpp @@ +131,3 @@ > { > // convert each char to "*" when it's "password text" > if (accWrap->NativeRole() == roles::PASSWORD_TEXT) { could you please file a follow up to get rid of NativeRole() == roles::PASSWORD_TEXT by adding Accessible::IsPassword() method? ::: accessible/atk/DOMtoATK.cpp @@ +12,5 @@ > + > +namespace mozilla::a11y::DOMtoATK { > + > +void > +AddBOMs(nsAutoCString& aDest, const nsAutoCString& aSource) const for aSource @@ +78,5 @@ > +#endif > +} > + > +gchar* > +ATKStringConverterPrivate::FinishUTF16toUTF8(nsAutoCString& aStr) I guess you can use base class here nsACString& @@ +119,5 @@ > + return g_strdup(aStr.get() + skip); > +} > + > +gchar* > +ATKStringConverterPrivate::ConvertAdjusted(nsAutoString aAutoStr) const nsAString& @@ +134,5 @@ > + return FinishUTF16toUTF8(cautoStrBOMs); > +} > + > +gchar* > +Convert(nsAutoString aAutoStr) const nsAString& ? you should avoid copying a string @@ +149,5 @@ > + > +void > +ConvertTexttoAsterisks(nsAString& aString) > +{ > + for (uint32_t i = 0; i < aString.Length(); i++) nit: technically moz coding style requires {} around single if and for ::: accessible/atk/DOMtoATK.h @@ +70,5 @@ > + }; > + > + MOZ_MAKE_ENUM_CLASS_BITWISE_OPERATORS(AtkStringConvertFlags) > + > + class ATKStringConverterPrivate { I would probably name it 'Helper' instead 'Private', because this class it not really private and actually visible to everyone

Attachment #8958751 - Flags: review?(surkov.alexander) → review+

alexander :surkov (:asurkov)

Comment 89

•

8 years ago

Comment on attachment 8958897 [details] [diff] [review] getchar Review of attachment 8958897 [details] [diff] [review]: ----------------------------------------------------------------- it makes sense to move this patch into separate bug I'd say, this one gets too long ::: accessible/atk/nsMaiInterfaceText.cpp @@ +160,4 @@ > return nullptr; > } > > +static gint getCharacterCountCB(AtkText *aText); nit: type* name @@ +165,5 @@ > +static gchar* > +getCharTextAtOffset(AtkText *aText, gint aOffset, > + gint *aStartOffset, gint *aEndOffset) > +{ > + gint end = aOffset + 1; I wonder if it breaks magic offsets, like -1 or -2 @@ +273,4 @@ > { > nsAutoString autoStr; > int32_t startOffset = 0, endOffset = 0; > + AccessibleWrap* accWrap; it is a weird change, could you keep this line unchanged please? :)

Samuel Thibault

Assignee

Comment 90

•

8 years ago

(In reply to alexander :surkov from comment #89) > @@ +165,5 @@ > > +static gchar* > > +getCharTextAtOffset(AtkText *aText, gint aOffset, > > + gint *aStartOffset, gint *aEndOffset) > > +{ > > + gint end = aOffset + 1; > > I wonder if it breaks magic offsets, like -1 or -2 In at-spi, only the -1 magic offset is defined and only for getText's end offsets. Others do not have magic offsets. > @@ +273,4 @@ > > { > > nsAutoString autoStr; > > int32_t startOffset = 0, endOffset = 0; > > + AccessibleWrap* accWrap; > > it is a weird change, could you keep this line unchanged please? :) Well, sure, it's just that we then call GetAccessibleWrap for no use, when aBoundaryType is ATK_TEXT_BOUNDARY_CHAR. Oh, but the firefox coding style doesn't forbid mixing variable declaration and code, does it?

alexander :surkov (:asurkov)

Comment 91

•

8 years ago

(In reply to Samuel Thibault from comment #90) > (In reply to alexander :surkov from comment #89) > > @@ +165,5 @@ > > > +static gchar* > > > +getCharTextAtOffset(AtkText *aText, gint aOffset, > > > + gint *aStartOffset, gint *aEndOffset) > > > +{ > > > + gint end = aOffset + 1; > > > > I wonder if it breaks magic offsets, like -1 or -2 > > In at-spi, only the -1 magic offset is defined and only for getText's end > offsets. Others do not have magic offsets. oh, that's complicated, you may want to add a comment > > @@ +273,4 @@ > > > { > > > nsAutoString autoStr; > > > int32_t startOffset = 0, endOffset = 0; > > > + AccessibleWrap* accWrap; > > > > it is a weird change, could you keep this line unchanged please? :) > > Well, sure, it's just that we then call GetAccessibleWrap for no use, when > aBoundaryType is ATK_TEXT_BOUNDARY_CHAR. Oh, but the firefox coding style > doesn't forbid mixing variable declaration and code, does it? it's not forbidden for sure, but I might be missing something, what's wrong with: if (aBoundaryType == ATK_TEXT_BOUNDARY_CHAR) { return getCharTextAtOffset(aText, aOffset - 1, aStartOffset, aEndOffset); } AccessibleWrap* accWrap = GetAccessibleWrap(ATK_OBJECT(aText));

Samuel Thibault

Assignee

Comment 92

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Here is the reworked multibyte, only changing string types and coding style (and retested as working fine of course).

Attachment #8958751 - Attachment is obsolete: true

Attachment #8959129 - Flags: review?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 93

•

8 years ago

Comment on attachment 8959129 [details] [diff] [review] multibyte Review of attachment 8959129 [details] [diff] [review]: ----------------------------------------------------------------- ::: accessible/atk/DOMtoATK.cpp @@ +78,5 @@ > +#endif > +} > + > +gchar* > +ATKStringConverterHelper::FinishUTF16toUTF8(nsCString& aStr) I think it should nsACString, A stands for abstract, it works equally both for nsCString and nsAutoCString

Attachment #8959129 - Flags: review?(surkov.alexander) → review+

Samuel Thibault

Assignee

Comment 94

•

8 years ago

(In reply to alexander :surkov from comment #93) > ::: accessible/atk/DOMtoATK.cpp > @@ +78,5 @@ > > +#endif > > +} > > + > > +gchar* > > +ATKStringConverterHelper::FinishUTF16toUTF8(nsCString& aStr) > > I think it should nsACString, A stands for abstract, it works equally both > for nsCString and nsAutoCString Yes, but since nsACString might be non-null-terminated, it does not provide the get() method, which we do need here.

alexander :surkov (:asurkov)

Comment 95

•

8 years ago

(In reply to Samuel Thibault from comment #94) > > > +gchar* > > > +ATKStringConverterHelper::FinishUTF16toUTF8(nsCString& aStr) > > > > I think it should nsACString, A stands for abstract, it works equally both > > for nsCString and nsAutoCString > > Yes, but since nsACString might be non-null-terminated, it does not provide > the get() method, which we do need here. fair enough

alexander :surkov (:asurkov)

Comment 96

•

8 years ago

do you have perms to try-server the patch and then land it?

alexander :surkov (:asurkov)

Comment 97

•

8 years ago

Comment on attachment 8958897 [details] [diff] [review] getchar obsolete because was moved to new bug 1445954

Attachment #8958897 - Attachment is obsolete: true

Attachment #8958897 - Flags: review?(surkov.alexander)

Samuel Thibault

Assignee

Comment 98

•

8 years ago

No, I don't have any particular permissions on mozilla.

alexander :surkov (:asurkov)

Comment 99

•

8 years ago

(In reply to Samuel Thibault from comment #98) > No, I don't have any particular permissions on mozilla. you probably should request then, you have my vouch if needed. since patched code is not covered by tests, try server shouldn't fail, thus checkin-needed may be enough to get it landed

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Assignee: nobody → samuel.thibault

Tiberius Oros[:tiberius_oros]

Comment 100

•

8 years ago

Could not land patch.

Flags: needinfo?(samuel.thibault)

Samuel Thibault

Assignee

Comment 101

•

8 years ago

Attached patch multibyte (obsolete) — Details — Splinter Review

Ah, yes, the patch used to work fine over hg tip, but not with the latest changes. Here is a version which works fine over tip: the only change is the aString.ReplaceLiteral(i, 1, u"*"); line moved unchanged by the patch from ConvertTexttoAsterisks to ConvertTexttoAsterisks which needed an update for the tip.

Attachment #8959129 - Attachment is obsolete: true

Flags: needinfo?(samuel.thibault)

Attachment #8959506 - Flags: review?(surkov.alexander)

Samuel Thibault

Assignee

Comment 102

•

8 years ago

I have requested try server access in bug 1446355

Flags: needinfo?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 103

•

8 years ago

(In reply to Samuel Thibault from comment #102) > I have requested try server access in bug 1446355 yep, thanks

Flags: needinfo?(surkov.alexander)

alexander :surkov (:asurkov)

Comment 104

•

8 years ago

(In reply to Samuel Thibault from comment #101) > Created attachment 8959506 [details] [diff] [review] > multibyte > > Ah, yes, the patch used to work fine over hg tip, but not with the latest > changes. Here is a version which works fine over tip: the only change is the > aString.ReplaceLiteral(i, 1, u"*"); line moved unchanged by the patch from > ConvertTexttoAsterisks to ConvertTexttoAsterisks which needed an update for > the tip. you don't have to ask for subsequent review if the changes are minor

alexander :surkov (:asurkov)

Updated

•

8 years ago

Attachment #8959506 - Flags: review?(surkov.alexander) → review+

Samuel Thibault

Assignee

Comment 105

•

8 years ago

> you don't have to ask for subsequent review if the changes are minor Ok, but I guess a review+ flag is needed for checkin-needed to be taken? Or perhaps my own review+ could be enough for that to happen?

alexander :surkov (:asurkov)

Comment 106

•

8 years ago

(In reply to Samuel Thibault from comment #105) > > you don't have to ask for subsequent review if the changes are minor > > Ok, but I guess a review+ flag is needed for checkin-needed to be taken? Or > perhaps my own review+ could be enough for that to happen? It shouldn't be necessary I think. It is considered that r+ is implicitly inherited from previous patch if the changes are not significant. I believe that human (not a machine) checks in a patch :) You shouldn't do r+ for your own patch, it's confusing :)

Pulsebot

Comment 107

•

8 years ago

Pushed by dluca@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a66043ecf949 atk: Introduce U+FEFF characters to match AT-SPI offsets with DOM offsets r=surkov

Keywords: checkin-needed

Dorel Luca [:dluca]

Comment 108

•

8 years ago

Backed out changeset a66043ecf949 (bug 1346535) for build bustages Log of the failure: https://treeherder.mozilla.org/logviewer.html#?job_id=168629125&repo=mozilla-inbound&lineNumber=19487 Backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/a66043ecf949b365d7a3c09423fb0f4ea137668f

Flags: needinfo?(samuel.thibault)

Pulsebot

Comment 109

•

8 years ago

Backout by dluca@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5ea694074089 Backed out changeset a66043ecf949 for build bustages

Samuel Thibault

Assignee

Comment 110

•

8 years ago

Attached patch multibyte — Details — Splinter Review

Ah, I didn't realize that syntax was recent C++. Here is a fixed patch.

Attachment #8959506 - Attachment is obsolete: true

Flags: needinfo?(samuel.thibault)

Samuel Thibault

Assignee

Comment 111

•

8 years ago

I don't know if that is supposed to test for such build failures, but linux debug & opt try https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a6ebf351eeff629110abdb59b09ac380bcb3e8b passed.

Samuel Thibault

Assignee

Updated

•

8 years ago

Attachment #8959715 - Flags: checkin+

Samuel Thibault

Assignee

Updated

•

8 years ago

Attachment #8959715 - Flags: checkin+

Samuel Thibault

Assignee

Updated

•

8 years ago

Keywords: checkin-needed

Pulsebot

Comment 112

•

8 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6f9396e9ea0a atk: Introduce U+FEFF characters to match AT-SPI offsets with DOM offsets. r=surkov, r=dbaron

Keywords: checkin-needed

Andreea Pavel [:apavel]

Comment 113

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6f9396e9ea0a

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox61: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Samuel Thibault

Assignee

Comment 114

•

8 years ago

Comment on attachment 8959715 [details] [diff] [review] multibyte Approval Request Comment [Feature/Bug causing the regression]: miscalculation of offsets between DOM and ATK when a string contains non-BMP characters (e.g. emojis) [User impact if declined]: People using screen readers get bogus text (offset by one and such) as soon as a webpage contains non-BMP characters (e.g. emojis, which are more and more common). [Is this code covered by automated tests?]: The whole ATK stack is not currently automatically tested. [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: None [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: It is not [Why is the change risky/not risky?]: It touches only the ATK layer, thus only users needing the fix. [String changes made/needed]: None

Attachment #8959715 - Flags: approval-mozilla-beta?

Julien Cristau [:jcristau]

Updated

•

8 years ago

status-firefox55: affected → wontfix

status-firefox59: --- → wontfix

status-firefox60: --- → affected

alexander :surkov (:asurkov)

Comment 115

•

8 years ago

(In reply to Samuel Thibault from comment #114) > Comment on attachment 8959715 [details] [diff] [review] > multibyte > > Approval Request Comment are you sure you want to back port it? usually regressions, crashes, perf/stability issues and serious bugs are candidates for backporting. How many users and how often the bug beats?

Samuel Thibault

Assignee

Comment 116

•

8 years ago

Well, without it, ESR users would have to live with the issue until the next ESR. This is indeed not a regression, crash, perf or stability issue, but not being able to properly read the web page text seems serious to me. All users using orca are impacted (i.e. visually impaired users), and the bug poses problem whenever a webpage contains a non-BMP character, for instance emojis, which can be seen more and more used on the web.

Samuel Thibault

Assignee

Comment 117

•

8 years ago

BTW, as mentioned in the beta request, I could verify the fix in Nightly.

Status: RESOLVED → VERIFIED

Alex ARNAUD

Comment 118

•

8 years ago

(In reply to alexander :surkov from comment #115) > (In reply to Samuel Thibault from comment #114) > > Comment on attachment 8959715 [details] [diff] [review] > > multibyte > > > > Approval Request Comment > > are you sure you want to back port it? usually regressions, crashes, > perf/stability issues and serious bugs are candidates for backporting. How > many users and how often the bug beats? The user impact is the following: when any emoji is present on a page, orca will skips the next line or paragraph. As a blind or visual-impaired relies only on their screen reader it's for me one of the most important bug we've in Firefox for GNU/Linux. Imagine if Firefox does a black rectangle on each page and prevents you to read content: what would you tell? I've discovered this bug only because I use a screen magnifier with the screen reader and I can see on the screen if Orca skips text. Best regards, Alex.

Henri Sivonen (:hsivonen)

Comment 119

•

8 years ago

5-byte and 6-byte UTF-8 byte sequences are defined not to exist anymore. Please don't add code that's written as if they still existed. It seems bad to call U+FEFF a "BOM" when it's not used as a BOM but as zero-width no-break space. Furthermore, this seems to break ATK's view of astral characters. This seems problematic. There are languages that are entirely astral, though they might not be supported by ATK yet. Han-script text, e.g. Hong Kong Cantonese, can use astral ideographs among BMP ideographs. It seems bad to break these. Additionally, it seems plausible that ATK could develop the capability to read the meaning of emoji. Is there a reason why the index adjustment isn't done using UTF8CharEnumerator::NextChar() as in the Pango line breaker integration? (Note: I have an in-flight rewrite of UTF8CharEnumerator::NextChar() to comply with WHATWG requirements and Unicode recommendations in bug 1402247.)

Flags: needinfo?(samuel.thibault)

Henri Sivonen (:hsivonen)

Comment 120

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #119) > Is there a reason why the index adjustment isn't done using > UTF8CharEnumerator::NextChar() as in the Pango line breaker integration? Or, rather UTF8CharEnumerator::NextChar() if we get an index in number of scalar values from ATK and need to map it to UTF-16 data that we have.

Henri Sivonen (:hsivonen)

Comment 121

•

8 years ago

Oops. One more time: Or, rather UTF16CharEnumerator::NextChar() if we get an index in number of scalar values from ATK and need to map it to UTF-16 data that we have.

alexander :surkov (:asurkov)

Comment 122

•

8 years ago

Henri, it probably makes sense to open another bug for the issues. Frankly, this one is getting too large.

Samuel Thibault

Assignee

Comment 123

•

8 years ago

> Furthermore, this seems to break ATK's view of astral characters. What makes you think so? It does work in my tests, there are just U+FEFF characters along them, which just get ignored by speech synthesis etc. so only the astral characters get rendered. > it seems plausible that ATK could develop the capability to read the meaning of emoji. Sure, screen readers already do read them, and depending on the speech synthesis and braille table they can be rendered. This patch does not change that. > Is there a reason why the index adjustment isn't done using UTF16CharEnumerator::NextChar() as in the Pango line breaker integration? Well, perhaps. The complexity looked way higher, and the computation cost probably expensive. Where it gets really more tricky than this is that it's not just a question of converting from UTF-16 to UTF-8. See accessible/generic/HyperTextAccessible.cpp's HyperTextAccessible::TextSubstring: it does not contain just a UTF-16 string, it is actually composed of several children (to represent e.g. hyperlinks). HyperTextAccessible::GetChildIndexAtOffset uses a binary search within the mOffsets array to efficiently find which children contains a given UTF-16 offset. To keep this efficiency with a number of UTF-8 scalars, we'd have to introduce another array to do binary search in it. Basically it'd mean either replicating the whole machinery behind HyperTextAccessible::TextSubstring, using UTF-8 scalars instead of UTF-16 offsets, or to extend the existing machinery with a boolean parameter to switch between the two. It can probably be done, it's just quite invasive in HyperTextAccessible.

Flags: needinfo?(samuel.thibault)

Samuel Thibault

Assignee

Comment 124

•

8 years ago

(as for the 5-6byte UTF-8 byte sequences, it should probably be a separate bug entry indeed, xpcom/string/nsUTF8Utils.h has a lot more to remove than what has just been introduced).

Julien Cristau [:jcristau]

Comment 125

•

8 years ago

:surkov, any further thoughts on possible risks/benefits based on Samuel's followup comments?

Flags: needinfo?(surkov.alexander)

Julien Cristau [:jcristau]

Comment 126

•

8 years ago

(I meant in terms of uplifting to 60 or not)

alexander :surkov (:asurkov)

Comment 127

•

8 years ago

60 is a next esr and thus leaving the users without emoji for next half year is not a good thing. I don't have problems with the patch, but I'm not sure though whether Henri doesn't have other concerns. Samuel, no regressions so far on trunk, correct?

Flags: needinfo?(surkov.alexander) → needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 128

•

8 years ago

Comment 123 looks convincing enough to me that I don't have concerns about uplifting this.

Flags: needinfo?(hsivonen)

Samuel Thibault

Assignee

Comment 129

•

8 years ago

We haven't seen any regression indeed.

Alex ARNAUD

Comment 130

•

8 years ago

(In reply to Samuel Thibault from comment #129) > We haven't seen any regression indeed. We're four users (whose three blinds) that using the multi-bytes version of Firefox patched by Samuel and we've encounter any regressions, only improvement. Firefox no longer skips text when I use it with Orca. Best regards.

Julien Cristau [:jcristau]

Comment 131

•

8 years ago

Comment on attachment 8959715 [details] [diff] [review] multibyte sounds like we can go ahead and get this in 60.0b10. Thanks for the feedback.

Attachment #8959715 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Stefan Hindli [:stefan_hindli]

Comment 132

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/35cabd15daab

status-firefox60: affected → fixed

Bogdan Maris, Desktop Test Engineering

Updated

•

8 years ago

Flags: qe-verify+

David Olah

Comment 133

•

8 years ago

I tried to reproduce the issue, I installed Accerciser and IPython Console but I would require some info about how to find the accessible object corresponding with the <div> element. Can you please help me with that so I can verify it?

Flags: needinfo?(surkov.alexander)

Samuel Thibault

Assignee

Comment 134

•

8 years ago

Attached image tree.png — Details

Here is where I can find it on my system

alexander :surkov (:asurkov)

Comment 135

•

8 years ago

(In reply to Samuel Thibault from comment #134) > Created attachment 8979502 [details] > tree.png > > Here is where I can find it on my system thanks, Samuel!

Flags: needinfo?(surkov.alexander)

David Olah

Comment 136

•

8 years ago

I still cannot find the Firefox in that list from Accerciser on Ubuntu. Am I missing somethng? (I open Firefox, open Accerciser and search for Firefox in the Name box but it doesn't appear)

Flags: needinfo?(samuel.thibault)

Samuel Thibault

Assignee

Comment 137

•

8 years ago

IIRC firefox disables accessibility if no screen reader is running, so you might have to run e.g. orca before even starting firefox.

Flags: needinfo?(samuel.thibault)

David Olah

Comment 138

•

8 years ago

I managed to follow the steps. Thanks Samuel! I verified the issue on Nighty 62.0a1 (2018-05-22), Beta 61.0b7 and Firefox 60. The result is the expected one so I am marking the issue as verified.

status-firefox60: fixed → verified

status-firefox61: fixed → verified

status-firefox62: --- → verified

Flags: qe-verify+

Henri Sivonen (:hsivonen)

Updated

•

7 years ago

Updated

•

5 years ago

Regressions: 1606721

James Teh [:Jamie]

Updated

•

2 years ago

broken-multibyte.txt - Full output of performing steps in opening report 9 years ago Joanmarie Diggs 259 bytes, text/plain		Details
multibyte 8 years ago Samuel Thibault 10.65 KB, patch	dbaron : review-	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 6.94 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 7.66 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 9.49 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 9.67 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 9.88 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 9.88 KB, patch	surkov : review+	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 9.88 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.16 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.08 KB, patch	surkov : review+ dbaron : review+	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.19 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.21 KB, patch	dbaron : review-	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.49 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 10.51 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 12.44 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 14.94 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 14.83 KB, patch	surkov : review+	Details \| Diff \| Splinter Review
getchar 8 years ago Samuel Thibault 2.63 KB, patch		Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 14.79 KB, patch	surkov : review+	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 14.77 KB, patch	surkov : review+	Details \| Diff \| Splinter Review
multibyte 8 years ago Samuel Thibault 14.79 KB, patch	jcristau : approval-mozilla-beta+	Details \| Diff \| Splinter Review
tree.png 8 years ago Samuel Thibault 75.29 KB, image/png		Details