word boundary detection for Thai text

ASSIGNED
Assigned to
(NeedInfo from)

Status

()

--
major
ASSIGNED
11 years ago
a year ago

People

(Reporter: arthit, Assigned: thep, NeedInfo)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {intl})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(7 attachments, 8 obsolete attachments)

7.90 KB, patch
roc
: review+
Details | Diff | Splinter Review
8.15 KB, patch
roc
: review+
Details | Diff | Splinter Review
9.22 KB, patch
roc
: review+
Details | Diff | Splinter Review
15.19 KB, patch
roc
: review+
Details | Diff | Splinter Review
15.08 KB, patch
roc
: review+
Details | Diff | Splinter Review
14.94 KB, patch
roc
: review+
Details | Diff | Splinter Review
59 bytes, text/x-review-board-request
Details
(Reporter)

Description

11 years ago
Word boundary detection is needed for various text-related functions in Firefox,
including:
- navigation/selection/editing by keyboard: caret moving by word
- selection by mouse: double-clicking on text should select word (not the whole text between whitespaces)


Navigation by keyboard case:

input text:  ทดสอบการทำงาน ภาษาไทย-ลาว

expected caret stops (estimated), from left to right: ทดสอบ|การ|ทำงาน| ภาษา|ไทย-|ลาว|

current caret stops, from left to right: ทดสอบการทำงาน| ภาษาไทย-|ลาว|

| = caret position



Selection by mouse double-clicking case:

text:  ทดสอบการทำงาน

double-click any point over "ทดสอบ" text should highlight the whole word "ทดสอบ" and not others*


*also depends on dictionary, if we use dictionary-based approach for word boundary detection.

found on Mac OS X, Linux, Windows
(Reporter)

Updated

11 years ago
Blocks: 65896
Severity: normal → enhancement

Comment 1

11 years ago
FYI.

Safari 3.1.2 on Mac OS X has no problem on this test case, both keyboard and mouse selection.

Updated

11 years ago
Component: Layout: CTL → Layout: Text
QA Contact: arthit → layout.fonts-and-text
Prabhat, are you still working on this?

Can someone test this case with Firefox 4, and confirm the problem still exists?
Assignee: prabhat.hegde → nobody

Comment 3

8 years ago
Windows XP / Firefox 4 beta 6

Still work not correct

Comment 4

8 years ago
Still exists on 4.0b6 Ubuntu 10.04

Comment 5

8 years ago
Firefox 4.0b8pre on Ubuntu 10.10 has problem on this case.

Comment 6

8 years ago
goffity

Still exists Firefox 3.6.11 Ubuntu 10.04

Comment 7

8 years ago
TonHor ,, i found this bug ,too

Firefox 3.6.11 on Windows 7 in this test case
I use Firefox 4 beta on Window 7.
this bug not fix.
The people here at Barcamp Bangkok are saying this bug is a major problem in daily-usage of Firefox. Enhancement -> Major.
Severity: enhancement → major

Comment 10

8 years ago
Still exists Firefox 3.6.11 Windows 7
David or Masayuki, do you know where in the codebase we'd need to change in order to fix this? A contributor has a Thai word-boundary detection algorithm he's implemented in Java, and is interested in converting it to something we can use.
In the code in intl/lwbrk/src/, there is integration with platform-specific line-breaking code for line breaking in certain languagse.  See the ifdefs in the Makefile.in in that directory that choose an appropriate platform implementation of NS_GetComplexLineBreaks.

However, it sounds like this bug isn't really about line breaking, but rather about selection movement and extension; I think the relevant codepath may be that code in nsSelection.cpp calls nsIFrame::PeekOffset which in turn calls nsTextFrame::PeekOffsetWord.
I'd also note that we probably don't want more than one way of doing Thai word boundary detection; we already have a setup for relying on the platform for it for line breaking, and we should probably use the same code here.  (If it needs improvement or replacement, that should probably be a separate bug.)
Arthit said earlier today that libthai has code for word boundary detection, but that we're not using it. Maybe the solution is to have the relevant selection code use libthai instead of whatever it's using now?
roc may have some more advice.
Cc'ing Theppitak, who wrote the linebreaking code for Thai. Perhaps he knows the best way forward here, or would be willing to write a patch for this?
(Assignee)

Comment 18

5 years ago
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) (away: 9/6 - 9/10, JST) from comment #15)
> Isn't using nsIWordBreaker?
> http://mxr.mozilla.org/mozilla-central/source/layout/generic/
> nsTextFrameThebes.cpp#5705
> 
> Its implementation is here:
> http://mxr.mozilla.org/mozilla-central/source/intl/lwbrk/src/
> nsSampleWordBreaker.cpp

Having tried some quick change for verification, I can confirm that it is
the right place to do (with the former one moved to
layout/generic/nsTextFrame.cpp, line 6827). 

However, the current interface is obviously inefficient for the costly Thai
word analysis:

layout/generic/nsTextFrame.cpp, line 6823:
---8<---
  nsIWordBreaker* wordBreaker = nsContentUtils::WordBreaker();
  for (int32_t i = 0; i <= textLen; ++i) {
    int32_t indexInText = i + textStart;
    mWordBreaks[i] |=
      wordBreaker->BreakInBetween(aContext.get(), indexInText,
                                  aContext.get() + indexInText,
                                  aContext.Length() - indexInText);
  }
---8<---

The other methods, FindWord() and NextWord(), look more practical, although
something like nsJISx4051LineBreaker::GetJISx4051Breaks() is the most
optimal.

> I guess that you need to use some native APIs on each platform like line
> breaker for such complex languages. See bug 336959 and
> https://wiki.mozilla.org/Gecko:Line_Breaking#Complex_scripts

Yes, we can even call the existing NS_GetComplexLineBreaks(), as there is
no significant difference between line break and word break for Thai.
(Assignee)

Comment 19

5 years ago
Created attachment 8485374 [details] [diff] [review]
First patch, using NextWord() method instead of BreakInBetween()

This first patch uses NextWord() instead of BreakInBetween(), and only patch the NextWord() method, not the others. Just for further discussion.
Thanks Theppitak!

Masayuki, are you the right person to take a look at Theppitak's approach in this patch?
Flags: needinfo?(masayuki)
(Assignee)

Comment 21

5 years ago
Created attachment 8485802 [details] [diff] [review]
Updated patch, with all methods implemented

This patch implements Thai word break in all methods. All are tested with ad hoc modifications on ClusterIterator c-tor. (All three methods are possible to use. Only NextWord() is chosen in the proposed patch.)
Attachment #8485374 - Attachment is obsolete: true
(Assignee)

Comment 22

5 years ago
Comment on attachment 8485802 [details] [diff] [review]
Updated patch, with all methods implemented

Requesting for patch review.
Attachment #8485802 - Flags: review?(roc)
(In reply to Dietrich Ayala (:dietrich) from comment #20)
> Thanks Theppitak!
> 
> Masayuki, are you the right person to take a look at Theppitak's approach in
> this patch?

I'm not so familiar around the complex script part. However, if nobody will review it, I'll try.

Theppitack:

Please create diff files with |-U 8 -p|. That makes patches easy to review. Anyway, thank you for your work.

# And sorry for the delay to reply due to my vacation and Japanese vacation season.
Flags: needinfo?(masayuki)
(Assignee)

Comment 24

5 years ago
Created attachment 8489813 [details] [diff] [review]
Reformatted patch

I've reformatted the patch as Masayuki suggested. Please review it. Thanks.
Attachment #8485802 - Attachment is obsolete: true
Attachment #8485802 - Flags: review?(roc)
Attachment #8489813 - Flags: review?(masayuki)
Comment on attachment 8489813 [details] [diff] [review]
Reformatted patch

Roc or smontague is better reviewer. If they are too busy, I'll try, though.
Attachment #8489813 - Flags: review?(masayuki) → review?(roc)
Comment on attachment 8489813 [details] [diff] [review]
Reformatted patch

Review of attachment 8489813 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/lwbrk/nsSampleWordBreaker.cpp
@@ +30,5 @@
> +  uint8_t c2 = this->GetClass(aText2[0]);
> +  if (kWbClassThaiLetter == c1 && kWbClassThaiLetter == c2) {
> +    nsAutoString text(aText1, aTextLen1);
> +    text.Append(aText2, aTextLen2);
> +    uint8_t *breakBefore = new uint8_t[aTextLen1 + aTextLen2];

Use
  nsTArray<uint8_t> breakBefore;
  breakBefore.SetLength(aTextLen1 + aTextLen2);
then pass breakBefore.Elements(). The implicit destructor will clean up for you.

You could make it an nsAutoTArray to avoid some heap allocations.

@@ +130,5 @@
>       }
>    }
>    if(kWbClassThaiLetter == c)
>    {
> +    uint8_t *breakBefore = new uint8_t[range.mEnd - range.mBegin];

Same here.

@@ +169,5 @@
>         break;
>    }
>    if(kWbClassThaiLetter == c1)
>    {
> +    uint8_t *breakBefore = new uint8_t[aLen - aPos];

and here.

@@ +177,5 @@
> +      i++;
> +    }
> +    delete[] breakBefore;
> +    if (i < cur - aPos)
> +      return aPos + i;

This still isn't a very good interface. It's quadratic in the number of words in the text.

Seems to me we need a new method in nsIWordBreaker that does exactly what the loop in nsTextFrame does: fills in a word-breaks array for the entire string. Which, conveniently, is what NS_GetComplexLineBreaks does.

Can you do that? If not, needinfo me and I'll try to find someone else to help.
Attachment #8489813 - Flags: review?(roc) → review-
(Assignee)

Comment 27

5 years ago
Created attachment 8490581 [details] [diff] [review]
Patch using nsTArray

This patch uses nsTArray as roc suggested. It looks cleaner now.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #26)
> Seems to me we need a new method in nsIWordBreaker that does exactly what
> the loop in nsTextFrame does: fills in a word-breaks array for the entire
> string. Which, conveniently, is what NS_GetComplexLineBreaks does.
> 
> Can you do that? If not, needinfo me and I'll try to find someone else to
> help.

I'll try to do it first. Here's what I think needed to do:
- Add a new method GetWordBreaks() to nsIWordBreaker (with uuid regenerated).
- Add the method in nsSampleWordBreaker and implement it.
- Modify the ClusterIterator c-tor to call the new method instead.
Attachment #8489813 - Attachment is obsolete: true
(Assignee)

Comment 28

5 years ago
Created attachment 8490656 [details] [diff] [review]
Patch with new GetWordBreaks() method

This patch now adds the new nsIWordBreaker::GetWordBreaks() method, implements it in nsSampleWordBreaker, and modifies ClusterIterator to use it.
Attachment #8490581 - Attachment is obsolete: true
Attachment #8490656 - Flags: review?(roc)
Comment on attachment 8490656 [details] [diff] [review]
Patch with new GetWordBreaks() method

Review of attachment 8490656 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/lwbrk/nsIWordBreaker.h
@@ +33,5 @@
>                                 uint32_t aOffset) = 0;
>    virtual int32_t NextWord(const char16_t* aText, uint32_t aLen, 
>                             uint32_t aPos) = 0;
> +  virtual void GetWordBreaks(const char16_t* aText, uint32_t aLen,
> +                             bool* aBreakBefore) = 0;

Document that some elements of aBreakBefore are set to true and the rest are untouched.

::: intl/lwbrk/nsSampleWordBreaker.cpp
@@ +202,5 @@
> +  {
> +    for (curEnd = curBegin + 1; curEnd < aLen; curEnd++)
> +    {
> +      c2 = this->GetClass(aText[curEnd]);
> +      if (c2 != c1) 

remove trailing whitespace

@@ +214,5 @@
> +                              breakBefore.Elements());
> +      for (uint32_t i = curBegin + 1; i < curEnd; i++)
> +      { 
> +        aBreakBefore[i] = breakBefore[i - curBegin];
> +      } 

remove trailing whitespace
Attachment #8490656 - Flags: review?(roc) → review+
(Assignee)

Comment 30

5 years ago
Created attachment 8490729 [details] [diff] [review]
Edited patch

Edited patch according to roc's comments.
Attachment #8490729 - Flags: review?(roc)

Comment 31

5 years ago
build busted and test failures: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cf586bd6a24e

Thep, could you please take a look?
Assignee: nobody → thep
Status: NEW → ASSIGNED
(Assignee)

Comment 32

5 years ago
(In reply to Ekanan Ketunuti from comment #31)
> build busted and test failures:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=cf586bd6a24e
> 
> Thep, could you please take a look?

Oh, I've ignored textStart value in ClusterItarator c-tor in the latest patch. Fixing soon.
(Assignee)

Comment 33

5 years ago
Created attachment 8492025 [details] [diff] [review]
Correcting build & test failures

This patch fixes:
- The incorrect filling of word breaks without textStart concerned, which could cause wrong behavior when aDirection < 0.
- The missing inclusion of nsTArray.h

test_bug496275.html now passes in my box. I'm not sure about others.
(Assignee)

Comment 35

5 years ago
Created attachment 8492232 [details] [diff] [review]
Fix more warnings & mochitest failure

I was wrong when saying that test_bug496275.html was passed in my box. I just forgot how to really run mochitest when saying so.

This patch:
- fixes test_bug496275.html failure, the difference is that aBreakBefore[0] was missed compared to the original BreakInBetween() solution;
- fixes compiler warning about signedness of i in nsSampleWordBreaker::NextWord();
- instead of directly assigning aBreakBefore elements in nsSampleWordBreaker::GetWordBreak(), only turns on the true elements and leave the others untouched (by using the OR operation), to really conform to what documented;
- does not assign aBreakBefore[aLen] by guarding the curEnd index.
Attachment #8492025 - Attachment is obsolete: true

Comment 36

5 years ago
Comment on attachment 8492232 [details] [diff] [review]
Fix more warnings & mochitest failure

This doesn't look better than previous one. https://tbpl.mozilla.org/?tree=Try&rev=c960bbf840e6
(Assignee)

Comment 37

5 years ago
Created attachment 8492541 [details] [diff] [review]
Really fix mochitest failure

OK. It must have worked by chance due to my debugging process.

It appears ClusterIterator::mWordBreaks[] must be calculated up to the position *next to* the last character, not just at the last one. (Hence the "i <= textLen", not "i < textLen", loop condition in the old code)

So I've adjusted the amount of calculated data as such. (The guard against aBreakBefore[aLen] writing in the last patch is also reverted.) It should really pass the mochitest now.
Attachment #8492232 - Attachment is obsolete: true

Comment 38

5 years ago
Comment on attachment 8492541 [details] [diff] [review]
Really fix mochitest failure

ok, i've pushed to try. https://tbpl.mozilla.org/?tree=Try&rev=d701648021c0

if that all comes back clean, you'll want to upload a new patch with hg metadata (incl your name, email, commit msg) as seen per https://developer.mozilla.org/en-US/docs/Creating_a_patch_that_can_be_checked_in and re-request roc for review again (for sure).
(Assignee)

Comment 39

5 years ago
Created attachment 8492627 [details] [diff] [review]
Formatted patch for review

Patch formatted for review & commit. Please review again. Thanks.
Attachment #8492541 - Attachment is obsolete: true
Attachment #8492627 - Flags: review?(roc)
Comment on attachment 8492627 [details] [diff] [review]
Formatted patch for review

Review of attachment 8492627 [details] [diff] [review]:
-----------------------------------------------------------------

::: intl/lwbrk/nsIWordBreaker.h
@@ +40,5 @@
> +   * Note that aBreakBefore[] is assumed to be initially filled with false.
> +   * Only word break positions are set to true, and the rest are untouched.
> +   *
> +   * Note also that aBreakBefore is assumed to be of size aLen + 1,
> +   * to include the position after the last character. */

Since aBreakBefore[aLen] is always set to true, I think it would be simpler to say that aBreakBefore has aLen elements. Then in   ClusterIterator::ClusterIterator set mWordBreaks[textStart + textLen] to true.

::: intl/lwbrk/nsSampleWordBreaker.cpp
@@ +131,5 @@
>       }
>    }
>    if(kWbClassThaiLetter == c)
>    {
> +    nsTArray<uint8_t> breakBefore;

Make this nsAutoTArray<uint8_t,100>

@@ +174,5 @@
>         break;
>    }
>    if(kWbClassThaiLetter == c1)
>    {
> +    nsTArray<uint8_t> breakBefore;

Make this nsAutoTArray<uint8_t,100>

@@ +209,5 @@
> +        break;
> +    }
> +    if (kWbClassThaiLetter == c1)
> +    {
> +      nsTArray<uint8_t> breakBefore;

Make this nsAutoTArray<uint8_t,100>

::: layout/generic/nsTextFrame.cpp
@@ +6837,5 @@
>      mFrag->AppendTo(str, textOffset, textLen);
>      aContext.Insert(str, 0);
>    }
>    nsIWordBreaker* wordBreaker = nsContentUtils::WordBreaker();
> +  nsTArray<bool> breakBefore;

Make this nsAutoTArray<bool,100>

@@ +6839,5 @@
>    }
>    nsIWordBreaker* wordBreaker = nsContentUtils::WordBreaker();
> +  nsTArray<bool> breakBefore;
> +  breakBefore.SetLength(aContext.Length() + 1);
> +  memset(breakBefore.Elements(), false, breakBefore.Length()*sizeof(bool));

Call PodZero

@@ +6843,5 @@
> +  memset(breakBefore.Elements(), false, breakBefore.Length()*sizeof(bool));
> +  wordBreaker->GetWordBreaks(aContext.get(), aContext.Length(),
> +                             breakBefore.Elements());
> +  memcpy(mWordBreaks.Elements(), breakBefore.Elements() + textStart,
> +         textLen + 1);

Call PodCopy
Attachment #8492627 - Flags: review?(roc) → review-
(Assignee)

Comment 41

5 years ago
Created attachment 8492842 [details] [diff] [review]
Updated patch for review & commit

Updated patch after roc's review.

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #40)
> ::: intl/lwbrk/nsIWordBreaker.h
> @@ +40,5 @@
> > +   * Note that aBreakBefore[] is assumed to be initially filled with false.
> > +   * Only word break positions are set to true, and the rest are untouched.
> > +   *
> > +   * Note also that aBreakBefore is assumed to be of size aLen + 1,
> > +   * to include the position after the last character. */
> 
> Since aBreakBefore[aLen] is always set to true, I think it would be simpler
> to say that aBreakBefore has aLen elements. Then in  
> ClusterIterator::ClusterIterator set mWordBreaks[textStart + textLen] to
> true.

OK, done. But I think it should be mWordBreaks[textLen] instead.

> Make this nsAutoTArray<uint8_t,100>

Done in all the four commented places, plus another one in BreakInBetween().

> @@ +6839,5 @@
> >    }
> >    nsIWordBreaker* wordBreaker = nsContentUtils::WordBreaker();
> > +  nsTArray<bool> breakBefore;
> > +  breakBefore.SetLength(aContext.Length() + 1);
> > +  memset(breakBefore.Elements(), false, breakBefore.Length()*sizeof(bool));
> 
> Call PodZero
> 
> @@ +6843,5 @@
> > +  memset(breakBefore.Elements(), false, breakBefore.Length()*sizeof(bool));
> > +  wordBreaker->GetWordBreaks(aContext.get(), aContext.Length(),
> > +                             breakBefore.Elements());
> > +  memcpy(mWordBreaks.Elements(), breakBefore.Elements() + textStart,
> > +         textLen + 1);
> 
> Call PodCopy

Both are done. Should I do so to line 6822, too?:

  memset(mWordBreaks.Elements(), false, (textLen + 1)*sizeof(bool));

(In fact, there are other two places in PropertyProvider::GetHyphenationBreaks() at line 3233 and 3260, but I'm afraid that would add too much to the scope of the patch.)
Attachment #8492627 - Attachment is obsolete: true
Attachment #8492842 - Flags: review?(roc)
Comment on attachment 8492842 [details] [diff] [review]
Updated patch for review & commit

Review of attachment 8492842 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!
Attachment #8492842 - Flags: review?(roc) → review+

Comment 43

5 years ago
Comment on attachment 8492842 [details] [diff] [review]
Updated patch for review & commit

Thep, m-oth still orange. https://tbpl.mozilla.org/?tree=Try&rev=b509f97ec344
Still, getting pretty close. :)
(Assignee)

Comment 44

5 years ago
I have problem running both tests in my box (I built it outside source tree):

At top build dir:

$ $TOP_SRC_DIR/mach mochitest-plain test_eventemitter_basic.html
From _tests: Kept 20139 existing; Added/updated 0; Removed 0 files and 0 directories.
No tests could be found in the path specified. Please specify a path that is a test file or is a directory containing tests.

$ $TOP_SRC_DIR/mach mochitest-plain _tests/testing/mochitest/chrome/toolkit/devtools/tests/mochitest/test_eventemitter_basic.html
From _tests: Kept 20139 existing; Added/updated 0; Removed 0 files and 0 directories.
No tests could be found in the path specified. Please specify a path that is a test file or is a directory containing tests.

$ $TOP_SRC_DIR/mach mochitest-plain test_virtualcursor_text.html 
From _tests: Kept 20139 existing; Added/updated 0; Removed 0 files and 0 directories.
No tests could be found in the path specified. Please specify a path that is a test file or is a directory containing tests.

$ $TOP_SRC_DIR/mach mochitest-plain _tests/testing/mochitest/a11y/accessible/tests/mochitest/pivot/test_virtualcursor_text.html 
From _tests: Kept 20139 existing; Added/updated 0; Removed 0 files and 0 directories.
No tests could be found in the path specified. Please specify a path that is a test file or is a directory containing tests.

Any hint? Thanks.
I think the command you're looking for would probably be

$ $TOP_SRC_DIR/mach mochitest-plain $TOP_SRC_DIR/toolkit/devtools/tests/mochitest/test_eventemitter_basic.html

i.e. you should provide the path to the test file in the source tree.

(Or run

$ ./mach mochitest-plain toolkit/devtools/tests/mochitest/test_eventemitter_basic.html

from the top of the SOURCE tree, assuming your mozconfig file tells it where to find the objdir; I think that's the usual/expected approach.)
(Assignee)

Comment 46

5 years ago
(In reply to Jonathan Kew (:jfkthame) from comment #45)
> I think the command you're looking for would probably be
> 
> $ $TOP_SRC_DIR/mach mochitest-plain
> $TOP_SRC_DIR/toolkit/devtools/tests/mochitest/test_eventemitter_basic.html
> 
> i.e. you should provide the path to the test file in the source tree.
> 
> (Or run
> 
> $ ./mach mochitest-plain
> toolkit/devtools/tests/mochitest/test_eventemitter_basic.html
> 
> from the top of the SOURCE tree, assuming your mozconfig file tells it where
> to find the objdir; I think that's the usual/expected approach.)

I did that, too, with no success. Note that I succesfully ran test_bug496275.html in the previous round. But the same method does not apply to the last two tests.
Oh, these aren't "plain" mochitests... for test_eventemitter_basic.html, you'd need to run mochitest-chrome, and for test_virtualcursor_text.html, it'd be mochitest-a11y.

Or take the easy way out - just use "mach mochitest <path-to-test-file>" and let mach figure out which flavor of test it is:

$ ./mach mochitest toolkit/devtools/tests/mochitest/test_eventemitter_basic.html

$ ./mach mochitest accessible/tests/mochitest/pivot/test_virtualcursor_text.html
(Assignee)

Comment 48

5 years ago
OK. I can run the mochitests successfully now. Thanks for your advice.

Unlike what's seen on tbpl, test_eventemitter_basic.html passes.
But test_virtualcursor_text.html fails alike. I'll check the latter, then.
(Assignee)

Comment 49

5 years ago
Created attachment 8495019 [details] [diff] [review]
Fix word-back behavior

Not related to the a11y mochitest failures, this patch fixes another problem I have found myself: word-wise backword movements in previous patches occasionally move too far. This was because ClusterIterator::NextCluster() sometimes moved past word break position without returning.

I have also added a mochitest for this bug. Note that the test will be failed if the change to ClusterIterator::NextCluster() is not applied.

I'd like to get the additional change reviewed before working further on the a11y mochitest failures.
Attachment #8495019 - Flags: review?(roc)
(Assignee)

Comment 50

5 years ago
Created attachment 8496395 [details] [diff] [review]
Fix a11y mochitest

Having thought on it more thoroughly, now I think I have done it more correctly.

Here's the aContext contents and the text portion of interest
(C = context, T = text of interest):

- when aDirection > 0:

             |<--textLen-->|
  +----------+-------------+
  |CCCCCCCCCC|TTTTTTTTTTTTT|
  +----------+-------------+
              ^             ^
              textStart     textStart+textLen

- when aDirection < 0:

  |<--textLen-->|
  +-------------+----------+
  |TTTTTTTTTTTTT|CCCCCCCCCC|
  +-------------+----------+
   ^             ^
   textStart     textStart+textLen

- generic model:

             |<--textLen-->|
  +----------+-------------+----------+
  |CCCCCCCCCC|TTTTTTTTTTTTT|CCCCCCCCCC|
  +----------+-------------+----------+
              ^             ^
              textStart     textStart+textLen

GetWordBreaks() calculates word break positions on the total aContext string of length aContext.Length() (no additional slot at the end).

Then, we copy the resulting word break positions from textStart to textStart+textLen, i.e. of length textLen+1, into mWordBreaks.

Note that even though GetWordBreaks() only calculates at most aContext.Length() positions, we need to allocate aContext.Length()+1 slots for breakBefore array, so we can still safely copy textLen+1 slots when the text of interest is at the end of aContext. The default false value as initialized is in accordance to what the old code using BreakInBetween() did.

That's how it should really work. The previous guesses with true values on position 0 and textLen+1 were based on special cases, which don't always apply in general.

Please review again. Thanks.
Attachment #8496395 - Flags: review?(roc)
(Assignee)

Comment 53

5 years ago
Created attachment 8496946 [details] [diff] [review]
Fix more mochitest failures

OK. In ClusterIterator c-tor, mWordBreaks[0] or mWordBreaks[textLen] may be initialized with true, and using PodCopy will overwrite it. To behave exactly the same as previous code, we need to OR the word break positions instead of assigning them.

This passes the two failed mochitests. I can't run the os-x mochitest, though. I'm using Linux.

Please review it again. Thanks.
Attachment #8496946 - Flags: review?(roc)

Comment 54

5 years ago
Comment on attachment 8496946 [details] [diff] [review]
Fix more mochitest failures

still seeing mochitest failure: https://tbpl.mozilla.org/?tree=Try&rev=eb160289a0b5 

test_wordboundary.html:
  498 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextBeforeOffset (word end): wrong text (got 'hello', expected: 'hello'), offset: 6, id: 'e2' ; - expected FAIL
  500 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextBeforeOffset (word end): wrong end offset(got '5', expected: '5'), offset: 6, id: 'e2' ; - expected FAIL
  601 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAtOffset (word end): wrong text (got ' ', expected: ' '), offset: 5, id: 'e2' ; - expected FAIL
  602 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAtOffset (word end): wrong start offset(got '5', expected: '5'), offset: 5, id: 'e2' ; - expected FAIL
  613 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAtOffset (word end): wrong text (got ' ', expected: ' '), offset: 6, id: 'e2' ; - expected FAIL
  614 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAtOffset (word end): wrong start offset(got '5', expected: '5'), offset: 6, id: 'e2' ; - expected FAIL
  715 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAfterOffset (word end): wrong text (got ' ', expected: ' '), offset: 5, id: 'e2' ; - expected FAIL
  716 INFO TEST-UNEXPECTED-PASS | chrome://mochitests/content/a11y/accessible/tests/mochitest/text/test_wordboundary.html | getTextAfterOffset (word end): wrong start offset(got '5', expected: '5'), offset: 5, id: 'e2' ; - expected FAIL
https://tbpl.mozilla.org/php/getParsedLog.php?id=49156080&tree=Try


test_movement_by_words.html on OS X:
  156 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=false, stopAtPunctuation=true in "<b>Log out</b> roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  257 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=true, stopAtPunctuation=true in "<b>Log out</b> roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  358 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=false, stopAtPunctuation=false in "<b>Log out</b> roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  459 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=true, stopAtPunctuation=false in "<b>Log out</b> roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
https://tbpl.mozilla.org/php/getParsedLog.php?id=49174837&tree=Try

test_movement_by_words.html on B2G:
  328 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=false, stopAtPunctuation=true in "<b>Log out</b>  roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  429 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=true, stopAtPunctuation=true in "<b>Log out</b>  roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  530 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=false, stopAtPunctuation=false in "<b>Log out</b>  roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
  631 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_movement_by_words.html | Left movement broken with eatSpace=true, stopAtPunctuation=false in "<b>Log out</b>  roc"; sel.anchorNode.parentNode=[object HTMLDivElement] - got 2, expected 1
https://tbpl.mozilla.org/php/getParsedLog.php?id=49169174&tree=Try

test_bug425915.html on B2G:
  157 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0d: Wrong anchor offset. - got 11, expected 8
  159 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0d: Wrong focus offset. - got 11, expected 8
  161 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0e: Wrong anchor offset. - got 16, expected 11
  163 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0e: Wrong focus offset. - got 16, expected 11
  165 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0f: Wrong anchor offset. - got 19, expected 16
  167 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0f: Wrong focus offset. - got 19, expected 16
  169 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0g: Wrong anchor offset. - got 28, expected 19
  171 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0g: Wrong focus offset. - got 28, expected 19
  173 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0h: Wrong anchor offset. - got 35, expected 28
  175 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0h: Wrong focus offset. - got 35, expected 28
  177 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0i: Wrong anchor offset. - got 40, expected 32
  179 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0i: Wrong focus offset. - got 40, expected 32
  181 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0j: Wrong anchor offset. - got 40, expected 35
  183 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 0j: Wrong focus offset. - got 40, expected 35
  193 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1b: Wrong anchor offset. - got 29, expected 32
  195 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1b: Wrong focus offset. - got 29, expected 32
  197 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1c: Wrong anchor offset. - got 20, expected 29
  199 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1c: Wrong focus offset. - got 20, expected 29
  201 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1d: Wrong anchor offset. - got 17, expected 20
  203 INFO TEST-UNEXPECTED-FAIL | /tests/layout/generic/test/test_bug425915.html | test 1d: Wrong focus offset. - got 17, expected 20
https://tbpl.mozilla.org/php/getParsedLog.php?id=49169174&tree=Try
(Assignee)

Comment 55

4 years ago
I've been busy preparing for a conference right now. I'll be back to work on it soon after that.
Hi Theppitak!

I confirmed today with some Mozillians in Bangkok that this bug still exists. Do you have the time to update the patch to contemporary mozilla-central and trying again?

Thanks!

Dietrich
Flags: needinfo?(thep)
(Assignee)

Comment 57

2 years ago
My machine that was used to build mozilla has been out of order. And I'm currently engaged with some job right now. So, I'm afraid I can't work on it yet.

Meanwhile, I'm happy if somebody could finish it.

Comment 58

2 years ago
How about we give Theppitak level one access so he can build and run tests on the mozilla infrastructure?

Theppitak, if you happy to finish this please follow https://www.mozilla.org/en-US/about/governance/policies/commit/.
Would this patch fix the issue shown here, where Google Docs is breaking at the wrong place?

https://i.imgur.com/gNiX5fm.png

Or is that a different bug?

Comment 60

2 years ago
(In reply to Dietrich Ayala (:dietrich) from comment #59)
> Would this patch fix the issue shown here, where Google Docs is breaking at
> the wrong place?
> 
> https://i.imgur.com/gNiX5fm.png
> 
> Or is that a different bug?

it's definitely different bug.
Comment hidden (mozreview-request)
I got pinged on this bug by the thai community on our l10n workshop, and I've spent a few to see if I can get the patch to apply and build. The answer is yes, so I've uploaded the result to review-board and pushed to try.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c742f505f548cdfcf18d728872b52e861b3256cc

I know from local testing that the actual test in this patch fails now, but probably because we restructured stuff in test infra.

From testing at the event, the double-click selection works for thai, it doesn't work for lao, though.

I wonder if three years later, with icu in our code, we can actually implement this in a way that works for all lanuages with this problem?

Note, I won't be able to actually do anything useful beyond C++ nits, but this should give folks a head start.
... doesn't even builds on windows, in https://hg.mozilla.org/try/rev/7861406f8916b00b65a1bd737ca9566f74ac40dd#l2.144


15:07:23     INFO -  Warning: C4805 in z:\build\build\src\intl\lwbrk\nsSampleWordBreaker.cpp: '|=': unsafe mix of type 'bool' and type 'unsigned char' in operation
15:07:23     INFO -  z:/build/build/src/intl/lwbrk/nsSampleWordBreaker.cpp(218): warning C4805: '|=': unsafe mix of type 'bool' and type 'unsigned char' in operation

(In reply to Axel Hecht [:Pike] from comment #62)
> I wonder if three years later, with icu in our code, we can actually
> implement this in a way that works for all lanuages with this problem?

jfkthame, would you know?
Flags: needinfo?(jfkthame)
(In reply to Axel Hecht [:Pike] from comment #63)
> ... doesn't even builds on windows, in
> https://hg.mozilla.org/try/rev/7861406f8916b00b65a1bd737ca9566f74ac40dd#l2.
> 144
> 
> 
> 15:07:23     INFO -  Warning: C4805 in
> z:\build\build\src\intl\lwbrk\nsSampleWordBreaker.cpp: '|=': unsafe mix of
> type 'bool' and type 'unsigned char' in operation
> 15:07:23     INFO - 
> z:/build/build/src/intl/lwbrk/nsSampleWordBreaker.cpp(218): warning C4805:
> '|=': unsafe mix of type 'bool' and type 'unsigned char' in operation

That should be easily fixable, e.g. by casting the bool to a uint8_t.

> (In reply to Axel Hecht [:Pike] from comment #62)
> > I wonder if three years later, with icu in our code, we can actually
> > implement this in a way that works for all lanuages with this problem?
> 
> jfkthame, would you know?

Well.... ICU has line- and word-breaking support, though currently I believe we exclude that from our build. So enabling it would increase our code (and data) size, as it'll pull in the Thai dictionary that is used to identify word boundaries. I'm not sure what the total size impact would be, but given that we're still waiting to shoehorn ICU into the Android build, I'd be hesitant to do it right now.

If we do (eventually) move to an ICU-backed solution, that should work for whatever languages ICU supports -- I think that includes several of the SEAsian languages that have similar issues. The same would be true, though, for a solution based on platform-native breaking (like we currently use for line-break): although the patch here specifically checks for Thai letters, it could perfectly well do the same for Lao, Khmer, Burmese, etc., and pass those through to the platform breaker as well; then we'd get support for whatever languages the host platform implements.

Eventually, a re-write of both line- and word-breaking to use ICU services consistently across all platforms (and eliminate platform-specific code here) would probably be a good thing. But until we tackle that, something along the lines of the patch here is probably the easiest interim solution.
Flags: needinfo?(jfkthame)
Depends on: 1423593
You need to log in before you can comment on or make changes to this bug.