Clean up for lwbrk WordBreaker and its gtest
Categories
(Core :: Internationalization, task)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox94 | --- | fixed |
People
(Reporter: TYLin, Assigned: TYLin)
Attachments
(4 files)
This implements part of my proposal in bug 1722484 comment 1.
Comment 1 • 3 years ago (Assignee)
Comment 2 • 3 years ago (Assignee)
Depends on D124301
Comment 3 • 3 years ago (Assignee)
Here are the changes in this patch. They shouldn't change the behavior.
- Rename the gtest to `TestBreak.cpp` because it also contains word break tests.
- Align ruler comments to the test strings.
- Rename `lb` to `wb` in `TestASCIIWB`.
- Remove unused variable `j` in `TestPrintWordWithBreak()`.
- Use `ArrayLength` instead of the `sizeof` trick to get the array length.
- #include ArrayUtils.h, and sort the #include statements.
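For readers unfamiliar with the `sizeof` trick, here is a minimal sketch of the difference. It is not code from the patch: the `ArrayLength` template below is a hypothetical stand-in for the one in ArrayUtils.h so that the snippet compiles outside the tree, and `kTestStrings` is made-up test data.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ArrayLength from ArrayUtils.h, written here only
// so that this sketch compiles on its own.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLength(T (&)[N]) {
  return N;
}

// Made-up test data; the real gtest has its own strings.
static const char* const kTestStrings[] = {"one two", "three", "four five six"};

// Old style: the sizeof trick. It also compiles when the operand is a plain
// pointer, in which case it silently yields a wrong value.
constexpr std::size_t kLengthBySizeof =
    sizeof(kTestStrings) / sizeof(kTestStrings[0]);

// New style: the template only accepts real arrays, so passing a pointer is a
// compile error instead of a silent bug.
constexpr std::size_t kLengthByTemplate = ArrayLength(kTestStrings);

static_assert(kLengthBySizeof == kLengthByTemplate,
              "both forms give the element count for a real array");

// Typical use in a test loop: visit every entry exactly once.
std::size_t CountTestStrings() {
  std::size_t visited = 0;
  for (std::size_t i = 0; i < ArrayLength(kTestStrings); ++i) {
    assert(kTestStrings[i] != nullptr);
    ++visited;
  }
  return visited;
}
```

The two forms agree for a real array; the template version is preferable mainly because misuse fails at compile time.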
Depends on D124302
Comment 4 • 3 years ago (Assignee)
A UAX29-compatible word breaker (like ICU4C) treats the end of text as a
word break opportunity (rule WB2 [1]), but the lwbrk word breaker
currently doesn't.
The motivation of this patch is to make `WordBreaker::Next()` closer to
a UAX29-compatible one (at least for English text), and to see if the
callers need to change. This should make the future integration of the
ICU4X segmenter easier.
The only caller of `WordBreaker::Next()` is ClusterIterator's constructor.
This patch shouldn't change its behavior because we already manually
assign a word break point at the end of the line when `aContext` is
empty and `aDirection` is -1. This patch generalizes it to all
conditions.
Also, update `TestPrintWordWithBreak()` so that the result string makes
more sense.
[1] https://www.unicode.org/reports/tr29/#WB2
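To illustrate what "end of text is a break opportunity" means for a caller, here is a rough sketch. It is not the lwbrk implementation: `NextWordBreak`, `AllWordBreaks`, and `kNoBreak` are hypothetical names, and the breaker only distinguishes ASCII whitespace from everything else, which is far simpler than UAX29.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "no more break opportunities".
constexpr int kNoBreak = -1;

static bool IsSpace(char aChar) {
  return std::isspace(static_cast<unsigned char>(aChar)) != 0;
}

// Returns the offset of the next break opportunity strictly after aPos, or
// kNoBreak if aPos is already at (or past) the end of the text.
int NextWordBreak(const std::string& aText, std::size_t aPos) {
  const std::size_t len = aText.size();
  if (aPos >= len) {
    return kNoBreak;
  }
  // Skip the run of characters that share the class of the one at aPos.
  const bool startIsSpace = IsSpace(aText[aPos]);
  std::size_t i = aPos + 1;
  while (i < len && IsSpace(aText[i]) == startIsSpace) {
    ++i;
  }
  // If the run reached the end of the text, i == len. Following WB2, that
  // offset is reported as a break instead of returning kNoBreak.
  return static_cast<int>(i);
}

// Collects every break opportunity, including the one at the end of the text.
std::vector<int> AllWordBreaks(const std::string& aText) {
  std::vector<int> breaks;
  for (int pos = NextWordBreak(aText, 0); pos != kNoBreak;
       pos = NextWordBreak(aText, static_cast<std::size_t>(pos))) {
    breaks.push_back(pos);
  }
  return breaks;
}
```

With this sketch, `AllWordBreaks("hello world")` yields the offsets 5, 6, and 11, where 11 is the end of the text. A caller that iterates until `kNoBreak` no longer needs to add the final break point by hand.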
Depends on D124303
Comment 6 • 3 years ago (bugherder)
https://hg.mozilla.org/mozilla-central/rev/b1e7fd7b879a
https://hg.mozilla.org/mozilla-central/rev/3a38d2e56bf8
https://hg.mozilla.org/mozilla-central/rev/ccfb78f756bf
https://hg.mozilla.org/mozilla-central/rev/55efff2d5628