Closed Bug 1728708 Opened 3 years ago Closed 3 years ago

Clean up for lwbrk WordBreaker and its gtest

Categories

(Core :: Internationalization, task)

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: TYLin, Assigned: TYLin)

Attachments

(4 files)

This implements part of my proposal in bug 1722484 comment 1.

Here are the changes in this patch. They shouldn't change the behavior.

  • Rename the gtest to TestBreak.cpp because it also contains word break tests.
  • Align ruler comments to the test strings.
  • Rename lb to wb in TestASCIIWB.
  • Remove unused variable j in TestPrintWordWithBreak().
  • Use ArrayLength instead of sizeof trick to get the array length.
  • #include ArrayUtils.h, and sort the #include statements.
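
For illustration, the sizeof-to-ArrayLength change can be sketched as below. This is not the patch itself: ArrayLengthSketch is a hypothetical stand-in for the helper provided by ArrayUtils.h so that the snippet compiles on its own, and kTestStrings is a made-up array rather than the one in the gtest.

```cpp
#include <cstddef>

// Hypothetical stand-in for the ArrayLength helper from ArrayUtils.h, so
// that this sketch is self-contained. It only accepts real arrays: passing
// a pointer fails to compile instead of silently giving a wrong length.
template <typename T, std::size_t N>
constexpr std::size_t ArrayLengthSketch(T (&)[N]) {
  return N;
}

// Made-up test strings; the real gtest uses its own data.
static const char* const kTestStrings[] = {
    "This is a test.",
    "Hello world",
    "foo bar",
};

// The "sizeof trick" computes sizeof(array) / sizeof(array[0]).
// For a true array both forms agree, which is checked at compile time here.
static_assert(ArrayLengthSketch(kTestStrings) ==
                  sizeof(kTestStrings) / sizeof(kTestStrings[0]),
              "both forms should yield the same element count");
```

The template form is preferable mainly because the sizeof expression still compiles when the array has decayed to a pointer, and then yields a meaningless count.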

Depends on D124302

A UAX29-compatible word breaker (like ICU4C) treats the end of text as a
word break opportunity (rule WB2 [1]), but currently the lwbrk word breaker
doesn't.
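
A minimal sketch of what "end of text is a word break opportunity" means for a Next()-style API follows. It is a toy, not the lwbrk implementation: the names NextBreakSketch, AllBreaksSketch, IsWordChar and kNoBreak are hypothetical, the classifier only separates ASCII alphanumerics from everything else, and the signature is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "no further break"; not a Gecko constant.
constexpr std::size_t kNoBreak = static_cast<std::size_t>(-1);

// Toy classifier (an assumption): ASCII alphanumerics vs. everything else.
static bool IsWordChar(char aCh) {
  return std::isalnum(static_cast<unsigned char>(aCh)) != 0;
}

// Returns the offset of the next break opportunity after aPos. Following
// WB2, the end of the text is reported as a break. kNoBreak is returned
// only when aPos is already at or past the end of the text.
std::size_t NextBreakSketch(const std::string& aText, std::size_t aPos) {
  if (aPos >= aText.size()) {
    return kNoBreak;
  }
  const bool startClass = IsWordChar(aText[aPos]);
  std::size_t i = aPos + 1;
  while (i < aText.size() && IsWordChar(aText[i]) == startClass) {
    ++i;
  }
  return i;  // May equal aText.size(): the end of text is a break.
}

// Collects every break offset by calling NextBreakSketch repeatedly.
std::vector<std::size_t> AllBreaksSketch(const std::string& aText) {
  std::vector<std::size_t> breaks;
  for (std::size_t pos = NextBreakSketch(aText, 0); pos != kNoBreak;
       pos = NextBreakSketch(aText, pos)) {
    assert(breaks.empty() || pos > breaks.back());  // Offsets must advance.
    breaks.push_back(pos);
  }
  return breaks;
}
```

With this toy, the text "hello world" yields breaks at offsets 5, 6 and 11, where 11 is the length of the text. A breaker that does not apply WB2 would stop after offset 6 and leave the caller to add the final break itself.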

The motivation for this patch is to make WordBreaker::Next() closer to
a UAX29-compatible one (at least for English text), and to see whether
the callers need to change. This should make the future integration of
the ICU4X segmenter easier.

The only caller of WordBreaker::Next() is ClusterIterator's constructor.
This patch shouldn't change its behavior because we've already manually
assigned a word break point at the end of the line when aContext is
empty and aDirection is -1. This patch generalizes it to all
conditions.

Also, update TestPrintWordWithBreak() so that the result string makes
more sense.

[1] https://www.unicode.org/reports/tr29/#WB2

Depends on D124303

Pushed by aethanyc@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b1e7fd7b879a
Part 1 - Move WordBreakClass and GetClass into WordBreaker's private section. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/3a38d2e56bf8
Part 2 - Rename WordBreaker::NextWord() to WordBreaker::Next(). r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/ccfb78f756bf
Part 3 - Clean up the gtest for line and word breaker. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/55efff2d5628
Part 4 - Simplify WordBreaker::Next() and make it recognize the end of text a word break opportunity. r=jfkthame
Blocks: 1729682