Closed Bug 1730084 Opened 3 years ago Closed 3 years ago

Remove WordBreaker::BreakInBetween()

Categories

(Core :: Internationalization, task)

task

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: TYLin, Assigned: TYLin)

References

Details

Attachments

(6 files)

This is part of bug 1722484 comment 1 to simplify WordBreaker's API.

nextChar gets its value from DecodeChar() and PeekNextChar(), and both
functions return char32_t. It also passes as an char32_t argument to
BreakInBetween(). Therefore it really should have 32 bits to avoid
potential truncation.

Depends on D125148

Assignee: nobody → aethanyc
Status: NEW → ASSIGNED

The motivation of this patch is to remove rarely used API in
WordBreaker. WordBreaker::BreakInBetween() is used only in
nsFind::BreakInBetween() in production, and it can be replaced by
Next().

If the user wants to know whether there is a word break between two
strings such as the use cases in gtest, joining the two strings and
passing the result to Next() is the preferred way.

Depends on D125150

The extra patch here adds some empty fragments; I confirmed locally that this passes with the current WordBreaker::BreakInBetween(), but fails when Part 4 above is used:

Expected equality of these values:
  "This^   ^is^ ^a^ ^internationalization^ ^work^."
  NS_ConvertUTF16toUTF8(result).get()
    Which is: "Th^is^   ^is^ ^a^ ^int^ernationalization^ ^work^."

This shows that the modified implementation is now finding (spurious) breaks at the empty fragments.

Nice catch. Thanks for constructing a testcase that expose the problem in D125151. It also exposes an existing bug in TestFindWordBreakFromPosition that it doesn't consider fragN can be an empty string. As a result, it crashes with your testcase because WordBreaker::FindWord() doesn't accept an empty string as the input.

I don't see why WordBreaker::FindWord() cannot take an empty string because it returns a range with mBegin and mEnd set to aTextLen + 1. I'll post a separate patch to address this problem by either making the tests consider empty string or improving the error story of WordBreaker::FindWord().

The function is added in Bug 1728708 Part 4, but it never get called, so
this patch transforms it into a test in WordBreak test suite to make it
run.

While I'm here, other individual functions are also transformed into
tests so that we can have more granular results if some of them failed.

  • Rename arguments so that their names are consistent with Next().
  • Make the function not assert on an empty string, i.e. aLen == 0, like
    Next().
  • Fix an undefined behavior when the user passes aTextLen == aOffset.
    The methods used to access aText[aOffset] that is clearly out of range
    because the string may not be null-terminated. After this patch, it
    returns a sentinel WordRange when aLen == aPos.
  • Add document and gtest TestFindWordWithEmptyString().

Depends on D125150

Attachment #9240454 - Attachment description: Bug 1730084 Part 4 - Remove WordBreaker::BreakInBetween(). r?jfkthame → Bug 1730084 Part 5 - Remove WordBreaker::BreakInBetween(). r?jfkthame
Attachment #9240510 - Attachment description: Bug 1730084 - Add some empty fragments to the word-breaker test data. r=tylin → Bug 1730084 Part 6 - Add some empty fragments to the word-breaker test data. r=tylin
Pushed by aethanyc@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/315c1c0f7946
Part 1 - Run TestNextWordBreakWithEmptyString as a test. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/564e130b51c5
Part 2 - Change nextChar from char16_t to char32_t in nsFind::Find(). r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/c5607a57f3d7
Part 3 - Remove unused declaration in nsFind.h. r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/2e3ddecd9328
Part 4 - Clean up and fix an edge case of FindWord(). r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/08ee0dfa5b83
Part 5 - Remove WordBreaker::BreakInBetween(). r=jfkthame
https://hg.mozilla.org/integration/autoland/rev/d74edfc1d3ed
Part 6 - Add some empty fragments to the word-breaker test data. r=TYLin
See Also: → 1779846

Could this bug have changed end-of-line selection behaviour in the editor which in turn causes Thunderbird Bug 1779846 where spellcheck fails on such selections?

Flags: needinfo?(aethanyc)

Yes, I think Part 4 regresses Bug 1779846.

Flags: needinfo?(aethanyc)
Regressions: 1779846
See Also: 1779846
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: