Closed Bug 1745113 Opened 3 years ago Closed 3 years ago

Unify ClusterIterator with Grapheme segmenter

Categories

(Core :: Internationalization, task, P3)

task

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: m_kato, Assigned: TYLin)

References

Details

Attachments

(5 files)

I guess that ClusterIterator is Grapheme break segmenter, so It may/can be unified like line/work breaker?

Sounds reasonable to me.

We also have ClusterReverseIterator(), but ClusterReverseIterator::Next() looks basic comparing to ClusterIterator::Next(). I wonder if we can reimplement it using std::reverse_iterator or ReverseIterator on ClusterIterator.

Agreed, ClusterIterator is basically a grapheme break segmenter, and could be unified.

If the segmenter is bidirectional (supports Prev as well as Next), then it should be able to replace ClusterReverseIterator as well. Currently the reverse iterator is less complete/correct than the forward one, as you noticed. The reverse version is only used in a couple of places and the limitations of the implementation haven't particularly mattered to us afaik, but I don't think there would be any problem with making them more symmetrical if it's convenient to share the underlying implementation.

(I don't think we can directly use std::reverse_iterator or ReverseIterator as things stand, because they'd require the underlying iterator to be bidirectional or at least to support a decrement operator (like a Prev method), which ClusterIterator doesn't currently have.)

I'm planning to take a look starting this week.

For the forward iterator, I think we can use a similar approach like bug 1722484. That is, we can use existing ClusterIterator to implement class GraphemeClustersBreakIteratorUtf16 in Segmenter.h and rewrite the callers.

For the reverse iterator, maybe we can replace the caller with forward ones like we did for line breaker in bug 1733009 because we currently don't have the ability to iterate the grapheme clusters backwards in ICU4X anyway. If the reverse iterator is critical, then like Jonathan suggested in comment 2, we should probably keep it, and investigate sharing the underlying implementation with ClusterIterator.

Assignee: nobody → aethanyc
Status: NEW → ASSIGNED
Depends on: 1749440

This patch doesn't change the behavior. Just move the code around.

Include "nsLayoutUtils.h" in nsFileControlFrame to get rid of warnings in my
editor because it uses utilities such as nsLayoutUtils::AppUnitWidthOfString.
We compile it without issues because of unified build.

Depends on D135639

Span is more versatile than const String&, and existing callers passing a
String into this function do not need any change.

This patch makes the next part simpler.

Depends on D135641

This is the main patch for the bug. It aims to change the grapheme cluster
break's Next() API by implementing SegmentIteratorUtf16 interface, and adapt
the callers. It shouldn't change the behavior.

While rewriting the caller, one caveat worth mentioning is the loop termination
condition. If the old code relies on !AtEnd() as the loop termination
condition, and it advances the iterator at the end of the loop, it meant
to skip its logic when the break position is at the end of the string. For
example, see the mozTXTToHTMLConv::NumberOfMatches.

This patch also hooks grapheme cluster break iterator into
Segmenter::TryCreate() interface.

Existing test coverage for the file changed:

  • netwerk/test/unit/test_mozTXTToHTMLConv.js
  • layout/reftests/forms/input/file/dynamic-max-width.html

Depends on D135642

Pushed by aethanyc@gmail.com: https://hg.mozilla.org/integration/autoland/rev/8031a294bc7c Part 1 - Move ClusterIterator into Segmenter.h, and rename it. r=necko-reviewers,kershaw https://hg.mozilla.org/integration/autoland/rev/a35cd10a12ec Part 2 - Move ClusterReverseIterator into Segmenter.h, and rename it. r=necko-reviewers,kershaw https://hg.mozilla.org/integration/autoland/rev/15139f534811 Part 3 - Change CountGraphemeClusters() to take a Span parameter. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/195a3e5a485d Part 4 - Change AppUnitWidthOfString to take a Span parameter. r=jfkthame https://hg.mozilla.org/integration/autoland/rev/1df57205c907 Part 5 - Make grapheme cluster break iterators implement SegmentIteratorUtf16, and adapt the callers. r=necko-reviewers,jfkthame,kershaw
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: