Closed Bug 1908051 Opened 4 months ago Closed 4 months ago

Intl.Segmenter grapheme splitting is incorrect for Indic languages

Product: Core
Component: Internationalization
Type: defect
Version: Firefox 128
Status: RESOLVED DUPLICATE of bug 1899411
Reporter: agnijith
Assignee: Unassigned
Attachments: 1 file

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0

Steps to reproduce:

Run the following script in the browser console:

let input = "തേങ്ങാക്കുല";
let segmenter = new Intl.Segmenter();
let segments = [...segmenter.segment(input)].map(x => x.segment);
console.log(segments);

Actual results:

segments is an Array of length 6:

[
  "തേ",
  "ങ്",
  "ങാ",
  "ക്",
  "കു",
  "ല"
]
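
To see where the reported breaks fall, it can help to print the code points of a returned segment. The following is a minimal sketch: toCodePoints is a hypothetical helper (not part of the Intl API), and the two sample strings are the second and third segments from the output above, written as escapes on the assumption that they are plain consonant + sign sequences with no joiner characters.

```javascript
// Hypothetical helper (not part of Intl): format a string as "U+XXXX ..." code points.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// Second and third segments from the actual results, written as escapes
// (assumption: no ZWJ/ZWNJ or chillu characters in the original input).
const reportedSegments = ["\u0D19\u0D4D", "\u0D19\u0D3E"];
for (const segment of reportedSegments) {
  console.log(segment, "=>", toCodePoints(segment));
}
```

Under that assumption the second segment ends in U+0D4D (MALAYALAM SIGN VIRAMA), so the reported break falls inside a conjunct that the expected results keep together.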

Expected results:

segments should be an Array of length 4:

[
  "തേ",
  "ങ്ങാ",
  "ക്കു",
  "ല"
]

The Bugbug bot thinks this bug should belong to the 'Core::Internationalization' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Internationalization
Product: Firefox → Core

Fixed by the update in bug 1899411. That means the next release (Firefox 129) will handle this case properly.
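
One way to check whether a particular build still shows the behavior from this report is to compare the segmenter output against the expected results. This is a hedged sketch rather than an official test: graphemes and matchesExpected are hypothetical helper names, the input is the reporter's string written as escapes (assuming it contains no joiner characters), and the outcome depends on the segmentation data shipped with the engine that runs it.

```javascript
// Hypothetical helper: split text into grapheme clusters using the default locale.
function graphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Hypothetical helper: true when both arrays hold the same strings in the same order.
function matchesExpected(actual, expected) {
  return actual.length === expected.length &&
    actual.every((value, i) => value === expected[i]);
}

// Reporter's input and expected segments, written as escapes (assumed encoding).
const input = "\u0D24\u0D47\u0D19\u0D4D\u0D19\u0D3E\u0D15\u0D4D\u0D15\u0D41\u0D32";
const expected = [
  "\u0D24\u0D47",
  "\u0D19\u0D4D\u0D19\u0D3E",
  "\u0D15\u0D4D\u0D15\u0D41",
  "\u0D32",
];

const actual = graphemes(input);
console.log(
  actual.length,
  matchesExpected(actual, expected)
    ? "matches the expected results"
    : "differs from the expected results"
);
```

Whatever the segment count, joining the returned segments should always reproduce the input string, which makes a useful sanity check alongside the comparison.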

Status: UNCONFIRMED → RESOLVED
Closed: 4 months ago
Duplicate of bug: 1899411
Resolution: --- → DUPLICATE