Open Bug 1370185 Opened 7 years ago Updated 2 years ago

Sorting Tibetan script (Tibetan or Dzongkha language)

Categories

(Core :: JavaScript: Internationalization API, defect, P5)

45 Branch
defect


UNCONFIRMED

People

(Reporter: elie.roux, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20170419042421

Steps to reproduce:

1. Install the dz_BT and bo_IN locales (I'm on Debian Sid)
2. Run the following code:

var tibCollator = new Intl.Collator('dz');
var tibSortedArray = ["ང", "རྔ", "ལྔ", "སྔ", "བརྔ", "བསྔ", "ཅ"];
var tibRandomArray = ["ལྔ", "ང", "ཅ", "རྔ", "སྔ", "བརྔ", "བསྔ"];
var tibResultArray = ["ལྔ", "ང", "ཅ", "རྔ", "སྔ", "བརྔ", "བསྔ"];
tibResultArray.sort(tibCollator.compare);
var resPrint = "";
if (JSON.stringify(tibSortedArray)==JSON.stringify(tibResultArray)) {
  resPrint = "ok!";
} else {
  resPrint = "error: "+JSON.stringify(tibRandomArray)+" has been sorted as "+JSON.stringify(tibResultArray)+", should have been "+JSON.stringify(tibSortedArray);
}
console.log(resPrint);


Actual results:

First, the expected output:

dz, bo, dz-BT, bo-IN

then the unexpected output:

error: ["ལྔ","ང","ཅ","རྔ","སྔ","བརྔ","བསྔ"] has been sorted as ["ང","ཅ","བརྔ","བསྔ","རྔ","ལྔ","སྔ"], should have been ["ང","རྔ","ལྔ","སྔ","བརྔ","བསྔ","ཅ"]


Expected results:

The array of Tibetan strings should have been sorted correctly.

Dzongkha sorting data has been in the CLDR files for a long time and is also present in GNU glibc, as one can see in /usr/share/i18n/locales/dz_BT, so I have no idea why this doesn't work.

Note that it also does not work in Chrome, but I think the reason is quite different:

https://bugs.chromium.org/p/chromium/issues/detail?id=729508
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
Component: JavaScript Engine → JavaScript: Internationalization API
I think I've identified the root cause of this bug:
We're using ICU to implement the Intl.Collator object, and it seems like ICU is returning inconsistent data about the supported collation types. ICU claims it supports "dz" (per ucol_getAvailable), but when we construct the UCollator object, the actual locale is the root locale. (I still need to verify this for ICU4C, but at least that's the case for ICU4J.)
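
The mismatch described here can be probed from script. Below is a minimal sketch that assumes only the standard Intl.Collator API; probeCollator is a hypothetical helper name, not part of any library. It reports whether the engine claims support for a locale and whether sorting with that locale yields the order the caller expects. Note that resolvedOptions().locale may still echo the requested tag even when the underlying collator fell back to root rules, so comparing the actual sort order is likely the more reliable signal.

```javascript
// Hypothetical helper: compares what the engine claims with how it actually sorts.
function probeCollator(locale, input, expected) {
  // True if the engine lists the locale as supported for collation.
  const claimed = Intl.Collator.supportedLocalesOf([locale]).length > 0;
  const collator = new Intl.Collator(locale);
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...input].sort(collator.compare);
  return {
    claimed,
    resolvedLocale: collator.resolvedOptions().locale,
    sorted,
    matchesExpected: JSON.stringify(sorted) === JSON.stringify(expected),
  };
}

// Same data as in the report above.
const expectedOrder = ["ང", "རྔ", "ལྔ", "སྔ", "བརྔ", "བསྔ", "ཅ"];
const shuffledOrder = ["ལྔ", "ང", "ཅ", "རྔ", "སྔ", "བརྔ", "བསྔ"];
for (const locale of ["dz", "bo"]) {
  console.log(locale, probeCollator(locale, shuffledOrder, expectedOrder));
}
```

A result with claimed set to true but matchesExpected set to false would be consistent with the inconsistency described in this comment, although it does not by itself prove where the fallback happens.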

This bug can also be reproduced for the locales "bo" (which imports the collation rules from "dz") and "wae", and also for the collation "de-u-co-eor" (per ucol_getKeywordValuesForLocale, "de" supports "eor", but the actual collator uses "und-u-co-eor").
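
The collation-keyword case can be inspected in a similar way. This is a rough sketch using the standard Intl.Locale and Intl.Collator APIs; probeCollation is a hypothetical helper name. It shows the collation type requested through the "-u-co-" extension next to the one the engine reports after resolution. What a given engine reports for "de-u-co-eor" depends on its bundled ICU data, so no particular output is assumed here.

```javascript
// Hypothetical helper: reports the requested vs. resolved collation type.
function probeCollation(tag) {
  // Intl.Locale exposes the "co" Unicode extension keyword, if present.
  const requested = new Intl.Locale(tag).collation ?? null;
  const resolved = new Intl.Collator(tag).resolvedOptions();
  return {
    requested,
    resolvedLocale: resolved.locale,
    // "default" means no specific collation type was applied.
    resolvedCollation: resolved.collation,
  };
}

console.log(probeCollation("de-u-co-eor"));
console.log(probeCollation("de"));
```

If requested is "eor" while resolvedCollation is "default", the engine did not accept the keyword for that locale. If both say "eor", the engine claims support, and the actual sort order would still need checking against the expected rules.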

"dz", "bo", "wae", and "de-u-co-eor" all have in common that their status is either draft="unconfirmed" or draft="provisional" (http://cldr.unicode.org/index/process#resolution_procedure). 

So it seems like we should file a bug at ICU's bug tracker...
Thanks a lot for the rapid answer! I've opened a ticket: http://bugs.icu-project.org/trac/ticket/13224 

When/if you have a small example showing the bug can you upload it on the ICU ticket?
(In reply to Elie Roux from comment #2)
> Thanks a lot for the rapid answer! I've opened a ticket:
> http://bugs.icu-project.org/trac/ticket/13224 
> 
> When/if you have a small example showing the bug can you upload it on the
> ICU ticket?

Thank you for reporting this issue!

I've added a simple test case to the ICU ticket; hopefully this helps determine what needs to change to resolve this issue.
Thanks a lot for the detailed example, it helps a lot!
Priority: -- → P5
Severity: normal → S3