Closed Bug 1533481 Opened 6 years ago Closed 6 years ago

Update our in-tree ICU to 64 and update to Unicode 12

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

Attachments

(11 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
10.00 MB, application/octet-stream
anba
: review+
Waldo
: feedback+
Details
5.60 MB, application/octet-stream
anba
: review+
Waldo
: feedback+
Details

From http://site.icu-project.org/download/64:

ICU 64 updates to Unicode 12 and to CLDR 35 locale data with many additions and corrections, and some new languages. ICU adds a data filtering/subsetting mechanism, improved formatting API, and a C++ LocaleBuilder.

That means we need to combine the ICU update with the Unicode update.

Try with RC2:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d93a45589082049ec140fd5201d1c918125c7342

Mostly green except one WPT test which needs to be updated.

Thank you Andre! This update will also give us new APIs for RelativeTimeFormat.formatToParts IIRC. Anything else for ECMA402?

Anything else for ECMA402?

From http://site.icu-project.org/download/64:

  • Unicode and CLDR updates

From http://htmlpreview.github.io/?https://github.com/unicode-org/icu/blob/maint/maint-64/icu4c/APIChangeReport.html

  • ulistfmt_formatStringsToResult, ulistfmt_resultAsValue, and ufmtval_nextPosition are maybe useful for Intl.ListFormat.formatToParts? (But I haven't yet checked what these functions actually do, currently I'm simply guessing what they might do based on their names...)
  • And as you said, ureldatefmt_formatToResult, ureldatefmt_resultAsValue, and ufmtval_nextPosition are possibly the missing functions needed for Intl.RelativeTimeFormat.formatToParts.

Apart from that, the update also includes fixes for multiple OOM issues (bug 1495244, bug 1431142). And the new data filter builder could be used to further trim icudtxxl.dat by about 500KB (unzipped) resp. ~50KB (zipped), when we remove support for "confusables", "stringprep", "unames", and "cnvalias" (the last one requires UCONFIG_NO_CONVERSION to be set). The data for these features isn't used for Intl and I guess it's also not necessary for other ICU uses in the tree, but I still need to check that with :m_kato (probably?).

New Try-build testing all platforms/tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a679ea6cffafc33226939c0844047aad8a5d258

Seems to look pretty good!

We're setting LC_ALL to "en_US.UTF-8" in js/src/tests/lib/tests.py, but somehow
this doesn't always work, for example on the Try-servers the default locale is
set to "und", but this effect wasn't reproducible locally. Switch to ICU's
default locale function which contains multiple fallbacks to ensure LC_ALL is
honoured as expected.

Depends on D25262

  • ICU no longer supports removing resource files manually, instead the new data
    filter builder needs to be used.
  • This includes test data, which also must be preserved in ICU 64.
  • umutex.h contains an implicit call to new in ICU64 which needs to be allowed
    in check_vanilla_allocations.py.
  • Drive-by: Make ICU with multiple jobs for faster rebuilds.

Depends on D25263

Unicode 12 changes the character property of U+166D (CANADIAN SYLLABICS CHI SIGN)
from Po (Punctuation, other) to So (Symbol, other), therefore replace U+166D with
U+166E (CANADIAN SYLLABICS FULL STOP).

Depends on D25267

  • Add markers to generated files to discourage manual edits.
  • Add generated file to clang-format-ignore.
  • Handle currently not supported Georgian characters in the Perl script.

Depends on D25268

The actual patch file is too large (84.831.406 bytes unzipped, 16.352.610 bytes zipped), so I can neither upload it through moz-phab, nor arc (even when --less-context is used), nor uploading it to Bugzilla directly (Bugzilla has a 10MB limit). So I had to split it into two zip-files. Alternatively it's also uploaded to https://depot.tu-dortmund.de/yxrd4 (valid for 28 days).

And the feedback request is actually a review request, but without being able to set r? we need to get creative. :-)

Attachment #9054286 - Flags: feedback?(jwalden)

Part two of the split zip file.

Attachment #9054287 - Flags: feedback?(jwalden)

FWIW, here's the moz-phab error output:

Creating new revision:
467741:ed5f39250c5e Bug 1533481 - Part 5: Update in-tree ICU to release 64.1. r=jwalden!
Linting...
No lint engine configured for this project.
Running unit tests...
No unit test engine is configured for this project.


    Diff for 'intl/icu/source/data/brkitr/dictionaries/cjdict.txt' with
    context is 4,504,160 bytes in length. Generally, source changes should
    not be this large. If this file is a huge text file, try using the
    '--less-context' flag. If the file is not a text file, you can mark it
 Exception 
The program is attempting to read user input, but stdin is being piped from some other source (not a TTY).
(Run with `--trace` for a full exception trace.)
    'binary'. Mark this file as 'binary' and continue? [y/N]
CalledProcessError: Command '['/home/andre/bin/arc', 'diff', '--base', 'arc:this', '--allow-untracked', '--no-amend', '--no-ansi', '--message-file', '/tmp/tmpu_HTeO', '--create']' returned non-zero exit status 1

After that the local repository needed to be fixed manually with hg evolve to move the changesets back into the correct order.

And the arc diff output:

andre@VBdev:~/hg/mozilla-inbound$ arc diff --less-context --create
You have a saved revision message in '.hg/arc/create-message'.
You can use this message, or discard it.

    Do you want to use this message? [Y/n] y

Linting...
No lint engine configured for this project.
Running unit tests...
No unit test engine is configured for this project.


    Diff for 'intl/icu/source/data/brkitr/dictionaries/cjdict.txt' with
    context is 4,504,160 bytes in length. Generally, source changes should
    not be this large. If the file is not a text file, you can mark it
    'binary'. Mark this file as 'binary' and continue? [y/N] y



    Diff for 'intl/icu/source/test/testdata/BidiCharacterTest.txt' with
    context is 6,968,550 bytes in length. Generally, source changes should
    not be this large. If the file is not a text file, you can mark it
    'binary'. Mark this file as 'binary' and continue? [y/N] y



    Diff for 'intl/icu/source/test/testdata/BidiTest.txt' with context is
    8,457,590 bytes in length. Generally, source changes should not be this
    large. If the file is not a text file, you can mark it 'binary'. Mark
    this file as 'binary' and continue? [y/N] y


mmap() failed: [12] Cannot allocate memory

mmap() failed: [12] Cannot allocate memory

mmap() failed: [12] Cannot allocate memory

Comment on attachment 9054286 [details]
bug1533481-part5-icu-update.zip.001

rs=me

Attachment #9054286 - Flags: feedback?(jwalden) → feedback+

Comment on attachment 9054287 [details]
bug1533481-part5-icu-update.zip.002

rs=me

Attachment #9054287 - Flags: feedback?(jwalden) → feedback+

Comment on attachment 9054286 [details]
bug1533481-part5-icu-update.zip.001

Add r+ in addition to f+ per comment #16 for tracking purposes.

Attachment #9054286 - Flags: review+

Comment on attachment 9054287 [details]
bug1533481-part5-icu-update.zip.002

Add r+ in addition to f+ per comment #17 for tracking purposes.

Attachment #9054287 - Flags: review+
Attachment #9054263 - Attachment description: Bug 1533481 - Part 8: Replace U+166D with U+166E after update to Unicode 12. r=jfkthame! → Bug 1533481 - Part 8: Remove test for U+166D after update to Unicode 12. r=jfkthame!

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=22228ca898566f7306f415f43f08492a9274ddb4

Parts 1-4 and 6-10 are Phabricator requests, but Part 5 isn't (due to size restrictions in Phabricator). Instead Part 5 had to be uploaded as a split zip-file (bug1533481-part5-icu-update.zip.001 and bug1533481-part5-icu-update.zip.002) directly to Bugzilla (the zipped patch had to split because of size restrictions in Bugzilla). It is still necessary to apply all parts in the correct order to avoid compiler errors resp. test failures. Sorry for the inconvenience!

Keywords: checkin-needed

Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eaddff053db4
Part 1: Use ICU to retrieve the system default locale. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/6acec9d8b4a7
Part 2: Update SpiderMonkey to Unicode 12. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/de8c3ab271f7
Part 3: Remove no longer needed ICU patches or update to apply cleanly. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/912fb96c21b3
Part 4: Prepare update to ICU 64. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/cedfd8518726
Part 5: Update in-tree ICU to release 64.1. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/347e9b869ceb
Part 6: Update switch to handle new UNumberFormatFields entries. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/635920ce336f
Part 7: Update expected results after ICU update. r=jwalden!
https://hg.mozilla.org/integration/autoland/rev/cf0206013ba0
Part 8: Remove test for U+166D after update to Unicode 12. r=jfkthame!
https://hg.mozilla.org/integration/autoland/rev/8955c8b9e8b3
Part 9: Update Gecko to Unicode 12. r=jfkthame!
https://hg.mozilla.org/integration/autoland/rev/d10b57a89a73
Part 10: Updating ICU requires a clobber. r=jwalden!

Keywords: checkin-needed

Coverity found 147 new issues in that release.
Some issues are marked as high.
Andi or myself can provide access.

Actually, I was mislead by the automation email. Coverity could not correlate the new defects with the old.
We are good, we didn't regress (even if ICU would benefit from some cleanup)
(oh, I didn't realized that they moved to github and that they accept PR \o/)

There also seem to be plans to add Coverity builds directly to ICU, maybe that'll help to reduce the warnings: https://unicode-org.atlassian.net/browse/ICU-20495

Depends on: 1543644
Regressions: 1543644
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/16502 for changes under testing/web-platform/tests
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: