Closed Bug 1569567 Opened 5 years ago Closed 5 years ago

Reduce ICU data file size

Categories

(Core :: JavaScript: Internationalization API, enhancement)

enhancement
Not set
normal

Tracking

RESOLVED FIXED
mozilla70
Tracking Status
firefox70 --- fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(7 files)

The uncompressed ICU data file size is 12.704.736 bytes (zipped: 3.874.723 bytes).

Removing unused data can bring it down to 10.881.296 bytes (zipped: 3.093.263 bytes).

  • Base size → 12.704.736
  • Remove unused trees → 12.232.064
  • Remove "Version" → 12.218.704
  • Remove collation "Version" → 12.216.176
  • Remove currency "variant" and "formal" → 12.214.928
  • Remove collation "UCARules" → 12.015.776
  • Remove collation "sequences" → 11.657.840
  • Remove various "locales" data → 10.885.008
  • Remove currency-codes and gender list → 10.881.296
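
The per-step savings implied by the list above can be worked out with a few lines of Python. This is only an illustrative sketch: the sizes are copied from the list, while the `steps` and `savings` names are invented here and are not part of any patch in this bug.

```python
# Sizes in bytes after each step, copied from the list above.
steps = [
    ("Base size", 12_704_736),
    ("Remove unused trees", 12_232_064),
    ('Remove "Version"', 12_218_704),
    ('Remove collation "Version"', 12_216_176),
    ('Remove currency "variant" and "formal"', 12_214_928),
    ('Remove collation "UCARules"', 12_015_776),
    ('Remove collation "sequences"', 11_657_840),
    ('Remove various "locales" data', 10_885_008),
    ("Remove currency-codes and gender list", 10_881_296),
]


def savings(steps):
    """Return (label, bytes saved by that step) for every step after the base."""
    return [
        (label, prev_size - size)
        for (_, prev_size), (label, size) in zip(steps, steps[1:])
    ]


per_step = savings(steps)
total_saved = steps[0][1] - steps[-1][1]

for label, saved in per_step:
    print(f"{label}: {saved:,} bytes")
print(f"Total: {total_saved:,} bytes ({total_saved / steps[0][1]:.1%} of the base size)")
```

The total comes to 1,823,440 bytes, about 14% of the base size, and the "unused trees" and "locales" steps together account for roughly two thirds of it.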

"confusables" feature:

  • Used in 'unicode/uspoof.h' for spoofing detection.

"stringprep" feature:

  • Used in 'unicode/usprep.h' for RFC 3454 string preparation.

"unames" feature:

  • Used in 'unicode/uchar.h' for u_charName to retrieve the name of a Unicode character.

Is there any way to do this more systematically than finding these things semi-randomly? As I understand the ICU setup, you compile in one big bundle of data and ICU just queries it and reports what it finds, so it's hard to know what needs what. Still, maybe we could do something better than occasionally discovering that some of the data in here happens to be unused.

We could do it the other way round and instead of excluding things we don't need, only include stuff we actually need. But in any case we rely on ICU reporting U_MISSING_RESOURCE_ERROR when we've removed too much data.
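
The difference between the two approaches can be sketched with a toy model. This is not ICU's filter tooling: the `bundle` contents and the `flatten`, `exclude`, and `include` helpers are hypothetical and only loosely borrow resource names mentioned in this bug.

```python
def flatten(tree, prefix=""):
    """Yield a "/"-separated path for every leaf of a nested dict."""
    for key, value in tree.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path


def _under(path, prefix):
    """True if path equals prefix or lies below it."""
    return path == prefix or path.startswith(prefix + "/")


def exclude(paths, denied):
    """Deny-list: keep everything except paths under a denied prefix."""
    return [p for p in paths if not any(_under(p, d) for d in denied)]


def include(paths, allowed):
    """Allow-list: keep only paths under an allowed prefix."""
    return [p for p in paths if any(_under(p, a) for a in allowed)]


# Hypothetical miniature resource bundle with placeholder values.
bundle = {
    "Version": "x",
    "UCARules": "x",
    "Currencies": {"USD": "x"},
    "collations": {"standard": {"Sequence": "x", "Version": "x"}},
}

paths = sorted(flatten(bundle))
kept_by_exclude = exclude(paths, ["/Version", "/UCARules"])
kept_by_include = include(paths, ["/Currencies"])

print(kept_by_exclude)
print(kept_by_include)
```

With a deny-list, anything new that a later ICU update adds is kept by default, so unused data can creep back in unnoticed. With an allow-list, new data is dropped by default and an omission only shows up when a lookup fails, which is where U_MISSING_RESOURCE_ERROR comes in.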

Sigh, I forgot to reimport the tzdata 2019b changes after running icu_sources_data.py, so the numbers differ slightly from those reported in comment #0:

  • Remove unused trees → 12.233.232
  • Remove "Version" and collation "Version" → 12.217.344
  • Remove currency "variant" and "formal" → 12.216.096
  • Remove "UCARules" → 12.016.944
  • Remove "sequences" → 11.659.008
  • Remove various "locales" data → 10.886.176
  • Remove currency-codes and gender list → 10.882.464
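
As a quick sanity check, the corrected figures can be compared with the ones from comment #0. The sketch below assumes the merged "Version" step corresponds to the collation "Version" line of comment #0; the dictionary names are made up for illustration.

```python
# Sizes in bytes from comment #0, keyed by the step labels of the corrected list.
original = {
    "Remove unused trees": 12_232_064,
    'Remove "Version" and collation "Version"': 12_216_176,
    'Remove currency "variant" and "formal"': 12_214_928,
    'Remove "UCARules"': 12_015_776,
    'Remove "sequences"': 11_657_840,
    'Remove various "locales" data': 10_885_008,
    "Remove currency-codes and gender list": 10_881_296,
}

# Sizes in bytes from the corrected list above.
corrected = {
    "Remove unused trees": 12_233_232,
    'Remove "Version" and collation "Version"': 12_217_344,
    'Remove currency "variant" and "formal"': 12_216_096,
    'Remove "UCARules"': 12_016_944,
    'Remove "sequences"': 11_659_008,
    'Remove various "locales" data': 10_886_176,
    "Remove currency-codes and gender list": 10_882_464,
}

# Difference per step; a constant value means only the starting size changed.
offsets = {label: corrected[label] - original[label] for label in corrected}

for label, delta in offsets.items():
    print(f"{label}: +{delta:,} bytes")
```

Every step differs by the same 1,168 bytes, so the per-step savings are unchanged and only the starting size moved.
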
Attachment #9081220 - Attachment description: Bug 1569567 - Part 1: Exclude additional resource files currently not needed. ? → Bug 1569567 - Part 1: Exclude additional resource files currently not needed. r=jwalden!
Attachment #9081221 - Attachment description: Bug 1569567 - Part 2: Remove "Version" info from ICU data file. → Bug 1569567 - Part 2: Remove "Version" info from ICU data file. r=jwalden!
Attachment #9081225 - Attachment description: Bug 1569567 - Part 3: Remove "variant" and "formal" currency values from ICU data file. → Bug 1569567 - Part 3: Remove "variant" and "formal" currency values from ICU data file. r=jwalden!
Attachment #9081227 - Attachment description: Bug 1569567 - Part 4: Remove "UCARules" collation values from ICU data file. → Bug 1569567 - Part 4: Remove "UCARules" collation values from ICU data file. r=jwalden!
Attachment #9081228 - Attachment description: Bug 1569567 - Part 5: Remove "Sequence" collation values from ICU data file. → Bug 1569567 - Part 5: Remove "Sequence" collation values from ICU data file. r=jwalden!
Attachment #9081229 - Attachment description: Bug 1569567 - Part 6: Remove unused "locales" values from ICU data file. → Bug 1569567 - Part 6: Remove unused "locales" values from ICU data file. r=jwalden!
Attachment #9081230 - Attachment description: Bug 1569567 - Part 7: Remove currency codes and gender lists from ICU data file. → Bug 1569567 - Part 7: Remove currency codes and gender lists from ICU data file. r=jwalden!

Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/6f1c13a0ec7c
Part 1: Exclude additional resource files currently not needed. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/2a3e8df7b7c4
Part 2: Remove "Version" info from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/c2f95d84647b
Part 3: Remove "variant" and "formal" currency values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/03f0b86873ac
Part 4: Remove "UCARules" collation values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/b57c502e3167
Part 5: Remove "Sequence" collation values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/8c103cf517ae
Part 6: Remove unused "locales" values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/1d141c74e8dc
Part 7: Remove currency codes and gender lists from ICU data file. r=jwalden

Keywords: checkin-needed