Reduce ICU data file size
Categories
(Core :: JavaScript: Internationalization API, enhancement)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox70 | --- | fixed |
People
(Reporter: anba, Assigned: anba)
Details
Attachments
(7 files)
47 bytes,
text/x-phabricator-request
The uncompressed ICU data file size is 12.704.736 bytes (zipped: 3.874.723 bytes).
Removing unused data can bring it down to 10.881.296 bytes (zipped: 3.093.263 bytes).
- Base size → 12.704.736
- Remove unused trees → 12.232.064
- Remove "Version" → 12.218.704
- Remove collation "Version" → 12.216.176
- Remove currency "variant" and "formal" → 12.214.928
- Remove collation "UCARules" → 12.015.776
- Remove collation "sequences" → 11.657.840
- Remove various "locales" data → 10.885.008
- Remove currency-codes and gender list → 10.881.296
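To see which steps contribute most, the per-step savings can be derived from the figures above. This is a minimal Python sketch: the helper names `parse_size` and `step_savings` are made up for this illustration, and the sizes are copied verbatim from the list (the dots are thousands separators).

```python
# Sizes in bytes after each step, copied from the list above.
# The dots are thousands separators, so "12.704.736" means 12704736.
STEPS = [
    ("Base size", "12.704.736"),
    ("Remove unused trees", "12.232.064"),
    ('Remove "Version"', "12.218.704"),
    ('Remove collation "Version"', "12.216.176"),
    ('Remove currency "variant" and "formal"', "12.214.928"),
    ('Remove collation "UCARules"', "12.015.776"),
    ('Remove collation "sequences"', "11.657.840"),
    ('Remove various "locales" data', "10.885.008"),
    ("Remove currency-codes and gender list", "10.881.296"),
]


def parse_size(text):
    """Convert a dot-grouped byte count such as '12.704.736' to an int."""
    return int(text.replace(".", ""))


def step_savings(steps):
    """Return (label, bytes_saved) for every step after the base size."""
    sizes = [parse_size(size) for _, size in steps]
    labels = [label for label, _ in steps]
    return [(labels[i], sizes[i - 1] - sizes[i]) for i in range(1, len(sizes))]


base = parse_size(STEPS[0][1])
savings = step_savings(STEPS)

# Print the steps from largest to smallest saving.
for label, saved in sorted(savings, key=lambda item: item[1], reverse=True):
    print(f"{label}: {saved} bytes ({saved / base:.2%} of base)")
print("Total:", sum(saved for _, saved in savings), "bytes")
```

The "locales" data, the unused trees, the collation "sequences" and "UCARules" together account for nearly all of the 1.823.440-byte reduction; the remaining steps each save only a few kilobytes.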
Comment 1 (Assignee) • 5 years ago
"confusables" feature:
- Used in 'unicode/uspoof.h' for spoofing detection.
"stringprep" feature:
- Used in 'unicode/usprep.h' for RFC 3454 string preparation.
"unames" feature:
- Used in 'unicode/uchar.h' for u_charName to retrieve the name of a Unicode character.
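For illustration only: ICU's data build tool can take a JSON filter file. Assuming its `featureFilters` key and the `"exclude"` shorthand, dropping these three features could be expressed roughly as below. Whether the patch uses exactly this form is an assumption and is not stated in this comment; `build_feature_filter` is a hypothetical helper.

```python
import json

# Feature names taken from the comment above.
UNUSED_FEATURES = ["confusables", "stringprep", "unames"]


def build_feature_filter(features):
    """Build a filter dict that excludes every feature in `features`.

    The "featureFilters" key and the "exclude" value are assumed to match
    the filter format of ICU's data build tool; verify against the ICU
    documentation before relying on the exact shape.
    """
    return {"featureFilters": {name: "exclude" for name in features}}


# Serialize the filter the way it might be stored in a JSON filter file.
print(json.dumps(build_feature_filter(UNUSED_FEATURES), indent=2))
```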
Comment 2 (Assignee) • 5 years ago
Depends on D39664
Comment 3 (Assignee) • 5 years ago
Depends on D39665
Comment 4 (Assignee) • 5 years ago
Depends on D39666
Comment 5 (Assignee) • 5 years ago
Depends on D39667
Comment 6 (Assignee) • 5 years ago
Depends on D39668
Comment 7 (Assignee) • 5 years ago
Depends on D39669
Comment 8 • 5 years ago
Is there any way we could do this somewhat more systematically than just finding these things semi-randomly? I guess the ICU setup is that you compile in a big bundle of data, and ICU then just queries it and reports what it finds, so it's hard to know what needs what. But maybe we could do something better than just randomly discovering that stuff in here happens to be unused.
Comment 9 (Assignee) • 5 years ago
We could do it the other way round and instead of excluding things we don't need, only include stuff we actually need. But in any case we rely on ICU reporting U_MISSING_RESOURCE_ERROR when we've removed too much data.
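The two approaches, and the way a missing-resource error surfaces over-aggressive filtering, can be sketched with a toy model. Everything here is hypothetical: the resource names, the `MissingResourceError` class (a stand-in for ICU's `U_MISSING_RESOURCE_ERROR` status code), and the helper functions. None of it is ICU API.

```python
class MissingResourceError(LookupError):
    """Stand-in for ICU's U_MISSING_RESOURCE_ERROR status code."""


def exclude_filter(data, unwanted):
    """Denylist approach: keep everything except the named resources."""
    return {key: value for key, value in data.items() if key not in unwanted}


def include_filter(data, wanted):
    """Allowlist approach: keep only the named resources."""
    return {key: value for key, value in data.items() if key in wanted}


def get_resource(bundle, key):
    """Look up a resource, failing loudly when it was filtered out."""
    if key not in bundle:
        raise MissingResourceError(key)
    return bundle[key]


def find_missing(bundle, required):
    """Return the required keys whose lookup raises MissingResourceError."""
    missing = []
    for key in required:
        try:
            get_resource(bundle, key)
        except MissingResourceError:
            missing.append(key)
    return missing


# Hypothetical resource names and contents, for illustration only.
full_data = {
    "Version": "placeholder",
    "UCARules": "placeholder",
    "collations": "placeholder",
    "Currencies": "placeholder",
}
required = ["collations", "Currencies"]

slim_by_exclusion = exclude_filter(full_data, {"Version", "UCARules"})
slim_by_inclusion = include_filter(full_data, {"collations"})

print(find_missing(slim_by_exclusion, required))
print(find_missing(slim_by_inclusion, required))
```

In this toy run the allowlist forgot "Currencies", so only the inclusion-based bundle reports a missing resource. Either way, the check depends on something actually exercising each required lookup.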
Comment 10 (Assignee) • 5 years ago
Sigh, I forgot to reimport the tzdata 2019b changes after running icu_sources_data.py, which means the numbers differ slightly from those reported in comment #0:
- Remove unused trees → 12.233.232
- Remove "Version" and collation "Version" → 12.217.344
- Remove currency "variant" and "formal" → 12.216.096
- Remove "UCARules" → 12.016.944
- Remove "sequences" → 11.659.008
- Remove various "locales" data → 10.886.176
- Remove currency-codes and gender list → 10.882.464
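Comparing these figures with comment #0 shows that every revised size is larger by the same 1.168 bytes, which is consistent with a fixed-size difference from the tzdata reimport and not with a change to the removal steps themselves. A small Python check, with the pairs copied from the two lists; the labels and the `parse_size` helper are made up for this sketch.

```python
def parse_size(text):
    """Convert a dot-grouped byte count such as '12.233.232' to an int."""
    return int(text.replace(".", ""))


# (size in comment #0, size in this comment) after the same cumulative step.
# This comment merges the two "Version" steps, so that row is compared with
# the size after removing the collation "Version" in comment #0.
PAIRS = {
    "unused trees": ("12.232.064", "12.233.232"),
    '"Version" and collation "Version"': ("12.216.176", "12.217.344"),
    'currency "variant" and "formal"': ("12.214.928", "12.216.096"),
    '"UCARules"': ("12.015.776", "12.016.944"),
    '"sequences"': ("11.657.840", "11.659.008"),
    '"locales" data': ("10.885.008", "10.886.176"),
    "currency-codes and gender list": ("10.881.296", "10.882.464"),
}

offsets = {
    label: parse_size(new) - parse_size(old)
    for label, (old, new) in PAIRS.items()
}

for label, offset in offsets.items():
    print(f"{label}: +{offset} bytes")
print("Same offset for every step:", len(set(offsets.values())) == 1)
```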
Updated•5 years ago
Comment 11 (Assignee) • 5 years ago
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ece604451cdc5c018f649bfea208c20c748e168
Comment 12 • 5 years ago
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/6f1c13a0ec7c
Part 1: Exclude additional resource files currently not needed. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/2a3e8df7b7c4
Part 2: Remove "Version" info from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/c2f95d84647b
Part 3: Remove "variant" and "formal" currency values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/03f0b86873ac
Part 4: Remove "UCARules" collation values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/b57c502e3167
Part 5: Remove "Sequence" collation values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/8c103cf517ae
Part 6: Remove unused "locales" values from ICU data file. r=jwalden
https://hg.mozilla.org/integration/autoland/rev/1d141c74e8dc
Part 7: Remove currency codes and gender lists from ICU data file. r=jwalden
Comment 13 • 5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6f1c13a0ec7c
https://hg.mozilla.org/mozilla-central/rev/2a3e8df7b7c4
https://hg.mozilla.org/mozilla-central/rev/c2f95d84647b
https://hg.mozilla.org/mozilla-central/rev/03f0b86873ac
https://hg.mozilla.org/mozilla-central/rev/b57c502e3167
https://hg.mozilla.org/mozilla-central/rev/8c103cf517ae
https://hg.mozilla.org/mozilla-central/rev/1d141c74e8dc