Closed Bug 1986247 Opened 3 months ago Closed 3 months ago

Complain about attempts to vendor crates that duplicate ICU4X functionality

Categories

(Developer Infrastructure :: Mach Vendor & Updatebot, enhancement)

Tracking

(firefox144 fixed)

RESOLVED FIXED
144 Branch

People

(Reporter: hsivonen, Assigned: hsivonen)

Attachments

(1 file)

Bug 1948330 added a transitive dependency on unic-ucd-ident (and its unic-* dependencies). That crate last saw development activity on 2020-10-21 and is therefore out of date relative to the latest Unicode version. Additionally, the duplication with icu_properties is bad for binary size. (This is now being addressed.)

mach vendor rust should probably complain about attempts to vendor crates that duplicate ICU4X, which we're trying to migrate towards.

In addition to the Unicode-aware functionality in the standard library, the Rust ecosystem has four sets of crates for Unicode stuff:

  • ICU4X (crates prefixed with icu_; this is what we are trying to migrate towards)
  • unicode-rs (crates prefixed with unicode-; a loose constellation, not all managed by https://github.com/unicode-rs/ )
  • UNIC (crates prefixed with unic-)
  • rust_icu (crates prefixed with rust_icu_; not actually Rust internals but bindings for ICU4C)

I suggest adding the following denylist to mach vendor rust:

  • Block vendoring of crates whose name starts with unic- with the exception of unic-langid and unic-langid-ffi. Rationale: UNIC in general is unmaintained and not being updated to new Unicode versions. unic-langid is maintained, though, and is used widely enough in Gecko to require a more careful migration plan to icu_locale.
  • Block vendoring of crates whose name starts with rust_icu. Rationale: These are bindings for ICU4C, and we're trying to migrate away from ICU4C towards ICU4X. (The risk of accidentally vendoring these is low, but it's also an easy denylist item.)
  • Block vendoring of unicode-normalization, unicode-segmentation, unicode-ccc, unicode-canonical-combining-class, unicode-general-category, unicode-joining-type, and unicode-case-mapping. Rationale: These would be duplicative relative to icu_normalizer, icu_segmenter, icu_properties, and icu_casemap. (It's not practical to deny-list everything that starts with unicode-. Notably, unicode-bidi as a whole is out of scope for ICU4X and we use unicode-bidi with Unicode Database things redirected to ICU4X. At present, ICU4X does not have the API surface of unicode-width even though it has the raw East Asian Width data.)
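The three rules above could be checked with a small predicate. What follows is a minimal Python sketch, not the actual mach vendor rust code: the function name `is_denied` and the constant names are hypothetical, and the exceptions are matched by exact name as listed above (whether the unic-langid exception is meant to cover related crates is not stated here).

```python
# Hypothetical sketch of the proposed denylist; names are illustrative only.

# UNIC crates that stay allowed despite the "unic-" prefix rule.
UNIC_EXCEPTIONS = frozenset({"unic-langid", "unic-langid-ffi"})

# unicode-rs crates named explicitly as duplicating ICU4X functionality.
DENIED_UNICODE_RS = frozenset({
    "unicode-normalization",
    "unicode-segmentation",
    "unicode-ccc",
    "unicode-canonical-combining-class",
    "unicode-general-category",
    "unicode-joining-type",
    "unicode-case-mapping",
})


def is_denied(crate_name: str) -> bool:
    """Return True if vendoring crate_name should trigger a complaint."""
    # Rule 1: UNIC is unmaintained, except for the unic-langid crates.
    if crate_name.startswith("unic-"):
        return crate_name not in UNIC_EXCEPTIONS
    # Rule 2: rust_icu crates are bindings for ICU4C.
    if crate_name.startswith("rust_icu"):
        return True
    # Rule 3: specific unicode-rs crates that duplicate ICU4X crates.
    return crate_name in DENIED_UNICODE_RS
```

Note that the `unic-` prefix includes the hyphen, so `unicode-bidi` and `unicode-width` fall through to the exact-name rule and remain allowed.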

It would be great for us to also be able to migrate onwards from the unic-langid crates, as they're in practice only "maintained" in the sense that if we get something like bug 1917175 or bug 1872962 reported, I'll need to provide a fix for the upstream packages. But as noted above, such a migration would need more care and attention than has been available so far, and would need to include work on the fluent.rs crates, which are no longer maintained by Mozilla, but by outside contributors.

(In reply to Henri Sivonen (:hsivonen) from comment #0)

It's not practical to deny-list everything that starts with unicode-.

On second thought, perhaps we should try denylisting crates whose name starts with unicode- and make exceptions for unicode-bidi, unicode-bidi-ffi, unicode-width, and unicode-ident and see how it goes. (unicode-ident seems to be used enough that it's not practical to denylist it at this point in time. It seems that unicode-normalization has been re-introduced while I wasn't looking. I'll file a separate bug about that.)

While at it, let's put feruca on a mach vendor rust denylist. It is duplicative with icu_collator (bug 1937541) but less complete.
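The revised rule from this comment (deny the whole unicode- prefix with a short exception list, plus feruca by name) might look roughly like the following. This is a sketch under assumptions, not the patch that landed: `complaint`, `UNICODE_EXCEPTIONS`, and `EXACT_DENIALS` are hypothetical names, and the message wording is invented for illustration.

```python
from typing import Optional

# Hypothetical names; the exception list is taken from the comment above.
UNICODE_EXCEPTIONS = frozenset({
    "unicode-bidi",
    "unicode-bidi-ffi",
    "unicode-width",
    "unicode-ident",
})

# Crates denied by exact name, mapped to the ICU4X crate they duplicate.
EXACT_DENIALS = {"feruca": "icu_collator"}


def complaint(crate_name: str) -> Optional[str]:
    """Return a complaint message for a denied crate, or None if allowed."""
    if crate_name in EXACT_DENIALS:
        replacement = EXACT_DENIALS[crate_name]
        return f"{crate_name} duplicates {replacement}; use ICU4X instead."
    if crate_name.startswith("unicode-") and crate_name not in UNICODE_EXCEPTIONS:
        return f"{crate_name} is on the unicode- denylist; check ICU4X first."
    return None
```

Compared with listing individual unicode- crates, a prefix rule with exceptions also catches unicode- crates nobody thought to enumerate, at the cost of needing a new exception whenever a legitimate one comes along.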

unicase vs icu_casemap might be the same issue.

Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

(In reply to Makoto Kato [:m_kato] from comment #3)

unicase vs icu_casemap might be the same issue.

Indeed. Let's talk about the Application Services aspect in bug 1986265. I also filed bug 1986401.

Pushed by hsivonen@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/1880e3c68c73
https://hg.mozilla.org/integration/autoland/rev/27c3dcad78cf
Complain about attempts to vendor crates that duplicate ICU4X functionality. r=firefox-build-system-reviewers,supply-chain-reviewers,platform-i18n-reviewers,jfkthame,glandium DONTBUILD
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 144 Branch