Closed Bug 1715595 Opened 4 months ago Closed 3 months ago

Use char rather than uint8_t for utf-8 in unified components

Categories

(Core :: Internationalization, enhancement)

Tracking

RESOLVED FIXED
92 Branch
Tracking Status
firefox92 --- fixed

People

(Reporter: dminor, Assigned: dminor)

References

(Blocks 1 open bug)

Details

(Whiteboard: [i18n-unification])

Attachments

(1 file)

We should replace uint8_t with char in the unified components. I originally used uint8_t for compatibility with Rust, which uses u8, but the ICU4X C FFI has been developed around char instead.
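
For illustration, here is a minimal sketch of how existing uint8_t buffers can be handed to a char-based interface without copying. The names Utf8ByteLength and AsChars are hypothetical and are not taken from the patch or from ICU4X; std::string_view stands in for whatever span type the real code uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical char-based entry point, standing in for an API that
// accepts UTF-8 as char (assumption: not the real ICU4X FFI signature).
inline std::size_t Utf8ByteLength(std::string_view utf8) {
  return utf8.size();
}

// Views a uint8_t buffer as char without copying. Accessing any object
// through a char pointer is permitted by the aliasing rules, so the
// reinterpret_cast is well defined.
inline std::string_view AsChars(const std::uint8_t* data, std::size_t length) {
  return std::string_view(reinterpret_cast<const char*>(data), length);
}
```

Callers that still hold uint8_t data would only need a cast at the boundary, which is roughly the cost of standardizing on char.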

This is pending the resolution of https://github.com/unicode-org/icu4x/issues/769, in case we end up choosing something different there.

Assignee: nobody → dminor
Status: NEW → ASSIGNED

My gut instinct would be to prefer an unsigned type here; UTF-8 uses (virtually) the full range of byte values from 0x00 to 0xFF, and having half of them be negative on the C side, wherever plain char is signed, is a potential footgun.
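
A minimal sketch of the footgun described above and one common way around it. The helper names CountNonAsciiBytes and IsContinuationByte are hypothetical and do not come from the patch; the point is only that byte-value comparisons should go through unsigned char.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts bytes outside the ASCII range (>= 0x80).
inline std::size_t CountNonAsciiBytes(std::string_view utf8) {
  std::size_t count = 0;
  for (char c : utf8) {
    // Writing `c >= 0x80` directly would never be true where plain char
    // is signed, because such bytes hold negative values. Converting to
    // unsigned char first makes the check independent of char's signedness.
    if (static_cast<unsigned char>(c) >= 0x80) {
      ++count;
    }
  }
  return count;
}

// Hypothetical helper: true if the byte is a UTF-8 continuation byte
// (bit pattern 10xxxxxx).
inline bool IsContinuationByte(char c) {
  return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}
```

With the cast in place, the code behaves the same on platforms where char is signed and on those where it is unsigned.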

But I guess we need to see where the ICU4X issue goes....

My take on the upstream issue is that the decision is to use char, and that the issue remains open to update the unit tests where needed. I'll double check.

Confirmed that upstream is going to use char.

Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e6a1afb2e39
Use char rather than uint8_t for utf-8 in unified components r=platform-i18n-reviewers,gregtatum
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch
Whiteboard: [i18n-unification]