Open Bug 1551746 Opened 6 years ago Updated 2 years ago

Replace use of escape() and unescape() in hacky UTF-8 <--> UTF-16 conversion

Categories

(MailNews Core :: Feed Reader, task)

Tracking

(Not tracked)

People

(Reporter: jorgk-bmo, Unassigned)

+++ This bug was initially created as a clone of Bug #1349722 +++

https://hg.mozilla.org/comm-central/rev/55b04a77e7610a1907960a9268f5e816869cddc8
looks quite hacky and both escape() and unescape() are deprecated.

The JS Mime way to do the UTF-8 to UTF-16 conversion is this:
https://searchfox.org/comm-central/rev/d86758c2328ae10f2d3f8b0422772e33e858f089/mailnews/mime/jsmime/jsmime.js#577

Basically, it's just using a "normal" TextDecoder() for "UTF-8".

Henri, what do you think of unescape(encodeURIComponent(source)) and decodeURIComponent(escape(url)) as UTF-16 to UTF-8 and UTF-8 to UTF-16 conversions in JS? Hacky? Our mail headers may contain raw UTF-8 and we need to convert from that to JS strings in UTF-16. Is there a better way? Maybe there is some code in M-C that does the same.

As stated in comment #0, JS Mime does the raw UTF-8 to JS string conversion using a byte array and a text decoder.

Flags: needinfo?(hsivonen)

(In reply to Jorg K (GMT+2) from comment #1)

Henri, what do you think of unescape(encodeURIComponent(source)) and decodeURIComponent(escape(url)) as UTF-16 to UTF-8 and UTF-8 to UTF-16 conversions in JS? Hacky?

That solution seems hacky and inefficient.

Our mail headers may contain raw UTF-8 and we need to convert from that to JS strings in UTF-16. Is there a better way? Maybe there is some code in M-C that does the same.

I suggest:

function binaryStringToArrayBuffer(str) {
    // Each code unit of str is expected to hold one byte value (0-255).
    let buf = new Uint8Array(str.length);
    for (let i = 0; i < str.length; i++) {
        buf[i] = str.charCodeAt(i);
    }
    return buf;
}

function decodeUtf8BytesInString(bytesAsUtf16LowerHalves) {
    return (new TextDecoder()).decode(binaryStringToArrayBuffer(bytesAsUtf16LowerHalves));
}
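
One behavioural difference from the decodeURIComponent(escape(...)) approach is worth illustrating. The following is a minimal sketch, not code from JSMime or from the patch; the name decodeBinaryString and its fatal parameter are hypothetical. With a default TextDecoder, malformed UTF-8 is replaced by U+FFFD, whereas decodeURIComponent() throws a URIError on the same input.

```javascript
// Hypothetical helper, equivalent in spirit to decodeUtf8BytesInString above.
// binaryStr: a JS string whose code units each hold one byte value (0-255).
// fatal: when true, malformed UTF-8 throws instead of yielding U+FFFD.
function decodeBinaryString(binaryStr, fatal = false) {
    const bytes = new Uint8Array(binaryStr.length);
    for (let i = 0; i < binaryStr.length; i++) {
        bytes[i] = binaryStr.charCodeAt(i);
    }
    return new TextDecoder("utf-8", { fatal }).decode(bytes);
}

// Well-formed input: the bytes C3 A9 are the UTF-8 encoding of U+00E9.
const ok = decodeBinaryString("\u00c3\u00a9");

// Truncated input: a lone C3 lead byte becomes U+FFFD in non-fatal mode.
const replaced = decodeBinaryString("\u00c3");

// The legacy trick throws a URIError on the same truncated input.
let legacyThrew = false;
try {
    decodeURIComponent(escape("\u00c3"));
} catch (e) {
    legacyThrew = e instanceof URIError;
}
```

Whether replacement or an exception is preferable for mail headers is a design choice; passing fatal = true to the sketch above gives the throwing behaviour.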

Flags: needinfo?(hsivonen)

I should have read earlier comments better. That's exactly what JSMime does (but with different function names).

Thanks, yes, JS Mime does that. And for the way back, UTF-16 to UTF-8?

(In reply to Jorg K (GMT+2) from comment #4)

Thanks, yes, JS Mime does that. And for the way back, UTF-16 to UTF-8?

After TextEncoder, I don't know if there's a more efficient way to convert a Uint8Array into a string in which each code unit represents a byte value than to call String.fromCharCode() on a per-byte basis and to concatenate the results.
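
The reverse direction described above could be sketched as follows. This is an illustration under stated assumptions, not code from the tree: the names uint8ArrayToBinaryString and encodeUtf8BytesInString are hypothetical, chosen to mirror the decoding functions earlier in this bug.

```javascript
// Hypothetical helper: turn bytes into a "binary string" whose code units
// each carry one byte value, using String.fromCharCode() per byte.
function uint8ArrayToBinaryString(bytes) {
    let str = "";
    for (let i = 0; i < bytes.length; i++) {
        str += String.fromCharCode(bytes[i]);
    }
    return str;
}

// Hypothetical helper: JS string (UTF-16) to a binary string of UTF-8 bytes.
function encodeUtf8BytesInString(source) {
    // TextEncoder always produces UTF-8.
    return uint8ArrayToBinaryString(new TextEncoder().encode(source));
}

// U+00E9 encodes to the two bytes C3 A9.
const encoded = encodeUtf8BytesInString("\u00e9");

// For well-formed input this matches the legacy trick.
const legacy = unescape(encodeURIComponent("\u00e9"));
```

One difference to keep in mind: TextEncoder replaces a lone surrogate with U+FFFD, while encodeURIComponent() throws a URIError for the same input, so the two approaches only agree on well-formed strings.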

Thanks, Henri. I'll get to it. Not our most pressing issue, I just saw this in passing.

Type: defect → task
Severity: normal → S3