1347877 - Remove nsIScriptableUnicodeConverter

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Description

•

8 years ago

We should remove nsIScriptableUnicodeConverter and have chrome script use the same facilities as Web scripts (TextDecoder/TextEncoder) instead of maintaining a parallel XPCOM universe of things.
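
For readers unfamiliar with the Web-facing facilities mentioned here, the following is a minimal sketch of a UTF-8 round trip with TextEncoder/TextDecoder. It assumes both constructors are available as globals in the calling scope; the variable names are illustrative only and are not taken from any patch on this bug.

```javascript
// Encode a JS string (UTF-16 code units internally) into UTF-8 bytes.
const original = "na\u00efve caf\u00e9";
const encoder = new TextEncoder();
const utf8Bytes = encoder.encode(original); // Uint8Array

// Decode the bytes back into a JS string. With `fatal: true`,
// malformed input throws instead of yielding U+FFFD.
const decoder = new TextDecoder("utf-8", { fatal: true });
const roundTripped = decoder.decode(utf8Bytes);
```

The encoder works on typed arrays rather than on strings whose code units stand in for bytes, which is the main practical difference from the converter's string-based methods.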

:aceman

Updated

•

8 years ago

Blocks: post-57-api-changes

No longer depends on: post-57-api-changes

:aceman

Comment 1

•

8 years ago

This is still used a lot in calendar, chat and mailnews. Can you please describe what the replacement is?

Jorg K (CEST = GMT+2)

Comment 2

•

8 years ago

See for example: https://dxr.mozilla.org/comm-central/rev/5d39d63d0bf223541c5515ab2691736f5a638ded/mailnews/compose/test/unit/test_longLines.js#41

Masatoshi Kimura [:emk]

Comment 3

•

8 years ago

The problem is that TextEncoder does not support non-UTF-8 encodings. We will have to move nsIScriptableUnicodeConverter to c-c unless we fix bug 862292.
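
A small sketch of the asymmetry described above: TextDecoder takes an encoding label, while TextEncoder always produces UTF-8. The helper name `decodeWithLabel` is hypothetical, and which legacy labels a given runtime accepts is an assumption that depends on that runtime.

```javascript
// TextEncoder has no encoding parameter; its output is always UTF-8.
const utf8Encoder = new TextEncoder();
const encoderEncoding = utf8Encoder.encoding; // "utf-8"
const encodedYen = utf8Encoder.encode("\u00a5"); // U+00A5 as UTF-8: C2 A5

// TextDecoder, by contrast, accepts a label for the source encoding.
// Hypothetical helper: decode `bytes` using the encoding named by `label`.
function decodeWithLabel(bytes, label) {
  return new TextDecoder(label).decode(bytes);
}
```

Under that assumption, a call such as `decodeWithLabel(bytes, "iso-2022-jp")` covers the decode direction, but there is no matching call for encoding to ISO-2022-JP, which is the gap this comment points out.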

alta88

Updated

•

8 years ago

Depends on: 1349722

Makoto Kato [:m_kato]

Updated

•

8 years ago

Priority: -- → P3

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Updated

•

8 years ago

Depends on: 1353285

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Comment 4

•

8 years ago

(In reply to :aceman from comment #1)
> This is still used a lot in calendar, chat and mailnews.
> Can you please describe what the replacement is?

Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in legacy encodings.)

(In reply to Masatoshi Kimura [:emk] from comment #3)
> The problem is that TextEncoder does not support non-UTF-8 encodings. We
> will have to move nsIScriptableUnicodeConverter to c-c unless we fix bug
> 862292.

Fixing bug 862292 would be a fine solution. :-)

Failing that, c-c needs to figure out on its own how it wants to call into the upcoming (bug 1261841) mozilla::Encoding/mozilla::Encoder for ISO-2022-JP encode if it moves the message encoding step from C++ to JS in the future (I believe the relevant call is in C++ at present). My advice would be not moving nsIScriptableUnicodeConverter over to c-c, but c-c devs may, of course, opt to do so.

alta88

Comment 5

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #4)
> (In reply to :aceman from comment #1)
> > This is still used a lot in calendar, chat and mailnews.
> > Can you please describe what the replacement is?
>
> Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in
> legacy encodings.)

Henri, as far as nsIScriptableUnicodeConverter goes, do you have an opinion on using the unescape(encodeURIComponent(str)) <-> decodeURIComponent(escape(str)) method to do the js<->cpp conversion, as proposed in the Bug 1349722 patch? (I don't think async is important for the use case.)

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf]

Naveed Ihsanullah [:naveed]

Updated

•

8 years ago

Whiteboard: [qf] → [qf-]

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Comment 6

•

8 years ago

(In reply to alta88 from comment #5)
> (In reply to Henri Sivonen (:hsivonen) from comment #4)
> > (In reply to :aceman from comment #1)
> > > This is still used a lot in calendar, chat and mailnews.
> > > Can you please describe what the replacement is?
> >
> > Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in
> > legacy encodings.)
>
> Henri, as far as nsIScriptableUnicodeConverter, do you have an opinion on
> using the unescape(encodeURIComponent(str))<->decodeURIComponent(escape(str))
> method to do the js<->cpp conversion, as proposed in Bug 1349722 patch?
> (I don't think async is important for the use case.)

I think I'm missing some context for this question. Do you mean to perform a conversion between a UTF-16 JS string and a UTF-8 string represented as a JS string with each byte zero-extended into a 16-bit code unit, without actually caring about URL encoding? Or does the escaping and unescaping actually serve some URL-related purpose here?

alta88

Comment 7

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #6)
> (In reply to alta88 from comment #5)
> > (In reply to Henri Sivonen (:hsivonen) from comment #4)
> > > (In reply to :aceman from comment #1)
> > > > This is still used a lot in calendar, chat and mailnews.
> > > > Can you please describe what the replacement is?
> > >
> > > Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in
> > > legacy encodings.)
> >
> > Henri, as far as nsIScriptableUnicodeConverter, do you have an opinion on
> > using the unescape(encodeURIComponent(str))<->decodeURIComponent(escape(str))
> > method to do the js<->cpp conversion, as proposed in Bug 1349722 patch?
> > (I don't think async is important for the use case.)
>
> I think I'm missing some context for this question. Do you mean to perform a
> conversion between a UTF-16 JS string and a UTF-8 string represented as a
> JS string with each byte zero-extended into a 16-bit code unit

Yes.

> without
> actually caring about URL encoding?

The context is that the xhr response is an xml doc and utf8 is the encoding, and nothing is sniffed or otherwise detected explicitly for any other encoding. Upon js (using UTF-16 internally) parsing of the doc, it is given to xpcom as a |string| which carries no explicit (likely 8859-1 implied) encoding info. The string is currently converted into the right char * bytes using nsIScriptableUnicodeConverter.ConvertFromUnicode and UTF-8 as the encoding.

> Or does the escaping and unescaping
> actually serve some URL-related purpose here?

This seems to be part and parcel of using *codeURIComponent, meaning it doesn't work without that conversion.

So the question is whether you're familiar with this method and whether it's an acceptable alternative to writing some other conversion code as filed in Bug 1353285. It wfm for all manner of idn and non ascii urls as well as utf8 js string content.
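
For reference, a minimal sketch of the conversion under discussion: a JS string turned into a "byte string" in which each code unit holds one UTF-8 byte, and back. The function names are hypothetical and this is not the code from the Bug 1349722 patch; the TextEncoder-based variant is included only for comparison.

```javascript
// UTF-16 JS string -> string whose code units are the UTF-8 bytes.
// encodeURIComponent percent-encodes the UTF-8 bytes, and unescape
// turns each %XX back into a single code unit with that value.
function toUtf8ByteString(str) {
  return unescape(encodeURIComponent(str));
}

// The inverse: byte string -> UTF-16 JS string.
function fromUtf8ByteString(byteString) {
  return decodeURIComponent(escape(byteString));
}

// The same byte string built from TextEncoder output, for comparison.
function toUtf8ByteStringViaTextEncoder(str) {
  let out = "";
  for (const byte of new TextEncoder().encode(str)) {
    out += String.fromCharCode(byte);
  }
  return out;
}
```

The two approaches agree on well-formed input. They differ on lone surrogates: encodeURIComponent throws a URIError, while TextEncoder substitutes U+FFFD.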

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Comment 8

•

8 years ago

(In reply to alta88 from comment #7)
> (In reply to Henri Sivonen (:hsivonen) from comment #6)
> > (In reply to alta88 from comment #5)
> > > (In reply to Henri Sivonen (:hsivonen) from comment #4)
> > > > (In reply to :aceman from comment #1)
> > > > > This is still used a lot in calendar, chat and mailnews.
> > > > > Can you please describe what the replacement is?
> > > >
> > > > Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in
> > > > legacy encodings.)
> > >
> > > Henri, as far as nsIScriptableUnicodeConverter, do you have an opinion on
> > > using the unescape(encodeURIComponent(str))<->decodeURIComponent(escape(str))
> > > method to do the js<->cpp conversion, as proposed in Bug 1349722 patch?
> > > (I don't think async is important for the use case.)
> >
> > I think I'm missing some context for this question. Do you mean to perform a
> > conversion between a UTF-16 JS string and a UTF-8 string represented as a
> > JS string with each byte zero-extended into a 16-bit code unit
>
> Yes.

It seems inefficient for that purpose.

> > without
> > actually caring about URL encoding?
>
> The context is that the xhr response is an xml doc and utf8 is the encoding,
> and nothing is sniffed or otherwise detected explicitly for any other
> encoding. Upon js (using UTF-16 internally) parsing of the doc, it is given
> to xpcom as a |string| which carries no explicit (likely 8859-1 implied)
> encoding info. The string is currently converted into the right char * bytes
> using nsIScriptableUnicodeConverter.ConvertFromUnicode and UTF-8 as the
> encoding.

This doesn't make sense to me. XHR gives an AString (UTF-16) to XPCOM and ConvertFromUnicode takes UTF-16. There doesn't appear to be an 8-bit string between those steps.

> > Or does the escaping and unescaping
> > actually serve some URL-related purpose here?
>
> This seems to be part and parcel of using *codeURIComponent, meaning it
> doesn't work without that conversion.
>
> So the question is whether you're familiar with this method and whether it's
> an acceptable alternative to writing some other conversion code as filed in
> Bug 1353285.

It seems excessively inefficient as an implementation for bug 1353285.

alta88

Comment 9

•

8 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #8)
> (In reply to alta88 from comment #7)
> > (In reply to Henri Sivonen (:hsivonen) from comment #6)
> > > (In reply to alta88 from comment #5)
> > > > (In reply to Henri Sivonen (:hsivonen) from comment #4)
> > > > > (In reply to :aceman from comment #1)
> > > > > > This is still used a lot in calendar, chat and mailnews.
> > > > > > Can you please describe what the replacement is?
> > > > >
> > > > > Filed bug 1353285. (Covers only encode to UTF-8. Please don't do output in
> > > > > legacy encodings.)
> > > >
> > > > Henri, as far as nsIScriptableUnicodeConverter, do you have an opinion on
> > > > using the unescape(encodeURIComponent(str))<->decodeURIComponent(escape(str))
> > > > method to do the js<->cpp conversion, as proposed in Bug 1349722 patch?
> > > > (I don't think async is important for the use case.)
> > >
> > > I think I'm missing some context for this question. Do you mean to perform a
> > > conversion between a UTF-16 JS string and a UTF-8 string represented as a
> > > JS string with each byte zero-extended into a 16-bit code unit
> >
> > Yes.
>
> It seems inefficient for that purpose.
>
> > > without
> > > actually caring about URL encoding?
> >
> > The context is that the xhr response is an xml doc and utf8 is the encoding,
> > and nothing is sniffed or otherwise detected explicitly for any other
> > encoding. Upon js (using UTF-16 internally) parsing of the doc, it is given
> > to xpcom as a |string| which carries no explicit (likely 8859-1 implied)
> > encoding info. The string is currently converted into the right char * bytes
> > using nsIScriptableUnicodeConverter.ConvertFromUnicode and UTF-8 as the
> > encoding.
>
> This doesn't make sense to me. XHR gives an AString (UTF-16) to XPCOM and
> ConvertFromUnicode takes UTF-16. There doesn't appear to be an 8-bit string
> between those steps.

The xpcom function argument is cast as a const char * for the data string.

> > > Or does the escaping and unescaping
> > > actually serve some URL-related purpose here?
> >
> > This seems to be part and parcel of using *codeURIComponent, meaning it
> > doesn't work without that conversion.
> >
> > So the question is whether you're familiar with this method and whether it's
> > an acceptable alternative to writing some other conversion code as filed in
> > Bug 1353285.
>
> It seems excessively inefficient as an implementation for bug 1353285.

First, thanks for taking the time to consider this.

Measuring the difference between using encodeURIComponent vs. ConvertFromUnicode for the conversion on a 65kb string (which is large for the use case), both show it taking 1ms. On smaller strings, encodeURIComponent may show 1ms, while ConvertFromUnicode takes less than that, so it doesn't register at ms resolution.

Given that 1) the upstream api provider is notorious for pulling apis without notice (not necessarily this api/case), 2) the new implementation isn't written, and 3) the encodeURIComponent variants are standard spec since ECMA 5 and more... reliable for a downstream consumer, whatever inefficiency may exist (fractions of ms) is meaningless for prudent technical risk management.
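
A rough sketch of how such a timing comparison could be reproduced. ConvertFromUnicode is only reachable from privileged Mozilla code, so TextEncoder is swapped in here as the comparison point; that substitution, the helper names, and the sample text are all assumptions, and the numbers will vary by machine and engine.

```javascript
// Two ways to obtain the UTF-8 form of a JS string.
function viaUriComponent(str) {
  return unescape(encodeURIComponent(str)); // byte string
}
function viaTextEncoder(str) {
  return new TextEncoder().encode(str); // Uint8Array
}

// Average wall-clock milliseconds per call over `iterations` runs.
function averageMs(fn, input, iterations) {
  const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();
  const start = now();
  for (let i = 0; i < iterations; i++) {
    fn(input);
  }
  return (now() - start) / iterations;
}

// 65 * 1024 UTF-16 code units mixing ASCII and non-ASCII text.
const sample = "caf\u00e9 ".repeat(13 * 1024);

const timings = {
  uriComponentMs: averageMs(viaUriComponent, sample, 20),
  textEncoderMs: averageMs(viaTextEncoder, sample, 20),
};
console.log(timings);
```

Averaging over several iterations avoids the problem noted above, where a single short run does not register at millisecond resolution.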

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Updated

•

7 years ago

Depends on: 1444329

Kris Maglione [:kmag]

Updated

•

7 years ago

Updated

•

3 years ago

Performance Impact: --- → -

Whiteboard: [qf-]

Mark Banner (:standard8)

Updated

•

3 years ago

Depends on: 1761317

Magnus Melin [:mkmelin]

Updated

•

3 years ago

Depends on: 1762335

Mathew Hodson

Updated

•

2 years ago

Depends on: 1773535

Mathew Hodson

Updated

•

2 years ago

Depends on: 1773932

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Geoff Lankow (:darktrojan)

Updated

•

2 years ago

Depends on: 1803986

Mark Banner (:standard8)

Updated

•

2 years ago

Depends on: 1811339

Mark Banner (:standard8)

Comment 10

•

2 years ago

Just to note there are now < 30 instances of this accessed from JavaScript files in the tree. It may be worth a small drive to remove the rest if this is still desired.

Henri Sivonen (:hsivonen) (temporarily away from Bugzilla)

Reporter

Comment 11

•

2 years ago

I think it still makes sense to do this. A quick glance at some of the remaining uses indicates that the key mismatch arises from XPCOM streams not using ArrayBuffers, so if replacing XPCOM streams with Web Platform Streams is too large a prerequisite step, an intermediate step would be making XPCOM streams able to deal with bytes as ArrayBuffers.
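
To illustrate the mismatch described above, here is a minimal sketch that assumes a stream read has handed JS a "byte string" (one byte per code unit) rather than an ArrayBuffer. No XPCOM calls are shown, and both helper names are hypothetical.

```javascript
// Copy a byte string (each code unit holds one byte) into a Uint8Array,
// which is the input form TextDecoder expects.
function byteStringToUint8Array(byteString) {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    bytes[i] = byteString.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// Decode a UTF-8 byte string into an ordinary JS string.
function decodeUtf8ByteString(byteString) {
  return new TextDecoder("utf-8").decode(byteStringToUint8Array(byteString));
}
```

If streams could hand over ArrayBuffers directly, the copy loop would be unnecessary and the bytes could be passed to TextDecoder as they are.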

Gregory Pappas [:gregp]

Updated

•

2 years ago

Depends on: 1811994

Mark Banner (:standard8)

Updated

•

1 year ago

Depends on: 1851797

Mark Banner (:standard8)

Updated

•

1 year ago

No longer depends on: 923017

Mark Banner (:standard8)

Updated

•

11 months ago

Depends on: 1861645