332646 - Investigate using native uconv on Win32

Reporter

Description

•

19 years ago

I wrote a native Unicode converter for Windows CE that also works on Win32. It lets us remove a lot of code and data from Firefox (on ARM I saved 0.67 MB). This might be something we could enable on the trunk, since we have dropped Windows 98 support.

To create a build, just add this line to your mozconfig:

    ac_add_options --enable-native-uconv

The code for this converter lives here: http://lxr.mozilla.org/mozilla/source/intl/uconv/native/nsWinCEUConvService.cpp

Jungshik Shin

Comment 1

•

19 years ago

This is what I sent to Doug in response to his email seeking my opinion (with emphasis added around 'consistently'):

I'm not sure what you had in mind when you wrote 'This might be something we could enable on the trunk since we have dropped Windows 98 support.' When we interact with the OS, we rely on OS APIs (nsNativeCharsetUtils.cpp). But unless there's an acute need to reduce memory footprint, as in minimo, we need our own converters when dealing with web pages, forms, and mail messages that come from outside and that we send out into the wild, because our converters do more than the OS APIs can do and do things a little differently. They also enable us to handle all sorts of character encodings **consistently** across platforms.

In short, I don't see any connection between dropping support for Win98 and using the native uconv. Please 'enlighten' me if I'm missing anything; I'd be glad to stand corrected.

Jungshik Shin

Comment 2

•

19 years ago

(In reply to comment #1)
> In short, I don't see any connection between dropping support for
> Win98 and using the native uconv.

OK, I can see some connection, in that WideCharToMultiByte and MultiByteToWideChar on Win 2k or later support a lot more encodings than on Win 9x/ME. Still, I don't like some of their converters, and I prefer ours to theirs. Moreover, I'm loath to give up the consistency across platforms. Nor do I want to let go of our control over the way the incoming data stream is interpreted and outgoing data is encoded.

Doug Turner (:dougt)

Reporter

Comment 3

•

19 years ago

Thanks for your response. Are there any encoding/decoding tests that we can run to see if there are significant differences between the native uconv and the Mozilla converters?

Simon Montagu :smontagu

Comment 4

•

19 years ago

(In reply to comment #3)
> are there any encoding/decoding tests that we can
> run to see if there are significant differences between the native uconv and
> the mozilla converters?

http://smontagu.damowmow.com/encodingtest.html

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 5

•

19 years ago

I don't like using native uconv either. If the native converters don't behave compatibly across every version of Windows (including future releases), we won't be happy...

Doug Turner (:dougt)

Reporter

Comment 6

•

19 years ago

Maybe this isn't so much "I am demanding that we use native uconv" as "is there a way to reduce code+data bloat in the uconv code?" Your help and ideas for improving this area are important. What can we do?

Simon Montagu :smontagu

Comment 7

•

19 years ago

By the way, does MultiByteToWideChar emit UTF-16 or UCS-2? Testcase: http://www.i18nguy.com/unicode-plane1-utf8.html

Jean-Marc Desperrier

Comment 8

•

19 years ago

MultiByteToWideChar emits UTF-16 from Windows 2000 upward. It might break on 9x, but it would be very hard to display non-BMP characters there anyway. 670 KB just for encodings is really a lot. It might be worth thinking of a way to do better on this point.
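The UTF-16 versus UCS-2 question can be illustrated with a small Python sketch. It is not taken from this bug and does not call the Windows API; the helper names `utf16_code_units` and `fits_in_ucs2` are made up for illustration. It only shows why a Plane 1 character, such as those on the test page in comment 7, needs a surrogate pair that a UCS-2-only converter cannot produce.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16 (no BOM)."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def fits_in_ucs2(text):
    """True when every character is inside the BMP, so one 16-bit unit each suffices."""
    return all(ord(ch) <= 0xFFFF for ch in text)

# U+10000 is the first code point outside the BMP (Plane 1).
plane1 = "\U00010000"

print(plane1.encode("utf-8").hex())                # f0908080 (4-byte UTF-8 sequence)
print([hex(u) for u in utf16_code_units(plane1)])  # ['0xd800', '0xdc00'] (surrogate pair)
print(fits_in_ucs2(plane1))                        # False
```

A converter that emits UTF-16 would hand back both units of the pair; one limited to UCS-2 has no way to represent the character.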

Darin Fisher

Comment 9

•

19 years ago

dougt: What sort of performance improvement do you see from this change? Is it just a code size savings? Does that translate to performance in this case? I tend to agree with jshin, masayuki, and smontagu: i18n consistency across Firefox builds is important, so this change sounds risky.

Doug Turner (:dougt)

Reporter

Comment 10

•

19 years ago

I have not measured perf. I will post an engineering build shortly. Also, I am not advocating breaking consistency for the sake of it. See comment #6.

Doug Turner (:dougt)

Reporter

Comment 11

•

19 years ago

I'm not sure what this means, but a build running native uconv against the tests yields these failures:

    iso-8859-3    28 codepoint(s) failed
    iso-8859-6    48 codepoint(s) failed
    iso-8859-7     5 codepoint(s) failed
    iso-8859-8    32 codepoint(s) failed
    iso-8859-10   46 codepoint(s) failed
    iso-8859-11   87 codepoint(s) failed
    iso-8859-13   56 codepoint(s) failed
    iso-8859-14   32 codepoint(s) failed
    iso-8859-16   40 codepoint(s) failed
    Shift-JIS      3 codepoint(s) failed
    windows-936    5 codepoint(s) failed

I didn't run anything past the Windows-949 Korean testcase.
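For readers wondering how a "codepoint(s) failed" count like the ones above can be produced, here is a rough Python sketch of a per-byte comparison between two single-byte converters. It is an assumption about the general approach, not the code behind the linked test page. Python's built-in `latin-1` and `cp1252` codecs stand in for the two converters being compared, and `decode_byte` and `failed_codepoints` are hypothetical helper names.

```python
def decode_byte(codec, byte):
    """Decode a single byte with codec; return None when the byte is unmapped."""
    try:
        return bytes([byte]).decode(codec)
    except UnicodeDecodeError:
        return None

def failed_codepoints(reference, candidate):
    """List the byte values on which the two single-byte codecs disagree."""
    return [b for b in range(256)
            if decode_byte(reference, b) != decode_byte(candidate, b)]

# Stand-in pair: the two codecs differ only in the 0x80-0x9F range.
failures = failed_codepoints("latin-1", "cp1252")
print(f"{len(failures)} codepoint(s) failed")  # 32 codepoint(s) failed
```

A count alone does not say which converter is "right"; listing the differing byte values is what makes the mismatches reviewable.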

Rob Marshall [tH]

Comment 12

•

19 years ago

A build with --enable-native-uconv doesn't get a scriptable unicode converter, which is pretty essential for various code. Lack of it completely breaks ChatZilla and Venkman, for example (bug 327835, bug 327827).

Masatoshi Kimura [:emk]

Comment 13

•

19 years ago

(In reply to comment #11)
> Shift-JIS 3 codepoint(s) failed
> windows-936 5 codepoint(s) failed

Native uconv randomly fails when converting multibyte charsets because MultiByteToWideChar can't hold state between calls. At the least, you will have to use IMultiLanguage. (I don't know whether WinCE supports IMultiLanguage.)
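The state problem can be sketched in Python. This is an illustration only: Python's codecs stand in for the converters, nothing here calls MultiByteToWideChar, and the function names are hypothetical. When a network buffer boundary falls in the middle of a two-byte Shift_JIS character, a converter that treats each buffer independently garbles the text, while one that carries the pending lead byte over to the next call does not.

```python
import codecs

def decode_stateless(chunks, encoding):
    """Decode each chunk independently, as a converter with no carried state would."""
    return "".join(chunk.decode(encoding, errors="replace") for chunk in chunks)

def decode_stateful(chunks, encoding):
    """Decode with an incremental decoder that buffers a trailing partial character."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush anything still pending
    return "".join(parts)

text = "日本語"
data = text.encode("shift_jis")   # three 2-byte characters
chunks = [data[:3], data[3:]]     # the split lands inside the second character

print(decode_stateful(chunks, "shift_jis") == text)   # True
print(decode_stateless(chunks, "shift_jis") == text)  # False
```

Whether a failure shows up depends on where the buffer boundaries happen to fall, which would be consistent with the failures looking random.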

Jungshik Shin

Comment 14

•

19 years ago

(In reply to comment #13)
> At least, you will have to use IMultiLanguage. (I don't know whether WinCE
> supports IMultiLanguage)

That's not a good idea either, if it means we have to rely on the presence of MS IE. It seems like the trunk build already does (it uses Mlang), but IMHO we should try to get rid of that dependency.

Benjamin Smedberg

Comment 15

•

19 years ago

On trunk the minimum system requirements are Win2k, so we can rely on any components which are shipped with that.

Doug Turner (:dougt)

Reporter

Comment 16

•

19 years ago

Ideally, this code would only depend on things that Windows CE 4.2 has, so that I don't get left implementing something on my own.

Masatoshi Kimura [:emk]

Comment 17

•

19 years ago

IMultiLanguage requires Windows CE .NET 4.0 or later, per MSDN:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wceielng/html/cerefimultilanguageiunknown.asp

Nonetheless, if we don't use IMultiLanguage, I propose WONTFIXing this bug.

1. Both intl owners oppose using native uconv.
2. The code size savings will be small unless we use native uconv for multibyte charsets (namely GBK, UHC, and Shift_JIS).

Jungshik Shin

Comment 18

•

19 years ago

(In reply to comment #15)
> On trunk the minimum system requirements are Win2k, so we can rely on any
> components which are shipped with that.

What I was implying was that if we rely on MS IE's presence (in this case 'Mlang'), the whole point of developing Firefox is sort of moot. To me, our depending on Mlang looks like MS IE depending on our intl library.

Doug Turner (:dougt)

Reporter

Comment 19

•

19 years ago

If you mark this WONTFIX, please open a new bug to address my comment #6.

Simon Montagu :smontagu

Comment 20

•

19 years ago

I opened bug 336553.

Doug Turner (:dougt)

Reporter

Comment 21

•

18 years ago

mass reassigning to nobody.

Assignee: dougt → nobody

Phil Ringnalda (:philor)

Updated

•

16 years ago

QA Contact: amyy → i18n

Simon Montagu :smontagu

Updated

•

14 years ago

Depends on: 644801

Simon Montagu :smontagu

Comment 22

•

14 years ago

Native uconv is gone.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WONTFIX

Categories: Core :: Internationalization, defect

People: Reporter: dougt, Unassigned
