I wrote a native Unicode converter for Windows CE which also works on Win32. This allows us to remove a lot of code+data from Firefox (on ARM I saved 0.67 MB). This might be something we could enable on the trunk since we have dropped Windows 98 support. To create a build, just add this line to your mozconfig: ac_add_options --enable-native-uconv The code for this converter lives here: http://lxr.mozilla.org/mozilla/source/intl/uconv/native/nsWinCEUConvService.cpp
This is what I sent to Doug in response to his email seeking my opinion (with emphasis added around 'consistently'): I'm not sure what you had in mind when you wrote 'This might be something we could enable on the trunk since we have dropped Windows 98 support.' When we interact with the OS, we rely on OS APIs (nsNativeCharsetUtils.cpp), but we need our own converters (unless there's an acute need to save memory footprint, as in Minimo) when dealing with web pages/forms/mail messages that come from outside and that we send out to the wild, because our converters do more than the OS APIs can do, and do some things a little differently. They also enable us to handle all sorts of character encodings **consistently** across platforms. In short, I don't see any connection between dropping support for Win98 and using the native uconv. Please 'enlighten' me if I'm missing anything and I'd be glad to stand corrected.
(In reply to comment #1) > In short, I don't see any connection between dropping support for > Win98 and using the native uconv. Ok. I can see some connection in that WideCharToMultiByte and MultiByteToWideChar on Win2k or later support a lot more encodings than on Win 9x/ME. Still, I don't like some of their converters and I prefer ours to theirs. Moreover, I'm loath to give up the consistency across platforms. Nor do I want to let go of our control over how the incoming data stream is interpreted and how outgoing data is encoded.
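The "do things a little differently" point is concrete: two converters for the "same" encoding can map identical bytes to different code points. A small Python sketch, using Python's standard `shift_jis` and `cp932` codecs as stand-ins for a JIS-style converter versus Microsoft's variant (the specific mappings are Python's, not Mozilla's or Windows'):

```python
# Two Shift_JIS-style converters disagreeing on the same byte pair:
# Python's "shift_jis" codec follows the JIS mapping, while "cp932"
# follows Microsoft's variant. A browser that delegates to the OS
# converter inherits whichever mapping the OS ships.
data = b"\x81\x60"  # the classic "wave dash" byte pair

jis_result = data.decode("shift_jis")  # U+301C WAVE DASH
ms_result = data.decode("cp932")       # U+FF5E FULLWIDTH TILDE

print(hex(ord(jis_result)), hex(ord(ms_result)))
```

The same page round-trips differently depending on which mapping is used, which is exactly the cross-platform consistency concern raised above.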
thanks for your response. are there any encoding/decoding tests that we can run to see if there are significant differences between the native uconv and the mozilla converters?
(In reply to comment #3) > are there any encoding/decoding tests that we can > run to see if there are significant differences between the native uconv and > the mozilla converters? http://smontagu.damowmow.com/encodingtest.html
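The linked page exercises round-trip conversion per encoding. The core idea can be sketched in a few lines of Python (a simplified stand-in for the actual test page, which compares against reference mapping tables rather than just round-tripping):

```python
# Minimal round-trip test in the spirit of encodingtest.html: for every
# byte value in a single-byte encoding, decode it and encode the result
# back, counting code points that do not survive the trip. Running the
# same check through two converter implementations shows where they
# disagree.
def roundtrip_failures(encoding):
    failures = 0
    for byte in range(256):
        raw = bytes([byte])
        try:
            char = raw.decode(encoding)
            if char.encode(encoding) != raw:
                failures += 1
        except UnicodeError:
            # Unassigned byte values are skipped, not counted as failures.
            pass
    return failures

print(roundtrip_failures("iso-8859-3"))
```

A self-consistent converter scores zero here; the interesting numbers come from feeding one converter's output back through a different one.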
I don't like using native uconv either. If the native converters aren't compatible across versions of Windows (including future releases), we won't be happy...
maybe this isn't so much "I am demanding that we use native uconv", but rather "is there a way to reduce code+data bloat in the uconv code?" your help and ideas in making improvements in this area are important. What can we do?
By the way, does MultiByteToWideChar emit UTF-16 or UCS-2? Testcase: http://www.i18nguy.com/unicode-plane1-utf8.html
MultiByteToWideChar emits UTF-16 from Windows 2000 upward. It might break on 9x, but it would be very hard to display non-BMP characters on those anyway. 670 KB just for encodings is really a lot. It might be worth thinking of a way to do better on this point.
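For reference, the UTF-16 vs. UCS-2 distinction only matters for non-BMP characters like those on the plane-1 test page: a non-BMP code point takes two 16-bit units (a surrogate pair), which a UCS-2-only converter cannot produce. A quick Python sketch (the sample character is my choice, not from the test page):

```python
# A plane-1 (non-BMP) code point needs a surrogate pair in UTF-16,
# i.e. two 16-bit code units. UCS-2 has no representation for it.
ch = "\U00010338"  # GOTHIC LETTER THIUTH, U+10338

utf16 = ch.encode("utf-16-be")
units = [utf16[i] << 8 | utf16[i + 1] for i in range(0, len(utf16), 2)]

# Two units: a high surrogate (0xD800-0xDBFF) then a low one (0xDC00-0xDFFF).
print([hex(u) for u in units])
```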
dougt: what sort of performance improvement do you see from this change? is it just a codesize savings? does that translate to performance in this case? i tend to agree with jshin+masayuki+smontagu. i18n consistency across ff builds is important, so this change sounds risky.
i have not measured perf. I will post an engineering build shortly. Also, I am not advocating breaking consistency for the sake of it. See comment #6.
Not sure what this means. Running a build with native uconv against the tests yields these failures:

iso-8859-3 28 codepoint(s) failed
iso-8859-6 48 codepoint(s) failed
iso-8859-7 5 codepoint(s) failed
iso-8859-8 32 codepoint(s) failed
iso-8859-10 46 codepoint(s) failed
iso-8859-11 87 codepoint(s) failed
iso-8859-13 56 codepoint(s) failed
iso-8859-14 32 codepoint(s) failed
iso-8859-16 40 codepoint(s) failed
Shift-JIS 3 codepoint(s) failed
windows-936 5 codepoint(s) failed

I didn't run anything past the Windows-949 Korean testcase.
A build with --enable-native-uconv doesn't get a scriptable unicode converter, which is pretty essential for various code. Lack of it completely breaks ChatZilla and Venkman, for example (bug 327835, bug 327827).
(In reply to comment #11) > Shift-JIS 3 codepoint(s) failed > windows-936 5 codepoint(s) failed Native uconv randomly fails when converting multibyte charsets because MultiByteToWideChar can't hold state across calls. At the least, you would have to use IMultiLanguage. (I don't know whether WinCE supports IMultiLanguage.)
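The statefulness problem is easy to reproduce outside Windows. A Python sketch (using Python's Shift_JIS codec as a stand-in for the native converter) of what happens when a multibyte character is split across two buffers:

```python
# A multibyte character split across two buffers cannot be decoded by a
# converter that processes each buffer independently (as a bare
# MultiByteToWideChar-style call does), but a stateful incremental
# decoder carries the pending lead byte into the next call.
import codecs

text = "漢字"
data = text.encode("shift_jis")      # 4 bytes, 2 per character
first, second = data[:3], data[3:]   # split mid-character

# Stateless per-chunk decoding chokes on the dangling lead byte.
try:
    first.decode("shift_jis")
    stateless_ok = True
except UnicodeDecodeError:
    stateless_ok = False

# An incremental decoder holds the pending byte between calls.
dec = codecs.getincrementaldecoder("shift_jis")()
result = dec.decode(first) + dec.decode(second, final=True)

print(stateless_ok, result == text)  # False True
```

Streaming converters hit exactly this case, since network data arrives in arbitrarily sized chunks.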
(In reply to comment #13) > At least, you will have to use IMultiLanguage. (I don't know whether WinCE > supports IMultiLanguage) That's not a good idea either, if it means we have to rely on the presence of MS IE. It seems the trunk build already does (it uses MLang), but IMHO we should try to get rid of that dependency.
On trunk the minimum system requirements are Win2k, so we can rely on any components which are shipped with that.
ideally, this code would only depend on stuff that Windows CE 4.2 has, so that I'm not left implementing something on my own.
IMultiLanguage requires Windows CE .NET 4.0 and later per MSDN: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wceielng/html/cerefimultilanguageiunknown.asp Nonetheless, if we don't use IMultiLanguage, I propose WONTFIXing this bug. 1. Both intl owners oppose using native uconv. 2. The code-size savings will be small unless we use native uconv for multibyte charsets (namely GBK, UHC, and Shift_JIS).
(In reply to comment #15) > On trunk the minimum system requirements are Win2k, so we can rely on any > components which are shipped with that. What I was implying was that if we rely on the presence of MS IE (in this case MLang), the whole point of developing Firefox becomes somewhat moot. To me, our depending on MLang looks like MS IE depending on our intl library.
if you mark this WONTFIX, please open a new bug to address my comment #6.
I opened bug 336553.
mass reassigning to nobody.
Native uconv is gone.