Closed Bug 206811 Opened 21 years ago Closed 21 years ago

xp_iconv in nsNativeCharsetUtils.cpp should use UTF-16 if available

Categories: Core :: XPCOM, defect
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla1.4final
People

(Reporter: jshin1987, Assigned: jshin1987)

References

Details

(Keywords: intl)

Attachments

(2 files, 5 obsolete files)

xp_iconv in nsNativeCharsetUtils.cpp uses UCS-2, but a PRUnichar array is not a
string of UCS-2 code points but a UTF-16 string. [1] Therefore, we should try
UTF-16 before trying UCS-2.
At least glibc, Solaris, HP/UX 10/11 and Tru64 [2] use 'UTF-16' for the
native-endian UTF-16. I haven't yet figured out what AIX and IRIX do about it. [3]

[1] 
http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsNativeCharsetUtils.cpp#177
[2]
http://docs.sun.com/db/doc/806-5584/6jej8rb0u?a=view
http://h30097.www3.hp.com/docs/porting/tru64-to-hpux/CHP18NXX.HTM

[3]http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixbman/admnconc/convert.htm
Attached patch a patch (obsolete) — Splinter Review
Comment on attachment 124018 [details] [diff] [review]
a patch

>+static const char *UTF_16_NAMES[] = {
>+    "UTF-16",     
>     "UCS-2",
>     "UCS2",
>     "UCS_2",

this patch makes good sense to me, but do we need to worry about
other variants of UTF-16, such as UTF16, UTF_16, utf-16, utf16,
and utf_16?  i know it almost seems ridiculous, but i can't really
say for sure.

also, how about replacing all instances of "ucs2" with "utf16"
e.g., ucs2_to_isolatin1 -> utf16_to_isolatin1
As for replacing all instances of "ucs2" with "utf16", I'll do that.

As for UTF-16 variants, I'm not sure. It seems like every platform that supports
UTF-16 uses "UTF-16". However, I guess it doesn't hurt to try other variants
because it's done only once at the init, right? 
darin, 
Can you also comment on my posting 
'news://news.mozilla.org:119/bajml3$fvr1@ripley.netscape.com'? I thought I was
ringing a false alarm, but I just realized that Linux does not use iconv(3) but
uses wcrtomb and mbrtowc, assuming that wchar_t holds a UTF-16 string, of which
I'm not sure.
Status: NEW → ASSIGNED
actually, i'm not sure why we'd ever want to prefer wcrtomb/mbrtowc over iconv
on linux.  i know that under OSX, nl_langinfo does not seem to exist, but the
nsNativeCharsetUtils methods aren't invoked under OSX IIRC.
On Linux it is *never* true that wchar_t == UCS2 or UTF-16.  With the
-fshort-wchar option one can make gcc define wchar_t as a two-byte value but
this doesn't magically change the runtime.  In fact, when using this option and the
normal runtime for wide character handling you guarantee certain death for your
program.

Everybody who uses this option must be aware of this.  And s/he must be aware of
the fact that the fake wchar_t type so created violates ISO C.  UTF-16 is not a
valid encoding for wchar_t, and UCS2 is obviously insufficient.

If somebody insists on using -fshort-wchar s/he must make sure to never use any
of the wide character interfaces in the runtime.  It is possible to use iconv()
but even then the pseudo encoding name "WCHAR_T" must not be used.  It is then
necessary to explicitly use "UTF-16" or so.
Darin and I decided to switch over to iconv for Linux after discussing the
issue off-line (newsgroups/mail) as well as here with Ulrich and Bruno.
Attachment #124018 - Attachment is obsolete: true
Before leaving for the vacation, Darin wrote to me the following:
<quote>
jshin: i am going to be away on vacation for the next 2 weeks, in which
case i most likely will not get a chance to make this change in time for
1.4 final.  if you think we should try to make this change for 1.4 final
(sounds to me like we should), then you ought to get in touch with
drivers@mozilla.org to discuss it with them.  be sure to run some
performance tests (start up time is probably going to be most sensitive)
so they can be aware of all the implications of this change.  thanks!
</quote>

I agree with him that we should fix this both in trunk and 1.4f branch because
1.4f branch is likely to be a long-lived one. I measured the start-up time with
and without the patch (attachment 124092 [details] [diff] [review]) and found that the build with the
patch is slightly faster than the one without (0.095 vs 0.097 on a 10-run
average; the std. deviation is virtually zero unless I wait long enough between runs).
Attachment #124092 - Flags: superreview?(alecf)
Attachment #124092 - Flags: review?(ftang)
Comment on attachment 124092 [details] [diff] [review]
a new patch (including switching to iconv under Linux)

looks good. sr=alecf
Attachment #124092 - Flags: superreview?(alecf) → superreview+
Setting the target to 1.4final. Frank, can you review? 
Target Milestone: --- → mozilla1.4final
Comment on attachment 124092 [details] [diff] [review]
a new patch (including switching to iconv under Linux)

asking smontagu for r.
Attachment #124092 - Flags: review?(ftang) → review?(smontagu)
Comment on attachment 124092 [details] [diff] [review]
a new patch (including switching to iconv under Linux)

I can't say I know enough about iconv to review this properly, but since (from
comment 7) it looks like people who do know have signed off on the approach, I
am happy to give r=smontagu
Attachment #124092 - Flags: review?(smontagu) → review+
This appears to have increased codesize by 3.4k on various linux platforms. Was
this expected; are we statically linking the iconv libraries?
> This appears to have increased codesize by 3.4k on various linux platforms.

Thank you for alerting me about the codesize I missed. 

I realize that USE_ICONV has more code than USE_STDCONV. 

 We can cut the increase in half by NOT defining
ENABLE_UTF8_FALLBACK_SUPPORT. glibc's iconv(3) certainly doesn't need it; it's
there for iconv(3) on Solaris, which has to go through UTF-8 for most
combinations of to/from codesets.

Removing 'inline' from xp_iconv_reset helped very little (100 bytes), while
removing 'inline' from xp_iconv_open and xp_iconv didn't change the size at all
(they are too complex to be inlined with gcc 3.x at -O1).

I can squeeze out an additional ~300 bytes by removing unnecessary array elements. The
final numbers are:

 - old       : 898056 bytes
 - current   : 902504 bytes   : +4448
 - w/ patch  : 899528 bytes   : +1472

Attached patch a patch to reduce the size (obsolete) — Splinter Review
This patch doesn't look pretty because it's littered with '#if
!defined(__linux)'. Anyway, with this patch, I cut down the code size increase
(libxpcom.so on ix86) to 1.2kB.
I forgot to mention that I removed 'inline' in xp_iconv_reset, which saved me
only about 190 bytes. Because I removed the function call overhead on Linux in
other places (plain iconv_open() is used in place of xp_iconv_open), I don't
think there's any slow down in the startup. If it's a worry, perhaps we can just
put back inline for 190 byte size increase (on ix86, gcc 3.x, -O1...) 

BTW, I assumed that a Linux binary is always compiled AND run with glibc 2.x or
GNU libiconv, which is very unlikely to break and which enabled me to use
'UTF-16' and 'ISO-8859-1' directly in iconv_open(). Benjamin, I guess you're
concerned with embedded devices. Is my assumption valid there?
jshin, I am not actually an embedder. I was just made very conscious of codesize
issues while working with alecf, and looking at tbox it seemed a small change had
large codesize implications. If that was unexpected, I just wanted to bring it
to your attention. I don't have any particular need/desire for a fix.
But I'd like to see this fix go in if in fact the extra code isn't needed on
linux... however, is there any way to detect at runtime what systems do and
don't need it, as opposed to choosing linux vs. not linux? i.e. is there some
kind of test we can add to configure.in to detect this case?
> is there any way to detect at runtime what systems do and
> don't need it, as opposed to choosing linux vs. not linux? i.e. is 
> there some kind of test we can add to configure.in to detect this 
> case?

There may be. Adding seawood to CC.

We might be able to detect the codeset names for UTF-16, UTF-8, and ISO-8859-1
with some tweaking in configure.in. Then we could reduce the codesize on all
Unix(-like) platforms. When I raised this possibility for another program,
some people had reservations because what's detected at compile time doesn't
always hold at run time: iconv(3) is also available as a standalone libiconv,
and people often install it on non-Linux systems if they're not satisfied with
the stock version.

On Linux (barring possible exceptions on embedded devices unknown to me; adding
dougt for this), the possibility is almost zero because there's no reason to
replace the stock iconv(3) with something else. It just occurred to me that some
embedded (Linux) devices use a lightweight C library other than glibc. I'm not
sure whether it lacks iconv(3) altogether, in which case we don't have to worry
because 'HAVE_ICONV' is set to false. A tricky case is a vanilla iconv(3) that
doesn't support UTF-16.

BTW, Ulrich, is there any known glibc (configuration option) that doesn't
support UTF-16 but supports UCS-2? This question is also relevant to native
uconv (http://lxr.mozilla.org/mozilla/source/intl/uconv/native/).

If all this is regarded as too much headache for about 800 bytes, I think we
can just go with #ifdef'ing out the 'UTF-8 fallback' code for Linux. That'll save us
about 2.4kB (out of the 4.5kB introduced by my check-in) instead of 3.2kB.
I think this checkin caused blocker bug 208809
> BTW, Ulrich, is there any known glibc (configuration option) that doesn't
> support UTF-16 but supports UCS-2?

UCS-2 handling is built in libc, UTF-16 is a module which can be removed.
>If all these are regarded as too much headache for about 800bytes, I think we
>can just go with #ifdef'ing out 'UTF-8 fallback' code for Linux. That'll save us
>about 2.4kB (out of 4.5kB introduced by my check-in) instead of 3.2kB. 

yeah, something like that is probably sufficient.

as for the other tweaks, since you have done them... maybe worth making the
changes, but in general there are bigger fish to fry ;-)

i think ulrich's comment suggests that we should fallback on UCS-2 if no UTF-16
converter exists.
Blocks: 208809
This is the first of two patches. Here I'm explicitly checking whether we need
byte swapping and byte-swapping if necessary. I had to cast away constness,
which can be dangerous. I also disabled UTF8FALLBACK on Linux but had to put
back the dummy conversion for the BOM check, because glibc 2.2.x also puts a BOM
in front of UTF-16 output and it's also necessary to check the endianness of
UTF-16. With this, the size reduction compared with the trunk is 1.3kB.
Here I'm following Ulrich's suggestion in bug 208809. Instead of using UTF-16
(which is 'native' endian on recent glibc but is BE in old glibc regardless of
the native endianness), UTF-16LE/UTF-16BE are used depending on the endianness.
One problem with this is that UTF-16LE/BE are not supported by glibc 2.1.x (at
least they're not in glibc 2.1.3 on RH 6.2 ). As a fallback, I'm using
'UNICODELITTLE' and 'UNICODEBIG' that are present both in glibc 2.1.x and
recent glibc.

The size reduction is about 2.8kB. Although we can't support non-BMP on Linux
with old glibc with this patch, I think this patch is better than attachment
125292 [details] [diff] [review]. There are a couple of reasons. Firstly, I don't feel comfortable casting
away const (I guess it can crash Mozilla). Secondly, it's smaller and faster (no need
to byte-swap). With attachment 125292 [details] [diff] [review], we might be byte-swapping twice, in
glibc and in our code.

BTW, I blocked the dummy conversion to get rid of the BOM on Linux (because we
now explicitly specify the endianness), but that saves us only about 100 bytes.
To be safe, I'd rather keep that code in.
> Instead of using UTF-16
> (which is 'native' endian on recent glibc but is BE in old glibc regardless of
> the native endianness),

That's wrong.  If you use UTF-16 and you provide no BOM big endian is used.  But
it is not correct to use UTF-16 without BOM.  So using the explicit names is
*always* correct, regardless of platform.
> That's wrong.  If you use UTF-16 and you provide no BOM big endian is used. 
But it is not correct to use UTF-16 without BOM.  

  Let's put aside difference in interpreting UTF-16 and focus on how different
versions of glibc behave.  I didn't think about converting from UTF-16. 

1. In recent glibc, when converting to 'UTF-16', a BOM indicating the native
endianness is prepended to the output on the first invocation, and the *native*
endianness is used.

2. When converting from UTF-16, the absence of a BOM is regarded as indicating BE
regardless of the native endianness.

3. In old glibc, no BOM was prepended to the output, and BE (not the native
endianness) is used everywhere when converting to UTF-16.

#1 is what I observed and #3 was inferred from bz's test result in bug 208809. 

As for #2, is that what you meant? My test with glibc 2.2.93 on ix86 tells me a
different story. With or without BOM, "\x20\x21" is interpreted as U+2120 by
iconv when converting _from_ "UTF-16". That is, both '\xff\xfe\x20\x21' and
'\x20\x21' gave me the same result (interpreted as LE). Has there been a change
recently (2.3.x)?  

> So using the explicit names is *always* correct, regardless of 
> platform.

 No doubt about that. As I wrote, one problem with that is that UTF-16LE and
UTF-16BE are not available in old glibc. Anyway, using UNICODELITTLE and
UNICODEBIG as fallback at the cost of losing non-BMP support is reasonable.
Glibc has /usr/include/byteswap.h which has optimized swapping 
functions. You should use them where available.
Comment on attachment 125292 [details] [diff] [review]
a new patch to make it work on RH 6.2

This is neither well tested nor safe (it crashes on bz's RH 6.2). Let's go with the
other one.
Attachment #125292 - Attachment is obsolete: true
> using UNICODELITTLE and UNICODEBIG as fallback at the cost of 
> losing non-BMP support in old glibc is reasonable.

Darin (Simon, and Alec) , what do you think? I think attachment 125293 [details] [diff] [review] (except
for not so pretty '__linux') is a reasonable compromise. If you agree, pls give
r/sr to the patch so that I can remove the blocker (bug 208809).  
I've done some tests (direct and indirect) and found that UTF-16LE and UTF-16BE
work under glibc 2.2.x, Solaris 8 or newer, and Tru64 (Compaq - HP) V5. Under
Tru64 (both V4 and V5), the meaning of UTF-16 depends on the environment
variable ICONV_BYTEORDER, which is yet another reason to use UTF-16LE (because
alpha is LE). By default, 'UTF-16' on Tru64 is interpreted as LE (the platform
endianness). Unfortunately, Tru64 V4 doesn't support 'UTF-16LE', so it'd break
Mozilla if ICONV_BYTEORDER is set to 'big-endian', but there's not much we can
do other than providing a shell script wrapper. Solaris 7 and IRIX 6 (both BE;
MIPS can run as LE as well, but SGI machines are BE) interpret UTF-16 as BE
and emit UTF-16BE for 'UTF-16'.
As mentioned earlier, for glibc 2.1.x, we need to use UNICODELITTLE
and UNICODEBIG because 'UTF-16' is always interpreted as BE by glibc 2.1.x and
UTF-16LE/UTF-16BE are not supported. 

Based on these and the fact that we have used UCS-2 and variants so far (AIX,
HP/UX and other Unix), I think we can use the following for  
UTF_16_NAMES[]

+#if defined(IS_LITTLE_ENDIAN)
+    "UTF-16LE",
+#if defined(__linux)
+    "UNICODELITTLE",
+#endif
+    "UCS-2LE",
+#else
+    "UTF-16BE",
+#if defined(__linux)
+    "UNICODEBIG",
+#endif
+    "UCS-2BE",
+#endif
     "UTF-16",
     "UCS-2",
     "UCS2",
jshin: that sounds reasonable to me.
Attached patch a new patch doing what I wrote (obsolete) — Splinter Review
Darin, thanks. Here's the patch implementing that. Can I get review? Alec and
Simon, please feel free to review :-)

This is basically the same as attachment 125293 [details] [diff] [review] except that UTF16_NAMES[] is
changed as I wrote and I kept the block removing BOM in the first invocation of
iconv(3) for linux (that is, I got rid of '#if !defined(__linux)' around that
code.).
Attachment #124889 - Attachment is obsolete: true
Attachment #125293 - Attachment is obsolete: true
Just in case we need to refer to them later, let me add a note that I used
attachment 125288 [details] (to bug 208809) and attachment 125485 [details] (to bug 209048;
for Solaris, which doesn't like 4 NULLs in iconv, one needs to do what Mozilla does
in nsNativeCharsetUtils.cpp along with the 'const char **' change) to try iconv(3)
on a few platforms.
Comment on attachment 125560 [details] [diff] [review]
a new patch doing what I wrote


>+     * We also use it to determine whether or not 'UTF-16'/'UCS-2' is 
>+     * interpreted as native endian. If it's interpreted as BE on LE,
>+     * we need byte-swapping.

The above comment should have not been included.  I got rid of it in my tree.

I've been waiting for bug 209048 to be resolved. Now that it can be fixed by a
Solaris patch, I'm asking for r/sr.
Attachment #125560 - Flags: superreview?(darin)
Attachment #125560 - Flags: review?(smontagu)
Comment on attachment 125560 [details] [diff] [review]
a new patch doing what I wrote

>Index: xpcom/io/nsNativeCharsetUtils.cpp

>+// just in case... but we  know for sure that iconv(3) in glibc for Linux

nit:		    "but we know for"

>+// doesn't need this.
>+#if !defined(__linux)

is there a GLIBC specified #define that we should test instead?


>-// PRUnichar[] is NOT a UCS-2 array BUT for UTF-16 string. Therefore, we
>-// have to use UTF-16 with iconv(3) on platforms where it's supported.
>-// We could list 'UTF-16' name variants, but all platforms known (to me) to 
>-// support UTF-16 in iconv(3) uses 'UTF-16'. Let me know (jshin) if there's an
>-// exception. (bug 206811)
>+/* 


>+ * PRUnichar[] is NOT a UCS-2 array BUT for a UTF-16 string. Therefore, we

nit: PRUnichar[] is NOT a UCS-2 array BUT a UTF-16 array.

>+ * have to use UTF-16 with iconv(3) on platforms where it's supported.
>+ * However, the way UTF-16 and UCS-2 are  interpreted varies across platforms 

nit: However, the way UTF-16 and UCS-2 are interpreted varies across platforms 


>+ * and  implementations of iconv(3). On Tru64, it also depends on the environment
>+ * variable.  To avoid the trouble arising  from byte-swapping 

nit: variable.	To avoid the trouble arising from byte-swapping 


> static const char *UTF_16_NAMES[] = {
>+#if defined(IS_LITTLE_ENDIAN)
>+    "UTF-16LE",
>+#if defined(__linux)
>+    "UNICODELITTLE",
>+#endif
>+    "UCS-2LE",
...
>     "UTF-16",
>     "UCS-2",

so it's never the case that we want to try "UTF-16" before "UCS-2LE"?


>+     * the first use of the iconv converter. The same is the case 
>+     * case of glibc 2.2.9x and Tru64 V5 (see bug 208809)

nit: "case case"


>+     * when 'UTF-16' is used. However, we use 'UTF-16LE/BE' in both
>+     * cases, instead  so that we should be safe. But just in case...

nit:   * cases, instead so that we should be safe. But just in case...

>      * This dummy conversion gets rid of the BOMs and fixes bugid 153562.

nit: s/bugid/bug/
Attachment #125560 - Flags: superreview?(darin) → superreview-
> so it's never the case that we want to try "UTF-16" before "UCS-2LE"?

I guess what's in 125560 is not 100% correct, but safer than the following.    

--------------
static const char *UTF_16_NAMES[] = {
#if defined(IS_LITTLE_ENDIAN)
    "UTF-16LE",
#if defined(__linux)
    "UNICODELITTLE",
#endif
#else
    "UTF-16BE",
#if defined(__linux)  // or if defined(__GLIBC__)
    "UNICODEBIG",
#endif
#endif
    "UTF-16",
#if defined(IS_LITTLE_ENDIAN)
    "UCS-2LE",
#else
    "UCS-2BE",
#endif
    "UCS-2",
---------------

In reality, I think there's no difference unless there's an iconv implementation
that works the way glibc 2.1.x does. That is, we're in trouble if the
implementation interprets UTF-16 as UTF-16BE on a LE machine  but does not
recognize 'UTF-16LE'. Do you know of any implementation that does? 

Anyway, which one do you like?  

>>+#if !defined(__linux)

>is there a GLIBC specified #define that we should test instead?

  Perhaps '__GLIBC__' or '__GNU_LIBRARY__' (deprecated)? 
>In reality, I think there's no difference unless there's an iconv 
>implementation that works the way glibc 2.1.x does. That is, we're in trouble 
>if the implementation interprets UTF-16 as UTF-16BE on a LE machine  but does 
>not recognize 'UTF-16LE'. Do you know of any implementation that does? 

no, i don't know of any such implementations...


>Anyway, which one do you like?  

perhaps your first way was better since it'd be safer... i defer to your best
judgement.  we can of course always fix it if we discover a problem on some
platform ;-)


>>>+#if !defined(__linux)
>
>>is there a GLIBC specified #define that we should test instead?
>
>  Perhaps '__GLIBC__' or '__GNU_LIBRARY__' (deprecated)? 

yeah, let's use __GLIBC__ instead since it's possible someone on another OS
might be using glibc as well.
I decided to do the same as in attachment 125560 [details] [diff] [review]. So in terms of the actual
code, this is the same as attachment 125560 [details] [diff] [review] except that __GLIBC__ is used
instead of __linux.
Attachment #125560 - Attachment is obsolete: true
Attachment #125560 - Flags: review?(smontagu)
Attachment #127768 - Flags: superreview?(darin)
Attachment #127768 - Flags: review?(smontagu)
Comment on attachment 127768 [details] [diff] [review]
a new patch with nits taken care of and  __linux replaced by __GLIBC__

sr=darin
Attachment #127768 - Flags: superreview?(darin) → superreview+
Ping :-) Simon, can I get r? Or shall I ask someone else? 
Probably better ask someone else since, as I said before in comment 12, I really
know almost nothing about iconv.
Comment on attachment 127768 [details] [diff] [review]
a new patch with nits taken care of and  __linux replaced by __GLIBC__

Ulrich, can you review? You're the authority on iconv :-)
Attachment #127768 - Flags: review?(smontagu) → review?(drepper)
Comment on attachment 127768 [details] [diff] [review]
a new patch with nits taken care of and  __linux replaced by __GLIBC__

The patch looks fine.  glibc will never use the BOM if the endianness is
explicitly requested but it probably doesn't hurt.

If the style is OK I see no problems.
r=drepper
Attachment #127768 - Flags: review?(drepper) → review+
Comment on attachment 127768 [details] [diff] [review]
a new patch with nits taken care of and  __linux replaced by __GLIBC__

Fix checked into the trunk. Thank you all.

Now I'm asking for a1.4 as  Darin and I discussed (see comment #8). 

This patch (along with attachment 124092 [details] [diff] [review]) makes nsNativeCharsetUtils on
Unix (including Linux) UTF-16-safe. Some problems in attachment 124092 [details] [diff] [review] have
been identified and addressed in this patch, so I believe the risk is
pretty low. Linux with glibc 2.2.x or later and with old glibc 2.1.x were both
tested. Solaris 8/9 is fine (Solaris 7 needs either an iconv patch from Sun or
GNU iconv, which was already release-noted). Other Unix should be all right because
nsNativeCharsetUtils.cpp (with or without this patch) has a few 'defense
measures' against various possibilities. I also tested the 'iconv(3)' APIs on Tru64
and HP/UX 11 and they work as assumed here.

Considering that 1.4 will be a long-lived release (like 1.0), I guess we can't just
assume that nobody will use non-BMP characters in filenames and other places.

On Linux, there was a little worry about the start-up time, but there's
virtually no difference (or slightly faster with this patch than without).

Given all this, I think it's safe to land this, combined with attachment 124092 [details] [diff] [review],
on the 1.4 branch.
Attachment #127768 - Flags: approval1.4.x?
Comment on attachment 127768 [details] [diff] [review]
a new patch with nits taken care of and  __linux replaced by __GLIBC__

We don't have many testing resources and we're short on time. This doesn't
appear to be a high-profile problem for users so it's not 1.4 material.
Attachment #127768 - Flags: approval1.4.x? → approval1.4.x-
sorry it's late. marking as fixed. 
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED