Closed
Bug 208809
Opened 21 years ago
Closed 21 years ago
Filenames in XP filepicker are garbage
Categories
(SeaMonkey :: UI Design, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: jshin1987)
References
Details
(Keywords: regression)
Attachments
(2 files)
PRESENT IN: Build pulled at "Sun Jun 8 19:20:30 CDT 2003"
NOT PRESENT IN: Build pulled at "Tue May 27 10:57:48 CDT 2003"

STEPS TO REPRODUCE:
1) Get a Linux build
2) Go to "File" menu
3) Select "Open File"
4) Try to read filenames in the filepicker

EXPECTED RESULTS: filenames readable (all filenames involved are US-ascii; locale is "POSIX")

ACTUAL RESULTS: filenames a mixture of Chinese characters and question marks

ADDITIONAL INFORMATION: When the filepicker is accessed via the "Browse" button of a file input, the same display problem occurs. If I type in a filename in the filepicker and hit enter, a garbage filename is displayed in the file input's textbox. If I then upload, the correct file is uploaded, but a garbage filename is sent along with it.

Looking at checkins in this date range, the one that looks most likely to be responsible is the one for bug 206811 (since it also switched to iconv, as far as I can tell). I attempted to back out the nsNativeCharsetUtils patch that was checked in for that bug, but after that Mozilla could not even find the profile directory, hence would not start.

Let me know if there is any useful information I could provide.
Comment 2•21 years ago
WFM in Linux nightly 1.4 2003060807 (Mdk 9.0/KDE 3)
Comment 3•21 years ago
bz: you still on redhat 6.2? ;-) what version of glibc are you running?
Assignee
Comment 4•21 years ago
I don't see the problem, either. If I had, I'd not have landed my patch ;-) Can you try 'ldd --version' or 'strings /usr/lib/libc.a |grep GNU'?
Reporter
Comment 5•21 years ago
Yes, I am still on RedHat 6.2. More info on all that stuff:

~% ldd --version
ldd (GNU libc) 2.1.3
~% rpm -q glibc
glibc-2.1.3-21

Configured build with: --disable-tests '--enable-optimize=-march=pentiumpro -O2 -pipe' --enable-extensions=all --enable-strip-libs --enable-svg --enable-mathml --enable-crypto --enable-jsd --disable-debug --disable-dtd-debug

Built with egcs 2.91.66

I could be wrong about iconv being the culprit, but if other people are not seeing it, chances are that this is indeed a problem with my version of iconv. Anything else I should check on?
Comment 6•21 years ago
how about the output of the command "iconv --list" ;-)
Reporter
Comment 7•21 years ago
Assignee
Comment 8•21 years ago
It's odd. glibc 2.1.3 should be good enough. (Google turned up a thread of postings by Drepper and H.J. Lu on 'glibc redhat 6.2 iconv'.) Darin, is there any known problem with iconv(3) in glibc 2.1.3 as shipped with RH 6.2? At one point, I'm sure I had glibc 2.1.3 on my system, but never had RH 6.2.

bz, can you try the following?

$ LC_ALL=C iconv -t UTF-16 | hexdump
abcdef
[ctrl-d]
$ LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump
abcdef
[ctrl-d]

Is there any difference? Besides, using 'UTF-16LE' in place of 'UTF-16' should not make any difference on ix86. Does it give you a different result?

How about this program? After compiling it to, say, codeset, try 'LC_ALL=C ./codeset' and './codeset'. The output may be something like 'ANSI...196x' or 'ISO646....'. What's the result if you use that value for the '-f' option of iconv in the above test ('iconv -f ANSI... -t UTF-16 | hexdump')?

-----------------
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
    char *codeset;

    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
    setlocale(LC_ALL, "");
    codeset = nl_langinfo(CODESET);
    printf("LC_CTYPE = %s\n", setlocale(LC_CTYPE, NULL));
    printf("LC_CTYPE Codeset = %s\n", codeset);
    return 0;
}
---------------

BTW, as for removing the patch, IIRC, just rebuilding xpcom was not enough. You have to rebuild necko (or something else?) as well.
Reporter
Comment 9•21 years ago
Output of those commands:

~% env LC_ALL=C iconv -t UTF-16 | hexdump
iconv: original encoding not specified using `-f'
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00
000000e
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16LE | hexdump
iconv: conversion from `ISO-8859-1' to `UTF-16LE' not supported
~% ./codeset
LC_CTYPE = C
LC_CTYPE Codeset = ANSI_X3.4-1968
~% env LC_ALL=C iconv -f ANSI_X3.4-1968 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00
000000e

(Running 'codeset' with an explicit LC_ALL=C gave exactly the same output there.)

As for the nsNativeCharsetUtils.cpp patch, I ran 'cvs up -r 1.13 xpcom/io/nsNativeCharsetUtils.cpp' then did a dep rebuild from toplevel. Mozilla refused to start, claiming inability to find the profile directory. I suppose I could do a clobber build if that's needed...
Assignee
Comment 10•21 years ago
Oh, my gosh.. 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little endian machine. I thought 'UTF-16' / 'UCS-2' / 'UCS-4' in glibc had always meant native endian. That explains the Chinese characters you've seen, because ASCII characters in byte-swapped UTF-16 are Chinese characters (U+0061 -> U+6100). Just to make sure, can you run hexdump with the '-b' option? How about 'iconv -f ISO-8859-1 -t UCS-2 | hexdump -b'?

Hmm..... this is a headache. We have to check the endianness of 'UTF-16' at *run-time* and have to byte-swap if necessary. Then, it'll increase the code size...

Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in Solaris mean 'native endian'? Has it changed over time?
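The byte-swap explanation above can be checked with a few lines of C. This is only a minimal sketch: the helper names `swap16` and `is_cjk_unified` are hypothetical and do not come from the Mozilla tree, and the range test covers just the CJK Unified Ideographs block (U+4E00..U+9FFF).

```c
#include <stdint.h>

/* Swap the two bytes of one UTF-16 code unit (hypothetical helper). */
uint16_t swap16(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}

/* Nonzero if cp lies in the CJK Unified Ideographs block, U+4E00..U+9FFF. */
int is_cjk_unified(uint16_t cp)
{
    return cp >= 0x4E00 && cp <= 0x9FFF;
}
```

With these helpers, `swap16(0x0061)` yields 0x6100, and every lowercase ASCII letter (0x61..0x7A) lands inside that block after swapping, which is consistent with the filenames being displayed mostly as Chinese characters.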
Comment 11•21 years ago
> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
> endian machine.
This must be the case since UTF-16 has a fixed meaning.
But it's easy to handle. Include <endian.h> and then use
#if BYTE_ORDER == LITTLE_ENDIAN
"UTF-16LE"
#else
"UTF-16BE"
#endif
Assignee
Comment 12•21 years ago
>> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
>> endian machine.
> This must be the case since UTF-16 has a fixed meaning.

I'm afraid this interpretation is not in agreement with what people on the UTC think. Moreover, how come using 'UTF-16' works on ix86 with glibc 2.2.x if it's always BE in glibc?

> #if BYTE_ORDER == LITTLE_ENDIAN
> "UTF-16LE"
> #else
> "UTF-16BE"

Does iconv in glibc 2.1.x have UTF-16BE/LE? According to bz's test result (comment #9) and attachment 125266, it doesn't appear to. What are 'UNICODELITTLE' and 'UNICODEBIG'? Maybe they're either UCS4 or UCS2.
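Whether a particular iconv knows a name such as 'UTF-16LE' can be probed at run time instead of being assumed. The following is a minimal sketch, assuming a POSIX iconv(3); the helper name `first_supported_target` is hypothetical, and nothing here is taken from the patches in bug 206811.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Return the first name in the NULL-terminated list `candidates` that
 * iconv_open() accepts as a conversion target from `from`, or NULL if
 * none of them is supported. (Hypothetical helper for illustration.)
 */
const char *first_supported_target(const char *const *candidates,
                                   const char *from)
{
    for (size_t i = 0; candidates[i] != NULL; i++) {
        iconv_t cd = iconv_open(candidates[i], from);
        if (cd != (iconv_t)-1) {
            iconv_close(cd);   /* only probing, so release it again */
            return candidates[i];
        }
    }
    return NULL;
}
```

A caller could pass a list such as "UTF-16LE", "UTF-16" and treat a fallback to plain "UTF-16" as a sign that the byte order of the output still has to be checked separately. That candidate list is illustrative only.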
Comment 13•21 years ago
Hi Jungshik,
> Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in
> Solaris mean 'native endian' ? Has it changed over time?
yes, it's Little Endian.
Reporter
Comment 14•21 years ago
~% iconv -f ISO-8859-1 -t UTF-16 | hexdump -b
abcdef
0000000 000 141 000 142 000 143 000 144 000 145 000 146 000 012
000000e
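The '-b' dump lists individual bytes in octal, so it shows the stream order directly: 000 then 141 (0x61), i.e. high byte first. Plain hexdump groups bytes into two-byte words in host order, which is why the same data appeared as 6100 on ix86. A minimal sketch of the two ways to read such a byte pair follows; the helper names `read_u16_be` and `read_u16_le` are hypothetical.

```c
#include <stdint.h>

/* Read one UTF-16 code unit from two bytes, high byte first. */
uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read one UTF-16 code unit from two bytes, low byte first. */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)((p[1] << 8) | p[0]);
}
```

For the first pair in the dump, reading big-endian gives U+0061 ('a'), while reading little-endian gives U+6100.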
Assignee
Comment 15•21 years ago
Thank you for the answer and testing, Boris, Masaki and Ulrich. On RH 8.0 (with glibc 2.2.93, redhat version 2.2.93-5), I got the following result from this test program:

[jungshik@bach jungshik]$ rpm -q glibc
glibc-2.2.93-5
[jungshik@bach jungshik]$ ./iconv_test UCS-2
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2LE
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16
out[0]=feff, outlen=0
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16LE
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2

UCS-2 gives me native endian (LE on ix86) without a BOM. UTF-16 prepends a BOM to the output on the first invocation and is also native endian (LE on ix86), which is not the case in glibc 2.1.3. The same is true of UTF-32 and UCS-4.

Unless we can stop supporting Linux with glibc 2.1.x (which I think we cannot), I'm afraid we can't help checking the endianness at run-time and byte-swapping if necessary. A patch is coming up.
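The values printed above suggest how a run-time check could work: convert a single space and look at the first 16-bit unit as read in native order. The sketch below only classifies that unit; the enum and function names are hypothetical, the iconv call that would produce the unit is left out, and this is not the code that landed for bug 206811.

```c
#include <stdint.h>

/* What the first 16-bit unit of a converted " " tells us (hypothetical). */
enum probe_result {
    PROBE_NATIVE,       /* 0x0020: native byte order, no BOM      */
    PROBE_SWAPPED,      /* 0x2000: opposite byte order, no BOM    */
    PROBE_BOM_NATIVE,   /* 0xFEFF: BOM, then native byte order    */
    PROBE_BOM_SWAPPED,  /* 0xFFFE: BOM, then opposite byte order  */
    PROBE_UNKNOWN       /* anything else                          */
};

enum probe_result classify_probe(uint16_t first_unit)
{
    switch (first_unit) {
    case 0x0020: return PROBE_NATIVE;
    case 0x2000: return PROBE_SWAPPED;
    case 0xFEFF: return PROBE_BOM_NATIVE;
    case 0xFFFE: return PROBE_BOM_SWAPPED;
    default:     return PROBE_UNKNOWN;
    }
}
```

Under this scheme, the 'out[0]=2000' lines above would classify as swapped and the 'out[0]=feff' line as a native-order BOM, so only the swapped cases would need a byte-swap pass.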
Assignee
Comment 16•21 years ago
I'll upload a couple of patches to bug 206811 because depending on which one we choose, the binary size changes by about 1.5 - 2kB.
Status: NEW → ASSIGNED
Depends on: 206811
Assignee
Comment 17•21 years ago
bz, can you try the two patches I uploaded to bug 206811?
Comment 18•21 years ago
RFC2781 says:

4.3 Interpreting text labelled as UTF-16

   Text labelled with the "UTF-16" charset might be serialized in either big-endian or little-endian order. If the first two octets of the text is 0xFE followed by 0xFF, then the text can be interpreted as being big-endian. If the first two octets of the text is 0xFF followed by 0xFE, then the text can be interpreted as being little-endian. If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian.

so the rh6 behavior seems to be correct.
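The rule quoted from section 4.3 is small enough to write down directly. This is a minimal sketch of that rule only, with hypothetical names; it covers how a reader interprets text labelled "UTF-16" and does not settle what byte order an encoder should produce, which is the point disputed in comment 12.

```c
#include <stddef.h>

enum utf16_order { UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN };

/*
 * Decide the byte order of text labelled "UTF-16" from its first two
 * octets, following the rule quoted from RFC 2781 section 4.3.
 */
enum utf16_order utf16_order_rfc2781(const unsigned char *text, size_t len)
{
    if (len >= 2 && text[0] == 0xFF && text[1] == 0xFE)
        return UTF16_LITTLE_ENDIAN;
    /* FE FF, or no BOM at all: interpret as big-endian. */
    return UTF16_BIG_ENDIAN;
}
```

Applied to the BOM-less output in comment 9 (00 61 00 62 ...), this rule selects big-endian, which is how that stream has to be read to recover "abcdef".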
Reporter
Comment 19•21 years ago
Tried the two patches in bug 206811. The first one crashes Mozilla on startup:

(gdb) frame
#0 byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704) at /home/bzbarsky/mozilla/profile/mozilla/xpcom/io/nsNativeCharsetUtils.cpp:191
191 tmp = in[len];
(gdb) p len
$3 = 4294897831

With the second one, I apply it, rebuild xpcom, and get an alert that "the directory containing the profile cannot be found" (just like I did when I tried backing the first bug 206811 patch out). I can create a new profile, however, and the filepicker works fine with it.... Then if I back the patch out, start with the new profile, then reapply the patch, I can no longer use that profile. Looks like running with a broken build corrupts the profile registry or something like that. :(
Comment 20•21 years ago
> byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704)
^^^^^^^^
0xffbf58ef = byteswap(0xbfffef58), I think.
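The arithmetic behind that observation: swapping the two bytes inside each 16-bit half of 0xbfffef58 gives exactly the out-of-bounds address gdb printed. A minimal sketch, with a hypothetical helper name (the actual byteswap routine from the patch is not shown in this thread):

```c
#include <stdint.h>

/* Swap the bytes within each 16-bit half of a 32-bit value. */
uint32_t swap_bytes_in_halves(uint32_t v)
{
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}
```

Here `swap_bytes_in_halves(0xbfffef58)` evaluates to 0xffbf58ef, which fits the guess that the swap loop ran over memory holding a pointer rather than over UTF-16 text.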
Assignee
Comment 21•21 years ago
> Looks like running with a broken build corrupts the profile
> registry or something like that. :(
I guess that's what's happening. I'm sorry if you lost all your bookmarks, and
such. You might be able to recover it by binary-editing 'appreg' file (I did
that a couple of times in the past and you may have done that, too ....)
Reporter
Comment 22•21 years ago
Jungshik, I plan to install a different OS version sometime in the next two days, so if there is any more testing you want on RedHat 6.2, please let me know ASAP...
Assignee
Comment 23•21 years ago
Boris, thanks for the offer and all your help so far. I guess I have all I need to know about glibc 2.1.x. Just in case, can you try the latest patch to bug 206811 (attachment 125560) under RH 6.2? It's identical (as far as glibc 2.1.x is concerned) to the one right before it, but it doesn't hurt to make sure, I think.
Reporter
Comment 24•21 years ago
Yep, latest patch there seems to work fine (modulo the const warning when compiling as I mentioned in email). Could we work on getting this in ASAP? I'm not too happy with all the corrupt data I'm putting into bugzilla as a result of this bug...
Assignee
Comment 25•21 years ago
The patch for bug 206811 has landed, which should fix this bug with glibc 2.1.x.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Updated•20 years ago
Product: Core → Mozilla Application Suite