Closed Bug 208809 Opened 22 years ago Closed 22 years ago

Filenames in XP filepicker are garbage

Categories

(SeaMonkey :: UI Design, defect)

x86
Linux
defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: jshin1987)

References

Details

(Keywords: regression)

Attachments

(2 files)

PRESENT IN: Build pulled at "Sun Jun 8 19:20:30 CDT 2003"
NOT PRESENT IN: Build pulled at "Tue May 27 10:57:48 CDT 2003"
STEPS TO REPRODUCE:
1) Get a Linux build
2) Go to "File" menu
3) Select "Open File"
4) Try to read filenames in the filepicker
EXPECTED RESULTS: filenames readable (all filenames involved are US-ASCII; locale is "POSIX")
ACTUAL RESULTS: filenames are a mixture of Chinese characters and question marks
ADDITIONAL INFORMATION: When the filepicker is accessed via the "Browse" button of a file input, the same display problem occurs. If I type in a filename in the filepicker and hit enter, a garbage filename is displayed in the file input's textbox. If I then upload, the correct file is uploaded, but a garbage filename is sent along with it. Looking at checkins in this date range, the one that looks most likely to be responsible is the one for bug 206811 (since it also switched to iconv, as far as I can tell). I attempted to back out the nsNativeCharsetUtils patch that was checked in for that bug, but after that Mozilla could not even find the profile directory, hence would not start. Let me know if there is any useful information I could provide.
I don't see this in my 6-07 build nor in my new 6-09 one.
WFM in Linux nightly 1.4 2003060807 (Mdk 9.0/KDE 3)
bz: you still on redhat 6.2? ;-) what version of glibc are you running?
I don't see the problem, either. If I had, I'd not have landed my patch ;-) Can you try 'ldd --version' or 'strings /usr/lib/libc.a |grep GNU'?
Yes, I am still on RedHat 6.2. More info on all that stuff:
~% ldd --version
ldd (GNU libc) 2.1.3
~% rpm -q glibc
glibc-2.1.3-21
Configured build with: --disable-tests '--enable-optimize=-march=pentiumpro -O2 -pipe' --enable-extensions=all --enable-strip-libs --enable-svg --enable-mathml --enable-crypto --enable-jsd --disable-debug --disable-dtd-debug
Built with egcs 2.91.66
I could be wrong about iconv being the culprit, but if other people are not seeing it, chances are that this is indeed a problem with my version of iconv. Anything else I should check on?
how about the output of the command "iconv --list" ;-)
Attached file Output of iconv --list
It's odd. glibc 2.1.3 should be good enough. (Google turned up a thread of postings by Drepper and H.J. Lu on 'glibc redhat 6.2 iconv'.) Darin, is there any known problem with iconv(3) in glibc 2.1.3 as shipped with RH 6.2? At one point, I'm sure I had glibc 2.1.3 on my system, but never had RH 6.2.
bz, can you try the following?
$ LC_ALL=C iconv -t UTF-16 | hexdump
abcdef
[ctrl-d]
$ LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump
abcdef
[ctrl-d]
Is there any difference? Besides, using 'UTF-16LE' in place of 'UTF-16' should not make any difference on ix86. Does it give you a different result?
How about this program? After compiling it to, say, codeset, try 'LC_ALL=C ./codeset' and './codeset'. The output may be something like 'ANSI...196x' or 'ISO646....'. What's the result if you use that value for the '-f' option of iconv in the above test ('iconv -f ANSI... -t UTF-16 | hexdump')?
-----------------
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
    char *codeset;

    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
    setlocale(LC_ALL, "");
    /* Name of the character encoding used by the current locale. */
    codeset = nl_langinfo(CODESET);
    printf("LC_CTYPE = %s\n", setlocale(LC_CTYPE, NULL));
    printf("LC_CTYPE Codeset = %s\n", codeset);
    return 0;
}
---------------
BTW, as for removing the patch, IIRC, just rebuilding xpcom was not enough. You have to rebuild necko (or something else?) as well.
Output of those commands:
~% env LC_ALL=C iconv -t UTF-16 | hexdump
iconv: original encoding not specified using `-f'
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00
000000e
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16LE | hexdump
iconv: conversion from `ISO-8859-1' to `UTF-16LE' not supported
~% ./codeset
LC_CTYPE = C LC_CTYPE Codeset = ANSI_X3.4-1968
~% env LC_ALL=C iconv -f ANSI_X3.4-1968 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00
000000e
(running 'codeset' with an explicit LC_ALL=C gave the same exact output there).
As for the nsNativeCharsetUtils.cpp patch, I ran 'cvs up -r 1.13 xpcom/io/nsNativeCharsetUtils.cpp' then did a dep rebuild from toplevel. Mozilla refused to start, claiming inability to find the profile directory. I suppose I could do a clobber build if that's needed...
Oh, my gosh.. 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little-endian machine. I thought 'UTF-16' / 'UCS-2' / 'UCS-4' in glibc had always meant native endian. That explains the Chinese characters you've seen, because ASCII characters in byte-swapped UTF-16 are Chinese characters (U+0061 -> U+6100). Just to make sure, can you run hexdump with the '-b' option? How about 'iconv -f ISO-8859-1 -t UCS-2 | hexdump -b'?
Hmm..... this is a headache. We have to check the endianness of 'UTF-16' at *run-time* and byte-swap if necessary. That will increase the code size...
Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in Solaris mean 'native endian'? Has it changed over time?
> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
> endian machine.
This must be the case since UTF-16 has a fixed meaning. But it's easy to handle. Include <endian.h> and then use
#if BYTE_ORDER == LITTLE_ENDIAN
"UTF-16LE"
#else
"UTF-16BE"
#endif
>> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
>> endian machine.
> This must be the case since UTF-16 has a fixed meaning.
I'm afraid this interpretation is not in agreement with what people on the UTC think. Moreover, how come using 'UTF-16' works on ix86 with glibc 2.2.x if it's always BE in glibc?
> #if BYTE_ORDER == LITTLE_ENDIAN
> "UTF-16LE"
> #else
> "UTF-16BE"
Does iconv in glibc 2.1.x have UTF-16BE/LE? According to bz's test result (comment #9) and attachment 125266, it doesn't appear to. What are 'UNICODELITTLE' and 'UNICODEBIG'? Maybe they're either UCS4 or UCS2.
Hi Jungshik,
> Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in
> Solaris mean 'native endian' ? Has it changed over time?
Yes, it's little endian.
~% iconv -f ISO-8859-1 -t UTF-16 | hexdump -b
abcdef
0000000 000 141 000 142 000 143 000 144 000 145 000 146 000 012
000000e
Attached file iconv test program
Thank you for the answer and testing, Boris, Masaki and Ulrich. On RH 8.0 (with glibc 2.2.93, redhat version 2.2.93-5), I got the following result from this test program:
[jungshik@bach jungshik]$ rpm -q glibc
glibc-2.2.93-5
[jungshik@bach jungshik]$ ./iconv_test UCS-2
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2LE
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16
out[0]=feff, outlen=0
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16LE
out[0]= 20, outlen=2
out[0]= 20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2
UCS-2 gives me native endian (LE on ix86) without a BOM. UTF-16 prepends a BOM to the output on the first invocation and is also native endian (LE on ix86), which is not the case with glibc 2.1.3. The same is true of UTF-32 and UCS-4. Unless we can stop supporting Linux with glibc 2.1.x (which I think we cannot), I'm afraid we have no choice but to check the endianness at run-time and byte-swap if necessary. A patch is coming up.
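A rough sketch in C of what a run-time check along these lines could look like. This is not the patch that landed in bug 206811; the names (probe_result, classify_probe, swap_units) are hypothetical, and it assumes the probe converts a single space (0x20) and reads the first output code unit as a native 16-bit integer, which matches the 20 / 2000 / feff values in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the first code unit of a converted space tells us. */
enum probe_result {
    PROBE_NATIVE,      /* 0x0020: converter output is native endian  */
    PROBE_SWAPPED,     /* 0x2000: output must be byte-swapped        */
    PROBE_BOM_NATIVE,  /* 0xFEFF: BOM first, native endian           */
    PROBE_BOM_SWAPPED, /* 0xFFFE: BOM first, byte-swapped            */
    PROBE_UNKNOWN
};

/* Classify the first code unit produced by converting a space. */
static enum probe_result classify_probe(uint16_t first_unit)
{
    switch (first_unit) {
    case 0x0020: return PROBE_NATIVE;
    case 0x2000: return PROBE_SWAPPED;
    case 0xFEFF: return PROBE_BOM_NATIVE;
    case 0xFFFE: return PROBE_BOM_SWAPPED;
    default:     return PROBE_UNKNOWN;
    }
}

/* Byte-swap 'count' 16-bit units in place. The count is in units,
   not bytes, and the loop never indexes past count - 1. */
static void swap_units(uint16_t *buf, size_t count)
{
    size_t i;

    assert(buf != NULL || count == 0);
    for (i = 0; i < count; i++)
        buf[i] = (uint16_t)((buf[i] << 8) | (buf[i] >> 8));
}
```

The probe would run once per converter, and swap_units would only be called when the result is PROBE_SWAPPED or PROBE_BOM_SWAPPED, so the extra cost on systems with native-endian output is a single comparison.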
I'll upload a couple of patches to bug 206811 because depending on which one we choose, the binary size changes by about 1.5 - 2kB.
Status: NEW → ASSIGNED
Depends on: 206811
bz, can you try two patches I uploaded to bug 206811?
RFC2781 says:
4.3 Interpreting text labelled as UTF-16
Text labelled with the "UTF-16" charset might be serialized in either big-endian or little-endian order. If the first two octets of the text is 0xFE followed by 0xFF, then the text can be interpreted as being big-endian. If the first two octets of the text is 0xFF followed by 0xFE, then the text can be interpreted as being little-endian. If the first two octets of the text is not 0xFE followed by 0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be interpreted as being big-endian.
so the rh6 behavior seems to be correct.
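The rule quoted from section 4.3 can be written down as a small C function. This is only a sketch of the decoding rule as quoted; the names utf16_order and utf16_order_from_bom are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum utf16_order { UTF16_BE, UTF16_LE };

/* Decide the byte order of text labelled "UTF-16" from its first two
   octets. Sets *has_bom to 1 when a byte order mark was found, else 0.
   Text without a BOM is treated as big-endian, per the quoted rule. */
static enum utf16_order utf16_order_from_bom(const unsigned char *text,
                                             size_t len, int *has_bom)
{
    assert(has_bom != NULL);
    assert(text != NULL || len == 0);

    *has_bom = 0;
    if (len >= 2) {
        if (text[0] == 0xFE && text[1] == 0xFF) {
            *has_bom = 1;
            return UTF16_BE;
        }
        if (text[0] == 0xFF && text[1] == 0xFE) {
            *has_bom = 1;
            return UTF16_LE;
        }
    }
    return UTF16_BE; /* no BOM: SHOULD be interpreted as big-endian */
}
```

Note that the quoted section covers how a reader interprets labelled text. BOM-less big-endian output, as seen from glibc 2.1.3 in the earlier dumps, is what such a reader would decode correctly.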
Tried the two patches in bug 206811. The first one crashes Mozilla on startup:
(gdb) frame
#0 byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704) at /home/bzbarsky/mozilla/profile/mozilla/xpcom/io/nsNativeCharsetUtils.cpp:191
191 tmp = in[len];
(gdb) p len
$3 = 4294897831
With the second one, I apply it, rebuild xpcom, and get an alert about "the directory containing the profile cannot be found" (just like I did when I tried backing the first bug 206811 patch out). I can create a new profile, however, and the filepicker works fine with it.... Then if I back the patch out, start with the new profile, then reapply the patch, I can no longer use that profile. Looks like running with a broken build corrupts the profile registry or something like that. :(
> byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704)
                 ^^^^^^^^
0xffbf58ef = byteswap(0xbfffef58), I think.
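The arithmetic behind that guess can be checked with a short C sketch (the function name swap_bytes_in_halfwords is illustrative only). Swapping the two bytes inside each 16-bit half of 0xbfffef58 gives exactly 0xffbf58ef, which is consistent with the pointer value itself having been run through a 16-bit byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Swap the two bytes inside each 16-bit half of a 32-bit value,
   i.e. what a UTF-16 byte swap does to four consecutive bytes. */
static uint32_t swap_bytes_in_halfwords(uint32_t v)
{
    assert((v & 0x00FF00FFu) + (v & 0xFF00FF00u) == v);
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}
```

Worked through by hand: 0xbfff becomes 0xffbf and 0xef58 becomes 0x58ef, giving 0xffbf58ef. Applying the function twice returns the original value.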
> Looks like running with a broken build corrupts the profile
> registry or something like that. :(
I guess that's what's happening. I'm sorry if you lost all your bookmarks and such. You might be able to recover them by binary-editing the 'appreg' file (I did that a couple of times in the past and you may have done that, too ....)
Jungshik, I plan to install a different OS version sometime in the next two days, so if there is any more testing you want on RedHat 6.2, please let me know ASAP...
Boris, thanks for the offer and all your help so far. I guess I have all I need to know about glibc 2.1.x. Just in case, can you try the latest patch to bug 206811 (attachment 125560) under RH 6.2? It's identical (as far as glibc 2.1.x is concerned) to the one right before it, but making sure wouldn't hurt, I think.
Yep, latest patch there seems to work fine (modulo the const warning when compiling as I mentioned in email). Could we work on getting this in ASAP? I'm not too happy with all the corrupt data I'm putting into bugzilla as a result of this bug...
The patch for bug 206811 has landed, which should fix this bug with glibc 2.1.x.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Product: Core → Mozilla Application Suite
Component: XP Apps: GUI Features → UI Design