Closed Bug 208809 Opened 21 years ago Closed 21 years ago

Filenames in XP filepicker are garbage

Categories

(SeaMonkey :: UI Design, defect)

x86
Linux
defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: jshin1987)

References

Details

(Keywords: regression)

Attachments

(2 files)

PRESENT IN: Build pulled at "Sun Jun  8 19:20:30 CDT 2003"
NOT PRESENT IN: Build pulled at "Tue May 27 10:57:48 CDT 2003"

STEPS TO REPRODUCE: 
1)  Get a Linux build
2)  Go to "File" menu
3)  Select "Open File"
4)  Try to read filenames in the filepicker

EXPECTED RESULTS: filenames readable (all filenames involved are US-ascii;
locale is "POSIX")

ACTUAL RESULTS: filenames are a mixture of Chinese characters and question marks

ADDITIONAL INFORMATION: When the filepicker is accessed via the "Browse" button
of a file input, the same display problem occurs.  If I type in a filename in
the filepicker and hit enter, a garbage filename is displayed in the file
input's textbox.  If I then upload, the correct file is uploaded, but a garbage
filename is sent along with it.

Looking at checkins in this date range, the one that looks most likely to be
responsible is the one for bug 206811 (since it also switched to iconv, as far
as I can tell).  I attempted to back out the nsNativeCharsetUtils patch that was
checked in for that bug, but after that Mozilla could not even find the profile
directory, hence would not start.

Let me know if there is any useful information I could provide.
I don't see this in my 6-07 build nor in my new 6-09 one.
WFM in Linux nightly 1.4 2003060807 (Mdk 9.0/KDE 3)
bz: you still on redhat 6.2? ;-)

what version of glibc are you running?
I don't see the problem, either. If I had, I'd not have landed my patch ;-)
Can you try 'ldd --version' or 'strings /usr/lib/libc.a |grep GNU'? 
Yes, I am still on RedHat 6.2.  More info on all that stuff:

~% ldd --version
ldd (GNU libc) 2.1.3

~% rpm -q glibc
glibc-2.1.3-21

Configured build with:

  --disable-tests '--enable-optimize=-march=pentiumpro -O2 -pipe'
  --enable-extensions=all --enable-strip-libs --enable-svg --enable-mathml
  --enable-crypto --enable-jsd --disable-debug --disable-dtd-debug

Built with egcs 2.91.66

I could be wrong about iconv being the culprit, but if other people are not
seeing it, chances are that this is indeed a problem with my version of iconv.

Anything else I should check on?
how about the output of the command "iconv --list" ;-)
Attached file Output of iconv --list
It's odd. glibc 2.1.3 should be good enough. (Google turned up a thread of
postings by Drepper and H.J. Lu on 'glibc redhat 6.2 iconv'.) Darin, is there any
known problem with iconv(3) in glibc 2.1.3 as shipped with RH 6.2? At one point,
I'm sure I had glibc 2.1.3 on my system, but I never had RH 6.2.

bz, can you try the following?

  $ LC_ALL=C iconv -t UTF-16 | hexdump 
  abcdef
  [ctrl-d]

  $ LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump 
  abcdef
  [ctrl-d]

Is there any difference? Besides, using 'UTF-16LE' in place of 'UTF-16' should
not make any difference on ix86. Does it give you a different result? 

How about this program? After compiling it to, say, codeset, 
try 'LC_ALL=C ./codeset' and  './codeset'. The output may be
something like 'ANSI...196x' or 'ISO646....'. What's the result
if you use that value for '-f' option in iconv in the above test
('iconv -f ANSI... -t UTF-16 | hexdump')?
-----------------
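#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main(void)
{
    char *codeset;

    /* Adopt the locale from the environment (LC_ALL, LC_CTYPE, LANG). */
    setlocale(LC_ALL, "");

    /* Name of the character encoding used by the current locale. */
    codeset = nl_langinfo(CODESET);

    printf("LC_CTYPE = %s\n", setlocale(LC_CTYPE, NULL));
    printf("LC_CTYPE Codeset = %s\n", codeset);

    return 0;
}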
---------------
  
BTW, as for removing the patch, IIRC, just rebuilding xpcom was not enough. You
have to rebuild necko (or something else?) as well.  
Output of those commands:

~% env LC_ALL=C iconv -t UTF-16 | hexdump
iconv: original encoding not specified using `-f'
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00     
000000e
~% env LC_ALL=C iconv -f ISO-8859-1 -t UTF-16LE | hexdump
iconv: conversion from `ISO-8859-1' to `UTF-16LE' not supported
~% ./codeset
LC_CTYPE = C
LC_CTYPE Codeset = ANSI_X3.4-1968
~% env LC_ALL=C iconv -f ANSI_X3.4-1968 -t UTF-16 | hexdump
abcdef
0000000 6100 6200 6300 6400 6500 6600 0a00     
000000e

(running 'codeset' with an explicit LC_ALL=C gave the same exact output there).

As for the nsNativeCharsetUtils.cpp patch, I ran 'cvs up -r 1.13
xpcom/io/nsNativeCharsetUtils.cpp' then did a dep rebuild from toplevel. 
Mozilla refused to start, claiming inability to find the profile directory.

I suppose I could do a clobber build if that's needed...
Oh, my gosh.. 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
endian machine.  I thought 'UTF-16' / 'UCS-2' / 'UCS-4' in glibc had always
meant native endian.

That explains Chinese characters you've seen because ASCII characters in UTF-16
byte-swapped are Chinese characters (U+0061 -> U+6100).
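
To make the U+0061 -> U+6100 mapping concrete, here is a minimal sketch of what happens when ASCII text in UTF-16 is read with the wrong byte order. The helper names swap16 and misread_ascii are made up for this illustration and are not taken from the Mozilla tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of one 16-bit code unit. */
uint16_t swap16(uint16_t unit)
{
    return (uint16_t)((unit << 8) | (unit >> 8));
}

/*
 * Widen ASCII bytes to UTF-16 code units, then swap each unit, mimicking
 * a reader that assumes the wrong byte order. Writes at most cap units
 * and returns the number written. (Hypothetical helper for illustration.)
 */
size_t misread_ascii(const char *ascii, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (n < cap && ascii[n] != '\0') {
        out[n] = swap16((uint16_t)(unsigned char)ascii[n]);
        ++n;
    }
    return n;
}
```

Feeding "abcdef" through misread_ascii gives 0x6100 through 0x6600, all of which fall inside the CJK Unified Ideographs block (U+4E00 to U+9FFF), which matches the symptom in the report.
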
Just to make sure, can you run hexdump with the '-b' option? How about
'iconv -f ISO-8859-1 -t UCS-2 | hexdump -b'?

Hmm..... this is a headache. We have to check the endianness of 'UTF-16' at
*run-time* and have to byte-swap if necessary. Then, it'll increase the codesize...

Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in
Solaris mean 'native endian' ? Has it changed over time? 
> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
> endian machine. 

This must be the case since UTF-16 has a fixed meaning.

But it's easy to handle.  Include <endian.h> and then use

#if BYTE_ORDER == LITTLE_ENDIAN
  "UTF-16LE"
#else
  "UTF-16BE"
#endif
>> 'UTF-16' in glibc 2.1.3 seems to be big endian even on a little
>> endian machine. 

> This must be the case since UTF-16 has a fixed meaning.

  I'm afraid this interpretation is not in agreement with what people on the UTC
think. Moreover, how come using 'UTF-16' works on ix86 with glibc 2.2.x if it's
always BE in glibc? 

> #if BYTE_ORDER == LITTLE_ENDIAN
>  "UTF-16LE"
> #else
>  "UTF-16BE"

Does iconv in glibc 2.1.x have UTF-16BE/LE? According to bz's test result
(comment #9) and attachment 125266, it doesn't appear to.   What are
'UNICODELITTLE' and 'UNICODEBIG'? Maybe they're either UCS4 or UCS2.
Hi Jungshik,

> Katakai-san, how about iconv(3) in Solaris on ix86? Does 'UTF-16/UCS-2' in
> Solaris mean 'native endian' ? Has it changed over time? 

yes, it's Little Endian.
~% iconv -f ISO-8859-1 -t UTF-16 | hexdump -b
abcdef
0000000 000 141 000 142 000 143 000 144 000 145 000 146 000 012        
000000e
Attached file iconv test program
Thank you for the answer and testing, Boris, Masaki and Ulrich.

On RH 8.0 (with glibc 2.2.93, redhat version 2.2.93-5), I got the following
result from this test program:

[jungshik@bach jungshik]$ rpm -q glibc
glibc-2.2.93-5
[jungshik@bach jungshik]$ ./iconv_test UCS-2
out[0]=  20, outlen=2
out[0]=  20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2LE
out[0]=  20, outlen=2
out[0]=  20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UCS-2BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16
out[0]=feff, outlen=0
out[0]=  20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16LE
out[0]=  20, outlen=2
out[0]=  20, outlen=2
[jungshik@bach jungshik]$ ./iconv_test UTF-16BE
out[0]=2000, outlen=2
out[0]=2000, outlen=2

UCS-2 gives me native endian (LE on ix86) without a BOM. UTF-16 prepends a BOM to
the output on the first invocation and is also native endian (LE on ix86), which
is not the case with glibc 2.1.3.

The same is true of  UTF-32 and UCS-4. Unless we can stop supporting Linux with
glibc 2.1.x (which I think we cannot), I'm afraid we can't help checking the
endianness at run-time and byte-swapping if necessary. 
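
A run-time check of this kind could look roughly like the sketch below. It is not the patch attached to bug 206811; the enum and the function name probe_utf16_order are invented here. It converts a single 'a' with iconv(3), skips a BOM if one was emitted, and looks at which byte came out first.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum utf16_order { ORDER_UNKNOWN, ORDER_LE, ORDER_BE };

/*
 * Ask iconv(3) to convert 'a' from ISO-8859-1 to `tocode` and report
 * the byte order of the 16-bit unit it produced.
 */
enum utf16_order probe_utf16_order(const char *tocode)
{
    iconv_t cd = iconv_open(tocode, "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return ORDER_UNKNOWN;              /* converter not available */

    char in[] = "a";
    unsigned char out[8];
    char *inp = in;
    char *outp = (char *)out;
    size_t inleft = 1;
    size_t outleft = sizeof out;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return ORDER_UNKNOWN;

    size_t n = sizeof out - outleft;       /* bytes actually written */
    const unsigned char *p = out;

    /* Skip a byte order mark in either order, if one was prepended. */
    if (n >= 4 && ((p[0] == 0xFF && p[1] == 0xFE) ||
                   (p[0] == 0xFE && p[1] == 0xFF))) {
        p += 2;
        n -= 2;
    }
    if (n != 2)
        return ORDER_UNKNOWN;              /* not a single 16-bit unit */
    if (p[0] == 0x61 && p[1] == 0x00)
        return ORDER_LE;
    if (p[0] == 0x00 && p[1] == 0x61)
        return ORDER_BE;
    return ORDER_UNKNOWN;
}
```

A caller would run probe_utf16_order("UTF-16") once at startup and byte-swap converter output only when the result differs from the host byte order. The cost is one extra conversion plus the swap loop, which is where the code size increase comes from.
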

A patch is coming up.
I'll upload a couple of patches to bug 206811 because depending on which one we
choose, the binary size changes by about 1.5 - 2kB.
Status: NEW → ASSIGNED
Depends on: 206811
bz, can you try two patches I uploaded to bug 206811? 
RFC2781 says:

4.3 Interpreting text labelled as UTF-16

   Text labelled with the "UTF-16" charset might be serialized in either
   big-endian or little-endian order. If the first two octets of the
   text is 0xFE followed by 0xFF, then the text can be interpreted as
   being big-endian. If the first two octets of the text is 0xFF
   followed by 0xFE, then the text can be interpreted as being little-
   endian. If the first two octets of the text is not 0xFE followed by
   0xFF, and is not 0xFF followed by 0xFE, then the text SHOULD be
   interpreted as being big-endian.

so the rh6 behavior seems to be correct. 
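
For reference, the rule quoted from section 4.3 can be written down as a small decoder-side check. This is only a sketch of the RFC text, with made-up names (utf16_detect_order, bom_len), and says nothing about how any particular iconv implements it.

```c
#include <stddef.h>

enum utf16_byte_order { UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN };

/*
 * Apply RFC 2781 section 4.3 to text labelled "UTF-16".
 * *bom_len is set to 2 when a BOM was found and should be skipped,
 * and to 0 otherwise.
 */
enum utf16_byte_order
utf16_detect_order(const unsigned char *text, size_t len, size_t *bom_len)
{
    *bom_len = 0;
    if (len >= 2) {
        if (text[0] == 0xFE && text[1] == 0xFF) {
            *bom_len = 2;
            return UTF16_BIG_ENDIAN;
        }
        if (text[0] == 0xFF && text[1] == 0xFE) {
            *bom_len = 2;
            return UTF16_LITTLE_ENDIAN;
        }
    }
    /* No BOM: the RFC says the text SHOULD be read as big-endian. */
    return UTF16_BIG_ENDIAN;
}
```

Note that the RFC section covers interpreting text that is already labelled "UTF-16". The output seen on RH 6.2 (00 61 00 62 ... with no BOM) is consistent with that default.
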
Tried the two patches in bug 206811.  The first one crashes Mozilla on startup:

(gdb) frame
#0  byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704)
    at /home/bzbarsky/mozilla/profile/mozilla/xpcom/io/nsNativeCharsetUtils.cpp:191
191             tmp = in[len];
(gdb) p len
$3 = 4294897831

The second one I apply, rebuild xpcom, and get an alert about "the directory
containing the profile cannot be found" (just like I did when I tried backing the
first bug 206811 patch out).  I can create a new profile, however, and the
filepicker works fine with it....  Then if I unapply the patch, start with the
new profile, then reapply the patch, I can no longer use that profile.

Looks like running with a broken build corrupts the profile registry or
something like that.  :(
> byteswap (in=0xffbf58ef <Address 0xffbf58ef out of bounds>, len=8704)
                 ^^^^^^^^

0xffbf58ef = byteswap(0xbfffef58), I think.
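
The arithmetic checks out if the swap is applied to each 16-bit half of the pointer value separately, the way a UTF-16 byte swap would treat it. A quick sketch (swap_bytes_in_halves is a made-up name, and that the pointer itself got swapped is still only a guess):

```c
#include <stdint.h>

/*
 * Swap the two bytes inside each 16-bit half of a 32-bit value,
 * i.e. what a per-code-unit UTF-16 byte swap does to 4 bytes of memory.
 */
uint32_t swap_bytes_in_halves(uint32_t v)
{
    return ((v & 0x00FF00FFu) << 8) | ((v & 0xFF00FF00u) >> 8);
}
```

swap_bytes_in_halves(0xbfffef58) gives 0xffbf58ef, the out-of-bounds address in the gdb frame.
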
> Looks like running with a broken build corrupts the profile 
> registry or something like that.  :(

  I guess that's what's happening. I'm sorry if you lost all your bookmarks and
such. You might be able to recover them by binary-editing the 'appreg' file (I did
that a couple of times in the past and you may have done that, too ....)

Jungshik, I plan to install a different OS version sometime in the next two
days, so if there is any more testing you want on RedHat 6.2, please let me know
ASAP...
Boris, thanks for the offer and all your help so far. I guess I have all I
need to know about glibc 2.1.x. Just in case, can you try the latest patch to
bug 206811 (attachment 125560) under RH 6.2? It's identical (as far as glibc
2.1.x is concerned) to the one right before it, but making sure can't hurt,
I think.
Yep, latest patch there seems to work fine (modulo the const warning when
compiling as I mentioned in email).  Could we work on getting this in ASAP?  I'm
not too happy with all the corrupt data I'm putting into bugzilla as a result of
this bug...
The patch for bug 206811 has landed, which should fix this bug with glibc 2.1.x.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Product: Core → Mozilla Application Suite
Component: XP Apps: GUI Features → UI Design