147333 - Cannot load local files whose names contain Japanese/Chinese characters

Reporter

Description

•

22 years ago

This is regression. Try to browse japanese EUC file on file
picker on Solaris.

1. Start CDE in Japanese EUC locale
2. Prepare a file with filename = Japanese EUC
3. Start Mozilla
4. File -> Open File...
   filename becomes ????.txt

This works on RC1, RC2 but does not work on RC3.

This problem happens on only Solaris platform. I could not
find this on Linux (RH6.2JA).

Masaki Katakai

Reporter

Comment 1

•

22 years ago

This problem also happens at directory browsing when
I type "file://...." on URL field.

Boris Zbarsky [:bzbarsky]

Comment 2

•

22 years ago

Sounds like fallout from the nsIFile api changes...

Boris Zbarsky [:bzbarsky]

Comment 3

•

22 years ago

upgrading to blocker also; this should block 1.0 if nothing else, imo, and this 
is the only way I know to get people's attention on it.

Severity: critical → blocker

Keywords: regression

Masaki Katakai

Reporter

Comment 4

•

22 years ago

This works on UTF-8 locale but doesn't work on
Japanese EUC locale.

It seems that now mbtowc() is used for code
conversion for native<->ucs2 but if I remember
correctly, this works on only Linux platform.
How about another UNIX? AIX and HP-UX?

For Solaris, we need to use iconv() of Solaris
for the conversion or need to continue to use
Mozilla own converters here.

Rui Xu

Updated

•

22 years ago

Keywords: intl

QA Contact: ruixu → ylong

Roy Yokoyama

Comment 5

•

22 years ago

cc dougt

Doug Turner (:dougt)

Comment 6

•

22 years ago

darin, any ideas?

Darin Fisher

Assignee

Comment 7

•

22 years ago

what version of solaris does this apply to?  i'm very surprised to hear that
mbtowc fails... is that really a common problem on solaris?

Roy Yokoyama

Comment 8

•

22 years ago

katakai: can you please give us some help?

Jay Yan

Comment 9

•

22 years ago

Ervin: 
can you help to test and work on this feature using Chineses locale? this
regression bug is expected to be fixed in Mozilla1.0. thanks
Jay

Roy Yokoyama

Comment 10

•

22 years ago

anybody tested in HP-UX/AIX/SGI?

Kyle Yuan

Comment 11

•

22 years ago

It is also failed on Solaris 9 CDE chinese env.
Anyone can narrow down the suspicious patches?

kill this account

Comment 12

•

22 years ago

My guess is that newer Unix systems have wide-char defined as unicode but older
systems do not.

What is the OS version on the systems that are seeing the problem?

Brian Yuan

Comment 13

•

22 years ago

widechar in Solaris is locale specific, usually not UCS2, so mbtowc() shouldn't
be used as the converter from native to UCS2, only iconv() or netscape specific
converter can be used in Solaris.

Dawn Endico

Comment 14

•

22 years ago

I suspect that the patch for this will not show up in the mozilla 1.0 tree
since we are about to tag the tree for 1.0 tonight and it doesn't seem that a 
well tested fix for this is likely to show up for this tonight.

However, it will be a few days before we actually release so if you can 
come up with a patch before the release date then we can release note
the patch and make sure that whoever does the solaris builds applies that patch
so that the mozilla 1.0 solaris binaries on ftp.mozilla.org will work.

Jay Yan

Comment 15

•

22 years ago

perhps Dawn is right :-(

could any encoding guru help to see the patch for bug 129279?

Antonio.Xu

Comment 16

•

22 years ago

i can reproduce this bug with RC3,but rc2 is fine

Ervin Yan

Comment 17

•

22 years ago

I can reproduce this bug in ko/zh/zh_TW locales on Solaris with RC3.
but RC2 is OK. also this bug happen on Netscape 7.

Darin Fisher

Assignee

Comment 18

•

22 years ago

hmm.. what version of netscape 7 are you testing?  netscape 7 PR1 does not
include the patch for bug 129279.  only builds after RC2 include that patch.

anyways, i can easily put together a patch for this, but i'll need someone to
test it out.  anyone here willing to be my patch buddy?

looks like redhat 7.3 (glibc-2.2.5) supports the iconv interface, so i'll go
ahead and develop a patch for this that works under linux.  it might even be
better to prefer iconv over mb[r]towc/wc[r]tomb in general... i'm not sure yet.

taking...

Assignee: yokoyama → darin

Kyle Yuan

Comment 19

•

22 years ago

darin, I'll be glad to test your patch.

bobj

Comment 20

•

22 years ago

I believe the problem is assuming that mbtowc() and related functions return
Unicode.  Both the length and encoding of wchar_t are not specified in
ANSI/ISO C --  they are implementation defined.

On Solaris (and probably most of the older Unix implementations) wchar_t
is probabably NOT Unicode for all locales.

I think Katakai-san is correct that the cross-Unix solution should be to use
iconv().  I'm not sure if iconv() is supported on other platforms.

If my memory servers me correctly, the original concept was that wchar_t was
intended to be only an internal opaque representation of character data that
represents each character in a fixed length number of bytes.  That is why
ANSI/ISO C left the size and encoding of wchar_t implementation defined.

Using wchar_t can cause problems of portability and interoperability because
the size and encoding can differ across platforms and even compilers on the
same platform.  If wchar_t is used externally you can hit big-endian v.
little-endian incompatibility as well.

bobj

Comment 21

•

22 years ago

Ignore last comment about iconv() on non-Unix platforms.
Looking at the patch it appears that mbtowc() is only used in nsLocalFileUnix.cpp,
so we don't have to worry about using iconv() on non-Unices.

kill this account

Comment 22

•

22 years ago

I believe mozilla has all the converters necessary. My recollection is that the
last time I talked to ftang about iconv he felt there were valid reason why 
we should not use them.

bobj

Comment 23

•

22 years ago

Can someone point to some good examples of code calling the internal
mozilla converters that would be used to replace the mbtowc() calls?

Antonio.Xu

Comment 24

•

22 years ago

hi darin,if you need help,please call me

Darin Fisher

Assignee

Comment 25

•

22 years ago

ok, i did some investigation... and it looks like iconv is supported as far back
as redhat 6.0, so we are fine using that for linux.  Mac OSX also provides
iconv, so that pretty much implies that *BSD systems should be covered.  i
suspect most unix platforms support iconv, so my patch will use iconv if
available and fall back on the existing implementation otherwise.

bstell: regarding using mozilla's uconv libraries: that is not an option. 
libxpcom cannot depend on any of mozilla's uconv libraries (i forget the bugs
off hand, but installer has intl problems otherwise).

i should have a patch by end of day tomorrow.

Status: NEW → ASSIGNED

Keywords: mozilla1.0, nsbeta1

Priority: -- → P1

Whiteboard: [adt1 RTM]

Target Milestone: --- → mozilla1.0.1

kill this account

Comment 26

•

22 years ago

While I know you have immediate and pressing problems due to a last minute
addition of a major amount of code I would hope you have got 'buy-in' from
the i18n developers and maintainers. 

The choice to use iconv is a major architectural decision and I would hope we 
make these kinds of decisions carefully and not as a side effect of a fire 
fighting drill.

Doug Turner (:dougt)

Comment 27

•

22 years ago

Brian, 
xpcom does not want a dependency on i18n.  We need to be able to go from unicode
to fs charset and back.  The system that xpcom runs on provides these
conversions.  Why should we use the i18n libraries?  Where is the benefit?

kill this account

Comment 28

•

22 years ago

Dougt: wouldn't you consider it odd if a someone decided to link in a new
3rd party html parser because they had issues with the mozilla html parser?

Shouldn't we make architectural decisions carefully?

Are you or darin planning to own all problems related to iconv such as
character conversions that differ from the i18n converters already in the
mozilla i18n code?

It seems a very high risk choice to include the nsiFile changes to very late
in the cycle. I do not see why this choice should make us give up on careful
architecture.

kill this account

Comment 29

•

22 years ago

Has anyone done any *actual* testing to find out if iconv will work on all
platforms?

This very problem seems to be here because no one did the minimal reasonable 
testing before checkin to see if all systems had a problem.

Lets be careful and not repeat the same mistake.

Masaki Katakai

Reporter

Comment 30

•

22 years ago

I have a experience that I was trying to use iconv() for code
conversion in my program (XIM input method server). The codes
needed to be shared in Linux and Solaris. I was seeing some
issues and finally gave up to use iconv() on linux platform().
I'm sorry I could not remember the problems correctly... but

- iconv() on linux depends on its version
- iconv() on linux always uses big-endian (??)
- but Solaris iconv() uses big-endian on SPARC
  little-endian on x86
- need to check BE and LE iconv capability

It would be OK if we could focus on the specific platform, and
if we could check the target platform and iconv() capability
properly. But it's too difficult, I think. I think we will need
to use many ifdef ..PLATFORM.. or ifdef ..ICONV_VERSION... when
we use iconv(). (but I'm sorry I don't know the issues about iconv()
exactly)

Also, it would be good if we could find other multi-platform
products/opensourced application that use iconv(). Does
anyone know such products? Does the product support all
platform of Mozilla? If exists, that is good our example.

Doug Turner (:dougt)

Comment 31

•

22 years ago

slow down, brian.  

>Dougt: wouldn't you consider it odd if a someone decided to link in a new
3rd party html parser because they had issues with the mozilla html parser?

I do not follow.  Please explain why switching the html parser out has anything
to do with the unicode->fs charset converter.

>Shouldn't we make architectural decisions carefully?

Are you seriously asking this question?  

>Are you or darin planning to own all problems related to iconv such as
character conversions that differ from the i18n converters already in the
mozilla i18n code?

Of course not.  

>It seems a very high risk choice to include the nsiFile changes to very late
in the cycle. 

What do *you* propose, brian?  Hand waving without an alternative just sucks. 
Give me an alternative for solaris, bsd, which doesnt pull in all of the
dependencies that i19n has!

>I do not see why this choice should make us give up on careful architecture.

You really can not be serious!  Show me one paper explaining this
"architecture"!  The "way it is now" doesn't mean anything was carefully designed?  

>Has anyone done any *actual* testing to find out if iconv will work on all
platforms?

Considering there isn't even a patch, i would think NO.

>This very problem seems to be here because no one did the minimal reasonable 
testing before checkin to see if all systems had a problem.

What is done is done.  We thought we did the minimal reasonable testing.  We
could have tested forever, but we were under the a large amount of pressure to
freeze API's and this was part of it.  Now with that said: Where the heck were
you?  Announcements were made, public builds were posted, you could have taked
the patch and helped us?  Ah.  It is far easier pointing out the obvious.

>Lets be careful and not repeat the same mistake.

Again, offer a suggestion which doesn't make you sound like a complete ass.  Do
you think that we want to make a similar mistake?

Doug Turner (:dougt)

Comment 32

•

22 years ago

what Masaki said.  lets special case the iconv on platforms which use the short
bus version of mbtowc.

Darin Fisher

Assignee

Comment 33

•

22 years ago

my take on this is as follows:

1- we had to remove the libxpcom dependency on mozilla's uconv libraries.

2- we chose to use mb[r]towc/wc[r]tomb thinking (incorrectly) that these
converted between the native multibyte character set and unicode (UCS-4 in some
cases).  despite testing on a variety of different unix platforms, we were
unfortunately wrong about this.  (yes, i wish we had solicited help from those
who would have spotted this problem immediately, but i guess time pressures got
the best of us.  now that it has been determined that 1.0 will ship with this
bug, we can fortunately take our time and fix this bug carefully.)

3- iconv seems to be the only option on platforms that don't use unicode for
wide chars (internally).

of course, we're going to have to do a lot of testing.  i'm hoping that since we
are only interested in conversion to/from UCS-2 that there won't be too much
variation from platform to platform.  endian-ness definitely worries me.  i'd
like to think that iconv would interpret "UCS-2" to mean "UCS-2" using the
native endian-ness.. but perhaps that is just wishful thinking :/  ..remains to
be tested i suppose.

kill this account

Comment 34

•

22 years ago

> Please explain why switching the html parser out has anything
> to do with the unicode->fs charset converter.

It is a comparison. Using iconv because the code for nsIFile was not
tested is a major architectural change just as using a 3rd party html
parser would be. I am very serious about this.

As to what we should do I strongly think we should not use iconv unless
someone who regularly works with the converters can state that this is
a safe and reliable thing to do.

I think perhaps due to the very late date we should do *nothing* right now
until we have figure out something we can live with. I recommend we just
release note this pending a plan.

Being in a (self imposed) fire drill does not mean we should give up on
careful architecture.

Alec Flett

Comment 35

•

22 years ago

well, for architectural, and embedding reasons, we cannot use the i18n libraries
in nsIFile, period. it's just not an option. Put simply there were
initialization issues in XPCOM (i.e. initialization of XPCOM itself) that
required nsIFile access before the i18n libraries had been loaded. Darin
explored the option pretty extensively and I hope we don't have to argue that
point again. 

Thus we have to rely on an os-specific mechanism for doing the conversion. It
sounds like the real issue is that the trunk-baked mbstowcs/etc provides a 90%
solution for 1.0, and iconv(), while it might provide a 100% solution, has not
been tested.

Let's just wait for an iconv patch and bake it on the trunk while we wait for
1.0 to ship.. if it looks good then we can land it on the 1.0.1 branch when it
is created.

Daniel Veditz [:dveditz]

Comment 36

•

22 years ago

Please keep the install folks in the loop on any changes in the file implementation.

kill this account

Comment 37

•

22 years ago

The choice to use iconv is an architectural decision.

Okay, we have problems using code that uses our own xpcom interfaces. That does
not imply the iconv is the answer.

Could anyone show me any discussions/emails/white-papers/studies on the subject
of using iconv in mozilla?

Who will own the iconv code and be responsible for its maintenance?

Right now based on what I see the decision to use iconv feels to me like very 
sloppy work. I will gladly retract that if anyone can show me the prepatory work 
that seems reasonable when making architectural decisions.

Doug Turner (:dougt)

Comment 38

•

22 years ago

> Okay, we have problems using code that uses our own xpcom interfaces. That does
not imply the iconv is the answer.

This is not what anyone is directy saying, brian.  The chain of arguement goes
something like this:

xpcom needs to reduces its dependencies for embedding.  xpcom requires
unicode->fs converters.  the converters that i18n built require necko which
brings in the world.  break this conversion by using system calls.

The above decission has been made.  it was discussed for week and week in person
and made several newsgroup postings:

news://news.mozilla.org:119/3CCF08E2.4020802@netscape.com
news://news.mozilla.org:119/3C190770.701@netscape.com


> Who will own the iconv code and be responsible for its maintenance?

I will if no one else can.

> Right now based on what I see the decision to use iconv feels to me like very 
sloppy work. I will gladly retract that if anyone can show me the prepatory work 
that seems reasonable when making architectural decisions.

sorry that it "feels" like "sloppy work".  Maybe you can offered a better
solution.  Right now, you are just pissing people off with your arrogant
assessment of what we already know.  Stop complaining and do something!

Roy Yokoyama

Comment 39

•

22 years ago

The fact is this bug exists and we have to do something about it.
Masaki's comment (#30) pretty much describes what we need to do 
for Solaris unless there is a better way.

I agree with Alec. Let's wait for an iconv patch and bake it on the trunk.

Frank Tang

Comment 40

•

22 years ago

I do not understand how can the patch for 129279 be checked in without review
from my team at all. I said many many time that you cannot use mbtowc or iconv
to replace our internal converter. The reason you cannot use mbtowc is the
wchar_t is not always unicode. The reason you cannot use iconv is the conversion
may not always be there. 

We should back out the change in 129279 to fix this problem. Using mbtowc or
iconv will not solve your problem and will only increase the complexity to let
you solve it later. 

Remove dependency on the uconv stuff is a "mission impossible". I said it many
time in the past. I truly hope people will listen. 

Please don't try to tell me how good is wbtomc or iconv. I know those stuff very
well for more than 10 years. I reviewed the source code of wbtomc in SVR3.2 in
1989 (yes, 13 years ago).  Bob and I were one of the reviewer for X/Open when
they introduce iconv into XPG/4.  (XPG/3 only have iconv in command line, XPG/4
have it in api) . Both of them WON'T do the job for you. 

If you use iconv, you will introduce new depedency into the mozilla source code. 
also, the iconv may or may not have the conversion pair for the locale. 

If you use wbtomc, the run time code is wrong on MANY locales and MANY
platforms. Because many locales and many platform do NOT use Unicode as the
internal representation for wchar_t. In fact, there are platform define wchar_t
as one byte, 2 bytes or 4 bytes.

Frank Tang

Comment 41

•

22 years ago

more problem about iconv
there are no guarantee that the value return by nl_langinfo will be the same
name used in iconv
also, there are no clear definitation about what is the name should be used for
iconv to indicate UCS2

Frank Tang

Comment 42

•

22 years ago

.Dear darin:
Please make decision based on what "you know" not what "you suspect".

>------- Additional Comment #25 From Darin Fisher 2002-05-29 00:18 -------

>ok, i did some investigation... and it looks like iconv is supported as far back
>as redhat 6.0, so we are fine using that for linux.  Mac OSX also provides
>iconv, so that pretty much implies that *BSD systems should be covered.  i
>suspect most unix platforms support iconv, so my patch will use iconv if
>available and fall back on the existing implementation otherwise.

darin, please understand there are difference between
1. the iconv apis is available, and
2. the iconv implementation include all the converter we need

From my understanding, on a lot of platform, while iconv api itself is
available, the conversion implementation (from one specifc encoding to another
one) that we need may not be there. 
Also, even the conversion implementation is there, we may not know how to
involve it. The name used to label the from charset and to charset may not be
the same across platform.

Since you mention MacOS X. Have you check how many converters are available on
the iconv supplied by MacOS X?


>so that pretty much implies that *BSD systems should be covered.

This is a FALSE assumption. Yur assumption of MacOS X have it so all BSD have it
is a FALSE assumption. Here is what I KNOW. (not what I suspect)

form my understanding, BSD does not implement iconv. iconv is introduced during
the X/Open movement around 1989-1992. Both Bob Jung and I were in the Unix
International / International Working Group and Open Software Fundation /
International Working Group that time. The iconv api is added to XPG/4 in 1992.
And in that time the main BSD development already stop. 

The first iconv implementation I saw was happen in SVR4 from USL in 1991. 

>i suspect most unix platforms support iconv
Again, this is what you "suspect" , not what you "know". And I KNOW there are a
lot of unix platform do not support iconv. or even they support iconv, it does
not support the conversion implementation we need. or they suppor the
implementation but we don't know how to invoke it.

Frank Tang

Comment 43

•

22 years ago

dougt:

>The above decission has been made.  it was discussed for week and week in person
>and made several newsgroup postings:
I don't remember any one talk to me in person. Who have you talk to in person
have the knowledge about the availability of wbtomc or iconv on different
platforms ?

hacker formerly known as seawood@netscape.com

Comment 44

•

22 years ago

> We should back out the change in 129279 to fix this problem. Using mbtowc or
> iconv will not solve your problem and will only increase the complexity to let
> you solve it later. 

FWIW, backing out just the changes to 129279 *will* break the build.  The build
structure has been changed since that checkin to reflect and enforce the fact
that xpcom no longer depends upon uconv.  Backing out the whole ball of wax is
possible but definitely not desirable as I'm sure developers like not having to
pull the entire tree (or 1/2 the tree using mcafee's script) just to build xpcom. 

> Remove dependency on the uconv stuff is a "mission impossible". I said it many
> time in the past. I truly hope people will listen. 

Pardon my disbelief, but why is that?  At the very least, why isn't possible to
 build uconv without bringing in necko and the rest of the world?  I may be
getting off target but if all of the changes are backed out, then we are going
to need some "uconv-lite" that we can direct people to build with the
"standalone" xpcom module that doesn't pull in the rest of the tree.  I'm still
mystified as to why we think it's a good idea to have our base library depend
upon a higher level component to work properly.  Perhaps this uconv-lite
shouldn't be part of the uconv module at all and should be a standalone lib on
the same level as xpcom (no, I'm not going to suggest stuffing more stuff into
xpcom).

kill this account

Comment 45

•

22 years ago

so dougt, what is you experience using iconv?

kill this account

Comment 46

•

22 years ago

dougt: would you kindly drop the personal attacks

Doug Turner (:dougt)

Comment 47

•

22 years ago

Frank, Roy, Brian, Alec, Darin, and I just had a meeting.  We agreed that the
direction that we would like to take is to use iconv where available falling
back on zero-byte padding.  Darin is going to checkin a patch on the trunk (once
reviewed and stuff) which converts the 'nix platforms over to use iconv.  We
will need the help of platform owners and users to test these changes on
different locales.  Frank Tang will be sending mail to the unix newsgroup
shortly do ask for this help.  

Chris, this does add a new library requirement to mozilla.  Is there anything
special we should do in configure?

hacker formerly known as seawood@netscape.com

Comment 48

•

22 years ago

I'm going to assume that we're _not_ checking-in another 3rd party library into
the tree and that this will be an external dependency.  We'll want to add a
configure stopping test for the library in question.  Configure's libical test
for MOZ_CALENDAR should be a sufficient template.  And, of course, the
tinderboxes & build release machines will need to be updated before the patch
can land.

Darin Fisher

Assignee

Comment 49

•

22 years ago

seawood: the systems we care about "should" already have support for iconv.  in
fact, under linux iconv is simply part of glibc (or so it seems now that i'm
happily invoking iconv under linux).  that said, my understanding is that on
some systems iconv is provided via libiconv.so, but i'm not sure where that applies.

Kyle Yuan

Comment 50

•

22 years ago

darin, if you made the patch, we (Sun china browser team) can help you to test 
before it can land.

hacker formerly known as seawood@netscape.com

Comment 51

•

22 years ago

So, to back up what Frank said, a quick cursory check shows that the sysv
systems have the iconv routines and the bsd-based systems do not.  I have no
idea how complete those implementations are.  Wrt Macs, OSX does not come with
iconv.h.  On my OSX 10.1.4 box, iconv.h & libiconv.so are installed under /sw
which means that they came from fink.  The additional runtime dependency needs
to be run by drivers as well.  (Not that anyone cares but BeOS appears to also
be missing an iconv implementation though a port is available.  Not sure about
our other fringe platforms: OpenVMS, QNX, etc.)

Darin Fisher

Assignee

Comment 52

•

22 years ago

seawood: i discussed the potential for a iconv dependency with drivers today, so
they're aware of the situation.

Darin Fisher

Assignee

Comment 53

•

22 years ago

Attached patch preliminary iconv patch -- just for testing purposes (obsolete) — Details — Splinter Review

this patch isn't final by any means... it just switches nsLocalFileUnix over to
using iconv.  please let me know how this patch works out... thx!!

Kyle Yuan

Comment 54

•

22 years ago

The prototype of iconv in Solaris is:
iconv(iconv_t, const char **, size_t *, char **, size_t *);

So we should change
+        res = iconv(gNativeToUnicode, (char **) input, &inLeft,
+                                      (char **) output, &outLeft);
and
+        res = iconv(gUnicodeToNative, (char **) input, &inLeft,
+                                      (char **) output, &outLeft);
to (const char **) input

Tried xpcom's clobber build, it still can not convert chinese char correctly.
Will try full clobber build later.

Antonio.Xu

Comment 55

•

22 years ago

yes,i still could not build successfully,so we must change some code like that
#if defined(SOLARIS) 
res = iconv(gNativeToUnicode, (const char **) input, &inLeft,
                              (char **) output, &outLeft);
#endif

Kyle Yuan

Comment 56

•

22 years ago

full clobber build completed, but it still failed to convert chinese char.

Jim Dunn

Comment 57

•

22 years ago

I am adding myself.  This sounds similar to the failure
we are seeing on AIX...  So am going to keep an eye on this.
FYI  AIX rc3 is dead in the water.

Philip K. Warren

Comment 58

•

22 years ago

We are seeing similar regressions on AIX from RC2->RC3. Specific symptoms include:

1) Starting Mozilla with a clean profile works on first run, however on
subsequent runs the Profile Manager comes up, saying the migrated profile can
not be run.
2) No files are visible in the File->Open File dialog. (Displays the message
"You do not have the permissions necessary to view this directory").
3) Creating a new profile in the Profile Manager is impossible - the folder path
is made up of garbage characters.

I will test this iconv patch on AIX and let you know if it addresses these problems.

Jim Dunn

Comment 59

•

22 years ago

Attached patch additional patch, need to link against iconv to get iconv_open... (obsolete) — Details — Splinter Review

I don't think this is the best fix for this, but for the branch
I think it is fine.  For the trunk we probably want to test for
iconv_open and if we find it have a USE_ICONV define and if
USE_ICONV is set then we should link agsinst EXTRA_ICONV_LIBS
in xpcom/build/Makefile.in.  EXTRA_ICONV_LIBS should also be
set in configure

NOTE: AIX also needs the previously mentioned (char **) -> (const char **)
patch

timeless

Updated

•

22 years ago

Keywords: mozilla1.0 → mozilla1.0.1

Summary: regression: can not browse local file on file picker → Cannot load local files whose names contain Japanese/Chinese characters

timeless

Comment 60

•

22 years ago

fwiw I asked a QNX user and it doesn't appear to have iconv either :)

Darin Fisher

Assignee

Comment 61

•

22 years ago

so, after making the (const char**) fix and adding -liconv (if necessary) what
are the results?  comment #56 says that there are problems with chinese
characters... what about others?  thx!

Philip K. Warren

Comment 62

•

22 years ago

These two patches seem to fix the problems on AIX. I am able to start Mozilla
multiple times, create profiles, and use the filepicker dialog.

Darin Fisher

Assignee

Comment 63

•

22 years ago

pkw: that's great news... but which locales did you test?  thx!

Philip K. Warren

Comment 64

•

22 years ago

Just en_US. We were seeing general problems that were not specific to
Japanese/Chinese locales.

Colin Blake

Comment 65

•

22 years ago

For the record, OpenVMS does appear to have iconv_open and iconv_close (though I
have never tried to use them). They are part of the standard C library and so I
don't need a -liconv or anything like that at link time.

hacker formerly known as seawood@netscape.com

Comment 66

•

22 years ago

Attached patch add configure test for iconv (obsolete) — Details — Splinter Review

This patch adds a configure test for iconv. Since I didn't want to duplicate
the iconv testing code, I look for iconv in libiconv first, then run the link
test for iconv, iconv_open & iconv_close.  If the test passes, _ICONV_LIBS
(either -liconv or empty) is appended to XPCOM_LIBS. Because some platforms
require symbols to be resolved at link time,-liconv must be used every place
-lxpcom is used. If we decide to use iconv directly outside of xpcom, we may
want to use a separate variable.

I also fixed a tiny bug with the USE_STDCONV case where mbrtowc & wcrtomb are
not available.

Because the test to use iconv or stdconv is in the source file, we may wind up
with an unneeded dependency by linking in libiconv but using the stdconv
routines if nl_langinfo isn't set.  I saw this on my OSX build.

Attachment #85557 - Attachment is obsolete: true

Attachment #85604 - Attachment is obsolete: true

John Sun

Comment 67

•

22 years ago

 Darin,
As for your patch can't work correctly on Solaris, I think it's for there is no
support for directly converting between locale codeset and UCS-2 codeset, so the 
iconv can't work well when trying to directly convert codeset between locale and
UCS-2. Maybe should make two step convert. First from locale to UTF8 and then
UTF8 to UCS-2.

from Ervin:
On Solaris iconv provide conversion between UTF-8 and the value return by
nl_langinfo CHARSET.   and provide the conversion between UTF-8 and UCS-2.   
seems no direct conversion between the value return by nl_langinfo CHARSET and
the UCS-2(little-endian/big-endian).

John Sun

Comment 68

•

22 years ago

Antonio  has verified my guess. He has write a segment of code as sample.
It work well in Solaris. Antonio will write mail to Darin for it.

Thank Antonio !

Antonio.Xu

Comment 69

•

22 years ago

Thanks for john's advice,i had changed some code of
nsNativeCharsetUtils.cpp,then make mozilla's filepicker show chinese word
correctly,i had changed some code for solaris like that
 char ** tmp;
        char *tmpoutput=new char[inLeft];
        *tmpoutput ='\0';
        char *tmpinput;
        tmpinput = tmpoutput; 
        size_t tmpoutLeft = inLeft;
        
       
        res = iconv(gUnicodeToUtf8, (const char **) input, &inLeft,
                                      (char **) &tmpoutput, &tmpoutLeft);
        tmpoutLeft = (PRUint32)(*inputLeft*2 - (PRUint32)tmpoutLeft); 
        res = iconv(gUtf8ToNative, (const char **) &tmpinput, &tmpoutLeft,
                                      (char **) output, &outLeft);     
it is means convert local chrset to UTF-8 then convert UTF-8 to UCS-2,because if
soalris can not convert from GBK to UCs-2.
i hope it can help you,if you have problem ,please let me know.
belong browser-china@sun.com

Bradley Baetz (:bbaetz)

Comment 70

•

22 years ago

Hmm. Maybe we should try directly, then try via UTF8. Are there any other
'common' intermediate formats which are guaranteed not to lose bits we want?

Bradley Baetz (:bbaetz)

Comment 71

•

22 years ago

iconv --list on a glibc system lists the valid locales (Although again, it
doesn't promise that you can use them all)

It does, however, contain both 'UCS2' and 'UCS-2'. Antonio, does using UCS2
instead of UCS-2 in darin's patch work?

Frank Liu

Comment 72

•

22 years ago

As timeless mentioned in Comment #60, QNX doesn't have iconv.

I just tried with opening a local file with filename in Chinese
using the File->Open File ...
1) RC2, doesn't work. It says "File xxxx doesn't exist".
2) RC3, works! It opens and display the file fine.

So, whatever changed in RC3, though broke Solaris, fixed it for QNX :)

Frank

Doug Turner (:dougt)

Comment 73

•

22 years ago

only if we can convince all the solaris users to switch to qnx....

For QNX, maybe it is best to continue using mbtowc.  is the wchar unicode on QNX?

Darin Fisher

Assignee

Comment 74

•

22 years ago

poking around on worms, i discovered something interesting about solaris.  the
available conversion libraries can be found in /usr/lib/iconv/*.so ...
apparently it dynamically loads the appropriate library named like this:
"FROM%TO.so" ... (e.g.: eucJP%UTF-8.so)... there is no eucJP%UCS*.so, at least
not on worms.

Darin Fisher

Assignee

Comment 75

•

22 years ago

Attached file all available conversions from UTF-8 under Solaris (worms) — Details

Darin Fisher

Assignee

Comment 76

•

22 years ago

solaris supports a much smaller set of conversions from UCS-2 directly.  i'm
thinking that we should always try to convert directly from/to UCS-2, but
fallback on UTF-8 if required.

Dawn Endico

Comment 77

•

22 years ago

on mothra:

#uname -a
SunOS mothra.mozilla.org 5.7 Generic_106541-12 sun4u sparc

#ls UTF*JP*
UTF-8%ISO-2022-JP.RFC1468.so  UTF-8%eucJP.so
UTF-8%ISO-2022-JP.so          UTF-8-Java%eucJP.so
#ls *JP*UTF*
ISO-2022-JP%UTF-8.so  eucJP%UTF-8-Java.so   eucJP%UTF-8.so

Dawn Endico

Comment 78

•

22 years ago

Attached file all available conversions from UTF-8 under Solaris 5.7 (mothra) — Details

this is the contents of /usr/lib/iconv on mothra, which runs solaris 5.7

Frank Liu

Comment 79

•

22 years ago

Ref my comment #72
Sorry my test was messed up. I was testing a file on a foreign
filesystem (vfat).
I just test it again with the QNX's native filesystem, and we
have the same problem as others: rc2 works fine (with chinese
filename), but rc3 gives "file doesn't exist".
My guess is QNX's vfat filesystem support probably has a bug which masked
the mozilla problem.
Frank

Darin Fisher

Assignee

Comment 80

•

22 years ago

Frank: does QNX provide the iconv API?

Sergei Dolgov

Comment 81

•

22 years ago

RC3 for BeOS PC loads local files which contain russian utf-8 charcaters properly,
though it shows such name in URL bar in "escaped" form:
BeBookindex%D0%A0%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9.html
I'm not familiar with this bug 147333 yet, so don't know, is it proper behaviour
or not.
Only thing i have to say here, is that in BeOS UTF-8 is native. So, theoretically,  
all correct file names are(should be) UTF-8.

Frank Liu

Comment 82

•

22 years ago

darin, 
as timeless mentioned in comment #60, qnx doesn't provide iconv API.
However, we can port GNU libiconv from http://www.gnu.org/software/libiconv/, if
that is needed.
Frank

Darin Fisher

Assignee

Comment 83

•

22 years ago

Frank: all we need is a way to convert UCS-2 to |char*| (and vice versa)
suitable for calling fopen and other stdc functions.  does QNX provide any
function(s) to help with this?  if not, then yes.. you probably want to look
into porting libiconv.  thx!

Darin Fisher

Assignee

Comment 84

•

22 years ago

Attached patch v1 patch (obsolete) — Details — Splinter Review

ok, this patch includes UTF-8 fallback support as well as seawood's autoconf
tests (plus an additional autoconf test for iconv with const input).

please test this out with as many different locales and on as many different
platforms as you can.  thx!!

Attachment #85725 - Attachment is obsolete: true

hacker formerly known as seawood@netscape.com

Comment 85

•

22 years ago

Attached patch alt. iconv() check using iconv.m4 — Details — Splinter Review

Here's the alternative iconv check, AM_ICONV, that bbaetz pointed out. 
Unfortunately, it pulls in a couple of other .m4 files as well as a rpath
configure test.

Darin Fisher

Assignee

Comment 86

•

22 years ago

cls: let me know which configure.in code you think we should use.

Mary Hoetzel

Comment 87

•

22 years ago

Patch 85922 does not work for AIX. The ISO charset names in AIX are ISO8859-n (not  
ISO-8859-n). This is true for iconv and nl_langinfo. Therefore, the test in
configure.in doesn't work and HAVE_ICONV and HAVE_ICONV_WITH_CONST_INPUT are not
defined. I changed nsNativeCharsetUtils.cpp accordingly but it still doesn't
work. It might be because I only did a partial rebuild. So I will do a new build
from scratch.

Kyle Yuan

Comment 88

•

22 years ago

patch v1 doesn't work for Solaris 5.9 chinese env. (chinese char was still 
displayed as ????)

Darin Fisher

Assignee

Comment 89

•

22 years ago

kyle: what does your configure output say about iconv?

Kyle Yuan

Comment 90

•

22 years ago

I didn't find any contents about iconv in configure output. How to get them?

Darin Fisher

Assignee

Comment 91

•

22 years ago

so, my patch changes configure.in ... which requires that you run autoconf to
rebuild the configure shell script.  then when you run "gmake -f client.mk
build" you should see the usual "configure output" ... there should be a line
that says something like "Checking for iconv ..."

please verify that this is the case for you... thx!!

Mary Hoetzel

Comment 92

•

22 years ago

Confirming my previous comment:
patch 85922 (or v1 patch) works for AIX if I change ISO-8859-1 to ISO8859-1 in
configure.in (3x). 
In addition, I need patch85604 which defines -liconv in
mozilla/xpcom/build/Makefile.in, or XPCOM_LIBS, as defined in the patched
configure.in (to include -liconv), must be used in mozilla/xpcom/build/Makefile.in.

I have tested this fix running mozilla in the following locales: ja_JP (eucJP),
Ja_JP (shift-jis), JA_JP (UTF-8), zh_CN (eucCN). For each of these locales,
filenames with locale specific characters were displayed correctly in the Open
File list.

Colin Blake

Comment 93

•

22 years ago

OpenVMS also needs ISO8859-1 instead of ISO-8859-1.

And the test program in configure.in is brain dead. I know it doesn't matter
since it never actually runs, but I was trying to use it as a test to see if
iconv actually worked on OpenVMS. There are three problems with the test:
  - inLeft should be the size of the input buffer, not the address
  - outLeft should be the size of the output buffer, not the address
  - the 4th arg to iconv should be a **char (just casting a *char doesn't cut it)

I'm still waiting to hear back from my contact in Japan as to whether or not we
need this fix in OpenVMS.

Darin Fisher

Assignee

Comment 94

•

22 years ago

yeah, forget the configure.in tests in the v1 patch.  the v1 patch has other
problems too.  anyhow, i'm working on a much improved version of the patch now,
and should have something uploaded shortly.  i'd like to stay away from the huge
configure.in changes corresponding to the alt.iconv() check submitted by cls,
but i have yet to get in touch with cls to discuss this further.

Frank Tang

Updated

•

22 years ago

Blocks: 141008

Darin Fisher

Assignee

Comment 95

•

22 years ago

Attached patch v2 patch (obsolete) — Details — Splinter Review

this patch includes a hopefully much improved autoconf test and some extra
fallback logic in nsNativeCharsetUtils.cpp to handle the different ways in
which encodings are written (e.g., UCS-2 vs UCS2).

Attachment #85922 - Attachment is obsolete: true

Darin Fisher

Assignee

Comment 96

•

22 years ago

i think this latest patch is close to be the final patch.  what i plan to do is
land this on the trunk and use feedback from tinderbox to further customize
configure.in to ensure that iconv is used everywhere it can be used.  then it
should be pretty straightforward for folks to pull and build the trunk bits to
verify that the patch has done the trick.  after that we can think about landing
this on the branch.

Doug Turner (:dougt)

Comment 97

•

22 years ago

Comment on attachment 86158 [details] [diff] [review]
v2 patch

seawood needs to review the configure changes.	Especially:

-MOZ_COMPONENT_XPCOM_LIBS='-L$(DIST)/bin -lxpcom'
+MOZ_COMPONENT_XPCOM_LIBS='$(XPCOM_LIBS)'

Darin Fisher

Assignee

Comment 98

•

22 years ago

dougt: seawood actually authored the first version of these configure.in
changes... the lines you site in particular come from him.  still.. i have
changed his original version quite a bit, so i'll be sure to ask him to review
the version in this patch.

Bradley Baetz (:bbaetz)

Comment 99

•

22 years ago

*** Bug 148953 has been marked as a duplicate of this bug. ***

Adam Hauner

Comment 100

•

22 years ago

*** Bug 149018 has been marked as a duplicate of this bug. ***

Richard Hess

Comment 101

•

22 years ago

we've finally reproduced this problem on IRIX.  
It's broken in 1.0rc3 on IRIX.

:)

Philip K. Warren

Comment 102

•

22 years ago

I have tested the latest v2 patch on AIX in en_US and ja_JP and it works
properly. The only remaining problem we are seeing is that we are not linking
with -liconv in mozilla/xpcom/base so we are still having to use attachment
85604 [details] [diff] [review]. Perhaps we should be adding -liconv to another define besides XPCOM_LIBS?

Darin Fisher

Assignee

Comment 103

•

22 years ago

philip: thx again for the feedback... i'll look into getting the -liconv problem
solved.

Asa Dotzler [:asa]

Updated

•

22 years ago

Blocks: 1.1a

kill this account

Comment 104

•

22 years ago

Frank and I looked at attachment 86158 [details] [diff] [review].

Frank asked if the code that called iconv was tread safe or used in a thread
safe manner.

The design here is that if iconv is not available then we fallback
to mbtowc/mbrtowc. To help tell if this at least marginally functional
Frank asked if the configure.in test could check that at least the
mbtowc/wctomb be able to convert ascii values; eg: 'a' (0x41) is converted
to u0041 and not say uFF41.

+#if defined(HAVE_ICONV_WITH_CONST_INPUT)
+    res = iconv(converter, input, inputLeft, output, outputLeft);
+#else
+    res = iconv(converter, (char **) input, inputLeft, output, outputLeft);
+#endif

nit: would it be attractive for the cast to become a macro?

   res = iconv(converter, ICONV_INPUT_CAST input, inputLeft, output, outputLeft);

+            res = xp_iconv(gUTF8ToUnicode, (const char **) &p, &n, (char **)
output, &outLeft);

Frank ask if you could use NS_ConvertUTF8toUCS2() / NS_ConvertUCS2toUTF8() 
instead of iconv here?

+ * XXX XXX XXX not for public consumption XXX XXX XXX

Could this be something like:

+                              **** NOTICE ****
+  
+             *** THESE ARE NOT GENERAL PURPOSE CONVERTERS ***
+ 
+  NS_CopyNativeToUnicode /  NS_CopyUnicodeToNative should only be used
+  by XP_COM for converting *FILENAMES* between native and Unicode. They
+  are not designed or tested for general encoding converter use.

Could you also put a notice on the lines where these calls are used to
direct readers to this message? Perhaps

// NS_CopyNativeToUnicode is for XP_COM filename use only; see
nsNativeCharsetUtils.h

kill this account

Comment 105

•

22 years ago

+nsresult
+nsNativeCharsetConverter::UnicodeToNative(const PRUnichar **input,
...
+    if (gUnicodeToNative != INVALID_ICONV_T) {
...
+#if defined(ENABLE_UTF8_FALLBACK_SUPPORT)
+    else if ((gUnicodeToUTF8 != INVALID_ICONV_T) &&
+             (gUTF8ToNative != INVALID_ICONV_T)) {
...
+#endif
+
+    // fallback: truncate and hope for the best

Is there someway to add an assert if we are doing the fallback?

Darin Fisher

Assignee

Comment 106

•

22 years ago

> Frank asked if the code that called iconv was tread safe or used in a thread
> safe manner.

yes, access to the iconv converters is protected by a global lock.  see the
constructor/destructor for nsNativeCharsetConverter.


> Frank asked if the configure.in test could check that at least the
> mbtowc/wctomb be able to convert ascii values; eg: 'a' (0x41) is converted
> to u0041 and not say uFF41.

sure, that sounds like a good plan.


> nit: would it be attractive for the cast to become a macro?

yeah, i like that.


> Frank ask if you could use NS_ConvertUTF8toUCS2() / NS_ConvertUCS2toUTF8() 
> instead of iconv here?

i certainly could, but it would be less performant... or at least, there would
be one extra buffer copy.  on the other hand, there are some definite benefits
to using our UTF8 - UCS2 converters... the resulting code would be simpler, and
we would be using more of our code.


> Could this be something like:
> ...
> Could you also put a notice on the lines where these calls are used to
> direct readers to this message? Perhaps

sure

> Is there someway to add an assert if we are doing the fallback?

i'll output a debug warning... perhaps in GlobalInit.

Colin Blake

Comment 107

•

22 years ago

The patch appears to work on OpenVMS, but I can't really test it. I asked an HP
engineer in Japan to try andreproduce the problem, but he tells me that Japanese
file names are not really well supported, even in Japanese VMS, and that because
of this, any file names containing Japanese characters don't appear in file
lists. This is true of all/most applications running on Japanese VMS, not just
Mozilla. He says that this problem most certainly does NOT need fixing in 1.0
final, and so for this reason, I am NOT including the patch in the OpenVMS 1.0
final kit.

Darin Fisher

Assignee

Comment 108

•

22 years ago

Attached patch v3 patch (obsolete) — Details — Splinter Review

this patch incorporates suggestions from ftang and bstell, plus i also took a
stab at fixing the -liconv problem.  hoping this will be the final revision.

Attachment #86158 - Attachment is obsolete: true

Doug Turner (:dougt)

Comment 109

•

22 years ago

Comment on attachment 86528 [details] [diff] [review]
v3 patch

A minor tweak needed before landing:

 I am one of those early users of the gmake builds on windows.	The
xpcom/io/Makefile.in adds the nsNativeCharsetUtils.cpp	file to the general set
of cpp files.  Instead, nsNativeCharsetUtils.cpp  should only be added here:


ifeq ($(MOZ_WIDGET_TOOLKIT),windows)
CPPSRCS 	+= nsLocalFileWin.cpp
else
CPPSRCS 	+= nsLocalFileUnix.cpp nsNativeCharsetUtils.cpp 
endif # windows
#endif
endif # OS2

I have only tested this on the window gmake build.  Make sense?

Darin Fisher

Assignee

Comment 110

•

22 years ago

dougt: good point... actually, i'll probably just add empty implementations of
nsNativeCharsetUtils.cpp instead because eventually i would like these functions
to be available XP.

Jim Crumley

Comment 111

•

22 years ago

In trying to test the v3 patch on Sparc Solaris 7, I think I may have found a
problem with the configure tests. I get the following error during configure:
configure:8011: c++ -o conftest  -pthreads   -I/usr/openwin/include  conftest.C
-liconv  -lsocket -ldl -lm  1>&5
ld: fatal: library -liconv: not found
ld: fatal: File processing errors. No output written to conftest
collect2: ld returned 1 exit status

The message makes no sense, since I have libiconv in /usr/lib, which is in  
LD_LIBRARY_PATH. I have gotten the test program to build outside of configure.
Does configure change the library path?

Anyway, I forced the iconv tests to "yes" in configure and I am trying to build now.

kill this account

Comment 112

•

22 years ago

I noticed a couple of items on reading this:

+        res = xp_iconv(gNativeToUnicode, input, &inLeft, (char **) output, ...
+        if (res != (size_t) -1) {
         ...
+            return NS_OK;
+        }
         ...
+        // reset converter
+        xp_iconv(gNativeToUnicode, NULL, NULL, NULL, NULL);

Can we always reset the converter after using it (even if t was successful). 
That way it could not carry state from conversion to conversion.

+    res = iconv(converter, ICONV_INPUT(input), inputLeft, output, outputLeft);
+    if (res == (size_t) -1) {
+        // on some platforms (e.g., linux) iconv will fail with
+        // E2BIG if it cannot convert _all_ of its input.  we're
+        // happy if it just converts some of its input.
+        if ((errno == E2BIG) && (*outputLeft < outputAvail))
+            res = 0;
+    }

Could you expound on the overall logic of returning a success when the conversion
failed? I do not understand the context where this would be attractive.

kill this account

Updated

•

22 years ago

Attachment #86528 - Flags: review+

kill this account

Comment 113

•

22 years ago

Comment on attachment 86528 [details] [diff] [review]
v3 patch

Darin: would you add to this bug the explaination you gave me over the phone
and pls always reset the converters.

subject to those: r=bstell@ix.netcom.com

Brendan Eich [:brendan]

Comment 114

•

22 years ago

Comment on attachment 86528 [details] [diff] [review]
v3 patch

I think I talked darin into taking a bit of time to

(1) invert loop/inner-if order to avoid mispredicted branches in conversion
loops;

(2) do A-B tests on Linux to see which solution performs best.

I'm concerned about the global converter shared with PRLock here -- there's no
space reason to avoid a converter per thread, and we could use NSPR's TLS to
avoid the locking overhead (although TLS isn't free to access, once allocated).
 Also, we've been using wctombs et al. on Linux for several milestones; however
good iconv may be, changing now (for 1.0.1, not for 1.1alpha) scares me.  I
favor a conservative approach, even though the existing wctombs-calling code is
being disturbed.

dbaron mentioned using a sink for best string-fu, rather than the nested loop
pattern.

I think drivers should let darin take a day here and do some measurements, and
produce a patch that we can all be happy with for trunk and branch.

/be

Kyle Yuan

Comment 115

•

22 years ago

got errors when compiled on Solaris 9/Forte 6U2

ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: warning: file 9DdvNvaiuZQbBAq2FhO6.o: attempted multiple inclusion of file
ld: warning: file 7QUf-O3hji--ubaU86Sa.o: attempted multiple inclusion of file
ld: warning: file 4Ulp-2UqQkvl-mPfhR9-.o: attempted multiple inclusion of file
ld: warning: file 7QUf-O3hji--ubaU86Sa.o: attempted multiple inclusion of file
ld: warning: file 4Ulp-2UqQkvl-mPfhR9-.o: attempted multiple inclusion of file
ld: warning: file 9DdvNvaiuZQbBAq2FhO6.o: attempted multiple inclusion of file
ld: warning: file 1QpGKnD8s9L5Yympgh7i.o: attempted multiple inclusion of file
ld: warning: file t6U4GxapKqSiiBmX2i3U.o: attempted multiple inclusion of file
ld: warning: file E2n7-uidwqr1S0TOALG4.o: attempted multiple inclusion of file
ld: fatal: file @LIBICONV@: open failed: No such file or directory
ld: fatal: File processing errors. No output written to libxpcom.so
gmake[3]: *** [libxpcom.so] Error 1
gmake[3]: Leaving directory `/export/home/work/mozilla_4/mozilla/xpcom/build'
gmake[2]: *** [libs] Error 2
gmake[2]: Leaving directory `/export/home/work/mozilla_4/mozilla/xpcom'
gmake[1]: *** [tier_2] Error 2
gmake[1]: Leaving directory `/export/home/work/mozilla_4/mozilla'
gmake: *** [default] Error 2

Kyle Yuan

Comment 116

•

22 years ago

I used current trunk source and applied v3 patch.

Daniel (Leaf) Nunes

Comment 117

•

22 years ago

Kyle, once you apply the patch, you must regenerate configure using autoconf;
the process is relatively simple if you have autoconf locally:
from mozilla/
autoconf -l build/autoconf

(this will make configure from configure.in).

Autoconf 2.12 (you can't use 2.5x, sorry) can be found at
http://www.gnu.org/software/autoconf/ 
but it also requires GNU M4, 
http://www.seindal.dk/rene/gnu/

Normally this is automatically updated (on the trunk at least) from configure.in.

Daniel (Leaf) Nunes

Comment 118

•

22 years ago

*or* darin could attach his generated configure script :)

Jay Yan

Comment 119

•

22 years ago

yes, it is beeter that the new configure script is attatched.

Daniel (Leaf) Nunes

Comment 120

•

22 years ago

Attached patch configure file to match the configure.in patch in v3 — Details — Splinter Review

Kyle Yuan

Comment 121

•

22 years ago

thanks leaf. now I can compile xpcom successfully. but the whole building 
progress is not finished yet. I'll post the testing result ASAP.

Kyle Yuan

Comment 122

•

22 years ago

tested v3 patch, it works for me (Solaris 9, CDE, Simplified Chinese)

Frank Liu

Comment 123

•

22 years ago

ported gnu libiconv and tested v3 patch on the qnx6.2, it doesn't work.
same error as without the patch. "ldd mozilla-bin" does show  /usr/lib/libiconv.so.2

and here is part of the ./configure output:
...
checking for gnu_get_libc_version()... no
checking for iconv in -liconv... no
checking for libiconv in -liconv... yes
checking for iconv()... yes
checking for iconv() with const input... yes
checking whether va_list assignments need array notation... no
...

Frank

Darin Fisher

Assignee

Comment 124

•

22 years ago

so, i ran the startup performance tests, comparing the old wcrtomb method to
iconv, and it looks like iconv is slightly slower... about 0.3% to be exact. 
this is probably due to the overhead of the PR_Lock as brendan suggested.  so,
that pretty much tells us that we should prefer wcrtomb/mbrtowc on linux since
we know that wchar_t is unicode under linux.

Frank: as for the problems with QNX, did you observe any debug warning messages
on the console having to do with conversion failures?  what exactly didn't work?
 thx!

Frank Liu

Comment 125

•

22 years ago

"didn't work" means when I "File | Open File" to open a local file with filename
in Chinese, it pops up a window with an error "File xxxx doesn't exist".
There is no error on the console. Actually, I don';t see any messages on the
console since the RC releases. Is there an option that I need to use to turn off
those debug messages? -Frank

Darin Fisher

Assignee

Comment 126

•

22 years ago

Frank: sounds like you are building an optimized build.  if you build debug,
you'll see some status/warning messages if something is going wrong.  also, were
the chinese file names showing up correctly in the file picker??  is the only
problem that you cannot open a file with a chinese name?

Darin Fisher

Assignee

Comment 127

•

22 years ago

Attached patch v4 patch — Details — Splinter Review

revised per last comments from bstell and brendan.

Attachment #86528 - Attachment is obsolete: true

Darin Fisher

Assignee

Comment 128

•

22 years ago

Comment on attachment 86906 [details] [diff] [review]
v4 patch

forwarding r=bstell, sr=brendan based on previous comments.

Attachment #86906 - Flags: superreview+

Attachment #86906 - Flags: review+

Frank Liu

Comment 129

•

22 years ago

Yes, I am building optimized. I will make a debug build with 
your new v4 patch.
Yes, the filenames show up correctly in the file picker, but
I cannot open it. It pops up an error windows saying "File xxxx
doesn't exist".
BTW, even without your patch, filename shows up correctly but
I can't open it. It seems your patch doesn't make any difference.
Again, in RC2, filename shows up correctly and I can open it fine.

Darin Fisher

Assignee

Comment 130

•

22 years ago

Frank: it sounds like the conversion from unicode to the native charset must be
failing on QNX where as the conversion from native charset to unicode seems to
be OK.  hopefully your debug build will indicate the problem.

Frank Liu

Comment 131

•

22 years ago

Attached file console log (start mozilla, try to open a local file with chinese filename, exit mozilla) — Details

those messages may mean something?
WARNING: mbtowc failed: possible charset mismatch, file
nsNativeCharsetUtils.cpp, line 579
full console.log attached.

Darin Fisher

Assignee

Comment 132

•

22 years ago

yup, that warning indicates that iconv is not being used... can you upload your
config-defs.h to this bug report... i suspect HAVE_NL_TYPES_H or
HAVE_NL_LANGINFO are not being defined (in which case we are unable to infer the
native charset of your system).

Frank Liu

Comment 133

•

22 years ago

Attached file config-defs.h — Details

I don't see HAVE_NL_TYPES_H or HAVE_NL_LANGINFO in the config-defs.h
See attachment.

Darin Fisher

Assignee

Comment 134

•

22 years ago

Frank: hmm.. we need nl_langinfo to determine the native charset of the system.
 without that we can't call iconv_open.  does QNX provide some other way to
determine the native filesystem charset?

Asa Dotzler [:asa]

Comment 135

•

22 years ago

Comment on attachment 86906 [details] [diff] [review]
v4 patch

a=asa (on behalf of drivers) for checkin to the 1.1 trunk.

Attachment #86906 - Flags: approval+

Frank Liu

Comment 136

•

22 years ago

I asked around but there isn't such function. People usually just hard code
UTF8 or ISO-88??-1.
I understand this is probably not a good solution. but between something that
"may work" and "not work at all", what will you choose?

Darin Fisher

Assignee

Comment 137

•

22 years ago

if mbtowc or wctomb fail, then we assume the multibyte encoding is ISO-8859-1. 
that should be the case corresponding to the console log uploaded by Frank.  so,
assuming the native charset is ISO-8859-1 and then using iconv wouldn't get us
anywhere.  perhaps UTF-8 is correct for QNX but i don't know.  someone would
have to hack the patch and test it out.

Darin Fisher

Assignee

Comment 138

•

22 years ago

fixed-on-trunk

marking FIXED to get QA's attention.

Status: ASSIGNED → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Frank Liu

Comment 139

•

22 years ago

Is there a quick way (say, through some environment variable) to change
mozilla's default assumption from "ISO-8859-1" to "UTF-8" to test it out on QNX?
I tried "View | Character Coding | .. " to change it, but it doesn't seem
to affect the File Picker ...

Darin Fisher

Assignee

Comment 140

•

22 years ago

no there isn't any thing like that... you'd have to hack the source yourself.  i
can give you pointers if you want to go that route.

Yuying Long

Comment 141

•

22 years ago

Katakai-san: Could you please verified it on Sun solaris machine? cause I only
have RH7.2 here which doesn't has this problem.

kill this account

Comment 142

•

22 years ago

http://bugzilla.mozilla.org/show_bug.cgi?id=147333#c113
> Darin: ... and pls always reset the converters.

http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsNativeCharsetUtils.cpp#345
345 nsNativeCharsetConverter::NativeToUnicode(const char **input,
346                                           PRUint32    *inputLeft,
347                                           PRUnichar  **output,
348                                           PRUint32    *outputLeft)
349 {
350     size_t res = 0;
351     size_t inLeft = (size_t) *inputLeft;
352     size_t outLeft = (size_t) *outputLeft * 2;
353 
354     if (gNativeToUnicode != INVALID_ICONV_T) {
355 
356         res = xp_iconv(gNativeToUnicode, input, &inLeft, (char **) output, 
&outLeft);
357 
358         if (res != (size_t) -1) {
359             *inputLeft = inLeft;
360             *outputLeft = outLeft / 2;
361             return NS_OK;
362         }
363 
364         NS_WARNING("conversion from native to ucs-2 failed");
365 
366         // reset converter
367         xp_iconv(gNativeToUnicode, NULL, NULL, NULL, NULL);

Would it be possible to always reset the converters?

Shanjian Li

Comment 143

•

22 years ago

answer to #139
Edit->Preference->Navigator->Language

In bottom right of the dialog box, you can change the default charset.

Darin Fisher

Assignee

Comment 144

•

22 years ago

bstell: if you look at the destructor for nsNativeCharsetConverter you will see
that we always reset the converter.  in fact, i probably needn't bother reseting
it in the place you quoted anymore.

Masaki Katakai

Reporter

Comment 145

•

22 years ago

I have done verification on Solaris 8 ja locales on SPARC/Inetl.
It's working fine. I'll try more testing other OS version and
locales tomorrow.

However, it seems that download manager always dumps core
here when I try to save contents to japanese filename. When
I disable download manager, the problem does not happen.
(I found similar bug but filed as new one - bug 150832)

kill this account

Comment 146

•

22 years ago

> if you look at the destructor for nsNativeCharsetConverter you will see
> that we always reset the converter.

This would imply that NativeToUnicode and UnicodeToNative can only be called 
once between construction and destruction.

If so, can we put in an assert to detect multiple uses and warn developers?

Darin Fisher

Assignee

Comment 147

•

22 years ago

brian: 

they are designed to be called multiple times between construction and
destruction.  the input is potentially multi-fragment, which means that it might
be important to save the state of the converter when moving to the next
fragment.  with the current patch, this is permitted.  if i were to reset the
converter after working on each fragment then i'd potentially lose such state. 
so, that can't be the right thing to do.  moreover, even if the input is
single-fragment, we might still call these routines (NativeToUnicode,
UnicodeToNative) more than once if the input size is greater than the maximum
output buffer size.

hacker formerly known as seawood@netscape.com

Comment 148

•

22 years ago

The solaris tinderboxes on the Ports pages have been orange for the past day. 
It looks like someone reset the tinderbox yesterday so I'm not absolutely sure
that it's this checkin but it definitely needs to be looked into as the first
orange cycled occurred during thattimeframe.

Darin Fisher

Assignee

Comment 149

•

22 years ago

seawood: yeah, i noticed that too... unfortunately, the error message isn't all
that helpful.  i should probably login and try running the build manually to see
what's going on.

kill this account

Comment 150

•

22 years ago

okay, this sound correct. Thanks Darin.

Masaki Katakai

Reporter

Comment 151

•

22 years ago

Jim Crumley,

Have you succeeded build on Solaris 7?

I understand there should not be any libiconv under /usr/lib/
on Solaris system. What is your libiconv.so?
Can you check which pkg does contain our libiconv.so?

% grep libiconv.so /var/sadm/install/contents 
/usr/lib/libiconv.so f none 0755 bin bin 23348 20350 941709164 SUNWhuccd
% pkginfo -x SUNWhuccd
SUNWhuccd       Traditional Chinese User Based Chinese Console Display Package
                (sparc) 8.0,REV=1999.11.04.10.14

However, one exception exists on Solaris 8. We have
shipped libiconv.so as internal use in Solaris 8 for
chinese locales. This is not valid libiconv.so. When
we use Solaris 8 as build system, we have to remove
SUNWhuccd package.

# pkgrm SUNWhuccd

There is no regression if we have the libiconv.so but
the binary built with the libiconv will not run on
Solaris 9 because there is no libiconv.so in Solaris 9.

Jim Crumley

Comment 152

•

22 years ago

Yes, I have gotten the patch to build and it seems to be working.

My libiconv on Solaris 7 is from the package that you mention - SUNWhuccd      
Traditional Chinese User Based Chinese Console Display Package (sparc)
7.0,REV=1.0.18

Masaki Katakai

Reporter

Comment 153

•

22 years ago

Verified the fix on Solaris 8 and 9.

However, I'm seeing Solaris iconv bug. To reset the
iconv state iconv(conv, NULL, NULL, NULL, NULL) causes
core dump only when PCK(SJIS) -> UTF-8 converter.
I have reported this problem to our OS engineers.

Status: RESOLVED → VERIFIED

Roland Mainz

Comment 154

•

22 years ago

Reopening... all Solaris tinderbox machines are in ORANGE status; regxpcom
crashes... and my own box (Solaris 2.7/SPARC) cannot run Mozilla either... ;-(

Steps to reproducte:
1. % rm component.reg
2. % ./mozilla -g -d dbx
3. (dbx) run
dbx/mozilla output looks like this:
-- snip --
(dbx) run
Running: mozilla-bin 
(process id 28880)
Reading libdemangle.so.1
File descriptors set to 512
Reading en_US.so.2
Type Manifest File:
/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/dist/bin/components/xpti.dat
nsNativeComponentLoader: autoregistering begins.
Reading ISO8859-1%UTF-8.so
Reading UTF-8%UCS-2.so
Reading UCS-2%UTF-8.so
Reading UTF-8%ISO8859-1.so
t@1 (l@1) signal SEGV (no mapping at the fault address) in __icv_iconv at
0xfdb80ad0
0xfdb80ad0: __icv_iconv+0x0074: ld      [%i3], %o0
dbx: internal warning: Typeid already exists with different name:__1nJnsAString_
or symclass:struct, ignore current
stab:./libxpcom.so:../../../../../../../home/mozilla/src/2002-06-12-08-trunk/mozilla/xpcom/io/nsNativeCharsetUtils.cpp
stab #137 __1nJnsAString_:U(0,115)
Current function is xp_iconv
  115       res = iconv(converter, ICONV_INPUT(input), inputLeft, output,
outputLeft);
(dbx) where
current thread: t@1
  [1] __icv_iconv(0x7b6a8, 0x0, 0x0, 0x0, 0x1, 0xfdb80bd0), at 0xfdb80ad0
  [2] _iconv(0x93ea0, 0x0, 0x0, 0x0, 0x0, 0x353), at 0xfe4c48ec
=>[3] xp_iconv(converter = 0x93ea0, input = (nil), inputLeft = (nil), output =
(nil), outputLeft = (nil)), line 115 in "nsNativeCharsetUtils.cpp"
  [4] nsNativeCharsetConverter::~nsNativeCharsetConverter(this = 0xffbeec8b),
line 333 in "nsNativeCharsetUtils.cpp"
  [5] NS_CopyNativeToUnicode(input = CLASS, output = CLASS), line 686 in
"nsNativeCharsetUtils.cpp"
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsLocalFileUnix.o"
dbx: warning: see `help finding-files'
  [6] nsLocalFile::GetLeafName(0xc6d78, 0xffbeee78, 0xf523c, 0xff0d6ce8,
0x80000000, 0xc6d78), at 0xff0d6d30
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsNativeComponentLoader.o"
  [7] nsNativeComponentLoader::AutoRegisterComponent(0x9f330, 0xffbeef10, 0x1,
0xffbeef20, 0xff0ee280, 0xff1e3484), at 0xff0ecc90
  [8] nsNativeComponentLoader::RegisterComponentsInDir(0x9f330, 0x0, 0xc4f18,
0x0, 0xffbef028, 0xffbef030), at 0xff0eb9f8
  [9] nsNativeComponentLoader::AutoRegisterComponents(0x9f330, 0x0, 0x121468,
0x2800, 0xff1ad830, 0xfe53a05c), at 0xff0eb834
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsComponentManager.o"
  [10] nsComponentManagerImpl::AutoRegisterImpl(0x9a1e0, 0x0, 0xffbef158,
0xff1f0f80, 0xff0eb7dc, 0x74cb4), at 0xff0e4438
  [11] nsComponentManagerImpl::AutoRegister(0x9a1e0, 0x0, 0x0, 0xff1e627c,
0xff1e3484, 0x2800), at 0xff0e4218
dbx: warning: can't find file
"/shared/bigtmp2/mozilla/2002-06-12-08-trunk/objdir_ws7_gtk/xpcom/build/nsComponentManagerObsolete.o"
  [12] nsComponentManager::AutoRegister(0x0, 0x0, 0x80000000, 0x80000000,
0xffbef22c, 0x0), at 0xff0eaa18
  [13] main1(argc = ???, argv = ???, nativeApp = ???) (optimized), at 0x195bc
(line ~1304) in "nsAppRunner.cpp"
  [14] main(argc = ???, argv = ???) (optimized), at 0x1a780 (line ~1805) in
"nsAppRunner.cpp"
(dbx) print converter, input, inputLeft, output, outputLeft
converter = 0x93ea0
input = (nil)
inputLeft = (nil)
output = (nil)
outputLeft = (nil)
-- snip --

I guess this is the problem described in comment #153 ... ;-(

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Roland Mainz

Comment 155

•

22 years ago

Taking myself...

Assignee: darin → Roland.Mainz

Status: REOPENED → NEW

Roland Mainz

Updated

•

22 years ago

Status: NEW → ASSIGNED

Roland Mainz

Comment 156

•

22 years ago

Attached patch Patch for 2002-06-12-08-trunk to avoid |iconv(conv, NULL, NULL, NULL, NULL)| (obsolete) — Details — Splinter Review

Roland Mainz

Comment 157

•

22 years ago

Attached patch Better patch for 2002-06-12-08-trunk per cls's comments (obsolete) — Details — Splinter Review

Attachment #87486 - Attachment is obsolete: true

Roland Mainz

Comment 158

•

22 years ago

Reuqesting r=/sr= ...

Masaki Katakai

Reporter

Comment 159

•

22 years ago

I understand iconv(conv, NULL,*,*,*) needs to be
called to reset the conv state. I'm thinking the
Solaris bug happens when iconv(conv, NULL,NULL,NULL,NULL)
is called, which means when 3rd and 4th arguments are not NULL,
this bug does not happen. I'll check.

Masaki Katakai

Reporter

Comment 160

•

22 years ago

Sorry, 4th and 5th arguments should not be NULL.

The following codes can work in my environment even
when input and inputLeft = NULL.

    char        output[BUFSIZ];
    size_t      outputLeft=BUFSIZ;
    iconv(converter, input, inputLeft, &output, &outputLeft);

Roland Mainz

Comment 161

•

22 years ago

Masaki Katakai wrote:
> Sorry, 4th and 5th arguments should not be NULL.

Mhhh, can you file a bug against the iconv(3) manual page, please ? Such a
comment in the NOTES would be nice...

Masaki Katakai

Reporter

Comment 162

•

22 years ago

> Mhhh, can you file a bug against the iconv(3) manual page, please ? Such a
> comment in the NOTES would be nice...

It's not a spec. This is the bug of Solaris iconv().

Roland, could you wait? I'm now asking our OS engineer
to verify that can reset the state correctly.

Roland Mainz

Comment 163

•

22 years ago

Comment on attachment 87488 [details] [diff] [review]
Better patch for 2002-06-12-08-trunk per cls's comments

Ouch! Now I see what's going wrong... you want to reset the converter (e.g.
legal usage of |xp_iconv()|) ...

Attachment #87488 - Flags: needs-work+

Roland Mainz

Comment 164

•

22 years ago

Attached patch Patch for 2002-06-12-08-trunk — Details — Splinter Review

I moved all reset-converter xp_iconv()-calls to a new function
(|xp_iconv_reset()|) to avoid that we have to touch tons of lines in the
future...

Attachment #87488 - Attachment is obsolete: true

Darin Fisher

Assignee

Comment 165

•

22 years ago

Roland: latest patch looks good to me.  if you don't mind i'd like to keep this
bug assigned to myself since i still have to land the original patch (plus any
fixups) on the branch.

reassigning back to me.

Assignee: Roland.Mainz → darin

Status: ASSIGNED → NEW

Darin Fisher

Assignee

Comment 166

•

22 years ago

Comment on attachment 87495 [details] [diff] [review]
Patch for 2002-06-12-08-trunk

looks good to me.. sr=darin

Attachment #87495 - Flags: superreview+

Darin Fisher

Assignee

Comment 167

•

22 years ago

*** Bug 151523 has been marked as a duplicate of this bug. ***

Roland Mainz

Comment 168

•

22 years ago

Darin Fisher wrote:
> Roland: latest patch looks good to me.  if you don't mind i'd like to keep 
> this bug assigned to myself 

NP... :)

> since i still have to land the original patch (plus any fixups) on the branch.

I thought the original patch is already in the 1.0.1-branch...

... let's wait if katakai likes the patch or not... :)

Alec Flett

Comment 169

•

22 years ago

Comment on attachment 87495 [details] [diff] [review]
Patch for 2002-06-12-08-trunk

r=alecf

Attachment #87495 - Flags: review+

Roland Mainz

Comment 170

•

22 years ago

1. Can we please wait for katakai's response before checking the patch "in" ?
2. IMHO it may be wise to create a test tool (e.g. "TestNativeCharsetUtils") -
can anyone file a RFE for that, please ? :)

Darin Fisher

Assignee

Comment 171

•

22 years ago

well... hmm... why don't we just check it in since it should fix the tinderbox
orange.  if katakai comes back and has some suggestions on how to further fixup
the iconv code, we can make another revision.  given that we'll need this patch
(or something very similar) on the branch, i think it's worth while to expedite
things a bit.

Alec Flett

Comment 172

•

22 years ago

in addition, this patch just makes the existing code easier to fix, no matter
how we end up fixing it.

Paul Pietromonaco

Comment 173

•

22 years ago

I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
older Solarii (either Intel or sparc - doesn't matter) I get Segmentation faults.  

Here's one of the scenarios:
I make a Mozilla 1.0 release build with v4 patch applied on Solaris 2.6 sparc. 
It compiles correctly.  When I run this build on my Solaris 8 multi-language
system, it works correctly, and when the locale is set to ja, I can load the
double-byte files correctly.  HOWEVER - when I run a fresh installation of this
build on my Solaris 2.6 system (that was actually used to build it), it
Segmentation Faults before I can even get the browser loaded.  A build done on
the same machine - without the patch - works just fine.  

I have a theory - my Solaris 2.6 box doesn't have Japanese locales - only a few
European locales.  The same situation is happening with my Solaris Intel boxes.
    It builds on 7, but only runs on 8.  The 7 is Euro only - the 8 is
multi-language.

Roland Mainz

Comment 174

•

22 years ago

Paul Pietromonaco wrote:
> I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
> older Solarii (either Intel or sparc - doesn't matter) I get Segmentation 
> faults.

Please try attachment 87495 [details] [diff] [review] - that should fix your problem.

Darin Fisher

Assignee

Comment 175

•

22 years ago

if that patch doesn't do the trick, then can you perhaps supply a stack trace
corresponding to the crash?

Masaki Katakai

Reporter

Comment 176

•

22 years ago

The patch is good. Thanks!!

One info:
I've talked with OS engineer who is iconv expert.
iconv(conv, NULL, ...) means two things,

 1) reset the state of converter
 2) put sequence into outputs if necessary if output
   encoding is stateful.

2) For example, output is iso-2022-jp, iconv() does not
put escape sequence at the end of outputs at normal conversion.
At resetting (called by iconv(conv, NULL, ...)), escape
sequence is put into output if necessary e.g. end of kanji
character. So, strictly speaking, we need to look into
outputs even at iconv(conv, NULL, ...). However, I understand
there is no stateful encoding such as iso-2022-jp as OS
platform locale. So the patch is OK.

Darin Fisher

Assignee

Comment 177

•

22 years ago

marking FIXED

Status: NEW → RESOLVED

Closed: 22 years ago → 22 years ago

Resolution: --- → FIXED

Paul Pietromonaco

Comment 178

•

22 years ago

Roland Mainz wrote:
>
>Paul Pietromonaco wrote:
>> I'm having some problems here.  When I try to run the patched Mozilla 1.0 on
>> older Solarii (either Intel or sparc - doesn't matter) I get Segmentation 
>> faults.
>
>Please try attachment 87495 [details] [diff] [review] - that should fix your problem.

That indeed fixed my problem.  Thanks for the prompt reply! (^_^)

Noah Friedman

Comment 179

•

22 years ago

attachment 87495 [details] [diff] [review] fixes the same crash in mozilla 1.1a on solaris 7 for me too.
Thanks!

Paul Pietromonaco

Comment 180

•

22 years ago

Hi everyone,

Sorry to pester you - I'm very new to using Bugzilla - but I have a question. 
I've found a pretty severe bug that corrupts your User profile that I believe is
directly related to the patch.  However - it's pretty obscure, and the steps to
reproduce are not common. 

I'm thinking it's probably better to spawn a new bug and let this one stay
Resolved Fixed. However, I'm not sure if that's the right thing to do.  Can
someone let me know, either via this bug, or privately, what action you'd like
me to take?

Thanks in advance for your time,
Paul

Darin Fisher

Assignee

Comment 181

•

22 years ago

paul: it'd probably be best to file a separate bug in this case.  please add a
comment to this bug indicating the bug number so others interested can follow
the bug report.  thx!

Darin Fisher

Assignee

Comment 182

•

22 years ago

can someone please verify that this patch has done the trick?  (i.e., mark it
verified.)  that way we'll be able to land this on the 1.0 branch for mozilla
1.0.1.  thx!!

Judson Valeski

Comment 183

•

22 years ago

please checkin to the 1.0.1 branch. once there, remove the "mozilla1.0.1+"
keyword and add the "fixed1.0.1" keyword.

Keywords: mozilla1.0.1 → mozilla1.0.1+

Darin Fisher

Assignee

Updated

•

22 years ago

Keywords: adt1.0.1

Jaime Rodriguez, Jr.

Comment 184

•

22 years ago

ylong - can you get this one verified asap? thanks!

Blocks: 143047

Whiteboard: [adt1 RTM] → [adt1 RTM] [ETA 06/21]

Yuying Long

Comment 185

•

22 years ago

I never able to reproduce this bug on my linux RH7.2, so I'm afraid I'm not the
right person to verify it.  Since the bug was filed for Sun Sloaris. 
Katakai-san, or some one else who had this problem, could you please help us to
verify this? thanks a lot!

QA Contact: ylong → katakai

Masaki Katakai

Reporter

Comment 186

•

22 years ago

Sure. I'll try.

Frank Tang

Comment 187

•

22 years ago

add kevin.zhou@sun.com in the cc list
ylong- the problem cannot be reproduce on linux. It can be reproduce on Solaris
or other unix. However, since the patch is big and touch the code linux will be
use and we want to make suer the patch fix the solair problem without making
Linux worst, we need your help to make srue the current trunk with the patch do
not have additional problems on Linux. While you cannot say "I test it and it
fix the problem", it is possible for you to say "I test it and it do not
introduce problem on Linux but I cannot say anything about Solaris or other unix"

We can ask kevin.zhou@sun.com to allocate QA on Solaris testing. Kataki - can
you drive that? kevin is locate in china. kevin- can you verify the trunk build
that cannot reproduce the problem after the fix. if you can verify that, then we
can ask for landign of branch.

Yuying Long

Comment 188

•

22 years ago

> ylong- ... However, since the patch is big and touch the code linux 
> will be use and we want to make suer the patch fix the solair problem 
> without making Linux worst, we need your help to make srue the current > trunk
with the patch do not have additional problems on Linux. ...

Yes, when I said "I nerver able to reproduce this problem" meant before and
after the checked-in.  But for verify this particular bug, we need some one had
this problem before.

I'll definitely add comments(or maybe file a seperate bug also) here in case I
found any linux regression problem.

Darin Fisher

Assignee

Comment 189

•

22 years ago

marking VERIFIED per previous few comments.

Status: RESOLVED → VERIFIED

Paul Wyskoczka

Comment 190

•

22 years ago

Adding adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch.  Please get
drivers approval before checking in. When you check this into the branch, please
change the mozilla1.0.1+ keyword to fixed1.0.1

Keywords: adt1.0.1 → adt1.0.1+

Whiteboard: [adt1 RTM] [ETA 06/21] → [adt3 RTM] [ETA 06/21]

Frank Tang

Comment 191

•

22 years ago

ylong- you try seveal differnt locale , right?

Kyle Yuan

Comment 192

•

22 years ago

verified on Solaris 9, Chinese locale. trunk build.

Jay Yan

Comment 193

•

22 years ago

Still does not work in my machine.
I am using trunk of:

       Mozilla 1.1a+
       Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.1a+) Gecko/20020620

My env is:

       Sun Microsystems Inc.   SunOS 5.8       Generic February 2000
       $ locale
       LANG=zh
       LC_CTYPE="zh"
       LC_NUMERIC="zh"
       LC_TIME="zh"
       LC_COLLATE="zh"
       LC_MONETARY="zh"
       LC_MESSAGES="zh"
       LC_ALL=
       $ 

Anto: yours?

Masaki Katakai

Reporter

Comment 194

•

22 years ago

Jay, which build id is displayed on browser title?
I'm using 2002061922 of mozilla-sparc-sun-solaris2.6.tar.gz.
It works fine fo me in ja/ja_JP.PCK(SJIS). Also I've tried in zh
locale and I don't see any problem.

Could you try truss and check correct iconv.so is being picked up?
In my Solaris 8, it seems that correct modules are picked and
zh filename can be displayed on filepicker.

9860:   open("/usr/lib/iconv/gb2312%UTF-8.so", O_RDONLY) = 9
9860:   open("/usr/lib/iconv/UTF-8%gb2312.so", O_RDONLY) = 9

Jay Yan

Comment 195

•

22 years ago

Katakai:
I am using debuging build. that means, I checkout trunk code, and build using
Forte u2, so the Build ID: is 00000000, the date of the code is 6/20.

What is the difference if I download releasing build from mozilla?

Ervin Yan

Comment 196

•

22 years ago

I verify in Simplified Chinese/Traditional Chinese/Korean and UTF-8 locales on
Solaris 9:

The mozilla trunk 2002061922 works fine.

but in ko.UTF-8 locale, file name seems not OK.

Antonio.Xu

Comment 197

•

22 years ago

I had verified this bug on solaris8 with mozilla-sparc-sun-solaris2.6.tar.gz.  
It is seems ok.  I have tested  GBK&ZH locale on Solaris8, all chinese fonts 
been showed correctly. I found another problem is the chinese file name can not 
been showed correctly in url bar, but it is not due to this bug.

Masaki Katakai

Reporter

Comment 198

•

22 years ago

We, Sun China and Japan team are now testing 2002061922 trunk
in all possible locales (including CJK,Thai,Russian and Euro...)
on Solaris 8 and Solaris 9.

We've completed the verification of the locales *except* ko.UTF-8
locale of Solaris 9. The results are OK for now.

We'll let you know the result of ko.UTF-8 later.

Yuying Long

Comment 199

•

22 years ago

In comment #191:
> ylong- you try seveal differnt locale , right?
Yes, I tried on my linux RH7.2: 
en_US, C, ja_JP.eucJP, zh_CN, zh_TW.Big5, ko_KR.eucKR.
Some locales(like ja_JP.SJIS...etc.) I can not try cause they falled back to C.

Frank Tang

Comment 200

•

22 years ago

It looks pretty good from the test result Sun Japan and Sun China did . Cool we
really have a global team :) darin, since we got all approval already and none
of the testing result show red flag, let's land it asap :) :) :) thank you very
very very very much :) Great job.

Darin Fisher

Assignee

Comment 201

•

22 years ago

actually, there could be problems with UTF-8.  i think we'd try to convert from
UTF-8 to UTF-8 to UCS-2.  instead of directly from UTF-8 to UCS-2, and vice
versa.  that might be causing problems with UTF-8, so i'm very anxious to hear
back from folks about the UTF-8 verfication results.  thx!!

Darin Fisher

Assignee

Comment 202

•

22 years ago

fixed1.0.1

Keywords: mozilla1.0.1+ → fixed1.0.1

Frank Tang

Updated

•

22 years ago

Blocks: 146292
No longer blocks: 141008

Jürgen Keil

Updated

•

22 years ago

Blocks: 153562

Jürgen Keil

Comment 203

•

22 years ago

It seems this change has caused a regression on solaris 8 (x86 & sparc).

The user profile directory in the "~/.mozilla/appreg" file gets truncated by
one character, in case the browser crashes.  This problem didn't exist in the
mozilla 1.0 release.


See also [Bug 153562] crash trashes user directory, requires new profile

Comment #180 for this bug also seems to hint at this profile problem (although
I'm not 100% sure, since paul didn't mention details).




Problem is that the solaris iconv conversion module adds a 'byte order mark'
BOM unicode charater (uFEFF) in front of the first character of the first
converted UCS-2 unicode string.  This breaks the assumption made in 
NS_CopyNativeToUnicode(), that the UCS-2 output string never contains more
unicode characters then the native input string.

Question is: should bug 147333 be reopened,  or shall we track the profile
problem for solaris on bug 153562?

bobj

Comment 204

•

22 years ago

> Problem is that the solaris iconv conversion module adds a 'byte order mark'
> BOM unicode charater (uFEFF) in front of the first character of the first
> converted UCS-2 unicode string.

Is this the expected iconv() behavior?

BOMs are meant to be used as the first character of a Unicode file or where
specified in a protocol.  It seems odd that iconv() would prepend BOMs to
strings since it has no idea how or where that string will be used.

What do other iconv() implementations do?

Jürgen Keil

Comment 205

•

22 years ago

Attached file Small 'iconv' test program, to check ISO8859-1 -> UCS-2 conversion on solaris — Details

> BOMs are meant to be used as the first character of a Unicode file or where
> specified in a protocol.  It seems odd that iconv() would prepend BOMs to
> strings since it has no idea how or where that string will be used.

Note that the solaris iconv() does not prepend BOMs for every converted string;

the BOM seems to be prepended *exactly* *once* during the first iconv() call
after an iconv_open of any conversion that outputs UCS-2.

Note also, that solaris 8 can convert into UCS-2, UCS-2LE and UCS-2BE.	The
BOM is only inserted when we convert into UCS-2.  On Solaris 8 SPARC, iconv ->
UCS-2 produces big endian unicode characters while on Solaris 8 Intel it
produces little ending unicode characters.

With UCS-2LE or UCS-2BE the BOMs are not inserted.


See also the attached small test program.  It converts the five character
ISO8859-1 string 'Hello' into UCS-2 using iconv().  On Solaris 8 SPARC, I get
different UCS-2 output for the first and second conversion:

% uname -a
SunOS foobar 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-60
% ic2
converting 'Hello' from ISO8859-1 -> UCS-2
converted 5 bytes input to 12 bytes output
fe ff 00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 


Btw. this happens for every conversion to UCS-2, for example the UTF-8 -> UCS-2

conversion module shows the same behaviour:

% ic2 UCS-2 UTF-8
converting 'Hello' from UTF-8 -> UCS-2
converted 5 bytes input to 12 bytes output
fe ff 00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 


And note, that solaris 2.6 behaves differently.  It does not offer conversion
to the endian-specific variants UCS-{2,4}{BE,LE},  and converting to UCS-{2,4}
does not add a BOM:


% uname -a
SunOS xxxx 5.6 Generic_105181-26 sun4m sparc sun4m
% ic2 UCS-2 UTF-8
converting 'Hello' from UTF-8 -> UCS-2
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f 
converted 5 bytes input to 10 bytes output
00 48 00 65 00 6c 00 6c 00 6f

bobj

Comment 206

•

22 years ago

Interesting...
A couple of possible hacks for Solaris8+:
(1) Do a spurious iconv() and then throw away the results with the BOM.
(2) Call iconv() with the appropriate UCS-2LE or UCS-2BE instead of UCS-2

Is this a problem on platforms other than Solaris8?

Jürgen Keil

Comment 207

•

22 years ago

> A couple of possible hacks for Solaris8+:
> (1) Do a spurious iconv() and then throw away the results with the BOM.
> (2) Call iconv() with the appropriate UCS-2LE or UCS-2BE instead of UCS-2

I've prepared patches for both ideas. See bug 153562.

preliminary iconv patch -- just for testing purposes 22 years ago Darin Fisher 20.38 KB, patch		Details \| Diff \| Splinter Review
additional patch, need to link against iconv to get iconv_open... 22 years ago Jim Dunn 538 bytes, patch		Details \| Diff \| Splinter Review
add configure test for iconv 22 years ago hacker formerly known as seawood@netscape.com 22.38 KB, patch		Details \| Diff \| Splinter Review
all available conversions from UTF-8 under Solaris (worms) 22 years ago Darin Fisher 1.78 KB, text/plain		Details
all available conversions from UTF-8 under Solaris 5.7 (mothra) 22 years ago Dawn Endico 3.83 KB, text/plain		Details
v1 patch 22 years ago Darin Fisher 29.00 KB, patch		Details \| Diff \| Splinter Review
alt. iconv() check using iconv.m4 22 years ago hacker formerly known as seawood@netscape.com 49.04 KB, patch		Details \| Diff \| Splinter Review
v2 patch 22 years ago Darin Fisher 30.04 KB, patch		Details \| Diff \| Splinter Review
v3 patch 22 years ago Darin Fisher 33.02 KB, patch	bstell : review+	Details \| Diff \| Splinter Review
configure file to match the configure.in patch in v3 22 years ago Daniel (Leaf) Nunes 416.01 KB, patch		Details \| Diff \| Splinter Review
v4 patch 22 years ago Darin Fisher 38.32 KB, patch	darin.moz : review+ darin.moz : superreview+ asa : approval+	Details \| Diff \| Splinter Review
console log (start mozilla, try to open a local file with chinese filename, exit mozilla) 22 years ago Frank Liu 27.81 KB, text/plain		Details
config-defs.h 22 years ago Frank Liu 1.86 KB, text/plain		Details
Patch for 2002-06-12-08-trunk to avoid \|iconv(conv, NULL, NULL, NULL, NULL)\| 22 years ago Roland Mainz 1.09 KB, patch		Details \| Diff \| Splinter Review
Better patch for 2002-06-12-08-trunk per cls's comments 22 years ago Roland Mainz 1.07 KB, patch		Details \| Diff \| Splinter Review
Patch for 2002-06-12-08-trunk 22 years ago Roland Mainz 2.81 KB, patch	alecf : review+ darin.moz : superreview+	Details \| Diff \| Splinter Review
Small 'iconv' test program, to check ISO8859-1 -> UCS-2 conversion on solaris 22 years ago Jürgen Keil 1.53 KB, text/plain		Details