Beginning on October 25th, 2016, Persona will no longer be an option for authentication on BMO. For more details see Persona Deprecated.
Last Comment Bug 129279 - nsIFile unicode/utf8/ascii task
: nsIFile unicode/utf8/ascii task
fixed on the 1.0 branch
: topembed+
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: x86 Windows NT
: P3 normal (vote)
: mozilla1.0
Assigned To: Darin Fisher
: Scott Collins
: Nathan Froyd [:froydnj]
Depends on:
Blocks: 99160 100676 102812
  Show dependency treegraph
Reported: 2002-03-06 07:21 PST by Doug Turner (:dougt)
Modified: 2010-02-28 13:08 PST (History)
25 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

preliminary XP_UNIX patch (gzip compressed) (85.08 KB, application/x-gzip)
2002-04-02 21:25 PST, Darin Fisher
no flags Details
preliminary XP_UNIX + XP_WIN patch (gzip compressed) (108.13 KB, application/x-gzip)
2002-04-07 21:49 PDT, Darin Fisher
no flags Details
preliminary XP_UNIX + XP_WIN + XP_MAC patch (gzip compressed) (100.20 KB, application/x-gzip)
2002-04-10 14:55 PDT, Darin Fisher
no flags Details
v1 patch (still needs XP_OS2 impl) (107.64 KB, application/x-gzip)
2002-04-15 18:03 PDT, Darin Fisher
no flags Details
v1 patch (-u10w) (172.70 KB, application/x-gzip)
2002-04-15 18:04 PDT, Darin Fisher
no flags Details
v1.01 patch (updated for trunk, still needs XP_OS2 impl) (107.61 KB, application/x-gzip)
2002-04-19 17:28 PDT, Darin Fisher
no flags Details
v1.02 patch (fixed some bugs, needs XP_OS2 impl) (108.37 KB, application/x-gzip)
2002-04-19 20:17 PDT, Darin Fisher
doug.turner: review+
v1.03 patch (backed out changes to encode file:// URLs using UTF-8) (108.58 KB, application/x-gzip)
2002-04-24 23:07 PDT, Darin Fisher
doug.turner: review+
alecf: superreview+
v1.04 patch (v1.03 + XP_OS2 changes) (113.69 KB, application/x-gzip)
2002-04-26 10:22 PDT, Darin Fisher
darin.moz: review+
darin.moz: superreview+
documentation changes that i checked in on the trunk (17.20 KB, patch)
2002-05-02 23:13 PDT, Darin Fisher
no flags Details | Diff | Splinter Review
branch patch (including all subsequent/related bugs) (147.18 KB, application/x-gzip)
2002-05-10 13:51 PDT, Darin Fisher
no flags Details

Description Doug Turner (:dougt) 2002-03-06 07:21:03 PST
This bug is to track the discussion and task of Combining all the 
char*/PRUnichar* pairs of methods to one method which takes
an utf8 string.
Comment 1 Doug Turner (:dougt) 2002-03-13 19:11:47 PST
Wont Fix for mozilla 1.0!
Comment 2 Judson Valeski 2002-03-14 08:30:29 PST
adding plus back. only triage teams or designates can remove a plus. I assume
this means that nsIfile will not freeze for 1.0?
Comment 3 Pete Collins (MDG) 2002-03-14 08:44:58 PST
I beleive it is going to freeze "as-is".

Just still need to get everything working properly.

Comment 4 Doug Turner (:dougt) 2002-03-14 09:34:03 PST
You can remove the plus, but it will not be fixed for mozilla 1.0.  Because of
that, nsIFile will become frozen as-is modula some commenting.
Comment 5 Conrad Carlen (not reading bugmail) 2002-03-14 11:46:55 PST
  Why does this *need* to be frozen when the 1.0 branch is cut? Having dual
char*/Unichar* methods for strings will enforce the existance of serious bugs
WRT to charset mapping when the locale changes. There's no amount of back-end
fixing that can correct that problem if it's frozen into the API.
  The argument about focusing on making the implementations better is true but,
in most cases, can be done independently of the API. This issue of charset
mapping is not one of them - the API needs to be fixed first or the
implementations are forever doomed.
Comment 6 Alec Flett 2002-03-14 11:54:28 PST
here's an evil thought: What if we split nsIFile into nsIFile and
nsIFileInternal, and moved all of the non-unicode methods into nsIFileInternal,
and then swap all the names around. Then we freeze nsIFile and be done with it.
We'll have to update callers, but that will mostly be changing


file->SetFSLeafName() (or something)

the compiler will find all the callers for us.

that will get us through 1.0.. then we wean people off of nsIFileInternal post 1.0
Comment 7 Doug Turner (:dougt) 2002-03-14 12:35:18 PST
go back to your hole alec, and don't come out! :-)

Seriously, it was thought about.  nsIFileInfo.  Then again, we could have just
passed around nsISupports everywhere.
Comment 8 Alec Flett 2002-03-14 14:01:41 PST
No! I'm not proposing some new design! I'm saying:

1) there are a lot of evil methods in nsIFile that we want to get rid of -
namely the ones that require the strings to be in a unicode-friendly format
2) fixing the consumers of those methods will take a lot of time

So instead, we just move the internal mozilla code onto a seperate interface and
fix the consumers the right way post-1.0
Comment 9 Pete Collins (MDG) 2002-03-14 14:19:37 PST
Fixing consumers didn't take me that long to do when i did it the first time.

First is UTF8String complete?

Is it just a matter of changing the API params and fixing up the callers?

Will i be able to just pass myUTFString.get() to libc calls?

  stat(myUTFString.get(), sb);

If thats the case, then maybe i'll bite the bullet and do it. 

Comment 10 Darin Fisher 2002-03-22 10:28:39 PST
ok, i didn't realize this bug existed... i was working on something very similar
in bug 99160.

pete: i don't believe you can pass UTF-8 methods to libc functions... the
strings need to be formated for libc's locale... er, codepage... whatever! ;-) 
UTF-8 can only be passed if that corresponds to the underlying codepage of the
system, which is rarely the case.  on my rh72 box, w/ locale set to en-US, the
codepage for the filesystem (as provided by nl_langinfo) is ISO-8859-1.

this presents an interesting issue... why do we think UTF-8 should be used in
the API?  (a thought experiment... bear with me)  well, for one it would seem to
simplify the API... one set of functions for all callers... JS or otherwise. 
this would be a huge plus.  less confusion for new nsIFile users == less bugs?!?

however, let's consider how people use nsIFile.  well, nsIFile was created so
people wouldn't have to muck around w/ file paths.  the string APIs allow people
to GetPath, GetTarget, GetLeafName, SetLeafName, Append, CopyTo, MoveTo, etc.
why do people usually call these methods?  it's very common that people will
call these methods if interacting w/ the system.  in such cases, they want file
paths and leafnames that are formatted for the charset of the underlying system.
 UTF-8 is not guaranteed here.  then there's another class... say someone enters
a file path or leaf name via the UI.... that string comes in as UCS2.  this case
is currently handled with the unicode set of methods.

conrad raises an interesting point: what happens if people are storing native
file paths, and the locale changes out from under them?  can this really happen?
 i mean, can the codepage of the filesystem really change?  of course, someone
could toggle it.. but isn't that just totally evil anyways?  do we have bugs on
this kind of thing?

at the very least, i'd like to see nsIFile use our abstract string classes.
my hacked-together-proposal patch in bug 99160 does pretty much what alecf
suggested in comment #6 and comment #8... creating nsILocalFileInternal.  as
much as i'd like nsILocalFileInternal to be truly internal... i think it may end
up being a required interface... to solve the problem of interacting w/ the
system.  so, maybe it really needs to be called something different... like

the other alternative to nsILocalFileFoo is just to do the conversion to
abstract strings, and mark all the native getters and setters as noscript.  add
comments which explain that the charset is not guaranteed to be ASCII, and leave
it at that.  maybe we could provide a charset getter, so someone could, if they
really wanted to, convert the narrow file paths to some other charset.  the
unicode getters would be more appropriate for such folks though.

in general, i just worry that the UTF-8 only API will be inefficient and costly
since we'll have to either scan incoming strings to verify that they are ASCII
(if the system isn't using UTF-8 as the filesystem codepage) or bite the bullet
and simply inflate to UCS2 and then convert back to the filesystem codepage. 
take a look at my proposal patch... you'll see that implementing GetTarget w/ a
UTF-8 string forces us to always go the route of GetUnicodeTarget :-(

i'm just not sure the one unified API is worth the added cost... w/ URLs it's
totally different.. UTF-8 is the emerging standard.  for filesystems, UTF-8 is
far from common.
Comment 11 Alec Flett 2002-03-22 10:49:09 PST
darin - you bring up some good points about why "native" access would be
useful.. frankly it seems like the unicode APIs are really the only ones that we
don't want at this point. 

My take, from working in mailnews which uses a lot of conversions between the
native charset, filenames that come in off the network, and so forth, is that
the UTF8 APIs really simplify code signifigantly.. but mostly because it is easy
to convert to/from UTF8 and converting to/from the native charset requires going
through i18n libraries.

I'm not actually too worried about inefficiencies in those cases though, because
I don't feel like these are APIs that people call often enough to make us really
worry about performance. mail and the filepicker are probably the only module
that calls nsIFile* apis in a loop of any kind, and even in those cases, we're
talking about a loop bounded by a count of mail folders (i.e. 10-100) or files
in a directory (i.e. 1-100+)

So anyway, I guess what I'm saying is that people are going to prefer to use the
UTF8 methods, for simplicity of code, and people are going to use the *Native
methods in the [what I claim is] rare case where they actually have filenames
coming in in the native charset.

so here's what I think we should do, which is more or less what you've proposed
as well:
1) make nsIFile have a bunch of UTF8 APIs
2) make nsIFileNative have all native-based api, and make it, or each individual
method, non-scriptable
3) make nsIFileInternal just contain the UCS2 interfaces that we'll say are

Why the seperate nsIFile vs. nsIFileNative? Mostly because I think it's a lot
easier to present an interface to a developer with all UTF8 methods, and to
point at an entirely seperate interface for Native. In addition, I can see cases
where it would be overkill to make a new nsIFile implementation implement a
bunch of native methods, when that isn't really appropriate (i.e. does FTP want
to implement nsIFile::GetNativeLeaf? What would "Native" mean in that case?)

Comment 12 Conrad Carlen (not reading bugmail) 2002-03-22 11:39:06 PST
> conrad raises an interesting point: what happens if people are storing native
> file paths, and the locale changes out from under them?  can this really happen?

Yes, easily. On Windows you may have to reboot, on the Mac you can do it on-the-fly.

As far as utf-8 and Unicode support on the OS file system level, it's more
widespread than you think. The Mac's HFSPlus file system is Unicode internally
and its APIs are Unicode-only. On OSX, the libc does take utf-8. I'm pretty sure
that the internals of the file system are Unicode on WinNT, 2K, XP. My whole
reason for wanting to have a utf-8 or Unicode-only API is that, on systems where
the file system is utf-8 or Unicode and we're using Unicode file system APIs, it
is the only way you can manage the code page changing out from under you.
Comment 13 Pete Collins (MDG) 2002-03-22 12:51:22 PST
conrad, yes exactly that was my point.

When i first started this and was using AStrings as params, i was converting
them to UTF8 strings and passing them right to libc calls and it worked fine. I
assumed this is becase UTF8 is ASCII compatible.

Comment 14 Darin Fisher 2002-03-22 13:16:48 PST
pete: UTF-8 is a superset of ASCII... so if you've only got ASCII characters in
a UTF-8 string, then libc wouldn't complain.  but, there's a problem when the
8th bit is set on one of the bytes... how should libc interpret that 8th bit? 
is it the start of an UTF-8 character (multibyte sequence), or is an umlaut from
ISO-8859-1?  it depends on the locale.

conrad: how does alecf's plan sound to you?  we could push UTF-8 as the standard
string-type on nsIFile and nsILocalFile, and relegate the other string types to
second-class interfaces (e.g., nsIFileNative and nsILocalFileNative).  this
unfortunately causes nsLocalFile to have two inheritance chains, but i think we
can live with that.

alecf: we should be able to do away with the UCS2 methods.  we can just fixup
the UCS-2 callsites to use UTF-8 instead.

as a side note: do we really want to have a FTP nsIFile impl one day?  that
would correspond to a bunch of blocking socket i/o calls :(  seems like
something we should avoid.
Comment 15 Alec Flett 2002-03-22 13:48:02 PST
talk to dougt, he's the one who's always going on about nsIFile and FTP :)
Comment 16 Doug Turner (:dougt) 2002-03-22 14:20:39 PST
alec: one day!  one day!  

Darin: blocking io - that is a red herring.  

lets not go out of our way to break anything.  if nsIFile is UTF8, ftp will do
fine for the most part.
Comment 17 Darin Fisher 2002-03-22 14:24:40 PST
ok, with that.. i'm going to move forward w/ nsILocalFileNative : nsILocalFile :
nsIFile.  there is no point to nsIFileNative (i think) since that would imply a
local file, right?
Comment 18 Conrad Carlen (not reading bugmail) 2002-03-22 14:45:34 PST
  OK - Ironically, once I do bug 95481, native *will* be Unicode on Mac.
Assuming nsILocalFileNative has char* args which are expected to be in the
current system char set, calls through the native iface will have to undergo
  I'll have to put NS_WARNING("You really shouldn't be calling this!") on each
nsILocalFileNative entry point on the Mac ;-)
Comment 19 Darin Fisher 2002-03-22 16:47:36 PST
conrad: i'm actually trying to make the native calls uses ACString (to help w/
performance, and because it's not much more work)... but that doesn't really
address your concern... so what is the string format best suited for the Mac? 
UCS2 or UTF8?
Comment 20 Darin Fisher 2002-03-22 16:48:23 PST
conrad: also what about all the code that currently calls PR_GetEnv... what
charset does that return on the Mac?
Comment 21 Conrad Carlen (not reading bugmail) 2002-03-25 09:08:13 PST
Darin: As far as the best charset for the Mac, it would be (after bug 95481 is
done) UCS2. Despite that, I still prefer utf-8 because the conversion to/from
that and UCS2, is fast, non-lossy, and doesn't require uconv.

Although, I think the nsIFileNative methods would need to be in the current
system charset even if the underlying implementation is Unicode/UTF-8. Part of
the reason we need native narrow charset calls anyway is to get us through 'til
the day nsIFileSpec is gone. It only deals in the native narrow charset and,
until it's gone, we need a way to get from an nsILocalFile to an nsFileSpec.
Look at how commonly this is done in mailnews :-/ Once nsIFileSpec is gone, I'm
not sure how useful the "native" calls will be - besides InitWithXXX.

In terms of code that currently calls PR_GetEnv, I doubt there's much of that in
XP code since env variables don't exist on the Mac (CFM). On Mach-0 they do and,
like everything there, it's utf-8.
Comment 22 Darin Fisher 2002-03-28 09:31:03 PST
-> me, since i'm working on a patch
Comment 23 Darin Fisher 2002-04-02 21:25:00 PST
Created attachment 77390 [details]
preliminary XP_UNIX patch (gzip compressed)

this patch converts:

  readonly attribute string path;
  readonly attribute wstring unicodePath;


  readonly attribute AUTF8String path;
  [noscript] readonly attribute ACString nativePath;

it also includes hand rolled UCS2 to native multibyte charset encoding/decoding
to get around the problem of nsLocalFile methods needing to be executed before
XPCOM has been initialized.
Comment 24 Alec Flett 2002-04-03 11:15:03 PST
actually, is there any chance you can attach the charset-conversion bit to bug
100676, and we can all play with the patch there?
Comment 25 Darin Fisher 2002-04-07 21:49:13 PDT
Created attachment 78149 [details]
preliminary XP_UNIX + XP_WIN patch (gzip compressed)

~550k uncompressed :(
Comment 26 Darin Fisher 2002-04-10 14:55:18 PDT
Created attachment 78619 [details]
preliminary XP_UNIX + XP_WIN + XP_MAC patch (gzip compressed)
Comment 27 Darin Fisher 2002-04-10 15:15:18 PDT
what's left?

1- merge pete collin's XP_UNIX wchar_t conversion stuff
2- make changes to nsLocalFileOS2
3- experimental builds :-)
Comment 28 Pete Collins (MDG) 2002-04-10 15:18:06 PDT
Darin is this patch ready for testing?

If so i'll apply it on a couple of unix boxes and let it back some.

Comment 29 Darin Fisher 2002-04-10 15:26:46 PDT
pete: cool... but i need to hack on the XP_UNIX port some more before it'll
compile everywhere.
Comment 30 Darin Fisher 2002-04-15 18:03:51 PDT
Created attachment 79364 [details]
v1 patch (still needs XP_OS2 impl)
Comment 31 Darin Fisher 2002-04-15 18:04:59 PDT
Created attachment 79365 [details]
v1 patch (-u10w)
Comment 32 Darin Fisher 2002-04-16 12:59:53 PDT
pete: this patch is ready for testing... can you try it out on some other unix
boxes?  thx!
Comment 33 Darin Fisher 2002-04-16 13:06:27 PDT
ran some startup tests under linux with an optimized build, and this patch has
no observable effect on startup performance.
Comment 34 Darin Fisher 2002-04-17 17:40:06 PDT
cls: can you please review my changes?  thx!
Comment 35 hacker formerly known as 2002-04-18 17:24:06 PDT
Since the tests in never check the return value of wcrtomb or
mbrtowc, they should really be link tests instead of runtime tests.  And there's
no need to set the HAVE_<foo> makefile variables if they're never going to be
substituted into nor used anywhere.
Comment 36 Darin Fisher 2002-04-18 19:59:54 PDT
ok, i'll substitute AC_TRY_RUN w/ AC_TRY_LINK and do away with setting the env
vars.  thx!
Comment 37 Alec Flett 2002-04-19 11:32:16 PDT
whew, this patch is a bitch :)
Anyway, it generally looks good. nice work cleaning up the icon stuff, and prefs. 
only wierd thing I've noticed so far is the odd OS/2 change from nsIOServiceOS2
to nsIOServiceWin - was that intentional?

..back to the patch..
Comment 38 Darin Fisher 2002-04-19 12:58:08 PDT
alecf: nsIOServiceOS2.cpp is obsolete.  it is/would-be identical to
nsIOServiceWin.cpp, so i got rid of it.
Comment 39 Darin Fisher 2002-04-19 17:28:10 PDT
Created attachment 80128 [details]
v1.01 patch (updated for trunk, still needs XP_OS2 impl)

fixed some bugs discovered by i18n QA
Comment 40 Darin Fisher 2002-04-19 20:17:41 PDT
Created attachment 80157 [details]
v1.02 patch (fixed some bugs, needs XP_OS2 impl)
Comment 41 Doug Turner (:dougt) 2002-04-22 18:47:17 PDT
Comment on attachment 80157 [details]
v1.02 patch (fixed some bugs, needs XP_OS2 impl)

r=looks good.  lets get it in soon!
Comment 42 Alec Flett 2002-04-23 14:06:33 PDT
Comment on attachment 80157 [details]
v1.02 patch (fixed some bugs, needs XP_OS2 impl)

ok, I haven't finished the v1.01 patch yet, but here are my comments thus far
(at about 90% of the way through) - then I'll do a diff of the old patch and
the new patch to see if there's anything interesting in the new one:

nsStorageRequest: odd non-string changes there
profile migration: persistent descriptor strings aren't utf8, all we seem to be
doing there is copying from one object to another, so it seems safe to me. or
unless this is copying from a nsFileSpec to a nsIFile.. ack. Not sure.

Index: rdf/datasource/src/nsFileSystemDataSource.cpp
     PRInt32	     nameLen = name.Length();
-    nsAutoString	 theURI; theURI.AssignWithConversion(uri);
-    if ((theURI.Find(ieFavoritesDir) == 0) && (nameLen > 4))
+    if ((strstr(uri, ieFavoritesDir) == uri) && (nameLen > 4))

was slightly broken to begin with. Calling str.Find(foo) == 0 is inefficient,
and should have been str.Find(foo, 0, 1) mostly because we really just want to
see if the string STARTS with foo, not if its contained anywhere inside it.

strstr() just continues this inefficiency and makes it harder to fix later..
how about
nsDependentCString theURI(uri);
nsDependentCString ieFavorites(ieFavoritesDir);
if ((nameLen > 4) &&
    (theURI.Length() >= ieFavorites.Length()) &&
    (Substring(theURI, 0, ieFavorites.Length()).Equals(ieFavorites)))

ugly I know, but calculating the length of uri and ieFavoritesDir will be
faster than doing a strstr..(also I moved the nameLen calculation earlier,
because its a cheap comparison)

Index: uriloader/exthandler/nsExternalHelperAppService.h
   // the following field is set if we were processing an http channel that had
a content disposition header
   // which specified the SUGGESTED file name we should present to the user in
the save to disk dialog. 
-  nsString mSuggestedFileName;
+  nsCString mSuggestedFileName;

it isn't always obvious that member vars are UTF8 - mind a little comment, just
saying this is UTF8? (like your other comments indicating native charset

should nsIRegistry::Open use nsACString to avoid losing native charset data?
(or maybe just be [noscript]

-    err = NR_RegOpen((char*)regFile, &mReg );
+    err = NR_RegOpen((char*)regFile.get(), &mReg );

NS_CONST_CAST (or no cast at all?) here?
Comment 43 Alec Flett 2002-04-23 14:50:45 PDT
ok, the rest of the 1.01 patch looks good - the xpinstall stuff is VERY scary,
(necessary, and look correct, but scary nonetheless) and should be run by
dveditz - I caused some xpinstall i18n bugs that might just be solved by this.
see bug 125106 for details.
Now checking the 1.01 -> 1.02 changes
Comment 44 Alec Flett 2002-04-23 14:54:29 PDT
ugh, the diff of the diff is almost as big as the original patch (19000 lines,
close to the original 27000 lines!) - any quick summary as to what changed? Is
it worth my reviewing a second time? :)
Comment 45 Darin Fisher 2002-04-23 15:06:25 PDT
i believe the biggest difference is the addition of an IsValidUTF8 method in
necko (which should probably be moved to the string library at some point).  i
need this function in order to properly convert a file URL to a nsIFile.

look in nsIOService*.cpp and nsURLHelpers.cpp for the related changes.  the
point of this change is that we need to be able to handle file:// URLs that
consist of either escaped native charset chars or escaped UTF-8 chars.  this is
needed for backwards compatibility with bookmarks, etc. from older builds.  we
need to be able to detect both formats because i really really want us to
generate file:// URLs using escaped UTF-8 instead of escaped native chars. 
however, perhaps this change is bad since it might make our URLs incompatible
with other software?!
Comment 46 Gashu 2002-04-24 02:56:52 PDT
Please read this at once if Conversion Problem Between Shift-JIS and Unicode is
not considered.;en-us;Q170559
Comment 47 Darin Fisher 2002-04-24 11:52:40 PDT
Gashu: thx for the link, but the windows unicode conversion code is not going to
be effected by this patch.  mozilla may have the bug you're describing though,
so it'd probably be worthwhile to ping the mozilla i18n folks to find out if
mozilla already works around the problems described in that microsoft problem
Comment 48 Darin Fisher 2002-04-24 23:07:50 PDT
Created attachment 80909 [details]
v1.03 patch (backed out changes to encode file:// URLs using UTF-8)

after discussing this a bit with alecf, we decided it'd be better not to switch
to UTF-8 file:// URLs at this time.  the change would potentially break URL
compatibility with other applications.	better not to complicate this patch
with such changes.
Comment 49 Alec Flett 2002-04-25 13:34:25 PDT
so is there much difference between v1.01 and v1.03?
Comment 50 Darin Fisher 2002-04-25 14:08:27 PDT
no not much difference.  in backing out my nsIOServiceXXX.cpp changes i
discovered a bug that i'd introduced into nsFileSystemDataSource.. a
UTF-8/native-charset mixup.

i also discovered that nsIRegistry::open is not really
xpconnect-safe/scriptable, and because we do call it from JS code, i decided to
replace the |string| path parameter with a nsIFile parameter.  this was a fairly
trivial/isolated change.

so that's all that has changed since v1.01.
Comment 51 Alec Flett 2002-04-25 15:39:25 PDT
Comment on attachment 80909 [details]
v1.03 patch (backed out changes to encode file:// URLs using UTF-8)

excellent - then my sr=alecf stands, with my minor comments, even though I
didn't mark it before (just looked over what I think are the new parts)
Comment 52 Doug Turner (:dougt) 2002-04-25 18:19:14 PDT
Comment on attachment 80909 [details]
v1.03 patch (backed out changes to encode file:// URLs using UTF-8)

Comment 53 Darin Fisher 2002-04-26 10:22:43 PDT
Created attachment 81166 [details]
v1.04 patch (v1.03 + XP_OS2 changes)

changes to nsLocalFileOS2 are nearly identical to nsLocalFileWin.
Comment 54 Darin Fisher 2002-04-26 10:23:29 PDT
Comment on attachment 81166 [details]
v1.04 patch (v1.03 + XP_OS2 changes)

carrying forward r=dougt and sr=alecf
Comment 55 Daniel Veditz [:dveditz] 2002-04-29 04:03:55 PDT
Dammit! ever hear of "Module Owner Review"? Failing that what about Alec's
comment 43 where he suggests you run this by me because I was dealing with a
specific file/character-set bug?

I may be the most pissed person you blindsided because I was in the middle of my
own big patch that I now have to largely re-do (and any concept of "baking on
the trunk" is out the window because it'll be a completely different patch on
the branch), but I'm sure I'm not the only one. How can you change something so
fundamental with absolutely no announcement that it's coming? (a landings page
item at level 0 doesn't count) Thank you for changing all the existing code, but
without any attempt at education what's to keep people from writing new code
treating the same-named routines in the old and now incorrect way? What about
folks like me (or anyone writing JS) who have unicode strings which now must get
converted to utf8 so that nsLocalFile can internally convert them back to
unicode at which point they'll be converted to the filesystem charset? Yeah,
that'll help our performance and heap abuse problems. Caltrans road closures are
more fun than this.
Comment 56 Doug Turner (:dougt) 2002-04-29 06:32:07 PDT
>Dammit! ever hear of "Module Owner Review"? 

The current rule of law has been large systemactic changes such as this do not
have to go through module review - if they did most of the API changes would
have been much more expensive in terms of engineering time.  Furthermore, I am
the co-module owner of xpinstall, so ha! :-)

> How can you change something so fundamental with absolutely no announcement
that it's coming?

Dan, on 3/12 I posted to the xpcom newsgroup a proposal to freeze the nsIFile
as-is.  The posting cited bug 99160.  If you had cc'ed yourself on this posting,
you would have been clued in to our changes.

Sorry that our changes intersect.  Is there anything that either Darin or I can?
 I can not believe that the changes required by any existing code will take more
than a few minutes to fix.

>What about folks like me (or anyone writing JS) who have unicode strings which
now must get converted to utf8 so that nsLocalFile can internally convert them
back to unicode at which point they'll be converted to the filesystem charset?

xpconnect does the right conversions.  If you have a string in js, it will be
correctly converted into either unicode or utf8 depending on the type of
parameter declared by the interface.

>Yeah,that'll help our performance and heap abuse problems. Caltrans road
closures are more fun than this.

Please do not just go off the deep end and start saying this will effect
performance without measuring!  There is no speed difference using mozilla, and
if fact, these changes may improve performance due to less copying of strings.

If anything, Dan, fixing this bug in the way we did will prevent future charset
Comment 57 Daniel Veditz [:dveditz] 2002-04-29 12:42:24 PDT
Posting to xpcom does not constitute an announcement.

Other large landings manage to ask module owners for quick sanity check reviews,
or at least contact them as a warning.

In this case specifically your super-reviewer asked that you run the "scary"
xpinstall changes by me first since he knew we were already dealing with bustage
in that area.

When you say this has no measurable impact on performance, do you say that after
having tested it on a file-intensive part of the product like XPInstall? Did you
test on a localized OS with non-ASCII paths, the folks who will be feeling the
pain from this?
Comment 58 Doug Turner (:dougt) 2002-04-29 13:05:54 PDT
> Posting to xpcom does not constitute an announcement.

What?  Come on, Dan!  The newsgroup is the single point of all mass
communication for modules.  (Although, that really isn't the point that we are

> In this case specifically your super-reviewer asked that you run the "scary"
xpinstall changes by me first since he knew we were already dealing with bustage
in that area.

Yup.  Sorry that this wasn't run by you.  Let me know if Darin or I can help you.

> Did you test on a localized OS with non-ASCII paths, the folks who will be
feeling the pain from this?

IQA ran builds with these changes.

Comment 59 Darin Fisher 2002-04-29 13:54:10 PDT

my appologies for not running these changes by you.  this landing was admittedly
rushed in order to get "bake time" on the trunk.  there was a lengthy discussion
about this bug on friday with drivers (jud, asa, and chofmann)... and the
conclusion was that this could not wait any longer, or it would completely miss
the boat.  as a result i focused entirely on making this a clean landing and
completely forgot about contacting you for a final review  :-(

as for the conversion issue, on unix and mac we can actually convert directly
from UTF-8 to native charset and vice versa, using OS calls.  i didn't add this
code to the patch because i didn't have time... this patch was about making the
right API changes now in order to avoid being stuck with a bad API later. 
moreover, this patch currently avoids a lot of conversions when the input
strings are plain ASCII.  moreover, on systems that use UTF-8 as a native
charset, there is no conversion necessary!  i haven't added the code to check
the native charset on each of the platforms.  i've only done this so far for

also, the switch from UCS2 to UTF8 was well discussed (and not just between
alecf, dougt and myself).  this was discussed in depth between many of the sr's
including brendan, shaver, and jband as well as other sr's during a past sr
monthly meeting (at that time, it was in the context of URLs and the
appropriateness of UTF-8 in mozilla in general).  brendan and shaver both
explicitly supported these nsIFile API change, as did dougt and alecf
(obviously), so i felt like we had pretty good concensus that this was the right
thing to do.

also, this patch fixed a large number of nsIFile related leaks in the installer
code.  iirc, in one of the toString implementations, i counted a dozen or so
leaks of GetPath.

it's not too late to make further changes to nsIFile, so if you have suggestions
on changes that we should be making or things that you feel we are definitely
doing wrong, then please voice them here.
Comment 60 (not reading, please use instead) 2002-04-30 00:04:41 PDT
I think this change caused some existing code to go into an infinite loop.

see, and read down.

the problem has been fixed, but I've sent private mail to darin and alecf, so 
we can figure out why, in case other infinite loops might be lurking.
Comment 61 Dan Mosedale (:dmose) 2002-04-30 17:40:46 PDT
I'd like to request that the doxygen comments in nsILocalFile.idl for
InitWithPath and InitWithNativePath be significantly beefed up.  It's not at all
clear from reading them what the exact problem on Mac is and how it should be
dealt with, nor is it clear when one of these functions would be preferred over
the other (what's a NativePath?).
Comment 62 Darin Fisher 2002-04-30 17:56:37 PDT
yup, definitely.
Comment 63 Asa Dotzler [:asa] 2002-04-30 23:48:17 PDT
this is post RC2 if it makes it. removing from RC2 list.
Comment 64 Asa Dotzler [:asa] 2002-05-02 10:57:13 PDT
readding this to RC2 list
Comment 65 Darin Fisher 2002-05-02 13:26:10 PDT
note to self: branch patch also needs patch for bug 141673.
Comment 66 Darin Fisher 2002-05-02 23:13:55 PDT
Created attachment 82171 [details] [diff] [review]
documentation changes that i checked in on the trunk
Comment 67 Darin Fisher 2002-05-10 13:51:14 PDT
Created attachment 83089 [details]
branch patch (including all subsequent/related bugs)
Comment 68 Darin Fisher 2002-05-13 16:32:17 PDT
Comment 69 Jay Yan 2002-05-28 20:52:11 PDT
can anybody help to have a look at a regression bug 147333? thanks
Comment 70 Sergei Dolgov 2002-07-02 07:41:46 PDT
Problems on BeOS.
go, for example, to Russian page

Try to "Save Page As".
In file save panel you'll see something weird.
If you try to save page with pure ASCII in <TITLE> but with spaces - you'll get
"escaped" name there, which is very stupid for BeOS, becuase it handles UTF-8 as
native and only true encoding for file names.

I think that problem consists of 2 facts:
1)No real locale support in BeOS. No $LANG environment variable, no locale
valur. Only proper internal encoding for all languages in BeOS is UTF-8.
2)in XPCOM/io BeOS is considered as Unix. But no specaial case for BeOS in or anywhere else. So it tends to fallback everywhere.
Comment 71 Conrad Carlen (not reading bugmail) 2002-07-02 07:58:31 PDT
From the sounds of that, BeOS needs its own nsLocalFile impl. I had to do that
for the Mac Mach-0 build because its native encoding is UCS2 when using the Mac
file APIs or UTF-8 when using the Unix calls.

Note You need to log in before you can comment on or make changes to this bug.