Closed Bug 94323 Opened 23 years ago Closed 22 years ago

Reorganize nsIFile for API Freeze

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: pete, Assigned: pete)

References

(Blocks 1 open bug)

Details

(Keywords: embed, topembed-)

Attachments

(23 files)

4.08 KB, text/plain
Details
1.34 KB, text/plain
Details
143.20 KB, patch
Details | Diff | Splinter Review
238.68 KB, patch
Details | Diff | Splinter Review
251.16 KB, patch
Details | Diff | Splinter Review
364.83 KB, patch
Details | Diff | Splinter Review
365.51 KB, patch
Details | Diff | Splinter Review
365.58 KB, patch
Details | Diff | Splinter Review
2.04 KB, patch
Details | Diff | Splinter Review
379.78 KB, patch
Details | Diff | Splinter Review
2.36 KB, text/plain
Details
382.41 KB, patch
Details | Diff | Splinter Review
382.94 KB, patch
Details | Diff | Splinter Review
2.20 KB, text/plain
Details
382.71 KB, patch
Details | Diff | Splinter Review
382.99 KB, patch
Details | Diff | Splinter Review
1.08 KB, patch
Details | Diff | Splinter Review
1.15 KB, patch
Details | Diff | Splinter Review
4.42 KB, patch
Details | Diff | Splinter Review
5.75 KB, text/plain
Details
5.27 KB, patch
Details | Diff | Splinter Review
6.17 KB, patch
Details | Diff | Splinter Review
3.53 KB, text/plain
Details
Summary says it all.

--pete
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla0.9.5
Blocks: 92569
Blocks: 57995
Blocks: 92329
Blocks: 53596
isSymlink() method will not be implemented using the followLinks bool check.
Doesn't make sense for it to.

--pete




I also implemented getLeafName to follow links and it works as you would expect.

My question is should we?

This is going to introduce a lot of bugs and a layer of confusion to callers,
since by default followLinks is set to true. That means that if we want the
leafName of our current initialized file, we would get the targets leafName not
the files leafName. 

I can hear code breaking now.

--pete
> I also implemented getLeafName to follow links and it works as you would expect.

As long as it obeys the followLinks attribute.

> My question is should we?

We should. If somebody really wants the file to refer to the link and not the
target (rare I would think), they can call SetFollowLinks(PR_FALSE). Of course,
when this is done, the caller may have to get the current state of that
attribute and restore it when done - depending on the scope and ownership of
that file. 
I'd like to discuss moveTo for a minute. In the jslib version i re-init the
class so it reflects the newly moved file. 

I'd like to do the same here and set mPath to the new location.

Anyone disagreew/ this idea?

--pete
Update:

I will be getting back to this this week. I haven't had much time at all to work
on this. Arg!

followLinks is implemented for the most part across all methods and is tested
fully on unix.
--pete
I am ready to rip out the unicode specific methods aka initWithUnicodePath,
unicodeLeafName, unicodePath etc . . .

The problem is nothing has been resolved as to how to deal w/ strings in the
local file path world. 

What type of args should we be using for char *'s?

Should i just have these methods take a PRUnichar * for now which will add a
dependency to xpcom standalone?

Any suggestions?

CC'ing some more people for feedback. 

--pete
  I think they should take an AString or a wstring. Point is, it should be a
wide string.
  On Win32 (NT or 2000) or Mac (System 8.6 and an HFS+ format disk), the OS file
system can use unicode strings. This is extremely preferable to doing the
conversion ourselves. Whenever we convert from unicode to what we think is the
FS charset, there is possible data loss in that conversion. In odd cases, like
booting the system in one language, making a file, then switching languages and
trying to specify that file in the now current FS charset, we can go wrong. For
file systems in which the internal format is unicode, we should always use
unicode and never convert. No conversion, no data loss, and we're not dependent
on our i18n code - it's not needed.
  On systems which cannnot use unicode to specify files and we're forced to do
conversion, we can do the conversion using OS facilities. glibc has this (not
sure since which version) Also in this case, we're not dependent on our i18n code.
  CC'ing ftang - we were talking about this today. 
I need to be able to still pass c string constants.  Do not break this.
Why do you need to pass c-string constants? Why not just pass
NS_LITERAL_STRING("foo").get()?
Or, if you take nsAString& instead of PRUnichar*, just drop the .get().
an nsAString parameter would be okay.

If you need any help with that or have questions, don't hesitate to ask.
sorry for the spam, changing summary from "Reorganize nsIFiles for API Freeze"
because it's bugging me.
Summary: Reorganize nsIFiles for API Freeze → Reorganize nsIFile for API Freeze
Ok, jag, if i use nsAString& as my param, then i'm using DOMString as my idl
param and including domstubs.idl. 

Is this correct?

Or is there another idl param i should be using in this case?

Thanks

--pete

pete: use AString in .idl files.

/be
http://lxr.mozilla.org/seamonkey/source/xpcom/base/nsrootidl.idl#75

Basically that means you can use AString for the type and don't specifically
need to include any idl.
Ok, i just want to clarify this using a simple example.
Below I stat a file using nsAString as my param.

NS_IMETHODIMP
nsLocalFile::Foo(const nsAString& aPath)
{
    struct stat file;

    if (stat(NS_ConvertUCS2toUTF8(aPath).get(), &file) < 0)
        printf("stat failed\n");

    else
        printf("file.st_size = %d\n", (int)file.st_size);

    return NS_OK;

}

This example works exactly as it should. 
Is the useage correct in this sample?
If so, then this is how i will proceed.

Thanks

--pete

Yup. That looks good, assuming stat supports UTF8.
This is right assuming UTF-8 is the file system char set. For Unix, that's fine.
For other platforms, it's a bit more work. For Mac & Windows, file system
routines which take strings take Unicode *but* not all versions of the OS. You
need to check at runtime and see if it's supported. For Mac, I have bug 95481 so
I'll deal with that platform when you have a patch for nsIFile/nsILocalFile.idl.
Yea, Conrad, unfortunately i can't do the mac and win implementations for I only
have a unix box at my disposal right now. I do plan on setting up a mac and win
box as soon as i have income again.  ;-)

--pete

Jag, another question. 

I'm converting the method Append over to nsAString.

There are many accessors using this method.

->Append(my->foo);

nsCAutoString myString;
->Append((char *)myString);

so i've been changing the calls over to

->Append(NS_LITERAL_STRING(my->foo));

->Append(NS_LITERAL_STRING(myString));

Is this the correct way to do it?

Thanks

--pete
When you have a literal string, e.g. "foopy", you want to use
NS_LITERAL_STRING("foopy");

If you have a nsCString, you'll need to use NS_ConvertUTF8toUCS2. I'd look some
more into the code though to see if perhaps you'd end up converting back and
forth between UCS2, and see if you can short-circuit that.
But, char *'s 

or 

static const char foo[] = "foopy.dat"

should all use NS_LITERAL_STRING correct?

Thanks

--pete
> i can't do the mac and win implementations for I only have a unix box at my
> disposal right now.

Don't worry about the Mac - I'm on it.

Before getting this far with the implementation, can  you post the patches to
the API so that people doing the implementation work on other platforms can code
to them?

Anything that used to end up calling Append(const char* aPath) can, if you don't
want to spend too much time, be wrapped in an NS_ConvertUTF8toUCS2 and you're
done. However, for literal strings it's cheaper on some platforms if you use
NS_LITERAL_STRING.

So:

  file->Append("blah")

becomes

  file->Append(NS_LITERAL_STRING("blah"))

and everything else, e.g.:

  // nsCString somePath;
  // const char* somePath;
  // const char somePath[] = "foopy";
  file->Append(somePath)

becomes

  file->Append(NS_ConvertUTF8toUCS2(somePath))

Though you might want to replace somePath[] = "foopy" with an NS_LITERAL_STRING,
or a #define somePath NS_LITERAL_STRING("foopy") if it's used in many places.

Until I fix bug 73292 (soon, I hope), you'll need to add a .get() for the
nsCString cases.

Want to continue this conversation in private mail to keep bug spam down? We can
c&p "highlights" afterwards :-)
Attached file nsIFile.idl
Attached file nsILocalFile.idl
Conrad, these would be the theoretical idl files.

I am working method by method and am changing the interface one method at a
time, fixing what breaks and then moving on to the next one.

--pete

 attribute AString persistentDescriptor;

This return type on this needs to remain "string" This is because this data is
not nesc a path (it is on some platforms, not on others). It's stored in prefs
and registry entries so what that accessor takes and returns needs to binary
compatible with what it's done in the past.
Attached patch current snapSplinter Review
patch #47051 is for my own reference.

--pete

-              nsAutoString  fileURLUnicode;
fileURLUnicode.AssignWithConversion(docURLSpec);      
-              res = webShell->SetURL(fileURLUnicode.get());
+              res = webShell->SetURL(docURLSpec.ToNewUnicode());

FYI, this (kind of) change will leak the newly created unicode buffer. That's
the reason they're doing this whole thing with nsAutoString and
AssignWithConversion. What you could do is something like:

+              res = webShell->SetURL(NS_ConvertASCIItoUCS2(docURLSpec).get());

This will use the exact same conversion, though I think they should be using
NS_ConvertUTF8toUCS2 here, but that's a different bug.

Also, the member functions ToNewUnicode, ToNewCString and ToNewUTF8String are
deprecated.
Instead you should be using the global functions in nsReadableUtils.h. Thus:

    *aResult = myString.ToNewUnicode();

becomes (don't forget to #include nsReadableUtils.h):

    *aResult = ToNewUnicode(myString);
I have a patch which switches all existing uses over from the one to the other
form. I hope to check that in soonish (need to get r= and sr= first of course).
Jag, thanks for catching those. I'll fix them now.
Yes, i only used ToNewUnicode() in one place. 

Is the ultimate plan to get rid of char*'s altogether and move everything over
to nsAString? 

Thanks

--pete
> Is the ultimate plan to get rid of char*'s altogether and move everything over
> to nsAString?

Yes. With the exception of persistentDescriptor. See my comment from 08/22. A
persistentDescriptor shouldn't be mistaken for or used like a path. It may look
like one on Unix & Windows but on the Mac, it's a base64 encoded file alias.
It's supposed to be treated as opaque data which is pointed to by a char*. Also,
there are many of these written into existing registries and prefs so this can't
be changed. Not to nag about this again but since I pointed it out last, it's
still a wide string in the latest patch. 
Conrad, i was actually referring to the entire mozilla codebase, not just
nsIFile in particular. It seems a lot of the surrounding implementations i see
would be better suited using nsAutoString.

Reguards to persistentDescriptor are you saying it needs to be a wide string and
not a regular string like it is now? I haven't changed the interface for
persistentDescriptor.

http://lxr.mozilla.org/seamonkey/source/xpcom/io/nsILocalFile.idl#107

--pete
Conrad: I think the question was asked in a broader context.

Pete: In which case the answer would be no. But we will be moving a lot of char*
usage over to nsACString (note the C) or one of the specialized classes that
inherits from it.

Just FYI (though I think you've realized this already), ToNewCString() and
ToNewUTF8String() also create new buffers, which also need to be freed manually,
or avoided where possible.

Regarding persistentDescriptor, I think what Conrad meant is that it's really a
byte*, but we're playing a few tricks here to make it "work" with javascript.

Pete: Sorry, I completely mis-read. This is right:
+    attribute string persistentDescriptor;
I must be seeing imaginary things, looking at the wrong patch, etc.

Blocks: 77833
Attached patch currnet snapSplinter Review
The good new is i have almost all of the param changes finished.
It compiles cleanly.

The bad news is a can only get xpcshell to run.

I am having a problem. I beleive somewhere there is a call to free or alloc that
is passing the wrong pointer ala . . .

in free(): warning: junk pointer, too high to make sense

So i have pretty much been in regression mode.

If anyone can spot an error in the huge patch i attached that would be great.

I have been up and down it and anywhere i removed a call to free, the char* was
replaced w/ a nsAutoString.

So, anyway progress is halted while i track this bug down.

Thanks

--pete


Index: editor/base/nsEditorShell.cpp

+  nsAutoString  fileURL; 
+  fileURL.Assign(inFileURL); 

Code like that you can short circuit to:

nsAutoString fileURL(inFileURL);

But, since your only use of the nsAutoString there is to be a string container
around the const PRUnichar* passed in, you really want nsDependentString.

               nsXPIDLString  leafName;
-              docFile->GetUnicodeLeafName(getter_Copies(leafName));
+              docFile->GetLeafName(leafName);

So, this might compile (shame on us) but currently won't work (it will at some
point in the future).

/me goes to stand in a corner.
Heres on that will issue the error.

myXPIDLCString.Adopt((char*)NS_ConvertUCS2toUTF8(myAuto).get());

needs to be:

myXPIDLCString.Adopt(ToNewCString(myAuto));

--pete



Yep, that would do the trick :-) Nice catch.

Whenever you need to cast with strings, ask yourself what's wrong. Could be the
interface you're using forces you to use a |char*| even though they really meant
a |const char*|, could be you're using strings differently than you should.
Ok, i got her running again. Yea!
I am still getting a butt load of:

mozilla-bin in free(): warning: junk pointer, too high to make sense.

Error messages 

but she runs . . .

I need to diligently go through all these calls again and catch more mistakes.

--pete

This comment from nsRegistry.cpp is especially true now (to the point where it
needs to be fixed)

+                registryLocation->GetPath(regFile);  // dougt fix...
                 // dveditz needs to fix his registry so that I can pass an
                 // nsIFile interface and not hack 

Now, we get back a Unicode path. Down a few lines, it gets converted like this:
+    err = NR_RegOpen((char*)NS_ConvertUCS2toUTF8(regFile).get(), &mReg );

Say the path has non-ACSII chars in it - is NR_RegOpen going to do the right
thing with that utf-8 string it's getting passed? It looks like it just gets
passed through to fopen which I doubt is going to do the right thing on all
platforms. The registry should take an nsILocalFile on which it could use
OpenANSIFileDesc. Then we wouldn't have to do this conversion. One little
problem - the registry is written in C.

Conrad you raise a good point here. Another issue, there is still a *lot* of
code using nsIFileSpec.

This all needs to be accomplished in tiers.

So i think what needs to be done is get the nsIFile interface solid, let it bake
 in the oven for a little while and then check it in.

Fix all accessor code that is implemented poorly. Your example is a good one.
Let it bake and check it in.

Then we need a serious performance and efficiency audit. There are over 3,000
calls to InitWithPath on startup. Tighten the interface up and get is lean and
mean. Let it bake in the oven and check it in.

So i think it would be best if we break out the tasks at hand into logical
divisions by order of importance.

After the new interface changes are solid and bug free and checked in, i'm real
interested on an audit level focus on performance and efficiency. 

"A Lean, mean, I/O machine"

It all needs to get done. 

--pete




Here we go.

Making an assignment like this is bad:

leafName = (char*)NS_ConvertUCS2toUTF8(autoleaf).get();

I change to this:

leafName = ToNewCString(autoLeafName);

We are good. No more of free junk pointers errors from malloc.

I have a bunch of these i need to fix.

--pete


Doh!

I should've pointed you at our current string guide/FAQ, it lists most if not
all of these.

http://lxr.mozilla.org/seamonkey/source/string/doc/string-guide.html
Jag, i knew about the guide. I just need to read it. ;-)

--pete
> Fix all accessor code that is implemented poorly. Your example is a good one.
> Let it bake and check it in.

Most of these cases need to be fixed before this gets checked in. If we fail to
open the registry file on a system where the path contains a non-ASCII char, or
lose people's mail, that's a showstopper. Many occurances of
NS_ConvertUCS2toUTF8 in the patch are likely to cause this.

This is a huge amount of work and you've already done a lot. I don't mean to
heap more on you in order to get this done. I'd say file bugs against others and
make them block this one.

One stopper is a conversion like this:

+  nsAutoString pathBuf;
+  aFile->GetPath(pathBuf);
   res = NS_NewFileSpec(getter_AddRefs(tempSpec));
   if (NS_FAILED(res)) return res;
-  res = tempSpec->SetNativePath(pathBuf);
+  res = tempSpec->SetNativePath(NS_ConvertUCS2toUTF8(pathBuf).get());

Ideally, nsFileSpec would die and we wouldn't have this problem. There is a bug
on that and it could block this one. In the short term, there is a workaround:
nsFileSpec has an assignment operator which takes an nsString&. If pathBuf was
just assigned to an nsFileSpec, it goes through Unicode conversion to the FS
charset.

Here is another bad one i just fixed a bunch of:

localFile->Append(NS_LITERAL_STRING(PREFS_FILE_50_NAME));

--pete

(BTW: nice attachment table w/ the new version of bugzilla)
Is that the fix, or broken?

If it's about having a macro inside another macro, I believe that in the
NS_LITERAL_STRING case it all just work because of the NS_L macro used inside
that. (yes, the string-guide needs updating ;).
It's getting there. I have some more bad assignments I'm zooming in on.

jag is the conversion to nsAutoString here going to increase the amount of
memory mozilla uses?

How efficient are these string classes? 

Do they clean themselves up after being used?

--pete
===== On nsAutoString:

nsAutoString is an object which in typical usage gets allocated on the stack,
usually as a short-lived helper object (read: only extra bloat, if any, for the
duration of the function). Internally it has a 64 unit (char or PRUnichar)
buffer (also allocated on the stack) which it will use unless/until it needs
more, in which case it will allocate. If we do allocate, we now have a 64 unit
buffer (so either 64 bytes or 128 bytes) of "wasted" stack memory for the
duration of the function. When the nsAutoString is destroyed, i.e. when you
leave the function/code block it was constructed in, it will deallocate the
buffer it allocated.

The advantage of nsAutoString really comes when your strings are typically up to
64 units, or when you build up strings (through Append or += ) and know normally
not to exceed that limit. Since it already has a buffer (and on the stack!) it
won't need to do expensive (re)allocations.

If it did allocate a buffer, it will keep using it until you tell it to
deallocate it (SetCapacity), or it needs a larger buffer. The direct benefit of
this is that you can reuse the nsAutoString and assign a smaller string into it
without a penalty (see below for more on this).

If you skip the bit about having a 64 unit buffer on the stack, you have your
nsString :-)

If you know your string usage to normally exceed the 64 unit limit and you're
going to be reusing the string object, or appending into it, you can use
something like this:

nsString foopy; foopy.SetCapacity(256); // allocate once (255 + \0)

This will give you a string with an internal buffer of 256 units (though we
don't promise it, it's more a hint to the string implementation of what you're
about to do with it, see nsAString.h). You can now for example pass it to a
function which you know to give you a string less than 256 units (if it's
larger, that's fine, it will just need to reallocate), and then append to that
string without having to reallocate as long as you stay within those 256 units.

Another thing you can do is iterate over a list of strings with variable length
and only have to allocate once (if the longest string fits within the capacity
you set).

That's a lot of theory up there. Please ask questions if something's not clear).

To summarize, I don't think memory usage will increase much (after you've gotten
rid of a bunch of unnecessary conversions back and forth) and may even decrease.
The string classes allow efficient use, but it's up to the programmer to know
what to use when. And yes, they do clean up after themselves and don't make it
easy for you to run into trouble with that (though it's certainly possible).

===== On unnecessary string code/copies:

+                nsAutoString fileurl;
+                file->GetURL(fileurl);
+                newValue = fileurl;

newValue here is an nsAutoString, so you can just as easily pass in newValue. I
guess this falls under your rule of "try to do 1:1 conversion first, then go
clean up redundant string copies / string class usage"


===== On charPtr != 0 vs. !IsEmpty():

char* foo; // do something with foo

if (foo) {
  nsCAutoString foopy(foo);
  CallSomethingWith(foopy);
}

If you change this to:

nsXPIDLCString foo; // do something with foo;

if (!foo.IsEmpty()) {
  nsCAutoString foopy(foo);
  CallSomethingWith(foopy);
}

You've actually created different code.

|!foo.IsEmpty()| for flat strings is equal to |foo != 0 && *foo != 0| (so you
have a pointer to a buffer, and the buffer is not an empty string). The original
code above allowed for the buffer to be an empty string, which the converted
code doesn't.

Sometimes this is exactly what you want (and was poorly expressed, aka a bug, in
the original code), sometimes it doesn't matter (because CallSomethingWith
subsequently does a test for empty string, or when it's a no-op like
str.Append("")), and sometimes you just plain want to allow for empty strings.

With nsXPIDLString, which covers about 90%+ of the cases, you're fine, since a
.get() call on it will tell you whether a buffer exists or not. Maybe for
specific code there's a cleaner way, but .get() will get you the 1:1
translation. If you're dealing with a nsString or nsAutoString however, that
will always have a buffer, so you'll need to give such conversion a little more
thought.

===== On FooWithConversion:

+        nsAutoString essenFiles; essenFiles.AssignWithConversion(ESSENTIAL_FILES);
+        pathToCleanupUtility->Append(essenFiles);

Try to avoid FooWithConversion, they're going away (since they're hiding for the
UCS2 -> ASCII case that you're throwing away bits)

In this case, you can either use:

pathToCleanupUtility->Append(NS_LITERAL_STRING(ESSENTIAL_FILES));
Or

#define ESSENTIAL_FILES NS_LITERAL_STRING("Essential Files")
pathToCleanupUtility->Append(ESSENTIAL_FILES);


More later, when I hope to have more time.
jag, thanks for the great info. It would serve well to perhaps post it into the
the implementors guide so hackers like myself can reference it.

Thats good to know about "AssignWithConversion" i'll go fix those now.

Thanks

--pete
Conrad, can you explain this code in nsProfile.cpp is doesn't make any sense.

http://lxr.mozilla.org/seamonkey/source/profile/src/nsProfile.cpp#811

First there is the creation and assignment of a nsAutoString to null.

nsAutoString currProfileDirString;
currProfileDirString.AssignWithConversion(strtok(NULL, " "));

w/ this assignment it will always be null.

Then right after there is a test to see if the AutoString is not empty.

!currProfileDirString.IsEmpty()

all so a null PRUnichar nsILocalUnicodeFile can be created?

Can you clarify this code for me?

Thanks

--pete


Attached patch current patchSplinter Review
jag, in this latest patch, i have about 19 errors from malloc.

mozilla-bin in free(): warning: junk pointer, too high to make sense.

Can you give it a once over and see if you can catch any.

I'm still lookin.

I also got rid of any calls i made to AssignWithConversion.

Thanks

--pete
these mods are far reaching enough that they warrant a branch IMO. once you're
ready, people can pull that branch on various platforms and beat on it. can you
head toward a branch?
Yea, branching is a good idea, however i'm not ready quite yet.

Once i branch, i'm not going to have the benefits of merges when i update.

Lets shoot for a branch sometime this week.

Thanks

--pete
man strtok:

NAME
       strtok, strtok_r - extract tokens from strings

SYNOPSIS
       #include <string.h>

       char *strtok(char *s, const char *delim);

       char *strtok_r(char *s, const char *delim, char **ptrptr);

DESCRIPTION
       A `token' is a nonempty string of characters not occurring in the string
delim, followed by \0 or by a character occurring in delim.

       The strtok() function can be used to parse the string s into tokens. The
first call to strtok() should have s as its first argument. Subsequent
       calls should have the first argument set to NULL. Each call returns a
pointer to the next token, or NULL when no more tokens are found.
-----

So, the strtok(NULL, " ") will continue where the last strtok (one line higher)
left us.

The equivalent of this code would be something like:
nsACString::const_iterator start, end, iter;
cmdResult.BeginReading(start); iter = start;
cmdResult.EndReading(end);

FindCharInReadable(' ', iter, end);
NS_ConvertASCIItoUCS2 currProfileName(Substring(start, iter));
while (iter != end && *iter != ' ')
  ++iter;
start = iter;
FindCharInReadable(' ', iter, end);
NS_ConvertASCIItoUCS2 currProfileDir(Substring(start, iter));
Except I need to check in the nsACString& constructor for NS_ConvertASCIItoUCS2
first.
just an update.

I have converted most of the nsLocalFile internal implementations to use
autostrings. In doing so, a lot of the implementations now work and are a lot
smarter and just "do the right thing".

For example SetLeafName on unix is flawed. 

Currently you can do this:

js> f.leafName='/foo/bar/baz/';
/foo/bar/baz/
js> f.path;
/tmp//foo/bar/baz/

this is bad because you are now using this setter like appendRelativePath.

Now we do the right thing:

js> f.leafName='/foo/bar/baz/';
/foo/bar/baz/
js> f.path;
tmp/foo
js> 

These implmentations should translate nicely over to windows and mac because of
the autostring mechanics used.

--pete



It's all starting to come together now. 

One weird side affect using nsAString as a param.

In the methods copyTo and moveTo if you want to use the original leafname of the
current file, you now have to pass an empty '' instead of null.

f.copyTo(d, ''); *not* f.copyTo(d, null);

This will no doubt lead to confusion for implementors. 

I guess clear documentation will solve this problem.

--pete
Status: 

I have been testing the shit out of the individual nsIFile implementations and
fixing a lot of tough problems mostly related to symlinks and are long standing
lingering issues.

It is shaping up. I wish to continue testing and then early next week perhaps we
can cut a branch for this work.

--pete
Attached patch curent snapSplinter Review
Attached patch currentSplinter Review
Jud and company. 

I am going to be tied up for the next three weeks solid. I landed some contract
work porting windows code to linux. 

I doubt i will be able to touch this until after then.

Thanks

--pete
Attached file nsLocalFileUnixDefs
Blocks: 100676
pushing back
Target Milestone: mozilla0.9.5 → mozilla0.9.7
Conrad, have we decided what args we are going to use here?

Keep nsAString? or use PRUnichar?

I have a little time now and want to get this patch going again.

--pete
> Keep nsAString? or use PRUnichar?

I like nsAString.
Ok, this last patch posted looks good on unix. I have a few calls to free i need
to track down but it runs and is solid.

Next, i plan on taking out the entire io dependency on uconv. At this point
using this patch it is not needed.

I would tackle the win32 stuff, but i don't have a win machine. ;-)

After things are tight, i paln on finishing a brutal test suite.
Then i will start to clearly document the API's.

I obviously can't do this in succession due to time restraints, but it will get
done and it will be rock solid when finished.

--pete
NS_NewLocalFile(const char* path, PRBool followLinks, nsILocalFile* *result);
openANSIFileDesc(in string mode);
NS_GetSpecialDirectory(const char* specialDirName, nsIFile* *result)
attribute string persistentDescriptor;

These are the methods i have left to convert.

persistentDescriptor remains as string.

What about the others?

ok to convert over to AString?

Thanks

--pete

> openANSIFileDesc(in string mode);

I'd keep that one as a c-string. It's just a string like "w", "r", etc and
having it as a Unicode string would just mean the impl would have to convert it
before passing it to fopen().

> NS_GetSpecialDirectory(const char* specialDirName, nsIFile* *result)

This has to remain as a c-string. The specialDirName is an nsIProperties
property name (which are c-strings), not a file name. You could rename it to
"dirPropName" though.
pete - I filed a bug for the uconv dependency in nsI*File a few weeks ago - bug
100676.. nobody has done any work there though. do you want to just roll that
bug into this one or work on it over there? Glad to hear you're tackling this,
hopefully I can help on windows..
Yea, Alec roll that into this one. 

Thanks

--pete
Conrad, i think what we are shooting for in this bug is landing it in stages.

1- API change
2- removing FileSpec
3- removing uconv from io

Sound good?

--pete
I was thinking it would go in the reverse order from that.

If we change all methods on nsIFile to be Unicode-only, we are forcing all
consumers to use methods which go through Unicode -> FS charset conversion which
has problems. Getting rid of that conversion is, to me, the reason for making
this API change, but that requires change to the back-end impls of nsLocalFile.
Conrad, since unix supports UTF-8 i should be right on track here. The first
thing i did was make the changes to the implementations.

You said you have the mac stuff working, so we just need windows implemented to
land to API change first.

Are you saying that there are problems using NS_ConvertUCS2toUTF8 on unix
platforms as well?

--pete

I have to agree with conrad - I don't see the rush in switching the API if the
code behind the correct APIs are not doing the right thing.
As conrad explained to me:
1) fix the implementation behind the unicode side to call the appropriate
OS-specific unicode-based system calls. This requires platform experts.
2) fix the consumers in a way that makes sense within the context of each
consumer
3) when consumers are completely migrated, then change the APIs.

Why? (this is mostly from conrad, but I went to see for myself, and I am with
him on this) your patch has a lot of NS_ConverUCS2toUTF8, which means that we're
assuming that
  a) the filesystem can handle UTF8 and isn't just going to barf on the high-bit
characters
  b) that the filesystem is in fact utf8. Imagine if the filesystem's charset is
not utf8.. then we're writing out this odd character string using the |const
char*|-based system APIs. Other programs in the system may try to read that
filename as if it were the in the system charset, and get a garbage filename.
There are pleanty of other strange scenarios that I can think of as well.

> 1) fix the implementation behind the unicode side to call the appropriate
> OS-specific unicode-based system calls. This requires platform experts.

Most unix systems pre glib 2.2 do not have unicode-based system calls or UTF-8
environment locales. At best there may be some partially implemented wchar.h
libs. All that most current systems have is the `wcstombs' family of methods to
do wide char conversions. 

So the only logical solution is hand the methods UTF-8 encoded characters since
they are directly based on the ASCII encoding. In most cases this will be the
basic latin character set anyway so there is no problem.

If the machine is using say a Japanese character set for the system locale and
is writing these characters, again no problem, we are just handing off the UTF-8
encoded characters in wide bytes, the kernel doesn't know the difference.

The problem arises when say a system that can only handle basic latin is handed
the UTF-8 high bit encodings of Japanese characters. They are now UTF-8 encoded
chars that can't be mapped to the systems locale.

In this case, it is a system level problem. There is just no way to interpret an
extended char set if the system doesn't have the locale support for it.

2) fix the consumers in a way that makes sense within the context of each
consumer

What do you mean by consumers, accessors of the nsIFile interface?
 
> b) that the filesystem is in fact utf8. Imagine if the filesystem's charset is
> not utf8.. then we're writing out this odd character string using the |const
> char*|-based system APIs. Other programs in the system may try to read that
> filename as if it were the in the system charset, and get a garbage filename.
> There are pleanty of other strange scenarios that I can think of as well.

No we are writing out bytes that are going to be either relivant or not on the
particular system. For example: 0x03b6

It's just a number to the filesystem. My term can't display it but it is a valid
character that is unsupported by my local system.

Whether the system can interpret that number to a character is another story.
If it can't then the chars didn't come from that system.

The way i see it is it is broken now, and will remain broken no matter what we
do if the system can't support and extended character base right?

If my system can't support a hebrew charset and i try to write a hebrew file
name, it just isn't going to happen.

We need a unicode *and* system expert on this matter.  ;-)

--pete
Ok, with the current implementation i have on my local box, i just saved a file
w/ this name:

(See if bugzilla takes to this, escape for example)
/tmp/&#1040;&#1041;&#1042;&#1043;&#1044;&#1072;&#1073;&#1074;&#1075;&#1076;.txt

It writes to my file system but my term can't properly display the filename
because there are UTF-8 chars it can't map to.

It comes up as: '?????01234.txt' when ls the dir.

Currently mozilla won't let you even save this filename.
So i think this is a step in the right direction here.

--pete
Trying text from clipboard:

/tmp/??????????.txt

See if this works.

--pete
Heh, looks like mysql can't deal w/ it either.

Oh well. Anyway the point i'm trying to make is we can write this out to the
file system now, it's up to the systems locale to interpret it.

For unix using UTF-8 should the reasonable solution in this case.

--pete
ack, that just seems lame. Let's say I do a File->Save Page As... and save it as
my name, but my name is in Japanese. You're proposing we ignore the system
locale (right?) and save it in UTF8. Now I go to open it with emacs, and look
for my file. Instead of seeing my name, I see some garbled string. that's not cool.
Here is the problem as i see it.
 
If a local system doesn't support unicode conversions natively. This is the case
on many unix systems pre glib 2.2.

We have a mozilla string that has been converted to UCS2. Our strings original
locale charset mapping is now gone in place of unicode.

We are now ready to convert this string back and pass it off to a libc call.
We have no native converters. If i use wcstombs it is expecting the first
conversion was made based on the systems locale where mozilla converted using
ISO 10646 (i beleive). The resulting conversion back to mutlibyte string will be
incongruous.

The systems that support a native unicode conversion should be fine.
The problem is that a large number of systems out there don't.

At least w/ UTF-8 there is a shot that if the system is ISO_8859-1, the chars
will map correctly. 

If your locale is Japanese, you have the glyphs and are running in UTF-8 you'll
see your file.

I just see a great deal of platform specific work to accomodate the few unix
systems that have good unicode support implemented.

Comments from any unix hackers out there?

--pete
adding shaver for unix filesystem advice
Gack.

Gimme a day to digest this?  There's a lot of content here, and I'm ear-deep in
kernel MM internals and the linker.
> At least w/ UTF-8 there is a shot that if the system is ISO_8859-1, the chars
> will map correctly.

Is there a way to determine this at runtime and create the right implementation
of nsLocalFile (internally, completely hidden to consumers) which takes
advantage of this and another one which reverts to using uconv to convert from
UCS2 to the FS charset only if we're running on a lame machine?
That's what I'm planning on doing on the Mac. It's possible to tell at runtime
if UCS2 file system APIs are present (they are on all but the most decrepit
Macs) and, if so, they'll be used.
Yea, you can check for glib 2.2 or see if __STDC_ISO_10646__ is defined.
UCS2 is no good on unix. UTF-8 is preferred.

Can someone tell me the big win here using double byte chars by default?
It seems more logical to implement a chage like this when there is wider native
support for uinicode. Since most of these implementations are wrapping libc file
system calls that want single byte chars anyway.

It seem to me at this point it would be better to focus on making making the
existing nsIFile implementations rock solid. 

Do we really want to be adding changes like this even if incremental, at this
stage of mozilla pre 1.0?

Or is default nsIFile unicode support a great nsIFile2 task?

--pete




Looking over the original goals why don't we revise the objective 

- NS_FILE_CONTRACTID should go away (IMPLEMENTED)

- methods w/ [const] can remove those usages as "in" parameterspecification
implies "const" (IMPLEMENTED)

- remove all foo*FollowLinks and use boolean flag followLinks (IMPLEMENTED)

- fully document semantics of the methods. make assumptions clear (TODO)

- spawn should go away. (IMPLEMENTED)

- nsILocalFile derrives from nsIFile (IMPLEMENTED)

- Write intensive test suite, debug fix and stablize (STARTED)

I have most of this implemented on unix already. I will obviously have to redo
everything to revert back to char * params. But these ojectives make more sense
to me.

--pete




> Can someone tell me the big win here using double byte chars by default?

    It avoids bad conversions between Unicode and the current system char set on
systems where the file system *is* Unicode. When we want to store the path to a
file in, say, the profile registry, we use Unicode paths. When the Unicode path
is later read from the registry and then passed to InitWithUnicodePath, the
Unicode path is converted into a char* path by code in nsLocalFileCommon. If the
system char set when InitWithUnicodePath is called is different from when the
path was retrieved by GetUnicodePath, that conversion will be wrong.
    If the API is Unicode-only and the implementation uses Unicode FS routines,
we never have to do any conversions. If that's not possible on Unix like it is
on Windows (NT, 2K) and Mac (OS9, OSX) those implementations can be changed and
weaned off of the conversion code in nsLocalFileCommon.
Agree, that makes sense. 
On unix we can do it for glib 2.2 systems as well.

But that implies we are still keeping the unicode API's.
Then we are just getting rid of the uconv dependencies where possible.

Ok, sorry for the misunderstanding.

--pete


How about a solution like this for unix systems.

// Unicode interface Wrapper
NS_IMETHODIMP
nsLocalFile::InitWithUnicodePath(const PRUnichar *filePath)
{
   wchar_t wpath[BUFSIZ];
   char path[BUFSIZ];
   size_t len = 0;
   while (filePath[len] != L'\0')
     wpath[len] = filePath[len++];
   wpath[len] = L'\0';
   wcstombs(path, wpath, BUFSIZ);
   return InitWithPath(path);
}


Can anyone see any problems w/ this?

It is just converting the PRUnichar to the system wchar.
Then wcstombs can take care of the systems current locale.

Thanks

--pete
Turning it into a macro we can ifdef the hooks for each system in
nsLocalFileCommon or should we just implement them in each platform
nsLocalFile*.cpp file?


#define SET_UNIX_WS( func , arg )                               \
{                                                               \
   wchar_t wpath[BUFSIZ];                                       \
   char path[BUFSIZ];                                           \
   size_t len = 0;                                              \
   while (arg[len] != L'\0')                                    \
     wpath[len] = arg[len++];                                   \
   wpath[len] = L'\0';                                          \
   wcstombs(path, wpath, BUFSIZ);                               \
   return (func)(path);                                         \
}
pete: don't do len++ in the right-hand-side of an assignment to foo[len] if you
want the index to be the pre-incremented value.  (Also, nit: 3-space indent,
then 2?!).  Can we tune for the case where sizeof(wchar_t) == sizeof(PRUnichar)
and just cast?

/be
> or should we just implement them in each platform nsLocalFile*.cpp file?

Yes. On other platforms, the file system takes unicode directly so we don't need
to do this kind of conversion. Separating out the unicode wrapper routines is
bug 103384.

> if you want the index to be the pre-incremented value.  

In this case Brendan we don't want the pre increment, we are starting at 0.

> (Also, nit: 3-space indent, then 2?!).  

I thought "when in Rome" . . .  

> Can we tune for the case where sizeof(wchar_t) == sizeof(PRUnichar)
> and just cast?

A more complete macro:

#define SET_WS( func , arg )                                    \
{                                                               \
   char path[BUFSIZ];                                           \
   if (sizeof(wchar_t) == sizeof(PRUnichar)) {                  \
     if (wcstombs(path, (wchar_t *)arg, BUFSIZ) < 0)            \
       return NS_ERROR_FAILURE;                                 \
   } else {                                                     \
       wchar_t wpath[BUFSIZ];                                   \
       size_t len = 0;                                          \
       while (arg[len] != L'\0')                                \
         wpath[len] = arg[len++];                               \
       wpath[len] = L'\0';                                      \
       if (wcstombs(path, wpath, BUFSIZ) < 0 )                  \
         return NS_ERROR_FAILURE;                               \
   }                                                            \
   return (func)(path);                                         \
}


This won't work in the case where we force wchar_t to be short (-fshort-wchar,
see http://lxr.mozilla.org/seamonkey/source/configure.in#1968).
Pete, what value of "Rome" uses 3, then 2 space indentation?  Not the
nsLocalFile impls I've hacked on.

My point about len++ is order of evaluation, not pre- vs. post-increment.  You
are assuming that the ++ takes place after the = operator.  Standard C has no
"sequence point" between the left and right hand sides of =, leaving it up to
the impl. to decide order of effects (left-hand-side-of-assignment-address-calc
can come before incremented-value-store, or vice versa -- it's not specified).

/be

> Pete, what value of "Rome" uses 3, then 2 space indentation?  Not the
> nsLocalFile impls I've hacked on.

nsLocalFileCommon.cpp has `3' space indentation. Why on earth would i use three
space indentation? But don't worry, i will be moving the implementations to
nsLocalFileUnix.cpp where it will have righteous indentation. The world will be
a better place.

Ok, i see. I will increment after the assignment.

       while (arg[len] != L'\0')  { 
         wpath[len] = arg[len]; 
         ++len;
       }

Is the explicit increment i want here. Thanks for the tip.

Jag if wchar_t is forced to be a short, then the check that Brendan suggested
accommodates this scenario. Because we know PRUnichar is a double byte char.

if (sizeof(wchar_t) == sizeof(PRUnichar)) 
  wcstombs(path, (wchar_t *)arg, BUFSIZ)


--pete

So the problem I'm seeing is that if we force sizeof(wchar_t) to be 2 where it's
normally 4, we'll be calling wcstombs with a buffer of 16-bit chars when it's
expecting a buffer of 32-bit chars.
Not sure about this one, my version of gcc doesn't support that flag.

Worst case, i can always typedef it back to: unsigned long wchar_t

--pete
You would have to detect in which cases we forced wchar_t to be 2 byte where
normally it would have been 4 byte. This is actually doable, though we'll need
to add a define to configure.in in order for us to know when we're forcing short
wchar.

The harder part is that you can't use typedef to change a standard type's
definition. So we'll have to use a type that'll be binary compatible with
wchar_t (both in size and in signedness, meaning we'd probably have to have two
additional defines, or infer the above define from the existence of these two).

It can all be done, it's just a matter of elegance...
Or, if the only reason we are using the two byte -fshort-wchar flag (where
available) is for conveniently setting HAVE_CPP_2BYTE_WCHAR_T so we can  typedef
it directly to PRUnichar, then maybe it would be wiser to just leave wchar_t in
it's default state of 4 bytes. 

It seems that by forcing the two byte wchar_t is making an assumption that we
are not going to call any native wide string functions that want the default 4
byte wide buffer.

Any reason not to pull out this check and leave unix systems using PRUint16 for
PRUnichar?

--pete

Using 2-byte wchar_t allows Linux to use the L"foo" version of
NS_LITERAL_STRING, which I don't think we want to throw away.  (I know I sure
don't).
Actually i think you *can* override the typedef.

This seems to work fine: 
  typedef PRUint32 __wchar_t;

This explicitly sets wchar_t to 4 bytes within the scope of the caller.
I just tested it on linux and BSD and it works fine.

wchar_t is just a typedef anyway and not a "standard type", so this looks like
the way to go here.

I'm going to proceed using the above unitil someone persuades me not to.

--pete
My unicode wrapper macros will now look something like this:

#define SET_WS( func , arg )                                \
{                                                           \
  char str[BUFSIZ];                                         \
  typedef PRUint32 __wchar_t;                               \
  typedef __wchar_t wchar_t;                                \
  wchar_t wpath[BUFSIZ];                                    \
  size_t len = 0;                                           \
  while (arg[len] != L'\0') {                               \
    wpath[len] = arg[len];                                  \
    ++len;                                                  \
  }                                                         \
  wpath[len] = L'\0';                                       \
  if (wcstombs(str, wpath, BUFSIZ) < 0 )                    \
    return NS_ERROR_FAILURE;                                \
  return (func)(str);                                       \
}

#define GET_WS( func , arg)                                 \
{                                                           \
  nsXPIDLCString str;                                       \
  nsresult res;                                             \
  typedef PRUint32 __wchar_t;                               \
  typedef __wchar_t wchar_t;                                \
  if (NS_SUCCEEDED(res = (func)(getter_Copies(str)))) {     \
    wchar_t wpath[BUFSIZ];                                  \
    if (mbstowcs(wpath, str.get(), BUFSIZ) < 0)             \
      return NS_ERROR_FAILURE;                              \
    size_t len = 0;                                         \
    PRUnichar wbuf[BUFSIZ];                                 \
    while (wpath[len] != L'\0') {                           \
      wbuf[len] = wpath[len];                               \
      ++len;                                                \
    }                                                       \
    wbuf[len] = L'\0';                                      \
    *arg = nsCRT::strdup(wbuf);                             \
  }                                                         \
  return res;                                               \
}
wchar_t is defined in ISO 10646.

Why do you need the double typedef, rather than just a cast when calling
wcstombs/mbstowcs?  (If you prefer a typedef rather than a cast, for whatever
reason, why do we need two, instead of just

typedef PRUint32 wchar_t;

?

From "Accelerated C++", on built-in types:

``There is also a ''wide character'' type, called wchar_t, which must contain at
least 16 bits, and is intended to be used for representing characters in
languages such as Japanese, which have more characters than the Latin alphabet
provides. The wchar_t type is required to behave the same way as one of the
other integral types. The particular other type involved depends on the
implementation and is normally chosen to yield the most efficient
representation.''

I don't think you can safely assume that all unix compilers have wchar_t behave
the same way as an unsigned 32-bits integral type. Secondly I don't think your
typedef will work on all compilers, even though it happens to work with the
compilers you used on linux and BSD (it surprised me, since wchar_t should be
treated the same way as char, I would think).

I still think our best bet is to add an autoconf test that stores the wchar_t's
"system" size and signedness so we can typedef SYSTEM_WCHAR_T system_wchar_t and
use that in your macros. SYSTEM_WCHAR_T could simply be wchar_t if we don't have
to -fshort-wchar (instead of embedding that logic in your macros).
If i don't typedef the two, the compilers warns.

nsLocalFileUnix.cpp:238: warning: passing `unsigned int *' as argument 2 of
`wcstombs(char *, const __wchar_t *, unsigned int)' changes signedness

As far as just casting, thats how i had it originally, but wcstombs should take
a 4 byte buffer, handing it a two byte buffer is asking for trouble i think.

I'll go w/ Jag's suggestion here.

--pete 

 

Attached file new file
Ok, i pulled unicode conversions out of the unix builds.

I will do windows next.

Conrad has the mac stuff.

--pete
Intentional error returns are: 
  NS_ERROR_FILE_INVALID_PATH
  NS_ERROR_FILE_ALREADY_EXISTS

If you see any other errors in the test output, then there is a problem.

--pete
Hey conrad - check out bug 110371 - I found with nsFileSpec that there are
actually very few consumers of the two routines that required conversion, and
they were all in mail. The same may be true of the nsFileSpec routines in
nsLocalFileUnicode. once I land bug 110371, I'll take a look at these too.
Blocks: 99160
pushing back
Target Milestone: mozilla0.9.7 → mozilla0.9.9
Is this really going to land for .99.  Or can we continue to use and support the
current nsIFile?


Comment on attachment 54447 [details] [diff] [review]
proposed implementation for windows

So in general I like what's going on here. Why don't we #ifdef stuff out and
get this landed on a platform-by-platform basis? 

sr=alecf on the windows stuff
Attachment #54447 - Flags: superreview+
Comment on attachment 54446 [details] [diff] [review]
added some logic to test for null pointers, got rid of damn three spaces

this part looks good but where is nsLocalFileUnixDefs.h? (or is it in a patch
above? I didn't feel like checking them all)
Attachment #54446 - Flags: superreview+
We can't land this stuff at this point. This is to big a change IMHO way to risky. 

At the time i posted the patch back in October, it was humming nicely and pretty
well tested on unix.
I have since landed nsIFile stuff that will break these patches.

I only did the unix stuff. The windows and mac need *lots* of love.
I don't have access to a win or mac to develop which sucks.  ;-(

Not sure how the wide char conversions will fly on all flavors of unix either. 

If i had full time to devote to this, doing it wouldn't even be an issue.
But w/ my current rate of one small patch per week, there is no way i can make
it fly.

If you guys want to go for it, i can handle the unix end of things. 

I will need to get a new patch in order as this one will certainly fail.


--pete  
    

ok, I'll see what I can do to at least get windows landed.. then we can procede
with linux and mac.
I'm putting together the Mac equivalent of attachment 54447 [details] [diff] [review].
The good news is that I've landed most of the real important bug fixes for
nsLocalFIleUnix in separate bugs.

So what i still need to do are:

- check in a fix for followlinks to work corectly (bug 57995)
- conver the API's over to use AStrings
- remove the specific links methods like permissionsOfLink, copyToFollowingLinks etc

I have been breaking this patch into pieces and landing fixes in seperate bugs.

Thanks for helping out w/ windows.  Conrad is on the mac stuff.

If someone wants to mail me a cheap box I can borrow /w windows on it, then i'm up
for it.  ;-)

--pete

Shouldn't attachment 54446 [details] [diff] [review] and 54447 be on bug 100676? Both of those patches and
the one I now have for Mac don't do anything WRT the nsIFile API but, instead,
use native conversion instead of uconv.
Yes, you are right. When i break apart this patch i will post the uconv code to
bug 100676.



--pete
The reason it was in this bug, is because using AStrings in nsIFile methods is
an API change.

;-)


--pete
this is true. I guess I was using that bug as part tracking bug, part
minor-cleanup-patch-container :)
pete how much of this is left to go?  Does this bug cover removal of the UNICODE
function in favor of a soon-to-be-here utf8 string (which is the only change
that I want to see)?
We still need to land bug 100676 on unix. I am waiting for testing and review.

Since everything in this bug is being done incrementally in seperate bugs,
perhaps we should break out removing the unicode methods and using AStrings as
params into it's own bug.

--pete
Here's what I'd like to see happen:

1. Combine all the char*/PRUnichar* pairs of methods to one method which takes
an  nsAString (or the soon-to-be utf8 string doug mentioned)
2. Drop the ...FollowingLinks methods and make sure the implementations actually
observe the followLinks attribute.
3. On nsILocalFile, move reveal() and launch() to some service. (Maybe)
4. On the directoryEntries attribute, be specify whether the entries that are
returned by the enumerator have symlinks resolved. This is independent of the
followLinks attribute of the parent. The items returned should not just inherit
this attribute from their parent.
5. Add a relativeDescriptor attribute to nsILocalFile. This is needed to get XP,
relative paths in prefs and such instead of the absolute native paths being used
now.

Anything else? I'd like to get nsIFile.idl and nsILocalFile.idl nailed down soon
as a roadmap for other bugs.
okay.  we have 5 weeks.  at that time, either we ask that mozilla 1.0 be delayed
or we freeze what we got.  

who is signing up for what?
The only thing i'm *real* concerned with are regressions.

- Pulling out the followLinks calls
- replacing char* params w/ unicode
- making followLinks work correctly
- conrads other numbered items

nsIFile is reasonably stable at this point in time. 
So i would suggest that only the changes that are *absolutely necessary* be made
and we lean towards a conservative effort to freeze the interfaces and
tighten/debug their implementations.

Each change we make will cause a new wake of related bugs.

--pete
I feel your pain.  However there are some thing that should be done for API
correctness.  

I can remove the UNICODE methods in favor for the utf8 string stuff.  Pete, can
you write up a bug on this?  I am sure that Alec would like to see less unicode
stuff in xpcom where possible.

> 2. Drop the ...FollowingLinks methods and make sure the implementations
actually observe the followLinks attribute.

I agree.  Promote the attribute |followLinks| to the nsIFile from nsILocalFile.
 If you ever have to "not follow links", you can clone and set this flag to false.  

> 3. On nsILocalFile, move reveal() and launch() to some service. (Maybe)

I agree.  this should *not* be part of the API freeze.  Especially Launch.  Too
confusing with nsIProcess.

> 4. On the directoryEntries attribute, be specify whether the entries that are
returned by the enumerator have symlinks resolved. This is independent of the
followLinks attribute of the parent. The items returned should not just inherit
this attribute from their parent.

I disagree.  Why not have the enumerator inherit the attribute from the parent?

> 5. Add a relativeDescriptor attribute to nsILocalFile. This is needed to get
XP, relative paths in prefs and such instead of the absolute native paths being
used now.

I disagree.  This, if anything, should be a service of some kind.  Takes a
nsIFile and return a relative persistant descriptor.  This does not have to be
part of the nsILocalFile api freeze.  

On items (2) and (3), are their any voluneers?  Conrad, your suggestions, do you
got any time for some good lovin'?  
nah, I love unicode - I just don't like to see unicode semantics (i.e. case
conversion/comparison) in xpcom :)

I'm sort of up in the air on the utf8 stuff. I kind of think that more code,
when written 'correctly' is going to be dealing with unicode strings for
filename parts, not utf8... which means we're going to incur an overhead when we
do the utf8 conversion.. however it's probably not that bad, and little of this
stuff is probably used in code that must be performant... but the fact that it's
not likely performance related means we don't necessarily need the overhead of a
utf8 class, the standard wstring might do just fine.

I don't know.. in any case I think the ASCII/Unicode method pairing has got to
go and I'm more than willing to help with tree-wide reviews, no matter which way
it goes.

> I disagree.  Why not have the enumerator inherit the attribute from the parent?

If you have a file which points to a directory through a symlink and you want to
enumerate it but don't want the children to be resolved (making a directory
listing). You have to resolve the parent or you can't enumerate it. In order to
do this now you would have to:
1. see if the file is a symlink
2. if so, get its target - can't just set followLinks(false) on the file or it
points to the wrong thing.
3. set followLinks(false) on the target
4. then enumerate

That's a pain. Why not just make it easy and obvious and add a boolean param?

> I disagree.  This, if anything, should be a service of some kind.  Takes a
nsIFile and return a relative persistant descriptor.

Well, we have the persistentDescriptor attribute, so a relativeDescriptor
attribute should be alongside it. The reason I don't think it should be done by
a service is that the descriptor pertains only to that file, so the file object
should implement it. I'm in favor of moving things to services where something
else (say, a process) takes some action on a file but not things like this which
are identity attributes.

> On items (2) and (3), are their any voluneers?

On (2) I made the Mac impl observe the followLinks attribute. Pete, didn't you
do the same for Unix? If somebody can make the Windows impl observe, I'll remove
the methods from the API.
I'll do (3) if I can come up with a better (existing) place to put reveal() and
launch()
>> I disagree.  Why not have the enumerator inherit the attribute from the parent?

I take my objection back.

>> I disagree.  This, if anything, should be a service of some kind.  Takes a
nsIFile and return a relative persistant descriptor.

>Well, we have the persistentDescriptor attribute, so a relativeDescriptor
attribute should be alongside it. The reason I don't think it should be done by
a service is that the descriptor pertains only to that file, so the file object
should implement it. I'm in favor of moving things to services where something
else (say, a process) takes some action on a file but not things like this which
are identity attributes.

There is a big difference between a persistent descriptor and a relative
decscriptor.  The former is always known by the object.  

The relative descriptor has a key question that needs to be answered before any
meaning can be applied: Relative to what?  The nsLocalFile object can never know
all of the relative locations that can be created.  

So, I think that we should abstract this and make an interface which takes a
nsIFile and returns a string which represents some relative location.  There
should also be a method that does the inverse.

Examples would be a "res" implementation that would have relative results from
the resourse directory.  There could be a XPCOM Component implementation which
hands back a string relative to the component direction.

With all of that said, I do not think that this is required for mozilla 1.0.  Or
at least none of the requirement that I have read require this.

> I'll do (3) if I can come up with a better (existing) place to put reveal() and
launch()

Great - make it happen.  Do you need a bug #?
Doug, one more pitch...

The location from which it's relative is known because it's an input param.
Here's what I was thinking:

===================================================================
RCS file: /cvsroot/mozilla/xpcom/io/nsILocalFile.idl,v
retrieving revision 1.15
diff -w -u -2 -r1.15 nsILocalFile.idl
--- io/nsILocalFile.idl	31 Jul 2001 19:05:28 -0000	1.15
+++ io/nsILocalFile.idl	21 Feb 2002 20:24:37 -0000
@@ -107,4 +107,28 @@
     attribute string persistentDescriptor;
 
+    
+    /**
+     *   getRelativeDescriptor
+     *
+     *   Returns a utf-8 encoded relative file path in an XP format.
+     *   It is therefore not a native path.
+     *
+     *   @param fromFile
+     *          the file from which the descriptor is relative
+     */
+    string getRelativeDescriptor(in nsILocalFile fromFile);
+
+
+    /**
+     *   setRelativeDescriptor
+     *
+     *   Initializes the file to the location relative to fromFile using
+     *   a string returned by getRelativeDescriptor.
+     *
+     *   @param fromFile
+     *          the file to which the descriptor is relative
+     */
+    void setRelativeDescriptor(in string relativeDesc, in nsILocalFile fromFile);
+    

I had originally thought the relative location would, instead of a file, be a
directory service key. Shaver pointed out on porkjockeys that this would require
N getService calls to resolve N descriptors which were relative to the same
location.
In the best Cuba Gooding Jr. impression:

"show me the code! show me the code! show me the code!"

okay.  I like it.  :-)
string != UTF-8.  See bug 84186 for more than you really want to know.
Oh yeah: we do something like "getRelativeDescriptor" at
http://lxr.mozilla.org/seamonkey/source/xpcom/components/nsComponentManager.cpp#2373
or so.  Just for reference: it's cheap and easy, and it works everywhere (I
recall asking smfr about Mac-friendliness, I think).

Which makes me wonder whether we need an API call for something that amounts to
a tiny bit of string math.  If so, do we need setRelativeDescriptor when we have
append?  What's the difference?
Because the component mgr is doing that with native paths. The point of what I'm
after is that it's XP, i.e. a prefs file written on Unix is readable on a Mac. 
I can work on (2), unix/windows 

If we aren't going set followLinks when we initWithPath, then we need to make it
a setter. Right now it is readonly.

FollowLinks has been broken for a very long time, i wonder what kind of
surprises we'll find when it actually works.

eg: when followLinks is fixed and is set to true, when you call GetPath and the
file is a symlink, you will get the links ultimate target which is *not* the
behavior we have now. 

As shaver would say, "I'll eat my hat" if this doesn't cause some regressions
somewhere.  

Just a heads up.

--pete
Doug, what are the new params going to be?

Originally it was AString. Has that changed?

It will be better to do this first, since follow links fix will touch many of
the nsLocalFileX methods, it would be best to do it only once using the new
param type. 

I agree w/ Shaver on local file path params. Since the *vast* majority are using
char* 's, why do away w/ the double byte methods? 

We are soon to be completely free of uconv in xpcom. Why not leave them there
for wide char consumers only. The rest of the world can use char *'s.

Less conversions to make, single byte chars are less memory consumption.

If we make a universal param change it limits consumer options by forcing the
use of a string class or double byte chars when single byte will do just fine.


--pete

Target Milestone: mozilla0.9.9 → mozilla1.0
Keywords: topembed
Keywords: mozilla1.0+
i *think* that the less we touch this interface the better.  Changing nsIFile to 
use the new nsAUT8String *may* be a lot of work.  If it is, I do not see how I 
can go to drivers and ask for permission for a zero sum gain.  Thoughts?
It's not zero gain, it's large. Here is the problem with having nsIFile methods
which are not Unicode.
(1) Boot your machine in JA locale and create some files with multi-byte chars.
(2) Boot again in US locale.
(3) Try and use the files.
It will fail because, if the names of the files created in (1) are not stored as
Unicode, they're now meaningless in the the current locale. If the names were
stored as Unicode, mapping them to the native charset will be meaningless as
well. The way to avoid charset conversion is to change the implementations of
nsLocalFile to use Unicode OS file APIs. To enforce this, to ensure that charset
conversion never happens with filenames, the nsIFile API has to have only
Unicode methods.

I admit, we probably can't do that with the implementations by 1.0. In the
meanwhile, we can at least change to API to Unicode-only, do the charset
conversion that we're currently doing, and improve the implementations later.

If we can't get rid of the char*/PRUnichar* duality mess that we currently have
on nsIFile, the API is not worthy of being frozen. 
I dont by that arguement.  The problem with is is that we do offer a persistant
description which can be used and that should preserve any and all "unicode"
state and do the right stuff if in your above example.

We will **not** provide a unicode only interface.  Most users just want char*'s.
 If we have to make any change, we will remove the unicode cruft and make these
API pass UTF8 encoded string.

>If we can't get rid of the char*/PRUnichar* duality mess that we currently have
on nsIFile, the API is not worthy of being frozen. 

Worthyness has nothing to do with it.  Looks at the windows API where there are
wide and narrow APIs.  Dualities exist. 

My thoughts are that we either do nothing or we removed UNICODE and convert to
UTF8.  I am leaning toward the former.
> My thoughts are that we either do nothing 

Doing nothing means that we take what we have now, fix it, stress test it then
bake it for 5 weeks at 400 degrees.

> (1) Boot your machine in JA locale and create some files with multi-byte chars.
> (2) Boot again in US locale.
> (3) Try and use the files.

Agree this is certainly a problem, but in reality, how many people will be
actually doing this?
I think it needs to be clearly spelled out as a known 1.0 local file issue.

I personally beleive that we will gain so much more by taking what we have and
use this time to make it rock solid. 

--pete
If we do end up with non-wide methods, I think we ought to change the names,
such that "Unicode" does not make a method special.. what I mean is that we have
things like SetLeafName and SetLeafNameUnicode - they should either be the same,
similar to windows' A vs. W, something like SetLeafNameNative() and
SetLeafNameUnicode(), or we should drop the "Unicode" and add "Native", like
SetLeafNameNative() and SetLeafName()

That said, we also must document the charset that the non-wide APIs take, and I
think our two choices are:
1) native charset
2) UTF8

I'm leaning towards the latter, just so that the API is the same across
platforms AND locales, (the locale issue being more important) but unfortunately
the existing code is (as far as I know) expecting the native charset, so we'd
have to fix up all the callers as well. The ugly ones will be in mail, where
they use nsFileSpec, which uses the native charset natively.

finally, when we write out to disk, we must store it using the system's unicode
APIs (or in the case of unix, convert using wcstombs or something similar) and
let the system do the 'right' thing... 


I think "SetLeafName" and "SetLeafNameUnicode" are a logical choice.

Heres why.
There is only one consumer of SetUnicodeLeafName (in cpp)
and approximately 75 or so consumers for SetLeafName.

Making the API change for the Unicode methods makes more sense since they are
less used.
Less regressions, less files touched.

Alec's other points sound reasonable to me.

--pete




I disagree - just because most of the tree is using SetLeafUnicode doesn't mean
they are right - we should be encouraging the use of unicode (or at least utf8)
and make the 'standard' APIs be i18n-friendly. I don't think that SetLeafName is
i18n friendly because it expects the system charset, which isn't easy to come by..

basically I think we want to indicate to people that "unicode is the norm, but
if you need to use the system charset, then use this differently-named API"
Conrad, what is the bug number for the relative path stuff?  

I want different bug for different tasks.  What specific task does this bug talk
about?  Can we agree on what has to be done?
Doug, the relative descriptor stuff is bug 12911 (and it's nearly done)
The other bug I made from this, to remove reveal() and launch(), is bug 127098
Created a bug 129279 which will track the unicode/utf8/ascii task.  Lets 
move that dicussion there.

Please look at 99160.  Are there any other task or discussions which need to be 
addressed?
are there any tasked in this bug that are not coverted by 99160?  I think not. 
Pete I do not think that anything in this bug will be fixed for mozilla 1.0.
99160 will track any remaining fall out.

Please marked either WONTFIX, or future this bug. 
Target Milestone: mozilla1.0 → Future
mozilla1.0- in favor of 99160
Keywords: mozilla1.0+mozilla1.0-
Keywords: topembedembed, topembed-
This aint going to happen prior to mozilla 1.0.  Marking WONTFIX.  
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: