Closed Bug 166735 Opened 20 years ago Closed 20 years ago

Unicode file/io in Necko-nsIOService


(Core :: Internationalization, defect)

Windows 2000
Not set





(Reporter: tetsuroy, Assigned: tetsuroy)



(Keywords: intl)


(1 file, 1 obsolete file)

Need to update nsIOService to handle UCS2 file names.

patch to follow.
QA Contact: ruixu → ftang
Target Milestone: --- → mozilla1.2alpha
dougt: can you review?
Keywords: intl
please see bug 166792.  why do you want to do this only for XP_WIN?  also, are
you sure this is the right thing to do?  won't it break copy/paste for file://
URL strings?  won't you also have compatibility problems in file formats, etc.?
 i originally tried to do this when i worked on the nsIFile API changes, but
there just seem to be way too many problems with using UTF-8 instead of the
native charset for file:// URLs.
Depends on: 166792
Darin: thanks for your comment.  This bug is one of the changes we need 
to make mozilla a unicode application on the Windows platform.
( unicode app means we call RegisterClassW(), DefWindowProcW(), 
  GetOpenFileNameW(), CallWindowProcW(), DispatchMessageW(), etc)
You can find related bugs here : 58866, 9449, 104305, 162361, 162362

Those bugs are due to the fact that we are selectively using the 
locale-based Windows system APIs.  There is no way to fix these bugs except
to register moz as a unicode app and start calling the W APIs.

>why do you want to do this only for XP_WIN?
- so that we don't break everything at once; only Windows OS 
  for now.

>there just seem to be way too many problems with using UTF-8
-  yes, I will be modifying NSPR, XPCOM/IO, Widget and Necko. (and more??)
   However, with the above changes, we _now_ can open/save docs with unicode
   filenames, and the window title shows correctly.

>won't it break copy/paste for file:// URL strings?
Not sure. I haven't tested this case yet. (thus MOZ_UNICODE :) ) 
As long as we don't call xxNativeFoo() functions, we should be ok.  

Fran, DougT, Wan-Teh and I had a couple of discussions
about my approach. Please advise us of any considerations.
Blocks: 9449, 58866, 104305
Depends on: 162361, 162362
as for the cut/copy/paste issue... if you copy a file:// URL encoded using UTF-8
into an application like Netscape 6.2, you'll be unable to load the file:// URL.
 so, you either have to make the cut/copy/paste code do the conversion, or you
have to live with this deficiency.  what operating system APIs require UTF-8
file:// URLs?
BTW: if you simply modify the native charset to be UTF-8 when encoding file
paths as narrow strings, then you'd get UTF-8 file:// URLs automatically.
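To make the compatibility concern concrete, here is a minimal sketch (not code from the patch) of how the same file name produces two different file:// URLs depending on which 8-bit encoding the path bytes are in. PercentEncodePath and MakeFileURL are hypothetical helper names, and the set of characters left unescaped is an assumption chosen for illustration.

```cpp
#include <string>

// Percent-encode every byte that is not in a small "safe" set.
// The input is treated as raw bytes, so the result depends entirely on
// which charset was used to produce those bytes.
std::string PercentEncodePath(const std::string& bytes) {
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : bytes) {
        bool safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                    c == '.' || c == '~' || c == '/' || c == ':';
        if (safe) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += kHex[c >> 4];    // high nibble
            out += kHex[c & 0x0F];  // low nibble
        }
    }
    return out;
}

// Hypothetical helper: build a file:// URL from a path whose bytes are
// already in some 8-bit encoding (UTF-8 or a native code page).
std::string MakeFileURL(const std::string& encodedPath) {
    return "file:///" + PercentEncodePath(encodedPath);
}
```

With the name "café.txt", UTF-8 path bytes give "%C3%A9" in the URL while Latin-1 style bytes give "%E9", so an application that expects one form cannot resolve the other without a conversion step.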
>native charset to be UTF-8
We thought of the same initially; but 
- we want to distinguish the differences between
  the native charset and UTF-8 in Windows OS
  Some modules may _really_ want the native charset. 

Necko may be a special case where it is required to  
store URIs as UTF-8 (i could be wrong though). 
However, we want to keep the existing interfaces of XPCOM/IO 
either in UCS2 or in the native charset; but not three.

| Necko   |   Widget  |  Others |
|         |           |         |
      ^                 ^
      |                 |
    (UCS2)        (Native char)   
      |                 |
      V                 V
|     XPCOM/IO                  |
|    stores Paths as UTF8       |
| (may be changed to UCS2)      |
      ^                 ^
      |                 |
     (UCS2)          (UCS2)   
      |                 |
      V                 |
+---------+             |
|  NSPR   |             |   
|         |             V
|       OS System               |
|                               |

In the near future, we want to change the XPCOM/IO::mPaths 
to be in UCS2.  
so, if file paths are not going to use UTF-8, then why should file:// URLs be
any different?  they are supposed to have the same format.  i just know you are
going to hit a lot of regressions if you try to change this, and i don't see the
reason for doing so.  what WIN32 wide-API expects an UTF-8 file:// URL?  in
other words, why does the format of the file:// URL need to change?
>why does the format of the file:// URL need to change?
We are not going to change the format of file:// URL. URL can be in UTF8.
The problem here is that we are calling nsIFile::GetNativePath() 
in nsIOService, which corrupts the path name if you open a non-ASCII filename
on Win-En.  We are trying to eliminate the calls to xxNativeFoo().

>what WIN32 wide-API expects an UTF-8 file:// URL?
None.  We use wide-API with UCS2.
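The corruption described above can be illustrated with a small sketch. It assumes ISO-8859-1 as a stand-in for the "native charset" of an English Windows install (the real code page differs in detail), and ToLatin1Lossy is a hypothetical name, not an XPCOM function.

```cpp
#include <string>

// Stand-in for a "native charset" conversion: ISO-8859-1 covers only
// U+0000..U+00FF, so any other UCS2 code unit is replaced with '?',
// similar to what a lossy code-page conversion does.
std::string ToLatin1Lossy(const std::u16string& ucs2) {
    std::string out;
    for (char16_t ch : ucs2) {
        if (ch <= 0xFF) {
            out += static_cast<char>(static_cast<unsigned char>(ch));
        } else {
            out += '?';  // character does not exist in the target charset
        }
    }
    return out;
}
```

A Japanese file name such as u"\u65E5\u672C.txt" comes back as "??.txt", which no longer identifies the file on disk. Keeping the path in UCS2 all the way down to the W APIs avoids this step entirely.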
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Here is the list of bugs related to unicode file i/o:

I haven't verified if all the above bugs get addressed
by my attached patch; but, it will be a good start. :)
We need to move away from FS file URLs.....
Comment on attachment 97877 [details] [diff] [review]
Call Unicode APIs instead of calling NativePath()

>Index: base/src/nsIOServiceWin.cpp

>+    nsCAutoString ePath;
>+    ePath.Assign(NS_ConvertUCS2toUTF8(ucsPath).get()); 

efficiency nit:

      NS_ConvertUCS2toUTF8 ePath(ucsPath);

remember, this patch will make mozilla use a file:// URL format that
is incompatible with older applications.  we need to weigh this fact
against the benefit of using unicode encoding.  an alternative would
be to hide file:// URLs from the user, which is what IE appears to do.
Attachment #97877 - Flags: needs-work+
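For readers unfamiliar with the conversion the patch performs, the following is a portable sketch of UCS2 to UTF-8 encoding. It is not the implementation of NS_ConvertUCS2toUTF8; Ucs2ToUtf8 is a hypothetical name, and the sketch assumes strict UCS2 input (each 16-bit unit is one code point, surrogates are not paired).

```cpp
#include <string>

// Encode each 16-bit code unit as 1, 2 or 3 UTF-8 bytes.
// Assumption: input is UCS2, so surrogate pairs are not combined.
std::string Ucs2ToUtf8(const std::u16string& ucs2) {
    std::string out;
    for (char16_t ch : ucs2) {
        unsigned int cp = ch;
        if (cp < 0x80) {
            // 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The function returns the converted string directly, in the same spirit as the nit above: build the result once rather than converting into a temporary and then copying it into a second string.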
darin: thanks.  Would you review the patch?

>an alternative would be to hide file:// URLs
One more question, would you know where I can find the
code where i can strip the 'file://' from URL before
displaying to the user?
Attachment #97877 - Attachment is obsolete: true
strip 'file://' and unescape the URL before displaying to the user,
of course :)
i'm not sure where that is handled.  it may not be centralized.
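As a rough illustration of the "strip and unescape" idea, here is a self-contained sketch. It does not correspond to any existing Mozilla code path; DisplayPathFromFileURL and HexValue are hypothetical names, and malformed escapes are simply left untouched.

```cpp
#include <cstddef>
#include <string>

// Returns 0..15 for a hex digit, or -1 if c is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Drop a leading "file://" (if present) and decode %XX escapes.
// The decoded bytes are in whatever charset the URL was built with.
std::string DisplayPathFromFileURL(const std::string& url) {
    const std::string prefix = "file://";
    std::string rest = (url.compare(0, prefix.size(), prefix) == 0)
                           ? url.substr(prefix.size())
                           : url;
    std::string out;
    for (std::size_t i = 0; i < rest.size(); ++i) {
        if (rest[i] == '%' && i + 2 < rest.size()) {
            int hi = HexValue(rest[i + 1]);
            int lo = HexValue(rest[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += rest[i];  // ordinary character or malformed escape
    }
    return out;
}
```

For "file:///C:/My%20Docs/a.txt" this yields "/C:/My Docs/a.txt". The leading slash is kept; turning the result into a native Windows path would need an extra platform-specific step that is outside this sketch.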
Comment on attachment 103918 [details] [diff] [review]
incorporating nit

>Index: base/src/

>+# For Unicode mozilla

shouldn't you really be utilizing the file mozilla-config.h that is generated
after running configure or something like that instead of tweaking individual
makefiles like this?

also, i know i asked this question before, but why don't you just convert
NSPR narrow API to use UTF-8?  then wouldn't all of this be unnecessary?
you could then make nsIFile::GetNativePath (and friends) return UTF-8.

what am i missing?  why wouldn't this work?

e.g., necko just uses whatever nsIFile thinks the native charset is.  it
doesn't make any assumptions about the charset.
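If the narrow API were defined to take UTF-8, the layer underneath would need to decode it back to UCS2 before calling the wide OS functions. A minimal sketch of that decoding step follows; it is an assumption about how such a shim could look, not NSPR code. Utf8ToUcs2 is a hypothetical name, only 1 to 3 byte sequences are accepted (UCS2 has no code points above U+FFFF), and overlong encodings are not rejected.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 into UCS2 code units. Each byte that starts a malformed
// or unsupported (4-byte) sequence is replaced with U+FFFD.
std::u16string Utf8ToUcs2(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        std::size_t extra;  // continuation bytes expected after b0
        unsigned int cp;
        if (b0 < 0x80) {
            extra = 0; cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            extra = 1; cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2; cp = b0 & 0x0F;
        } else {
            out += u'\uFFFD'; ++i; continue;
        }
        bool ok = i + extra < utf8.size();  // enough bytes remain?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        if (!ok) { out += u'\uFFFD'; ++i; continue; }
        out += static_cast<char16_t>(cp);
        i += extra + 1;
    }
    return out;
}
```

With a shim like this below the narrow file API, callers such as necko could keep passing narrow strings without knowing that the "native charset" had become UTF-8, which is the point made in the comment above.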
right, Darin.
We wouldn't need to change Necko.  Sorry about this.
I can make this work with changes in nsIFile only.
Marking invalid.
Closed: 20 years ago
Resolution: --- → INVALID