Closed Bug 135854 Opened 19 years ago Closed 12 years ago

Japanese in escaped characters in File | Save as dialog


(Core :: Internationalization, defect, P3)






(Reporter: amyy, Assigned: smontagu)



(Keywords: intl, regression)


(1 file)

Build: 04-04 trunk build on all platforms.

1. Launch Netscape and open Composer.
2. Open a Japanese named file.
3. Whether or not you modify the file, go to File | Save As.

A save file dialog appears, and the existing Japanese name is
displayed as escaped characters, e.g. %93%FA%96%7B%8C%EA.html.

This is a regression from N6.2.1; Japanese file names display fine on N6.2.2.

I also saw it on 03-06 trunk build, so it has been there for a while.
Keywords: intl, regression
QA Contact: ruixu → ylong
This is not i18n-specific; the same happens for browser Save As.
E.g. "test file.html" turns into "test%20file.html".
Hmm, then it's mostly a dup. of bug 130079.
Yuying, since 130079 has been checked in, could you verify if this is a dup of
No, it is still showing as escaped characters on the 04-15 branch build / Mac 10.1.3
and the 04-12 trunk build / WinME-JA. (bug 130079 was checked in to both trunk and
branch).

And the space is still displayed as "%20" in this dialog.
reassign to darin.
This is a regression directly caused by bug 124042. 
The URL for a local file is stored in the system charset instead of UTF-8. There are 2
options to fix this problem:
option A, convert the file path from the system charset to UTF-8 and store the UTF-8
string in the URL, converting back to the system charset before displaying. (We may
display the UCS-2 string directly in the title in future.)
option B, (hack) if it is the file scheme, we assume it is the system charset instead of
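
For illustration only, option A can be sketched in Python rather than in the Mozilla code base. The function names and the choice of Shift_JIS as the system charset are assumptions made for this example; they are not taken from any patch on this bug.

```python
from urllib.parse import quote, unquote_to_bytes

# Assumption for this sketch: the system charset of Japanese Windows.
SYSTEM_CHARSET = "shift_jis"


def native_name_to_url_segment(native_bytes: bytes,
                               system_charset: str = SYSTEM_CHARSET) -> str:
    """Decode a native file name and store it in the URL as escaped UTF-8."""
    text = native_bytes.decode(system_charset)
    return quote(text.encode("utf-8"))


def url_segment_to_native_name(segment: str,
                               system_charset: str = SYSTEM_CHARSET) -> bytes:
    """Unescape the UTF-8 URL segment and convert back to the system charset."""
    text = unquote_to_bytes(segment).decode("utf-8")
    return text.encode(system_charset)


# The file name from the report, as the bytes a Shift_JIS system would use.
native = "日本語.html".encode("shift_jis")
segment = native_name_to_url_segment(native)
round_trip = url_segment_to_native_name(segment)
```

Under these assumptions the URL always carries UTF-8, and the native bytes are only recreated at the point where a system-charset API needs them.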
Assignee: yokoyama → darin
how critical is this bug?  should this be nsbeta1?
Priority: -- → P3
Target Milestone: --- → mozilla1.1alpha
Whether it is critical or not,
it is a regression.
Yuying, could you attach a screen shot?
I think this does not look good, but the file is usually saved under a different
name, and it is also possible to restore the original name by selecting the original
file in the save as dialog list. So my opinion is that this is not nsbeta1+.
Notice the original name in Japanese, and the one inside save as dialog is

Yes, this problem happens with "Save As" or "Save charset As" (Composer) but not
with "Save".  The data can be saved properly but the name looks ugly.
The escaped string in the screen shot is Shift_JIS, which is the system charset
for Japanese Windows. I think simply unescaping it would solve the display problem.
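
As a quick check of that reading, the escaped name from the report can be unescaped as Shift_JIS with Python's standard library. This is only a sketch outside Mozilla; it says nothing about which call the dialog code should use.

```python
from urllib.parse import unquote

# Escaped file name quoted in the original report.
escaped = "%93%FA%96%7B%8C%EA.html"

# Percent-decode the bytes and interpret them as Shift_JIS.
display_name = unquote(escaped, encoding="shift_jis", errors="strict")

# The non-i18n case from the thread: an escaped space.
plain_name = unquote("test%20file.html")
```

Here `display_name` is the Japanese name "日本語.html" and `plain_name` is "test file.html".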
Shanjian, why do you think this is related to UTF-8 URI?
nhotta: one of the changes that went in w/ support for UTF-8 URI was the switch
to make GetFileBaseName not unescape the result before returning it.  this was
done because the result from GetFileBaseName has to be UTF-8.  as a result, the
caller must unescape the string if that is what is appropriate.  i tried to
fixup all of the callers, but i obviously missed this one.  if the caller is JS
code, then we have a problem because i don't believe that there is any
scriptable method to unescape an URL-escaped string... or is there?!
>this was done because the result from GetFileBaseName has to be UTF-8
but looking at the screen shot, it is not UTF-8

>i don't believe that there is any scriptable method to unescape an URL-escaped 
>string... or is there?!
unescape() uses UTF-8 as a default (and uses a document charset if available)

So when the URI is escaped in a charset other than UTF-8 (here Shift_JIS), calling
unescape() would be a problem.
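
The failure mode can be reproduced with a Python analogue. Note the assumption: `urllib.parse.unquote` is used here only because it also defaults to UTF-8; it is not the scriptable unescape() discussed above.

```python
from urllib.parse import unquote, unquote_to_bytes

# Shift_JIS bytes, percent-escaped, as seen in the Save As dialog.
escaped = "%93%FA%96%7B%8C%EA.html"

# The raw bytes behind the escapes are not a valid UTF-8 sequence.
raw = unquote_to_bytes(escaped)
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# A UTF-8 based unescape therefore cannot recover the name; Python's
# default (errors="replace") substitutes U+FFFD for the undecodable bytes.
lossy = unquote(escaped)
```

The escaped form itself is plain ASCII; the name is only lost once the bytes are unescaped under the wrong charset.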
Darin, the other bug I assigned to you today is in the filepicker. I am
not sure about this one. In that bug, the window title takes the system charset only, so
we need to convert it back from UTF-8. (In future, we should try to display
the Unicode string directly.)
nhotta: my point was that an URL escaped string is ASCII compatible and can
therefore be treated as UTF-8 _unless_ you unescape it.  so, the fact is..
GetFileBaseName returns bytes that can be interpreted as UTF-8.  however, once
unescaped.. there is no guarantee that the text will be UTF-8 or native-charset
If 'originCharset' is set then the unescaped string can be converted to Unicode
using that value. How about adding a function to nsIURI which unescapes and
converts to Unicode using originCharset? I think that would be useful for displaying
URIs in the UI.

nhotta: given that nsIURI exposes the originCharset attribute, i think it'd be
sufficient to write up a helper function (maybe on nsIIOService) that does this
conversion.  that way the many nsIURI implementations don't have to duplicate
that functionality.
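
A minimal sketch of such a helper, written in Python for illustration. The name `unescape_for_display`, its signature, and the fallback order are hypothetical; no such method is confirmed to exist on nsIIOService or nsIURI.

```python
from urllib.parse import unquote_to_bytes


def unescape_for_display(escaped: str, origin_charset: str = "") -> str:
    """Unescape a URL segment and convert it to Unicode for the UI.

    Tries the origin charset first, then UTF-8. If neither decodes
    cleanly, the escaped string is returned unchanged, which is never
    worse than what the dialog shows today.
    """
    raw = unquote_to_bytes(escaped)
    for charset in (origin_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Bytes do not fit this charset, or the charset name is unknown.
            continue
    return escaped


shown = unescape_for_display("%93%FA%96%7B%8C%EA.html", "shift_jis")
```

Keeping the conversion in one helper means callers only pass the escaped string and the origin charset, and the individual URI implementations do not each have to repeat the decoding logic.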
Target Milestone: mozilla1.1alpha → ---
mass futuring of untargeted bugs
Target Milestone: --- → Future
*** Bug 154469 has been marked as a duplicate of this bug. ***
Still a problem in 11-07-08 build.
Assignee: darin → smontagu
Target Milestone: Future → ---
QA Contact: amyy → i18n
WORKSFORME on all platforms
Closed: 12 years ago
Resolution: --- → WORKSFORME