Closed Bug 76799 Opened 23 years ago Closed 22 years ago

[FIX]File...Save As is not unescaping filenames (%20)

Product/Component: Core Graveyard :: File Handling
Type: defect
Priority: Not set
Severity: normal
Status: VERIFIED FIXED
Target Milestone: mozilla1.2alpha
Reporter: greenrd
Assignee: bzbarsky
Attachments: 1 file

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.19 i686; en-US; 0.8.1) Gecko/20010419

If a filename contains the ~ character, the file will be saved to the wrong
directory when you try to save it.

Reproducible: Always
Steps to Reproduce:
1. Open the URL above in mozilla
2. pick File: Save As from the menubar
3. pick a directory other than /
4. Press Save


Actual Results:  File is saved under / with mangled filename.

Expected Results:  I know that ~ is a special character under Unix and Linux, so
mangling might be necessary, but it shouldn't save to the wrong directory.
bill, is this yours?
Assignee: dougt → law
~ is a perfectly valid character in UNIX filenames. It's only special to the
various shells.
I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
This bug is related to all URL-encoded characters and it also appears in the
0.9.1-milestone Win32-build.
Example: if I want to save a file named "IDon'tCare.mp3" then it is displayed in
the "Save As..." box as "IDon%27tCare".
Ignore previous comment: added to wrong bug#.
Is this really file?
*** Bug 85780 has been marked as a duplicate of this bug. ***
Setting target milestone.  A testcase is at http://www.illsley.org/bug85780.htm.
Target Milestone: --- → mozilla1.0
What I want to know is if we have code that does local filesystem translation, 
and if so, how up-to-snuff it is. We have had a couple problems with ":" 
handling in MacOS HFS-style filesystems (which is the delimiter).
Summary: Filenames containing ~ are incorrectly mangled when file saved from browser → File...Save As is not unescaping filenames
QA Contact: tever → benc
-> file handling (bug & qa).

Testcase in duplicate bug.
Component: Networking: File → File Handling
QA Contact: benc → sairuh
->bryner
Assignee: law → bryner
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Target Milestone: mozilla1.0 → mozilla0.9.9
Nominating for nsbeta1 triage.
Keywords: nsbeta1
This bug bites when saving mp3 files, which often have lots of
spaces in their names.

A good example of this is:
04%20Transmission%20on%20JJJ%20-%2023Feb02%20-%20Italic,%20pH,%20Chromatic.mp3

which is actually

04 Transmission on JJJ - 23Feb02 - Italic, pH, Chromatic.mp3

There are a series of these of similar length. Now, I know most bug fixers hate
people bringing up "but IE does this", but IE correctly unescapes the
spaces.

Comment 10 mentions having OS-specific "safe" file unescapers, but perhaps in
the meantime, the most common characters could be unescaped (ones that are
commonly known to be safe on all file systems).

Or failing that, do a simple implementation for the set of obvious platforms
(Win32, Unix), and leave a spot for the others?
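As a quick check of the mp3 example above, the escaped name can be decoded with JavaScript's standard `decodeURIComponent`. This is only an illustrative sketch, not Mozilla's code; `suggestedName` is a hypothetical helper name, and the sketch assumes the escaped bytes are plain ASCII or UTF-8.

```javascript
// Hypothetical helper: turn the escaped name taken from a URL into the
// name a user would expect to see in the Save As dialog.
function suggestedName(escapedName) {
  // decodeURIComponent interprets %XX bytes as UTF-8. ASCII escapes such
  // as %20 (space) decode to the same character in any ASCII-compatible
  // encoding, so they are the safe, common case.
  return decodeURIComponent(escapedName);
}

const escapedMp3 =
  "04%20Transmission%20on%20JJJ%20-%2023Feb02%20-%20Italic,%20pH,%20Chromatic.mp3";

console.log(suggestedName(escapedMp3));
// prints "04 Transmission on JJJ - 23Feb02 - Italic, pH, Chromatic.mp3"
```

Characters that a given filesystem does not allow would still need separate, platform-specific handling, which this sketch does not attempt.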


nsbeta1- per ADT triage team
Keywords: nsbeta1 → nsbeta1-
Target Milestone: mozilla0.9.9 → mozilla1.2
*** Bug 129351 has been marked as a duplicate of this bug. ***
OS -> all (many of the dupes are windows)
OS: Linux → All
*** Bug 132127 has been marked as a duplicate of this bug. ***
*** Bug 133746 has been marked as a duplicate of this bug. ***
*** Bug 138915 has been marked as a duplicate of this bug. ***
Hardware: PC → All
*** Bug 146724 has been marked as a duplicate of this bug. ***
*** Bug 153197 has been marked as a duplicate of this bug. ***
*** Bug 154214 has been marked as a duplicate of this bug. ***
Brian, are you actually working on this?  Or should I take it?
*** Bug 77475 has been marked as a duplicate of this bug. ***
*** Bug 140997 has been marked as a duplicate of this bug. ***
I think the problem lies in the correct conversion between file-urls and
OS-specific filenames presented in the file-picker. Usage of the ESC_FORCED mask
comes to my mind.
Andreas, care to clarify that?  What is "ESC_FORCED" and what does one do with it?

All the code in question does is to call GetFileName (gets the "filename" prop)
on the nsIURL and then pass that to the filepicker.  The nsIURL documentation
explicitly says that the return value can have some escaped chars in it...
Typical escaping of URLs is smart in that it tries to detect whether a character
is already escaped and, if it is, does not escape it again.

Consider a *filename* abc%20ef, which contains characters that already look like
escaped characters. Converted into a *fileurl*, it would end up as
"file:///path/abc%20ef". Usually file URLs get unescaped before being presented
in a directory listing as filenames, so we would get "abc ef", which is not the
original filename.

To prevent this from happening you can use the esc_Forced mask when escaping (for
example when converting from filename to fileurl) if you know you are dealing
with a filename. This way the fileurl would look like "file:///path/abc%2520ef",
which when unescaped results in the filename "abc%20ef", which is much better.

My guess is that we have something like this here.
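The difference between "smart" and forced escaping described above can be sketched in plain JavaScript. Both functions below are hypothetical illustrations, not Necko's escaping code or its esc_Forced flag: `smartEscape` leaves anything that already looks like %XX alone, while `forcedEscape` escapes the "%" itself.

```javascript
// Hypothetical "smart" escaping: keep sequences that already look like
// %XX, and escape only the remaining unsafe characters.
function smartEscape(name) {
  return name.replace(/%[0-9A-Fa-f]{2}|[^A-Za-z0-9._~-]/gu, (m) =>
    m.length === 3 && m[0] === "%" ? m : encodeURIComponent(m)
  );
}

// Hypothetical "forced" escaping: escape every unsafe character,
// including a literal "%".
function forcedEscape(name) {
  return encodeURIComponent(name);
}

const literalName = "abc%20ef"; // a file whose real name contains "%20"

console.log(smartEscape(literalName));  // prints "abc%20ef"
console.log(forcedEscape(literalName)); // prints "abc%2520ef"

// Unescaping once only recovers the original name in the forced case.
console.log(decodeURIComponent(smartEscape(literalName)));  // prints "abc ef"
console.log(decodeURIComponent(forcedEscape(literalName))); // prints "abc%20ef"
```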
Well... Let's put it this way.  There is never a fileurl being explicitly used
in any of this code.

This code takes the URI of the thing we are getting and QIs it to an nsIURL.  It
then gets the .filename of the nsIURL (which is almost always an HTTP url, I
must add).  It takes this string and gives it to the filepicker.  All the code
is in JS.  It never does any escaping or unescaping.
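To illustrate why the filepicker ends up showing escapes, here is a rough stand-in for the flow just described. It uses the standard `URL` class rather than nsIURL, and `rawFileName` and the example.com address are hypothetical. Like the nsIURL filename, the last path segment is returned with its percent-escapes intact.

```javascript
// Hypothetical stand-in for reading the "filename" part of a URL.
// The standard URL class keeps percent-escapes in pathname, so nothing
// is unescaped along the way.
function rawFileName(spec) {
  const path = new URL(spec).pathname;
  return path.substring(path.lastIndexOf("/") + 1);
}

// This string is what would be handed to the file picker unchanged.
console.log(rawFileName("http://example.com/tests/unescaped%20name.html"));
// prints "unescaped%20name.html"
```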

I just read over the bug, and I'm confused.  The initial bug was very definitely
not about URL-escaping issues.  Robin, are you still seeing that problem?
Okay, but it saves to a local file. Please take into account all the other cases
you marked as duplicates. Sometimes it is about escaping, sometimes about
unescaping, always it seems to be about saving to a local file.
The local file is just an nsIFile, there are no fileurls involved.  And the
local file is created _after_ the filepicker stage (the filepicker creates it).

I thought all the bugs I marked dup of this one were about the fact that we show
an escaped version of the filename in the URI when we put up the filepicker.  If
any of the bugs I marked dup were _not_ about this, please reopen and cc me on
them and I will investigate...
Summary: File...Save As is not unescaping filenames → File...Save As is not unescaping filenames (%20)
*** Bug 158575 has been marked as a duplicate of this bug. ***
*** Bug 137752 has been marked as a duplicate of this bug. ***
Perhaps some clarification is needed. (This is how *I'M* reading the problem, I
could be wrong.) --The URL is a very bad example, since it doesn't exist ^_^

(You'll need a Unicode/ISO-2022-JP font for some of these examples.)

When a file contains a space or any other non-alphanumeric character (é, ü, ®,
½, etc) it remains escaped (for these characters, %E9, %FC, %AE, and %BD,
respectively). Likewise, for any upper-level ISO-2022 or Unicode characters (漢,
日, 本, 語, etc.), the files are saved with the escape codes (for these
characters, in UTF-8 (Unicode), %E6%BC%A2, %E6%97%A5, %E6%9C%AC, and %E8%AA%9E,
respectively). So, if you tried to save a file (theoretically) that was called "
日本語能力試験.lha" from a website, you would be downloading
%E6%97%A5%E6%9C%AC%E8%AA%9E%E8%83%BD%E5%8A%9B%E8%A9%A6%E9%A8%93.lha. Even in the
ISO-8859-1 charset, for non-alphanumeric characters, you would get, for trying
to download the file "México.zip", M%E9xico.zip. See the problem? (It's very
frustrating to have to manually fix these files instead of just having Mozilla
automatically -- and properly -- unescape them.)
Um, sorry about that, it appears Bugzilla doesn't like non-ISO-8859-1 characters
and converted those into decimal escaped characters. >.< (You can still see my
ISO-8859-1 example, though. ^_^)
> automatically -- and properly -- unescape them

This is actually quite difficult to do.  Consider your example:

%E6%97%A5

is that UTF8? Or IS0-8859-1?  All we have there is the bits, not the encoding
they are encoded in....  And URLs never contain the encoding information needed
to properly unescape them.

So at the moment, what's needed is a way to figure out what the "proper"
unescaping is.  Then we can write the code to do it....  I can do the latter,
but I'm stymied on the former; suggestions welcome.
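The ambiguity can be seen by decoding the same escaped bytes two ways. This is a small sketch using standard JavaScript functions; the wrapper names are hypothetical. `decodeURIComponent` reads the bytes as UTF-8, while the legacy `unescape` maps each %XX byte to the code point U+00XX, which is the ISO-8859-1 reading.

```javascript
// Read the %XX bytes as UTF-8. Throws URIError if they are not valid UTF-8.
function unescapeAsUtf8(s) {
  return decodeURIComponent(s);
}

// Read each %XX byte as code point U+00XX, i.e. as ISO-8859-1.
function unescapeAsLatin1(s) {
  return unescape(s);
}

// Same bytes, two different results:
console.log(unescapeAsUtf8("%E6%97%A5").length);   // prints 1 (the single character U+65E5)
console.log(unescapeAsLatin1("%E6%97%A5").length); // prints 3 (U+00E6, U+0097, U+00A5)

// An ISO-8859-1 escaped name is not valid UTF-8 at all:
console.log(unescapeAsLatin1("M%E9xico.zip")); // prints "México.zip"
try {
  unescapeAsUtf8("M%E9xico.zip");
} catch (e) {
  console.log(e instanceof URIError); // prints true
}
```

Nothing in the escaped string itself says which reading is intended, which is the problem raised above.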
ISO (o, not zero)-8859-1 only uses one byte per character. In any other case, Mozilla knows the
encoding of the pages when they load (either through manually selecting View >
Character Encoding, header information (all proper pages have their encoding
information as a Content-Type header), or auto-detection by Mozilla), and as
such, they should be able to unescape correctly. (The lovely thing about Unicode
is that it doesn't matter what language it's in -- the CJK unified characters
are assigned to the WORD/CONCEPT, and not to what the CHARACTER looks like.) In
any case, simple escaping should be done AT LEAST for ISO-8859-1 until all
character encodings can be implemented (again, quite a simple affair, with all
the selection, auto-detection, and Content-Type headers).
Attached patch: Silly patch
This fixes all the dups that have actual testcases (99% of them are just %20
<--> space).  It's possible this will fail on non-western pages, but I do not
have a testcase offhand; one would be appreciated.
bz: I'm ok with the patch (r=brade) if it works in this scenario:
  create a local file and name it "print%25land.html"
     (I often add printing percentages to files I print often)
  (put a shell of html in it?)
  open the local file in the browser and save as to a different dir (same name).

I expect the new file to be named print%25land.html (not print%land.html)
Yep.  Without the patch the suggested name is "print%2525land.html", with it
it's "print%25land.html"
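The three names in this test scenario can be reproduced with standard JavaScript functions. This is a sketch of the escaping arithmetic only, assuming the file URL escapes the literal "%" once; it is not the patch itself, and the variable names are illustrative.

```javascript
const nameOnDisk = "print%25land.html"; // the literal local filename

// Escaping for the file URL turns "%" into "%25".
const nameInUrl = encodeURIComponent(nameOnDisk);
console.log(nameInUrl); // prints "print%2525land.html" (suggested name without the patch)

// Unescaping exactly once gives back the original name.
const unescapedOnce = decodeURIComponent(nameInUrl);
console.log(unescapedOnce); // prints "print%25land.html" (suggested name with the patch)

// Unescaping a second time would go too far.
console.log(decodeURIComponent(unescapedOnce)); // prints "print%land.html"
```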
Attachment #92685 - Flags: review+
Comment on attachment 92685
Silly patch

r=brade (hurray!)
Assigning to Boris since he has a patch (no, I don't think I would have gotten
to this in the near future).
Assignee: bryner → bzbarsky
Status: ASSIGNED → NEW
http://www.solon.org/cgi-bin/j-e/tty/dosearch?sDict=on&H=PS&L=E&T=japanese&WC=none

Clicking on any of the links will give you a graphic with what the filename
should look like. Unfortunately, my FTP client wasn't cooperating (seems it
doesn't like non-Western encoding), so I couldn't get things uploaded for a
"real" test, and the particular way that server works (for Western-only
browsers, which was the only way I could get non-Western images out of it) it
doesn't have any encoding specified. :/
Comment on attachment 92685
Silly patch

sr=jag
Attachment #92685 - Flags: superreview+
Colin, thanks.  What happens on that page is that unescape() fails (even if I
manually switch the page encoding to EUC-JP, which looks like the right one),
throws an exception, and we fall through to using the link text as the filename.
 Which is better than what we were doing before, I guess.

I think that we should land this at the beginning of the 1.2 cycle and look for
a decent way to unescape/decode non-ISO-8859-1 content....
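The fall-through behaviour described above (try to unescape, and use the link text if that throws) can be sketched as follows. The patch's actual code is not shown in this thread, so `chooseFileName` is a hypothetical name and `decodeURIComponent` is used only as a stand-in for an unescaping call that throws on byte sequences it cannot decode.

```javascript
// Hypothetical sketch: prefer the unescaped URL filename, and fall back
// to the link text when unescaping fails.
function chooseFileName(escapedName, linkText) {
  try {
    return decodeURIComponent(escapedName);
  } catch (e) {
    if (e instanceof URIError) {
      // The bytes were not valid UTF-8 (for example, escapes produced
      // from a page in another encoding), so use the fallback.
      return linkText;
    }
    throw e; // anything else is unexpected
  }
}

console.log(chooseFileName("unescaped%20name.html", "some link"));
// prints "unescaped name.html"

// A non-UTF-8 byte sequence of the kind an EUC-JP page might produce:
console.log(chooseFileName("%C6%FC%CB%DC%B8%EC.gif", "some link"));
// prints "some link"
```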
Summary: File...Save As is not unescaping filenames (%20) → [FIX]File...Save As is not unescaping filenames (%20)
Actually, that script looks like it uses Shift_JIS encoding. (Don't quote me on
that, though.)
Nope.  EUC-JP is the one that shows the right stuff in the browser status bar...
It occurs to me that we should consider doing whatever that does; I'll try to
dig it up.
Yeah, I noticed the status bar does correctly display these things. Like I said,
between the manual selection of encoding, the Content-Type header, and
everything else, this should be an easy thing to squash. THAT SAID, there needs
to be conversion for non-ISO-8859-1 files to Unicode (Western Windows) or
whatever the system uses for the non-Western language. (J-Windows uses JIS, bla
bla.)
Oh, also, could someone please change status to CONFIRMED?
*** Bug 160977 has been marked as a duplicate of this bug. ***
Fix checked in.  bug 161242 filed on the remaining intl-related issues.  The
original problem here (%20) is certainly fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
*** Bug 161335 has been marked as a duplicate of this bug. ***
url no longer exists. i made a simple test here:

http://hopey.mcom.com/tests/unescaped%20name.html

it has a literal whitespace. but when pasting, in the urlbar or in tab labels,
the whitespace appears as %20.

however, when saving the file (as well as viewing it in an html directory
listing), the whitespace is preserved (ie, %20 is unescaped).

vrfy'd fixed with 2002.09.16.08 comm trunk builds (all platforms).
Status: RESOLVED → VERIFIED
*** Bug 171626 has been marked as a duplicate of this bug. ***
*** Bug 172862 has been marked as a duplicate of this bug. ***
*** Bug 177927 has been marked as a duplicate of this bug. ***
Product: Core → Core Graveyard