Closed
Bug 76799
Opened 23 years ago
Closed 22 years ago
[FIX]File...Save As is not unescaping filenames (%20)
Categories
(Core Graveyard :: File Handling, defect)
Core Graveyard
File Handling
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.2alpha
People
(Reporter: greenrd, Assigned: bzbarsky)
References
Details
Attachments
(1 file)
1.04 KB, patch | Brade: review+ | jag+mozilla: superreview+
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.2.19 i686; en-US; 0.8.1) Gecko/20010419

If a file contains the ~ character it will be saved to the wrong directory when you try to save it.

Reproducible: Always

Steps to Reproduce:
1. Open the URL above in mozilla
2. pick File: Save As from the menubar
3. pick a directory other than /
4. Press Save

Actual Results: File is saved under / with mangled filename.

Expected Results: I know that ~ is a special character under Unix and Linux, so mangling might be necessary, but it shouldn't save to the wrong directory.
Assignee | ||
Comment 2•23 years ago
|
||
~ is a perfectly valid character in UNIX filenames. It's only special to the various shells.
I'll have a look.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Fixing url.
Comment 5•23 years ago
|
||
This bug is related to all URL-encoded characters and it also appears in the 0.9.1-milestone Win32-build. Example: if I want to save a file named "IDon'tCare.mp3" then it is displayed in the "Save As..." box as "IDon%27tCare".
Comment 6•23 years ago
|
||
Ignore previous comment: added to wrong bug#.
Setting target milestone. A testcase is at http://www.illsley.org/bug85780.htm.
Target Milestone: --- → mozilla1.0
Comment 10•23 years ago
|
||
What I want to know is if we have code that does local filesystem translation, and if so, how up-to-snuff it is. We have had a couple problems with ":" handling in MacOS HFS-style filesystems (which is the delimiter).
Summary: Filenames containing ~ are incorrectly mangled when file saved from browser → File...Save As is not unescaping filenames
Comment 11•23 years ago
|
||
-> file handling (bug & qa). Testcase in duplicate bug.
Component: Networking: File → File Handling
QA Contact: benc → sairuh
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: mozilla1.0 → mozilla0.9.9
Comment 14•23 years ago
|
||
This bug bites when saving mp3 files, which often have lots of spaces in their names. A good example is
04%20Transmission%20on%20JJJ%20-%2023Feb02%20-%20Italic,%20pH,%20Chromatic.mp3
which is actually
04 Transmission on JJJ - 23Feb02 - Italic, pH, Chromatic.mp3
There is a series of these of similar length. Now, I know how much bug fixers hate people bringing up "but in IE it does this", but IE correctly unescapes the spaces. Comment 10 mentions having OS-specific "safe" file unescapers, but perhaps in the meantime the most common characters could be unescaped (ones that are commonly known to be safe on all file systems). Or, failing that, do a simple implementation for the set of obvious platforms (Win32, Unix), and leave a spot for the others?
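A rough sketch of the per-platform "safe unescaper" idea raised in Comment 10 and above. Nothing here is Mozilla code: `safeUnescape`, the `UNSAFE` table, and the platform keys are hypothetical names, and the sketch assumes the escaped name is UTF-8 (or plain ASCII). It decodes the %XX escapes first and then replaces any character the target file system cannot hold, such as "/" on Unix, ":" on classic Mac HFS, and the reserved punctuation on Win32.

```javascript
// Hypothetical helper, not Mozilla code. Characters that cannot appear in a
// file name on each platform (assumed platform keys: win32, unix, machfs).
const UNSAFE = {
  win32: /[<>:"\/\\|?*\x00-\x1f]/g,
  unix: /[\/\x00]/g,
  machfs: /[:\x00]/g, // ":" is the path delimiter on classic Mac HFS
};

function safeUnescape(escapedName, platform) {
  let name;
  try {
    // decodeURIComponent assumes the escapes are UTF-8 bytes.
    name = decodeURIComponent(escapedName);
  } catch (e) {
    // Malformed or non-UTF-8 escapes: keep the escaped form unchanged.
    name = escapedName;
  }
  // Replace every character the platform forbids with an underscore.
  return name.replace(UNSAFE[platform], "_");
}

console.log(safeUnescape("04%20Transmission%20on%20JJJ.mp3", "unix"));
console.log(safeUnescape("a%3Ab.txt", "machfs"));
```

Decoding before sanitizing matters: an escaped "%2F" would otherwise slip a path separator into the suggested name.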
Comment 15•23 years ago
|
||
nsbeta1- per ADT triage team
Assignee | ||
Comment 16•22 years ago
|
||
*** Bug 129351 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 18•22 years ago
|
||
*** Bug 132127 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 19•22 years ago
|
||
*** Bug 133746 has been marked as a duplicate of this bug. ***
Comment 20•22 years ago
|
||
*** Bug 138915 has been marked as a duplicate of this bug. ***
Updated•22 years ago
|
Hardware: PC → All
Comment 21•22 years ago
|
||
*** Bug 146724 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 22•22 years ago
|
||
*** Bug 153197 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 23•22 years ago
|
||
*** Bug 154214 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 24•22 years ago
|
||
Brian, are you actually working on this? Or should I take it?
Assignee | ||
Comment 25•22 years ago
|
||
*** Bug 77475 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 26•22 years ago
|
||
*** Bug 140997 has been marked as a duplicate of this bug. ***
Comment 27•22 years ago
|
||
I think the problem lies in the correct conversion between file-urls and OS-specific filenames presented in the file-picker. Usage of the ESC_FORCED mask comes to my mind.
Assignee | ||
Comment 28•22 years ago
|
||
Andreas, care to clarify that? What is "ESC_FORCED" and what does one do with it? All the code in question does is to call GetFileName (gets the "filename" prop) on the nsIURL and then pass that to the filepicker. The nsIURL documentation explicitly says that the return value can have some escaped chars in it...
Comment 29•22 years ago
|
||
Typical escaping of urls is smart in that it tries to detect whether a char is already escaped, and if it is, it does not escape it again. Consider a *filename* abc%20ef which contains chars that already look like escaped chars. Converted into a *fileurl* it would end up like "file:///path/abc%20ef". Usually file urls get unescaped before being presented in a directory listing as filenames, so we would get "abc ef", which is not the original filename. To prevent this from happening you can use the esc_Forced mask on escaping (for example when converting from filename to fileurl) when you know you are dealing with a filename. This way the fileurl would look like "file:///path/abc%2520ef", which when unescaped results in the filename "abc%20ef", which is much better. My guess is that we have something like this here.
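The difference between "smart" and forced escaping described above can be sketched in a few lines. This is only an illustration under assumptions: `escapeSmart` and `escapeForced` are hypothetical stand-ins written with standard JavaScript functions, not the Mozilla escaping routines or the esc_Forced mask itself.

```javascript
// Forced escaping: every "%" is treated as literal data and becomes "%25".
function escapeForced(name) {
  return encodeURIComponent(name);
}

// "Smart" escaping: anything that already looks like a %XX escape is left
// untouched; only the remaining text is escaped.
function escapeSmart(name) {
  return name
    .split(/(%[0-9A-Fa-f]{2})/) // odd indices hold the captured %XX runs
    .map((part, i) => (i % 2 === 1 ? part : encodeURIComponent(part)))
    .join("");
}

const fileName = "abc%20ef"; // a real file name that contains "%20"

const smartUrl = "file:///path/" + escapeSmart(fileName);
const forcedUrl = "file:///path/" + escapeForced(fileName);

console.log(smartUrl);  // the "%20" is not escaped again
console.log(forcedUrl); // the "%" is escaped to "%25"

// Unescaping each form once shows which one round-trips to the original.
console.log(decodeURIComponent(escapeSmart(fileName)));
console.log(decodeURIComponent(escapeForced(fileName)));
```

Only the forced form survives a single unescape with the original name intact, which is the point of the comment above.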
Assignee | ||
Comment 30•22 years ago
|
||
Well... Let's put it this way. There is never a fileurl being explicitly used in any of this code. This code takes the URI of the thing we are getting and QIs it to an nsIURL. It then gets the .filename of the nsIURL (which is almost always an HTTP url, I must add). It takes this string and gives it to the filepicker. All the code is in JS. It never does any escaping or unescaping. I just read over the bug, and I'm confused. The initial bug was very definitely not about URL-escaping issues. Robin, are you still seeing that problem?
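For readers unfamiliar with the flow described above, here is a small sketch of why the suggested name arrives escaped. It uses the standard `URL` class as a stand-in for nsIURL and its filename property; `lastPathSegment` and the example.com address are made up for illustration and are not the code under discussion.

```javascript
// Stand-in for "get the filename of the URL": take the text after the last
// "/" in the path. The URL parser leaves percent-escapes in the path as-is.
function lastPathSegment(uri) {
  const path = new URL(uri).pathname;
  return path.slice(path.lastIndexOf("/") + 1);
}

// Hypothetical download link with a space in the file name.
const suggested = lastPathSegment(
  "http://example.com/music/04%20Transmission.mp3"
);

// This string is what would be handed to the file picker if no unescaping
// happens anywhere along the way.
console.log(suggested);
```

Since nothing in this path escapes or unescapes anything, the "%20" seen in the file picker is simply the URL's own spelling of the name.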
Comment 31•22 years ago
|
||
Okay, but it saves to a local file. Please take into account all the other cases you marked as duplicates. Sometimes it is about escaping, sometimes about unescaping, always it seems to be about saving to a local file.
Assignee | ||
Comment 32•22 years ago
|
||
The local file is just an nsIFile, there are no fileurls involved. And the local file is created _after_ the filepicker stage (the filepicker creates it). I thought all the bugs I marked dup of this one were about the fact that we show an escaped version of the filename in the URI when we put up the filepicker. If any of the bugs I marked dup were _not_ about this, please reopen and cc me on them and I will investigate...
Summary: File...Save As is not unescaping filenames → File...Save As is not unescaping filenames (%20)
Comment 33•22 years ago
|
||
*** Bug 158575 has been marked as a duplicate of this bug. ***
Comment 34•22 years ago
|
||
*** Bug 137752 has been marked as a duplicate of this bug. ***
Comment 35•22 years ago
|
||
Perhaps some clarification is needed. (This is how *I'M* reading the problem, I could be wrong.) --The URL is a very bad example, since it doesn't exist ^_^ (You'll need a Unicode/ISO-2022-JP font for some of these examples.)
When a file name contains a space or any other non-alphanumeric character (é, ü, ®, ½, etc.) it remains escaped (for these characters, %E9, %FC, %AE, and %BD, respectively). Likewise, for any upper-level ISO-2022 or Unicode characters (漢, 日, 本, 語, etc.), the files are saved with the escape codes (for these characters, in UTF-8 (Unicode), %E6%BC%A2, %E6%97%A5, %E6%9C%AC, and %E8%AA%9E, respectively).
So, if you tried to save a file (theoretically) that was called "日本語能力試験.lha" from a website, you would be downloading %E6%97%A5%E6%9C%AC%E8%AA%9E%E8%83%BD%E5%8A%9B%E8%A9%A6%E9%A8%93.lha. Even in the ISO-8859-1 charset, trying to download the file "México.zip" would get you M%E9xico.zip. See the problem? (It's very frustrating to have to manually fix these files instead of just having Mozilla automatically -- and properly -- unescape them.)
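The escape codes quoted above can be reproduced with a short sketch. `encodeURIComponent` always escapes the UTF-8 bytes of a string; `escapeLatin1` is a hypothetical helper written here to escape the ISO-8859-1 byte of each character instead. Non-ASCII characters are written as \u escapes so the source stays unambiguous.

```javascript
// Hypothetical helper: percent-escape a name using one ISO-8859-1 byte per
// character. Throws if a character does not exist in ISO-8859-1.
function escapeLatin1(name) {
  return Array.from(name, (ch) => {
    const code = ch.codePointAt(0);
    if (code > 0xff) {
      throw new RangeError("not representable in ISO-8859-1: " + ch);
    }
    if (/[A-Za-z0-9._~-]/.test(ch)) return ch; // leave unreserved chars alone
    return "%" + code.toString(16).toUpperCase().padStart(2, "0");
  }).join("");
}

const mexico = "M\u00E9xico.zip";      // "México.zip"
const nihongo = "\u65E5\u672C\u8A9E";  // the three characters for "Japanese"

console.log(escapeLatin1(mexico));        // one byte for the accented e
console.log(encodeURIComponent(mexico));  // two UTF-8 bytes for the same char
console.log(encodeURIComponent(nihongo)); // three UTF-8 bytes per character
```

The same visible name therefore has different escaped spellings depending on the charset the server or page author used.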
Comment 36•22 years ago
|
||
Um, sorry about that, it appears Bugzilla doesn't like non-ISO-8859-1 characters and converted those into decimal escaped characters. >.< (You can still see my ISO-8859-1 example, though. ^_^)
Assignee | ||
Comment 37•22 years ago
|
||
> automatically -- and properly -- unescape them
This is actually quite difficult to do. Consider your example:
%E6%97%A5
is that UTF8? Or IS0-8859-1? All we have there is the bits, not the encoding
they are encoded in.... And URLs never contain the encoding information needed
to properly escape them.
So at the moment, what's needed is a way to figure out what the "proper"
unescaping is. Then we can write the code to do it.... I can do the latter,
but I'm stymied on the former; suggestions welcome.
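The ambiguity described above is easy to demonstrate. In this sketch the same escaped string is unescaped twice under two different assumptions about the byte encoding; `unescapeLatin1` is a hypothetical helper, while `decodeURIComponent` is the standard function that assumes UTF-8.

```javascript
// Hypothetical helper: treat each %XX as one ISO-8859-1 character.
function unescapeLatin1(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const escaped = "%E6%97%A5";

// Assumption 1: the bytes are UTF-8. Three bytes decode to one character.
const asUtf8 = decodeURIComponent(escaped);

// Assumption 2: the bytes are ISO-8859-1. Each byte is its own character.
const asLatin1 = unescapeLatin1(escaped);

console.log(asUtf8.length);   // number of UTF-16 code units under UTF-8
console.log(asLatin1.length); // number of code units under ISO-8859-1
```

Both results are valid decodings of the same bytes, so the escaped string alone cannot say which file name was intended.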
Comment 38•22 years ago
|
||
ISO (o, not zero)-8859-1 only has one byte per character. In any other cases, Mozilla knows the encoding of the pages when they load (either through manually selecting View > Character Encoding, header information (all proper pages have their encoding information as a Content-Type header), or auto-detection by Mozilla), and as such, it should be able to unescape correctly. (The lovely thing about Unicode is that it doesn't matter what language it's in -- the CJK unified characters are assigned to the WORD/CONCEPT, and not to what the CHARACTER looks like.) In any case, simple unescaping should be done AT LEAST for ISO-8859-1 until all character encodings can be implemented (again, quite a simple affair, with all the selection, auto-detection, and Content-Type headers).
Assignee | ||
Comment 39•22 years ago
|
||
This fixes all the dups that have actual testcases (99% of them are just %20 <--> space). It's possible this will fail on non-western pages, but I do not have a testcase offhand; one would be appreciated.
Comment 40•22 years ago
|
||
bz: I'm ok with the patch (r=brade) if it works in this scenario: create a local file and name it "print%25land.html" (I often add printing percentages to files I print often) (put a shell of html in it?), open the local file in the browser, and save as to a different dir (same name). I expect the new file to be named print%25land.html (not print%land.html)
Assignee | ||
Comment 41•22 years ago
|
||
Yep. Without the patch the suggested name is "print%2525land.html", with it it's "print%25land.html"
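The scenario checked in the two comments above can be traced step by step. This is a sketch under an assumption: that turning the local file name into a URL escapes the literal "%" to "%25", which matches the "print%2525land.html" suggestion reported without the patch. Standard JavaScript functions stand in for whatever the browser actually calls.

```javascript
// The name of the file as it exists on disk (the "%25" is literal text).
const onDisk = "print%25land.html";

// Assumed: building the file URL escapes the literal "%", giving "%2525".
const inUrl = encodeURIComponent(onDisk);

// Unescaping exactly once recovers the on-disk name, as the patch does.
const suggested = decodeURIComponent(inUrl);

// Unescaping a second time would go too far and lose the "25".
const overDecoded = decodeURIComponent(suggested);

console.log(inUrl);
console.log(suggested);
console.log(overDecoded);
```

The design point is that the number of unescape passes has to match the number of escape passes, which is exactly one here.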
Updated•22 years ago
|
Attachment #92685 -
Flags: review+
Comment 42•22 years ago
|
||
Comment on attachment 92685 [details] [diff] [review] Silly patch r=brade (hurray!)
Comment 43•22 years ago
|
||
Assigning to Boris since he has a patch (no, I don't think I would have gotten to this in the near future).
Assignee: bryner → bzbarsky
Status: ASSIGNED → NEW
Comment 44•22 years ago
|
||
http://www.solon.org/cgi-bin/j-e/tty/dosearch?sDict=on&H=PS&L=E&T=japanese&WC=none Clicking on any of the links will give you a graphic with what the filename should look like. Unfortunately, my FTP client wasn't cooperating (seems it doesn't like non-Western encoding), so I couldn't get things uploaded for a "real" test, and the particular way that server works (for Western-only browsers, which was the only way I could get non-Western images out of it) it doesn't have any encoding specified. :/
Comment 45•22 years ago
|
||
Comment on attachment 92685 [details] [diff] [review] Silly patch sr=jag
Attachment #92685 -
Flags: superreview+
Assignee | ||
Comment 46•22 years ago
|
||
Colin, thanks. What happens on that page is that unescape() fails (even if I manually switch the page encoding to EUC-JP, which looks like the right one), throws an exception, and we fall through to using the link text as the filename. Which is better than what we were doing before, I guess. I think that we should land this at the beginning of the 1.2 cycle and look for a decent way to unescape/decode non-ISO-8859-1 content....
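The fall-through behaviour described above could look roughly like the following. This is a sketch, not the checked-in patch: `suggestFileName` and its parameters are hypothetical, and `decodeURIComponent` stands in for the unescape step because it also throws when the escaped bytes are not valid UTF-8.

```javascript
// Hypothetical sketch: try to unescape the name taken from the URL and fall
// back to the link text when the escapes cannot be decoded.
function suggestFileName(escapedName, linkText) {
  try {
    return decodeURIComponent(escapedName);
  } catch (e) {
    // Only swallow decoding failures; anything else is a real error.
    if (!(e instanceof URIError)) throw e;
    return linkText;
  }
}

// Plain %20 escapes decode fine.
console.log(suggestFileName("unescaped%20name.html", "a test page"));

// "%E9" followed by "x" is not valid UTF-8, so decoding throws and the
// link text is used instead.
console.log(suggestFileName("M%E9xico.zip", "Mexico archive"));
```

Falling back to the link text avoids showing a half-decoded name, at the cost of losing the original file name for non-UTF-8 content.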
Summary: File...Save As is not unescaping filenames (%20) → [FIX]File...Save As is not unescaping filenames (%20)
Comment 47•22 years ago
|
||
Actually, that script looks like it uses Shift_JIS encoding. (Don't quote me on that, though.)
Assignee | ||
Comment 48•22 years ago
|
||
Nope. EUC-JP is the one that shows the right stuff in the browser status bar... It occurs to me that we should consider doing whatever that does; I'll try to dig it up.
Comment 49•22 years ago
|
||
Yeah, I noticed the status bar does correctly display these things. Like I said, between the manual selection of encoding, the Content-Type header, and everything else, this should be an easy thing to squash. THAT SAID, there needs to be conversion for non-ISO-8859-1 files to Unicode (Western Windows) or whatever the system uses for the non-Western language. (J-Windows uses JIS, bla bla.)
Comment 50•22 years ago
|
||
Oh, also, could someone please change status to CONFIRMED?
Comment 51•22 years ago
|
||
Yeah, I noticed the status bar does correctly display these things. Like I said, between the manual selection of encoding, the Content-Type header, and everything else, this should be an easy thing to squash. THAT SAID, there needs to be conversion for non-ISO-8859-1 files to Unicode (Western Windows) or whatever the system uses for the non-Western language. (J-Windows uses JIS, bla bla.) Oh, also, could someone please change the status of this bug to CONFIRMED? Thaaanks.
Updated•22 years ago
|
Assignee | ||
Comment 52•22 years ago
|
||
*** Bug 160977 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 53•22 years ago
|
||
Fix checked in. bug 161242 filed on the remaining intl-related issues. The original problem here (%20) is certainly fixed.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 54•22 years ago
|
||
*** Bug 161335 has been marked as a duplicate of this bug. ***
Comment 55•22 years ago
|
||
url no longer exists. i made a simple test here: http://hopey.mcom.com/tests/unescaped%20name.html it has a literal whitespace. but when pasting, in the urlbar or in tab labels, the whitespace appears as %20. however, when saving the file (as well as viewing it in an html directory listing), the whitespace is preserved (ie, %20 is unescaped). vrfy'd fixed with 2002.09.16.08 comm trunk builds (all platforms).
Status: RESOLVED → VERIFIED
Comment 56•22 years ago
|
||
*** Bug 171626 has been marked as a duplicate of this bug. ***
Comment 57•22 years ago
|
||
*** Bug 172862 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 58•22 years ago
|
||
*** Bug 177927 has been marked as a duplicate of this bug. ***
Updated•8 years ago
|
Product: Core → Core Graveyard