Open Bug 293611 Opened 20 years ago Updated 6 months ago

'Web page, complete' isn't saved correctly if filename includes non-ASCII characters

Categories

(Firefox :: File Handling, defect)

x86
Windows XP
defect

People

(Reporter: da_neil, Unassigned)

Details

(Keywords: intl)

Attachments

(4 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050421 Firefox/1.0.4 (MOOX M2)
Build Identifier: Mozilla/5.0 /*any*/ (tried with Firefox aviary & trunks and Mozilla Suite 1.8b)

When a page is saved as 'Web page, complete' to a file whose name contains non-ASCII characters, it can't be opened properly in any program other than Mozilla software. Other programs cannot resolve the path to the <filename>_files directory because they treat '%' characters inside the path as *regular percent signs* (and they are allowed in directory names!), not as MIME-encoded strings. Mozilla software should write HTML with proper (*not* MIME-encoded) regional (e.g. CP1251) or Unicode paths to the <filename>_files directory.

Reproducible: Always

Steps to Reproduce:
1. Save a page to a file with non-ASCII characters in its name (e.g. Russian)
2. Open it in any other program, except Mozilla *

Actual Results:
The file is displayed without CSS and graphics.

Expected Results:
The file should open normally, not only in Mozilla software.

Test case: I save a page as 'forum.mozilla.ru _ Результаты поиска.htm' (Russian characters here).
File name is: 'forum.mozilla.ru _ Результаты поиска.htm'
Directory name: 'forum.mozilla.ru _ Результаты поиска_files'
Actual path to the directory inside HTML: 'forum.mozilla.ru%20_%20%D0%E5%E7%F3%EB%FC%F2%E0%F2%FB%20%EF%EE%E8%F1%EA%E0_files'

The path in the HTML doesn't match the directory name: read literally, 'forum.mozilla.ru%20_%20%D0%E5%E7%F3%EB%FC%F2%E0%F2%FB%20%EF%EE%E8%F1%EA%E0_files' and 'forum.mozilla.ru _ Результаты поиска_files' are *different* directories.
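For anyone triaging: the mismatch in the test case can be checked with a short Python sketch. It assumes the escaped bytes are CP1251 (the regional code page mentioned above); the variable names are illustrative only, and nothing here claims to reproduce what any particular browser does internally.

```python
from urllib.parse import unquote

# Path written into the saved HTML, copied from the test case above.
escaped = ("forum.mozilla.ru%20_%20%D0%E5%E7%F3%EB%FC%F2%E0%F2%FB"
           "%20%EF%EE%E8%F1%EA%E0_files")
# Directory that was actually created on disk.
on_disk = "forum.mozilla.ru _ Результаты поиска_files"

# Assumption: the escapes are CP1251 bytes.
as_cp1251 = unquote(escaped, encoding="cp1251")
# A consumer that assumes UTF-8 cannot decode those bytes and
# ends up with U+FFFD replacement characters instead.
as_utf8 = unquote(escaped, encoding="utf-8", errors="replace")

print(as_cp1251 == on_disk)  # unescaped as CP1251
print(as_utf8 == on_disk)    # unescaped as UTF-8
print(escaped == on_disk)    # '%' treated as a literal character
```

Only the first comparison matches, so a program has to both unescape the path and guess CP1251 to find the directory.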
Assignee: dom-to-text → file-handling
Component: DOM to Text Conversion → File Handling
QA Contact: ian
Component: File Handling → DOM to Text Conversion
Component: DOM to Text Conversion → File Handling
Keywords: intl
Will this bug be fixed in 1.1? I have to use IE to save pages since it's the only browser that makes it in a proper way (Opera makes a large mess).
I'm tempted to mark this invalid. These are URIs; using URI-escaping in them should be perfectly reasonable. Unless unescaping the string gives the wrong bytes (that is, bytes in an encoding different from the page encoding)?
So the supposed bug is that the directory is (correctly) named 'forum.mozilla.ru _ Результаты поиска_files', but the HTML contains something like: <a href="forum.mozilla.ru%20_%20%D0%E5%E7%F3%EB%FC%F2%E0%F2%FB%20%EF%EE%E8%F1%EA%E0_files"> ?
(In reply to comment #4)
> So the supposed bug is that the directory is (correctly) named 'forum.mozilla.ru
> _ Результаты поиска_files', but the HTML contains something like:
>
> <a href="forum.mozilla.ru%20_%20%D0%E5%E7%F3%EB%FC%F2%E0%F2%FB%20%EF%EE%E8%F1%EA%E0_files"> ?

Exactly. The only software that 'understands' such (local) URIs is from MoFo.
Er... Anything that works with URIs understands the percent-escaping part. The only part I can see having issues is if we have an encoding mismatch somewhere...
(In reply to comment #6)
> Er... Anything that works with URIs understands the percent-escaping part. The
> only part I can see having issues is if we have an encoding mismatch somewhere...

That's not true. Such pages don't render properly in IE, Opera, Word, etc. (See the attached screenshot.)
A screenshot of the same page rendered with Trident (IE) and Gecko. The screenshot was made in Maxthon.
> That's not true.

Er... did you even bother TESTING before making that claim? Create the following two HTML files in a directory:

test.html:
---------------
<body><a href="%66oo.html">Click this</a></body>
---------------

foo.html:
---------------
<body>This is a test</body>
---------------

('f' is ASCII code 0x66).

In my IE 5.5 over here the link in the first file works just dandy; the second file is loaded.

So again, the problem is NOT the percent-escapes. It's something in the assumptions someone somewhere is making about what character encoding should be used for the bytes gotten after unescaping.

I'm not saying your problem doesn't exist, just that the percent-escapes are not the issue with it.
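The distinction drawn above can be sketched in Python. This is only an illustration: the three encodings are an arbitrary sample of ASCII-compatible ones, and the dictionary names are made up for the example.

```python
from urllib.parse import unquote

encodings = ("utf-8", "cp1251", "latin-1")

# %66 is an ASCII byte, so every ASCII-compatible encoding
# decodes it to the same character, 'f'.
ascii_results = {enc: unquote("%66oo.html", encoding=enc)
                 for enc in encodings}

# A byte >= 0x80 has no such guarantee: 0xF2 means something
# different in each encoding, and is invalid on its own in UTF-8.
non_ascii_results = {enc: unquote("%F2", encoding=enc, errors="replace")
                     for enc in encodings}

print(ascii_results)
print(non_ascii_results)
```

The ASCII escape gives 'foo.html' in all three cases, while the single byte 0xF2 gives three different results, which is where an assumption about the encoding has to enter.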
Any progress?
Progress would be knowing what assumptions are being made by what software.
I don't know the inner workings of the browsers, but what I see from the user's POV is that pages saved by Mozilla software cannot be rendered in:
- IE 6
- IE 7 Beta 2 Preview
- Opera 8.5
- Opera 9 TP2

Couldn't test on other (minor) browsers.

(PS: Today yet another user complained about this bug in the Ru-board Mozilla support thread (http://forum.ru-board.com/topic.cgi?forum=5&topic=17868&start=720#20).)
> but what I see from the user's POV

Which doesn't help here. What would help would be an idea of what IE and company _think_ they're loading when they see that URI.
It's not good practice to shift bug research work onto users' shoulders; I thought just reporting it was enough.. =/ I haven't found any specification yet on IE/Opera. Anyway, encoding Russian (*Unicode*) characters into escaped *1-byte ASCII characters* ain't a good idea. It should either encode them in a proper way or not encode them at all (like IE).
I didn't ask _you_ to do the legwork. But please don't add irrelevant comments to the bug if you're not working on it, ok?
(In reply to comment #6)
> Er... Anything that works with URIs understands the percent-escaping part. The
> only part I can see having issues is if we have an encoding mismatch somewhere...

Well, looks like IE doesn't understand Mozilla's escaping. I'll attach a testcase and screenshots.
Attached file Testcase (zip file)
It's a slightly modified testcase from comment 9. It's a zip archive with 2 files: foo.html and тест.html (the second file name is written in Cyrillic letters).

foo.html:
--------------------
<body>
<a href="%D1%82%D0%B5%D1%81%D1%82.html">Click me - works in Firefox only</a>
</br>
<a href="тест.html">Click me - works in IE and Firefox</a>
</body>
--------------------

тест.html
--------------------
<body>This is a test</body>
--------------------
When I clicked on the first link in IE 6, it showed a 'cannot display page' warning and showed garbage in the address bar.
When I clicked on the second link in IE 6, it opened the file correctly.
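As a side note, the escapes in this testcase are UTF-8 bytes, whereas the path in the original report looks like CP1251 bytes. A small Python sketch of the difference follows. Whether IE 6 actually interprets the unescaped bytes in the system code page is an assumption that nothing in this bug confirms; the snippet only shows what such a consumer would see.

```python
from urllib.parse import quote, unquote

name = "тест.html"

# The first link in foo.html corresponds to UTF-8 escaping.
utf8_link = quote(name, encoding="utf-8")
# What the same name would look like escaped as CP1251 bytes.
cp1251_link = quote(name, encoding="cp1251")

print(utf8_link)
print(cp1251_link)

# Hypothetical consumer that unescapes and assumes CP1251:
print(unquote(utf8_link, encoding="cp1251") == name)    # UTF-8 escapes
print(unquote(cp1251_link, encoding="cp1251") == name)  # CP1251 escapes
```

Under that assumption only the CP1251-escaped form resolves to the real file name, and the UTF-8 form decodes to unrelated characters.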
Assignee: file-handling → nobody
QA Contact: ian → file-handling
Product: Core → Firefox
Version: Trunk → unspecified
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 20 votes.
:Gijs, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(gijskruitbosch+bugs)

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Flags: needinfo?(gijskruitbosch+bugs)

Do we still need a change? Now IE is defunct and percent-encoded links work with Chrome and Edge.
