Closed
Bug 205682
Opened 22 years ago
Closed 21 years ago
Save page as (complete) : URLs referred to are turned to/sent back in UTF-8
Categories
(Core :: Networking, defect)
Core
Networking
Tracking
()
RESOLVED
FIXED
People
(Reporter: jshin1987, Assigned: jshin1987)
References
()
Details
(Keywords: intl)
Attachments
(1 file, 3 obsolete files)
9.58 KB,
patch
|
adamlock
:
review+
chofmann
:
approval1.5b+
|
Details | Diff | Splinter Review |
Mozilla sends URLs in UTF-8 when an inline image with a non-UTF-8 filename in a
non-UTF-8 document is right-clicked on and 'view image' is selected. The same is
true of background images.
How to reproduce :
Go to http://jshin.net/moztest/download.html.
In the page (in EUC-KR), right-click on the second blue ball (next to
'KO : EUC-KR : raw 8bit') and select 'view image'. Mozilla sends the URL
in url-escaped UTF-8. On the other hand, if you do the same on the third
blueball (<img src="url-escaped image name in EUC-KR.png">), Mozilla
sends the URL in url-escaped EUC-KR.
---------------
Because viewing an in-line image in the current window is not
speed-critical(unlike in general cases discussed in bug 129726), we can get away
with trying once more with the origin charset if the first request (with UTF-8
url) fails.
Javascript code that invokes this is at
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/nsContextMenu.js#604
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/utilityOverlay.js#213
There's already some fallback mechanism for failed requests, but afaik, it falls
back to the locale charset instead of origincharset. I'm writing from memory and
I have to check again. Related to this bug is bug 199237.
Assignee | ||
Comment 1•22 years ago
|
||
There's another symptom which is more serious. When 'Web page complete' is
chosen in 'Save Page As' dialog box, images with names in the encoding 'native
to the document' are not downloaded because URLs are sent back in UTF-8, but the
server doesn't know how to handle URLs in UTF-8 because its filesystem charset
is not UTF-8. As a result, web pages saved with 'web page complete' do not
include images. This would surprise users because they are able to view images,
but can't save them using 'Save Page As' and "Save image as'
Summary: view image/view background image sends URL in UTF-8 → once loaded, image urls are sent back in UTF-8 (view image, save page as, save image as)
Assignee | ||
Comment 2•22 years ago
|
||
This bug was fixed by bz's recent nsIURI-related
check-in(s) ( bug 47534. I thought there's another
bug in which |originCharset| was made to be set explicitly, but couldn't find.
bz, can you tell me which one is?). That is, contextual menu 'view image' and
'save image' work fine with non-ASCII urls. (Try it at
http://i18nl10n.com/dl.html).
However, now 'save page (as complete)' doesn't work. Files referred to in a
html document with _relative_ urls are not downloaded/saved but instead their
urls (relative) are turned absolute in the downloaded/saved version of the
html document. I'm going to file a new bug on that.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Comment 3•22 years ago
|
||
Probably bug 166996? That's what added the function the checkin for bug 47534
uses...
What's up with the persistence stuff? Is it broken only on non-ASCII uris? Or
all URIs? Is it something that broke with the bug 47534 checkin?
If this is only broken for non-ASCII, does hacking all the NS_NewURI calls in
nsWebBrowserPersist to pass the page charset of whatever page it's failing on
fix things? (I hope it does....) In any case, please cc me on this bug that
you file.
Assignee | ||
Comment 4•22 years ago
|
||
I'm reopening with a revised summary line because only 'view image'/'save
image' was fixed.
'Save page as' (added in comment #1) issue is still with us. :-)
Thanks, bz. Yeah, it's bug 166996.
As for the persistent stuff, I was mistaken. It turned out that Mozilla always
turned URLs (referred in <a href="....">) to absolute URLs and never
downloaded referred documents (i.e. Mozilla is not a 'recursive batch-mode
downloader'. there's a bug filed for this *feature*) although inline images
are downloaded and saved. So, there's nothing wrong with turning relative urls
to absolute urls and fixes for bug 166996 and bug 47534 did NOT introduce a
new bug. (they just fixed a part of this bug.)
However, when converting (relative) urls to absolute urls, it also turns URL
specs to UTF-8 (url-escaped). It also tries to download inline images with
UTF-8 URLs only to get an error message returned from the server. A bit
interesting is that Mozilla saves 'the error page' (in html) as if it's an
image file and ends up with a strange link like '<img src="dl_files/a.html">
I added PR_LOG() to nsStandardURL::Init() as suggested by Darin in bug 150376
and found that |originCharset| is sometimes 'EUC-KR' (doccharset), but other
times it's null for files with non-ASCII names. I suspect it's null whenever
a url spec (urlstring) is passed from xpfe code to load/open it.
Comment 5•22 years ago
|
||
> However, when converting (relative) urls to absolute urls, it also turns URL
> specs to UTF-8 (url-escaped).
Yeah; that could be the NS_NewURI calls I mentioned in comment 3. Does hacking
them to pass "EUC-KR" (literal string for now) as the charset fix the EUC-KR
tescase? Just trying to establish which code needs to be fixed, exactly... ;)
Assignee | ||
Comment 6•22 years ago
|
||
Thanks. I added "EUC-KR" to all 4 calls in NS_NewURI() in the file and it
worked in ko_KR.EUC-KR, ja_JP.EUC-JP and ko_KR.UTF-8 locales. Now I have to
figure out how to get the value to use when calling NS_NewURI() and which one
to invoke with |charset|
BTW, non-ASCII inline images are saved into 'a.gif' (and presumably 'b.gif',
'c.gif', and so forth) instead of their original names (url-escaped. It's not
clear what to do when the local filesystem charset is different from the
charset of a document to save) This should be a separate bug, anyway.
Summary: once loaded, image urls are sent back in UTF-8 (view image, save page as, save image as) → Save page as (complete) : URLs referred to are turned to/sent back in UTF-8
Comment 7•22 years ago
|
||
The persistance object only does this whole fixup thing if it's told to
saveDocument, so there should be a document object floating around somewhere to
get the charset off of....
Assignee | ||
Comment 8•22 years ago
|
||
It took a bit more extensive change than I thought at first. Anyway, it works
as intended. There may be still some rough edges. I used 'const char*' for
|a(Origin)Charset| (partly because that's what NS_NewURI accepts), but changing
some of them to |nsACString| may be better.
Any comment would be welcome.
Assignee | ||
Comment 9•22 years ago
|
||
Assignee | ||
Comment 10•22 years ago
|
||
Comment on attachment 128053 [details] [diff] [review]
basically the same patch with nsACString
Asking for r/sr.
I'll replace the following comment to FixupNode()
+ * The charset of a document is passed along so that the charset conversion
+ * is done, if necessary, when fixing up nodes.
with
The charset of .... passed along so that the uri attribute of the fixed-up node
will be created with |originCharset| set to the charset.
Attachment #128053 -
Flags: superreview?(bzbarsky)
Attachment #128053 -
Flags: review?(adamlock)
Comment 11•22 years ago
|
||
Why pass the charset around? Why not just set it as a member at the same time
as we set mCurrentBaseURI or something?
btw, check that adam is still reading his @netscape.com email address?
Assignee | ||
Comment 12•22 years ago
|
||
Instead of passing around the charset, I use member variables, mCurrentCharset
and mCharset as suggested by bz.
Attachment #127800 -
Attachment is obsolete: true
Attachment #128053 -
Attachment is obsolete: true
Assignee | ||
Updated•22 years ago
|
Attachment #128053 -
Flags: superreview?(bzbarsky)
Attachment #128053 -
Flags: review?(adamlock)
Assignee | ||
Comment 13•22 years ago
|
||
Comment on attachment 128609 [details] [diff] [review]
a new patch
It seems that adamlock's new email address (adamlock @ eircom.net) is not yet
in bugzilla.
bz, do you know anyone else I can ask for r?
Attachment #128609 -
Flags: superreview?(bzbarsky)
Comment 14•22 years ago
|
||
Comment on attachment 128609 [details] [diff] [review]
a new patch
>Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
> // Persist the main document
> nsCOMPtr<nsIDocument> doc(do_QueryInterface(aDocument));
>+ NS_ENSURE_TRUE(doc, NS_ERROR_UNEXPECTED);
Don't add this -- it's not going to happen. We already ensure aDocument is
non-null and the QI will never fail for a document object.
> // Save the document
> nsCOMPtr<nsIDocument> docAsDoc = do_QueryInterface(aDocument);
>+ NS_ENSURE_TRUE(docAsDoc, NS_ERROR_UNEXPECTED);
Same.
>+ rv = NS_NewURI(getter_AddRefs(uri),
>+ nsDependentCString(((nsCStringKey *) aKey)->GetString()),
>+ (data->mCharset).get());
How about nsDependentCString(((nsCStringKey*)aKey)->GetString(),
((nsCStringKey*)aKey)->GetStringLength())
?
Also, data->mCharset.get() should work, no? No need for parens there.
>- nsIDOMNode *aNode, const char *aAttribute, PRBool aNeedsPersisting,
>+ nsIDOMNode *aNode, const char *aAttribute, PRBool aNeedsPersisting,
Don't add spaces at ends of lines.
>+ nsresult rv = NS_NewURI(getter_AddRefs(uri), NS_ConvertUTF16toUTF8(aURI),
>+ mCurrentCharset.get(), mCurrentBaseURI);
Just pass aURI without converting; NS_NewURI will convert as and if needed.
sr=me with those changes.
Adam is really the only person qualified to do the r=, in my opinion. Just
mail him at whatever address works and as for review if bugzilla has not been
updated to that address yet.
Attachment #128609 -
Flags: superreview?(bzbarsky) → superreview+
Assignee | ||
Comment 15•21 years ago
|
||
Before asking Adam for r, I thought it might be easier if I upload a new patch
with bz's concerns all addressed.
Attachment #128609 -
Attachment is obsolete: true
Assignee | ||
Comment 16•21 years ago
|
||
Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)
Assuming bz's sr still stands, I'm now asking for r.
Attachment #129359 -
Flags: review?(adamlock)
Comment 17•21 years ago
|
||
Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)
r=adamlock
Looks good to me
Attachment #129359 -
Flags: review?(adamlock) → review+
Assignee | ||
Comment 18•21 years ago
|
||
Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)
Thanks, Adam.
Asking for a1.5b. This is a rather low-risk and simple patch that fixes
problems with saving html documents with images/frames/iframes (in non-ASCII
filenames). I've been using this patch for a couple of weeks now on my build
and it's worked fine on both Linux/Win(well, there's nothing platform
specific). A lot of international users (who want to save non-ASCII documents)
would be glad to see this fixed.
This attachment doesn't have sr stamp, but this is an update to the previous
version addressing the concerns of bz who gave me sr.
Attachment #129359 -
Flags: approval1.5b?
Comment 19•21 years ago
|
||
Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)
approved for 1.5 landing
Attachment #129359 -
Flags: approval1.5b? → approval1.5b+
Assignee | ||
Comment 20•21 years ago
|
||
Thank you all. Fix got landed in the trunk.
Status: REOPENED → RESOLVED
Closed: 22 years ago → 21 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•