Closed Bug 205682 Opened 21 years ago Closed 21 years ago

Save page as (complete) : URLs referred to are turned to/sent back in UTF-8

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: jshin1987, Assigned: jshin1987)

References

(
URL
)

Details

(Keywords: intl)

Attachments

(1 file, 3 obsolete files)

a patch 21 years ago Jungshik Shin 27.86 KB, patch		Details \| Diff \| Splinter Review
basically the same patch with nsACString 21 years ago Jungshik Shin 29.70 KB, patch		Details \| Diff \| Splinter Review
a new patch 21 years ago Jungshik Shin 11.49 KB, patch	bzbarsky : superreview+	Details \| Diff \| Splinter Review
a new patch (per bz's comment) 21 years ago Jungshik Shin 9.58 KB, patch	adamlock : review+ chofmann : approval1.5b+	Details \| Diff \| Splinter Review

Jungshik Shin

Assignee

Description

•

21 years ago

Mozilla sends URLs in UTF-8 when an inline image with a non-UTF-8 filename in a
non-UTF-8 document is right-clicked on and 'view image' is selected. The same is
true of background images. 

How to reproduce :

Go to  http://jshin.net/moztest/download.html.
In the page (in EUC-KR), right-click on the second blue ball (next to
'KO : EUC-KR : raw 8bit') and select 'view image'. Mozilla sends the URL
in url-escaped UTF-8. On the other hand, if you do the same on the third
blueball (<img src="url-escaped image name in EUC-KR.png">), Mozilla
sends the URL in url-escaped EUC-KR. 

---------------
Because viewing an in-line image in the current window  is not
speed-critical(unlike in general cases discussed in bug 129726), we can get away
with trying once more with the origin charset if the first request (with UTF-8
url) fails. 

Javascript code that invokes this is at 
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/nsContextMenu.js#604
http://lxr.mozilla.org/seamonkey/source/xpfe/communicator/resources/content/utilityOverlay.js#213

There's already some fallback mechanism for failed requests, but afaik, it falls
back to the locale charset instead of origincharset. I'm writing from memory and
I have to check again. Related to this bug is bug 199237.

Jungshik Shin

Assignee

Comment 1

•

21 years ago

There's another symptom which is  more serious. When 'Web page complete' is
chosen in 'Save Page As' dialog box, images with names in the encoding 'native
to the document' are not downloaded because URLs are sent back in UTF-8, but the
server doesn't know how to handle URLs in UTF-8 because its filesystem charset
is not UTF-8. As a result, web pages saved with 'web page complete' do not
include images. This would surprise users because they are able to view images,
but can't save them using 'Save Page As' and "Save image as'

Summary: view image/view background image sends URL in UTF-8 → once loaded, image urls are sent back in UTF-8 (view image, save page as, save image as)

Jungshik Shin

Assignee

Comment 2

•

21 years ago

This bug  was fixed by bz's recent nsIURI-related   
check-in(s) ( bug 47534. I thought there's another   
bug in which |originCharset| was made to be set explicitly, but couldn't find.   
bz, can you tell me which one is?). That is, contextual menu 'view image' and   
'save image' work fine with non-ASCII urls. (Try it at  
http://i18nl10n.com/dl.html).   
 
However, now 'save page (as complete)' doesn't work. Files referred to in a 
html document with _relative_ urls are not downloaded/saved but instead  their 
urls (relative) are turned absolute in the downloaded/saved version of the 
html document.  I'm going to file a new bug on that.

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Boris Zbarsky [:bzbarsky]

Comment 3

•

21 years ago

Probably bug 166996?  That's what added the function the checkin for bug 47534
uses...

What's up with the persistence stuff?  Is it broken only on non-ASCII uris?  Or
all URIs?  Is it something that broke with the bug 47534 checkin?

If this is only broken for non-ASCII, does hacking all the NS_NewURI calls in
nsWebBrowserPersist to pass the page charset of whatever page it's failing on
fix things?   (I hope it does....)  In any case, please cc me on this bug that
you file.

Jungshik Shin

Assignee

Comment 4

•

21 years ago

I'm reopening with a revised summary line because only 'view image'/'save  
image' was fixed.   
  
'Save page as' (added in comment #1) issue  is still with us.  :-)  
  
Thanks, bz. Yeah, it's bug 166996.   
  
As for the persistent stuff, I was mistaken. It turned out that Mozilla always  
turned URLs (referred in <a href="....">) to absolute URLs and never  
downloaded referred documents (i.e. Mozilla is not a 'recursive batch-mode  
downloader'. there's a bug filed for this *feature*) although inline images  
are downloaded and saved. So, there's nothing wrong with turning relative urls  
to absolute urls and fixes for bug 166996 and bug 47534 did NOT introduce a 
new bug. (they just fixed a part of this bug.) 
  
However, when converting (relative) urls to absolute urls, it also turns URL  
specs to UTF-8 (url-escaped). It also tries to download inline images with  
UTF-8 URLs only to get an error message returned from the server. A bit  
interesting is that Mozilla saves 'the error page' (in html) as if it's an  
image file and ends up with  a strange link like '<img src="dl_files/a.html">  
  
I added PR_LOG() to nsStandardURL::Init() as suggested by Darin in bug 150376 
and found that |originCharset| is sometimes 'EUC-KR' (doccharset), but other 
times it's null for files with non-ASCII names.  I suspect it's null whenever 
a url spec (urlstring) is passed from xpfe code to load/open it.

URL: http://i18nl10n.com/dl.html

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Boris Zbarsky [:bzbarsky]

Comment 5

•

21 years ago

> However, when converting (relative) urls to absolute urls, it also turns URL  
> specs to UTF-8 (url-escaped). 

Yeah; that could be the NS_NewURI calls I mentioned in comment 3.  Does hacking
them to pass "EUC-KR" (literal string for now) as the charset fix the EUC-KR
tescase?  Just trying to establish which code needs to be fixed, exactly... ;)

Jungshik Shin

Assignee

Comment 6

•

21 years ago

Thanks. I added "EUC-KR" to all 4 calls in NS_NewURI() in the file and it 
worked in ko_KR.EUC-KR, ja_JP.EUC-JP and ko_KR.UTF-8 locales.  Now I have to 
figure out how to get the value to use when calling NS_NewURI() and which one 
to  invoke with |charset|   
 
 
BTW, non-ASCII inline images are saved into 'a.gif' (and presumably 'b.gif', 
'c.gif', and so forth) instead of their original names (url-escaped. It's not 
clear what to do when the local filesystem charset is different from the 
charset of a document to save) This should be a separate bug, anyway.

Summary: once loaded, image urls are sent back in UTF-8 (view image, save page as, save image as) → Save page as (complete) : URLs referred to are turned to/sent back in UTF-8

Boris Zbarsky [:bzbarsky]

Comment 7

•

21 years ago

The persistance object only does this whole fixup thing if it's told to
saveDocument, so there should be a document object floating around somewhere to
get the charset off of....

Jungshik Shin

Assignee

Comment 8

•

21 years ago

Attached patch a patch (obsolete) — Details — Splinter Review

It took a bit more extensive change than I thought at first. Anyway, it works
as intended. There may be still some rough edges. I used 'const char*' for
|a(Origin)Charset| (partly because that's what NS_NewURI accepts), but changing
some of them to |nsACString| may be better. 

Any comment would be welcome.

Jungshik Shin

Assignee

Comment 9

•

21 years ago

Attached patch basically the same patch with nsACString (obsolete) — Details — Splinter Review

Jungshik Shin

Assignee

Comment 10

•

21 years ago

Comment on attachment 128053 [details] [diff] [review]
basically the same patch with nsACString

Asking for r/sr.

I'll replace the following comment to FixupNode() 

+   * The charset of a document is passed along so that the charset conversion 
+   * is done, if necessary, when fixing up nodes.

with

The charset of .... passed along so that the uri attribute of the fixed-up node
will be created with |originCharset| set to the charset.

Attachment #128053 - Flags: superreview?(bzbarsky)

Attachment #128053 - Flags: review?(adamlock)

Boris Zbarsky [:bzbarsky]

Comment 11

•

21 years ago

Why pass the charset around?  Why not just set it as a member at the same time
as we set mCurrentBaseURI or something?

btw, check that adam is still reading his @netscape.com email address?

Jungshik Shin

Assignee

Comment 12

•

21 years ago

Attached patch a new patch (obsolete) — Details — Splinter Review

Instead of passing around the charset, I use member variables, mCurrentCharset
and mCharset as suggested by bz.

Attachment #127800 - Attachment is obsolete: true

Attachment #128053 - Attachment is obsolete: true

Jungshik Shin

Assignee

Updated

•

21 years ago

Attachment #128053 - Flags: superreview?(bzbarsky)

Attachment #128053 - Flags: review?(adamlock)

Jungshik Shin

Assignee

Comment 13

•

21 years ago

Comment on attachment 128609 [details] [diff] [review]
a new patch

It seems that adamlock's new email address (adamlock @ eircom.net) is not yet
in bugzilla. 

bz, do you know anyone else I can ask for r?

Attachment #128609 - Flags: superreview?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 14

•

21 years ago

Comment on attachment 128609 [details] [diff] [review]
a new patch

>Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
>     // Persist the main document
>     nsCOMPtr<nsIDocument> doc(do_QueryInterface(aDocument));
>+    NS_ENSURE_TRUE(doc, NS_ERROR_UNEXPECTED);

Don't add this -- it's not going to happen.  We already ensure aDocument is
non-null and the QI will never fail for a document object.

>         // Save the document
>         nsCOMPtr<nsIDocument> docAsDoc = do_QueryInterface(aDocument);
>+        NS_ENSURE_TRUE(docAsDoc, NS_ERROR_UNEXPECTED);

Same.

>+    rv = NS_NewURI(getter_AddRefs(uri), 
>+                   nsDependentCString(((nsCStringKey *) aKey)->GetString()),
>+                   (data->mCharset).get());

How about nsDependentCString(((nsCStringKey*)aKey)->GetString(),
			     ((nsCStringKey*)aKey)->GetStringLength())
?

Also, data->mCharset.get() should work, no?  No need for parens there.

>-    nsIDOMNode *aNode, const char *aAttribute, PRBool aNeedsPersisting,
>+    nsIDOMNode *aNode, const char *aAttribute, PRBool aNeedsPersisting, 

Don't add spaces at ends of lines.

>+    nsresult rv = NS_NewURI(getter_AddRefs(uri), NS_ConvertUTF16toUTF8(aURI), 
>+                            mCurrentCharset.get(), mCurrentBaseURI);

Just pass aURI without converting; NS_NewURI will convert as and if needed.

sr=me with those changes.

Adam is really the only person qualified to do the r=, in my opinion.  Just
mail him at whatever address works and as for review if bugzilla has not been
updated to that address yet.

Attachment #128609 - Flags: superreview?(bzbarsky) → superreview+

Jungshik Shin

Assignee

Comment 15

•

21 years ago

Attached patch a new patch (per bz's comment) — Details — Splinter Review

Before asking Adam for r, I thought it might be easier if I upload a new patch
with bz's concerns all addressed.

Attachment #128609 - Attachment is obsolete: true

Jungshik Shin

Assignee

Comment 16

•

21 years ago

Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)

Assuming bz's sr still stands, I'm now asking for r.

Attachment #129359 - Flags: review?(adamlock)

Adam Lock

Comment 17

•

21 years ago

Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)

r=adamlock

Looks good to me

Attachment #129359 - Flags: review?(adamlock) → review+

Jungshik Shin

Assignee

Comment 18

•

21 years ago

Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)

Thanks, Adam. 

Asking for a1.5b. This is a rather low-risk and simple patch that fixes
problems with saving html documents with images/frames/iframes (in non-ASCII
filenames). I've been using this patch for a couple of weeks now on my build
and it's worked fine on both Linux/Win(well, there's nothing platform
specific). A lot of international users (who want to save non-ASCII documents)
would be glad to see this fixed.

This attachment doesn't have sr stamp, but this is an update to the previous
version addressing the concerns of bz who gave me sr.

Attachment #129359 - Flags: approval1.5b?

chris hofmann

Comment 19

•

21 years ago

Comment on attachment 129359 [details] [diff] [review]
a new patch (per bz's comment)

approved for 1.5 landing

Attachment #129359 - Flags: approval1.5b? → approval1.5b+

Jungshik Shin

Assignee

Comment 20

•

21 years ago

Thank you all. Fix got landed in the trunk.

Status: REOPENED → RESOLVED

Closed: 21 years ago → 21 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.