Closed Bug 201040 Opened 22 years ago Closed 21 years ago

Unable to view page source of the page that uses IDN

Categories

(Core :: Networking, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED
mozilla1.8alpha2

People

(Reporter: marina, Assigned: darin.moz)

References

()

Details

(Keywords: fixed-aviary1.0, fixed1.7.5, intl)

Attachments

(4 files, 1 obsolete file)

Seen with the 2003033103 build.
Steps to reproduce:
- select an active IDN (http://南极星.com/, big5);
- go to View | Page Source;
- you get an error message (the screenshot of the message to follow).
also happens on Mac
OS: Windows XP → All
This worksforme with a 1.5b build on Win98. Is this still a problem?
Boris: View-source does not work for me in 1.5rc1/WinXP. It also doesn't work for some other Polish users with 1.4 on Win98, with 1.5b on Win2K, and with 1.5b on Win98. There's a thread on the Polish Mozilla Forums at http://mozillapl.org/forum/viewtopic.php?t=3646&postdays=0&postorder=asc&start=0 with links to some existing pages with IDN domains. (Bugzilla mangles the international characters if I write something like: http://www.żółw.pl).
> Bugzilla mangles the international characters if I write something
> like
[off-topic] It wouldn't have if you had set View | Character Coding to UTF-8 (or to an encoding that covers the 'mangled' letters in your posting; UTF-8 is a much better choice, though, because full Unicode coverage is essential in bugs like this) before posting your comment. Admittedly, Bugzilla (at mozilla.org) should be configured to emit 'charset=UTF-8' so that Bugzilla users don't have to, which is a long-standing bug (I forgot the bug number). Anyway, I wish to test iDNS support, but somehow my ISP has done something strange with their name server and iDNS doesn't work for me (I have no clue how they could possibly do anything to make this not work). I'll try other name servers anyway.
Ok, I set the Unicode encoding in View/Character Coding (maybe Bugzilla should have default encoding set to UTF-8). The mangled URL in comment 4 was http://www.żółw.pl/ (hope this time it'll be ok ;)
interesting, I see that problem too (win2k 2003091510)
On MozillaPL's forum (see link above) Marek Wawoczny ("GmbH") writes that changing the code in navigator.js line 1484 from:
BrowserViewSourceOfURL(webNav.currentURI.spec, docCharset, pageCookie);
to:
BrowserViewSourceOfURL(webNav.currentURI.asciiSpec, docCharset);
seems to solve the problem. However, it probably breaks something with cookies, since the pageCookie argument is left out of the 'corrected' code (and it doesn't work when spec is simply changed to asciiSpec and pageCookie is left in the code)...
Taking. The pagecookie arg is absolutely necessary, but I suspect I have an idea of what's up with this. I'll work on a patch in a few days. I can reproduce the bug using the URL in comment 6 on Windows; hopefully I will also be able to do so on Linux....
Assignee: smontagu → bz-vacation
Priority: -- → P1
Target Milestone: --- → mozilla1.6alpha
This patch fixes problems with view-source, history, bookmarks, and CSS. It changes all occurrences of .spec (->GetSpec) to .asciiSpec (->GetAsciiSpec). I've tested it a bit and it seems to be working fine, but I can't guarantee that this won't break anything else...
It'll likely break a good number of things in the UI, actually....
So.. one immediate problem is that nsSimpleURI does not support originCharset in a useful way. Darin, what do you think? Should we make nsSimpleURI handle origin charsets? Or should I just switch view-source over to using asciiSpec?
GetSpec is meant to be used with the presentation layer (UI). GetAsciiSpec is meant to be used by the low-level networking layer. i would prefer to see a solution to this problem that involves fixing nsSimpleURI or making view-source use a different URI implementation. one problem: there is no way to set the origin charset of a nsSimpleURI. see nsIStandardURL... nsSimpleURI should not have to support that interface. perhaps view-source should just use a nsStandardURL instead. but, wait a second... after reading the summary of this bug, i'm confused. origin charset is not involved really. if we are talking about IDN, then we are talking about the hostname portion. IDN conversion in nsStandardURL happens for any non-ASCII hostname independent of the origincharset. remember: origin charset tells us the charset that the server needs to receive. the actual URL data that is passed into necko may have nothing to do with this charset. it sounds to me as if someone somewhere is improperly exposing the inner URI referenced by a viewsource: URI. the inner URI string should be extracted, and then passed into NS_NewURI to construct a nsIURI representation of it. if that is done, then IDN should just work. sorry if i've gone down a tangent here... haven't had enough time to review the bug thoroughly. hope this helps!
So the basic problem is that nsSimpleURI URL-escapes the path it's given. See http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSimpleURI.cpp#158 As a result, the hostname in the URL that view-source creates does not match the hostname in the original URL (because apparently URL-unescaping is not performed on the hostname?) and as a result we don't get the right cache entry, hit DNS with this url-escaped hostname, and all is bad. Is there really a good reason for url-escaping the path given to nsSimpleURI?
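The mismatch described above can be illustrated outside Mozilla. The following Python sketch is not the nsSimpleURI code. It assumes only that the escaping step behaves like generic percent-encoding of UTF-8 bytes, and it uses Python's built-in `idna` codec as a stand-in for the IDN conversion that nsStandardURL performs. The hostname is one of the test URLs reported in this bug.

```python
from urllib.parse import quote

# Hostname taken from a test URL reported in this bug.
host = "www.müller.ch"

# Generic percent-escaping of the UTF-8 bytes.
# Assumption: this mirrors the effect of nsSimpleURI escaping its path.
escaped_host = quote(host)

# IDN conversion to the ASCII-compatible encoding (ACE) used for DNS lookups.
ace_host = host.encode("idna").decode("ascii")

print(escaped_host)  # www.m%C3%BCller.ch
print(ace_host)      # www.xn--mller-kva.ch
```

The two strings name different hosts as far as DNS and the cache are concerned, which is consistent with the wrong cache entry and failed lookup reported here.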
*** Bug 229516 has been marked as a duplicate of this bug. ***
I'm not likely to get to this any time in the next few months. Punting to default networking owner; nsSimpleURI needs to be less simple.
Assignee: bz-vacation → darin
Component: Internationalization → Networking
Priority: P1 → --
QA Contact: amyy → benc
Target Milestone: mozilla1.6alpha → ---
*** Bug 229721 has been marked as a duplicate of this bug. ***
I have added a test-setup: *.idn.ter.dk is set up DNS- and Apache-wise to reply. Please visit http://זרו.idn.ter.dk/
on the page http://www.malmö.nu I can open view source, since the .nu registrar redirects to a page on all unregistered domain names. the page title says www.malm%3%b6.nu instead of www.malmö.nu.
*** Bug 236123 has been marked as a duplicate of this bug. ***
*** Bug 236132 has been marked as a duplicate of this bug. ***
*** Bug 236200 has been marked as a duplicate of this bug. ***
Keywords: helpwanted
I created a new bug for Camino on this because I was only searching for Camino bugs. Additionally, the Platform selected in this bug assigns the error to the PC platform only, so that's why I am posting this.
*** Bug 236254 has been marked as a duplicate of this bug. ***
(In reply to comment #23)
> Created an attachment (id=142745)
> Problem also occurs on Mac Platform in Camino
>
> I created a new bug for Camino on this because i was just searching for Camino
> bugs. Additionally the Platform selected in this bug just directs the error to
> the PC platform - so thats why i am posting this.
Hardware -> All
Hardware: PC → All
Hardware: All → PC
*** Bug 236449 has been marked as a duplicate of this bug. ***
Hardware: PC → All
*** Bug 236544 has been marked as a duplicate of this bug. ***
*** Bug 236916 has been marked as a duplicate of this bug. ***
*** Bug 237389 has been marked as a duplicate of this bug. ***
Blocks: 237820
Confirming this bug using Mozilla/5.0 (Windows; U; Windows NT 5.0; de-DE; rv:1.6) Gecko/20040206 Firefox/0.8 test-URL: www.müller.ch
*** Bug 239870 has been marked as a duplicate of this bug. ***
5 of the 11 dupes have the keyword umlaut in their summary. I think it would be a good idea to include that in the summary.
*** Bug 246007 has been marked as a duplicate of this bug. ***
Confirming the bug on Mozilla 1.7: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040624 Netscape/7.0, Mozilla Debian Package 1.7-2
When using a proxy, it's clear what's wrong: http://www.szalagavató.hu/ loads fine, but looking at the source gives:
<HTML><HEAD>
<TITLE>ERROR: The requested URL could not be retrieved</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The requested URL could not be retrieved</H2>
<HR>
<P>
While trying to retrieve the URL:
<A HREF="http://www.szalagavat%c3%b3.hu/">http://www.szalagavat%c3%b3.hu/</A>
<P>
The following error was encountered:
<UL>
<LI>
<STRONG>
Invalid URL
</STRONG>
</UL>
<P>
Some aspect of the requested URL is incorrect. Possible problems:
<UL>
<LI>Missing or incorrect access protocol (should be `http://'' or similar)
<LI>Missing hostname
<LI>Illegal double-escape in the URL-Path
<LI>Illegal character in hostname; underscores are not allowed
</UL>
[...]
So, I think Boris is right when he says that the problem is likely with the way the view-source protocol handler handles non-ASCII characters. It %-escapes anything non-ASCII that appears after the first colon. It does that because it has no notion of an inner URL. It should either have its own implementation of nsIURI or it should parse the inner URL and normalize it to ASCII before stuffing the result into a nsSimpleURI. Here's a link to repro the bug: view-source:http://www.szalagavató.hu/
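As a rough illustration of the second option (parse the inner URL and normalize it to ASCII before storing it), here is a Python sketch. It is not the patch attached to this bug: `normalize_view_source` is a hypothetical helper, Python's `idna` codec stands in for Mozilla's IDN service, and any userinfo in the inner URL is dropped for brevity.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def normalize_view_source(spec: str) -> str:
    """Hypothetical helper: return a view-source: spec whose inner URL is ASCII-only."""
    scheme, sep, inner = spec.partition(":")
    if not sep or scheme.lower() != "view-source":
        raise ValueError("not a view-source: URL")

    parts = urlsplit(inner)
    host = parts.hostname or ""
    # Convert the hostname to its ACE form instead of percent-escaping it.
    netloc = host.encode("idna").decode("ascii")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"

    # Percent-escape only the parts where escaping is appropriate,
    # leaving existing %XX sequences untouched.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return "view-source:" + urlunsplit((parts.scheme, netloc, path, query, fragment))


print(normalize_view_source("view-source:http://www.müller.ch/"))
# view-source:http://www.xn--mller-kva.ch/
```

The point of the sketch is the ordering: the hostname is converted while the URL is still structured, so nothing downstream sees a non-ASCII host that it could percent-escape.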
Severity: normal → minor
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8alpha2
Attached patch v1 patch
This patch implements my suggested fix. It includes a fair amount of cleanup in the view-source protocol handler code. With this patch I'm able to view the source of URLs containing an internationalized domain name. There is still more work to be done in view source land however since the title of the view source window shows the ACE version of the domain name instead of the Unicode version. But, that requires a separate patch to the view source UI.
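The remaining title issue amounts to converting the ACE hostname back to Unicode for display only. A minimal Python sketch of that conversion follows; `display_host` is a hypothetical name, and the `idna` codec is again only a stand-in for whatever the view source UI would actually call.

```python
from urllib.parse import urlsplit


def display_host(url: str) -> str:
    """Hypothetical helper: Unicode form of a URL's hostname, for a window title."""
    host = urlsplit(url).hostname or ""
    try:
        # ACE labels (xn--...) are decoded; plain ASCII labels pass through.
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Already non-ASCII or not valid ACE: show it unchanged.
        return host


print(display_host("http://www.xn--mller-kva.ch/"))  # www.müller.ch
```

The ACE form would still be what is handed to the networking layer; only the string shown to the user changes.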
Attachment #131861 - Attachment is obsolete: true
Attachment #152024 - Flags: review?(cbiesinger)
Comment on attachment 152024 [details] [diff] [review] v1 patch
nsNetUtil.h
> +#define NS_VIEWSOURCEHANDLER_CID \
it'd be nice to document what this implements...
Index: protocol/viewsource/src/nsViewSourceHandler.cpp
> const char *aCharset, // ignore charset info
that comment seems outdated
Attachment #152024 - Flags: review?(cbiesinger) → review+
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
The patch has not been checked in on AVIARY_1_0_20040515_BRANCH. Please check it in. Also, in Camino, when the source of http://高山かきもち.jp/ is displayed, the title is set to http://xn--u8je4dxgy65utiwe.jp/. I think it would be easier to understand if the title showed 高山かきもち.jp, the same as the original page.
Flags: blocking-aviary1.0RC1?
*** Bug 249888 has been marked as a duplicate of this bug. ***
*** Bug 249889 has been marked as a duplicate of this bug. ***
Attachment #152024 - Flags: approval1.7.1?
*** Bug 250106 has been marked as a duplicate of this bug. ***
A problem is still reproduced. AVIARY_1_0_20040515_BRANCH build Mozilla/5.0 (Windows; U; Windows NT 5.1; ja-JP; rv:1.7) Gecko/20040706 Firefox/0.9.0+
(In reply to comment #43)
> A problem is still reproduced.
> AVIARY_1_0_20040515_BRANCH build
When it is fixed on aviary-1.0, that will be noted here. You don't have to remind us that it's not yet fixed on the aviary-1.0 branch; everybody here is aware of that. Scott, can you check this into the aviary-1.0 branch? I hope the patch applies cleanly to the branch.
*** Bug 250480 has been marked as a duplicate of this bug. ***
Comment on attachment 152024 [details] [diff] [review] v1 patch a=mkaply
Attachment #152024 - Flags: approval1.7.2? → approval1.7.2+
*** Bug 253086 has been marked as a duplicate of this bug. ***
*** Bug 245959 has been marked as a duplicate of this bug. ***
Attachment #152024 - Flags: approval-aviary?
Attachment #152024 - Flags: approval-aviary?
Comment on attachment 154649 [details] [diff] [review] v1.1 patch -- simplified for the 1.7 and aviary branches This is a reduced version of the original patch that includes only the necessary changes. This is what I checked into the 1.7 branch. (The original patch had many conflicts with the 1.7 branch source.)
Attachment #154649 - Flags: approval-aviary?
Comment on attachment 154649 [details] [diff] [review] v1.1 patch -- simplified for the 1.7 and aviary branches a=asa (on behalf of the aviary drivers) for checkin to the aviary branch.
Attachment #154649 - Flags: approval-aviary? → approval-aviary+
*** Bug 254920 has been marked as a duplicate of this bug. ***
Flags: blocking-aviary1.0PR?
Why is this patch still not checked in to the aviary branch?
*** Bug 255955 has been marked as a duplicate of this bug. ***
Keywords: fixed1.7
*** Bug 260126 has been marked as a duplicate of this bug. ***
This seems NOT to work with Mozilla 1.7.3 but with Firefox 0.10.x (aviary branch). See Bug 265395.
(In reply to comment #56)
> This seems NOT to work with Mozilla 1.7.3 but with Firefox 0.10.x (aviary
> branch). See Bug 265395.
that's because 1.7.3 is 1.7 + a few handpicked security patches. its release notes would have told you that. this means that 1.7.3 DOES NOT contain this patch. fixing keyword.
*** Bug 265395 has been marked as a duplicate of this bug. ***