Unable to view page source of the page that uses IDN

RESOLVED FIXED in mozilla1.8alpha2

Status

Core
Networking
--
minor
RESOLVED FIXED
15 years ago
13 years ago

People

(Reporter: marina, Assigned: Darin Fisher)

Tracking

({fixed-aviary1.0, fixed1.7.5, intl})

Trunk
mozilla1.8alpha2

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

15 years ago
Seen with build 2003033103.
Steps to reproduce:
- Load a page with an active IDN (http://南极星.com/, Big5).
- Go to View | Page Source.
- You get an error message (screenshot to follow).
(Reporter)

Comment 1

15 years ago
Created attachment 119700 [details]
this is a screen shot of the error message you get when attempting to view a page source
(Reporter)

Comment 2

15 years ago
also happens on Mac
OS: Windows XP → All

Comment 3

This worksforme with a 1.5b build on Win98.  Is this still a problem?

Comment 4

Boris:
View-source does not work for me in 1.5rc1/WinXP. It also doesn't work for
some other Polish users with 1.4 on Win98, with 1.5b on Win2K, and with 1.5b on
Win98.

There's a thread on the Polish Mozilla Forums at
http://mozillapl.org/forum/viewtopic.php?t=3646&postdays=0&postorder=asc&start=0
with links to some existing pages with IDN domains.

(Bugzilla mangles the international characters if I write something like:
http://www.żółw.pl).

Comment 5

14 years ago
> Bugzilla mangles the international characters if I write something 
> like

[off-topic]
  It wouldn't have if you had set View | Character Coding to UTF-8 (or another
encoding that covers the 'mangled' letters in your posting) before posting your
comment. UTF-8 is by far the better choice, because full Unicode coverage is
essential in bugs like this. Admittedly, Bugzilla (at mozilla.org) should be
configured to emit 'charset=UTF-8' so that Bugzilla users don't have to; that
is a long-standing bug (I forgot the bug number).

Anyway, I wish to test iDNS support, but somehow my ISP has done something
strange with their name server and iDNS doesn't work for me (I have no clue how
they could possibly make this not work). I'll try other name
servers anyway.

Comment 6

Ok, I set the Unicode encoding in View/Character Coding (maybe Bugzilla should
have default encoding set to UTF-8).

The mangled URL in comment 4 was http://www.żółw.pl/ (hope this time it'll be ok ;)

Comment 7

interesting, I see that problem too (win2k 2003091510)

Comment 8

On MozillaPL's forum (see link above), Marek Wawoczny ("GmbH") suggests
changing line 1484 of navigator.js from:

BrowserViewSourceOfURL(webNav.currentURI.spec, docCharset, pageCookie);

to:

BrowserViewSourceOfURL(webNav.currentURI.asciiSpec, docCharset);

This seems to solve the problem, but it probably breaks something with cookies,
since the pageCookie argument is left out of the 'corrected' code (and it
doesn't work when spec is simply changed to asciiSpec and pageCookie is left in
the code)...
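
For readers unfamiliar with the two properties, the difference can be sketched outside the browser. The following Python snippet is an illustration only, not Mozilla code: `ascii_spec` is a hypothetical helper, and it assumes that `asciiSpec` is essentially `spec` with the hostname converted to its ACE (Punycode) form. Python's built-in `idna` codec is used as a stand-in for the browser's IDN service.

```python
from urllib.parse import urlsplit, urlunsplit


def ascii_spec(spec: str) -> str:
    """Return `spec` with its hostname converted to ACE (Punycode) form.

    Hypothetical stand-in for nsIURI.asciiSpec; only the host is converted,
    and userinfo is ignored for brevity.
    """
    parts = urlsplit(spec)
    host = parts.hostname or ""
    # The idna codec converts each non-ASCII label to its "xn--" form.
    ace_host = host.encode("idna").decode("ascii")
    netloc = ace_host if parts.port is None else f"{ace_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


unicode_spec = "http://www.żółw.pl/"
print(ascii_spec(unicode_spec))  # the hostname is now ASCII-only (an xn-- label)
```

The URL from comment 6 comes out with an ASCII-only hostname, which is the form that DNS servers and proxies can handle.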

Comment 9

Taking.  The pageCookie arg is absolutely necessary, but I suspect I have an
idea of what's up with this.  I'll work on a patch in a few days.  I can
reproduce the bug using the URL in comment 6 on Windows; hopefully I will also
be able to do so on Linux....
Assignee: smontagu → bz-vacation
Priority: -- → P1
Target Milestone: --- → mozilla1.6alpha

Comment 10

14 years ago
Created attachment 131861 [details] [diff] [review]
Change all occurrences of .spec (->GetSpec) to .asciiSpec (->GetAsciiSpec)

This patch fixes problems with view-source, history, bookmarks, and CSS. It changes
all occurrences of .spec (->GetSpec) to .asciiSpec (->GetAsciiSpec). I've
tested it a bit and it seems to be working fine, but I can't guarantee that it
won't break anything else...

Comment 11

It'll likely break a good number of things in the UI, actually....

Comment 12

So.. one immediate problem is that nsSimpleURI does not support originCharset in
a useful way.  Darin, what do you think?  Should we make nsSimpleURI handle
origin charsets?  Or should I just switch view-source over to using asciiSpec?
(Assignee)

Comment 13

14 years ago
GetSpec is meant to be used with the presentation layer (UI).  GetAsciiSpec is
meant to be used by the low-level networking layer.  i would prefer to see a
solution to this problem that involves fixing nsSimpleURI or making view-source
use a different URI implementation.  one problem: there is no way to set the
origin charset of a nsSimpleURI.  see nsIStandardURL... nsSimpleURI should not
have to support that interface.  perhaps view-source should just use a
nsStandardURL instead.

but, wait a second... after reading the summary of this bug, i'm confused.
origin charset is not really involved.  if we are talking about IDN, then we are
talking about the hostname portion.  IDN conversion in nsStandardURL happens for
any non-ASCII hostname independent of the origin charset.  remember: origin
charset tells us the charset that the server needs to receive.  the actual URL
data that is passed into necko may have nothing to do with this charset.

it sounds to me as if someone somewhere is improperly exposing the inner URI
referenced by a view-source: URI.  the inner URI string should be extracted, and
then passed into NS_NewURI to construct an nsIURI representation of it.  if that
is done, then IDN should just work.

sorry if i've gone off on a tangent here... haven't had enough time to review the
bug thoroughly.  hope this helps!
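
The distinction drawn above can be illustrated with a short sketch. It is written in Python rather than against necko, `to_wire_form` is a hypothetical helper, and it assumes that the origin charset only influences how non-ASCII path characters are escaped, while the hostname always goes through IDN conversion.

```python
from typing import Tuple
from urllib.parse import quote, urlsplit


def to_wire_form(spec: str, origin_charset: str) -> Tuple[str, str]:
    """Return (host, path) roughly as they might be sent to the network.

    The host always goes through IDN (ACE) conversion; only the path
    depends on the origin charset. Hypothetical helper for illustration.
    """
    parts = urlsplit(spec)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    path = quote(parts.path, encoding=origin_charset)
    return host, path


host_utf8, path_utf8 = to_wire_form("http://www.żółw.pl/żółw", "utf-8")
host_latin2, path_latin2 = to_wire_form("http://www.żółw.pl/żółw", "iso-8859-2")
print(host_utf8 == host_latin2)  # True: the charset does not affect the host
print(path_utf8, path_latin2)    # the two paths are escaped differently
```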

Comment 14

So the basic problem is that nsSimpleURI URL-escapes the path it's given.  See
http://lxr.mozilla.org/seamonkey/source/netwerk/base/src/nsSimpleURI.cpp#158

As a result, the hostname in the URL that view-source creates does not match the
hostname in the original URL (because apparently URL-unescaping is not performed
on the hostname?) and as a result we don't get the right cache entry, hit DNS
with this url-escaped hostname, and all is bad.

Is there really a good reason for url-escaping the path given to nsSimpleURI?
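
The effect described here can be reproduced in a few lines. This is a rough Python model of the behaviour reported in this comment, not the actual nsSimpleURI code; `simple_uri_spec` is a hypothetical name, and UTF-8 is assumed for the escaping.

```python
from urllib.parse import quote


def simple_uri_spec(spec: str) -> str:
    """Escape non-ASCII characters in everything after the first colon.

    Hypothetical model of the reported behaviour: the part after the
    scheme is treated as an opaque path, so the hostname of the inner
    URL gets %-escaped along with everything else.
    """
    scheme, _, path = spec.partition(":")
    # Keep URL punctuation and existing escapes; escape non-ASCII as UTF-8.
    escaped = quote(path, safe="/:?#[]@!$&'()*+,;=%")
    return f"{scheme}:{escaped}"


original = "http://www.szalagavató.hu/"
view_source = simple_uri_spec("view-source:" + original)
print(view_source)  # view-source:http://www.szalagavat%C3%B3.hu/
```

The inner URL no longer matches the original one, so the cache lookup and the DNS query both use a hostname that does not exist.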

Comment 15

14 years ago
*** Bug 229516 has been marked as a duplicate of this bug. ***

Comment 16

I'm not likely to get to this any time in the next few months.  Punting to
default networking owner; nsSimpleURI needs to be less simple.
Assignee: bz-vacation → darin
Component: Internationalization → Networking
Priority: P1 → --
QA Contact: amyy → benc
Target Milestone: mozilla1.6alpha → ---

Comment 17

*** Bug 229721 has been marked as a duplicate of this bug. ***

Comment 18

14 years ago
I have added a test setup: *.idn.ter.dk is configured in both DNS and Apache to respond.

Please visit http://זרו.idn.ter.dk/

Comment 19

14 years ago
On http://www.malmö.nu I can open View Source, since the .nu registrar
serves a page for all unregistered domain names.

However, the page title says www.malm%c3%b6.nu instead of www.malmö.nu.

Comment 20

14 years ago
*** Bug 236123 has been marked as a duplicate of this bug. ***

Comment 21

14 years ago
*** Bug 236132 has been marked as a duplicate of this bug. ***

Comment 22

14 years ago
*** Bug 236200 has been marked as a duplicate of this bug. ***
Keywords: helpwanted

Comment 23

14 years ago
Created attachment 142745 [details]
Problem also occurs on Mac Platform in Camino

I filed a new bug for Camino because I was only searching for Camino
bugs. Additionally, the Platform selected in this bug restricts the problem to
the PC platform, so that's why I am posting this here.

Comment 24

14 years ago
*** Bug 236254 has been marked as a duplicate of this bug. ***

Comment 25

14 years ago
(In reply to comment #23)
> Created an attachment (id=142745)
> Problem also occurs on Mac Platform in Camino
> 
> I created a new bug for Camino on this because i was just searching for Camino
> bugs. Additionally the Platform selected in this bug just directs the error to
> the PC platform - so thats why i am posting this.

Hardware -> All
Hardware: PC → All

Updated

14 years ago
Hardware: All → PC

Comment 26

14 years ago
*** Bug 236449 has been marked as a duplicate of this bug. ***

Updated

14 years ago
Hardware: PC → All

Comment 27

14 years ago
*** Bug 236544 has been marked as a duplicate of this bug. ***

Comment 28

14 years ago
*** Bug 236916 has been marked as a duplicate of this bug. ***

Comment 29

*** Bug 237389 has been marked as a duplicate of this bug. ***
(Assignee)

Updated

14 years ago
Blocks: 237820

Comment 30

Confirming this bug using Mozilla/5.0 (Windows; U; Windows NT 5.0; de-DE;
rv:1.6) Gecko/20040206 Firefox/0.8
test-URL: www.müller.ch

Comment 31

14 years ago
*** Bug 239870 has been marked as a duplicate of this bug. ***

Comment 32

14 years ago
5 of the 11 dupes have the keyword umlaut in their summary. I think it would be
a good idea to include that in the summary.

Comment 33

*** Bug 246007 has been marked as a duplicate of this bug. ***

Comment 34

14 years ago
Confirming the bug on Mozilla 1.7: Mozilla/5.0 (X11; U; Linux i686; en-US;
rv:1.7) Gecko/20040624 Netscape/7.0, Mozilla Debian Package 1.7-2

When using a proxy, it's clear what's wrong:
http://www.szalagavató.hu/ loads fine, but viewing its source returns:

<HTML><HEAD>
<TITLE>ERROR: The requested URL could not be retrieved</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The requested URL could not be retrieved</H2>
<HR>
<P>
While trying to retrieve the URL:
<A HREF="http://www.szalagavat%c3%b3.hu/">http://www.szalagavat%c3%b3.hu/</A>
<P>
The following error was encountered:
<UL>
<LI>

<STRONG>
Invalid URL
</STRONG>
</UL>

<P>
Some aspect of the requested URL is incorrect.  Possible problems:
<UL>
<LI>Missing or incorrect access protocol (should be `http://'' or similar)
<LI>Missing hostname
<LI>Illegal double-escape in the URL-Path
<LI>Illegal character in hostname; underscores are not allowed
</UL>
[...]
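
The "Illegal character in hostname" item is the relevant one. As a rough sketch (in Python, assuming the proxy applies the usual letters-digits-hyphen rule to hostnames; the proxy's real checks are not shown in this bug), the %-escaped hostname fails validation while the ACE form of the same name passes. `is_valid_hostname` is a hypothetical helper.

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(host: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    labels = host.rstrip(".").split(".")
    return all(LABEL.fullmatch(label) is not None for label in labels)


escaped_host = "www.szalagavat%c3%b3.hu"
ace_host = "www.szalagavató.hu".encode("idna").decode("ascii")

print(is_valid_hostname(escaped_host))  # False: '%' is not allowed in a hostname
print(is_valid_hostname(ace_host))      # True: the ACE form is plain ASCII
```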
(Assignee)

Comment 35

14 years ago
So, I think Boris is right when he says that the problem is likely with the way
the view-source protocol handler handles non-ASCII characters.  It %-escapes
anything non-ASCII that appears after the first colon.  It does that because it
has no notion of an inner URL.  It should either have its own implementation of
nsIURI or it should parse the inner URL and normalize it to ASCII before
stuffing the result into a nsSimpleURI.

Here's a link to repro the bug:

view-source:http://www.szalagavató.hu/
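
The second option (parse the inner URL and normalize it to ASCII first) might look roughly like this. The sketch is in Python, not the C++ of the protocol handler; `normalize_view_source` is a hypothetical name, and Python's `idna` codec is assumed to be an acceptable stand-in for the IDN conversion done in necko.

```python
from urllib.parse import urlsplit, urlunsplit

PREFIX = "view-source:"


def normalize_view_source(spec: str) -> str:
    """Convert the inner URL's host to ACE before the URI is stored.

    Once the inner URL is pure ASCII, a later generic %-escaping pass
    has nothing left to mangle in the hostname.
    """
    if not spec.startswith(PREFIX):
        raise ValueError("not a view-source URI")
    inner = urlsplit(spec[len(PREFIX):])
    host = (inner.hostname or "").encode("idna").decode("ascii")
    netloc = host if inner.port is None else f"{host}:{inner.port}"
    inner_ascii = urlunsplit(
        (inner.scheme, netloc, inner.path, inner.query, inner.fragment)
    )
    return PREFIX + inner_ascii


print(normalize_view_source("view-source:http://www.szalagavató.hu/"))
```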
Severity: normal → minor
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.8alpha2
(Assignee)

Comment 36

14 years ago
Created attachment 152024 [details] [diff] [review]
v1 patch

This patch implements my suggested fix.  It includes a fair amount of cleanup
in the view-source protocol handler code.  With this patch I'm able to view the
source of URLs containing an internationalized domain name.  There is still
more work to be done in view source land, however, since the title of the view
source window shows the ACE version of the domain name instead of the Unicode
version.  But that requires a separate patch to the view source UI.
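
For illustration, converting the ACE form back to Unicode for display could look like the following Python sketch. This is not the view source UI patch mentioned above; `display_host` is a hypothetical helper, and Python's `idna` codec stands in for the browser's IDN service.

```python
from urllib.parse import urlsplit


def display_host(ascii_spec: str) -> str:
    """Return the Unicode form of the host of an ACE-encoded URL.

    Intended for display only (for example a window title); the ASCII
    form should still be used for networking.
    """
    host = urlsplit(ascii_spec).hostname or ""
    return host.encode("ascii").decode("idna")


ace = "www.szalagavató.hu".encode("idna").decode("ascii")
print(display_host(f"http://{ace}/"))  # www.szalagavató.hu
```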
(Assignee)

Updated

14 years ago
Attachment #131861 - Attachment is obsolete: true
(Assignee)

Updated

14 years ago
Attachment #152024 - Flags: review?(cbiesinger)

Comment 37

Comment on attachment 152024 [details] [diff] [review]
v1 patch

nsNetUtil.h
+#define NS_VIEWSOURCEHANDLER_CID		      \

it'd be nice to document what this implements...

Index: protocol/viewsource/src/nsViewSourceHandler.cpp
			     const char *aCharset, // ignore charset info
that comment seems outdated
Attachment #152024 - Flags: review?(cbiesinger) → review+
(Assignee)

Comment 38

14 years ago
fixed-on-trunk
Status: ASSIGNED → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED

Comment 39

14 years ago
This has not been checked in on AVIARY_1_0_20040515_BRANCH.
Please check it in.

Also, in Camino, when the source of http://高山かきもち.jp/ is displayed, the title is
set to http://xn--u8je4dxgy65utiwe.jp/.
I think it would be easier to understand if the title showed 高山かきもち.jp, the
same as the original page.
Flags: blocking-aviary1.0RC1?

Comment 40

14 years ago
*** Bug 249888 has been marked as a duplicate of this bug. ***

Comment 41

14 years ago
*** Bug 249889 has been marked as a duplicate of this bug. ***
(Assignee)

Updated

14 years ago
Attachment #152024 - Flags: approval1.7.1?

Comment 42

*** Bug 250106 has been marked as a duplicate of this bug. ***

Comment 43

14 years ago
The problem still reproduces.

AVIARY_1_0_20040515_BRANCH build
Mozilla/5.0 (Windows; U; Windows NT 5.1; ja-JP; rv:1.7) Gecko/20040706
Firefox/0.9.0+

Comment 44

14 years ago
(In reply to comment #43)
> A problem is still reproduced. 
> AVIARY_1_0_20040515_BRANCH build
When it is fixed on aviary-1.0, it'll be noted here. You don't have to remind us that
it's not yet fixed on the aviary-1.0 branch; everybody here is aware of that.

Scott, can you check this into the aviary-1.0 branch? I hope the patch applies
cleanly to the branch.

Comment 45

14 years ago
*** Bug 250480 has been marked as a duplicate of this bug. ***

Comment 46

14 years ago
Comment on attachment 152024 [details] [diff] [review]
v1 patch

a=mkaply
Attachment #152024 - Flags: approval1.7.2? → approval1.7.2+

Comment 47

*** Bug 253086 has been marked as a duplicate of this bug. ***

Comment 48

14 years ago
*** Bug 245959 has been marked as a duplicate of this bug. ***
(Assignee)

Updated

14 years ago
Attachment #152024 - Flags: approval-aviary?
(Assignee)

Comment 49

14 years ago
Created attachment 154649 [details] [diff] [review]
v1.1 patch -- simplified for the 1.7 and aviary branches
(Assignee)

Updated

14 years ago
Attachment #152024 - Flags: approval-aviary?
(Assignee)

Comment 50

14 years ago
Comment on attachment 154649 [details] [diff] [review]
v1.1 patch -- simplified for the 1.7 and aviary branches

This is a reduced version of the original patch that includes only the
necessary changes.  This is what I checked into the 1.7 branch.  (The original
patch had many conflicts with the 1.7 branch source.)
Attachment #154649 - Flags: approval-aviary?

Comment 51

14 years ago
Comment on attachment 154649 [details] [diff] [review]
v1.1 patch -- simplified for the 1.7 and aviary branches

a=asa (on behalf of the aviary drivers) for checkin to the aviary branch.
Attachment #154649 - Flags: approval-aviary? → approval-aviary+

Comment 52

14 years ago
*** Bug 254920 has been marked as a duplicate of this bug. ***
Keywords: fixed-aviary1.0
Whiteboard: need-aviary1.0

Updated

14 years ago
Flags: blocking-aviary1.0PR?

Comment 53

14 years ago
Why is this patch still not checked in to the aviary branch?

Comment 54

13 years ago
*** Bug 255955 has been marked as a duplicate of this bug. ***

Updated

13 years ago
Keywords: fixed1.7

Comment 55

13 years ago
*** Bug 260126 has been marked as a duplicate of this bug. ***

Comment 56

13 years ago
This seems NOT to work with Mozilla 1.7.3, but it does work with Firefox 0.10.x
(aviary branch). See Bug 265395.

Comment 57

(In reply to comment #56)
> This seems NOT to work with Mozilla 1.7.3 but with Firefox 0.10.x (aviary
> branch). See Bug 265395.

that's because 1.7.3 is 1.7 + a few handpicked security patches. its release
notes would have told you that. this means that 1.7.3 DOES NOT contain this patch.

fixing keyword.
Keywords: fixed1.7, helpwanted → fixed1.7.x

Comment 58

*** Bug 265395 has been marked as a duplicate of this bug. ***