Open Bug 439616 Opened 17 years ago Updated 4 years ago

Redirect of IDN - Domains

Categories

(Core :: Networking: HTTP, defect, P5)

Tracking

Tracking Status
blocking2.0 --- -

People

(Reporter: db, Unassigned)

Details

(Whiteboard: [necko-backlog])

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.14) Gecko/20080404 Firefox/2.0.0.14

<meta http-equiv="refresh" content="0; url=http://www.fübar.de" />
-> alert with "url is invalid"

PHP: header("Location: http://www.fübar.de");
-> opens: http://www.f%c3%bcbar.de/

Reproducible: Always

Steps to Reproduce:
self-explanatory

Expected Results:
call the right url? =)

a "document.location.href" works fine
Reporter: Is this still an issue with FF3.0.3?
<meta http-equiv="refresh" content="0; url=http://www.fübar.de" />
FF3.0.3 -> redirect to 'www.fübar.de' -> OK

<?php header("Location: http://www.fübar.de"); ?>
FF3.0.3 -> redirect to 'http://www.f%fcbar.de/'
IE7 -> redirect to 'http://www.fbar.de/'

--
For Info:
PHP: header - Send a raw HTTP header (http://php.net/header)
[maybe this is no bug .. because characters like 'ü' not allowed in the http protocol? (i dunno)]

The ACE Notation
header("Location: http://www.xn--fbar-0ra.de");
FF3.0.3 -> redirect to 'www.fübar.de'

--
<script type="text/javascript"> document.location.href="http://www.fübar.de"; </script>
Still works fine
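The ACE workaround noted in the comment above can be applied on the server before the Location header is emitted. The following is a minimal sketch in Python rather than PHP; the function name `to_ascii_location` is hypothetical, and it assumes the path, query and fragment should be percent-encoded as UTF-8. It uses the standard library's IDNA 2003 codec, which may differ from what a given browser implements for some names.

```python
from urllib.parse import urlsplit, urlunsplit, quote


def to_ascii_location(url: str) -> str:
    """Return an ASCII-only form of `url` suitable for a Location header.

    Hypothetical helper: the host is converted to ACE (punycode) and the
    remaining components are percent-encoded as UTF-8 (an assumption).
    Any userinfo in the URL is dropped for simplicity.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # "%" is kept safe so that already-escaped sequences are not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&+%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(to_ascii_location("http://www.fübar.de"))
```

A value produced this way contains only ASCII, so the receiving browser does not have to guess the charset of the header bytes.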
Confirming. This is a problem with websites which return UTF-8 encoded URIs in their Location redirects, such as the bit.ly link in the URL field. With this bug present, bit.ly (or other URL shortener) links to such domains cannot be opened, even if the user presses Reload after seeing the failed page, because Firefox tries to query the wrong DNS name. The only workaround is going to the location bar, which contains the expanded URL, and hitting Enter, which is not intuitive. The same URLs work in Chrome. I'm confirming this bug with Major severity, since it effectively prevents users from visiting certain websites.
Severity: normal → major
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Component: General → Networking: HTTP
Ever confirmed: true
OS: Windows XP → All
Product: Firefox → Core
QA Contact: general → networking.http
Hardware: x86 → All
Version: unspecified → Trunk
> This is a problem with websites which return UTF-8 encoded URIs in
> their Location redirects

Which is a violation of the HTTP RFC, no? The value of Location is an absoluteURI (RFC 2616 section 14.30), which is defined in RFC 2396 as:

    absoluteURI = scheme ":" ( hier_part | opaque_part )
    hier_part   = ( net_path | abs_path ) [ "?" query ]
    net_path    = "//" authority [ abs_path ]
    authority   = server | reg_name
    reg_name    = 1*( unreserved | escaped | "$" | "," |
                      ";" | ":" | "@" | "&" | "=" | "+" )

So sending unescaped non-ASCII bytes is actually not allowed in the Location header. Of course error handling if that's done is not defined either.

It sounds like Chrome is treating Location values as IRIs instead of URIs.

What we do is to take the URI given (as bytes), escape any non-ASCII bytes using URI-escaping so that we don't violate the HTTP spec ourselves, and use the resulting URI.

Note that comment 2 indicates that PHP in that case is sending the non-ASCII char encoded as ISO-8859-1, not as UTF-8. Also note that it's not consistent with comment 0 in terms of the PHP behavior.

The upshot of all of which is that httpbis needs to define what's supposed to happen here. Right now (as of http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-08 ) it seems to match RFC 2616 on the matter.
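The byte-level escaping described in the comment above can be illustrated with a short Python sketch. This is not Necko's actual code; `escape_non_ascii` is a hypothetical name, and the sketch only shows why the same header text yields different escaped hosts depending on the charset the server used for the bytes.

```python
def escape_non_ascii(raw: bytes) -> str:
    """Percent-escape every byte >= 0x80; pass ASCII bytes through unchanged."""
    return "".join(chr(b) if b < 0x80 else f"%{b:02x}" for b in raw)


url = "http://www.fübar.de/"

# Header bytes sent as UTF-8 (the result reported in comment 0).
print(escape_non_ascii(url.encode("utf-8")))       # http://www.f%c3%bcbar.de/

# Header bytes sent as ISO-8859-1 (the result reported in comment 2).
print(escape_non_ascii(url.encode("iso-8859-1")))  # http://www.f%fcbar.de/
```

Either result is a syntactically valid URI, but neither host is the ACE form `www.xn--fbar-0ra.de`, which is why the DNS lookup goes to the wrong name.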
(In reply to comment #4)
> > This is a problem with websites which return UTF-8 encoded URIs in
> > their Location redirects
>
> Which is a violation of the HTTP RFC, no?

AFAICT, yes.

> The value of Location is an
> absoluteURI (RFC 2616 section 14.30), which is defined in RFC 2396 as:
>
>     absoluteURI = scheme ":" ( hier_part | opaque_part )
>     hier_part   = ( net_path | abs_path ) [ "?" query ]
>     net_path    = "//" authority [ abs_path ]
>     authority   = server | reg_name
>     reg_name    = 1*( unreserved | escaped | "$" | "," |
>                       ";" | ":" | "@" | "&" | "=" | "+" )
>
> So sending unescaped non-ASCII bytes is actually not allowed in the Location
> header. Of course error handling if that's done is not defined either.

Yes, this is also true to the best of my knowledge.

> It sounds like Chrome is treating Location values as IRIs instead of URIs.

Are IRIs a full superset of URIs? If yes, then is it a good decision to us to treat the Location value as an IRI as well?

> What we do is to take the URI given (as bytes), escape any non-ASCII bytes
> using URI-escaping so that we don't violate the HTTP spec ourselves, and use
> the resulting URI.
>
> Note that comment 2 indicates that PHP in that case is sending the non-ASCII
> char encoded as ISO-8859-1, not as UTF-8. Also note that it's not consistent
> with comment 0 in terms of the PHP behavior.

I'm not 100% sure, but AFAIK, PHP (at least up to PHP5) doesn't have any special notion of UTF-8 for the most part, so if the source code file is encoded in UTF-8, and contains something like:

<?php header('Location: http://دامنه.کام/path'); ?>

then PHP will end up sending the raw bytes between the quote characters as a header, which will be the UTF-8 encoding of those characters.

> The upshot of all of which is that httpbis needs to define what's supposed to
> happen here. Right now (as of
> http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-08 ) it seems to
> match RFC 2616 on the matter.
Can we consider this as a webcompat issue, and deviate from the spec for that reason? I know that most browsers treat relative URIs for the Location field as URIs pointing to resources on the same domain relative to the current request's location, but I can't find anywhere in the HTTP RFC which defines what should happen in this case. Can this be considered a similar issue?
> Are IRIs a full superset of URIs?

I believe that any valid URI is a valid IRI, yes. But I'm not exactly an expert on that stuff.

> is it a good decision to us to treat the Location value as an IRI as well

_That_ I don't know.

> then PHP will end up sending the raw bytes between the quote characters as a
> header

That would explain the comment 0 vs comment 2 mess, yes.

> Can we consider this as a webcompat issue

We could, but it should still get specced in httpbis. Is UTF-8 the common case for these headers when they're non-ascii? Does it depend on the part of the URI (certainly that's the case in other places in IE)? Data needed... Data on how IE and Safari and Opera handle this would be good too.

Also of interest is how the "actual" charset of the Location header lines up with the origin charset of the current request's URI... In general, they're uncorrelated, but what happens in practice?

In general, I have no problem aligning with other browsers on UTF-8 here, as long as it doesn't break existing consumers.

> but I can't find anywhere in the HTTP RFC which defines what should happen in
> this case

There isn't anything; relative URIs in Location are not valid HTTP 1.1.
What if we look for escaped characters inside the domain name, and convert the domain to the ACE notation if it has any? Does that make any sense? I don't think that in that case we will be breaking any existing consumers.
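The proposal above (detect escaped characters in the host and convert the host to ACE) could look roughly like the following Python sketch. The function name `host_to_ace` and the `charset` parameter are hypothetical, and the sketch assumes the escaped bytes are UTF-8 unless told otherwise, which is exactly the open question raised earlier in this bug. It also relies on Python's built-in IDNA 2003 codec, not on whatever Gecko would use.

```python
from urllib.parse import urlsplit, urlunsplit, unquote_to_bytes


def host_to_ace(url: str, charset: str = "utf-8") -> str:
    """If the host contains %-escapes, unescape it and re-encode it as ACE.

    Hypothetical helper; URLs without escapes in the host are returned
    unchanged. Any userinfo in the URL is dropped for simplicity.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if "%" not in host:
        return url
    # Undo the URI-escaping, then interpret the bytes in the assumed charset.
    unicode_host = unquote_to_bytes(host).decode(charset)
    ace = unicode_host.encode("idna").decode("ascii")
    netloc = ace if parts.port is None else f"{ace}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(host_to_ace("http://www.f%c3%bcbar.de/"))                 # UTF-8 bytes
print(host_to_ace("http://www.f%fcbar.de/", charset="latin-1"))  # ISO-8859-1 bytes
```

Both calls produce the same ACE host only because the right charset is supplied; decoding `%fc` as UTF-8 raises `UnicodeDecodeError`, so a real implementation would need a fallback policy.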
You mean like bug 412457 or bug 309671?
(In reply to comment #8)
> You mean like bug 412457 or bug 309671?

Yes!
Not blocking 1.9.3 on this but I marked bug 412457 as wanted for 1.9.3. Am I correct in assuming that fixing that bug alone would fix this problem as well? If so, should we dupe this?
blocking2.0: ? → -
Whiteboard: [necko-backlog]
Priority: -- → P1
Priority: P1 → P3

Bulk-downgrade of unassigned, >=3 years untouched DOM/Storage bug's priority.

If you have reason to believe this is wrong, please write a comment and ni :jstutte.

Severity: major → S4
Priority: P3 → P5