We have all seen the SPAM that comes with URLs obfuscated by changing an IP address from its 4-octet style to a simple decimal number (like http://3486011863 instead of http://www.mozilla.org), or in octal (like http://00000000317.00000000310.00000000121.00000000327/), or in base 256 notation (like http://4294967503.4294967496.4294967377.4294967511/). Why Netscape and Internet Explorer follow these types of URLs is beyond me. It would go a long way in the battle against spammers to simply have Mozilla check if a URL is either a decimal number in quad notation, or a fully qualified domain name before it follows it.
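For readers following along, here is a rough Python sketch of the arithmetic behind the decimal and "base 256" examples above. The function names are made up for illustration, and reducing each oversized group modulo 256 is only an assumed model of how a lenient parser might truncate the overflow; actual platform behaviour may differ.

```python
import ipaddress


def dotted_quad_to_int(dotted: str) -> int:
    """Pack four decimal octets into one 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def int_to_dotted_quad(value: int) -> str:
    """Render a 32-bit integer in the usual dotted-quad form."""
    return str(ipaddress.IPv4Address(value))


def wrap_octets(dotted: str) -> str:
    """Reduce each group modulo 256 (assumed model of octet overflow)."""
    return ".".join(str(int(part) % 256) for part in dotted.split("."))


# The dotless decimal example from the report.
print(int_to_dotted_quad(3486011863))  # 207.200.81.215

# The oversized-group example: each group is 2**32 plus the real octet.
print(wrap_octets("4294967503.4294967496.4294967377.4294967511"))  # 207.200.81.215
```

Under that assumption, both forms name the same address as the octal example once its leading zeros are read as base 8.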
I don't think we should block these urls. It's much easier for me to remember 2259499800 than it is for me to remember 126.96.36.199, and it isn't much more obfuscated. Converting anything that looks like an IP address to the canonical xxx.xxx.xxx.xxx when following a link might make sense, but it could break things, and I don't think there would be a great benefit because the xxx.xxx.xxx.xxx form doesn't mean anything to most users either.
Remembering a 10 digit number more easily than a set of 4 numbers seems to fly in the face of all my knowledge about human memory and probably in incidence terms is less common than odd number bases being used to obfuscate an IP address. That said, not supporting them would do what? Ignore them silently, put up a message box? S
This must break compliance with at least one RFC, surely? :-) I'm against. Gerv
Mozilla's current behaviour is clearly in violation of RFC2396, which specifies that a host name MUST be either a domain name or an IPv4 dotted quad.
RFC 2396 says in section 3.2.2: The host is a domain name of a network host, or its IPv4 address as a set of four decimal digit groups separated by ".". Literal IPv6 addresses are not supported.
    hostport    = host [ ":" port ]
    host        = hostname | IPv4address
    hostname    = *( domainlabel "." ) toplabel [ "." ]
    domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
and somewhat earlier:
    alphanum    = alpha | digit
I believe all digits are legal.
Digits are legal in a host name, so just see if the host portion of a URI is a valid dotted quad, and if it's not then just send it through DNS like any other host name. That would cause all of the above URI's to fail just as http://987.474.264.712 would (unless one of those numbers happens to be a hostname on a LAN).
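A minimal sketch of the check proposed above, in Python. The names `is_strict_dotted_quad` and `route_host` are hypothetical, and rejecting leading zeros is an assumption made here so that the octal form falls through to DNS like any other name; this is not Mozilla's actual code.

```python
import re

# One decimal group: "0" or 1-3 digits without a leading zero (assumption).
_GROUP = re.compile(r"0|[1-9][0-9]{0,2}")


def is_strict_dotted_quad(host: str) -> bool:
    """True only for four decimal groups, each in the range 0-255."""
    groups = host.split(".")
    if len(groups) != 4:
        return False
    return all(_GROUP.fullmatch(g) and int(g) <= 255 for g in groups)


def route_host(host: str) -> str:
    """Decide how the host would be handled under the proposed rule."""
    return "ipv4-literal" if is_strict_dotted_quad(host) else "dns-lookup"


for host in ("207.200.81.215", "3486011863", "987.474.264.712",
             "00000000317.00000000310.00000000121.00000000327"):
    print(host, "->", route_host(host))
```

Only the first host is treated as an address literal; the others would be handed to the resolver, where they succeed only if something actually answers to that name.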
Simple decimal URLs also can allow people to bypass blocked (blacklisted by filtering software) addresses. That might be considered a feature, not a bug.
That might be a beneficial side effect of Mozilla following obfuscated URLs, but I think that by and large it [following obfuscated URIs] is not helpful. It shouldn't be the job of Mozilla to ensure that it gives people a way to bypass filtering software. The common use of these URLs is to prevent users from determining which domain, or IP subnet, a spam-vertized host is on. A lot of users are now clueful enough to send LARTs to email@example.com, and a few to do a whois on the IP. I don't think there are any legitimate reasons to follow these types of URIs (although I disagree with the principle of censorware, it is technically not legit to bypass it on a system where the admin has installed it).
All digits are legal but a hostname made up only of digits isn't. Look at that grammar more closely.
    host        = hostname | IPv4address
    hostname    = *( domainlabel "." ) toplabel [ "." ]
    domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
hostname must contain a toplabel, and toplabel must begin with an alpha. If the URI is all numbers and it's not decimal IPv4 dotted quad, it's illegal.
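To make the grammar argument concrete, here is a sketch that transcribes the quoted hostname production into a Python regular expression. The pattern names and the helper `is_rfc2396_hostname` are invented for this example; only the hostname rule is modelled, not IPv4address.

```python
import re

# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")


def is_rfc2396_hostname(host: str) -> bool:
    """True if the whole string matches the hostname production."""
    return HOSTNAME.fullmatch(host) is not None


print(is_rfc2396_hostname("www.mozilla.org"))  # True
print(is_rfc2396_hostname("123.example.com"))  # True: digit labels are fine
print(is_rfc2396_hostname("3486011863"))       # False: toplabel needs an alpha
```

Digit-only labels are accepted anywhere except the last position, which is why a dotless decimal number cannot be a hostname under this grammar.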
OK, I'm usually wrong with respect to grammar. Jerry: what do you mean by "just send it through DNS?" You really have to use the OS equivalent of gethostbyname. On Linux, gethostbyname resolves all-digit names. It concerns me that the current behavior is pervasive. Perhaps there is some obscure RFC that demands it. This needs investigation.
Parsing octal-obfuscated IP address literals is a "happy" accident of the way inet_addr() is implemented in most systems -- it commonly uses strtol(), which has an implicit rule that a leading '0' means octal, unless it is '0x', which means hexadecimal... Having a stricter parser inside inet_addr() (or its equivalent) would certainly block those obfuscations. In many implementations gethostbyname() also calls inet_addr() if it can't resolve the input string via DNS lookup. As to IPv6 -- See RFC 2732: Format for Literal IPv6 Addresses in URL's
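The base-detection rule described above can be imitated in a few lines of Python. This is only an approximation for illustration: `parse_like_strtol` and `lenient_dotted_quad` are hypothetical names, and unlike C's strtol() the sketch raises an error on trailing garbage instead of stopping at the first invalid character.

```python
def parse_like_strtol(text: str) -> int:
    """Mimic strtol(text, NULL, 0): 0x prefix is hex, leading 0 is octal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def lenient_dotted_quad(host: str) -> str:
    """Parse four groups leniently and print them back in plain decimal."""
    values = [parse_like_strtol(part) for part in host.split(".")]
    if len(values) != 4 or any(not 0 <= v <= 255 for v in values):
        raise ValueError(f"not a four-part IPv4 literal: {host!r}")
    return ".".join(str(v) for v in values)


# The octal example from the original report.
print(lenient_dotted_quad("00000000317.00000000310.00000000121.00000000327"))
# 207.200.81.215
```

A stricter parser would simply refuse any group with a leading zero or an 0x prefix rather than reinterpreting it.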
mass move, v2. qa to me.
+qawanted, mozilla1.0 This should happen only in some versions of Windows, per Sean's comments in bug 12748. If someone has time, can they verify this? I would like this fixed. If someone wants to map 10 digit decimal numbers to IP addresses, they need to register a domain and use that.
*** Bug 73597 has been marked as a duplicate of this bug. ***
RFC 1738 seems to explicitly disallow this format for valid HTTP URLs. RFC 1945 and RFC 2068 defer to this, although 2068 does note that "HTTP proxies may receive requests for URIs not defined by RFC 1738."
CONFIRMED: Linux, Mozilla 0.9.4 This only fails on Mac. If you literalize the URL by putting a "." after the number, DNS does error.
+pp, -qawanted, ALL/ALL.
Change summary to something more descriptive.
Mass removing self from CC list.
Now I feel dumb because I have to add back. Sorry for the spam.
Chimera accepts this (Chimera 0.3).
I see decimal IP addresses in links every once in a while, most recently on http://www.berrypatch.org/pictures.html. Fixing this bug in order to comply with RFCs would break sites. Does the RFC say that user agents should/must reject addresses that don't match the RFC definition? Is there any benefit to fixing this bug other than RFC compliance? Would canonicalizing the decimal address to xxx.xxx.xxx.xxx form (for the benefit of filtering software) be a reasonable compromise?
The benefit is disallowing the obfuscation of URL's for nefarious purposes. There is no reason to hide a host's identity whatsoever. The spec states that the name SHOULD be checked before being sent to DNS.
I don't see how a simple decimal number is any more obfuscated than a 4 octet style number. And filing bugs with the specific intent of breaking links because spammers use them is a futile way to fight spam.
It's not a way to fight spam. It's a way to remove one more little trick in the spammer's toolbox, AND get Mozilla to adhere to the standards that it should be complying with anyway.
Joe: Most humans are using dotted quad addressing for a reason, it is relatively human readable. And almost all interfaces accept this format, OS configs, web sites, even ARIN. Why should vendors and web sites start bolting on more code to support a decimal to dotted quad and 32bit unsigned int just because a couple system API's are too liberal?
Test case links do not work on MacOS 10.2. They do work on Windows 2000 using Mozilla 1.2a. Could someone test other platforms?
This is a testcase that is regularly checked. http://www.mozilla.org/quality/networking/testing/coretests.html Basically, Chimera allows this addressing format as well. I think only Mozilla on Mac OS X ignores it (this bug blocks bug 150966 for chimera).
moving neeti's futured bugs for triaging.
[RFE] is deprecated in favor of severity: enhancement. They have the same meaning.
Mozilla 1.3b for Mac OS X accepts this, so it is a characteristic of mach-o
-mozilla 1.0: long gone -pp: now that mac cfm is gone, all plats do this.
I added the word "dotless" because MS describes it using that term. Here's something interesting to think about. http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/MS01-055.asp I'm new to cookies, so I'm trying to figure out if this matters to us. If anyone can think of a reason this bug would intersect badly w/ cookies, please open a bug in cookies: * The third vulnerability is a new variant of a vulnerability discussed in Microsoft Security Bulletin MS01-051 affecting how IE handles URLs that include dotless IP addresses. If a web site were specified using a dotless IP format (e.g., http://031713501415 rather than http://188.8.131.52), and the request were malformed in a particular way, IE would not recognize that the site was an Internet site. Instead, it would treat the site as an intranet site, and open pages on the site in the Intranet Zone rather than the correct zone. This would allow the site to run with fewer security restrictions than appropriate. This vulnerability does not affect IE 6.
*** Bug 150966 has been marked as a duplicate of this bug. ***
Why has this bug languished so long?
Controversial to spammers and virus writers. Can you attach the code, or email it? I imagine it shouldn't be too hard to use it to validate the URL prior to submitting it from the location bar to Necko, but I don't know without tinkering.
Jerry: see bug 268619 and bug 268893. I think I posted the test harness so there is a file where you can try out any values you want.
*** Bug 358447 has been marked as a duplicate of this bug. ***
This bug reflects a fundamental misunderstanding of what an IP address is. An IP address is a long int. That's it. One big number. The dot-quad version of the long int improves readability, but the long int is in fact a valid form of a valid IP address. One might argue that a large number is harder to remember than a dot-quad version of an IP. To someone making such an argument I would inquire if they know their phone number. Because my system's IP address (1079075330) is no harder to remember than a phone number. "Fixing" this "bug" would cause the browser to behave differently than every other TCP/IP using utility on the system, including ones that fetch web pages (Curl et al.) I would consider firefox not retrieving an address in this format to be a bug. Speaking of which, my OSX version of Firefox does not retrieve an address in this format, and I consider that a bug.
(In reply to comment #40) > This bug reflects a fundamental misunderstanding of what an IP address is. No, it doesn't. There is a format to IP addresses called a dotted quad. Your argument is that since the dotted quad is just a representation of a hexadecimal number, that any representation of the IP address which can be converted to that hexadecimal number should be allowed at the user interface level. Should a word processor allow you to type in hexadecimal, or octal, or even binary? They don't. Why not? It's the same thing. Why don't telephones allow you to dial in octal? The fact is that there are both formal definitions, and conventions. By following numerical addresses not in dotted quad form, Firefox is violating the convention of representing IP addresses as a dotted quad. The brokenness of other products is not a convincing argument for maintaining the brokenness of Firefox. What functionality are you losing by not being able to follow decimal representations of hexadecimal addresses? Is there some reason you cannot use the dotted quad format?
I'm saying that the number IS a valid address and deliberately breaking that functionality breaks an ad-hoc convention that is used by every other TCP/IP client program that I've tested it on, with several different flavors of UNIX. You are intercepting and subverting an underlying capability of the system standard library. I mostly use the capability to demonstrate to programmers new to TCP/IP that an address IS just a number. Being able to browse to a number in that format drives that point home quite effectively.
Pedagogy is a weak justification. (I was taught C using gets(), and it was one of the most insecure functions ever devised.) The RFCs are fairly clear on the correct behavior. If there's a reason to ignore them, it's what Jesse said above -- that some people actually use this stuff for legitimate things.
(In reply to comment #43) > The RFCs are fairly clear on the correct behavior. If there's a reason to > ignore them, it's what Jesse said above -- that some people actually use this > stuff for legitimate things. My position is that you need to do a cost/benefit analysis. What are the costs and benefits associated with each course of action? 1. The cost of leaving it as-is: Spammers and phishers are able to pile on another layer of obfuscation to their sites, making it more difficult for features like Google's anti-phishing or Thunderbird's phishing detection. 2. The cost of fixing it: Some may not be able to use Firefox to demonstrate that an IP address is really just a number that can be represented as a decimal number, or a decimal representation of a DWORD. The judgment of which cost is the greater evil depends on your opinion of the severity of each.
Another cost of leaving it as-is: it makes Firefox appear to differ between operating systems (iirc) Another cost of fixing it: some sites will break. (Perhaps markp could tell us how many.)
Since these IP address formats already don't work in Firefox on some operating systems, I would not expect many sites to break if we were to drop support for them entirely.
I agree with Jesse's comment above. These address formats serve no useful purpose any more, and they interact nastily with external security measures. I added this comment to bug 554596, which might help clarify some of the historical issues here: http://tools.ietf.org/html/draft-main-ipaddr-text-rep-00 -- see section 2.1.1, "Early Practice", which explains how the 4.2BSD inet_aton() became the de-facto standard for IPv4 address interpretation, and that compatibility with this lingers to this day. It concludes: The 4.2BSD inet_aton() has been widely copied and imitated, and so is a de facto standard for the textual representation of IPv4 addresses. Nevertheless, these alternative syntaxes have now fallen out of use (if they ever had significant use). The only practical use that they now see is for deliberate obfuscation of addresses: giving an IPv4 address as a single 32-bit decimal number is favoured among people wishing to conceal the true location that is encoded in a URL. All the forms except for decimal octets are seen as non-standard (despite being quite widely interoperable) and undesirable. http://www.pc-help.org/obscure.htm contains a number of different examples of IP address obfuscation techniques, including uses of the numeric overflows described above.
Also rescued from the comments there, here's another little-known format: Various implementations of inet_aton() have exciting semi-documented features such as two- and three-part dotted numerical addresses, for example:
    a.b   -- 8.24 bits   -- example: http://0x42.0x660d63
    a.b.c -- 8.8.16 bits -- example: http://0x42.0x66.0x0d63
See http://www.securelist.com/en/blog/148/New_Brazilian_banking_Trojans_recycle_old_URL_obfuscation_tricks for the original test cases.
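As an illustration of the part widths listed above, here is a sketch of an inet_aton()-style parser in Python. It is a simplified model written for this thread, not a copy of any platform's implementation, and the names `_component` and `lenient_ipv4` are hypothetical.

```python
def _component(text: str) -> int:
    """Parse one part: 0x prefix is hex, leading 0 is octal, else decimal."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def lenient_ipv4(host: str) -> str:
    """Accept 1 to 4 parts; the last part fills all remaining bits."""
    parts = [_component(p) for p in host.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"too many parts: {host!r}")
    *leading, last = parts
    if any(not 0 <= v <= 0xFF for v in leading):
        raise ValueError(f"leading part out of range: {host!r}")
    last_bits = 32 - 8 * len(leading)  # 32, 24, 16 or 8 bits
    if not 0 <= last < (1 << last_bits):
        raise ValueError(f"last part out of range: {host!r}")
    value = last
    for index, octet in enumerate(leading):
        value |= octet << (24 - 8 * index)
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(lenient_ipv4("0x42.0x660d63"))     # 66.102.13.99
print(lenient_ipv4("0x42.0x66.0x0d63"))  # 66.102.13.99
```

Both example hosts come out as the same dotted quad, which is what makes these forms attractive for obfuscation.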
Ooh looks like Vint Cerf agrees with me! He's the father of the Internet, you know? ;-P http://interviews.slashdot.org/story/11/10/25/1532213/vint-cerf-answers-your-questions-about-ipv6-and-more VC: LOL! actually, most of us assumed that any way to generate the 32 number should be acceptable since the connection process doesn't actually use the text representation of the IP address. I think any value in the range 0 to 2^32-1 should be acceptable as an IP reference. As to stateless operation, I know what you mean; you have to get used to figuring out how to stash intermediate state (cookies usually)...
Bruce, you're conflating IPs and URIs here. The browser's location bar takes a URI, not an IP. The RFC for the URI specifies that the host part may be specified by name or by IP, but prescribes a certain format for the IP. Your question to Vint Cerf conveniently neglected to mention this distinction, and you can't infer from his answer that he actually read this bug report to find out what the issue really was.
13 years later, I rescind comment #3. See http://blogs.msdn.com/b/ieinternals/archive/2014/03/06/browser-arcana-ipv4-ipv6-literal-urls-dotted-va-dotless.aspx . We are sending these unusual formats over the wire, in a header ("Host") which can be used by some for security-related decisions. This is a hostage to fortune; we should certainly stop doing that. Gerv
To be clear, the suggestion from the blog in comment 53 is that we translate these addresses into dotted decimal notation, not that we block them. I like that. "OS; one of the first steps that class undertakes when constructing a URL object from a string is to convert any IP literal hostname into its canonical dotted-quad form. Chrome and Opera appear to match Internet Explorer’s behavior here, while Firefox 27 leaves the undotted decimal in the address bar and in the request sent to the network:"
(In reply to Patrick McManus [:mcmanus] from comment #54) > to be clear the suggestion from the blog in comment 53 is that we translate > these addresses into dotted decimal notation, not that we block them. I like > that. And this suggestion was copied by bug 1063010. At what level do we want to fix this? Note that other browsers even display the corrected address in e.g. tooltips (for in-page links), the URL bar, etc. It's 6pm on a Monday, so if having a default-to-on pref for this is what appeases the people who insist on having http://2130706433/ work, I can live with doing that, too.
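For what the canonicalization approach might look like, here is a small Python sketch that rewrites a dotless decimal host to dotted-quad form before the URL is used. It is an illustration only: `canonicalize_dotless_host` is a hypothetical name, it handles just the plain decimal case, and it drops any userinfo in the URL for brevity.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def canonicalize_dotless_host(url: str) -> str:
    """Rewrite a dotless decimal host (0 .. 2**32-1) as a dotted quad."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None or not (host.isascii() and host.isdigit()):
        return url  # ordinary hostname or dotted form: leave untouched
    value = int(host)
    if value > 0xFFFFFFFF:
        return url  # too large for IPv4: leave untouched
    dotted = str(ipaddress.IPv4Address(value))
    # Note: userinfo (user:password@) is not preserved in this sketch.
    netloc = dotted if parts.port is None else f"{dotted}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(canonicalize_dotless_host("http://2130706433/"))  # http://127.0.0.1/
```

With something like this in front of the network layer, the address bar, tooltips and the Host header would all see the same dotted form.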
Who's gonna drive this?