Bug 67730 - Obfuscated "dotless" IP (single large decimal or hexed) addresses shouldn't work
Status: NEW
[sg:low] bypass external filters that...
: sec-low, testcase
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- enhancement with 7 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Duplicates: 73597 150966 160819 358447 430273 554596 1157388
Depends on: 430273
Blocks:
Reported: 2001-02-05 16:35 PST by Jerry Baker
Modified: 2016-06-21 08:35 PDT (History)
32 users (show)

Description Jerry Baker 2001-02-05 16:35:09 PST
We have all seen the spam that comes with URLs obfuscated by changing an IP 
address from its 4-octet style to a single decimal number (like 
http://3486011863 instead of http://www.mozilla.org), or to octal (like 
http://00000000317.00000000310.00000000121.00000000327/), or to base 256 
notation (like http://4294967503.4294967496.4294967377.4294967511/).

Why Netscape and Internet Explorer follow these types of URLs is beyond me. It 
would go a long way in the battle against spammers to simply have Mozilla check 
whether the host in a URL is either a decimal number in quad notation or a fully 
qualified domain name before it follows it.
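
To make the three obfuscated forms concrete, here is a minimal Python sketch that decodes each example from the description. The helper name dword_to_quad is made up for illustration, and the handling of the oversized parts assumes a lenient parser that keeps only the low 8 bits of each part; real parsers may differ.

```python
def dword_to_quad(value: int) -> str:
    """Render a 32-bit integer as a dotted quad, most significant octet first."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


# Single decimal number, as in http://3486011863
decimal_form = dword_to_quad(3486011863)

# Octal parts: each part is read in base 8.
octal_parts = ["00000000317", "00000000310", "00000000121", "00000000327"]
octal_form = ".".join(str(int(part, 8)) for part in octal_parts)

# Oversized parts: each is 2**32 plus an octet value.
# Assumption: the parser wraps, so only the low 8 bits survive.
overflow_parts = [4294967503, 4294967496, 4294967377, 4294967511]
overflow_form = ".".join(str(part & 0xFF) for part in overflow_parts)

print(decimal_form, octal_form, overflow_form)
```

Under those assumptions all three examples decode to the same dotted quad.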
Comment 1 Jesse Ruderman 2001-02-05 17:43:55 PST
I don't think we should block these urls.  It's much easier for me to remember 
2259499800 than it is for me to remember 134.173.59.24, and it isn't much more 
obfuscated.

Converting anything that looks like an IP address to the canonical 
xxx.xxx.xxx.xxx when following a link might make sense, but it could break 
things, and I don't think there would be a great benefit because the 
xxx.xxx.xxx.xxx form doesn't mean anything to most users either.
Comment 2 Simon Lucy 2001-02-05 17:55:13 PST
Remembering a 10 digit number more easily than a set of 4 numbers seems to fly
in the face of all my knowledge about human memory and probably in incidence
terms is less common than odd number bases being used to obfuscate an IP address.

That said, not supporting them would do what?  Ignore them silently, put up a
message box?

S
Comment 3 Gervase Markham [:gerv] 2001-02-06 00:26:33 PST
This must break compliance with at least one RFC, surely? :-) I'm against.

Gerv
Comment 4 Doug Sheppard 2001-02-06 02:52:34 PST
Mozilla's current behaviour is clearly in violation of RFC2396, which specifies
that a host name MUST be either a domain name or an IPv4 dotted quad.
Comment 5 tenthumbs 2001-02-06 06:37:15 PST
RFC 2396 says in section 3.2.2:

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

and somewhat earlier:

      alphanum = alpha | digit

I believe all digits are legal.
Comment 6 Jerry Baker 2001-02-06 06:48:17 PST
Digits are legal in a host name, so just see if the host portion of a URI is a 
valid dotted quad, and if it's not then just send it through DNS like any other 
host name. That would cause all of the above URIs to fail just as 
http://987.474.264.712 would (unless one of those numbers happens to be a 
hostname on a LAN).
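
A rough Python sketch of the check proposed here: accept a host as an IPv4 literal only if it is a strict decimal dotted quad, and otherwise hand it to name resolution as an ordinary host name. The function name classify_host is hypothetical, and rejecting leading zeros is an extra assumption of this sketch, not something the comment spells out.

```python
import re

# Four groups of one to three ASCII digits separated by dots.
_QUAD = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")


def classify_host(host: str) -> str:
    """Return 'ipv4' for a strict decimal dotted quad, else 'hostname'."""
    match = _QUAD.fullmatch(host)
    if match and all(
        int(group) <= 255 and (group == "0" or not group.startswith("0"))
        for group in match.groups()
    ):
        return "ipv4"
    # Anything else would go through DNS like any other name.
    return "hostname"
```

With this rule, 3486011863 and 987.474.264.712 are both treated as plain host names, so they only work if something actually resolves them.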
Comment 7 Patrick Lam 2001-02-06 09:12:44 PST
Simple decimal URLs also can allow people to bypass blocked (blacklisted by
filtering software) addresses.  That might be considered a feature, not a bug.
Comment 8 Jerry Baker 2001-02-06 09:24:27 PST
That might be a beneficial side effect of Mozilla following obfuscated URLs, 
but I think that by and large it [following obfuscated URIs] is not helpful. It 
shouldn't be the job of Mozilla to ensure that it gives people a way to bypass 
filtering software. Its common use prevents many users from 
determining which domain, or IP subnet, a spam-vertized host is on. A lot of 
users are now clueful enough to send LARTs to abuse@domain.com, and a few to do 
a whois on the IP. I don't think there are any legitimate reasons to follow 
these types of URIs (although I disagree with the principle of censorware, it is 
technically not legit to bypass it on a system where the admin has installed 
it).
Comment 9 Doug Sheppard 2001-02-06 14:51:20 PST
All digits are legal but a hostname made up only of digits isn't.  Look at that
grammar more closely.

      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

hostname must contain a toplabel, and toplabel must begin with an alpha. If the
URI is all numbers and it's not decimal IPv4 dotted quad, it's illegal.
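
The grammar argument can be checked mechanically. Below is a sketch in Python that transcribes the quoted hostname rules into regular expressions. The IPv4address rule (four digit groups separated by dots) comes from RFC 2396 itself and is not quoted in this thread; the function name is_rfc2396_host is made up.

```python
import re

ALNUM = "[A-Za-z0-9]"
# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = rf"{ALNUM}(?:[A-Za-z0-9-]*{ALNUM})?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = rf"[A-Za-z](?:[A-Za-z0-9-]*{ALNUM})?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")
# IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPV4 = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def is_rfc2396_host(host: str) -> bool:
    """True if host matches the RFC 2396 'hostname' or 'IPv4address' rule."""
    return bool(HOSTNAME.fullmatch(host) or IPV4.fullmatch(host))
```

Note that the grammar alone does not limit the size of each digit group, so the zero-padded octal-looking quad from the description still matches IPv4address syntactically; only the dotless and hex forms fail outright.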
Comment 10 tenthumbs 2001-02-08 10:58:13 PST
OK, I'm usually wrong with respect to grammar.

Jerry: what do you mean by "just send it through DNS?" You really have
to use the OS equivalent of gethostbyname. On Linux, gethostbyname
resolves all-digit names.

It concerns me that the current behavior is pervasive. Perhaps there is
some obscure RFC that demands it. This needs investigation.
Comment 11 Matti Aarnio 2001-04-10 09:53:09 PDT
Parsing octal-obfuscated IP address literals is a "happy" accident of the way
inet_addr() is implemented in most systems -- it commonly uses strtol(),
which has an implicit rule that a leading '0' means octal, unless it is '0x',
which means hexadecimal...

A stricter parser inside inet_addr() (or its equivalent) would certainly
block those obfuscations.  In many implementations gethostbyname() also
calls inet_addr() if it can't resolve the input string via DNS lookup.

As to IPv6 -- See RFC 2732:  Format for Literal IPv6 Addresses in URL's
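
For readers who have not met the strtol() prefix rule, here is a small Python emulation of it. This is only a sketch of the base-detection behaviour described in this comment; parse_like_strtol is a hypothetical name, and unlike C's strtol() it rejects signs, whitespace and trailing garbage.

```python
def parse_like_strtol(text: str) -> int:
    """Parse an unsigned number using C's base-0 prefix rules.

    '0x' or '0X' prefix means hexadecimal, any other leading '0' means
    octal, and everything else is decimal.
    """
    if not (text.isascii() and text.isalnum()):
        raise ValueError(f"not a plain number: {text!r}")
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(lowered) > 1 and lowered.startswith("0"):
        return int(lowered, 8)
    return int(lowered, 10)
```

This is why a part written as 0317 or 0xCF ends up meaning the same octet as 207.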
Comment 12 benc 2001-05-23 12:40:03 PDT
mass move, v2.
qa to me.
Comment 13 benc 2001-05-24 10:49:26 PDT
+qawanted, mozilla1.0

This should happen only in some versions of Windows, per Sean's comments in bug
12748. If someone has time, can they verify this?

I would like this fixed. If someone wants to map 10 digit decimal numbers to IP
addresses, they need to register a domain and use that.
Comment 14 benc 2001-06-11 14:49:50 PDT
*** Bug 73597 has been marked as a duplicate of this bug. ***
Comment 15 Chase Tingley 2001-07-06 10:43:53 PDT
RFC 1738 seems to explicitly disallow this format for valid HTTP URLs.  RFC 1945
and RFC 2068 defer to this, although 2068 does note that "HTTP proxies may
receive requests for URIs not defined by RFC 1738."

Comment 16 benc 2001-10-04 16:31:50 PDT
CONFIRMED:
Linux, Mozilla 0.9.4

This only fails on Mac.

If you literalize the URL by putting a "." after the number, DNS does error.
Comment 17 benc 2002-04-26 12:20:06 PDT
+pp, -qawanted, ALL/ALL.
Comment 18 Jerry Baker 2002-04-26 17:49:40 PDT
Change summary to something more descriptive.
Comment 19 Jerry Baker 2002-05-27 14:39:54 PDT
Mass removing self from CC list.
Comment 20 Jerry Baker 2002-05-27 15:06:12 PDT
Now I feel dumb because I have to add back. Sorry for the spam.
Comment 21 benc 2002-07-01 19:19:46 PDT
Chimera accepts this (Chimera 0.3).
Comment 22 Jesse Ruderman 2002-07-18 11:04:18 PDT
I see decimal IP addresses in links every once in a while, most recently on
http://www.berrypatch.org/pictures.html.  Fixing this bug in order to comply
with RFCs would break sites.  Does the RFC say that user agents should/must
reject addresses that don't match the RFC definition?  Is there any benefit to
fixing this bug other than RFC compliance?  Would canonicalizing the decimal
address to xxx.xxx.xxx.xxx form (for the benefit of filtering software) be a
reasonable compromise?
Comment 23 Jerry Baker 2002-07-18 11:13:31 PDT
The benefit is disallowing the obfuscation of URLs for nefarious purposes.
There is no reason to hide a host's identity whatsoever. The spec states that
the name SHOULD be checked before being sent to DNS.
Comment 24 Joseph Elwell 2002-09-21 19:11:19 PDT
I don't see how a simple decimal number is any more obfuscated than a 4 octet
style number. And filing bugs with the specific intent of breaking links
because spammers use them is a futile way to fight spam.
Comment 25 Jerry Baker 2002-09-21 19:20:36 PDT
It's not a way to fight spam. It's a way to remove one more little trick in the
spammer's toolbox, AND get Mozilla to adhere to the standards that it should be
complying with anyway.
Comment 26 benc 2002-09-22 20:40:30 PDT
Joe:

Most humans are using dotted quad addressing for a reason: it is relatively
human readable. And almost all interfaces accept this format: OS configs, web
sites, even ARIN. Why should vendors and web sites start bolting on more code to
support a decimal to dotted quad and 32bit unsigned int just because a couple of
system APIs are too liberal?
Comment 27 John G. Myers 2002-09-22 21:57:26 PDT
Test case links do not work on MacOS 10.2.  They do work on Windows 2000 using
Mozilla 1.2a.  Could someone test other platforms?
Comment 28 benc 2002-09-23 09:33:38 PDT
This is a testcase that is regularly checked.

http://www.mozilla.org/quality/networking/testing/coretests.html

Basically, Chimera allows this addressing format as well. I think only Mozilla
on Mac OS X ignores it (this bug blocks bug 150966 for chimera).
Comment 29 Doug Turner (:dougt) 2002-10-01 13:02:58 PDT
moving neeti's futured bugs for triaging.
Comment 30 Brant Gurganus 2002-10-13 11:30:08 PDT
[RFE] is deprecated in favor of severity: enhancement.  They have the same meaning.
Comment 31 benc 2003-02-12 10:13:13 PST
Mozilla 1.3b for Mac OS X accepts this, so it is a characteristic of mach-o
Comment 32 benc 2003-02-20 08:46:21 PST
-mozilla 1.0: long gone
-pp: now that mac cfm is gone, all plats do this.
Comment 33 benc 2003-03-19 12:07:22 PST
I added the word "dotless" because MS describes it using that term.

Here's something interesting to think about. 

http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/MS01-055.asp

I'm new to cookies, so I'm trying to figure out if this matters to us. If anyone
can think of a reason this bug would intersect badly w/ cookies, please open a
bug in cookies:

* The third vulnerability is a new variant of a vulnerability discussed in
Microsoft Security Bulletin MS01-051 affecting how IE handles URLs that include
dotless IP addresses. If a web site were specified using a dotless IP format
(e.g., http://031713501415 rather than http://207.46.131.13), and the request
were malformed in a particular way, IE would not recognize that the site was an
Internet site. Instead, it would treat the site as an intranet site, and open
pages on the site in the Intranet Zone rather than the correct zone. This would
allow the site to run with fewer security restrictions than appropriate. This
vulnerability does not affect IE 6. 
Comment 34 benc 2003-03-27 14:54:20 PST
*** Bug 150966 has been marked as a duplicate of this bug. ***
Comment 35 Simon Fraser 2003-03-27 15:34:20 PST
Why has this bug languished so long?
Comment 36 benc 2005-03-18 02:29:22 PST
I've written and submitted javascript functions that do strict IPv4 and DNS FQDN
validation. If you hooked them into the URL parser, you could reject a lot of
this stuff. You can see the work in bug 273097 and bug 268893.

This would be pretty controversial, and would need to be a pref. It would also
need to be modernized to include IPv6 and IDN; I'm focused on a certain level of
base functionality.

Comment 37 Jerry Baker 2005-03-18 10:06:04 PST
Controversial to spammers and virus writers. Can you attach the code, or email
it? I imagine it shouldn't be too hard to use it to validate the URL prior to
submitting it from the location bar to Necko, but I don't know without tinkering.
Comment 38 benc 2005-03-22 20:43:30 PST
Jerry: see bug 268619 and bug 268893.

I think I posted the test harness so there is a file where you can try out any
values you want.
Comment 39 Jesse Ruderman 2006-10-28 01:42:45 PDT
*** Bug 358447 has been marked as a duplicate of this bug. ***
Comment 40 Bruce Ide 2008-02-22 13:13:44 PST
This bug reflects a fundamental misunderstanding of what an IP address is. An IP address is a long int. That's it. One big number. The dot-quad version of the long int improves readability, but the long int is in fact a valid form of a valid IP address.

One might argue that a large number is harder to remember than a dot-quad version of an IP. To someone making such an argument I would inquire if they know their phone number. Because my system's IP address (1079075330) is no harder to remember than a phone number.

"Fixing" this "bug" would cause the browser to behave differently than every other TCP/IP using utility on the system, including ones that fetch web pages (Curl et al.). I would consider Firefox not retrieving an address in this format to be a bug.

Speaking of which, my OSX version of Firefox does not retrieve an address in this format, and I consider that a bug.
Comment 41 Jerry Baker 2008-02-22 15:43:38 PST
(In reply to comment #40)
> This bug reflects a fundamental misunderstanding of what an IP address is.

No, it doesn't. There is a format to IP addresses called a dotted quad. Your argument is that since the dotted quad is just a representation of a hexadecimal number, that any representation of the IP address which can be converted to that hexadecimal number should be allowed at the user interface level. Should a word processor allow you to type in hexadecimal, or octal, or even binary? They don't. Why not? It's the same thing. Why don't telephones allow you to dial in octal? The fact is that there are both formal definitions, and conventions. By following numerical addresses not in dotted quad form, Firefox is violating the convention of representing IP addresses as a dotted quad. The brokenness of other products is not a convincing argument for maintaining the brokenness of Firefox.

What functionality are you losing by not being able to follow decimal representations of hexadecimal addresses? Is there some reason you cannot use the dotted quad format?
Comment 42 Bruce Ide 2008-02-25 08:39:40 PST
I'm saying that the number IS a valid address and deliberately breaking that functionality breaks an ad-hoc convention that is used by every other TCP/IP client program that I've tested it on, with several different flavors of UNIX. You are intercepting and subverting an underlying capability of the system standard library.

I mostly use the capability to demonstrate to programmers new to TCP/IP that an address IS just a number. Being able to browse to a number in that format drives that point home quite effectively.
Comment 43 Chase Tingley 2008-02-26 18:52:36 PST
Pedagogy is a weak justification.  (I was taught C using gets(), and it was one of the most insecure functions ever devised.)

The RFCs are fairly clear on the correct behavior.  If there's a reason to ignore them, it's what Jesse said above -- that some people actually use this stuff for legitimate things.
Comment 44 Jerry Baker 2008-02-26 21:24:36 PST
(In reply to comment #43)
> The RFCs are fairly clear on the correct behavior.  If there's a reason to
> ignore them, it's what Jesse said above -- that some people actually use this
> stuff for legitimate things.

My position is that you need to do a cost/benefit analysis. What are the costs and benefits associated with each course of action?

1. The cost of leaving it as-is: Spammers and phishers are able to pile on another layer of obfuscation to their sites, making it more difficult for features like Google's anti-phishing or Thunderbird's phishing detection.

2. The cost of fixing it: Some may not be able to use Firefox to demonstrate that an IP address is really just a number that can be represented as a decimal number, or a decimal representation of a DWORD.

The judgment of which cost is the greater evil depends on your opinion of the severity of each.
Comment 45 Jesse Ruderman 2008-02-27 00:34:07 PST
Another cost of leaving it as-is: it makes Firefox appear to differ between operating systems (iirc)

Another cost of fixing it: some sites will break.  (Perhaps markp could tell us how many.)
Comment 46 Aakash Desai [:aakashd] 2009-06-02 13:53:11 PDT
*** Bug 430273 has been marked as a duplicate of this bug. ***
Comment 47 Jesse Ruderman 2010-04-14 16:34:18 PDT
*** Bug 554596 has been marked as a duplicate of this bug. ***
Comment 48 Jesse Ruderman 2010-04-14 16:42:09 PDT
Since these IP address formats already don't work in Firefox on some operating systems, I would not expect many sites to break if we were to drop support for them entirely.
Comment 49 Neil Harris 2010-04-14 16:58:42 PDT
I agree with Jesse's comment above. These address formats serve no useful purpose any more, and they interact nastily with external security measures.

I added this comment to bug 554596, which might help clarify some of the historical issues here:

http://tools.ietf.org/html/draft-main-ipaddr-text-rep-00 -- see section 2.1.1,
"Early Practice", which explains how the 4.2BSD inet_aton() became the de-facto
standard for IPv4 address interpretation, and that compatibility with this
lingers to this day. It concludes:


   The 4.2BSD inet_aton() has been widely copied and imitated, and so is
   a de facto standard for the textual representation of IPv4 addresses.
   Nevertheless, these alternative syntaxes have now fallen out of use
   (if they ever had significant use).  The only practical use that they
   now see is for deliberate obfuscation of addresses: giving an IPv4
   address as a single 32-bit decimal number is favoured among people
   wishing to conceal the true location that is encoded in a URL.  All
   the forms except for decimal octets are seen as non-standard (despite
   being quite widely interoperable) and undesirable.

http://www.pc-help.org/obscure.htm contains a number of different examples of
IP address obfuscation techniques, including uses of the numeric overflows
described above.
Comment 50 Neil Harris 2010-04-14 17:04:19 PDT
Also rescued from the comments there, here's another little-known format:

Various implementations of inet_aton() have exciting semi-documented features such as two- and three-part dotted numerical addresses, for example:

 a.b -- 8.24 bits -- example: http://0x42.0x660d63 
 a.b.c -- 8.8.16 bits -- example: http://0x42.0x66.0x0d63

See http://www.securelist.com/en/blog/148/New_Brazilian_banking_Trojans_recycle_old_URL_obfuscation_tricks for the original test cases.
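
A sketch of how the two- and three-part layouts above can be decoded, written in Python. It assumes the classic rule that every part except the last fills one byte and the last part fills whatever bytes remain. The names lenient_ipv4 and _num are made up, and real inet_aton() implementations may differ in how they handle out-of-range parts.

```python
def _num(part: str) -> int:
    """Read one part with C-style prefixes: 0x hex, leading 0 octal."""
    part = part.lower()
    if part.startswith("0x"):
        return int(part[2:], 16)
    return int(part, 8 if len(part) > 1 and part[0] == "0" else 10)


def lenient_ipv4(text: str) -> str:
    """Decode a 1- to 4-part numeric address into a dotted quad."""
    nums = [_num(part) for part in text.split(".")]
    if not 1 <= len(nums) <= 4:
        raise ValueError("expected 1 to 4 parts")
    *head, last = nums
    last_bits = 8 * (4 - len(head))  # the final part fills the remaining bytes
    if any(not 0 <= n <= 0xFF for n in head) or not 0 <= last < (1 << last_bits):
        raise ValueError("part out of range")
    value = last
    for index, n in enumerate(head):
        value |= n << (24 - 8 * index)
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(lenient_ipv4("0x42.0x660d63"))
print(lenient_ipv4("0x42.0x66.0x0d63"))
```

Both example hosts above decode to the same four octets under this rule.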
Comment 51 Bruce Ide 2011-10-25 09:57:23 PDT
Ooh looks like Vint Cerf agrees with me! He's the father of the Internet, you know? ;-P

http://interviews.slashdot.org/story/11/10/25/1532213/vint-cerf-answers-your-questions-about-ipv6-and-more

VC: LOL! actually, most of us assumed that any way to generate the 32 number should be acceptable since the connection process doesn't actually use the text representation of the IP address. I think any value in the range 0 to 2^32-1 should be acceptable as an IP reference. As to stateless operation, I know what you mean; you have to get used to figuring out how to stash intermediate state (cookies usually)...
Comment 52 Tristan Miller 2011-10-26 01:16:46 PDT
Bruce, you're conflating IPs and URIs here.  The browser's location bar takes a URI, not an IP.  The RFC for the URI specifies that the host part may be specified by name or by IP, but prescribes a certain format for the IP.  Your question to Vint Cerf conveniently neglected to mention this distinction, and you can't infer from his answer that he actually read this bug report to find out what the issue really was.
Comment 53 Gervase Markham [:gerv] 2014-03-11 03:11:49 PDT
13 years later, I rescind comment #3.

See http://blogs.msdn.com/b/ieinternals/archive/2014/03/06/browser-arcana-ipv4-ipv6-literal-urls-dotted-va-dotless.aspx . We are sending these unusual formats over the wire, in a header ("Host") which can be used by some for security-related decisions. This is a hostage to fortune; we should certainly stop doing that.

Gerv
Comment 54 Patrick McManus [:mcmanus] 2014-03-11 05:11:38 PDT
to be clear the suggestion from the blog in comment 53 is that we translate these addresses into dotted decimal notation, not that we block them. I like that.

"OS; one of the first steps that class undertakes when constructing a URL object from a string is to convert any IP literal hostname into its canonical dotted-quad form. Chrome and Opera appear to match Internet Explorer’s behavior here, while Firefox 27 leaves the undotted decimal in the address bar and in the request sent to the network:"
Comment 55 :Gijs Kruitbosch 2014-09-08 10:01:39 PDT
*** Bug 1063010 has been marked as a duplicate of this bug. ***
Comment 56 :Gijs Kruitbosch 2014-09-08 10:06:39 PDT
(In reply to Patrick McManus [:mcmanus] from comment #54)
> to be clear the suggestion from the blog in comment 53 is that we translate
> these addresses into dotted decimal notation, not that we block them. I like
> that.

And this suggestion was copied by bug 1063010.

At what level do we want to fix this? Note that other browsers even display the corrected address in e.g. tooltips (for in-page links), the URL bar, etc. etc.

It's 6pm on a Monday and so if having a default-to-on pref for this is what appeases the people who insist on having http://2130706433/ work, I can live with doing that, too.
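
As an illustration of the "translate, don't block" approach discussed in the last few comments, here is a minimal Python sketch that rewrites a dotless decimal host into dotted-quad form and leaves every other URL untouched. The function name canonicalize_dotless is hypothetical, only the plain decimal form is handled (hosts with a leading zero are left alone because lenient parsers may read them as octal), and this is not how any browser actually implements it.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def canonicalize_dotless(url: str) -> str:
    """Rewrite a dotless decimal IPv4 host as a dotted quad."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not (host.isascii() and host.isdigit()):
        return url
    if host != "0" and host.startswith("0"):
        return url  # possibly octal; out of scope for this sketch
    value = int(host)
    if value > 0xFFFFFFFF:
        return url
    quad = str(ipaddress.IPv4Address(value))
    # Keep any userinfo and port exactly as written; swap only the host.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + quad + hostport[len(host):]
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(canonicalize_dotless("http://2130706433/"))
```

Doing the rewrite at URL-parsing time would mean the address bar, link tooltips and the Host header all see the same canonical form.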
Comment 57 Florian Bender 2014-11-22 08:31:55 PST
Who's gonna drive this?
Comment 58 Kevin Brosnan [:kbrosnan] 2015-04-22 12:19:00 PDT
*** Bug 1157388 has been marked as a duplicate of this bug. ***
Comment 59 Patrick McManus [:mcmanus] 2015-12-14 14:30:17 PST
*** Bug 160819 has been marked as a duplicate of this bug. ***
