Closed Bug 67730 Opened 23 years ago Closed 5 years ago

Obfuscated "dotless" IP (single large decimal or hexed) addresses shouldn't work

Categories

(Core :: Networking, enhancement, P3)

RESOLVED DUPLICATE of bug 1381139

People

(Reporter: mozilla, Unassigned)

Details

(Keywords: sec-low, testcase, Whiteboard: [sg:low] bypass external filters that are unfamiliar with these formats [necko-backlog])

We have all seen the spam that comes with URLs obfuscated by changing an IP
address from its 4-octet style to a simple decimal number (like
http://3486011863 instead of http://www.mozilla.org), or in octal (like
http://00000000317.00000000310.00000000121.00000000327/), or in base 256
notation (like http://4294967503.4294967496.4294967377.4294967511/).
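
For reference, the three example URLs above all appear to encode the same address. A minimal Python sketch of the arithmetic follows, assuming a lenient parser that keeps only the low 8 bits of each dotted part (which is how the third example seems to be built). The function names are illustrative and not taken from any Mozilla code.

```python
def dotless_to_quad(value: int) -> str:
    """Split a 32-bit integer into four dotted-decimal octets."""
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(o) for o in octets)


def lenient_quad(host: str, base: int = 10) -> str:
    """Read four dot-separated parts in `base`, keeping the low 8 bits of each.

    Assumption: oversized parts are silently truncated, as some lenient
    parsers do; a strict parser would reject them instead.
    """
    parts = [int(p, base) & 0xFF for p in host.split(".")]
    return ".".join(str(p) for p in parts)


print(dotless_to_quad(3486011863))
print(lenient_quad("00000000317.00000000310.00000000121.00000000327", base=8))
print(lenient_quad("4294967503.4294967496.4294967377.4294967511"))
```

Each call prints 207.200.81.215, so a filter that only understands plain dotted quads would need to normalize all three spellings before comparing them.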

Why Netscape and Internet Explorer follow these types of URLs is beyond me. It
would go a long way in the battle against spammers to simply have Mozilla check
whether a URL's host is either a decimal number in quad notation or a fully
qualified domain name before it follows it.
I don't think we should block these urls.  It's much easier for me to remember 
2259499800 than it is for me to remember 134.173.59.24, and it isn't much more 
obfuscated.

Converting anything that looks like an IP address to the canonical 
xxx.xxx.xxx.xxx when following a link might make sense, but it could break 
things, and I don't think there would be a great benefit because the 
xxx.xxx.xxx.xxx form doesn't mean anything to most users either.
Remembering a 10-digit number more easily than a set of 4 numbers seems to fly
in the face of everything I know about human memory, and it is probably less
common than the use of odd number bases to obfuscate an IP address.

That said, not supporting them would do what?  Ignore them silently, put up a
message box?

S
This must break compliance with at least one RFC, surely? :-) I'm against.

Gerv
Mozilla's current behaviour is clearly in violation of RFC2396, which specifies
that a host name MUST be either a domain name or an IPv4 dotted quad.
RFC 2396 says in section 3.2.2:

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

and somewhat earlier:

      alphanum = alpha | digit

I believe all digits are legal.
Digits are legal in a host name, so just see if the host portion of a URI is a 
valid dotted quad, and if it's not then just send it through DNS like any other 
host name. That would cause all of the above URIs to fail just as
http://987.474.264.712 would (unless one of those numbers happens to be a 
hostname on a LAN).
Simple decimal URLs also can allow people to bypass blocked (blacklisted by
filtering software) addresses.  That might be considered a feature, not a bug.
That might be a beneficial side effect of Mozilla following obfuscated URLs,
but I think that by and large it [following obfuscated URIs] is not helpful. It
shouldn't be the job of Mozilla to ensure that it gives people a way to bypass
filtering software. What it is commonly used for prevents many users from
determining which domain, or IP subnet, a spam-vertized host is on. A lot of
users are now clueful enough to send LARTs to abuse@domain.com, and a few to do
a whois on the IP. I don't think there are any legitimate reasons to follow
these types of URIs (although I disagree with the principle of censorware, it is
technically not legit to bypass it on systems where the admin has installed
it).
All digits are legal but a hostname made up only of digits isn't.  Look at that
grammar more closely.

      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

hostname must contain a toplabel, and toplabel must begin with an alpha. If the
URI is all numbers and it's not a decimal IPv4 dotted quad, it's illegal.
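
The grammar argument above can be checked mechanically. The following is a rough Python sketch that transcribes the quoted RFC 2396 productions into regular expressions. The IPv4address rule (four groups of one or more digits) comes from the same section of the RFC, and classify_host is a made-up name for illustration.

```python
import re

# Productions from RFC 2396 section 3.2.2, written as regex fragments.
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")
# IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPV4ADDRESS = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_host(host: str) -> str:
    """Return which RFC 2396 host production matches, or 'invalid'."""
    if IPV4ADDRESS.fullmatch(host):
        return "IPv4address"
    if HOSTNAME.fullmatch(host):
        return "hostname"
    return "invalid"


print(classify_host("www.mozilla.org"))
print(classify_host("3486011863"))
print(classify_host("00000000317.00000000310.00000000121.00000000327"))
```

With these patterns the dotless decimal form is classified as invalid, because its only label does not start with a letter. The zero-padded four-part form still matches IPv4address, since the grammar puts no limit on the number of digits per group, so a range or leading-zero check would be needed on top of the grammar to catch it.
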
OK, I'm usually wrong with respect to grammar.

Jerry: what do you mean by "just send it through DNS?" You really have
to use the OS equivalent of gethostbyname. On Linux, gethostbyname
resolves all-digit names.

It concerns me that the current behavior is pervasive. Perhaps there is
some obscure RFC that demands it. This needs investigation.
Target Milestone: --- → Future
Parsing octal-obfuscated IP address literals is a "happy" accident of the way
inet_addr() is implemented on most systems -- it commonly uses strtol(),
which has the implicit rule that a leading '0' means octal and a leading '0x'
means hexadecimal...

Having a stricter parser inside inet_addr() (or its equivalent) would certainly
block those obfuscations. In many implementations gethostbyname() also calls
inet_addr() if it can't resolve the input string via DNS lookup.
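
To illustrate the difference, here is a small Python sketch. c_style_number roughly imitates the base auto-detection of strtol() with base 0, and is_strict_dotted_quad shows what a stricter parser might accept. Both names are hypothetical, and the exact rules (for example, rejecting leading zeros) are an assumption rather than what any particular libc does.

```python
def c_style_number(text: str) -> int:
    """Roughly imitate strtol(text, NULL, 0): 0x means hex, leading 0 means octal."""
    lowered = text.lower()
    if lowered.startswith("0x"):
        return int(lowered[2:], 16)
    if len(lowered) > 1 and lowered.startswith("0"):
        return int(lowered, 8)
    return int(lowered, 10)


def is_strict_dotted_quad(host: str) -> bool:
    """Accept only four decimal parts in 0..255 with no leading zeros."""
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            return False  # rejects hex prefixes, signs, and empty parts
        if len(part) > 1 and part[0] == "0":
            return False  # rejects octal-looking parts such as 0317
        if int(part) > 255:
            return False
    return True


print([c_style_number(p) for p in "0317.0310.0121.0327".split(".")])
print(is_strict_dotted_quad("0317.0310.0121.0327"))
print(is_strict_dotted_quad("207.200.81.215"))
```

The lenient reader turns 0317.0310.0121.0327 into the octets 207, 200, 81 and 215, while the strict check refuses that spelling and accepts only the plain decimal one.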

As to IPv6 -- See RFC 2732:  Format for Literal IPv6 Addresses in URL's
mass move, v2.
qa to me.
QA Contact: tever → benc
+qawanted, mozilla1.0

This should happen only in some versions of Windows, per Sean's comments in bug
12748. If someone has time, can they verify this?

I would like this fixed. If someone wants to map 10 digit decimal numbers to IP
addresses, they need to register a domain and use that.
Keywords: mozilla1.0, qawanted
Summary: [RFE] Mozilla should not follow obfuscated URL's → [RFE] IP addresses should not work if they are in decimal
*** Bug 73597 has been marked as a duplicate of this bug. ***
RFC 1738 seems to explicitly disallow this format for valid HTTP URLs.  RFC 1945
and RFC 2068 defer to this, although 2068 does note that "HTTP proxies may
receive requests for URIs not defined by RFC 1738."

CONFIRMED:
Linux, Mozilla 0.9.4

This only fails on Mac.

If you literalize the URL by putting a "." after the number, DNS does error.
+pp, -qawanted, ALL/ALL.
Keywords: qawantedpp
OS: Windows 2000 → All
Hardware: PC → All
Change summary to something more descriptive.
Summary: [RFE] IP addresses should not work if they are in decimal → [RFE] Obfuscated IP addresses shouldn't work
Mass removing self from CC list.
Now I feel dumb because I have to add back. Sorry for the spam.
Chimera accepts this (Chimera 0.3).
Summary: [RFE] Obfuscated IP addresses shouldn't work → [RFE] Obfuscated IP (single large decimal or hexed) addresses shouldn't work
Blocks: 150966
I see decimal IP addresses in links every once in a while, most recently on
http://www.berrypatch.org/pictures.html.  Fixing this bug in order to comply
with RFCs would break sites.  Does the RFC say that user agents should/must
reject addresses that don't match the RFC definition?  Is there any benefit to
fixing this bug other than RFC compliance?  Would canonicalizing the decimal
address to xxx.xxx.xxx.xxx form (for the benefit of filtering software) be a
reasonable compromise?
The benefit is disallowing the obfuscation of URL's for nefarious purposes.
There is no reason to hide a host's identity whatsoever. The spec states that
the name SHOULD be checked before being sent to DNS.
I don't see how a simple decimal number is any more obfuscated than a 4-octet
style number. And filing bugs with the specific intent of breaking links
because spammers use them is a futile way to fight spam.
It's not a way to fight spam. It's a way to remove one more little trick in the
spammer's toolbox, AND get Mozilla to adhere to the standards that it should be
complying with anyway.
Joe:

Most humans are using dotted quad addressing for a reason, it is relatively
human readable. And almost all interfaces accept this format, OS configs, web
sites, even ARIN. Why should vendors and web sites start bolting on more code to
support a decimal to dotted quad and 32bit unsigned int just because a couple
system API's are too liberal?
Test case links do not work on MacOS 10.2.  They do work on Windows 2000 using
Mozilla 1.2a.  Could someone test other platforms?
This is a testcase that is regularly checked.

http://www.mozilla.org/quality/networking/testing/coretests.html

Basically, Chimera allows this addressing format as well. I think only Mozilla
on Mac OS X ignores it (this bug blocks bug 150966 for chimera).
moving neeti's futured bugs for triaging.
Assignee: neeti → new-network-bugs
[RFE] is deprecated in favor of severity: enhancement.  They have the same meaning.
Severity: normal → enhancement
Summary: [RFE] Obfuscated IP (single large decimal or hexed) addresses shouldn't work → Obfuscated IP (single large decimal or hexed) addresses shouldn't work
Mozilla 1.3b for Mac OS X accepts this, so it is a characteristic of mach-o
No longer blocks: 150966
-mozilla 1.0: long gone
-pp: now that mac cfm is gone, all plats do this.
Keywords: mozilla1.0, pp
Keywords: testcase
I added the word "dotless" because MS describes it using that term.

Here's something interesting to think about. 

http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/MS01-055.asp

I'm new to cookies, so I'm trying to figure out if this matters to us. If anyone
can think of a reason this bug would intersect badly w/ cookies, please open a
bug in cookies:

* The third vulnerability is a new variant of a vulnerability discussed in
Microsoft Security Bulletin MS01-051 affecting how IE handles URLs that include
dotless IP addresses. If a web site were specified using a dotless IP format
(e.g., http://031713501415 rather than http://207.46.131.13), and the request
were malformed in a particular way, IE would not recognize that the site was an
Internet site. Instead, it would treat the site as an intranet site, and open
pages on the site in the Intranet Zone rather than the correct zone. This would
allow the site to run with fewer security restrictions than appropriate. This
vulnerability does not affect IE 6. 
Summary: Obfuscated IP (single large decimal or hexed) addresses shouldn't work → Obfuscated "dotless" IP (single large decimal or hexed) addresses shouldn't work
*** Bug 150966 has been marked as a duplicate of this bug. ***
Why has this bug languished so long?
I've written and submitted javascript functions that do strict IPv4 and DNS FQDN
validation. If you hooked them into the URL parser, you could reject a lot of
this stuff. You can see the work in bug 273097 and bug 268893.

This would be pretty controversial, and would need to be a pref. It would also
need to be modernized to include IPv6 and IDN, I'm focused on a certain level of
base functionality.

Controversial to spammers and virus writers. Can you attach the code, or email
it? I imagine it shouldn't be too hard to use it to validate the URL prior to
submitting it from the location bar to Necko, but I don't know without tinkering.
Jerry: see bug 268619 and bug 268893.

I think I posted the test harness so there is a file where you can try out any
values you want.
*** Bug 358447 has been marked as a duplicate of this bug. ***
This bug reflects a fundamental misunderstanding of what an IP address is. An IP address is a long int. That's it. One big number. The dot-quad version of the long int improves readability, but the long int is in fact a valid form of a valid IP address.

One might argue that a large number is harder to remember than a dot-quad version of an IP. To someone making such an argument I would inquire if they know their phone number. Because my system's IP address (1079075330) is no harder to remember than a phone number.

"Fixing" this "bug" would cause the browser to behave differently than every other TCP/IP using utility on the system, including ones that fetch web pages (Curl et al.) I would consider firefox not retrieving an address in this format to be a bug.

Speaking of which, my OSX version of Firefox does not retrieve an address in this format, and I consider that a bug.
(In reply to comment #40)
> This bug reflects a fundamental misunderstanding of what an IP address is.

No, it doesn't. There is a format to IP addresses called a dotted quad. Your argument is that since the dotted quad is just a representation of a hexadecimal number, that any representation of the IP address which can be converted to that hexadecimal number should be allowed at the user interface level. Should a word processor allow you to type in hexadecimal, or octal, or even binary? They don't. Why not? It's the same thing. Why don't telephones allow you to dial in octal? The fact is that there are both formal definitions, and conventions. By following numerical addresses not in dotted quad form, Firefox is violating the convention of representing IP addresses as a dotted quad. The brokenness of other products is not a convincing argument for maintaining the brokenness of Firefox.

What functionality are you losing by not being able to follow decimal representations of hexadecimal addresses? Is there some reason you cannot use the dotted quad format?
I'm saying that the number IS a valid address and deliberately breaking that functionality breaks an ad-hoc convention that is used by every other TCP/IP client program that I've tested it on, with several different flavors of UNIX. You are intercepting and subverting an underlying capability of the system standard library.

I mostly use the capability to demonstrate to programmers new to TCP/IP that an address IS just a number. Being able to browse to a number in that format drives that point home quite effectively.
Pedagogy is a weak justification.  (I was taught C using gets(), and it was one of the most insecure functions ever devised.)

The RFCs are fairly clear on the correct behavior.  If there's a reason to ignore them, it's what Jesse said above -- that some people actually use this stuff for legitimate things.
(In reply to comment #43)
> The RFCs are fairly clear on the correct behavior.  If there's a reason to
> ignore them, it's what Jesse said above -- that some people actually use this
> stuff for legitimate things.

My position is that you need to do a cost/benefit analysis. What are the costs and benefits associated with each course of action?

1. The cost of leaving it as-is: Spammers and phishers are able to pile on another layer of obfuscation to their sites, making it more difficult for features like Google's anti-phishing or Thunderbird's phishing detection.

2. The cost of fixing it: Some may not be able to use Firefox to demonstrate that an IP address is really just a number that can be represented as a decimal number, or a decimal representation of a DWORD.

The judgment of which cost is the greater evil depends on your opinion of the severity of each.
Another cost of leaving it as-is: it makes Firefox appear to differ between operating systems (iirc)

Another cost of fixing it: some sites will break.  (Perhaps markp could tell us how many.)
Depends on: 430273
Since these IP address formats already don't work in Firefox on some operating systems, I would not expect many sites to break if we were to drop support for them entirely.
Whiteboard: [sg:low] bypass external filters that are unfamiliar with these formats
I agree with Jesse's comment above. These address formats serve no useful purpose any more, and they interact nastily with external security measures.

I added this comment to bug 554596, which might help clarify some of the historical issues here:

http://tools.ietf.org/html/draft-main-ipaddr-text-rep-00 -- see section 2.1.1,
"Early Practice", which explains how the 4.2BSD inet_aton() became the de-facto
standard for IPv4 address interpretation, and that compatibility with this
lingers to this day. It concludes:


   The 4.2BSD inet_aton() has been widely copied and imitated, and so is
   a de facto standard for the textual representation of IPv4 addresses.
   Nevertheless, these alternative syntaxes have now fallen out of use
   (if they ever had significant use).  The only practical use that they
   now see is for deliberate obfuscation of addresses: giving an IPv4
   address as a single 32-bit decimal number is favoured among people
   wishing to conceal the true location that is encoded in a URL.  All
   the forms except for decimal octets are seen as non-standard (despite
   being quite widely interoperable) and undesirable.

http://www.pc-help.org/obscure.htm contains a number of different examples of
IP address obfuscation techniques, including uses of the numeric overflows
described above.
Also rescued from the comments there, here's another little-known format:

Various implementations of inet_aton() have exciting semi-documented features such as two- and three-part dotted numerical addresses, for example:

 a.b -- 8.24 bits -- example: http://0x42.0x660d63 
 a.b.c -- 8.8.16 bits -- example: http://0x42.0x66.0x0d63
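
A rough Python sketch of how such short forms expand, assuming the traditional rule that every part except the last fills one octet and the last part fills whatever bits remain. lenient_ipv4 and parse_part are illustrative names, not functions from any real library.

```python
def parse_part(text: str) -> int:
    """Read one part with C-style base prefixes (0x hex, leading 0 octal)."""
    if text.lower().startswith("0x"):
        return int(text[2:], 16)
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


def lenient_ipv4(host: str) -> str:
    """Expand a 1- to 4-part numeric host into a dotted quad."""
    parts = [parse_part(p) for p in host.split(".")]
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 parts")
    *leading, last = parts
    last_bits = 8 * (4 - len(leading))  # 32, 24, 16 or 8 bits for the last part
    if any(not 0 <= p <= 0xFF for p in leading) or not 0 <= last < (1 << last_bits):
        raise ValueError("part out of range")
    value = last
    for index, part in enumerate(leading):
        value |= part << (24 - 8 * index)
    return ".".join(str((value >> s) & 0xFF) for s in (24, 16, 8, 0))


print(lenient_ipv4("0x42.0x660d63"))
print(lenient_ipv4("0x42.0x66.0x0d63"))
```

Both example hosts expand to 66.102.13.99 under this rule, which is why a filter that matches only on the four-part spelling can miss them.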

See http://www.securelist.com/en/blog/148/New_Brazilian_banking_Trojans_recycle_old_URL_obfuscation_tricks for the original test cases.
Assignee: general → nobody
QA Contact: benc → networking
Target Milestone: Future → ---
Ooh looks like Vint Cerf agrees with me! He's the father of the Internet, you know? ;-P

http://interviews.slashdot.org/story/11/10/25/1532213/vint-cerf-answers-your-questions-about-ipv6-and-more

VC: LOL! actually, most of us assumed that any way to generate the 32 number should be acceptable since the connection process doesn't actually use the text representation of the IP address. I think any value in the range 0 to 2^32-1 should be acceptable as an IP reference. As to stateless operation, I know what you mean; you have to get used to figuring out how to stash intermediate state (cookies usually)...
Bruce, you're conflating IPs and URIs here.  The browser's location bar takes a URI, not an IP.  The RFC for the URI specifies that the host part may be specified by name or by IP, but prescribes a certain format for the IP.  Your question to Vint Cerf conveniently neglected to mention this distinction, and you can't infer from his answer that he actually read this bug report to find out what the issue really was.
13 years later, I rescind comment #3.

See http://blogs.msdn.com/b/ieinternals/archive/2014/03/06/browser-arcana-ipv4-ipv6-literal-urls-dotted-va-dotless.aspx . We are sending these unusual formats over the wire, in a header ("Host") which can be used by some for security-related decisions. This is a hostage to fortune; we should certainly stop doing that.

Gerv
to be clear the suggestion from the blog in comment 53 is that we translate these addresses into dotted decimal notation, not that we block them. I like that.

"OS; one of the first steps that class undertakes when constructing a URL object from a string is to convert any IP literal hostname into its canonical dotted-quad form. Chrome and Opera appear to match Internet Explorer’s behavior here, while Firefox 27 leaves the undotted decimal in the address bar and in the request sent to the network2:"
(In reply to Patrick McManus [:mcmanus] from comment #54)
> to be clear the suggestion from the blog in comment 53 is that we translate
> these addresses into dotted decimal notation, not that we block them. I like
> that.

And this suggestion was copied by bug 1063010.

At what level do we want to fix this? Note that other browsers even display the corrected address in e.g. tooltips (for in-page links), the URL bar, etc. etc.

It's 6pm on a Monday, so if having a default-to-on pref for this is what appeases the people who insist on having http://2130706433/ work, I can live with doing that, too.
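
As an illustration of the translate-rather-than-block approach discussed above, here is a minimal Python sketch that rewrites a dotless decimal host into dotted-quad form before the URL would be displayed or sent. It handles only the single-decimal case and says nothing about how Necko or any other browser actually implements this; canonicalize_dotless_host is a hypothetical name.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit


def canonicalize_dotless_host(url: str) -> str:
    """Rewrite a dotless decimal host as a dotted quad; leave other URLs alone."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not (host.isascii() and host.isdigit()) or int(host) > 0xFFFFFFFF:
        return url  # not a 32-bit decimal host, so nothing to do
    dotted = str(ipaddress.IPv4Address(int(host)))
    # Keep any "user:pass@" prefix and explicit port exactly as given.
    userinfo, at, _ = parts.netloc.rpartition("@")
    port = "" if parts.port is None else f":{parts.port}"
    return urlunsplit(parts._replace(netloc=f"{userinfo}{at}{dotted}{port}"))


print(canonicalize_dotless_host("http://2130706433/"))
print(canonicalize_dotless_host("http://www.mozilla.org/"))
```

The first call prints http://127.0.0.1/ and the second returns its input unchanged. Doing the rewrite at URL-parsing time would mean the address bar, tooltips and the Host header all see the same canonical form.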
Who's gonna drive this?
Whiteboard: [sg:low] bypass external filters that are unfamiliar with these formats → [sg:low] bypass external filters that are unfamiliar with these formats [necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3

this is also an issue with the IPv4 concept parser
bug 1381139
"feature" - https://bugzilla.mozilla.org/show_bug.cgi?id=1288049

http://10.0.514
resolves to
http://10.0.2.2

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE

@annevk this should not be a duplicate
bug 1381139 is a subset of this bug, should be marked duplicate the other way.

This bug should be kept open.
