Closed Bug 355181 Opened 18 years ago Closed 16 years ago

net_IsValidHostName() comment says one thing, code does another

Categories

(Core :: Networking, defect, P1)

1.8 Branch
x86
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta5

People

(Reporter: usenet, Assigned: michal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

While checking the DNS code, I discovered that, while code for net_IsValidHostName() in netwerk/dns/src/nsURLHelper.cpp says that it blocks the control characters and !\"#%&'()*,/;<=>?@\\^{|}\x7f
the _code_ appears not to block the right/closing brace character '{' (character code \x7d) as listed in the comment.

If this is deliberate, it should be documented correctly, with an explanation of wht this special character is being allowed. If, as I suspect, it's an accidental ommission, it should be fixed.
Component: General → Networking
Product: Firefox → Core
QA Contact: general → networking
Version: 2.0 Branch → 1.8 Branch
In addition, the backquote character (0x60) appears to have been let through in both the code and the comment. Is there any particular reason why this character should be allowed in DNS lookups, or is it just being let through by default?

I propose that we block it, unless there's a good reason for keeping this in, in which case, again, the reason for doing so should be documented.
Just for completeness sake, I wrote a small program to double-check: the characters being allowed appear to be:

$ + - . 0 1 2 3 4 5 6 7 8 9 : A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z }

removing the LDH character set and the dot leaves the following:

$ + : [ ] _ ` }

I understand that some people might be using '$', '_', and '+' in non-RFC-compliant machine names, and that we don't want to break them, and that '[', ':', and ']' are used in IPv6 addresses, and we don't want to break that either.

But I cannot think of any reason to include '`' or '}'.
Currently, the list is phrased as a blacklist, with 50 entries. That means that every character will have to be scanned over all 50 entries.

Given that I had to squint at a table of hex digits to find the above, would be surely be more self-documenting and more secure to have a whitelist, with (I propose) the following 70 entries:

'$+-.0123456789:[]_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

This would also have the advantage of _explicitly_ excluding oddities such as NULs and Unicode characters with codepoints > 128.

In addition, if a linear scan is used, and the characters are in a sensible order (eg lowercase letters before uppercase), then even performance can be slightly better than before, since there is no need to scan through the entire list in order to admit a valid character in the common case, and most searches will terminate after scanning at most 44 characters in the list (and perhaps half that on average). 
do we really need to allow $ ?
The original comment should of course read netwerk/base/src/nsURLHelper.cpp

The attached patch:
* makes the list an explicit whitelist, as per comment #3
* removes the close-curly-brace and backquote characters from the allowed list, as per comment #2 

It seems to work OK for both normal ASCII domain names and IDNs, and detects the characters not in the list.

NB: it has not been smoketested yet.
Following on with the whitelist idea: we probably should have two different whitelists:

For DNS names and IPv4 dotted quads:
   $ + - . 0 1 2 3 4 5 6 7 8 9 _ 
   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
   a b c d e f g h i j k l m n o p q r s t u v w x y z

For IPv6 literals:
   [ : ] 0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F

Doing this will further reduce the number of combinations of characters which might be used for spoofing, particularly in edge cases involving exotic Unicode characters and Unicode normalization interactions. The checks should be carried out in the order given, which will make the common case the fast case.
 
This implements the dual-whitelist approach of comment #6. Note that the earlier changes of disallowing backquote and close-curly-brace still apply.

NB Not smoketested
One more comment: on the face of it, the original blacklist version of the code could, on the face of it, have leaked DNS lookups containing characters with the top bit set, since they were not explicitly forbidden. 

It's important that this should not happen (consider, for example, the possibility of leaking UTF-8 strings) -- the current version fixes this.
If we're going to fix this (and we should) we need to get it into a beta to make sure no one with any traffic is using '[' or the other odd characters. Nominating for blocking.
Flags: blocking1.9?
Priority: -- → P1
Flags: blocking1.9? → blocking1.9+
(And if we don't switch to using a whitelist for Firefox 3, we should land the patch in bug 377808 instead to fix the blacklist.)
Moving to b4 - Michal can you take this?
Target Milestone: --- → mozilla1.9beta4
Assignee: nobody → michal
This patch changes blacklist to whitelist in net_IsValidHostName(). I didn't include second whitelist from patch #241137 because it isn't RFC 4007 compliant. PR_StringToNetAddr() should check it correctly.
Attachment #241132 - Attachment is obsolete: true
Attachment #241137 - Attachment is obsolete: true
Attachment #304724 - Flags: review?(cbiesinger)
Checking strings against set of characters in net_FindCharNotInSet and net_FindCharInSet is highly inefficient. This is probably not a real bottleneck, because it isn't called too often. Checking against some bitmap would be probably ideal but it would be hard to read. What about to introduce ranges? Somehing like:

char *
net_FindCharNotInRangesAndSet(const char *iter, const char *stop, const char *ranges, const char *set);

Checking would be done against ranges first and then against set.

Set "$+-.0123456789_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" can be replaced by ranges "az09-.AZ" and set "$+_" and it should be faster.

Any opinions?
Status: NEW → ASSIGNED
Target Milestone: mozilla1.9beta4 → mozilla1.9
Flags: tracking1.9+ → blocking1.9+
Comment on attachment 304724 [details] [diff] [review]
whitelist for checking host names

+    // Whitelist for DNS names (RFC 1035) with extra characters added 
+    // for pragmatic reasons: this will also match IPv4 dotted quads.

What characters did you add for IPv4 dotted quads?
Attachment #304724 - Flags: review?(cbiesinger) → review+
also, claiming that "the common characters are near the start of the list" and then starting the list with $ doesn't seem right...
(In reply to comment #15)
> What characters did you add for IPv4 dotted quads? 

Good question. I just took the whitelist with comment from previous patch. Every IPv4 dotted quad can be described by characters allowed in RFC1035. Added characters was IMO "$+_" but comment was misleading. In new patch I changed the comment and also changed the order of characters.
Attachment #304724 - Attachment is obsolete: true
Attachment #308398 - Flags: review?(cbiesinger)
Comment on attachment 308398 [details] [diff] [review]
fixed comment and order of some characters in whitelist

+    // willing to send to lower-level DNS logic This is more

missing a dot here

+    // the commonest characters will tend to be near the start of
+    // the list.

the most common characters in a hostname are lowercase alphanumeric letters IMO, not dots and digits.
Attachment #308398 - Flags: review?(cbiesinger) → review+
Attachment #308398 - Attachment is obsolete: true
Attachment #308413 - Flags: review?(cbiesinger)
Keywords: checkin-needed
Comment on attachment 308413 [details] [diff] [review]
fixed according to hints in comment #12

you don't have to ask for review again if I had marked review+ and all you did was fix the things I mentioned
Attachment #308413 - Flags: review?(cbiesinger) → review+
(In reply to comment #20)
> (From update of attachment 308413 [details] [diff] [review])
> you don't have to ask for review again if I had marked review+ and all you did
> was fix the things I mentioned
> 

Sorry, but I added also keywork "checkin-needed" because I don't have checkin privileges. So I also wanted to have + for the final patch. Can somebody check it in, please?
Checking in netwerk/base/src/nsURLHelper.cpp;
/cvsroot/mozilla/netwerk/base/src/nsURLHelper.cpp,v  <--  nsURLHelper.cpp
new revision: 1.71; previous revision: 1.70
done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: mozilla1.9 → mozilla1.9beta5
My code-reading skills are modest, so I'll ask rather than risk an invalid bug.

I was playing with some ad hoc testing of this function and I found that something like: www.host\.com fails with "hostname not found" error page.

I *think* it's because this function is used like this:

http://mxr.mozilla.org/seamonkey/source/netwerk/dns/src/nsHostResolver.cpp


417     // ensure that we are working with a valid hostname before proceeding.  see
418     // bug 304904 for details.
419     if (!net_IsValidHostName(nsDependentCString(host)))
420         return NS_ERROR_UNKNOWN_HOST;

Should we have a different error conditions for illegal/invalid hostnames?
benc, I think you're right.  The "Try Again" button isn't very useful in that situation either.  Can you file a new bug?
Jesse: Bug 448063
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: