Closed Bug 110418 Opened 23 years ago Closed 23 years ago

Stealing cookies using %00


(Core :: Networking: Cookies, defect)

Not set





(Reporter: security-bugs, Assigned: morse)


(Whiteboard: [PDT+] [ETA 11.19] [Fixed 6.2] [verified fixed 6.2.1 br])


(3 files, 6 obsolete files)

Loading a URL such as:

...will cause Mozilla to connect to the hostname
specified before the "%00", but send the cookies
to the server based on the entire hostname.

Full details are available in my draft writeup at:
Severity: normal → critical
Target Milestone: --- → mozilla0.9.6
Problem is that the networking module and the cookie module have
different ideas as to how to treat a null embedded in a url address. 
Specifically, when the browser requests a page at:


it sends out a request to http://x and ignores the %00 and the y.  The
cookie module considers the entire string and therefore treats this url
as being in any domains contained in y.

Interestingly enough, in 4.x we did not have this problem because
networking did send the request to http://x%00y -- the result was that
we got back a no-such-server response.  We did not go to http://x which
is the attackers site.

IE doesn't have this problem either.  It looks like their networking
layer is ignoring the %00 and the y (just like we do in mozilla).  And
their cookie module is probably ignoring it as well which is why they
don't have the problem.

It doesn't matter which module gets changed (networking or cookies) but
the key point here is that they must both do the same thing.  In this
case it's probably wiser to modify the cookie module so I'll go ahead
and do it.
The cookie module gets its view of the urlfrom 
nsCookieHTTPNotify::OnModifyRequest which is in nsCookieHTTPNotify.cpp.  That
routine is passed an nsIHTTPChannel parameter.  The mHost field of the mURI of 
that parameter is already bad -- i.e., it contains the %00y.

So the problem originates upstream from the cookie module.  Tracing this back a 
few level (see stacktrace below) I see that we already have this error in the 
call to nsHttpChannel::Init (in file nsHttpChannel.cpp).

COOKIE_GetCookieFromHttp(char * 0x048ee570, char * 0x048ee4d0, nsIIOService * 
0x01732b90) line 858
nsCookieService::GetCookieStringFromHttp(nsCookieService * const 0x04871490, 
nsIURI * 0x048eb280, nsIURI * 0x048eb280, char * * 0x0012f860) line 178 + 37 
nsCookieHTTPNotify::OnModifyRequest(nsCookieHTTPNotify * const 0x017b5970, 
nsIHttpChannel * 0x048eee60) line 193 + 48 bytes
XPTC_InvokeByIndex(nsISupports * 0x017b5970, unsigned int 3, unsigned int 1, 
nsXPTCVariant * 0x048ee650) line 154
nsProxyObject::Post(unsigned int 3, nsXPTMethodInfo * 0x0285f4e8, 
nsXPTCMiniVariant * 0x0012f94c, nsIInterfaceInfo * 0x00aac590) line 428 + 34 
nsProxyEventObject::CallMethod(nsProxyEventObject * const 0x048ee6e0, unsigned 
short 3, const nsXPTMethodInfo * 0x0285f4e8, nsXPTCMiniVariant * 0x0012f94c) 
line 547 + 55 bytes
PrepareAndDispatch(nsXPTCStubBase * 0x048ee6e0, unsigned int 3, unsigned int * 
0x0012f9fc, unsigned int * 0x0012f9ec) line 115 + 31 bytes
SharedStub() line 139
nsHttpHandler::OnModifyRequest(nsIHttpChannel * 0x048eee60) line 583
nsHttpChannel::Init(nsIURI * 0x048eb280, unsigned char 1, nsIProxyInfo * 
0x00000000) line 200 + 16 bytes
nsHttpHandler::NewProxiedChannel(nsHttpHandler * const 0x0174dea0, nsIURI * 
0x048eb280, nsIProxyInfo * 0x00000000, nsIChannel * * 0x0012fbf0) line 1667 + 23 
nsIOService::NewChannelFromURI(nsIOService * const 0x01732b90, nsIURI * 
0x048eb280, nsIChannel * * 0x0012fbf0) line 773 + 40 bytes
NS_OpenURI(nsIChannel * * 0x0012fcd4, nsIURI * 0x048eb280, nsIIOService * 
0x01732b90, nsILoadGroup * 0x042c2470, nsIInterfaceRequestor * 0x042c2e68, 
unsigned int 327680) line 150 + 20 bytes
nsHttpChannel::ProcessRedirection(unsigned int 302) line 1197 + 95 bytes
nsHttpChannel::ProcessResponse() line 504 + 12 bytes
nsHttpChannel::OnStartRequest(nsHttpChannel * const 0x048eabc4, nsIRequest * 
0x048ec6d4, nsISupports * 0x00000000) line 2273 + 11 bytes
nsOnStartRequestEvent::HandleEvent() line 125 + 53 bytes
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x048eda14) line 80
PL_HandleEvent(PLEvent * 0x048eda14) line 590 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00ae3a40) line 520 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00e008de, unsigned int 49262, unsigned int 0, 
long 11418176) line 1071 + 9 bytes
USER32! 77e71268()

Going back a few more levels on the stack I can see that the problem comes from 
the following line in nsHTTPChannel::ProcessRedirection

   const char *location = mResponseHead->PeekHeader(nsHttp::Location);

After the above statement is executed, the value in location contains the %00y.
Adding PDT for tracking and 6.2 branch review purposes.

What are the chances we will have a patch for this by Monday?

Is this something we'd like on the edt094 branch?
Whiteboard: [PDT]
Chances are good for Monday.  Either I'll convince the networking people to make 
sure that they don't pass such a URI to the cookie module (preferable solution 
in my opinion) or I'll add code to the cookie module to remove such garbage from 
the url.  The latter I can definetely do by Monday.
Just to clarify my comment above about "location".

The location-header is something that the server sends back to the 
client.  So of course we have no control over it.  And it contains the %00y as 
the site that it wants the browser to redirect to.

Problem is that the networking layer takes this string and plops it into a URI 
when the following statement (a few lines later) is executed:

   rv = ioService->NewURI(location, mURI, getter_AddRefs(newURI));

And it is this URI that eventually gets trickled down to the cookie module.
And let me clarify still further.  Although this bad URI gets trickled down to 
the cookie module, something else is obviously being done to the URI when it 
gets to the point of actually requesting this page, because the %00y is already 
removed from it by that time.  So I need to look further and see where that 
I think I'm zeroing in on the problem.

When networking generates the request headers (the host header in particular), 
it calls on nsStdURL::GetHost which in turn calls on GetString().  And GetString 
will unescape the string so the %00y all vanishes.

When the cookie module gets the host, it calls on nsStdURL::GetSpec.  However 
GetSpec never calls on GetString so it never unescapes the string and therefore 
uses the %%00y.
More info on GetSpec.  It does its unescaping by calling on AppendString rather 
than GetString.  That's fine.  But it makes the following calles to 

   rv = AppendString(finalSpec,mScheme,ESCAPED,esc_Scheme);
   rv = AppendPreHost(finalSpec,mUsername,mPassword,ESCAPED);
   rv = AppendString(finalSpec,mHost,HOSTESCAPED,esc_Host);

So now to step through and find out why mHost didn't get escaped properly.
Wait.  I think I've confused escaping with unescaping.

When networking generates the host: header it calls on GetHost which does an 

When cookies extracts the host, it calls on GetSpec which does an escaping.

That's why the %00 is not being unescaped in the cookies case.
Keywords: edt0.9.4, topembed
please make sure you are testing a recent build... i just landed the patch for
bug 103916 which changed much of the standard url parsing.
Now you tell me!  :-(
Attachment #58173 - Attachment is obsolete: true
i confirmed that the problem still exists with today's build.  the solution is
going to look very different.  i can work out a patch that is similar to morse's
for the new code.
actually, i think the current solution is not quite correct.  the correct
solution is probably to NOT unescape the hostname in nsStandardURL::GetHost.
Attached patch v1.0 patchSplinter Review
makes GetHost() return "x%00y" thus foiling the attack.  this is better IMO
than trying to unescape the host and then re-escape it.  i ran all of the url
tests after making this change and nothing broke, so it should be a safe fix.
Attachment #58175 - Attachment is obsolete: true
Comment on attachment 58179 [details] [diff] [review]
v1.0 patch

might want to add some comments explaining the changes. no need to create a
newer patch just add the comments when you check it in. r=gagan
Attachment #58179 - Flags: review+
I'm pulling a new tree right now and will try out the patch when my build 
completes.  If it all looks good at that time, I'll check it in.

This fix will have to go into 6.2.1 and also into 9.4.  So shall we use my patch 
for those cases?  If so, please give me a review on my patch.
I tried out Darin's patch and it indeed fixes the problem.  However it does so 
by mimicking the 4.x behavior (sending out http://x%%y on the http request) 
rather than the IE behavior (removing the %%y when doing the cookie test.

That patch that I presented obviously needs to be redone because of the 
restructuring of networking that just occurred.  But it did fix the problem by 
using the IE behavior.

In any event, we are going to need both patches because we can't use Darin's 
patch for the 9.4 branch and the 6.2 branch.  So shouldn't we get the two 
patches to at least have the same behavior?

Removing the "obsolete" from my patch because we are going to need it.
Attachment #58175 - Attachment is obsolete: false
Just a few random comments from someone who has been feeding more malformed or 
odd URLs to his web browsers over the past week or two than any browser 
deserves to see in it's lifetime...

I am not convinced that the IE behaviour of treating %00 as a null terminator 
for a string is proper or the best path to go down.  While it avoids this 
particular problem, since IE uses the same string for resolving the hostname 
and for figuring out which cookies to use, a %00 does not imply that, somehow, 
some part of the URL should be magically chopped off at the %00.

Magically chopping off anything after the %00 could possibly result in other 
security issues if there are places that can use the URL without decoding it, 
or that decode it but don't use null-terminated strings.  For example, in IE if 
you load a document from, then inside that 
document the javascript document.domain variable is set to  document.cookies is still set to the value, but 
there may be maipulations possible to access a document in another frame if 
they both set document.domain to a common suffix (eg.  This 
doesn't appear to be a problem in IE due to other bugs/misfeatures in IE, but 
it is an example of the type of issue that could crop up.  If the hostname is 
truncated at the %00, then you have to make very sure it is truncated for 
everything that could get a hold of it...

My thinking is that perhaps it makes sense to:

1. where a URL component doesn't have to be URLdecoded, then don't.

2. if a component does have to be URL decoded, then it is a fatal error to have 
a %00 since it isn't compatible with the concept of null terminated strings, 
which is ingrained pretty much everywhere.  If the ability to return an error 
isn't there (or things don't consistently check it), then decoding everything 
else but leaving the %00 as %00 may be ok, as long as it is done 
consistently... although I'm not entirely sure that is a good idea without 
thinking through it more.

I think it is legitimate to have a URL such as (which 
would then be equal to for the purposes of what cookies should 
be sent, what host should be connected to, etc... but it is a royal nightmare 
trying to figure out which standard applies to check if that is really 
valid...), so the hostname has to be unescaped somewhere.

It is also arguable that a %00 is legitimate in some cases (eg. query part of 
URL), so it can't just be rejected all the time... only in a context where it 
isn't valid, such as a hostname.  But I'm not sure if that argument is valid or 

Just a few random thoughts.  Unfortunately, I don't know enough about the 
internals of Mozilla to know what is actually practical to implement given the 
current codebase.
Curiously enough, I was bothered by the same points all night and just woke up 
early to try to verbalize my concerns regarding future attacks.

There are probably arguments on both sides of this coin.  For example, I am 
thinking about the two recent IE attacks involving cookie stealing -- one with 
cross scripting and frames, the other with about: urls.  I could easily see how 
other attacks could be crafted based on the unescaping or lack thereof of the 

I concluded even before reading your comment above that any such future attacks 
would be based on the fact that, by accident, the hostname is unescaped in 
certain places and not in others.  There's probably no way that we can be sure 
that we are completely consistent throughout the browser.  Given that reality, 
I'm not sure which is the safer approach to take here -- to leave the host as is 
or unescape it.

Your thoughts about unescaping everything but the %00 is certainly interesting 
and worthy of more investigation.  I'm going to copy Jim Roskind, our local guru 
on such issues, for his thoughts.
Attaching an alternate patch for the 0.9.4 and 6.2 branches that uses the 4.x 
solution rather than the IE solution.
In summary, there are currently three non-obsolete patches in this bug report.  
These patches, and the cases that they cover, are:

1. patch for 0.9.4 using the IE solution (11/16/01 17:01)
2. patch for 0.9.4 using the 4.x solution (11/17/01 06:48)
3. patch for trunk using 4.x solution (11/16/01 17:21)

For completeness, we need a patch for the trunk using the IE solution.  I'll try 
to come up with one.  Then a decision will need to be made as to whether it is 
safer to go with the 4.x solution or with the IE solution.
As already mentioned, the trunk patch that darin posted does not unescape the 
hostname (the 4.x solution).  If we decide to go with that decision, for 
the sake of future security should we also not be unescaping the other 
components of the URL -- i.e., the protocol, username, password, path, etc.?  
The current trunk code unescapes those other components.
In looking over the code in the trunk, I discovered that two entirely different 
mechanisms are being used to extract the hostname in the two cases.  That is 
what is leading to all the problems.  If one unified method were being used, 
then whatever was being extracted for the host header would be the same thing 
that was being extracted by the cookie module (be it escaped or unescaped), and 
this problem would not occur.

Specifically, the cookie module takes a spec and extracts the host part by 
calling the following routines

   ExtractUrlPart_Helper(const char * 0x046fb2e0, unsigned int 7, int 28,
                         unsigned int * 0x0012f770, unsigned int * 0x0012f784,
                         char * * 0x0012f78c) line 528
   nsIOService::ExtractUrlPart(nsIOService * const 0x01844420,
                               const char * 0x046fb2e0, short 2056,
                               unsigned int * 0x0012f770,
                               unsigned int * 0x0012f784, char * * 0x0012f78c)
                               line 609 + 29 bytes
   COOKIE_GetCookie(char * 0x046fb2e0, nsIIOService * 0x01844420)
                    line 636 + 33 bytes

and for the host header the spec is decomposed into its various parts in 
nsStandardURL::ParseURL(const char *spec)

Therefore I'd like to make the following recommendation:

1. Stop the firedrill by immediately fixing the problem in the branches using 
either of the two branch fixes that I posted (4.x method or IE method).

2. Investigate the trunk build further so that one unified method of extracting 
the host is used.
I just spoke with jar.  Together we checked for the specs on allowable 
characters in host names.  I found what I think are the relevant specs as shown 

Bottom line is that only a small set of characters are allowed in host names and 
% is not one of them.  Therefore the most secure thing we can at the time we 
generate the host header in the http request is check for a % and, if found, 
reject the request.  By that I mean do not send out the request but rather 
return a message saying "malformed URL".  If we chose to, we could go a step 
further and only allow those characters explicitly enumerated (A-Z, 0-9, minus, 
and period.  But this might be too restrictive and we might wind up rejecting 
some existing (albeit invalidly-named) sites.

If we do this blocking of %, then the issue of unescaping becomes moot -- there 
is nothing to unescape.  Will generate a patch that does such % blocking.

Here are the relevent spec:

rfc1945, section 3.2.2 http URL

   host = <A legal Internet host domain name ... as defined by section 2.1 of            
rfc 1123>

rfc1123, section 2.1, Host Names and Numbers

   The syntax of a legal Internet host name was specified in RFC-952 [DNS:4].
   One aspect of host name syntax is hereby changed: the restriction on the
   first character is relaxed to allow either a letter or a digit.

spec rfc0952:

  A "name" (Net, Host, Gateway, or Domain name) is a text string up to 24
  characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-), and
  period (.). Note that periods are only allowed when they serve to delimit
  components of "domain style names". (See RFC-921, "Domain Name System
  Implementation Schedule", for background). No blank or space characters are
  permitted as part of a name. No distinction is made between upper and lower
  case. The first character must be an alpha character. The last character must
  not be a minus sign or period. A host which serves as a GATEWAY should have
  "-GATEWAY" or "-GW" as part of its name. Hosts which do not serve as
  Internet gateways should not use "-GATEWAY" and "-GW" as part of their names.
  A host which is a TAC should have "-TAC" as the last part of its host
  name, if it is a DoD host. Single character names or nicknames are not
rfc1945 is obsolete and irrelevant.  In any case, it references 1738 for the 
URL spec (which is obsolete and irrelevant too), and 1738 seems to suggest that 
urlencoding the hostname is valid in a URL, and it should be decoded.

1123 does say what can be in a hostname, but that says nothing about if a 
hostname in a URL can be URL encoded or not.  That is a property of the URL 
and/or the scheme.

RFC2616 is the relevant HTTP spec.  See section 3.2.3 on comparing two URLs; 
according to that, "" is the same 
as "".

2396 is the relevant URI spec.  It has a grammar for certain types of URIs, and 
the way it defines "server" doesn't appear to allow for URL encoding, but each 
scheme is the definitive source of information on things like that, what 2396 
says in that section is just a general form that many things can use, it 
doesn't mean it is the same as http... necessarily.

If any character in the hostname, when unescaped, isn't valid in a hostname 
(well, that has to be relaxed from the strict definition to allow "_" and maybe 
a few others), then reject it, yes.  That is along the lines of what I 
suggested by rejecting any url with %00 in.  %00 isn't the only potentially 
unsafe character; %0A is dangerous too, and only a set list of safe chars 
should be sent.  Every piece of data sent in a request, for any protocol (not 
just http), must either have "unsafe" characters encoded when it is sent (eg. 
in the HTTP GET /foo ... line) or, if there is no encoding defined in that 
context, requests that would cause the unsafe chars must be rejected.  The 
unsafe chars could be filtered in some situations, but that isn't the greatest.

But I am not convinced that it is valid to reject "", 
since it is a URL and it seems reasonable for a URL to be URL encoded.  But I'm 
not sure either way...

Another question has to do with the internationalized domain name schemes that 
have been created to allow for people to "kind of" use non-ASCII characters in 
domain names.  I think what happens there is they are entered in a localized 
charset and encoding, but then translated to ASCII before being sent to the 
resolver.  But I'm not sure about the state of that, since I haven't been 
following it... and I have no idea if mozilla does/will support it.

Another question is regarding what other ways there are to create URLs that 
include "special" characters without entering them %-encoded... eg. creating a 
javascript string that includes the special characters unencoded.  You have to 
ensure you deal with those too.

Then on top of that, even if a URL encoded hostname is valid in the URL, is it 
valid or not in the Host: header... 
OK, you got me!  I wasn't aware of which specs were obsolete when I did my web 

So let's make some decisions.  Here is my recommendation.

1. We never attempt to unencode the host strings.  That means we apply the 
following patches for the branch and trunk respectively

   patch for 0.9.4 using the 4.x solution (11/17/01 06:48)
   patch for trunk using 4.x solution (11/16/01 17:21)

2. As an added security measure (belt-and-suspenders approach), we reject a host 
header field on an http request if it contains %00.  Yes, it's better from a 
security point of view to have an accept list instead of a reject list, but we 
might break some international sites if we tried that since we don't know all 
the things that they might be including.  I'll generate a patch to do this 
rejection -- same patch would probably work for both branch and trunk.

3. There should be some additional cleanup on the trunk for the fact that the 
spec is being parsed by different routines in different places (as noted in my 
comment #28 above).  But once the above two items are done, this third item is 
simply a housecleaning chore that the networking people can do or not do at some 
time in the future.

Marc, does this sound like a good plan to you?
As mentioned, Steve and I had an interesting chat.

IMO, the various flavors of acceptance by IE and Nav 4.x  that were 
identified (never un-escape, or always unescape) are fairly accidental, and I 
believe that there is no justification for escaping (via %nn) to be used in the 
host portion of the domain name.  As pointed out by Steve, it is illegal to have 
a % character in the host name, and the list of legal host name charaters is 
VERY restrictive.  Rather than trying to tempt fate, I would argue for being 
very restrictive about what characters are legal host names.  I'd 
rather see the browser barf (i.e., reject) a non-rfc-compliant host name, than 
find out that there is a way to exploit this questionable area and produce a 
security breach.

To be clear, my fear of security breaches in this area is based on the fact that 
a lot of security policy is based upon host name matching (sometimes in addition 
to other items).  These policies include JavaScript access rules, as well as 
cookie transmission rules. As a result, this is not an area where I believe we 
should leave any wiggle room, unless there is a clear need and and very clear 
semantic specification.  Just picking either the IE or Nav 4.x approach seems 

Unless we can find out a significant reason to support %nn spelling of 
characters in domain names (and the RFC sure seems to say that only very simple 
set are allowed... and they need no special spelling), I'm a definite backer of 
having a very restrictive policy.

I don't see a justification for un-escaping, when the (valid) name could have 
been spelled with no escape sequences.  Despite the above reading of the RFC, 
the fact that IE and Nav 4.x disagree in terms of handling suggests this is not 
a critical element for utility of the browser... and it is clearly critical to 

Please add comments if you can see any justification (need) for either the IE or 
 Nav 4.x policy.  I really think they are purely accidental.

I really like the "malformed URL" error when faced with such characters as % in 
the hostname.  We can always become more friendly in the future if there is a 
need.... but at least we won't fight any more security flaw battles in this 

One last point.... I'm very interested in seeing a resolution of this RSN.  As a 
result, I would argue that the primary short term result is a SAFE solution.  We 
can always be more "friendly" in the future if we have time to review code, 
check international implications, validate uniform handling of such names across 
our codebase, etc. etc.
Jim, as I mentioned above, the patch that I'm creating will reject only if the 
host contains %00 -- it will not reject all host containing %.  Although 
rejecting all %'s would be the most secure thing to do, it might be the most 
likely to break legitimate sites.  Especially since, as Marc pointed out, the 
spec for hostnames that I was looking at is obsolete.

The other two patches mentioned above the make sure we never unescape are 
sufficient to stop the current attack.  This added patch gives a bit more 
security against unforseen future attacks.
Marking my patch labelled "fix indentation" as obsolete -- it was the patch that 
fixes the branch using the IE method but it seems clear now that we want to use 
the 4.x method instead.
Attachment #58175 - Attachment is obsolete: true
Marc, if you agree with the code changes in three patches remaining, please 
indicate your approval by adding r=marcs in this bug report.  Thanks.

cc'ing alecf for sr.
Attachment #58244 - Attachment is obsolete: true
I checked with Roy Fielding (HTTP spec author), and he said that %-encoding is 
not valid for the host, however there have been some requests to change it, and 
it could change in future revisions of the spec.  Apparently the changes were 
coming from DNS internationalization folks... I expected there may be some 
issues there, but that isn't an issue for now and it can be deal with if and 
when it has to be.

So I'm happy with not supporting % encoding in the hostname at all.  I haven't 
had time to look through the proposed immediate fixes, and won't until Sunday 
evening PDT.

I do wonder if there are any odd interactions remaining with javascript, etc. 
due to the fact that control characters can be specified directly without %-
encoding... but haven't thought through that.
If what Marc is saying is correct (i.e., we can reject all %-encodings in host 
names), the we simply change the following line in the "11/17/01 16:29" patch:

   if (PL_strstr(value, "%00")) {


   if (PL_strchr(value, '%')) {

and of course we would change the comment from "if a %00" to "if a %".
2 things--

1. completely ignoring % in hostnames will break iDNS issues (see bug 42898)
cc'ing bobj for his info.

2. if the check for %00 is the only solution then it needs to be reviewed by
darin since I don't see a SetHeader being called from ProcessRedirection. But
then maybe I am missing something here. My recommendation is to just end the
host at %00 -- that way we correctly will connect to the host before the %00 and
ignore everything after the %00.
> if the check for %00 is the only solution

That was never suggested.  It's part of a belt-and-suspenders approach to 
security, sort of like a second line of defense.  The solution is to implement 
the 4.x model on both the branch (patch 11/17/10 06:48) and the trunk (patch 
11/16/01 17:21).  In addition, the %00 or % check (patch 11/17/01 16:29) gives 
the added protection.

> I don't see a SetHeader being called from ProcessRedirection

It's there.  Set a breakpoint and you'll see it.

> My recommendation is to just end the host at %00 -- that way we correctly
> will connect to the host before the %00 and ignore everything after the %00.

That's what the 11/17/01 16:29 patch does.  Does the word "just" in your comment 
mean that you don't advocate doing the additional patches?  If so, that would be 
a mistake from the point of view of avoiding future security fire drills.

>  it needs to be reviewed by darin

Darin, please review the two patches that I posted.  Thanks.
FWIW i talked with andreas otte a bit about not unescaping host headers, and he
seemed to agree that it is probably ok to not unescape host headers.

also, RFC2616 requires that every HTTP/1.1 request contain a Host header, and
HTTP/1.1 servers must reject requests issued without a Host header, so this
means that patch 58249 is quite what we want.  instead, if we have a bad Host
header, we shouldn't bother making a request.
correction: andreas and i talked about URL hostname's not HTTP Host headers.
Comment on attachment 58249 [details] [diff] [review]
slightly cleaner patch to reject host header if it contains a %00 

why modify the ProcessRedirection failure case?  this doesn't make any sense to

what's wrong with sending a Host header with a %00 in it?  are we worried about
what servers/proxies might do with this?
Attachment #58249 - Flags: needs-work+
It is great to hear standards support for not unescaping host names in the URL.

From a security perspective it is IMO quite important to aggressively avoid 
processing URLs with escape sequences in the host portion.  I agree completely 
that we should not be making any request with the bogus (hostname) header, and 
we should be sure that we don't leave a partially populated DOM (perchance, with 
cookie data, and a confused hostname :-( ) laying around in the browser for the 
would-be URL fetch.

IMO, when we see an escape sequence in the hostname, we should presume that some 
one is attempting to create a cracking URL, and we should see what we can do to 
avoid getting deeper into a match of wits than declaring the URL to be 
malformed, and doing (IMO) zero additional processing (of the URL).  It sure 
sounds like some of these patches go in that direction... and I just wanted to 
weigh in with my preferred semantics ;-). 

Maybe someday there will be international support that needs this stuff, but for 
now, a very safe approach seems like the way to go.
...also... I probably didn't understand the issue about the "Process Redirection 
Case" ...but I will comment that there are (or have been) security problems with 
 misunderstandings twixt the browser and some proxies.  If any of this comes 
close to causing a "questionable" hostname to be passed to a proxy, we should 
try to *not* test the quality of the proxy.  There is no upside... and past 
examples of "difficult" cases have created exploitable weaknesses when some 
proxies were used with some browsers.
Darin wrote:

> servers must reject requests issued without a Host header, so this means that
> patch 58249 is [not] quite what we want.  instead, if we have a bad Host
> header, we shouldn't bother making a request.

and then in his next comment he wrote:

> why modify the ProcessRedirection failure case?


The code in patch 58249 does just that -- it doesn't make the request if there 
is a bad (i.e., contains %00) host header.  That is what the modification in the 
ProcessRedirction failure case accomplishes.  So the patch is exactly what we 
Adding myself to cc list, and putting in ETA based on Morse comments in the bug.
Whiteboard: [PDT] → [PDT] [ETA 11.19]
As an example of why it isn't just %00 that matters in hostnames, why just 
truncating at %00 doesn't fix the problem, and why all invalid characters need 
to be filtered/rejected/not-decoded/somethingelsed, consider the following case:

1. configure mozilla to use a proxy
2. load the following URL: http://hostname%0Asomething%0Awhee%

Currently (unless this has changed since the version I'm running), this will 
generate something like:

Host: hostname


By doing this, you can make arbitrary HTTP requests (being able to generate 
requests with embedded newlines is the "holy grail" for nearly any ASCII based 
protocol) to the proxy that will generate responses mozilla thinks are from, _OR_ can make the proxy send a request to a completely different 
server that embeds the cookies for the original request in it.

If you have to allow non-ASCII stuff for i-dns, and urldecode the hostname for 
the same reason then that needs to be converted to a valid domain name (since i-
dns does not use valid domain names, and does not use URLs... it uses it's own 
non-standard layer on top of them... I'm assuming everything else in the 
browser uses the ASCII "real" domain names, eg. domains for cookies, javascript 
origin domain checks, etc.  Wonder what security issues having two completely 
different versions of a hostname for i-dns introduces...) before checks for 
valid characters are made.  In doing this, %00 has to be special cased due to 
the artifact that strings are represented as null terminated.  Everything else 
can be checked after conversion by i-dns, as long as the i-dns conversion 
routines don't go crazy with "bogus" input...  ugh.  This ugly i-dns hack has 
the potential to be nasty.
with the patch to disable unescaping of the hostname, you wouldn't get that
result.  the Host header would instead look like:


morse: why are you only concerned with protecting what happens when the server
redirects the client?  ProcessRedirection only happens when we get a 30x server
response.  what about normal requests/responses?

i think i agree that we should reject any host containing a %, and to do this we
can simply modify the URL parser.  there's a small patch to ignore anything
after a %.  take a look at nsAuthURLParser::ParseServerInfo in nsURLParsers.cpp.
i'll attach a patch to fix this.

we'll also want to get in touch with the iDNS folks to find out if this will
break the IDN ascii compatible conversion.
this latest patch alone should fix the problem by itself, but i would still
recommend the patch to disable unescaping of the hostname since there's no point
in unescaping something that is known to not contain any escape sequences.
> morse: why are you only concerned with protecting what happens when the server
> redirects the client?  ProcessRedirection only happens when we get a 30x
> server response.  what about normal requests/responses?

I wasn't concerned only about redirection.  My patch was in 
nsHttpHeaderArray.cpp which is used for anybody setting a host header, be it for 
a redirection or not.

However I was forced to add another check when in redirection because the code 
as it currently stands is to do a normal request if a redirect request failed.  
In the case of the failure due to the embedded % (or %00), we don't want the 
client to try another request when the redirect one failed.
Darin's latest patch (11/19/01 11:56) is a replacement for my reject-host 
header patch (11/17/01 16:29.  However, instead of rejecting the offending 
header as my patch does, it truncates it but still sends out the request to the 
server.  I would think that the safer thing to do here is not even send out the 
request since we suspect that we are being attacked and are now going to be 
engaging in a battle of wits.
Let's summarize where we are here.  I think we all agree now that doing the 
unescaping is wrong.  So the two patches that remove the unescaping (one for 
trunk, one for branch) are essential.  The one for the trunk already has a 
review and I'm waiting for Darin to review my patch for the branch.  Also 
waiting for alecf to sr both of these patches.

Next is the added protection of testing for %.  Are we in agreement that we 
should test for % and not just %00?  Also, should we truncate (darin's latest 
patch) or reject (my latest patch) when a % is found?

morse: ok, then let's return NS_ERROR_MALFORMED_URI in the parse routine.  this
should prevent the creation of any necko channel to the URI.

i just don't think adding code to the http channel is the right way to solve
this... we should really cut it off at the earliest point, which is the url
parsing code.
Comment on attachment 58216 [details] [diff] [review]
patch for 0.9.4 branch that uses the 4.x solution rather than the IE solution

Attachment #58216 - Flags: review+
Thanks for the heads-up.  CC'ing nhotta who can correct any errors in my
comments :-)

The IDN proposals will not encode % in a host name, so this is not a problem.

 - The leading IDN proposals will encode host names in ASCII Compatible
   Encodings (ACEs)which do not allow byte values of 0.
 - An alternative to ACEs, is to encode host names in Unicode UTF-8.  (But
   this is currently a small minority position).  UTF-8 format also does
   not allow for byte values of 0.
ok, let's get the first two patches landed on the trunk and 0.9.4 branch,
respectively, so we can get some bake time.  we need to do this quickly in order
to have any chance of getting PDT's approval.

the issue of how to handle URLs containing a '%' char is probably too risky for
the branch, and isn't needed to solve this bug.
Comment on attachment 58179 [details] [diff] [review]
v1.0 patch

Attachment #58179 - Flags: superreview+
Comment on attachment 58216 [details] [diff] [review]
patch for 0.9.4 branch that uses the 4.x solution rather than the IE solution

Attachment #58216 - Flags: superreview+
checked in 58179 on the trunk.
talked to darin & sr='ed the two patches here. He brought up the point about
using % to map to valid ascii characters... seems quite reasonable to break that
in order to plug this hole right now..

That said, the only "valid" use of escaping ascii characters in hostnames that I
could think of was to spoof a proxy or url blocking software (i.e. netnanny) to
get around stuff like trying to get to but escaping the "p"

adding sspitzer just because he's the one who gave me that idea back when he
knew someone who worked at surfwatch
Comment on attachment 58436 [details] [diff] [review]
alternate patch to reject URLs with hostname's that contain % chars [for real this time]

That look's better.

r = morse
Comment on attachment 58436 [details] [diff] [review]
alternate patch to reject URLs with hostname's that contain % chars [for real this time]

That look's better.

r = morse
Attachment #58436 - Flags: review+
Attachment #58391 - Attachment is obsolete: true
Attachment #58249 - Attachment is obsolete: true
Whiteboard: [PDT] [ETA 11.19] → [PDT] [ETA 11.19] [tETA-done](not the same as 0.9.4 patch because code has changed)
Attachment #58436 - Flags: superreview+
Comment on attachment 58436 [details] [diff] [review]
alternate patch to reject URLs with hostname's that contain % chars [for real this time]

sure, looks good to me too.
checked in patch 58436 on the trunk.
fixed0.9.4, fixed6.2.1
Keywords: fixed0.9.4
what i meant to say was that patch 58216 was checked in on the 0.9.4 branch and
on the 6.2.1 branch.
excellent. thanks darin.
Whiteboard: [PDT] [ETA 11.19] [tETA-done](not the same as 0.9.4 patch because code has changed) → [PDT+] [ETA 11.19] [Fixed 6.2](not the same as 0.9.4 patch because code has changed)
All three patches have now been checked in -- namely the patch to reject all % 
chars (not just %00) has been checked into the trunk, and the two patches to not 
unescape the host header have been checked into their respective trees -- one on 
the trunk and the other on both the 0.9.4 branch and the 6.2.1 branch.

Marking this report as fixed.
Closed: 23 years ago
Resolution: --- → FIXED
Sorry, I just backed out the change (patch 58216) to nsStdURL.h on the

This bug has not been approved for 0.9.4 checkin (indicated by the "edt0.9.4+"
keyword), it has only been nominated. We definately (for obvious reasons) want
to patch this hole on 0.9.4, but, not without sufficient testing. Changing URL
element escaping semantics can have broooaaad reaching effects.

topembed keyword - edt0.9.4 nomination is sufficient
fixed0.9.4 keyword - mods were backed out of 0.9.4 branch
associated status whiteboard comments

Changing the escaping semantics of our GetHost() impl needs more
verification/regression testing.

We call GetHost() from:
LDAP, imglib, P3P, wallet, caps, webshell, docshell, protocolproxy service, ftp,
gopher, http, res, nsGenericHTMLElement, nsHTMLDocument, nsLocation, global
history, and all over mailnews (but I'm assuming that the mailnews URL impl is
their own which isn't affected by this change; should they be doing the same
thing though?)

Here are a few things we should test in a 0.9.4 build before landing there...
- double clicking a html document (with and without spaces in the name) on the
desktop, with a browser already open. Does the resulting URL in the URL bar look
like it should?
- are we able to load embedded content (<img src= tags for example) that have
escaped chars in the host, as well as in the path?
- is our handling of embedded content links with true spaces in the host/path as
- does the status bar provide the "right" text for hosts that need escaping
(i18n perhaps)?
- does the status bar provide the "right" text when hovering over links in a
page (links with and without escaped chars/spaces)?
- does copying a link location put the "right" thing on the clipboard?
- I'm not sure how to best drive CAPS functionality here.

If I'm going overboard here, someone point that out please. I've just seen (and
have been a part of :-)) too many URL parsing regressions. Should andreas be on
this bug?
Keywords: fixed0.9.4, topembed
Whiteboard: [PDT+] [ETA 11.19] [Fixed 6.2](not the same as 0.9.4 patch because code has changed) → [PDT+] [ETA 11.19] [Fixed 6.2]
valeski: sorry for the confusion on EDT approval, but yesterday at around 4 or
5pm i was called up and instructed to land this patch on the 6.2.1 branch as
well as on the 0.9.4 branch.  the instructions where very clear.

i have spoken offline to andreas about this change (without getting into any
details about why i was proposing this change), and he agreed that there is
little reason to unescape hostnames.  he said the original version of nsStdURL
unescaped hostnames because he figured that was the way iDNS was going to be
implemented... ie. using % sequences.  of course, that's not the case... iDNS
will not use % characters, so he agreed that it should be safe and reasonable to
no longer unescape hostnames.

i would have cc'd him on this bug, i haven't... essentially because i don't know
the policy for adding non-netscape members to a sensitive bug like this.
looks like patch 58436 caused some regressions in mailnews... see bug 110938
for starters.
attachment 58436 [details] [diff] [review] has been backed out to fix today's smoketest blocker(s)... the
problem is that some other URL schemes use nsStandardURL with hostnames that
contain '%' chars.
losing this patch does not mean that this bug is not fixed.  it is still fixed.

the latter patch was meant as a proactive measure to prevent problems that might
arise with a badly written proxy server.  we can probably come up with another
patch that only targets http URLs or some set of networking URLs.
patch 58436 does not affect the branch builds, right?  Would branch patch 58216
cause a similar problem like patch 58436 did?
nope... attachment 58216 [details] [diff] [review] is OK.  it is the branch equivalent of attachment 58179 [details] [diff] [review].
Attachment #58436 - Flags: needs-work+
This appears to be fixed on 2001-11-26 branch builds (WinNT4, Linux rh6, Mac
osX).  Domain cookies are not being stolen using the supplied test case. 
Leaving open while additional regression testing is being done.

Per Jud's recomendations in comment #74 I've checked loading html docs with and
without spaces, and the resulting url in the url bar is correct.

Moied will be testing embedded content links and will add comments later.

is the fix ready for the 0.9.4 branch so it can be picked up by embeding customers?
I've gone through all the tests that Jud recommended with the exception of any
I18N or caps tests.  so far I have not seen any new problems.  cc'ing for I18N testing.
Whiteboard: [PDT+] [ETA 11.19] [Fixed 6.2] → [PDT+] [ETA 11.19] [Fixed 6.2] [verified fixed 6.2.1 br]
As per Jud's recommendation in comment #74 Tested on win2k, linux and mac and 
was able to load embedded content that have escaped chars in the host, as well 
as in the path. 
Embedded content links with spaces in the host/path name are handled correctly.
thanks for all the regression testing. adding edt0.9.4+ keyword, please checkin
the patch in attatchement #58216 to the 0.9.4 branch. Once it's checked in
there, please add "fixed0.9.4" to the keyword field.
Keywords: edt0.9.4edt0.9.4+
So why can't I check this in on the branch?  I've done the following:

   cvs update -r MOZILLA_0_9_4_BRANCH nsStdURL.h
   cvs stat nsStdURL.h -- showed that I indeed had the branch version
   applied the patch
   cvs diff -u nsStdURL.h -- showed that the patch was applied correctly
   cvs commit -m "..." nsStdURL.h

and I got the error message:

   Checking in nsstdurl.h;
   /cvsroot/mozilla/netwerk/base/src/Attic/nsStdURL.h,v  <--  nsStdURL.h
   cvs server: nsStdURL.h: No such file or directory
   cvs [server aborted]: error diffing nsStdURL.h
you might try pulling the entire branch version of netwerk/base/src instead.
Finally succeeded.  Patch checked in on 0.9.4 branch.
Keywords: fixed0.9.4
This is still broken in the 2001-12-05-23-0.9.4ec build.   The test case in
comment #1 is able to steal cookies and return them.
The bug has failed regression testing in 094, removing KW fixed094.
Keywords: fixed0.9.4
re-nominating. Steve, any insight here?
Keywords: edt0.9.4+edt0.9.4
> Jud: re-nominating. Steve, any insight here?

No, none.  Unless the patch didn't get checked into the right place.  Can 
someone check to see if the 0.9.4 build contains the patch that I checked in on 
This may be a build problem. I most recent builds of 094 have had problems. See
email thread from Christine Hoffmann.
g:src> cvs diff -u -r -r nsStdURL.h
Index: nsStdURL.h
RCS file: /cvsroot/mozilla/netwerk/base/src/Attic/nsStdURL.h,v
retrieving revision
retrieving revision
diff -u -r1.27.2.2 -r1.27.2.3
--- nsStdURL.h	20 Nov 2001 15:46:03 -0000
+++ nsStdURL.h	3 Dec 2001 22:19:39 -0000
@@ -135,7 +135,7 @@
 inline NS_METHOD
 nsStdURL::GetHost(char* *o_Host)
-    return GetString(o_Host, mHost, UNESCAPED);
+    return GetString(o_Host, mHost, ESCAPED);

seems like it's been checked in to me.  i'll play around with my 0.9.4 build to
see if i can figure out what's going on.
WORKSFORME w/ my own linux build of the 0.9.4 branch.
This is still broken in the 2001-12-10-09-0.9.4ec build.

if you used the stub installer from that build, tever, you're using 6.2.
More recent stub installers fetch from sweetlou
or similar).

Using the N6SetupB.exe from sweetlou ensures that you're using the current 0.9.4
build, and when using the stub installer make sure that you're pulling the xpi
files from sweetlou, not (you can see the urls while the
installer is downloading).
yes that was it, thx

verified:  2001-12-11-22-094ec

so now that I have checked this on 2 branches and also the the 12/05 trunk (win,
mac, and linux) ...   Verified
Keywords: verified0.9.4
adding back keyword foo
Group: security?
Keywords: fixed0.9.4
You need to log in before you can comment on or make changes to this bug.