Closed Bug 28027 Opened 25 years ago Closed 9 years ago

spec enforcement security review

Categories: Core :: Networking: HTTP, defect, P3
Platform: x86, Linux

Tracking: RESOLVED WORKSFORME, Target Milestone: Future

People: Reporter: dmosedale; Assignee: Unassigned

Details

The attached message was sent to security@netscape.com; sounds like Mozilla
needs an audit here.

-----

It's just amazing that a whole set of requirements can come together to make
this exploit work.  Namely, check out the list of what's required in clients and
servers to allow this to happen.  It sounds like our clients aren't strictly
following RFC guidelines for schemes, which allows this to happen.  Maybe this is
something we can fix. 

bill 
  

Kragen Sitaker wrote: 

  Description of the Problem 
  -------------------------- 

  CGI.pm contains a method self_url which returns the URL with which the 
  script was called, including all of the data fields submitted --- 
  except for the .submit= field added by CGI.pm. 

  Normally, this is used something like this: 

          my $self = self_url; 
          print qq(<a href="$self#Section2">Section 2</a>\n); 

  If CGI.pm is running on Apache 1.3.6 (and probably other versions of 
  Apache, and possibly other Web servers), a client can cause self_url 
  to include arbitrary sequences of characters at its beginning, such as 

          "><script language="JavaScript">evil_code()</script><a href=" 

  which, if used in the manner described above, leads to the problem 
  described in CERT Advisory CA-2000-02, "Malicious HTML Tags Embedded in 
  Client Web Requests". 

  Apparently, anything following an unencoded space in the URL used to 
  invoke the script ends up being inserted, unencoded but converted to 
  lower case, at the beginning of self_url's return value. 

  Unencoded spaces are, of course, illegal in URLs.  Most web browsers 
  accept them anyway in HREF attributes, and don't bother to %-encode 
  them when they send them in a GET request. 

  At least Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2 
  allow HREF attribute values to be delimited by ' single-quotes instead 
  of " double-quotes, which allows unencoded " double-quotes to be 
  inserted into the URL; this is crucial to exploiting the problem.  Lynx 
  2.8.1rel.2, however, strips the spaces from URLs found in HTML, 
  preventing it from being exploited via <A HREF=''>. 

  Diagnosis 
  --------- 

  It appears that this happens because the unencoded space is interpreted 
  by the HTTP server (Apache 1.3.6 in my tests) as separating the URL 
  from the protocol name.  So the environment variable SERVER_PROTOCOL 
  gets set to everything following the space, followed by a space and the 
  actual protocol, such as "HTTP/1.0". 
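The splitting step can be modelled with a minimal Python sketch. It assumes a lenient server that takes the first two space-separated tokens as the method and the Request-URI and passes everything after the second space through unchecked as SERVER_PROTOCOL; `split_request_line` and the sample URL are hypothetical, not Apache's actual parser.

```python
def split_request_line(line: str) -> tuple:
    """Hypothetical lenient parser: the first two space-separated tokens are
    the method and the Request-URI; everything after the second space is
    passed through unchecked as SERVER_PROTOCOL."""
    method, uri, protocol = line.split(" ", 2)
    return method, uri, protocol


# The browser sent the HREF's unencoded spaces as-is, so the Request-Line
# contains more than the two spaces the grammar allows.
request_line = 'GET /cgi-bin/test.pl?a=1 "><b>injected<a href=" HTTP/1.0'
method, uri, protocol = split_request_line(request_line)
print(protocol)  # "><b>injected<a href=" HTTP/1.0
```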

  Three of the four tested browsers (Netscape 4.6, MSIE 3.0, and Mozilla 
  M12) send the unencoded space in the request URL, which generates an 
  illegal HTTP Request-Line. 

  CGI.pm simply takes that environment variable, chops off everything 
  from the slash onwards, lowercases it, and returns the result as the 
  URL scheme. 
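The pre-patch CGI.pm logic (the `split('/',$prot)` and `"\L$protocol\E"` lines in the diff under "Suggested fixes") can be transcribed into Python to show the effect. Perl's `split` puts the text before the first slash in its first field, which `str.split("/", 1)[0]` mirrors. The host name and the `scheme + "://"` concatenation used to build a self_url-like string are simplifying assumptions, and `vulnerable_protocol` is an illustrative name.

```python
def vulnerable_protocol(server_protocol: str) -> str:
    """Mirror the pre-patch CGI.pm logic: keep the text before the first
    '/' of SERVER_PROTOCOL and lowercase it, with no validation."""
    protocol = server_protocol.split("/", 1)[0]
    return protocol.lower()


# Well-formed request: SERVER_PROTOCOL is just "HTTP/1.0".
print(vulnerable_protocol("HTTP/1.0"))  # http

# Malformed request: the injected text, lowercased, becomes the "scheme".
scheme = vulnerable_protocol('"><B>Injected<a href=" HTTP/1.0')
self_url = scheme + "://www.example.com/cgi-bin/test.pl?a=1"  # assumed host
print(f'<a href="{self_url}#Section2">Section 2</a>')
```

Note that the injected text ends at its first "/", so an injected string containing a slash is truncated at that point.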

  Suggested fixes 
  --------------- 

  RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".", 
  and "-" are allowed in scheme names.  Accordingly, I suggest the 
  following change to CGI.pm: 

  *** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999 
  --- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm      Mon Feb 14 12:07:37 2000 
  *************** 
  *** 2594,2600 **** 
        return 'https' if $self->server_port == 443; 
        my $prot = $self->server_protocol; 
        my($protocol,$version) = split('/',$prot); 
  !     return "\L$protocol\E"; 
    } 
    END_OF_FUNC 

  --- 2594,2602 ---- 
        return 'https' if $self->server_port == 443; 
        my $prot = $self->server_protocol; 
        my($protocol,$version) = split('/',$prot); 
  !     $protocol = lc $protocol; 
  !     $protocol =~ tr/-+.a-z0-9//cd; 
  !     return $protocol; 
    } 
    END_OF_FUNC 

  (Sorry --- I'm using Solaris diff, which doesn't have unified diff 
  capability.) 

  This prevents the exploit, but of course the resulting URL is 
  incorrect.  It won't affect responses to well-formed HTTP requests, 
  which should never have anything other than HTTP for the $protocol to 
  begin with. 
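For comparison, the patched logic can be sketched in Python. `patched_protocol` is a hypothetical name; the generator expression plays the role of Perl's `tr/-+.a-z0-9//cd`, which deletes every character outside the listed set. Whatever a client puts after the space, the result can no longer contain quotes, angle brackets, or spaces.

```python
import string

# Characters the patch keeps: lowercase letters, digits, '-', '+', '.'
ALLOWED = set(string.ascii_lowercase + string.digits + "-+.")


def patched_protocol(server_protocol: str) -> str:
    """Mirror the patched CGI.pm: take the text before the first '/',
    lowercase it, then delete every character outside [-+.a-z0-9]."""
    protocol = server_protocol.split("/", 1)[0].lower()
    return "".join(ch for ch in protocol if ch in ALLOWED)


print(patched_protocol("HTTP/1.0"))                         # http
print(patched_protocol('"><B>Injected<a href=" HTTP/1.0'))  # binjectedahrefhttp
```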

  It might be smarter to always return 'http' when not returning 'https'; 
  I'm not presently aware of any protocols other than HTTP and SSL HTTP used with 
  CGI.  The current draft CGI spec says: 

          Note that the scheme and the protocol are not identical; for 
          instance, a resource accessed via an SSL mechanism may have a 
          Client-URI with a scheme of "https" rather than "http". 
          CGI/1.1 provides no means for the script to reconstruct this, 
          and therefore the Script-URI includes the base protocol used. 

  . . . in other words, implementing self_url in a way that is guaranteed 
  to be correct for future non-HTTP CGI implementations is not possible. 
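The alternative of always returning 'http' when not returning 'https' can be sketched in Python. The port-443 check comes from the code in the diff; the `https_on` flag is a hypothetical stand-in for however the server signals SSL. Because SERVER_PROTOCOL is never read, no client-controlled text can reach the scheme, at the cost of the non-HTTP limitation the draft spec describes.

```python
def scheme_for(server_port: int, https_on: bool = False) -> str:
    """Choose the URL scheme without ever reading SERVER_PROTOCOL.
    https_on is a hypothetical flag for however the server signals SSL."""
    if https_on or server_port == 443:
        return "https"
    return "http"


print(scheme_for(80))    # http
print(scheme_for(443))   # https
```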

  The successful exploit requires a remarkable chain of extreme forgiveness: 
  1- The web browser must accept an illegal URL from (possibly valid, 
     although very unusual) HTML. 
  2- The web browser must send an illegal HTTP request with the illegal 
     URL, without %-encoding the URL to make it legal. 
  3- The HTTP server must accept the illegal HTTP request. 
  4- The HTTP server must invoke the CGI script with a nonsensical 
     SERVER_PROTOCOL. 
  5- The CGI script must accept the nonsensical SERVER_PROTOCOL and use it to 
     produce an illegal URL, which it must then embed in HTML it outputs. 
  6- The web browser must then trust the output of the CGI script in some 
     fashion inappropriate to the supplier of the original URL. 

  Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, I would guess, most Web 
  browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and, I 
  would guess, most Web servers) will happily perform steps 3 and 4; any 
  program that uses CGI.pm and embeds self_url's return value in its 
  output will perform step 5; and, as CERT advisory CA-2000-02 documents, 
  a wide variety of situations can cause step 6 to happen. 

  My patch above breaks the chain at step 5.  It would be nice to break 
  it at other steps as well. 

  The HTTP requests used in this exploit are broken: their Request-Line 
  carries a protocol name that not only fails to be "HTTP" but fails to 
  be a valid protocol name at all.  Perhaps Apache and other web servers 
  should respond to such egregious protocol violations with error 
  messages, rather than passing the bogus data on to CGI scripts. 
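A server-side sketch of that stricter behaviour, in Python: accept a Request-Line only if it has the RFC 2068 section 5.1 shape `Method SP Request-URI SP HTTP-Version`, and otherwise report 400 Bad Request instead of running a CGI script. It simplifies under stated assumptions: the method is limited to letters rather than the full `token` grammar, the trailing CRLF is assumed already stripped, HTTP/0.9 simple requests are not handled, and `parse_request_line` is a hypothetical name.

```python
import re

# Request-Line = Method SP Request-URI SP HTTP-Version   (RFC 2068, 5.1)
# HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT           (RFC 2068, 3.1)
# Simplifications: method limited to letters, CRLF already stripped.
REQUEST_LINE = re.compile(r"([A-Za-z]+) (\S+) ((?i:HTTP)/[0-9]+\.[0-9]+)")


def parse_request_line(line: str) -> tuple:
    """Return (method, uri, version), or raise ValueError so the caller
    can answer 400 Bad Request instead of invoking a CGI script."""
    match = REQUEST_LINE.fullmatch(line)
    if match is None:
        raise ValueError("400 Bad Request: malformed Request-Line")
    return match.groups()


print(parse_request_line("GET /cgi-bin/test.pl?a=1 HTTP/1.0"))
try:
    parse_request_line('GET /cgi-bin/test.pl?a=1 "><b>injected<a href=" HTTP/1.0')
except ValueError as err:
    print(err)
```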

  I have not sent copies of this mail to other web-server teams, because 
  I do not have the facilities or inclination to properly verify that 
  they are equally lenient.  Preliminary testing suggests that they are 
  not: 

  - IIS 5.0 responds, "The parameter is incorrect". 
  - Netscape-Enterprise/3.6 responds, "Your browser sent a 
    message this server could not understand." 
  - Zeus 3.3 responds with a 400 Bad Request error. 
  - thttpd 2.15 responds with a 400 Bad Request error. 

  I also believe that Web browsers should take some steps to avoid 
  sending illegal HTTP requests; since the problem here happens only when 
  both the server and browser are trusted --- perhaps due to some earlier 
  authentication exchange between them --- while the URL is untrusted, 
  the browser should validate the URL, at least to the point of not 
  sending illegal requests to the server. 
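One way a browser could do this, escaping rather than rejecting, is sketched in Python: drop the fragment, then percent-encode every character that may not appear literally in the Request-URI, so the Request-Line keeps exactly two spaces. The safe set is an assumption rather than a list taken from the RFCs, `urllib.parse.quote` is the standard-library function doing the encoding, and `request_uri_for` is a hypothetical name.

```python
from urllib.parse import quote

# Kept literally: reserved punctuation plus "%" so existing escapes survive.
# quote() also never encodes ASCII letters, digits, and "_.-~".
SAFE = "!$%&'()*+,/:;=?@[]~"


def request_uri_for(href: str) -> str:
    """Turn an HREF value into a Request-URI that contains no spaces or
    other characters that may not appear literally in a URL."""
    href = href.split("#", 1)[0]  # the fragment is never sent to the server
    return quote(href, safe=SAFE)


print(request_uri_for('/cgi-bin/test.pl?a=1 "><b>injected<a href="'))
# /cgi-bin/test.pl?a=1%20%22%3E%3Cb%3Einjected%3Ca%20href=%22
```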

  References 
  ---------- 

  http://www.w3.org/CGI/ --- information about CGI 
  http://Web.Golux.Com/coar/cgi/draft-coar-cgi-v11-03-clean.html --- current 
          draft specification for CGI 
  http://www.cert.org/advisories/CA-2000-02.html --- CERT advisory CA-2000-02, 
          "Malicious HTML Tags Embedded in Client Web Requests" 
  RFC 1738, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1738.txt --- 
          "Uniform Resource Locators (URL)" --- in particular, section 2.1, 
          which defines the syntax of scheme names 
  RFC 2068, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt --- 
          "Hypertext Transfer Protocol -- HTTP/1.1" 
          --- in particular, section 3.2.1, which defines the syntax of 
          URI scheme names identically to RFC 1738, but including 
          uppercase US-ASCII letters. 
          --- and section 5.1, which defines the syntax of HTTP Request-Lines, 
          indicating (together with the sections defining URI syntax and 
          section 3.1, defining HTTP-Version syntax) that they must 
          contain exactly two spaces. 
  http://stein.cshl.org/WWW/CGI/ --- documentation for CGI.pm 
  http://www.apache.org/info/css-security/apache_specific.html --- changes made 
          to Apache in response to CA-2000-02 
  http://www.netcraft.co.uk/survey/ --- Netcraft Web Server Survey, 
          which lists the most popular web server software 

  -- 
  <kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/> 
  The Internet stock bubble didn't burst on 1999-11-08.  Hurrah! 
  <URL:http://www.pobox.com/~kragen/bubble.html> 
  The power didn't go out on 2000-01-01 either.  :)

-- 
Bill Burns 
Senior Security Engineer 
Netscape Communications Corp. 
http://people.netscape.com/shadow 
What was YOUR new idea today?
sounds like norris.
Assignee: dougt → norris
Bulk moving all Browser Security bugs to new Security: General component.  The 
previous Security component for Browser will be deleted.
Component: Security → Security: General
Status: NEW → ASSIGNED
Target Milestone: M16
Target Milestone: M16 → M18
Bulk reassigning most of norris's bugs to mstoltz.
Assignee: norris → mstoltz
Status: ASSIGNED → NEW
Security reviews and denial-of-service attacks. These will be addressed in the 
post-beta2 timeframe (unless someone's interested in tackling them earlier?)
Status: NEW → ASSIGNED
Assigning QA to czhang
QA Contact: junruh → czhang
Future.
Target Milestone: M18 → Future
QA Contact: czhang → junruh
Mass changing QA to ckritzer.
QA Contact: junruh → ckritzer
Not sure if this is still a problem, but it may be worth looking at. Reassigning
to Networking and cc'ing bbaetz.
Assignee: mstoltz → neeti
Status: ASSIGNED → NEW
Component: Security: General → Networking: HTTP
QA Contact: ckritzer → benc
Hmm.

We still accept %0a and %0d in URLs, remember; NS4 didn't. (We don't unescape
them in an HTTP request, though.)
Does that mean we're still vulnerable to this attack, or not?
cc'ing myself; it should be simple for me to test this tonight (hurray for Apache!).
Moving neeti's futured bugs for triage.
Assignee: neeti → new-network-bugs
A lot has changed since this bug was filed. In particular, our scheme parsing
is very strict nowadays. Perhaps this is a duplicate of some already-fixed
security bug?
Should security own this?
It sounds like we should create a file that has a couple of links formatted as
described, and then test our escaping of the URL...

I'm doing some work on a lower-level URL parsing spec, but I'm not going to
spend a lot of time messing with the formatting of HREFs in HTML...
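A small Python helper along the lines suggested could generate such a test page: a link whose HREF is single-quoted and contains an unencoded space followed by an injected string, as in the original report. The CGI URL, script name, and payload are hypothetical placeholders; point the link at a test script that embeds self_url, save the output to a file, and check what the browser actually sends.

```python
def build_test_page(cgi_url: str, payload: str) -> str:
    """Return an HTML page with one single-quoted HREF of the form
    '<cgi_url> <payload>', i.e. an unencoded space followed by the payload."""
    href = f"{cgi_url} {payload}"
    # A single quote would end the attribute early and spoil the test case.
    if "'" in href:
        raise ValueError("href must not contain a single quote")
    return (
        "<html><body>\n"
        f"<a href='{href}'>self_url test</a>\n"
        "</body></html>\n"
    )


print(build_test_page("http://localhost/cgi-bin/self_url_test.pl?a=1",
                      '"><b>injected<a href="'))
```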
AFAICT, our scheme parser should be robust here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME