Internet Security Professional Reference: CGI Security


The HTTP Server

All efforts to ensure CGI security are moot if the HTTP server itself cannot be trusted. Unfortunately, this is no idle point. In February 1995, it was demonstrated that a buffer overrun in NCSA httpd 1.3 could be exploited to execute shell commands. This bug and several others have been fixed in more recent versions, but it remains quite likely that more lie undiscovered.

In addition, many CGIs make assumptions about the server that might not be valid. For example, it is common to find a CGI in which the PATH environment variable is not explicitly set. It is expected that the server will supply a sane default path, and most do, but there is no such guarantee. The current working directory of a CGI is not well defined and varies between servers. Make no assumptions!
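A CGI can remove these assumptions by establishing its own known state before doing anything else. The following is a minimal Python sketch of that idea, not a complete hardening routine. The path value, the working directory, and the function names are assumptions chosen for illustration, and the removal of shell-related variables such as IFS goes beyond what the text above requires.

```python
import os

# Assumed values for illustration; choose ones appropriate to the host.
SAFE_PATH = "/bin:/usr/bin"
WORK_DIR = "/tmp"


def harden_environment(environ, safe_path=SAFE_PATH):
    """Return a copy of environ with PATH pinned and risky shell variables removed."""
    cleaned = dict(environ)
    # Never rely on the server's default search path.
    cleaned["PATH"] = safe_path
    # Extra precaution (an assumption, not from the text): these variables can
    # change how a shell started by the CGI behaves.
    for name in ("IFS", "ENV", "BASH_ENV", "CDPATH"):
        cleaned.pop(name, None)
    return cleaned


def enter_known_state(work_dir=WORK_DIR):
    """Apply the hardened environment to this process and fix the working directory."""
    hardened = harden_environment(os.environ)
    os.environ.clear()
    os.environ.update(hardened)
    # The server's choice of working directory is not well defined, so set one.
    os.chdir(work_dir)
```

A CGI written this way would call enter_known_state() as its first action, before opening any file by relative name or running any external program.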

As the Web grows in complexity and volume, server authors are adding features and enhancements with great rapidity, a practice that bodes ill for server security. It is dangerous to write CGIs that rely on the capabilities of one server; one never knows what server will be used in the future, or even if a future release of the same server will have the same behavior.

The HTTP Protocol

HTTP (HyperText Transfer Protocol) is a simple application-layer protocol, carried over TCP, that describes the client request and the server response. The latest information can be found at the WWW Consortium page on HTTP: http://www.w3.org/pub/WWW/Protocols/. The capabilities of CGI are intimately tied to the information passed by HTTP, so it is important for a CGI developer to follow changes to the protocol. The current standard is HTTP 1.0, although at the time of this writing, HTTP 1.1 has been a proposed standard for several months.

There is a basic authentication model in HTTP that can be used to restrict access to users with passwords. This method should not be used to protect sensitive data: passwords are sent across the network in the clear, which makes the scheme of limited use on an untrusted network like the Internet.
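To make "in the clear" concrete: under the HTTP specification, basic authentication sends the username and password joined by a colon and base64-encoded in the Authorization header. Base64 is an encoding, not encryption, so anyone who can read the request can reverse it. The sketch below illustrates this in Python; the function name is hypothetical and the sample credentials are the well-known example from the HTTP authentication specification.

```python
import base64


def decode_basic_credentials(header_value):
    """Recover (username, password) from an 'Authorization: Basic ...' header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    # No key or secret is needed: decoding alone yields the credentials.
    decoded = base64.b64decode(encoded.strip()).decode("latin-1")
    username, _, password = decoded.partition(":")
    return username, password


# What an eavesdropper sees on the wire, and what it reveals.
captured = "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
print(decode_basic_credentials(captured))  # ('Aladdin', 'open sesame')
```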

The Environment Variables

Some information regarding the connection can be extracted from the environment variables passed to a CGI, but this information should be treated suspiciously. Many of the environment variables are obtained directly from the headers supplied by the client, which means they can be spoofed to contain arbitrary data.

The HTTP_REFERER variable, if supplied, contains the URL of the referring document. The HTTP_FROM variable, if supplied, contains the e-mail address of the user operating the browser. CGI authors are often tempted to use the contents of these variables to control access to documents. For example, the programmer might not want a CGI to execute unless the referring document is a particular page that the user should read before running the CGI.

The contents of these variables are easily spoofed. Anyone can telnet directly to the HTTP server port (usually 80, but any port can be used) and issue a request to the server by hand. For example:

% telnet domain.com 80
Trying 1.2.3.4…
Connected to domain.com
GET /cgi-bin/program HTTP/1.0
Referer: http://domain.com/secret-document.html
From: president@whitehouse.gov

Never make any important decisions based on the contents of environment variables whose names start with HTTP_ (these are the ones that were culled directly from the client headers). On the other hand, some environment variables are set directly by the server and can be trusted, at least as far as the server can be.
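One way to keep this rule visible in code is to separate the header-derived variables from the rest as soon as the CGI starts. The following Python sketch shows the idea; the function name is hypothetical. Note that the second group is not automatically trustworthy: variables such as QUERY_STRING, PATH_INFO, and CONTENT_LENGTH also carry data chosen by the client and still need validation.

```python
import os


def split_by_origin(environ):
    """Separate variables copied from client headers (HTTP_*) from all others."""
    from_headers = {}
    not_from_headers = {}
    for name, value in environ.items():
        if name.startswith("HTTP_"):
            # Copied verbatim from a request header; the client controls it.
            from_headers[name] = value
        else:
            # Set by the server, although some of these (QUERY_STRING,
            # PATH_INFO, ...) still carry client-supplied data.
            not_from_headers[name] = value
    return from_headers, not_from_headers


if __name__ == "__main__":
    untrusted, rest = split_by_origin(os.environ)
    # Access decisions should never read from 'untrusted'.
    print(sorted(untrusted))
```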

REMOTE_ADDR should always contain the IP address of the requesting machine. REMOTE_HOST contains the hostname if DNS lookups are turned on in the server, but be aware that DNS can potentially be spoofed. Some servers can be configured to perform a reverse and forward lookup on every transaction, which is slower but safer, especially if DNS information is being used to control WWW access.
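If the server does not perform the reverse and forward check itself, a CGI can approximate it. The sketch below, in Python, looks up the name for REMOTE_ADDR and accepts it only if that name resolves back to the same address. The function name and the injectable resolver parameters are assumptions made for illustration and testing; the defaults are the standard library's socket.gethostbyaddr and socket.gethostbyname_ex, which handle IPv4 only. The check raises the bar but does not make DNS trustworthy.

```python
import socket


def confirmed_hostname(remote_addr,
                       reverse_lookup=socket.gethostbyaddr,
                       forward_lookup=socket.gethostbyname_ex):
    """Return the hostname for remote_addr only if it resolves back to remote_addr."""
    try:
        # Reverse lookup: address -> claimed hostname.
        hostname, _aliases, _addrs = reverse_lookup(remote_addr)
        # Forward lookup: claimed hostname -> list of addresses.
        _name, _aliases, forward_addrs = forward_lookup(hostname)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) mean no confirmed name.
        return None
    # Accept the name only if the forward lookup includes the original address.
    return hostname if remote_addr in forward_addrs else None
```

A CGI would call confirmed_hostname(os.environ["REMOTE_ADDR"]) and treat a result of None as "hostname unknown" rather than falling back to REMOTE_HOST.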

Recall HTTP's basic authentication model: when it is in effect, REMOTE_USER is set to the username under which the client has authenticated. If the server performs identd lookups (RFC 931, http://ds.internic.net/rfc/rfc931.txt), REMOTE_IDENT is set to the user indicated by the client machine. Neither of these methods should be used to authenticate access to important data. HTTP authentication is weak, and identd is unsuitable for security purposes by design (it requires trusting the client machine to provide accurate information).
