The CGI specification defines only how data is passed between the server and the CGI program. The basic model of a CGI is shown in Figure 14.1.
Figure 14.1 Data passing between browser, server, and CGI.
A CGI is designated nph (non-parsed headers) if its program name begins with nph-. The server then passes the program's output directly to the browser without parsing it or adding headers. This is necessary if the program needs to set its own HTTP response code or to ensure that the server does not buffer its output.
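Because the server does not parse an nph program's output, the program must print the HTTP status line and all headers itself. The following Perl is a minimal sketch of this, assuming the script is installed under a hypothetical name such as nph-hello.cgi. The helper name nph_header and the HTTP/1.0 fallback are also assumptions. SERVER_PROTOCOL is the standard CGI variable that carries the request's protocol version.

```perl
#!/usr/bin/perl
# nph-hello.cgi (hypothetical name): the nph- prefix tells the server
# to pass this script's output straight to the browser.
use strict;
use warnings;

# Build the full header block. An nph script must supply the status
# line that a normal CGI would leave to the server.
sub nph_header {
    my ($status, $content_type) = @_;
    # Echo the request's protocol version; fall back to HTTP/1.0 (assumption).
    my $protocol = $ENV{SERVER_PROTOCOL} || 'HTTP/1.0';
    return "$protocol $status\r\n"
         . "Content-Type: $content_type\r\n"
         . "\r\n";
}

$| = 1;    # turn off Perl's own STDOUT buffering as well
print nph_header('200 OK', 'text/plain');
print "Hello from an nph script\n";
```

A normal (parsed-header) CGI would print only the Content-Type line and the blank line. It would then let the server generate the status line.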
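A CGI program can receive information in three ways:
- through environment variables, such as QUERY_STRING and CONTENT_LENGTH;
- through command-line arguments, which are used for ISINDEX searches;
- through standard input, which carries the body of a POST request.

A cracker attempting to subvert security can potentially abuse any of these.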
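Gathering the three channels in one place lets every later check see the same data. The following Perl is a minimal sketch, not a definitive implementation. The function name collect_cgi_input and its hash keys are hypothetical. The function takes the environment, argument list and input handle as parameters, so it can be exercised without a server.

```perl
use strict;
use warnings;

# Gather the three CGI input channels. The environment, argument list
# and input handle are passed in rather than read from globals.
sub collect_cgi_input {
    my ($env, $argv, $in_fh) = @_;
    my %in;

    # 1. Environment variables set by the server, many of them derived
    #    from the client's request.
    $in{method}       = $env->{REQUEST_METHOD} // '';
    $in{query_string} = $env->{QUERY_STRING}   // '';

    # 2. Command-line arguments: supplied only for ISINDEX-style
    #    searches, i.e. a query string with no unencoded '='.
    $in{args} = [@$argv];

    # 3. Standard input: the request body, present for POST.
    #    A single read() may return fewer bytes than asked for; a
    #    production script should loop until CONTENT_LENGTH is reached.
    $in{body} = '';
    if ($in{method} eq 'POST') {
        my $len = $env->{CONTENT_LENGTH} // '';
        if ($len =~ /\A\d+\z/) {
            my $buf = '';
            read($in_fh, $buf, $len);
            $in{body} = $buf;
        }
    }
    return \%in;
}
```

A deployed script would call it as collect_cgi_input(\%ENV, \@ARGV, \*STDIN). Every field it returns is still untrusted and must be validated before use.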
When data arrives on standard input, as with a POST request, the server is not guaranteed, for historical reasons, to send EOF after the last byte. The number of bytes available for reading is stored in the CONTENT_LENGTH environment variable, and a CGI program must read no more than this many bytes. This is a potential security issue because some servers do send EOF at the end of the data. An incorrectly written CGI might therefore work as expected when first tested. When moved to another server, however, its behavior might change in an exploitable way.
In the following example, the behavior is undefined because the code reads until it receives EOF.

    if ($ENV{REQUEST_METHOD} eq 'POST') {
        while (<STDIN>) {     # Wrong, may never terminate
            [...]
        }                     # or read bogus data
    }
The second example correctly reads only CONTENT_LENGTH bytes.
    if ($ENV{REQUEST_METHOD} eq 'POST') {
        read(STDIN, $input, $ENV{CONTENT_LENGTH});    # Right
    }
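Even a read() call with the correct length can return fewer bytes than requested. A hostile client can also supply a CONTENT_LENGTH that is non-numeric or enormous. A more defensive version validates the length, caps it and loops until all the promised bytes have arrived. The sketch below assumes that a 64 KB limit suits the application; MAX_BODY and read_post_body are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical upper bound on accepted request bodies; tune per application.
use constant MAX_BODY => 64 * 1024;

# Read exactly $content_length bytes from $fh. Loop because read() may
# return fewer bytes than requested. Return undef on a malformed or
# oversized length, or if EOF/an error arrives before all bytes are read.
sub read_post_body {
    my ($fh, $content_length) = @_;
    return unless defined $content_length && $content_length =~ /\A\d+\z/;
    return if $content_length > MAX_BODY;

    my $body = '';
    while (length($body) < $content_length) {
        # Append at the current end of $body (the OFFSET argument).
        my $got = read($fh, $body, $content_length - length($body), length($body));
        return unless $got;    # 0 means EOF, undef means error
    }
    return $body;
}
```

A caller would typically invoke read_post_body(\*STDIN, $ENV{CONTENT_LENGTH}) only when REQUEST_METHOD is POST. It would reject the request whenever the function returns undef.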
The data passed to a CGI is a series of key/value pairs representing the form's contents. It is encoded with a simple scheme in which every unsafe character is replaced by its percent-encoding: the % character followed by the two-digit hexadecimal value of the character. For example, the ~ character is replaced by %7E. For historical reasons, the space character is usually not percent-encoded but is instead replaced by the + character.
Note: A complete list of unsafe characters is available in RFC 1738, Uniform Resource Locators (URL), http://ds.internic.net/rfc/rfc1738.txt.
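Decoding reverses the two rules described above. In the following minimal Perl sketch, form_decode is a hypothetical helper name. The + characters are translated first, so an encoded plus sign (%2B) survives as a literal +. Escapes that are not followed by two hexadecimal digits are left untouched rather than guessed at.

```perl
use strict;
use warnings;

# Decode one form-encoded token.
sub form_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                               # '+' stands for a space
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> byte 0xXX
    return $s;                                   # bad escapes such as "%G1" stay as-is
}

print form_decode('%7Euser+name'), "\n";    # prints "~user name"
```

The decoded value can contain any byte at all, including newlines and NUL. It must therefore be validated after decoding.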
Despite what the word unsafe seems to imply, these characters are not encoded for security reasons. They are encoded because gateways may alter them accidentally or because they have special meanings in URLs. The encoding is performed by the client, so there is no guarantee that unsafe characters have actually been encoded according to the specification. A CGI must not assume that any encoding has been performed.
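A client is free to send raw, unencoded characters. A CGI can therefore verify that the input really is in the expected form before decoding it, and reject the request otherwise. The following Perl sketch uses the hypothetical name looks_form_encoded. It accepts only letters, digits, a few punctuation characters and the + = & % delimiters, and it requires every % to begin a two-digit hexadecimal escape. The exact character set is a conservative assumption for illustration, not a transcription of RFC 1738.

```perl
use strict;
use warnings;

# True only if $raw looks like well-formed form-encoded data.
sub looks_form_encoded {
    my ($raw) = @_;
    # Every byte must come from a conservative allow-list (assumption).
    return 0 unless $raw =~ /\A[A-Za-z0-9\-_.!*'()+=&%]*\z/;
    # Every % must start a two-digit hexadecimal escape.
    return 0 if $raw =~ /%(?![0-9A-Fa-f]{2})/;
    return 1;
}
```

Such a check complements validation of each decoded value but does not replace it.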
Before submitting the form, the browser joins each key to its value with the = character. It then concatenates the resulting pairs, separated by the & character. Again, although this is the expected behavior, data of any kind can potentially be submitted.
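The data must be split before it is decoded, so that an encoded & or = inside a value (%26 or %3D) cannot inject extra pairs. The sketch below uses the hypothetical helper names parse_form and form_decode. It keeps repeated keys, such as several checked boxes, as lists instead of silently keeping only the last one. It also tolerates malformed pairs instead of assuming the browser behaved correctly.

```perl
use strict;
use warnings;

# Decode one token: '+' to space, then %XX escapes.
sub form_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "k1=v1&k2=v2" into a hash of array refs. The raw string is split
# before decoding, pairs without '=' get an empty value, and empty
# pairs (e.g. from "&&") are skipped.
sub parse_form {
    my ($raw) = @_;
    my %form;
    for my $pair (split /&/, $raw) {
        next if $pair eq '';
        my ($k, $v) = split /=/, $pair, 2;
        $v = '' unless defined $v;
        push @{ $form{ form_decode($k) } }, form_decode($v);
    }
    return \%form;
}
```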
This data format is awkward for the CGI programmer to access directly, but several libraries have already done the difficult work. Some of the features available in these libraries are as follows:
Re-inventing these facilities can easily introduce avoidable security problems. Learning to use one or more is a wise time investment. They can be found at the following addresses:
There are several possible points of attack when attempting to compromise a CGI program. The HTTP server and protocol should not be trusted blindly, but environment variables and CGI input data are the most likely avenues of attack. Each of these should be considered before writing or installing a new CGI.
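Two common defenses follow from this. First, reset the environment variables that affect any programs the CGI starts. The list deleted below is the one suggested in Perl's perlsec documentation; the PATH value is an assumption to adapt. Second, validate each input field against an allow-list of what is expected, rather than trying to strip out known-bad characters. The helper names sanitize_env and safe_filename, and the 64-character limit, are hypothetical.

```perl
use strict;
use warnings;

# Make %ENV safer before running any external program.
sub sanitize_env {
    $ENV{PATH} = '/bin:/usr/bin';              # assumed trusted directories
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};  # as suggested in perlsec
}

# Accept a filename only if it matches an allow-list: 1 to 64 characters
# of letters, digits, '_', '.' and '-', starting with a letter, digit or
# underscore (so it cannot be ".." or look like a command-line option).
# Returning the captured group also untaints the value under perl -T.
sub safe_filename {
    my ($name) = @_;
    return unless defined $name;
    return unless $name =~ /\A([A-Za-z0-9_][A-Za-z0-9_.\-]{0,63})\z/;
    return $1;
}
```

Running the script under perl -T (taint mode) makes Perl refuse to pass unvalidated input to the shell or the file system. Extracting the value through a regular-expression capture, as safe_filename does, is the standard way to untaint it.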