Topic: CGI

CGI (Common Gateway Interface) was one of the first ways to generate Web content with a program. As a result it enjoys nearly universal support, so I use it pretty extensively.

Advantages over (for instance) Tcl or ASP programming:

  • Universal support
    Nearly every Web server supports CGI. A great deal of ready-to-use CGI code can be found for free on the Internet. Most ISPs also allow CGI programming (although some will restrict you to scripts that they have approved, for security and system-stability reasons).
  • Choice of languages
    The CGI protocol is extremely general, so programs may be written in nearly any language. Perl is by far the most popular, with the result that many people think CGI means Perl. But C and Python are fine too, and heck, I suppose you could write CGI in Forth or something.
Sounds great, eh? Usually it is. But there are disadvantages:
  • Runs in its own process and establishes its own database connections
    Unlike embedded scripting languages (ASP scripts under IIS or Tcl under AOLserver), each call to a CGI program requires a process to be started, run, and stopped. On busy sites this can add up to a lot of load for the machine. Worse, the process boundary means that a database connection must also be established for each hit, which can cost far more than on a platform that opens database connections in advance and doles them out to embedded scripts.
  • Security risk
    Since a CGI program is an arbitrary program running with the user ID of the Web server, a malicious or simply incompetent CGI programmer can wreak serious havoc on the system. With careful security planning most of this risk can be avoided, but it does mean that you (usually) pay ISPs extra for the ability to run arbitrary scripts. Fortunately, a crash in a CGI program doesn't crash the server (unlike a crash in an Apache module), so at least you get a little win.
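To make the database-connection point above a bit more concrete, here is a minimal sketch that just counts how many connections get opened under each model. Everything in it is an assumption made for illustration: CountingConnector, serve_cgi_style and serve_embedded_style are hypothetical names, and an in-memory SQLite database stands in for a real database server, where opening a connection is far more expensive.

```python
import sqlite3


class CountingConnector:
    """Hypothetical helper: opens connections and counts how many were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self):
        self.opened += 1
        # In-memory SQLite is only a stand-in for a real database server.
        return sqlite3.connect(":memory:")


def serve_cgi_style(connector, hits):
    """Each hit opens and closes its own connection, as a fresh CGI process must."""
    for _ in range(hits):
        conn = connector.connect()
        conn.execute("SELECT 1").fetchone()
        conn.close()


def serve_embedded_style(connector, hits):
    """One connection is opened in advance and reused for every hit."""
    conn = connector.connect()
    for _ in range(hits):
        conn.execute("SELECT 1").fetchone()
    conn.close()
```

With 1000 hits, the CGI-style loop opens 1000 connections and the embedded-style loop opens one. The count is the whole point here; the sketch says nothing about actual timings on any particular server.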
So how does it work? CGI programming is very simple. The key is that whatever your program writes to standard output gets sent back to the client. The server may adjust it a little along the way (adding the status line and some headers of its own), but not much. Pretty much you're just talking to the browser. So the first thing you have to print is the headers of a valid response. If you mess these up, most servers will complain loudly, usually with a 500 Internal Server Error. The simplest header you can write is:
Content-type: text/html
This marks the rest of the output as HTML, and you can usually get away with no more headers than that one. The header block must be followed by a blank line; everything after the blank line is the body.
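As a rough sketch of the above, assuming Python as the language (the function name build_response and the HTML body are made up for illustration), a complete script needs little more than the header, the blank line, and the page:

```python
#!/usr/bin/env python3
import sys


def build_response(body, content_type="text/html"):
    """Return the header, the blank line that ends the header block, and the body."""
    return f"Content-Type: {content_type}\n\n{body}"


def main():
    # Whatever is written to standard output is what the server sends on.
    page = "<html><body><h1>Hello from CGI</h1></body></html>\n"
    sys.stdout.write(build_response(page))


if __name__ == "__main__":
    main()
```

Where the script has to live and how it is marked executable depends on the server's configuration, so check that for your own setup.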

Input to your script comes in two ways. Anything that's in the body of the request (i.e. the content of a POST) arrives on standard input. Everything else (including the URL and its query part) arrives in environment variables. (As you can see from the use of standard input and output and environment variables, the CGI interface was designed in a C-influenced Unix environment.) The variables actually defined vary somewhat from server to server, but these are pretty standard:
SERVER_SOFTWARE    Name and version of the server software you're running under.
SERVER_NAME        Host name of the server.
GATEWAY_INTERFACE  Version of the CGI specification the server implements (e.g. CGI/1.1).
SERVER_PROTOCOL    Name and version of the request protocol (e.g. HTTP/1.1). It does not tell you whether the connection was HTTPS.
SERVER_PORT        The port the request came in on.
REQUEST_METHOD     HTTP method of the request (usually GET or POST).
HTTP_ACCEPT        MIME types the browser will accept as an answer.
PATH_INFO          Extra path information that follows the script's name in the URL, if any.
PATH_TRANSLATED    PATH_INFO mapped to a path on the server's filesystem.
SCRIPT_NAME        The URL path of the program being called.
QUERY_STRING       The query string (everything after the '?' in the URL).
REMOTE_HOST        Host name of the client, if the server looked it up. (Careful: firewalls and proxies really mess this up.)
REMOTE_ADDR        IP address of the client.
REMOTE_USER        User name the client authenticated as; set only if the script is protected by authentication. Don't depend on this being at all useful.
AUTH_TYPE          Authentication method used, if any. Usually Basic.
HTTP_USER_AGENT    Browser identifier string.
CONTENT_TYPE       MIME type of the request body on standard input.
CONTENT_LENGTH     Length of that body, in bytes.
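Here is a hedged sketch of how a script might pull its input together from those two sources, again assuming Python. The function name read_params and the form field "name" are hypothetical, and the sketch only handles URL-encoded form bodies, not file uploads or other content types.

```python
#!/usr/bin/env python3
import os
import sys
from urllib.parse import parse_qs


def read_params(environ, stdin=None):
    """Collect fields from QUERY_STRING and, for a POST, from the request body."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # Only touch standard input when there is a body to read.
        stream = stdin if stdin is not None else sys.stdin.buffer
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = stream.read(length).decode("utf-8")
        content_type = environ.get("CONTENT_TYPE", "")
        if content_type.startswith("application/x-www-form-urlencoded"):
            for key, values in parse_qs(body).items():
                params.setdefault(key, []).extend(values)
    return params


def main():
    params = read_params(os.environ)
    name = params.get("name", ["world"])[0]  # "name" is a made-up field
    # Plain text, so the echoed value is not interpreted as HTML.
    sys.stdout.write("Content-Type: text/plain\n\n")
    sys.stdout.write(f"Hello, {name}\n")


if __name__ == "__main__":
    main()
```

Reading exactly CONTENT_LENGTH bytes matters: the server is not required to close standard input after the body, so reading until end-of-file can hang.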
That's all I feel like writing at the moment. I mostly wrote this because I got more and more irritated at always having to look up those environment variables, so there they are. A complete, worked-out CGI program would be a useful addition, but I don't have time right now.






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.