Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 5 (August 1996)

In the previous column, I showed how to use the ``flock'' operation to ensure that only one CGI script was touching a particular file for writing at a time. However, in that script, there was essentially no interaction with user-entered data. This time, I'm going to look at a CGI-generated form that connects to a shared tiny database.

The particular CGI script discussed here handles a common problem: allowing users to ``register'' themselves to access a controlled directory. Here, a URL hierarchy cannot be accessed until the user provides a valid email name that will be recorded for statistical purposes (or perhaps to generate a mailing list, ugh). The registration is validated by having the script generate a random password which is sent to the selected email address -- if the user gave a bad email address, they will never get the email, so the random password will never be known, thus blocking access from that user.

A user is allowed to select their own ``login'' name. A more fascist script might auto-generate both the login name and the random password.

Once the password has been received, the user may immediately return to the protected URL, because the ``htpasswd'' database has already been updated with this information. No work is required on the part of the webmaster or pagemaster (nice).

While I was finalizing the work on this month's column, Lincoln Stein sent out an email message with a brand-new alpha release of CGI.pm (version 2.20a). This new release supports something similar to the ``HTML::AsSubs'' from the full-scale LWP module. I immediately rushed off to use it, because it makes the embedded HTML code sooooo much easier to read and type. However, as it was an alpha release, the version you end up using will almost certainly be different, and there may be slight interface changes. Please keep this in mind if something doesn't work.

As always, the latest CGI.pm can be found in the nearest CPAN (using Tom Christiansen's handy ``nearest CPAN'' locator) in the directory of:

        http://www.perl.com/CPAN/id/LDS/

The program (which I call ``subscribe'') can be found in Listing 1 [below].

Lines 1 through 3 start nearly every program I write. Line 1 selects the Perl interpreter for this file, with the ``taint'' (-T) flag turned on to help keep me from shooting myself in the foot by using external user-supplied data directly in a semi-dangerous operation. Line 2 turns on some compile-time restrictions to help me be a sane programmer, and line 3 causes STDOUT to become unbuffered, so I can ensure proper sequencing of ``print'' output intermingled with child-process output.
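
As a rough illustration of what taint mode asks of a script (this is not part of Listing 1), the sketch below untaints a value the usual way: by capturing it through a regular expression that admits only what is expected. The helper name ``untaint_username'' is hypothetical. The sketch anchors the pattern with \z rather than $, because $ also matches just before a trailing newline; that is a stricter choice than the listing makes.

```perl
use strict;
use warnings;

# Hypothetical helper: return a clean copy of a lowercase-alphabetic
# username, or undef if the input does not fit the pattern.
sub untaint_username {
    my ($raw) = @_;
    return undef unless defined $raw;
    # Under -T, text captured by a regex group is treated as untainted.
    my ($clean) = $raw =~ /\A([a-z]+)\z/;
    return $clean;    # undef when the pattern did not match
}
```

A caller would test the result with ``defined'' before using it in anything that touches the filesystem or a child process.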

Lines 4 and 5 pull in that ``new CGI.pm'' module I described above. I stuck it into a directory below my home directory, so the ``use lib'' compile-time directive in line 4 causes that directory to be searched. Line 5 pulls in the new CGI module, passing it an import parameter of ``standard''. This causes the CGI module to create a number of subroutines directly in the current package (package main), including subroutines to generate HTML. (Remember, this was an alpha release of the new CGI module, and already one of the comments was that this name should be ``:standard'' instead of ``standard''. Only time will tell if this change was made.)

Line 5 also creates an implicit ``CGI'' object, causing the STDIN, environment, and command-line args to be parsed for further access. You can refer to this object explicitly with $CGI::CGI, but that seems redundant now.

Lines 7 through 9 define a few configuration constants: the UNIX path to the protected directory, the URL path to the same directory, and the name of the ``.htpasswd'' file within that directory. (Actually, this skeleton version of ``subscribe'' doesn't use the URL, but later versions would have, I bet.)

Line 10 defines a handy $N newline constant, used frequently later.

Lines 12 through 14 print the first (common) part of the HTML output page to STDOUT. ``header'' is actually a call to &CGI::header, which is in turn an implicit invocation of $CGI::CGI->header, causing the proper CGI/HTTP header fields to be sent back to the server. Similarly, ``start_html'' also comes from the CGI package, printing the proper ``title'' directive and nearby friends. The $N values create newlines in the output -- not necessary for HTML interpretation, but somewhat easier to read while debugging the output.

Lines 16 through 29 represent the original form input, when we're called for the first time. If this is the case, then the param routine (actually, &CGI::param) returns a false value in a scalar context. Later, when the form is submitted, it will reinvoke this script with some parameters, and thus param will return a true value instead of a false value, causing this portion to be skipped.

Lines 17 through 27 print the body of the HTML form. The functions ``h1'', ``hr'', ``submit'', and ``p'' generate same-named HTML constructs. The functions ``start_form'', ``textfield'', ``url'', ``end_form'' and ``end_html'' have additional intelligence to access CGI parameters, and generate more complicated HTML constructs. But as you can see, the HTML form is very clean and easy to write this way. Once you've written enough HTML with these functions, typing the raw tags starts to seem like a lot of extra work (why type ``<P>'' when you can type ``p'', for example). Again, the $N values are just there to make the output line-oriented for humans reading the HTML -- they serve no real function otherwise.

Line 28 exits the program if we were just printing out the initial form.

Lines 32 through 72 form an ``eval'' block that I can use to trap errors -- both ``expected'' errors that I know will happen, and errors that I'm not sure about. Within this block, any fatal error or ``die'' operator will cause an immediate jump out of the block, setting $@ to the error text. I test this later outside the block (described below).
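
The error-trapping pattern can be boiled down to a few lines. The following is a minimal sketch, not the code from Listing 1: the helper ``classify_failure'' is a made-up name, and it returns a label instead of printing HTML, purely to show how ``eval'', ``die'', and $@ fit together with the ``BACK:'' prefix convention.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference inside eval and report whether
# it succeeded, failed with an expected "BACK:" error, or failed otherwise.
sub classify_failure {
    my ($code) = @_;
    # The trailing 1 makes eval return true only if no die happened.
    my $ok = eval { $code->(); 1 };
    return ('ok', '') if $ok;
    my $err = $@;
    # Expected errors carry the prefix; keep only the rest of that line.
    return ('back', $1) if $err =~ /\ABACK: (.*)/;
    return ('internal', $err);
}

my ($kind, $text) =
    classify_failure(sub { die "BACK: Username must be lowercase alphabetic!\n" });
print "$kind: $text\n";
```

Checking eval's return value rather than $@ alone is a common defensive habit, since $@ can be altered between the failure and the test.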

Lines 33 through 43 capture and validate the three input fields. The names given in the ``param'' invocations here must match up with the names in the form above. For convenience, I cache these into three local variables with roughly the same names. In a serious script, I'd probably perform a little more validation than a simple regular expression, but hopefully, you'll see the point here.
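
For a sense of what ``a little more validation'' might look like for the email field, here is one possible sketch. The helper name ``plausible_email'' and the exact rules are assumptions chosen for illustration; this is a loose sanity check, not a full address parser, and it cannot tell whether the mailbox really exists (the emailed password does that job).

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the string looks roughly like
# local@domain.tld and contains nothing that could smuggle in extra
# mail headers, 0 otherwise.
sub plausible_email {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return 0 if $addr =~ /[\r\n]/;       # embedded line breaks are never OK
    return 0 if length($addr) > 254;     # arbitrary upper bound for this sketch
    return $addr =~ /\A[^\s\@]+\@[^\s\@]+\.[^\s\@]+\z/ ? 1 : 0;
}
```

Rejecting line breaks matters because the address is later handed to a mailer, where a stray newline could start a new header.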

Note that the ``die'' messages here all begin with ``BACK:''. I'm using this string prefix as a special value in the abnormal-exit catcher outside the eval block. More on that later.

Lines 46 through 48 open the ``.htpasswd'' database, and grab it exclusively. Only one ``subscribe'' script (or other cooperating script) is allowed to have an exclusive lock on this file. (This was described in fairly heavy detail last month, so I won't repeat all the gory explanation.)
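
The open-and-lock step can be sketched in isolation as follows. This is an illustrative variant rather than the listing's code: it assumes a Perl recent enough to offer three-argument ``open'' and lexical filehandles, it uses the symbolic constants from the Fcntl module instead of the literal 2, and the helper name ``open_locked'' is hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Hypothetical helper: open a file for reading and appending (creating it
# if necessary), wait for an exclusive lock, and rewind to the start.
sub open_locked {
    my ($path) = @_;
    open my $fh, '+>>', $path
        or die "Cannot attach to $path: $!";
    flock $fh, LOCK_EX
        or die "Cannot lock $path: $!";
    seek $fh, 0, SEEK_SET
        or die "Cannot rewind $path: $!";
    return $fh;    # closing the handle releases the lock
}
```

Because the file is opened in append mode, anything printed to the handle lands at the end of the file regardless of where reads have left the position.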

Lines 52 through 57 examine the existing ``.htpasswd'' database to see if the requested username has already been taken. If so, the ``die'' operator bails out of the loop (and out of the enclosing eval block). If not, we make it all the way through, and fall through to the next chunk of Perl.
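
The duplicate check amounts to scanning ``user:hash'' lines for a matching first field. A small sketch of that scan, pulled out into a hypothetical ``username_taken'' routine that returns a true/false answer instead of calling ``die'' directly, might look like this:

```perl
use strict;
use warnings;

# Hypothetical helper: scan an already-open password file from the top and
# return 1 if $wanted appears as the first colon-separated field of any line.
sub username_taken {
    my ($fh, $wanted) = @_;
    seek $fh, 0, 0 or die "Cannot rewind: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($user) = split /:/, $line, 2;
        return 1 if defined $user && $user eq $wanted;
    }
    return 0;
}
```

The caller would still need to hold the exclusive lock for the whole check-then-append sequence; otherwise two requests for the same name could both pass the check.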

Lines 59 to 65 add the new username to the password database, along with an encrypted randomly chosen password. First, line 59 seeks to the end of the file (we'll probably already be there, but I like double-checking). Next, line 60 selects a random password by calling a subroutine (defined later).

Lines 61 and 62 write the username and encrypted password to the database. Note that a ``salt'' of ``aa'' is used every time -- in a secure system, I'd make this a random salt by selecting any two characters from the password character set.
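
A random salt along those lines could be produced with something like the sketch below. The name ``random_salt'' is hypothetical, the character set is the one traditional crypt accepts for salts, and Perl's built-in ``rand'' is used only for simplicity; it is not a cryptographically strong source.

```perl
use strict;
use warnings;

# Hypothetical helper: pick two characters at random from the set
# traditionally allowed in a crypt() salt: . / 0-9 A-Z a-z
sub random_salt {
    my @salt_chars = ('.', '/', 0 .. 9, 'A' .. 'Z', 'a' .. 'z');
    return join '', map { $salt_chars[ int rand @salt_chars ] } 1 .. 2;
}

# Possible use in place of the fixed "aa":
#   my $hashed = crypt($password, random_salt());
```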

Line 63 sends email to the user, giving them their new (hopefully temporary) password. Line 64 similarly records the username, the real name, and the designated email address into some database for record-keeping.

Line 65 closes the file, thus freeing up the ``.htpasswd'' file for possible alteration by another invocation of ``subscribe''.

Lines 67 through 70 display an acknowledgement message because everything finally worked. Line 71 exits the program in a normal way.

Lines 73 through 92 handle all the abnormal exits from the eval block. If the error message begins with ``BACK:'', then it's one of our early-exit ``expected'' errors. The remainder of the error-message line is a message of some kind that needs to be displayed to the user, along with a ``please go back and try again'' warning. This is handled in lines 75 through 77.

If the error message did not begin with ``BACK:'', it's an unexpected error. Because the error might contain HTML-significant characters (ampersand, less-than, or greater-than), these need to be encoded in such a way that they'll come out properly on the displayed page. Lines 80-82 handle that encoding. The print at lines 83 through 86 displays the properly encoded message inside a ``pre'' ``code'' block, so it'll look pretty close to the original text.
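
The same three substitutions can be wrapped in a reusable routine. The sketch below uses a hypothetical name, ``escape_html'', and works on a copy of its argument rather than on $_:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with the three
# HTML-significant characters replaced by their entities.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-encode
                             # the ampersands introduced below
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print escape_html('a < b && c > d'), "\n";
# prints: a &lt; b &amp;&amp; c &gt; d
```

The order of the substitutions is the one detail worth remembering: handling the ampersand last would turn ``&lt;'' into ``&amp;lt;''.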

Whether it was an internal error or an expected error, lines 88 through 90 display a standard ``go back'' message. It is presumed that the user will understand how to go back to the previous page. If they don't, they probably don't need access to my protected directory.

Lines 94 through 104 define the three routines implementing some of the functions called from above. In this demonstration program, the selected random password is always ``password'' (very very very unwise in anything but a demo program like this). The user information is not actually recorded (although the parameters are at least named), and the selected password is not emailed (oh well, it's ``password'' every time anyway). Obviously, to be useful, I'd replace these with real functions.
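
As one possible starting point for a real ``random_password'', the sketch below builds a password from a fixed alphabet. The default length of eight and the choice of characters (letters plus the digits 2 through 9) are assumptions made for this example; eight fits traditional crypt, which only looks at the first eight characters. Perl's ``rand'' is not a cryptographically strong generator, so treat this as a demonstration only.

```perl
use strict;
use warnings;

# Sketch of a replacement for the stub: return a random password of
# $length characters (8 if no length is given).
sub random_password {
    my ($length) = @_;
    $length = 8 unless defined $length;
    my @chars = ('a' .. 'z', 'A' .. 'Z', 2 .. 9);
    return join '', map { $chars[ int rand @chars ] } 1 .. $length;
}
```

Mailing the result and recording the user would still have to be filled in separately.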

And that does it for the program. I'd stick it somewhere in some CGI-BIN area, and then invoke it with no parameters to get the original form.

I'd also have to create an ``.htaccess'' file for the ``protected'' directory, which would look something like this:

        AuthName Protected Directory
        AuthType Basic
        AuthUserFile /home/merlyn/public_html/protected/.htpasswd
        <limit GET POST>
        require valid-user
        </limit>

I hope you enjoyed this demonstration of using a flocked (tiny) database with a CGI form. If you'd like to see a specific Perl-and-web-related topic covered in a future column, please feel free to email me...

Listing 1

        =1=     #!/usr/bin/perl -T
        =2=     use strict;
        =3=     $|++;
        =4=     use lib "/home/merlyn/CGIA";
        =5=     use CGI qw(standard);
        =6=     
        =7=     my $target_dir = "/home/merlyn/public_html/protected";
        =8=     my $target_url = "http://www.teleport.com/~merlyn/protected";
        =9=     my $target_htpasswd = "$target_dir/.htpasswd";
        =10=    my $N = "\n";                   # two chars instead of 4 :-)
        =11=    
        =12=    print
        =13=      header, $N,
        =14=      start_html('subscribe to protected', 'merlyn@stonehenge.com'), $N;
        =15=    
        =16=    unless (param) {                # generate initial form
        =17=      print +
        =18=        h1 ('Subscribe to "protected"'), $N,
        =19=        hr, $N,
        =20=        start_form('POST',url), $N,
        =21=        p, 'Your desired username: ', textfield('username','',20), $N,
        =22=        p, 'Your e-mail address: ', textfield('email','',60), $N,
        =23=        p, 'Your real name: ', textfield('real','',60), $N,
        =24=        p, submit, $N,
        =25=        end_form, $N,
        =26=        hr, $N,
        =27=        end_html, $N;
        =28=      exit 0;
        =29=    }
        =30=    
        =31=    ## main toplevel:
        =32=    eval {
        =33=      my $field_username = param('username');
        =34=      die "BACK: Username must be lowercase alphabetic!\n"
        =35=        unless $field_username =~ /^[a-z]+$/;
        =36=    
        =37=      my $field_email = param('email');
        =38=      die "BACK: Your email address must be non-empty!\n"
        =39=        unless $field_email =~ /\S/;
        =40=    
        =41=      my $field_real = param('real');
        =42=      die "BACK: Your real name must be non-empty!\n"
        =43=        unless $field_real =~ /\S/;
        =44=    
        =45=      ## fields are validated, so now let's try to add...
        =46=      open PW, "+>>$target_htpasswd" or
        =47=        die "Cannot attach to $target_htpasswd: $!";
        =48=      flock PW, 2;                  # wait for exclusive lock
        =49=      ## begin critical region (only one proc at a time gets past here)
        =50=    
        =51=      ## first, ensure that we don't have a duplicate username
        =52=      seek PW, 0, 0;                # beginning of file
        =53=      while (<PW>) {
        =54=        my ($user) = split ":";
        =55=        die "BACK: sorry, that username is already taken\n"
        =56=          if $user eq $field_username;
        =57=      }
        =58=      ## good name, so add it
        =59=      seek PW, 0, 2;                # end of file
        =60=      my $password = &random_password;
        =61=      print PW
        =62=        join (":", $field_username, crypt($password,"aa")), "\n";
        =63=      &send_password($field_email, $field_username, $password);
        =64=      &record_user($field_email, $field_username, $field_real);
        =65=      close PW;
        =66=      ## end critical region
        =67=      print +
        =68=        h1("You've been added!"), $N,
        =69=        p, "You've been added! Your password is arriving in email!", $N,
        =70=        end_html;
        =71=      exit 0;
        =72=    };
        =73=    if ($@) {                       # somebody died
        =74=      if ($@ =~ /^BACK: (.*)/) {    # one of our BACK errors?
        =75=        print +
        =76=          h1('Form entry error'), $N,
        =77=          p, $1, $N;
        =78=      } else {                      # nope, an internal error
        =79=        $_ = $@;
        =80=        s/&/&amp;/g;
        =81=        s/</&lt;/g;
        =82=        s/>/&gt;/g;
        =83=        print +
        =84=          h1('Form entry INTERNAL error'), $N,
        =85=          p, 'The error message was ', $N,
        =86=          code(pre($_)), $N;
        =87=      }
        =88=      print
        =89=        p, 'Go back and try again!', $N,
        =90=        end_html, $N;
        =91=      exit 0;
        =92=    }
        =93=    
        =94=    sub random_password {
        =95=      "password";
        =96=    }
        =97=    
        =98=    sub send_password {
        =99=      my ($email, $user, $pass) = @_;
        =100=   }
        =101=   
        =102=   sub record_user {
        =103=     my ($email, $user, $real) = @_;
        =104=   }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.