Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in WebTechniques magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Web Techniques Column 8 (November 1996)

I have mixed feelings about writing this column. I've been a strong advocate of browser-independent, standard HTML as the universal medium of the Web. However, as a friend of mine (Devin Ben-Hur, a web designer) points out, not all browsers are currently up to even the current standards, and this can be stifling from a communications point of view, or worse yet, confusing from a reader's perspective.

Devin suggested that I write a column about how to handle HTML that adapts itself to the browser dynamically. I said that it sounded like a neat idea, so here it is. So, thanks to Devin (email: <dbenhur@emarket.com>) for this month's idea.

I decided to tackle the table problem. Most modern browsers handle tables just fine, but two frequently used browsers (Lynx, and the w3-mode of GNU Emacs) do not. So, if data is to be presented in a universal fashion, it has to be HTML-table-encoded for nearly all of the world, but generated as a pseudo-table (using <PRE> format) for the few browsers that can't handle tables.

One possible solution would be to have a filter that would read an HTML-encoded table, reverse-engineer it into its rows and columns, and then perform some layout function to generate equivalent <PRE> text. This can get pretty messy, and would certainly have had me writing a fairly hefty chunk of code. I'm lazier than that, so I took a different approach.

Namely, I decided to generate either an HTML table or a <PRE> text area from a common, simple format -- a plain text file with tab-separated columns and newline-delimited rows. This unambiguous format can be parsed rapidly, and translates nicely into both <PRE> text and full-blown <TABLE> text.
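To make that concrete, here's a minimal sketch of the parsing half, using made-up data (in the real program, the lines come from a file):

```perl
use strict;

# Made-up rows for illustration: one row per line, tab-separated columns.
my @lines = ("Name\tState\tZip", "Fred\tOR\t97219");

my @table;
for (@lines) {
    push @table, [ split /\t/ ];    # one anonymous list per row
}
print scalar(@table), " rows, ", scalar(@{$table[0]}), " columns\n";
# prints "2 rows, 3 columns"
```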

The resulting code is contained within Listing 1 [below].

Lines 1 and 2 begin nearly every program I write for CGI scripts (and anything else that's longer than 10 lines), enabling ``taint checks'', warnings, and compile-time restrictions.

Lines 7 and 8 disable output buffering and set a particular path, respectively. You'll probably want to adapt line 8 to whatever your particular system requires.

Line 11 tells the server-side include processing that we're gonna be generating real HTML, although this is mostly ignored, because what matters to the browser is what the original including document defines as a MIME type.

Lines 14 through 76 define an eval block, used for trapping errors. If we die for any reason within this block, something reasonable is generated, instead of just throwing the die message into the server-log and then returning an error 500. More on that later.

Lines 17, 18, and 19 grab the URI, extra stuff, and user agent fields, respectively, from the process environment variables. We need the URI to find the correlated data table. We need the extra stuff to decide the filename of the data. And finally, the user agent will determine if we output an HTML table or a flat text pseudo-table.

Line 22 aborts if we aren't called as an SSI, because DOCUMENT_URI (and hence $uri) will be empty.

Line 25 gets the directory and filename out of the URI (the document that included us). This is needed because the data table is required to be in the same directory as the original document.

Lines 28 through 32 translate this directory part into a UNIX path. Now, this is necessarily system dependent, so I'm illustrating the code for my ISP, Teleport. You'll most certainly have to figure this out for yourself. At Teleport, the URL /~merlyn/fred.html is located at UNIX path /home/merlyn/public_html/fred.html, so that's what we've got.
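In isolation, that translation looks like this (the username and directory here are hypothetical):

```perl
use strict;

# The same substitution as lines 28-32, on a made-up URI directory.
my $dir = "/~merlyn/columns/";
if ($dir =~ m,^/~(\w+)/(.*)$,s) {
    $dir = "/home/$1/public_html/$2";    # Teleport-specific layout
} else {
    die "cannot translate dir";
}
print "$dir\n";    # prints "/home/merlyn/public_html/columns/"
```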

Line 35 attempts to go to this computed directory, failing the entire process if this also fails.

Lines 38 and 39 compute a filename within the directory that contains the data to be formatted into either a table or a flat text layout. The regular expression grabs the first word. This word has ``.table'' attached to it, for security reasons. (Without it, it might be possible to grab arbitrary files through guest books or other things, and that seems a little too powerful. At least this way, only things that end in ``.table'' are vulnerable.)
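Here's a quick sketch of what that extraction does to a few hypothetical PATH_INFO values:

```perl
use strict;

# The same regex as lines 38-39, applied to made-up inputs.
my @out;
for my $path ("/fred", "/fred.html", "/../../etc/passwd") {
    my ($filename) = $path =~ m!^/?([-\w.]+)!;    # grab the first word
    $filename .= ".table";                        # can reach only *.table
    push @out, $filename;
    print "$path => $filename\n";
}
```

Note how the hostile third path collapses to a harmless name: the regular expression stops matching at the first slash, so the worst an attacker can reach is a ``.table'' file in the current directory.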

Lines 40 and 41 verify this computed filename, and open it if it exists.

Line 42 defines a temporary array called @max_in_col, which will be used to hold the maximum column width seen so far. We'll need this value if we are constructing a pseudo-table, but not if we're letting the browser do all the layout stuff.

Line 43 defines @table, which holds the table data itself. This will be a list of references to lists.

Line 44 decides whether we are going to output just flat text (a pseudo-table) or a full HTML table, by looking at the user agent (in $agent). Now, in my limited testing, I discovered that neither Emacs-W3 nor Lynx handled tables (yet). So, if the user agent string matches either of these, I'm gonna use a flat text operation. If you discover other browsers that are table-challenged, you need only extend this regular expression. The variable $text is checked in a few places later in the program.

Lines 45 through 59 acquire the table data from the file. Each line is read into the $_ variable, and then split by tabs into @row in line 47.

If the output format is flat text, then lines 49 through 57 examine each column in the row to see how wide it is. We'll need that info to determine the width of the maximum element in each column.

Lines 51 through 56 are executed once for each column. The data is stored into $tmp in line 52, so that we can strip the HTML markup in line 53. This way, the width of the string is gonna be just the text without the HTML. This won't be completely correct, but it's a better first cut than counting all the HTML tags.

Line 54 converts the string in $tmp into its length, and line 55 saves that length into the @max_in_col array if the new length is wider than what's been seen so far for that particular column.
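Here's that bookkeeping by itself, fed two made-up rows that contain markup:

```perl
use strict;

# Same logic as lines 51-55, with hypothetical two-column rows.
my @max_in_col;
for my $row (["<b>Name</b>", "Fred"], ["State", "<i>OR</i>"]) {
    for (0 .. $#$row) {
        my $tmp = $row->[$_];
        $tmp =~ s/<.*?>//g;    # don't count HTML markup
        $tmp = length $tmp;
        $max_in_col[$_] = $tmp
            if ($max_in_col[$_] || 0) < $tmp;    # || 0: first time through
    }
}
print "@max_in_col\n";    # prints "5 4": "State" is 5 wide, "Fred" is 4
```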

Whether or not I'm building a text-table, the data itself gets shoved in as an anonymous list into the @table array. Note that it would be tempting to use:

        push @table, \@row;

here. Because @row is declared with my inside the loop, that would actually work in this particular program (each iteration gets a fresh array); but had @row been declared outside the loop, every slot of the table would hold the same listref. Creating a brand new anonymous list with [@row], copying the data once, sidesteps the question entirely.
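The pitfall is easiest to see with an array that is reused across iterations; this sketch (with made-up data) shows both behaviors side by side:

```perl
use strict;

# One array reused across iterations: \@row aliases it every time,
# while [@row] snapshots its contents at that moment.
my (@row, @wrong, @right);
for my $n (1 .. 3) {
    @row = ($n);
    push @wrong, \@row;     # same listref in every slot
    push @right, [@row];    # fresh anonymous copy each time
}
print "wrong: $wrong[0][0] $wrong[1][0] $wrong[2][0]\n";    # wrong: 3 3 3
print "right: $right[0][0] $right[1][0] $right[2][0]\n";    # right: 1 2 3
```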

Lines 60 through 75 dump the collected data as either an HTML table or a text pseudo-table, selected by the value of $text.

Lines 61 through 66 handle the text pseudo-table case. Line 61 creates a printf format string from all of the column widths. This is pretty complex, so let me break it down from right to left.

First, there's a map operator, which takes each element in @max_in_col and turns it into ``%-Ns'', where ``N'' is the width. Then, those elements are list-concatenated with an empty element on the left, and a newline on the right. Then, the resulting list is glued together into a single string by putting `` | '' between the elements (but not before or after).

Wow. The result will look something like:

        " | %-5s | %-10s | %-3s | \n"

if @max_in_col was something like (5,10,3). As you can see, we therefore create the correct format string to hand printf to put the columns into the right shapes. Neat, huh?
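Reconstructed in isolation, with those example widths:

```perl
use strict;

# The same expression as line 61, fed the example widths (5, 10, 3).
my @max_in_col = (5, 10, 3);
my $format = join " | ", "", (map "%" . (0 - $_) . "s", @max_in_col), "\n";
print $format;                           # prints " | %-5s | %-10s | %-3s | "
printf $format, "Name", "State", "Zip";  # prints " | Name  | State      | Zip | "
```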

Lines 62 and 66 put the right <pre> and </pre> enclosure around the pseudo-table. Lines 63 through 65 output each line from the @table array. Notice that the value in $_ is a listref representing an original row. Line 64 dereferences this listref to get the original data. The format string generated above automatically puts the data into the right shape.

Lines 68 to 74 generate the HTML table structure from the same original data. Lines 68 and 74 output the outer HTML codes.

Lines 69 through 73 generate each row, similar to the text-only version above. Lines 70 and 72 bound the beginning and ending of the table row, and line 71 encloses each table element in table data tags.
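The HTML branch, sketched with made-up rows (accumulating into a string here, rather than printing directly, so the result is easy to inspect):

```perl
use strict;

# Mirrors lines 68-74, with hypothetical table data.
my @table = (["Name", "Zip"], ["Fred", "97219"]);
my $html = "<table border=1>\n";
for (@table) {
    $html .= "<tr>";
    $html .= join "", map "<td>$_</td>", @$_;
    $html .= "</tr>\n";
}
$html .= "</table>\n";
print $html;
```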

That's pretty much all there is in the main program. The final dozen lines take care of unexpected errors or other Perl fatalities raised inside the eval block. Line 79 detects an error result, which is then stripped of any final newline by line 80.

Lines 81 through 85 massage the error message into something meaningful and HTML-safe. Line 86 prints the result to the output.
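For instance, a made-up die message comes out like this:

```perl
use strict;

# The same substitutions as lines 81-85, on a hypothetical error message.
$_ = "[error: cannot open <fred.table> & more]";
s/&/&amp;/g;
s/</&lt;/g;
s/>/&gt;/g;
s/\n/<br>/g;
print "$_\n";    # prints "[error: cannot open &lt;fred.table&gt; &amp; more]"
```

Note that the ampersand substitution must come first, or it would mangle the entities produced by the later substitutions.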

That's it. Now, to use this puppy, stick it into a CGI directory. Let's say it's ``/cgi/table''. Then, plop your data down into a file named ``something.table'' in the same directory as your HTML file in which you want to use the data. Let's say it's ``fred.table''. Every line of the file will be a row in the resulting table. Every column in the table should be tab-separated from its neighbors.

Then, it's just a matter of shoving something like:

        <!--#include cgi="/cgi/table/fred" -->

into your SSI-parsed file. Note that I'm using the Apache server here, so your SSI invocation sequence may vary. Also, you'll have to do the right thing to make sure the file itself is SSI parsed. That might be by adding something to the .htaccess file, or renaming it so that it ends in ``.shtml'' or turning on the executable bit or something.
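For the Apache of that era, the enabling directives might look something like this in a per-directory .htaccess file. Treat this as a hypothetical sketch: directive names and what .htaccess is allowed to override vary with server version and site configuration, so check with your webmaster.

```
# Allow SSI processing, and hand .shtml files to the SSI parser
Options +Includes
AddType text/html .shtml
AddHandler server-parsed .shtml
```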

In summary, while I wouldn't recommend making every output decision based on the browser type, from time to time knowing and using such information can come in handy. See ya next time.

Listing 1

        =1=     #!/usr/bin/perl -Tw
        =2=     use strict;
        =3=     
        =4=     ## table: write a table or a <pre> based on browser type
        =5=     
        =6=     ## system stuff
        =7=     $|++;
        =8=     $ENV{PATH} = "/usr/ucb:/bin:/usr/bin";
        =9=     
        =10=    ## HTML stuff
        =11=    print "Content-type: text/html\n\n";
        =12=    
        =13=    ## the main program (in eval so we can trap problems)
        =14=    eval {
        =15=    
        =16=      ## get the CGI data
        =17=      my $uri = $ENV{DOCUMENT_URI}; # valid only in SSI
        =18=      my $path = $ENV{PATH_INFO};   # stuff after cgi name
        =19=      my $agent = $ENV{HTTP_USER_AGENT}; # user agent
        =20=    
        =21=      ## validate this as an SSI
        =22=      die "missing document_uri" unless $uri;
        =23=    
        =24=      ## split the URI up, so we know where the file was
        =25=      my ($dir,$file) = $uri =~ m,(.*/)(.*),s;
        =26=    
        =27=      ## massage the directory to get the containing dir
        =28=      if ($dir =~ m,^/~(\w+)/(.*)$,s) {
        =29=        $dir = "/home/$1/public_html/$2"; # teleport specific
        =30=      } else {
        =31=        die "cannot translate dir";
        =32=      }
        =33=    
        =34=      ## go there
        =35=      chdir $dir or die "cannot cd to $dir: $!\n";
        =36=    
        =37=      ## open sourcefile
        =38=      my ($filename) = $path =~ m!^/?([-\w.]+)!;
        =39=      $filename .= ".table";        # ensure only "whatever.table"
        =40=      die "missing filename in $path" unless $filename and -f $filename;
        =41=      open F, $filename or die "cannot open $filename: $!";
        =42=      my @max_in_col;
        =43=      my @table;
        =44=      my $text = $agent =~ /Emacs-W3|Lynx/i; # you may wish to extend this list
        =45=      while (<F>) {
        =46=        chomp;
        =47=        my @row = split /\t/;
        =48=    
        =49=        if ($text) {
        =50=          ## save maximums
        =51=          for (0..$#row) {
        =52=            my $tmp = $row[$_];
        =53=            $tmp =~ s/<.*?>//g;     # don't count HTML markup
        =54=            $tmp = length $tmp;
        =55=            $max_in_col[$_] = $tmp if $max_in_col[$_] < $tmp;
        =56=          }
        =57=        }
        =58=        push(@table, [@row]);
        =59=      }
        =60=      if ($text) {
        =61=        my $format = join " | ", "", (map "%".(0-$_)."s", @max_in_col), "\n";
        =62=        print "<pre>\n";
        =63=        for (@table) {
        =64=          printf $format, @$_;
        =65=        }
        =66=        print "</pre>\n";
        =67=      } else {
        =68=        print "<table border=1>\n";
        =69=        for (@table) {
        =70=          print "<tr>";
        =71=          print map "<td>$_</td>", @$_;
        =72=          print "</tr>\n";
        =73=        }
        =74=        print "</table>\n";
        =75=      }
        =76=    };
        =77=    
        =78=    ## if an error, say so:
        =79=    if ($@) {
        =80=      chomp $@;
        =81=      $_ = "[error: $@]";
        =82=      s/&/&amp;/g;
        =83=      s/</&lt;/g;
        =84=      s/>/&gt;/g;
        =85=      s/\n/<br>/g;
        =86=      print;
        =87=    }

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.