Copyright Notice
This text is copyright by InfoStrada Communications, Inc., and is used with their permission. Further distribution or use is not permitted. This text has appeared in an edited form in Linux Magazine magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.
Please read all the information in the table of contents before using this article.
Linux Magazine Column 44 (Jan 2003)
[suggested title: What Perl got right]
As I type this month's column, we're just pulling away from Ocho Rios,
Jamaica, on the latest Geek Cruise (www.geekcruises.com) called
``Linux Lunacy 2''. Earlier today, some of the speakers on this
conference/cruise, including Linus Torvalds and Eric Raymond, held a
meeting with the Jamaican Linux Users Group. We're out at sea,
en-route to Holland America's private island, ``Half Moon Cay'', so I'm
using the satellite link to upload and review this column (for a mere
30 cents a minute).
Earlier this week Eric Raymond gave one of his many visionary presentations. This one in particular mentioned Perl for a section on ``What Perl Got Right''. The message surprised me, because Eric prefers that other popular ``P'' language over Perl for his personal and professional work. The one thing that Eric says that Perl got right is one of the many things that I think Perl got right: Perl's easy access to low-level operating system functionality.
Let's take a look at what this means. Perl gives you unlink() and
rename() to remove and rename files. These calls pass nearly
directly to the underlying ``section 2'' Unix system calls, without
hiding the call behind a confusing abstraction layer. In fact, the
name ``unlink'' is a direct reflection of that. Many beginners look for
a ``file delete'' operation, without stumbling across ``unlink'' because
of its peculiar name.
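As a minimal sketch of that pair in action (the scratch file names here are hypothetical, invented only for the illustration), the calls read just like their C counterparts:

```perl
use strict;
use warnings;

# Hypothetical scratch file names, chosen only for this illustration.
my $old = "scratch.$$.tmp";
my $new = "scratch.$$.renamed";

# Create a small file so there is something to rename and remove.
open my $fh, '>', $old or die "Cannot create $old: $!";
print {$fh} "hello\n";
close $fh or die "Cannot close $old: $!";

# rename() returns true on success and sets $! on failure, like rename(2).
rename $old, $new or die "Cannot rename $old to $new: $!";

# unlink() returns the number of files it actually removed.
my $removed = unlink $new;
die "Cannot unlink $new: $!" unless $removed == 1;

print "renamed, then removed $removed file\n";
```

Note that unlink() takes a list and reports a count, so checking the count against the number of names you passed is the careful way to detect a partial failure.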
But the matchup doesn't stop there. Perl's file and directory
operations include such entries as chdir(), chmod(), chown(),
chroot(), fcntl(), ioctl(), link(), mkdir(), readlink(), rmdir(),
stat(), symlink(), umask(), and utime(). All of these are mapped
nearly directly to the
corresponding system call. This means that file-manipulating programs
don't have to call out to a shell just to perform the heavy lifting.
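Here is a rough sketch of a few of those calls working together, with no shell in sight. The directory and link names are hypothetical, and I'm assuming a platform where symlink() is supported:

```perl
use strict;
use warnings;

# Hypothetical directory and link names, used only for this sketch.
my $dir  = "demo.$$.d";
my $link = "demo.$$.link";

# mkdir() and chmod() take the same octal modes as mkdir(2) and chmod(2).
mkdir $dir, 0755 or die "Cannot mkdir $dir: $!";
chmod 0700, $dir or die "Cannot chmod $dir: $!";

# stat() returns the fields of struct stat as a list; element 2 is the mode.
my $mode = (stat $dir)[2] & 07777;
printf "mode of %s is %04o\n", $dir, $mode;

# symlink() and readlink() mirror symlink(2) and readlink(2).
symlink $dir, $link or die "Cannot symlink $link: $!";
my $target = readlink $link;
defined $target or die "Cannot readlink $link: $!";
print "$link points at $target\n";

# Clean up: remove the link first, then the (empty) directory.
unlink $link or die "Cannot unlink $link: $!";
rmdir $dir   or die "Cannot rmdir $dir: $!";
```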
And if you want process control, Perl gives you alarm(), exec(),
fork(), get/setpgrp(), getppid(), get/setpriority(), kill(), pipe(),
sleep(), wait(), and waitpid(). With
fork and pipe, you can create any feasible piping configuration, again
not limited to a particular process abstraction provided by a more
limited scripting language. And you can manage and modify those
processes directly as well.
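The simplest such configuration, a child writing to its parent through a pipe, might be sketched like this. It assumes a platform with a working fork(), and the message text is made up for the example:

```perl
use strict;
use warnings;

# One-way pipe: the child writes, the parent reads, just as in C.
pipe(my $reader, my $writer) or die "Cannot create pipe: $!";

my $pid = fork();
defined $pid or die "Cannot fork: $!";

if ($pid == 0) {
    # Child: keep only the write end, send one line, and exit.
    close $reader;
    print {$writer} "hello from child $$\n";
    close $writer;
    exit 0;
}

# Parent: close the write end so the read sees EOF once the child is done.
close $writer;
my $line = <$reader>;
close $reader;

# waitpid() reaps the child; $? then holds its wait status.
waitpid($pid, 0);
my $exit_code = $? >> 8;

print "parent got: $line";
print "child exited with status $exit_code\n";
```

Closing the unused end in each process matters: if the parent kept its copy of the write end open, a read loop would never see end-of-file.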
Let's not forget those socket functions, like accept(), bind(),
connect(), getpeername(), getsockname(), get/setsockopt(), listen(),
recv(), send(), shutdown(), socket(), and socketpair(). Although most
people usually end up using the higher level modules that wrap around
these calls (like LWP or Net::SMTP), they in turn can call these
operations to set up the
interprocess communication. And if a protocol isn't provided by a
readily accessible library, you can get down near the metal and tweak
to your heart's content.
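To give a feel for how close to the metal this is, here is a small sketch using socketpair(), send(), recv(), and shutdown() within a single process. The constants come from the standard Socket module; the message is invented for the example, and I'm assuming a Unix-like system with AF_UNIX sockets:

```perl
use strict;
use warnings;
use Socket;   # exports AF_UNIX, SOCK_STREAM, PF_UNSPEC by default

# socketpair() mirrors socketpair(2): two connected, bidirectional endpoints.
socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "Cannot create socketpair: $!";

# send() maps to send(2); the kernel buffers the bytes for the other end.
my $message = "ping";
defined send($left, $message, 0) or die "Cannot send: $!";

# recv() maps to recv(2) and fills $buffer with up to 1024 bytes.
my $buffer = '';
defined recv($right, $buffer, 1024, 0) or die "Cannot recv: $!";

# shutdown() with 1 stops further writes, like shutdown(fd, SHUT_WR) in C.
shutdown($left, 1) or die "Cannot shutdown: $!";

close $left;
close $right;

print "received: $buffer\n";
```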
Speaking of interprocess communication, you've also got the ``System V''
interprocess communications, like msgctl(), msgget(), msgrcv(),
msgsnd(), semctl(), semget(), semop(), shmctl(), shmget(), shmread()
and shmwrite(). Again, each
of these calls maps nearly directly to the underlying system call,
making existing C-based literature a ready source of examples and
explanation, rather than providing a higher-level abstraction layer.
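As an illustration, a shared memory round trip might look roughly like the following. This is only a sketch: it assumes a system with System V IPC enabled, and it takes its constants from the IPC::SysV module that ships with Perl. The segment size and message are arbitrary choices for the example:

```perl
use strict;
use warnings;
use IPC::SysV qw(IPC_PRIVATE IPC_RMID S_IRUSR S_IWUSR);

# Ask for a private 2000-byte segment, readable and writable by its owner.
my $size = 2000;
my $id = shmget(IPC_PRIVATE, $size, S_IRUSR | S_IWUSR);
defined $id or die "shmget failed: $!";

# shmwrite() pads the string with NUL bytes up to the requested length.
my $message = "Message #1";
shmwrite($id, $message, 0, 60) or die "shmwrite failed: $!";

# shmread() copies 60 bytes back out; trim everything from the first NUL.
my $buffer = '';
shmread($id, $buffer, 0, 60) or die "shmread failed: $!";
$buffer =~ s/\0.*//s;

# IPC_RMID removes the segment, as shmctl(2) would in C.
shmctl($id, IPC_RMID, 0) or die "shmctl failed: $!";

print "read back: $buffer\n";
```

Anyone who has read the shmget(2) and shmctl(2) manpages will recognize every step.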
Then again, if you don't want to deal with the low-level interfaces,
common CPAN modules hide away the details if you wish.
And then there's the user and group info (getpwuid() and friends) and
network info (like gethostbyname()). Even opening a file can be
fine-tuned, through sysopen(), using all of the flags directly
available to the open system call, like O_NONBLOCK, O_CREAT or O_EXCL.
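In Perl those flags are handed to sysopen(), with the constants coming from the standard Fcntl module. A minimal sketch of an exclusive create, using a hypothetical lock file name, might be:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical lock-file name for this sketch.
my $lockfile = "demo.$$.lock";

# O_CREAT|O_EXCL: create the file, but fail if it already exists.
sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644)
    or die "Cannot create $lockfile: $!";
print {$fh} "$$\n";
close $fh or die "Cannot close $lockfile: $!";

# A second exclusive create of the same name should now be refused.
my $second = sysopen(my $fh2, $lockfile, O_WRONLY | O_CREAT | O_EXCL, 0644);
print "second exclusive open ", ($second ? "succeeded" : "failed: $!"), "\n";

unlink $lockfile or die "Cannot unlink $lockfile: $!";
```

The flag combination and the permission argument are exactly the ones you would pass to open(2) in C.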
Hopefully, you can see from these lists that Perl provides a rich set of interfaces to low-level operating system details. Why is this ``what Perl got right''?
It means that while Perl provides a decent high-level language for text wrangling and object-oriented programming, we can still get ``down in the dirt'' to precisely control, create, modify, manage, and maintain our systems and data. For example, if our application requires a ``write to temp file, then close and rename atomically'' to keep other applications from seeing a partially written file, we can spell it out as if we were in a systems implementation language like C:
open TMP, ">ourfile.$$" or die "...";
print TMP @our_new_data;
close TMP;
chmod 0444, "ourfile.$$" or die "...";
rename "ourfile.$$", "ourfile" or die "...";
By keeping the system call names the same (or similar), we can leverage off existing examples, documentation, and knowledge.
In a scripting language without these low-level operations, we're forced to accept a world as presented by the language designer, not the world in which we live as a practicality. Eric Raymond gave as examples an old LISP system which provided many layers of abstraction (sometimes buggy) before you got to actual file input/output system calls, and the classic Smalltalk image, which provides a world unto itself, but very few hooks out into the real world. As a modern example, Java seems to be somewhat painful about ``real world'' connections, preferring instead to have its users implement the ideal world for it rather than it adapting to its world.
And in this, I agree. I've personally written probably a thousand system admin utilities over the 13 years that I've been playing with Perl, and many of those involved those mundane tasks of opening a file precisely the way I wanted, moving it around, and watching processes and files to make sure they weren't getting out of hand. It may not be sexy, but it's where the work actually is -- where the work gets done.
So while I encourage everyone to rush out and play with Squeak
Smalltalk (www.squeak.org) to learn real object-oriented
programming, at the end of the day it's still gonna be Perl (OO or
not) that monitors my website and pages me when the system goes down.
One interesting side-effect of Perl having so many low-level functions is that it forced those who ported Perl from Unix to other operating systems to think about how to perform those functions portably. Thus, the ``Unix API'' provides a ``virtual'' operating system interface for Perl programmers, regardless of the platform.
And since I'm familiar with Unix, I can actually code up portable Perl programs that run on MacOS and Windows and VMS without having to be very smart on their oddities, or relearn a different API, even for apparently low-level operations. I remember squealing with delight when a program I had written for Unix that dealt with forking and sockets ran without any code changes on a Windows box at a customer site. I actually had not expected it to work, especially not as-is.
But what if something in section 2 of my Unix manual isn't supported
directly by Perl? Well, on those platforms that support it, the
syscall() interface provides a nifty escape hatch. Given the right
parameters, the syscall function can call nearly any
single-value-return system call.
For example, suppose the rename() function weren't provided
directly by Perl. We could simply look it up in
/usr/include/sys/syscall.h, apply the proper parameters as
indicated by the rename(2) page, and we're up and running anyway.
The code might look something like this:
sub my_rename {
  my $from = shift;
  my $to = shift;
  $! = 0;
  syscall(128, $from, $to);
  return ! $!;
}
my_rename("fred", "barney") or die "Cannot rename: $!";
The magic ``128'' came from hunting around in my /usr/include
directory until I could find the system call number of rename.
That's the highly non-portable part of this operation, so your mileage
and number will vary.
Once we have that number, we can issue a syscall. The value of $!
is set to 0 before the call, and checked for a non-zero value
after the call. If the operator returned anything of interest, we
could also check that at the call itself. If the call fails, the
normal die with $! in the text string gives us a reasonable
error message.
So, if syscall works, we can wrap anything in Unix manual section 2
that isn't already provided, all without leaving Perl.
But what if syscall didn't work? Well, even all the way back to
Perl version 4, we had a documented way of ``extending'' a Perl
interpreter using the C-level Perl interfaces. And it all got nicely
easier with the release of Perl version 5, using the XS interface.
With XS, we can write dynamically loaded object code for our low-level
interface (or statically linked on some of the more limited systems),
and then use it at will.
But this XS interface was still a stumbling block for many people.
Many consider it arcane, requiring too many knowledge steps to be
useful. So, thankfully, last year Brian Ingerson (``ingy'') came along
and wrote the beginnings of the Inline architecture. In
particular, Inline::C allows me to define arbitrary subroutines in
C, and they simply appear as callable Perl subroutines. Behind the
scenes, an MD5-hash of the C code is created, and used to maintain a
cache of to-be-compiled or pre-compiled loadable object files. At
this point, renaming a file would be as simple as copying the syntax
nearly directly from the example of the rename(2) manpage:
use Inline C => <<'END';
#include <stdio.h>
int my_rename(char *from, char *to) {
  return rename(from, to) >= 0; /* -1 is bad, 0 is good */
}
END
my_rename("fred", "barney") or die "Cannot rename fred to barney: $!";
Here I'm providing the definition for my_rename as a C function.
The arguments are specified exactly as they would be in a C program,
and the rename system call gets called in the middle, massaging the
return value slightly.
The Inline structure creates the proper glue to hook the snippet
into the Perl-to-C code, and arranges for the C compiler to process
that code. The results are cached: the first time this program is
run, it takes about a second or so, but every invocation following
that is lightning fast.
So, as you can see, Perl can easily get ``down to C level'' (just like this cruise ship I'm on). And Eric Raymond says this is the one thing that Perl got right. I tend to think it's a bit more than that. By the way, if you want to hack Perl with experts, be sure to check out the upcoming Perl Geek Cruise on the web site. I'll be there, coding on the high seas. Until next time, enjoy!