Copyright Notice

This text is copyright by CMP Media, LLC, and is used with their permission. Further distribution or use is not permitted.

This text has appeared in an edited form in SysAdmin/PerformanceComputing/UnixReview magazine. However, the version you are reading here is as the author originally submitted the article for publication, not after their editors applied their creativity.

Please read all the information in the table of contents before using this article.

Unix Review Column 68 (Jan 2007)

[Suggested title: ``Show me your references'']

I spend a lot of time (perhaps some would say ``far too much'') on chat channels helping out Perl beginners, and even the occasional expert. One of the things that still amazes me is the absolute gibberish that people come up with while trying to construct references and then dereference them. It's like they just stomp on the top row of the keyboard, and then hand that to Perl and say ``here, interpret this''.

So, I thought that this month I'd go ``back to the basics'', and review the standard forms of creating a reference, and then using those references by dereferencing them.

The first thing to understand about a reference is that it fits wherever a scalar fits, except as the key of a hash. So we can put a reference into a scalar variable, or as an element of a list, or as a value within an array or a hash. We can also pass those lists containing references to and from subroutines. Packages like Storable and Data::Dumper can take complex bundles that include references and safely serialize and restore them.
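
To see why the hash key is the one exception, here is a minimal sketch. The variable names (%by_name, %by_ref) are made up for illustration. A hash key is always a plain string, so a reference used as a key gets flattened into its printable form and can no longer be followed:

```perl
use strict;
use warnings;

my @array     = (1, 2, 3);
my $array_ref = \@array;

# A reference is fine as a hash value...
my %by_name = (numbers => $array_ref);
print "value: ", ref($by_name{numbers}), "\n";    # value: ARRAY

# ...but as a hash key it is turned into a string that merely
# looks like "ARRAY(0x...)". The address part varies from run to run.
my %by_ref = ($array_ref => 'numbers');
my ($key)  = keys %by_ref;
print "key is a reference? ", (ref($key) ? "yes" : "no"), "\n";    # no
```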

One way to create a reference is to put a backslash in front of an existing variable or subroutine name. For example, I can create scalar, array, hash, and subroutine references from existing things like so:

  my $scalar_ref = \$scalar;
  my $array_ref = \@array;
  my $hash_ref = \%hash;
  my $code_ref = \&marine;

Note that for the subroutine (``code'') reference, I must include the ampersand in front of the subroutine name.
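
As a quick check, the built-in ref function reports what kind of thing a reference points at. This sketch assumes some hypothetical named items ($scalar, @array, %hash, and a do-nothing marine subroutine) standing in for whatever your program already has:

```perl
use strict;
use warnings;

# Hypothetical named items to take references to.
my $scalar = 42;
my @array  = (3, 4, 5);
my %hash   = (login => 'merlyn');
sub marine { return 'glub' }

my $scalar_ref = \$scalar;
my $array_ref  = \@array;
my $hash_ref   = \%hash;
my $code_ref   = \&marine;

# ref() names the kind of thing each reference points at.
for my $ref ($scalar_ref, $array_ref, $hash_ref, $code_ref) {
  print ref($ref), "\n";    # SCALAR, ARRAY, HASH, CODE, in that order
}
```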

These references could also have been placed immediately as elements into an array or hash:

  my @refs = (\$scalar, \@array, \%hash, \&marine);
  my %ref_map = (
    scalar => \$scalar,
    array => \@array,
    hash => \%hash,
    code => \&marine,
  );

In this case, $refs[2], $ref_map{hash}, and \%hash all yield the same reference to the hash %hash.
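
One way to convince yourself of this, sketched with the same hypothetical variables: comparing two references with == compares the addresses they point at, so references to the same hash compare equal no matter where they are stored.

```perl
use strict;
use warnings;

my $scalar = 42;
my @array  = (3, 4, 5);
my %hash   = (login => 'merlyn');
sub marine { return 'glub' }

my @refs    = (\$scalar, \@array, \%hash, \&marine);
my %ref_map = (
  scalar => \$scalar,
  array  => \@array,
  hash   => \%hash,
  code   => \&marine,
);

# Numeric comparison of references compares what they point at.
my $from_array = ($refs[2]       == \%hash) ? 'same' : 'different';
my $from_hash  = ($ref_map{hash} == \%hash) ? 'same' : 'different';
my $unrelated  = ($refs[1]       == \%hash) ? 'same' : 'different';
print "$from_array $from_hash $unrelated\n";    # same same different
```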

I can also create a reference to an anonymous array, hash, or subroutine using the anonymous constructor syntax, like so:

  my $array_ref = [3, 4, 5];
  my $hash_ref = { first => 'Randal', last => 'Schwartz', login => 'merlyn' };
  my $code_ref = sub { my $sum = 0; $sum += $_ for @_; return $sum };

In every respect, these references to anonymous items act identically to the references to named items from earlier. (Note that there is no simple syntax to create the rarely needed anonymous scalar.)
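
A small sketch of the constructors in action. The second array ($another) is only there to show that every evaluation of the square brackets builds a fresh array, so two look-alike constructors never hand back the same reference:

```perl
use strict;
use warnings;

my $array_ref = [3, 4, 5];
my $hash_ref  = { first => 'Randal', last => 'Schwartz', login => 'merlyn' };
my $code_ref  = sub { my $sum = 0; $sum += $_ for @_; return $sum };

# Same kinds as the references taken from named items.
my $kinds = join(' ', ref($array_ref), ref($hash_ref), ref($code_ref));
print "$kinds\n";    # ARRAY HASH CODE

# Each [...] builds a brand new array.
my $another = [3, 4, 5];
my $verdict = ($array_ref == $another) ? 'same' : 'distinct';
print "$verdict\n";  # distinct
```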

To access the original item, I need to dereference the reference to it. In the case of a scalar, array, or hash, a dereference lets me get at the variable to get or set its value. In the case of a code ref, dereferencing generally invokes the corresponding subroutine.

First, let's look at the canonical rule of dereferencing that will always work regardless of how the reference is obtained. We start by taking the syntax as if references aren't involved, such as $some_array[$element]. We then take the name of the item out, and replace it with curly braces around the expression that gives us a reference. The simplest example is scalar access. Start with a scalar variable:

  $scalar = 42;  # update
  print $scalar; # access

and replace the name (scalar) with curly braces around the thing holding the reference:

  ${$scalar_ref} = 42;  # update via $scalar_ref
  print ${$scalar_ref}; # access via $scalar_ref
  ${$refs[0]} = 42;   # update via $refs[0]
  print ${$refs[0]};  # access via $refs[0]
  ${$ref_map{scalar}} = 42;   # update via $ref_map{scalar}
  print ${$ref_map{scalar}};  # access via $ref_map{scalar}
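
Here is a minimal runnable sketch of the scalar case. The point to notice is that updating through the reference changes the original variable, because both names lead to the same storage:

```perl
use strict;
use warnings;

my $scalar     = 10;
my $scalar_ref = \$scalar;

${$scalar_ref} = 42;          # update via $scalar_ref
print "$scalar\n";            # 42, the original variable changed

${$scalar_ref}++;             # any scalar operation works the same way
print ${$scalar_ref}, "\n";   # 43
```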

An array has more access forms, so there are more canonical dereferencing equivalents. First, the non-reference versions:

  @array           # entire array
  $array[$index]   # single element of array
  @array[@indices] # array slice
  $#array          # index of last array element

Again, the canonical rule is the same. Replace the name with curly braces around the thing holding the reference. For $array_ref, this looks like:

  @{$array_ref}           # entire array
  ${$array_ref}[$index]   # single element of array
  @{$array_ref}[@indices] # array slice
  $#{$array_ref}          # index of last array element

And for $refs[1] and $ref_map{array}, it looks like:

  @{$refs[1]}           # entire array
  ${$refs[1]}[$index]   # single element of array
  @{$refs[1]}[@indices] # array slice
  $#{$refs[1]}          # index of last array element

  @{$ref_map{array}}           # entire array
  ${$ref_map{array}}[$index]   # single element of array
  @{$ref_map{array}}[@indices] # array slice
  $#{$ref_map{array}}          # index of last array element

Yes, these are admittedly rather ugly. Luckily, most of them are not common in typical Perl programs. It's important to learn this canonical rule first though, because you can always fall back on it when you get into trouble.
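
To see the four array forms do real work, here is a short sketch using a hypothetical four-element array. Anything you can do to @array, such as push, you can also do to the braces-and-reference form:

```perl
use strict;
use warnings;

my @array     = (3, 4, 5, 6);
my $array_ref = \@array;
my %ref_map   = (array => \@array);

my @copy   = @{$array_ref};              # entire array: (3, 4, 5, 6)
my $second = ${$array_ref}[1];           # single element: 4
my @pair   = @{$ref_map{array}}[0, 3];   # slice: (3, 6)
my $last   = $#{$array_ref};             # last index: 3

push @{$ref_map{array}}, 7;              # modifies @array itself
print "@copy | $second | @pair | $last | @array\n";
# 3 4 5 6 | 4 | 3 6 | 3 | 3 4 5 6 7
```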

Continuing on, the hash also has a number of access forms:

  %hash        # entire hash
  $hash{$key}  # single element of hash
  @hash{@keys} # hash slice

And the rule is again the same: replace the name of the item with curly braces around the thing holding the reference. For $hash_ref, $refs[2] and $ref_map{hash}, this looks like:

  %{$hash_ref}        # entire hash
  ${$hash_ref}{$key}  # single element of hash
  @{$hash_ref}{@keys} # hash slice

  %{$refs[2]}        # entire hash
  ${$refs[2]}{$key}  # single element of hash
  @{$refs[2]}{@keys} # hash slice

  %{$ref_map{hash}}        # entire hash
  ${$ref_map{hash}}{$key}  # single element of hash
  @{$ref_map{hash}}{@keys} # hash slice

OK, now we have ugly on top of ugly. Ugly squared. I can honestly say that I don't recall ever taking a hash slice of a hash whose hashref came from an element of another hash. But if I did, that last line would be how I would need to do it.
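
For completeness, a sketch of the three hash forms, including that rarely seen slice through $ref_map{hash}. The replacement login value 'guest' is made up for the example:

```perl
use strict;
use warnings;

my %hash     = (first => 'Randal', last => 'Schwartz', login => 'merlyn');
my $hash_ref = \%hash;
my %ref_map  = (hash => \%hash);

my @keys  = sort keys %{$hash_ref};               # entire hash
my $login = ${$hash_ref}{login};                  # single element
my @name  = @{$ref_map{hash}}{qw(first last)};    # hash slice

${$ref_map{hash}}{login} = 'guest';               # updates %hash itself
print "@keys | $login | @name | $hash{login}\n";
# first last login | merlyn | Randal Schwartz | guest
```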

Finally, for code ref dereferencing, we're invoking the subroutine. For the purpose of constructing the canonical form, we'll pretend that subroutine invocations without an ampersand are forbidden:

  &marine        # invoke subroutine passing current @_
  &marine()      # invoke subroutine with no arguments
  &marine(@args) # invoke subroutine passing @args

Again, the rule is the same (see how simple this is?). Replace the name with curly braces around the thing holding the reference:

  &{$code_ref}        # invoke subroutine passing current @_
  &{$code_ref}()      # invoke subroutine with no arguments
  &{$code_ref}(@args) # invoke subroutine passing @args

  &{$refs[3]}        # invoke subroutine passing current @_
  &{$refs[3]}()      # invoke subroutine with no arguments
  &{$refs[3]}(@args) # invoke subroutine passing @args

  &{$ref_map{code}}        # invoke subroutine passing current @_
  &{$ref_map{code}}()      # invoke subroutine with no arguments
  &{$ref_map{code}}(@args) # invoke subroutine passing @args

And that finishes the canonical form. If this were all there was, you could do everything you wanted with references, but the code would be ugly.
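
The difference between the first two invocation forms is easy to miss, so here is a sketch that makes it visible. The marine subroutine and the two relay wrappers are hypothetical; marine simply reports how many arguments it received:

```perl
use strict;
use warnings;

sub marine { return scalar @_ }    # reports how many arguments arrived
my $code_ref = \&marine;

# No parentheses: the called subroutine sees the caller's current @_.
sub relay_current { return &{$code_ref} }

# Empty parentheses: the called subroutine gets an empty argument list.
sub relay_nothing { return &{$code_ref}() }

my $passed_through = relay_current(1, 2, 3);
my $passed_nothing = relay_nothing(1, 2, 3);
my $passed_two     = &{$code_ref}('a', 'b');

print "$passed_through $passed_nothing $passed_two\n";    # 3 0 2
```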

Luckily, there are a few syntax optimizations that actually end up applying about 90% of the time. First, you can remove any curly braces you introduced for dereferencing as long as the only thing inside the braces is a simple scalar (not array or hash element, or complex expression). That simplifies some of the items above to the following forms:

  $$scalar_ref = 42;  # update via $scalar_ref
  print $$scalar_ref; # access via $scalar_ref

  @$array_ref           # entire array
  $$array_ref[$index]   # single element of array
  @$array_ref[@indices] # array slice
  $#$array_ref          # index of last array element

  %$hash_ref        # entire hash
  $$hash_ref{$key}  # single element of hash
  @$hash_ref{@keys} # hash slice

  &$code_ref        # invoke subroutine passing current @_
  &$code_ref()      # invoke subroutine with no arguments
  &$code_ref(@args) # invoke subroutine passing @args
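
The ``simple scalar'' restriction matters more than it might seem. In this sketch I deliberately declare both an array @refs and an unrelated scalar $refs (hypothetical names, chosen to collide) to show what happens when the braces are dropped around an array element:

```perl
use strict;
use warnings;

my @array     = (3, 4, 5);
my $array_ref = \@array;

# With a simple scalar inside, the braces are optional.
my $with_braces    = ${$array_ref}[1];
my $without_braces = $$array_ref[1];
print "$with_braces $without_braces\n";    # 4 4

# With an array element inside, they are not optional.
my @refs = (undef, ['via the element $refs[1]']);
my $refs = ['unused', 'via the scalar $refs'];

my $braces_kept    = ${$refs[1]}[0];   # follows the reference in $refs[1]
my $braces_dropped = $$refs[1];        # parsed as ${$refs}[1] instead
print "$braces_kept\n";                # via the element $refs[1]
print "$braces_dropped\n";             # via the scalar $refs
```

Under ``use strict'', if there were no scalar $refs at all, the last form would not even compile, which is usually the first hint that the braces were needed.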

Also, as an optimization along a different axis, the most common thing to do with an array or hash is to access a single element, so you can replace an ugly dereferencing with an equivalent arrow form:

  ${ UGLY_ARRAY_REF_EXPRESSION }[$index] # canonical form for array element
  UGLY_ARRAY_REF_EXPRESSION->[$index]    # arrow form for array element

  ${ UGLY_HASH_REF_EXPRESSION }{$key} # canonical form for hash element
  UGLY_HASH_REF_EXPRESSION->{$key}    # arrow form for hash element

Similarly, invoking a subroutine has an equivalent arrow form (thanks to Chip Salzenberg under my nudging during the 5.004 release cycle):

  &{ UGLY_CODE_REF_EXPRESSION }(@args) # canonical form for code invocation
  UGLY_CODE_REF_EXPRESSION->(@args)    # arrow form for code invocation

The nice thing about these arrows is that they read from left to right. For example, the code ref stored in $ref_map{code} simplifies nicely:

  &{$ref_map{code}}(@args) # canonical form
  $ref_map{code}->(@args)  # arrow form
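
To check that the canonical and arrow forms really are interchangeable, here is a sketch that computes each result both ways. It assumes a %ref_map filled with anonymous items like the ones built earlier:

```perl
use strict;
use warnings;

my %ref_map = (
  array => [3, 4, 5],
  hash  => { first => 'Randal', last => 'Schwartz', login => 'merlyn' },
  code  => sub { my $sum = 0; $sum += $_ for @_; return $sum },
);

my $elem_canonical = ${ $ref_map{array} }[2];
my $elem_arrow     = $ref_map{array}->[2];
print "$elem_canonical $elem_arrow\n";      # 5 5

my $login_canonical = ${ $ref_map{hash} }{login};
my $login_arrow     = $ref_map{hash}->{login};
print "$login_canonical $login_arrow\n";    # merlyn merlyn

my $sum_canonical = &{ $ref_map{code} }(3, 4, 5);
my $sum_arrow     = $ref_map{code}->(3, 4, 5);
print "$sum_canonical $sum_arrow\n";        # 12 12
```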

And that leads us to the final optimization. If, as a result of the previous rules for an arrow form, we end up with an arrow that sits between two subscripts (array indices in square brackets, hash keys in curly braces, or subroutine arguments in parentheses), we can drop that arrow:

  ${$refs[1]}[$index] # canonical form
  $refs[1]->[$index]  # arrow form
  $refs[1][$index]    # reduced arrow form

  ${$ref_map{hash}}{$key} # canonical form
  $ref_map{hash}->{$key}  # arrow form
  $ref_map{hash}{$key}    # reduced arrow form

  $ref_map{code}(@args)  # reduced arrow form
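
Here is a sketch of the reduced forms at work, again assuming a %ref_map and @refs filled with anonymous items. It also shows the one place people most often drop an arrow they shouldn't: right after a plain scalar variable, where there is no subscript on the arrow's left side.

```perl
use strict;
use warnings;

my %ref_map = (
  array => [3, 4, 5],
  hash  => { first => 'Randal', last => 'Schwartz', login => 'merlyn' },
  code  => sub { my $sum = 0; $sum += $_ for @_; return $sum },
);
my @refs = (undef, $ref_map{array});

my $element = $refs[1][0];                # same as $refs[1]->[0]
my $login   = $ref_map{hash}{login};      # same as $ref_map{hash}->{login}
my $sum     = $ref_map{code}(3, 4, 5);    # same as $ref_map{code}->(3, 4, 5)

# This arrow has to stay: $array_ref->[0] looks inside the array that
# $array_ref points at, while $array_ref[0] would mean element 0 of a
# completely different variable, the named array @array_ref.
my $array_ref = $ref_map{array};
my $first     = $array_ref->[0];

print "$element $login $sum $first\n";    # 3 merlyn 12 3
```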

And there you have it, the complete set of referencing and dereferencing instructions for Perl version 5. For Perl 6, all the rules are different, of course. So for now, enjoy!

Randal L. Schwartz is a renowned expert on the Perl programming language (the lifeblood of the Internet), having contributed to a dozen top-selling books on the subject, and over 200 magazine articles. Schwartz runs a Perl training and consulting company (Stonehenge Consulting Services, Inc of Portland, Oregon), and is a highly sought-after speaker for his masterful stage combination of technical skill, comedic timing, and crowd rapport. And he's a pretty good Karaoke singer, winning contests regularly.

Schwartz can be reached for comment at merlyn@stonehenge.com or +1 503 777-0095, and welcomes questions on Perl and other related topics.
