chap07 2 common higher-order functions

Currying

Common Higher-Order Functions

Probably the two most fundamental higher-order functions for any list or other kind of sequence are analogs of map() and grep(). map() and grep() are higher-order functions because each of them takes an argument which is itself another function. We've already seen versions of map() and grep() for iterators and streams. Perl's standard map() and grep() each take a function and a list and return a new list, for example

        map { $_ * 2 } (1..5);           # returns 2, 4, 6, 8, 10
        grep { $_ % 2 == 0 } (1..10);    # returns 2, 4, 6, 8, 10

Often it's more convenient to have curried versions of these:

Download code for cmap

        sub cmap (&) {
          my $f = shift;
          my $r = sub {
            my @result;
            for (@_) {
              push @result, $f->($_);
            }
            @result;
          };
          return $r;
        }

=test cmap-oops 1

        BEGIN { do "cmap" }
        my @a;
        # my @empty = cmap { $_ * 2 } @a;
        SKIP: {
          skip "this was to check if poymorphic cmap failed on an empty list", 1;
          is(scalar(@empty), 0, "oops");
        }

=endtest

Download code for cgrep

        sub cgrep (&) {
          my $f = shift;
          my $r = sub {
            my @result;
            for (@_) {
              push @result, $_ if $f->($_);
            }
            @result;
          };
          return $r;
        }

These should be called like this:

        $double = cmap { $_ * 2 };
        $find_slashdot = cgrep { $_->{referer} =~ /slashdot/i };

After which $double->(1..5) returns (2, 4, 6, 8, 10) and $find_slashdot->(weblog()) returns the weblog records that represent referrals from Slashdot.

It may be tempting to try to make cmap() and cgrep() polymorphic, as we did with slope(). (I was tempted, anyway.)

        sub cmap (&;@) {
          my $f = shift;
          my $r = sub {
            my @result;
            for (@_) {
              push @result, $f->($_);
            }
            @result;
          };
          return @_ ? $r->(@_) : $r;
        }

Then we would also be able to use cmap() and cgrep() like regular map() and grep():

        @doubles = cmap { $_ * 2 } (1..5);
        @evens = cgrep { $_ % 2 == 0 } (1..10);

Unfortunately, this apparently happy notation hides an evil surprise:

        @doubles = cmap { $_ * 2 } @some_array;

If @some_array is empty, @doubles is assigned a reference to a doubling function.

=test cmapgrep 2

        BEGIN {
        do 'cmap';
        do 'cgrep';
        }

        $double = cmap { $_ * 2 };
        $find_evens = cgrep { $_ % 2 == 0 };

        is_deeply( [$double->(1..5)], [2,4,6,8,10] );
        is_deeply( [$find_evens->(1..10)], [2,4,6,8,10] );

        # SKIP: {
        #   skip "you took out the stuff about polymorphic cmap and cgrep", 2;
        #   @doubles = cmap { $_ * 2 }      (1..5);
        #   @evens = cgrep  { $_ % 2 == 0 } (1..10);
        #
        #   is_deeply( \@doubles, [2,4,6,8,10] );
        #   is_deeply( \@evens, [2,4,6,8,10] );
        # }

=endtest cmapgrep

Automatic Currying

We've written the same code several times to implement curried functions:

        sub some_curried_function {
          my $first_arg = shift;
          my $r = sub { 
            ...
          };
          return @_ ? $r->(@_) : $r;
        }

(Possibly with the poymorphism trick omitted from the final line.)

As usual, once we recognize this pattern, we should see if it makes sense to abstract it into a function:

Download code for Curry.pm

        package Curry;
        use base 'Exporter';
        @EXPORT = ('curry');
        @EXPORT_OK = qw(curry_listfunc curry_n);

        sub curry_listfunc {
          my $f = shift;
          return sub { 
            my $first_arg = shift;
            return sub { $f->($first_arg, @_) };
          };
        }

        sub curry {
          my $f = shift;
          return sub { 
            my $first_arg = shift;
            my $r = sub { $f->($first_arg, @_) };
            return @_ ? $r->(@_) : $r;
          };
        }

        1;

curry() takes any function and returns a curried version of that function. For example, consider the imap() function from Chapter ???:

        sub imap (&$) {
          my ($transform, $it) = @_;
          return sub {
            my $next = NEXTVAL($it);
            return unless defined $next;
            return $transform->($next);
          }
        }

imap() is analogous to map(), but operates on iterators rather than on lists. We might use it like this:

        my $doubles_iterator = imap { $_[0] * 2 } $it;

If we end up doubling a lot of iterators, we have to repeat the {$_[0] * 2} part:

        my $doubles_a = imap { $_[0] * 2 } $it_a;
        my $doubles_b = imap { $_[0] * 2 } $it_b;
        my $doubles_c = imap { $_[0] * 2 } $it_c;

We might wish we had a single, special purpose function for doubling every element of an iterator, so we could write instead

        my $doubles_a = double $it_a;
        my $doubles_b = double $it_b;
        my $doubles_c = double $it_c;

Or even

        my ($doubles_a, $doubles_b, $doubles_c) 
          = map double($_), $it_a, $it_b, $it_c;

If we had written imap() in a curried style, we could have done

        *double = imap { $_[0] * 2 };

but we didn't, so we can't. But that's no problem, because curry() will manufacture a curried version of imap() on the fly:

        *double = curry(\&imap)->(sub { $_[0] * 2 });

Since the curried imap() function came in handy once, perhaps we should keep it around in case we need it again:

        *c_imap = curry(\&imap);

Then to manufacture double() we do:

        *double = c_imap(sub { $_[0] * 2 });

=test curry

        use Curry;

        # easier to just drop this here, so we know what version we've got
            sub imap (&$) {
              my ($transform, $it) = @_;
              return sub {
                my $next = $it->();
                return unless defined $next;
                return $transform->($next);
              }
            }
            sub upto {
              my ($m, $n) = @_;
              return sub {
                return $m <= $n ? $m++ : undef;
              };
            }

        *c_imap = curry(\&imap);
        *double = c_imap(sub { $_[0] * 2 });
        my $it = upto(1,4);
        my $doubleit = double($it);
        is($doubleit->(),2);
        is($doubleit->(),4);
        is($doubleit->(),6);
        is($doubleit->(),8);
        is($doubleit->(),undef);

=endtest curry

Prototypes

The only drawback of this approach is that we lose imap()'s pretty calling syntax, which is enabled by the (&@) prototype at compile time. We can get it back, although the results are somewhat peculiar. First, we modify curry() so that the function it manufactures has the appropriate prototype:

        sub curry {
          my $f = shift;
          return sub (&;@) { 
            my $first_arg = shift;
            my $r = sub { $f->($first_arg, @_) };
            return @_ ? $r->(@_) : $r;
          };
        }

Then we call curry() at compile time instead of at run time:

        BEGIN { *c_imap = curry(\&imap); }

Now we can say

        *double = c_imap { $_[0] * 2 };

and we can still use c_imap() in place of regular imap():

        $doubles_a = c_imap { $_[0] * 2 } $it_a;

Prototype Problems

The problem with this technique is that the prototype must be hardwired into curry(), so now it will only generate curried functions with the prototype (&;@). This isn't a problem for functions like c_imap() or c_grep(), which would have had that prototype anyway. But that prototype is inappropriate for the curried version of the scale() function from Chapter ???. The uncurried version was:

        sub scale {
          my ($s, $c) = @_;
          $s->transform(sub { $_[0]*$c });
        }

curry(\&scale) returns a function that behaves like this:

        sub { 
            my $s = shift;
            my $r = sub { scale($s, @_) };
            return @_ ? $r->(@_) : $r;
        }

The internals of this function are correct, and it will work just fine, as long as it doesn't have a (&;@) prototype. Such a prototype would be inappropriate, since the function is expecting to get one or two scalar arguments. The correct prototype would be ($;$). But if we did:

        BEGIN { *c_scale = curry(\&scale) }

then the resulting c_scale() function wouldn't work, because it would have a (&;@) prototype when we expectedto call it as though it had a ($;$) prototype. We want to call it in one of these two ways:

        my $double = c_scale(2);
        my $doubled_it = c_scale(2, $it);

but because c_scale() would have a prototype of (&;@), these both would be syntax errors, yielding:

        Type of arg 1 to main::c_scale must be block or sub {} (not
        constant item)...

This isn't a show-stopper. This works:

        *c_scale = curry(\&scale);
        my $double = c_scale(2);
        my $doubled_it = c_scale(2, $it);

Here the call to c_scale() is compiled, with no prototype, before *c_scale is assigned to; the call to curry() that sets up the bad prototype occurs too late to foul up our attempt to (correctly) call c_scale().

But now we have a somewhat confusing situation. Our curry() function creates curried functions with (&;@) prototypes, and these prototypes may be inappropriate. But the prototypes are inoperative unless curry() is called in a BEGIN block. To add to the confusion, this doesn't work:

        *c_scale = curry(\&scale);
        my $double = eval 'c_scale(2)';

because, once again, the call to c_scale() has been compiled after the prototype was set up by curry().

There isn't really any easy way to fix this. The obvious thing to do is to tell curry() what prototype we desire by supplying it with an optional parameter:

        # Doesn't really work
        sub curry {
          my $f = shift;
          my $PROTOTYPE = shift;
          return sub ($PROTOTYPE) { 
            my $first_arg = shift;
            my $r = sub { $f->($first_arg, @_) };
            return @_ ? $r->(@_) : $r;
          };
        }

Unfortunately, this is illegal; ($PROTOTYPE) does not indicate that the desired prototype is stored in $PROTOTYPE. Perl 5.8.1 provides a Scalar::Util::set_prototype function to set the prototype of a particular function:

Download code for curry.set_prototype

        # Doesn't work before 5.8.1
        use Scalar::Util 'set_prototype';
        
        sub curry {
          my $f = shift;
          my $PROTOTYPE = shift;
          set_prototype(sub { 
            my $first_arg = shift;
            my $r = sub { $f->($first_arg, @_) };
            return @_ ? $r->(@_) : $r;
          }, $PROTOTYPE);
        }

If you don't have 5.8.1 yet, the only way to dynamically specify the prototype of a function is to use string eval:

Download code for curry.eval

        sub curry {
          my $f = shift;
          my $PROTOTYPE = shift;
          $PROTOTYPE = "($PROTOTYPE)" if defined $PROTOTYPE;
          my $CODE = q{sub PROTOTYPE { 
                         my $first_arg = shift;
                         my $r = sub { $f->($first_arg, @_) };
                         return @_ ? $r->(@_) : $r;
                       }};
          $CODE =~ s/PROTOTYPE/$PROTOTYPE/;
          eval $CODE;
        }

More Currying

We can extend the idea of curry() and build a function that generates a generic curried version of another function:

=contlisting Curry.pm

        sub curry_n {
          my $N = shift;
          my $f = shift;
          my $c;
          $c = sub {
            if (@_ >= $N) { $f->(@_) }
            else {
              my @a = @_;
              curry_n($N-@a, sub { $f->(@a, @_) });
            }
          };
        }

curry_n() takes two arguments: a number N, and a function f, which expects at least N arguments. The result is a new function, c, which does the same thing f does, but which accepts curried arguments. If c is called with N or more arguments, it just passes them on to f and returns the result. If there are fewer than N arguments, c generates a new function that remembers the arguments that were passed; if this new function is called with the remaining arguments, both old and new arguments are given to f. For example:

        *add = curry_n(2, sub { $_[0] + $_[1] });

And now we can call

        add(2, 3);      # Returns 5

or:

        *increment = add(1);
        increment(8);   # return 9

Or perhaps more realistically:

        *csubstr = curry_n(3, sub { defined $_[3] ?
                                       substr($_[0], $_[1], $_[2], $_[3]) :
                                       substr($_[0], $_[1], $_[2]) });

Then we can use any of:

        # Just like regular substr

        $ss = csubstr($target, $start, $length);
        csubstr($target, $start, $length, $replacement);

        # Not just like regular substr

        $target = "I like pie";

        # This '$part' function gets two arguments: a start position
        # and a length; it returns the apporpriate part of $target.

        $part = csubstr($target);
        my $ss = $part->($start, $length);  

        # This function gets an argument N and returns that many characters
        # from the beginning of $target.

        $first_N_chars = csubstr($target, 0);
        my $prefix_3 = $first_N_chars->(3);     # "I l"
        my $prefix_7 = $first_N_chars->(7);     # "I like "

=test curry_n

        use Curry 'curry_n';

         *add = curry_n(2, sub { $_[0] + $_[1] });

        is(add(2, 3), 5);

        *increment = add(1);
        is(increment(8),9);

        *csubstr = curry_n(3, sub { defined $_[3] ?
                                    substr($_[0], $_[1], $_[2], $_[3]) :
                                    substr($_[0], $_[1], $_[2]) });
        {
        my $target = "I like pie";
        is(csubstr($target, 2, 4), "like");
        is(csubstr($target, 2, 4, "eat"), "like");
        is($target, "I eat pie");
        }
        { # prove it works with substr too
        my $target = "I like pie";
        is(substr($target, 2, 4), "like");
        is(substr($target, 2, 4, "eat"), "like");
        is($target, "I eat pie");
        }

        # This '$part' function gets two arguments: a start position
        # and a length; it returns the apporpriate part of $target.
        {
        my $target = "I like pie";
        my $part = csubstr($target);
        is($part->(2,4), "like");

        my $ss = $part->(2,4,"eat");
        is($ss,"like");
        # hrm.  i expected this to work like the previous ones. - rspier
        # hrm.  me too. - mjd
        ### is($target,"I eat pie");
        }

        {
        my $target = "I like pie";
        my  $first_N_chars = csubstr($target, 0);
        is($first_N_chars->(3), "I l");
        is($first_N_chars->(7), "I like ");
        }

=endtest curry_n

Yet More Currying

Many of the functions we saw earlier in the book would benefit from currying. For example, dir_walk() from Chapter ???:

--- dir_walk_callbacks ??? ---

Here we specify a top directory and two callback functions. But the callback functions are constant through any call to dir_walk(), and we might like to specify them in advance, because we might know them well before we know what directories we want to search. The conversion is easy:

Download code for dir_walk_curried

         sub dir_walk {
          unshift @_, undef if @_ < 3;
          my ($top, $filefunc, $dirfunc) = @_;

          my $r;
          $r = sub {
            my $DIR;
            my $top = shift;
            if (-d $top) {
              my $file;
              unless (opendir $DIR, $top) {
                warn "Couldn't open directory $code: $!; skipping.\n";
                return;
              }

              my @results;
              while ($file = readdir $DIR) {
                next if $file eq '.' || $file eq '..';
                push @results, $r->("$top/$file");
              }
              return $dirfunc->($top, @results);
            } else {
              return $filefunc->($top);
            }
          };
          defined($top) ? $r->($top) : $r;
        }

We can still call dir_walk($top, $filefunc, $dirfunc) and get the same result, or we can omit the $top argument (or pass undef) and get back a specialized file-walking function. As a minor added bonus, the recursive call will be fractionally more efficient because the callback arguments don't need to be explicitly passed.

=test dir_walk_curried 3

        do "dir_walk_curried";  
        my @RESULT;
        sub accumulate { @_ }
        my $TOP = "Tests/TESTDIR";
        my @items = ($TOP, 
                    qw(a a/a1 a/a2 b b/b1 c c/c1 c/c2 c/c3 c/d c/d/d1 c/d/d2));

        @RESULT = dir_walk($TOP, \&accumulate, \&accumulate);
        s{^$TOP/}{}o for @RESULT;
        print "# @RESULT\n";
        is_deeply(\@RESULT, \@items,  "uncurried version");
        
        my $DW = dir_walk(\&accumulate, \&accumulate);
        @RESULT = $DW->($TOP);
        s{^$TOP/}{}o for @RESULT;
        print "# @RESULT\n";
        is_deeply(\@RESULT, \@items,  "curried version");

        $DW = dir_walk(undef, \&accumulate, \&accumulate);
        @RESULT = $DW->($TOP);
        s{^$TOP/}{}o for @RESULT;
        print "# @RESULT\n";
        is_deeply(\@RESULT, \@items,  "curried version");

=endtest

TOP