PerlPhrasebook - Python Wiki

(Based on [http://llama.med.harvard.edu/~fgibbons/PerlPythonPhrasebook.html an original] by the late Jak Kirman.)

Introduction

This phrasebook contains a collection of idioms, various ways of accomplishing common tasks, tricks and useful things to know, in Perl and Python side-by-side. I hope this will be useful for people switching from Perl to Python, and for people deciding which to choose. The first part of the phrasebook is based on Tom Christiansen's [http://www.perl.com/perl/pdsc/ Perl Data Structures Cookbook].

I have only been working on this for a short time, so many of the translations could probably be improved, and the format could be greatly cleaned up.

I will get the data-structures cookbook translated first and then go back to clean up the code. Also, since I have been using Python for far less time than Perl, there are certainly idioms I don't know or that I will misuse. Please feel free to fix and update.

Other references: [http://pleac.sourceforge.net/ PLEAC].

Thanks to David Ascher, Guido van Rossum, Tom Christiansen, Larry Wall and Eric Daniel for helpful comments.

TODO:

break up into multiple smaller pages
use modern Python idioms
use modern Perl idioms
add more points of comparison
Use sorted() where appropriate once 2.4 has been out a while.
Get rid of map() where possible.
Simple types (strings, lists, dictionaries, etc.)
Common tasks (reading from a file, exception handling, splitting strings, regular expression manipulation, etc.)
Sections 4 and 5 of the Perl Data Structures Cookbook.
Vertical whitespace needs fixing.

QUESTIONS:

Should function and data structure names for Python code be in python_style (and more appropriate/informative)?

The obvious

Python don't need no steenking semicolons.

The not so obvious

There are many Integrated Development Environments (IDEs) for Python, recommended to new users and used by seasoned Python programmers alike. IDLE, which is part of the Python distribution, is a Tk-based GUI providing language-aware editing, debugging and an interactive shell. Many of the Python examples shown here can be tried out in IDLE.

Simple types

Strings

Creating a string

Perl:

$s = 'a string';

Python:

s = 'a string'

Note that string variables in Perl are specified with a dollar sign; in Python you just specify the name of the variable.

Larry Wall points out:

This is rather oversimplifying what is going on in both Perl and Python. The $ in Perl indicates a scalar variable, which may hold a string, a number, or a reference. There's no such thing as a string variable in Python, where variables may only hold references. You can program in a Pythonesque subset of Perl by restricting yourself to scalar variables and references. The main difference is that Perl doesn't do implicit dereferencing like Python does.

Quoting

Perl:

$s1 = "some string";
$s2 = "a string with\ncontrol characters\n";
$s3 = 'a "quoted" string';
$s4 = "a 'quoted' string";
$s5 = qq/a string with '" both kinds of quotes/;
$s6 = "another string with '\" both kinds of quotes";
$s7 = 'a stri\ng that au\tomatically escapes backslashes';

for $i ($s1, $s2, $s3, $s4, $s5, $s6, $s7)
{
  print ("$i\n");
}

Python:

s1 = "some string"
s2 = "a string with\ncontrol characters\n"
s3 = 'a "quoted" string'
s4 = "a 'quoted' string"
s5 = '''a string with '" both kinds of quotes'''
s6 = "another string with '\" both kinds of quotes"
s7 = r"a stri\ng that au\tomatically escapes backslashes"

for i in (s1, s2, s3, s4, s5, s6, s7):
  print i

In both languages, strings can be single-quoted or double-quoted. In Python, there is no difference between the two except that in single-quoted strings double quotes need not be escaped with a backslash, and vice versa. In Perl, double-quoted strings have control characters and variables interpolated inside them (see below) and single-quoted strings do not.

Both languages provide other quoting mechanisms; Python uses triple quotes (single or double, makes no difference) for multi-line strings; Python has the r prefix (r"" or r'' or r"""""" or r'''''') to indicate strings in which backslash is automatically escaped -- highly useful for regular expressions. Perl has very elaborate (and very useful) quoting mechanisms; see the operators q, qq, qw, qx, etc. in the PerlManual.
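As a small illustration of the Python quoting forms just described, here is a sketch (not part of the original phrasebook; the variable names are made up for the example). It uses print() calls so that it also runs on Python 3.

```python
# Raw string: backslashes are kept literally, handy for regular expressions.
raw = r"\d+\.\d+"
cooked = "\\d+\\.\\d+"      # the same text written with escaped backslashes
same = (raw == cooked)

# Triple quotes may span lines and contain both kinds of quote unescaped.
multi = """first line with ' and "
second line"""
line_count = len(multi.splitlines())

print(same)
print(line_count)
```

The raw form and the escaped form compare equal; the r prefix only changes how the literal is read, not the kind of object that results.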

Quoting is definitely one of the areas where Perl excels.

Interpolation

Perl:

$name = "Fred";
$header1 = "Dear $name,";
$title = "Dr.";
$header2 = "Dear $title $name,";

print "$header1\n$header2\n";

Python:

name = "Fred"
header1 = "Dear %s," % name
title = "Dr."
header2 = "Dear %(title)s %(name)s," % vars()

print header1
print header2

Perl's interpolation is much more convenient, though slightly less powerful than Python's % operator. Remember that in Perl variables are interpolated within double-quoted strings, but not single-quoted strings.

Perl has a function sprintf that behaves similarly to Python's % operator; the above lines could have been written:

$name = "Fred";
$header1 = sprintf ("Dear %s,", $name);
$title = "Dr.";
$header2 = sprintf ("Dear %s %s,", $title, $name);

Python's % (format) operator is generally the way to go when you have more than minimal string formatting to do (you can use + for concatenation, and [:] for slicing). It has three forms. In the first, there is a single % specifier in the string; the specifiers are roughly those of C's sprintf. The right-hand side of the format operator specifies the value to be used at that point:

x = 1.0/3.0
s = 'the value of x is roughly %.4f' % x

If you have several specifiers, you give the values in a tuple on the right-hand side:

x = 1.0/3.0
y = 1.0/4.0
s = 'the value of x,y is roughly %.4f,%.4f' % (x, y)

Finally, you can give a name and a format specifier:

x = 1.0/3.0
y = 1.0/4.0
s = 'the value of x,y is roughly %(x).4f,%(y).4f' % vars()

The name in parentheses is used as a key into the dictionary you provide on the right-hand side; its value is formatted according to the specifier following the parentheses. Some useful dictionaries are locals() (the local symbol table), globals() (the global symbol table), and vars() (equivalent to locals() except when an argument is given, in which case it returns arg.__dict__).
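The dictionary on the right-hand side does not have to be a symbol table. A minimal sketch, assuming a hypothetical dictionary named values built just for this example:

```python
# Hypothetical data; any mapping works on the right-hand side of %.
values = {'x': 1.0 / 3.0, 'y': 1.0 / 4.0}
s = 'the value of x,y is roughly %(x).4f,%(y).4f' % values
print(s)
```

Passing an explicit dictionary keeps the set of interpolated names visible at the call site, which can be easier to follow than vars() in a long function.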

[http://www.python.org/peps/pep-0215.html PEP215] proposes a $"$var" substitution mode as an alternative to "%(var)s" % locals(), but seems to be losing traction to the explicit Template class proposed in [http://www.python.org/peps/pep-0292.html PEP292], which requires no syntax changes.
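The Template class from PEP 292 is available in the standard string module (since Python 2.4). The following is only a sketch of how the earlier "Dear Dr. Fred" example might look with it; the variable names are chosen for illustration.

```python
from string import Template

header = Template("Dear $title $name,")
text = header.substitute(title="Dr.", name="Fred")
print(text)

# safe_substitute leaves unknown placeholders in place instead of raising KeyError.
partial = header.safe_substitute(name="Fred")
print(partial)
```

This reads much like Perl's interpolation, but the substitution is an explicit method call rather than something the string literal does by itself.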

Modifying a string

$s1 = "new string";        # change to new string
$s2 =~ s/\n/[newline]/g;   # substitute newlines with the text "[newline]"
substr($s2, 0, 3) = 'X';   # replace the first 3 chars with an X

print ("$s1\n$s2\n");

s1 = "new string"          # change to new string
                           # substitute newlines with the text "[newline]"
s2 = s2.replace("\n", "[newline]")
s2 = 'X' + s2[3:]

print s1
print s2

In Perl, strings are mutable; the third assignment modifies s2. In Python, strings are immutable, so you have to do this operation a little differently, by slicing the string into the appropriate pieces.

A Python string is just an array of characters, so all of the array operations are applicable to strings. In particular, if a is an array, a[x:y] is the slice of a from index x up to, but not including, index y. If x is omitted, the slice starts at the beginning of the array; if y is omitted, the slice ends at the last element. If either index is negative, the length of the array is added to it.
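The slicing rules above can be tried out with a short sketch like this one (the sample string and list are arbitrary):

```python
s = "phrasebook"
first = s[:6]        # start omitted: from the beginning
rest = s[6:]         # end omitted: through the last element
last3 = s[-3:]       # negative index: counted from the end
middle = s[1:-1]     # everything except the first and last characters

nums = [0, 1, 2, 3, 4]
pair = nums[1:3]     # index 1 up to, but not including, index 3

print(first, rest, last3, middle, pair)
```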

In Perl, slicing is performed by giving the array a list of indices to be included in the slice. This can be any arbitrary list, and by using the range operator .. you can get Python-like slicing. If any of the indices in the list is out of bounds, an undef is inserted there.

@array = ('zero', 'one', 'two', 'three', 'four');

# slicing with range operator to generate slice index list
@slice = @array[0..2];  # returns ('zero', 'one', 'two')

# Using arbitrary index lists
@slice = @array[0,3,2]; # returns ('zero', 'three', 'two')
@slice = @array[0,9,1]; # returns ('zero', undef, 'one')

Note: Perl's range operator uses a closed interval, so 0..2 includes index 2.
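Python has no built-in equivalent of a Perl slice with an arbitrary index list, but a list comprehension comes close. The sketch below assumes a hypothetical helper named pick, which is not a standard function; it returns None where Perl would give undef.

```python
array = ['zero', 'one', 'two', 'three', 'four']

# Python slices stop before the end index, so Perl's 0..2 becomes [0:3].
closed = array[0:3]

def pick(seq, indices):
    """Hypothetical helper: mimic a Perl slice, using None where Perl gives undef."""
    n = len(seq)
    return [seq[i] if -n <= i < n else None for i in indices]

picked = pick(array, [0, 3, 2])
padded = pick(array, [0, 9, 1])

print(closed)
print(picked)
print(padded)
```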

Importing

use Module;

use Module (symbol1, symbol2, symbol3);
# or use Module qw(symbol1 symbol2 symbol3);

from module import symbol1, symbol2, symbol3

# Allows mysymbol.func()
from module import symbol1 as mysymbol

# Unless the module is specifically designed for this kind of import, don't use it
from module import *

I need to figure out the precise differences here. Roughly, from..import * and use Module mean import the entire namespace; the other versions import only selected names.

require Module;

Module::func();

import module

module.func()

This "loads" the specified module, executing any initialization code. It does not copy the module's symbols into your namespace (Python's import binds only the module name itself). In order to access symbols in the module, you have to explicitly qualify the name, as shown.
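To make the difference between the Python import forms concrete, here is a small sketch using the standard math module (chosen only because it is always available):

```python
import math                      # binds only the name "math"
from math import sqrt, pi        # binds the selected names directly
from math import floor as fl     # binds one name under an alias

a = math.sqrt(16.0)   # qualified access through the module
b = sqrt(16.0)        # unqualified access to an imported name
c = fl(pi)            # access through the alias

print(a, b, c)
```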

Common tasks

Reading a file as a list of lines

$filename = "cooktest1.1-1";
open (F, $filename) or die ("can't open $filename: $!\n");
@lines = <F>;

import sys

filename = "cooktest1.1-1"
try:
    f = open(filename)
except IOError as e:
    # like Perl's die: report the error and stop
    sys.stderr.write("can't open %s: %s\n" % (filename, e))
    sys.exit(1)
lines = f.readlines()

In Perl, variables are always preceded by a symbol that indicates their type. A $ indicates a simple type (number, string or reference), an @ indicates an array, a % indicates a hash (dictionary), and an & indicates a function.

In Python, objects must be initialized before they are used, and the initialization determines the type. For example, a = [] creates an empty array a, d = {} creates an empty dictionary.
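On Python 2.6 and later, reading a file as a list of lines is often wrapped in a with statement, which closes the file automatically. The following is a sketch only; it creates its own throwaway file (an assumption made so that the example is self-contained) rather than using the file name from the listing above.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
fd, filename = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("first line\nsecond line\n")

# The with statement closes the file even if reading raises an exception.
with open(filename) as f:
    lines = f.readlines()

os.remove(filename)
print(lines)
```

As with Perl's @lines = <F>, each element keeps its trailing newline.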

Looping over files given on the command line or stdin

The useful Perl idiom of:

while (<>) {
    ...                 # code for each line
}

loops over each line of every file named on the command line when executing the script; or, if no files are named, it loops over every line of standard input.

The Python fileinput module does a similar task:

import fileinput
for line in fileinput.input():
    ...                 # code to process each line

The fileinput module also allows in-place editing, optionally creating a backup of each file, and a different list of files can be given instead of taking the command line arguments.
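A sketch of both features follows: passing an explicit file list and editing in place with a backup. The temporary file and the upper-casing transformation are assumptions made up for the example.

```python
import fileinput
import os
import tempfile

# Throwaway file standing in for a file named on the command line.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("alpha\nbeta\n")

# Pass an explicit list of files instead of relying on sys.argv[1:].
seen = [line.rstrip("\n") for line in fileinput.input(files=[path])]
fileinput.close()

# In-place editing: while the loop runs, print() writes into the file,
# and the original content is kept in a backup with the given suffix.
for line in fileinput.input(files=[path], inplace=True, backup=".bak"):
    print(line.rstrip("\n").upper())
fileinput.close()

with open(path) as f:
    edited = f.read()
with open(path + ".bak") as f:
    backup = f.read()

os.remove(path)
os.remove(path + ".bak")
```

This is roughly what perl -i.bak does from the command line, expressed as library calls.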

Some general comparisons

This section is under construction; for the moment I am just putting random notes here. I will organize them later.

Perl's regular expressions are much more accessible than Python's, since they are embedded in Perl's syntax, whereas Python provides them through its re module, which must be imported.
Perl's quoting mechanisms are more powerful than those of Python.
I find Python's syntax much cleaner than Perl's.
I find Perl's syntax too flexible, leading to silent errors. The -w flag and use strict help quite a bit, but still not as much as Python.
I like Python's small core with a large number of standard libraries. Perl has a much larger core, and though many libraries are available, since they are not standard, it is often best to avoid them for portability.
Python's object model is very uniform, allowing you, for example, to define types that can be used wherever a standard file object can be used.
Python allows you to define operators for user-defined types. The operator overloading facility in Perl is provided as an add-on---the overload module.
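The last two points can be illustrated with a short sketch. The classes Money and Collector and the function report are hypothetical names invented for this example; the only real API used is the write() method that file objects provide.

```python
import sys

class Money(object):
    """Hypothetical user-defined type that overloads the + operator."""
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

class Collector(object):
    """Hypothetical file-like object: it only needs a write() method."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

def report(out, amount):
    # Works with any object that has write(), real file or not.
    out.write("total is %d cents\n" % amount.cents)

total = Money(150) + Money(275)
log = Collector()
report(log, total)          # captured in memory
report(sys.stdout, total)   # written to the real standard output
```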

Lists of lists

The Perl code in this section is taken, with permission, almost directly from Tom Christiansen's [http://www.perl.com/perl/pdsc/ Perl Data Structures Cookbook], part 1, release 0.1, with a few typos fixed.

Lists of lists: preliminaries

sub printSep {   print ("=" x 60, "\n"); }

sub printLoL
{
  my ($s, $lol) = @_;
  print ("$s\n");
  for $l (@$lol)
  {
    print (join (" ", @$l));
    print ("\n");
  }
  printSep();
}

# which is longhand for:
sub printLoL {
        print $_[0] . "\n";
        print join(" ", @$_) . "\n" foreach (@{$_[1]});
        printSep();
}

# or even:
sub printLoL {
        print $_[0] . "\n", map(join(" ", @$_) . "\n" , @{$_[1]}), "=" x 60 . "\n";
}

# return numeric (or other) converted to string
sub somefunc {  my ($i) = shift;  "$i";  }

def printSep():
    print '=' * 60

def printLoL(s, lol):
    out = [s] + [' '.join([str(elem) for elem in row]) for row in lol]
    print '\n'.join(out)
    printSep()

def somefunc(i):
    return str(i)  # string representation of i

printLoL pretty-prints a list of lists.

printSep prints a line of equal signs as a separator.

somefunc is a function that is used in various places below.

Lost in the translation

Converting the Perl examples so directly to Python is useful at first, but the casual browser should be aware that the task of printLoL is usually accomplished by just

  print lol

because Python can print a default string representation of any object.

Importing pprint at the beginning of a module (from pprint import pprint) would then allow

  pprint(lol)

to substitute for all uses of printLoL in a more 'pythonic' way. (pprint gives even more formatting options when printing data structures.)
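A minimal sketch of the pprint approach, using the same family data as the rest of this section (the width value is an arbitrary choice for the example):

```python
from pprint import pprint, pformat

lol = [["fred", "barney"],
       ["george", "jane", "elroy"],
       ["homer", "marge", "bart"]]

# width limits the line length, so each inner list lands on its own line.
pprint(lol, width=40)

# pformat returns the same text as a string instead of printing it.
text = pformat(lol, width=40)
```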

requires/imports

import sys

Perl's use is roughly equivalent to Python's import.

Perl has much more built in, so nothing here requires importing.

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." - Jamie Zawinski

For many simple operations, Perl will use a regular expression where Pythonic code won't. Should you really need to use regular expressions, import the re module.

Declaration of a list of lists

@LoL = (
       [ "fred", "barney" ],
       [ "george", "jane", "elroy" ],
       [ "homer", "marge", "bart" ],
     );
@LoLsave = @LoL; # for later

printLoL ('Families:', \@LoL);

LoL = [["fred", "barney"],
       ["george", "jane", "elroy"],
       ["homer", "marge", "bart"]]
LoLsave = LoL[:] # See comment below

printLoL('Families:', LoL)

In Python, you are always dealing with references to objects. If you just assign one variable to another, e.g.,

a = [1, 2, 3]
b = a

you have just made b refer to the same array as a. Changing the values in b will affect a.

Sometimes what you want is to make a copy of a list, so you can manipulate it without changing the original. In this case, you want to make a new list that holds the same elements as the original list. This is done with a full array slice: the start of the range defaults to the beginning of the list and the end defaults to the end of the list, so

a = [1, 2, 3]
b = a[:]

makes a separate copy of a.

Note that this is not necessarily the same thing as a deep copy, since references in the original array will be shared with references in the new array:

a = [ [1, 2, 3], [4, 5, 6] ]
b = a[:]
b[0][0] = 999
print a[0][0]   # prints 999

You can make a deep copy using the copy module:

import copy

a = [[1, 2, 3], [4, 5, 6]]
b = copy.deepcopy(a)
b[0][0] = 999
print a[0][0]   # prints 1

Generation of a list of lists

Reading from a file line by line

open (F, "cookbook.data1");
@LoL = ();
while ( <F> ) {
    push @LoL, [ split ];
}

printLoL ("read from a file: ", \@LoL);

LoL = []
for line in open('cookbook.data1'):
    LoL.append(line.split())  # split() already discards the trailing newline
printLoL('read from a file: ', LoL)

Unless you expect to be reading huge files, or want feedback as you read the file, it is easier to slurp the file in one go.

In Perl, reading from a file-handle, e.g., <STDIN>, has a context-dependent effect. If the handle is read from in a scalar context, like $a = <STDIN>;, one line is read. If it is read in a list context, like @a = <STDIN>;, the whole file is read, and the call evaluates to a list of the lines in the file.
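Python has no notion of context, so the two behaviours are separate methods on the file object. A sketch, using an in-memory io.StringIO object as a stand-in for an open file:

```python
import io

data = io.StringIO(u"one\ntwo\nthree\n")   # stands in for an open file

first = data.readline()     # like $a = <F>;  reads a single line
rest = data.readlines()     # like @a = <F>;  reads all remaining lines

print(repr(first))
print(rest)
```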

Reading from a file in one go

open (F, "cookbook.data1");

@LoL = map { chop; [split]; } <F>;

printLoL ("slurped from a file: ", \@LoL);

LoL = [line[:-1].split() for line in open('cookbook.data1')]
printLoL("slurped from a file: ", LoL)

Thanks to Adam Krolnik for help with the perl syntax here.

Filling a list of lists with function calls

for $i ( 0 .. 9 ) {
    $LoL[$i] = [ somefunc($i) ];
}
printLoL("filled with somefunc:", \@LoL);

LoL = [0] * 10  # populate the array -- see comment below

for i in range(10):
    LoL[i] = [ somefunc(i) ]

printLoL('filled with somefunc:', LoL)

Alternatively, you can use a list comprehension:

LoL = [[somefunc(i)] for i in range(10)]
printLoL('filled with somefunc:', LoL)

In Python:

You have to populate the matrix -- this doesn't happen automatically in Python.
It doesn't matter what type the initial elements of the matrix are, as long as they exist.
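The two notes above can be checked with a short sketch. somefunc is replaced here by a plain str() call so that the example stands alone.

```python
LoL = []
try:
    LoL[0] = ["x"]              # the slot does not exist yet, so this fails
except IndexError as err:
    failure = str(err)

LoL = [None] * 3                # placeholders of any type will do
for i in range(3):
    LoL[i] = [str(i)]

# Or grow the list as you go, which needs no placeholders at all.
grown = []
for i in range(3):
    grown.append([str(i)])

print(failure)
print(LoL == grown)
```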

Filling a list of lists with function calls, using temporaries

for $i ( 0 .. 9 ) {
    @tmp = somefunc($i);
    $LoL[$i] = [ @tmp ];
}

printLoL ("filled with somefunc via temps:", \@LoL);

for i in range(10):
    tmp = [ somefunc(i) ]
    LoL[i] = tmp

printLoL('filled with somefunc via temps:', LoL)

@LoL = map { [ somefunc($_) ] } 0..9;
printLoL ('filled with map', \@LoL);

LoL = map(lambda x: [ somefunc(x) ], range(10))
printLoL('filled with map', LoL)

Both Perl and Python allow you to map an operation over a list, or to loop through the list and apply the operation yourself.

I don't believe it is advisable to choose one of these techniques to the exclusion of the other --- there are times when looping is more understandable, and times when mapping is. If conceptually the idea you want to express is "do this to each element of the list", I would recommend mapping because it expresses this precisely. If you want more precise control of the flow during this process, particularly for debugging, use loops.
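For comparison, here are the three styles side by side in a sketch that also runs on Python 3, where map() returns an iterator and therefore needs a list() call around it. somefunc is redefined locally so the block is self-contained.

```python
def somefunc(i):
    return str(i)

# Mapping: "do this to each element".
mapped = list(map(lambda x: [somefunc(x)], range(3)))

# List comprehension: the same idea without a lambda.
comprehended = [[somefunc(x)] for x in range(3)]

# Explicit loop: more room for control flow and debugging.
looped = []
for x in range(3):
    row = [somefunc(x)]     # an easy place for a breakpoint or a debugging print
    looped.append(row)

print(mapped == comprehended == looped)
```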

Tom Christiansen suggests that it is often better to make it clear that a function is being defined, by writing:

@LoL = map { [ somefunc($_) ] } 0..9;

rather than

@LoL = map ({[ somefunc($_) ]}, 0..9);

or

@LoL = map ( [ somefunc($_) ] , 0..9);

Adding to an existing row in a list of lists

@LoL = @LoLsave;  # start afresh
push @{ $LoL[0] }, "wilma", "betty";
printLoL ('after appending to first element:', \@LoL);

LoL = LoLsave[:]  # start afresh
LoL[0] += ["wilma", "betty"]
printLoL('after appending to first element:', LoL)

In Python, the + operator is defined to mean concatenation for sequences, and += extends the list on its left in place. An alternative to the above code is to append each element of the list to LoL[0]:

LoL[0].append("wilma")
LoL[0].append("betty")
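A closer match to Perl's push with several arguments is the list method extend, sketched here on a small made-up list of lists:

```python
LoL = [["fred", "barney"], ["george", "jane", "elroy"]]

row = LoL[0]
LoL[0].extend(["wilma", "betty"])   # adds both names in one call, in place

same_object = (row is LoL[0])       # the row was modified, not replaced
print(LoL[0])
print(same_object)
```

Because the row is modified in place, any other list that shares this row (for example a shallow copy made with a full slice) sees the new names too.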

Accessing elements of a list of lists

One element

$LoL[0][0] = "Fred";
print ("first element is now $LoL[0][0]\n");
printSep();

LoL[0][0] = "Fred"
print 'first element is now', LoL[0][0]
printSep()
