How to Write "Pythonic" Code

Author: Christopher Arndt
Date: 2008-04-13
Location: RuPy Conference, Poznań, Poland
Copyright: CC Attribution/Share-Alike

Knowing the syntax and the standard library of Python alone doesn't make one a good Python programmer. In more than a decade of increasing usage and popularity of the language, Pythonistas have developed many typical Python idioms and pythonic ways "to do it". These are based either on the fact that in Python, being a highly dynamic language, many things necessary in more static languages just make no sense, or on the fundamental principles laid down in the so-called "Zen of Python". This talk tries to explain these principles and how they translate into actual code.

Acknowledgements

This talk is based in large part on the slides of David Goodger's talk Code Like a Pythonista: Idiomatic Python and on Jeff Hinrich's adaptation with the same title. From these I picked the topics I could most relate to and then added a few of my own favorite Python idioms. The presentation is released under the Creative Commons Attribution/Share-Alike License.

The Zen of Python

The Zen of Python (1)

Try this at your Python prompt:

>>> import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.

The Zen of Python (2)

The Zen of Python, cont.

In the face of ambiguity, refuse the temptation to guess.
There should be one -- and preferably only one -- obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Coding Style

Beautiful is better than ugly

Programs must be written for people to read, and only incidentally for machines to execute.

— Abelson & Sussman, Structure and Interpretation of Computer Programs

Read PEP 8!

Every Python programmer should know PEP 8:

http://www.python.org/dev/peps/pep-0008/

PEP = Python Enhancement Proposal

The Python community has its own standards for what source code should look like, codified in PEP 8. These standards are different from those of other communities, like C, C++, C#, Java, VisualBasic, etc.

Because indentation and whitespace are so important in Python, the Style Guide for Python Code is as good as a standard.

Most open-source projects and (hopefully) in-house projects follow the style guide quite closely and there are even tools to check whether code adheres to the standard.

Whitespace (1)

  • 4 spaces per indentation level.
  • No hard tabs.
  • Never mix tabs and spaces.
  • One blank line between methods inside a class.
  • Two blank lines between top-level functions and classes.

Whitespace (2)

def make_squares(key, value=0):
    """Return a dictionary and a list..."""
    d = {key: value}
    l = [key, value]
    return d, l

Naming Conventions

I never use the __private form, and you probably won't either.
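As a rough reminder of what PEP 8 recommends: lowercase_with_underscores for functions, methods and attributes, CapWords for classes, and ALL_CAPS for constants. The sketch below uses made-up names (HttpClient, build_url, MAX_RETRIES) purely to show the styles side by side, including what the __private form actually does.

```python
MAX_RETRIES = 3  # constant: ALL_CAPS


class HttpClient(object):  # class: CapWords
    """Hypothetical class, only here to show the naming styles."""

    def __init__(self, base_url):
        self.base_url = base_url  # public attribute
        self._session = None      # single underscore: internal use
        self.__token = 'secret'   # double underscore: name-mangled

    def build_url(self, path):    # method: lowercase_with_underscores
        return self.base_url + '/' + path


client = HttpClient('http://example.com')
print(client.build_url('index.html'))

# Name mangling stores __token as _HttpClient__token on the instance.
print(sorted(vars(client)))
```

The mangled name is the main reason to avoid the __private form: it gets in the way of subclassing and debugging, while a single leading underscore already signals "internal".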

Long Lines & Continuations

Keep lines below 80 characters in length.

Use implied line continuation inside parentheses/brackets/braces:

def __init__(self, first, second, third,
             fourth, fifth, sixth):
    output = (first + second + third
              + fourth + fifth + sixth)

Use backslashes as a last resort:

VeryLong.left_hand_side \
    = even_longer.right_hand_side()

Backslashes are fragile; they must end the line they're on. If you add a space after the backslash, it won't work any more. Also, they're ugly.

Long Strings (1)

Adjacent literal strings are concatenated by the parser:

>>> print 'o' 'n' "e"
one

A string prefixed with an "r" is a "raw" string. Backslashes are not evaluated as escapes in raw strings, which makes them useful for regular expressions and Windows filesystem paths.
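A small sketch of both points, literal concatenation with mixed quoting styles and raw strings. The sample strings are made up for illustration.

```python
import re

# Adjacent literals are joined at compile time, whatever their quoting.
word = 't' r'\/\/' """o"""
print(word)

# In a raw string the backslash stays an ordinary character,
# so r'\n' is two characters while '\n' is a single newline.
print(len('\n'))
print(len(r'\n'))

# Raw strings keep regular expressions readable.
print(re.findall(r'\d+', 'PEP 8 and PEP 20'))
```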

Note that named string objects are not concatenated:

>>> a = 'three'
>>> b = 'four'
>>> a b
  File "<stdin>", line 1
    a b
      ^
SyntaxError: invalid syntax

Long Strings (2)

That's because this automatic concatenation is a feature of the Python parser/compiler, not the interpreter. You must use the "+" operator to concatenate strings at run time.

text = ('Long strings can be made up '
        'of several shorter strings.')

The parentheses allow implicit line continuation.

Multiline strings use triple quotes:

"""\
Triple
double
quotes"""

Compound Statements

Good:

if foo == 'blah':
    do_something()
do_one()
do_two()
do_three()

Bad:

if foo == 'blah': do_something()
do_one(); do_two(); do_three()

Docstrings & Comments

Docstrings = How to use code

Comments = Why (rationale) & how code works

Docstrings explain how to use code, and are for the users of your code.
Comments explain why, and are for the maintainers of your code. That includes yourself!

Simple is Better Than Complex

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

— Brian W. Kernighan

In other words, KISS!

General Python Idioms

Swap Values

In other languages:

temp = a
a = b
b = temp

In Python:

b, a = a, b

Tuples

We saw that the comma is the tuple constructor, not the parentheses. Example:

>>> 1,
(1,)

The Python interpreter shows the parentheses for clarity, and I recommend you use parentheses too:

>>> (1,)
(1,)

Don't forget the comma!

>>> (1)
1

Interactive "_"

This is a really useful feature that surprisingly few people know about.

In the interactive interpreter, whenever you evaluate an expression or call a function, the result is bound to a temporary name, _ (an underscore):

>>> 1 + 1
2
>>> _
2

_ stores the last printed expression.

When a result is None, nothing is printed, so _ doesn't change. That's convenient!

Note

This only works in the interactive interpreter, not within a module.

Building Strings from Substrings

Start with a list of strings and then join() it:

colors = ['red', 'blue', 'green', 'yellow']
print ", ".join(colors)

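We want to join all the strings together into one large string, and join() does that in a single pass. This matters especially when the number of substrings is large.

Don't do this, because every += may build a new intermediate string: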

result = ''
for s in colors:
    result += s
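A few variations on the join() idiom, as a sketch. The numbers list is made up; note that join() accepts only strings, so other items have to be converted first.

```python
colors = ['red', 'blue', 'green', 'yellow']

# One call, one result string.
joined = ', '.join(colors)
print(joined)

# join() accepts only strings, so convert other items on the fly.
numbers = [1, 2, 3]
print(', '.join(str(n) for n in numbers))

# A human-readable list with a final "and".
sentence = ', '.join(colors[:-1]) + ' and ' + colors[-1]
print(sentence)
```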

Testing for None

Good:

if foo is None:
    do_something()

Maybe:

if not foo:
    do_something()

Bad:

if foo == None:
    do_something()
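Why "if not foo" is only a "maybe": it is also true for 0, empty strings and empty containers, not just for None. The describe function below is a hypothetical helper that makes the difference visible.

```python
def describe(foo):
    """Hypothetical helper showing which test matches which values."""
    if foo is None:
        return 'None'
    if not foo:
        return 'false, but not None'
    return 'true'


for value in (None, 0, '', [], 'text', [0]):
    print(repr(value) + ' -> ' + describe(value))
```

Use "is None" whenever None has a meaning of its own, for example "no value given", and 0 or an empty list are legitimate values.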

Iterating a list

Good:

for i, item in enumerate(mylist):
    if i >= 1:
        print mylist[i-1] + item

Bad:

for i in range(len(mylist)):
    if i >= 1:
        print mylist[i-1] + mylist[i]

Also bad:

i = 0
for item in mylist:
    if i >= 1:
        print mylist[i-1] + item
    i += 1

Use in where possible

Good:

for key in d:
    print key
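The same operator works for membership tests. A minimal sketch with a made-up dictionary:

```python
d = {'spam': 1, 'eggs': 2}

# Membership test on the keys, without building a list of keys first.
if 'spam' in d:
    print('spam is a key')

# "in" also works on lists, tuples, sets and strings.
print(3 in [1, 2, 3])
print('yth' in 'Python')

# Iterating over a dict yields its keys.
keys = sorted(key for key in d)
print(keys)
```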

Dictionary setdefault Method

Dicts have a setdefault method that is very useful to initialise dicts:

navs = {}
for (portfolio, equity, position) in data:
    navs.setdefault(portfolio, 0)
    navs[portfolio] += position * prices[equity]

The setdefault dictionary method returns the value stored for the key (the default, if the key was missing), but we ignore the return value here. We're taking advantage of setdefault's side effect: it sets the dictionary value only if there is no value already.

Tip

Python 2.5 added the defaultdict class to the collections module. Look it up in the standard library reference!
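A sketch of the same accumulation with collections.defaultdict. The sample rows and prices are invented for illustration; only the loop structure follows the setdefault example.

```python
from collections import defaultdict

# Made-up sample data: (portfolio, equity, position) rows and prices.
data = [('growth', 'ACME', 10), ('growth', 'INIT', 5), ('income', 'ACME', 2)]
prices = {'ACME': 3.0, 'INIT': 10.0}

# int() returns 0, so a missing portfolio starts at zero automatically.
navs = defaultdict(int)
for portfolio, equity, position in data:
    navs[portfolio] += position * prices[equity]

print(sorted(navs.items()))
```

The factory function passed to defaultdict is called without arguments whenever a missing key is looked up, so list or set work the same way for grouping.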

Other languages have "variables" (1)

In many other languages, assigning to a variable puts a value into a box.

int a = 1;

Box "a" now contains an integer 1.

Other languages have "variables" (2)

Assigning another value to the same variable replaces the contents of the box:

a = 2;

Now box "a" contains an integer 2.

Other languages have "variables" (3)

Assigning one variable to another makes a copy of the value and puts it in the new box:

int b = a;

"b" is a second box, with a copy of integer 2. Box "a" has a separate copy.

Python has "names" (1)

In Python, a "name" or "identifier" is like a parcel tag (or nametag) attached to an object.

a = 1

Here, an integer 1 object has a tag labelled "a".

If we reassign to "a", we just move the tag to another object:

a = 2
The tag "a" is now attached to an integer 2 object; the integer 1 object no longer has a tag.

Python has "names" (2)

If we assign one name to another, we're just attaching another nametag to an existing object:

b = a

The name "b" is just a second tag bound to the same object as "a".
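With integers the difference between boxes and tags is hard to observe, because integers cannot be changed in place. A mutable object makes it visible. A minimal sketch:

```python
a = [1, 2, 3]
b = a               # a second tag on the same list object

print(a is b)       # both names refer to one object

b.append(4)         # changing the object through one name...
print(a)            # ...is visible through the other

b = [1, 2, 3, 4]    # rebinding moves the tag "b" to a new object
print(a == b)       # equal contents
print(a is b)       # but no longer the same object
```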

Default Parameter Values (1)

This is a common mistake that beginners often make. Even more advanced programmers make this mistake if they don't understand Python names.

def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list

The problem here is that the default value of a_list, an empty list, is evaluated at function definition time. So every time you call the function, you get the same default value. Try it several times:

>>> print bad_append('one')
['one']

>>> print bad_append('two')
['one', 'two']

Default Parameter Values (2)

Lists are mutable objects; you can change their contents. The correct way to get a default list (or dictionary, or set) is to create it at run time instead, inside the function:

def good_append(new_item, a_list=None):
    if a_list is None:
        a_list = []
    a_list.append(new_item)
    return a_list
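Both versions side by side, as a sketch of how the difference shows up in practice:

```python
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list


def good_append(new_item, a_list=None):
    if a_list is None:
        a_list = []
    a_list.append(new_item)
    return a_list


first = bad_append('one')
second = bad_append('two')
print(first is second)      # the one shared default list, both times

print(good_append('one'))
print(good_append('two'))   # a fresh list on every call

existing = ['zero']
good_append('one', existing)
print(existing)             # an explicitly passed list is changed in place
```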

List Comprehensions

List comprehensions ("listcomps" for short) are syntax shortcuts for this general pattern:

The traditional way, with for and if statements:

new_list = []
for item in a_list:
    if condition(item):
        new_list.append(fn(item))

As a list comprehension:

new_list = [fn(item) for item in a_list
            if condition(item)]
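A concrete instance of the pattern, with a made-up word list: here fn is str.upper and the condition is a length check.

```python
words = ['spam', 'eggs', 'ham', 'bacon']

# The loop form of the pattern.
long_upper = []
for word in words:
    if len(word) > 3:
        long_upper.append(word.upper())

# The same result as a list comprehension.
long_upper_lc = [word.upper() for word in words if len(word) > 3]

print(long_upper_lc)
print(long_upper == long_upper_lc)
```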

Generator Expressions (1)

Let's sum the squares of the numbers up to 100. As a loop:

total = 0
for num in range(1, 101):
    total += num * num

We can use the sum function to quickly do the work for us, by building the appropriate sequence. As a list comprehension:

total = sum([num * num for num in range(1, 101)])

Generator Expressions (2)

As a generator expression:

total = sum(num * num for num in xrange(1, 101))

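Rule of thumb: use a list comprehension when the list itself is the result you want, and a generator expression when the values are only an intermediate step.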
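The practical difference, sketched with range so that it runs on any Python version: the list comprehension materialises every value, while the generator expression produces them one at a time and is exhausted afterwards.

```python
# A list comprehension builds the whole list in memory first.
squares = [num * num for num in range(1, 101)]
total_from_list = sum(squares)

# A generator expression hands the values to sum() one at a time.
total_from_gen = sum(num * num for num in range(1, 101))

print(total_from_list == total_from_gen)

# A generator can only be consumed once.
gen = (num * num for num in range(1, 4))
first_pass = list(gen)
second_pass = list(gen)
print(first_pass)
print(second_pass)
```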

Generators

Here's a useful generator à la find(1):

import fnmatch
import os

def walkfiles(startdir, pattern=None):
    """Return generator for full paths of all files below startdir.

    Optionally filters out files not matching pattern.
    """
    # dirpath instead of dir, which would shadow the builtin
    for dirpath, dirnames, filenames in os.walk(startdir):
        for fname in filenames:
            if pattern and not fnmatch.fnmatch(fname, pattern):
                continue
            yield os.path.join(dirpath, fname)
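Because walkfiles yields paths one by one, it can feed other iterator-consuming code without ever building a full list. The sketch below repeats the generator so that it is self-contained; total_size is a hypothetical helper added only for illustration.

```python
import fnmatch
import os


def walkfiles(startdir, pattern=None):
    """Yield full paths of all files below startdir matching pattern."""
    for dirpath, dirnames, filenames in os.walk(startdir):
        for fname in filenames:
            if pattern and not fnmatch.fnmatch(fname, pattern):
                continue
            yield os.path.join(dirpath, fname)


def total_size(startdir, pattern=None):
    """Hypothetical helper: sum the sizes of all matching files."""
    return sum(os.path.getsize(path)
               for path in walkfiles(startdir, pattern))
```

For example, total_size('.', '*.py') adds up the sizes of all Python files below the current directory, and a plain for loop over walkfiles('.') can stop early without walking the rest of the tree.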

Sorting with DSU (1)

DSU = Decorate-Sort-Undecorate

Instead of creating a custom comparison function, we create an auxiliary list that will sort naturally:

alist = [(4, 5), (3, 2), (2, 1), (6, 7)]

# Decorate:
to_sort = [(item[1], item) for item in alist]

# Sort:
to_sort.sort()

# Undecorate:
alist = [item[-1] for item in to_sort]

Sorting with DSU (2)

In Python 2.4 and above, you can use the key parameter to sort to do this in one step.

from operator import itemgetter

alist.sort(key=itemgetter(1))
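A sketch of a few variations on the key parameter. The names list is made up; sorted() and reverse are part of the same Python 2.4 additions.

```python
from operator import itemgetter

alist = [(4, 5), (3, 2), (2, 1), (6, 7)]

# sorted() returns a new list; list.sort() sorts in place.
by_second = sorted(alist, key=itemgetter(1))
print(by_second)

# Any one-argument function works as a key, e.g. case-insensitive order.
names = ['bob', 'Alice', 'carol']
by_name = sorted(names, key=str.lower)
print(by_name)

# reverse=True sorts in descending order of the key.
descending = sorted(alist, key=itemgetter(1), reverse=True)
print(descending)
```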

EAFP vs. LBYL

EAFP = It's easier to ask forgiveness than permission

LBYL = Look before you leap

Good:

try:
    return str(x)
except TypeError:
    ...

Bad:

if isinstance(x, basestring):
    do_something(x)
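A runnable sketch of the two styles, using integer conversion as the example. Both function names are made up. The LBYL check looks reasonable but rejects inputs that int() would happily accept.

```python
def to_int_eafp(value, default=0):
    """EAFP: just try the conversion and handle the failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def to_int_lbyl(value, default=0):
    """LBYL: check first; easy to get wrong or incomplete."""
    if isinstance(value, str) and value.isdigit():
        return int(value)
    return default


for value in ('42', ' 42 ', '-7', 3.9, None, 'spam'):
    results = (to_int_eafp(value), to_int_lbyl(value))
    print(repr(value) + ' -> ' + repr(results))
```

The EAFP version lets int() decide what is convertible, so padded strings, negative numbers and floats all work without any extra checks.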

Program structure

  1. (Shebang)
  2. Source encoding declaration
  3. Module docstring
  4. Imports (stdlib, third-party, private modules)
  5. Global constants and initialization code
  6. Exceptions
  7. Module-level functions
  8. Classes
  9. main function

Command line scripts

#!/usr/bin/env python

# examples/script-template.py

def main(args):
    if not args:
        print "Usage: foo ARG1 [ARG2...]"
        return 2
    return 0

if __name__ == '__main__':
    import sys
    status = main(sys.argv[1:])
    sys.exit(status)
    # or combined
    # sys.exit(main(sys.argv[1:]))

OO-Programming

Setters & Getters (1)

Bad:

class Foo:

    def __init__(self, spamm, eggs):
        self.spamm = spamm
        self.eggs = eggs

    def get_spamm(self):
        return self.spamm

    def set_spamm(self, value):
        self.spamm = value

    # et cetera

f = Foo('bar', 'baz')
myspamm = f.get_spamm()

Setters & Getters (2)

Good:

class Foo:

    def __init__(self, spamm, eggs):
        self.spamm = spamm
        self.eggs = eggs

f = Foo('bar', 'baz')
myspamm = f.spamm

Setters & Getters (3)

But what if you need to make your attribute dynamic later?

Bad:

class Foo:
    # ...

    def get_spamm(self):
        return make_spamm()

f = Foo()
myspamm = f.get_spamm()

Setters & Getters (4)

Solution: use the property builtin:

Good:

class Foo:
    # ...

    def _spamm(self):
        """Return a fresh portion of spamm."""
        return make_spamm()
    spamm = property(_spamm)

f = Foo()
myspamm = f.spamm

Setters & Getters (5)

Or, using property as a decorator:

Very good:

class Foo:
    # ...

    @property
    def spamm(self):
        return make_spamm()

f = Foo()
myspamm = f.spamm

Warning

Both forms will turn spamm into a read-only attribute!
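As a sketch of how the decorator form can be made writable again: since Python 2.6 a property object has a setter method that can itself be used as a decorator. The is_valid rule and the _spamm storage attribute below are assumptions made for this example.

```python
def is_valid(value):
    """Hypothetical validation rule: only non-empty strings are accepted."""
    return isinstance(value, str) and bool(value)


class Foo(object):

    def __init__(self, spamm):
        self.spamm = spamm      # goes through the setter below

    @property
    def spamm(self):
        """Tasty spamm"""
        return self._spamm

    @spamm.setter
    def spamm(self, value):
        if not is_valid(value):
            raise ValueError('I want my spamm!')
        self._spamm = value


f = Foo('bar')
f.spamm = 'baz'                 # plain attribute syntax, validated
print(f.spamm)

try:
    f.spamm = ''
except ValueError as exc:
    print('rejected: ' + str(exc))
```

The value lives under a different name (_spamm) because the property itself occupies the name spamm on the class.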

Setters & Getters (6)

The same works for setting attributes:

Bad:

class Foo:

    def set_spamm(self, value):
        if is_valid(value):
            self.spamm = value
        else:
            raise ValueError('I want my spamm!')

f = Foo()
f.set_spamm("Eggs")

Setters & Getters (7)

Good (again using property):

# examples/property_01.py

class Foo(object):
    def _get_spamm(self):
        return self._spamm
    def _set_spamm(self, value):
        if is_valid(value):
            # store under a private name; the property occupies "spamm"
            self._spamm = value
        else:
            raise ValueError('I want my spamm!')
    spamm = property(_get_spamm, _set_spamm, None,
        "Tasty spamm")

Static and Class methods

Static methods

Static methods don't receive the instance as the first argument. They can be thought of as functions living in the namespace of the class, similar to static methods in Java or C++. They are not very useful in Python (usually a normal function does the job), but they are handy for helper functions that don't need access to self and are of no use outside the class.

class Foo:
    # ...

    @staticmethod
    def _format_name(name):
        return name.strip().replace('_', ' ').capitalize()
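A minimal runnable sketch of such a helper in use. The constructor and the name attribute are assumptions added for illustration.

```python
class Foo(object):

    def __init__(self, name):
        # Called through self, but the helper never receives the instance.
        self.name = self._format_name(name)

    @staticmethod
    def _format_name(name):
        return name.strip().replace('_', ' ').capitalize()


print(Foo('  spam_and_eggs ').name)

# A static method can also be called on the class itself.
print(Foo._format_name('ham'))
```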

Class methods (1)

Class methods receive the class object as the first argument, not the instance. It is therefore good practice to name the first parameter cls (class is a keyword!) instead of self. A good use for class methods is factory functions, i.e. alternative, convenient ways to create pre-configured class instances.

Class methods (2)

# examples/classmethod_01.py

class Template:

    def __init__(self, template, **data):
        self.template = template
        self.data = data

    @classmethod
    def from_file(cls, filename, **data):
        """Return Template with template string read
        from filename."""
        # pass data on as keyword arguments, as __init__ expects
        return cls(open(filename).read(), **data)

    def render(self, **data):
        # per-call data overrides the defaults given at creation
        subst = self.data.copy()
        subst.update(data)
        return self.template % subst

Class methods (3)

Note

Classmethods can be called on the class:

Template.from_file(...)

or the instance with the same effect:

t.from_file(...)
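A sketch of why the factory receives cls rather than hard-coding the class name. To avoid file access, this version uses a hypothetical from_lines factory in place of from_file; HtmlTemplate is likewise made up.

```python
class Template(object):

    def __init__(self, template, **data):
        self.template = template
        self.data = data

    @classmethod
    def from_lines(cls, lines, **data):
        """Hypothetical factory: build the template string from lines."""
        return cls('\n'.join(lines), **data)

    def render(self, **data):
        subst = self.data.copy()
        subst.update(data)
        return self.template % subst


class HtmlTemplate(Template):
    """Subclass that inherits the factory unchanged."""


t = Template.from_lines(['Hello %(name)s!'], name='World')
print(t.render())
print(t.render(name='RuPy'))

# cls is whatever class the method was called on, so subclasses get
# instances of themselves without overriding the factory.
print(type(HtmlTemplate.from_lines(['<p>%(name)s</p>'])).__name__)

# Called on an instance, the method still receives the class.
print(type(t.from_lines(['x'])).__name__)
```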

Summary

  • Follow PEP 8
  • Read the standard library reference
  • Know your lists, dicts and iterators
  • Import this

How to recognize trees from afar

And now for something completely different...

Number 1: The Larch

EggBasket


For more information, please visit http://chrisarndt.de/projects/Eggbasket.