The evils of `except:`

September 29, 2011 at 03:34 PM | Python

I had some discussion recently about the evils of using a naked except:. Here is a more complete description of the dangers, and of the correct solutions.

In short, except: is bad because it hides the source of the exception and frustrates debugging. For example, consider this code:

try:
    parsed = parse(file_name)
except:
    raise ParseError("can't parse file")

It will likely produce an error something like this:

$ ./my_program.py
Traceback (most recent call last):
    ...
ParseError: can't parse file

This kind of error makes me want to high-five someone. In the face. With a chair.

Notice that it does not contain any information about:

  • The file which caused the error
  • The line which caused the error
  • The nature of the error (is it an expected error? A bug? Who knows!)

And tracking down the source of this error would likely involve some binary searching on the input file or dropping into a debugger.

These are some other, equally unhelpful, bits of code that I have seen:

# There isn't much worse than completely hiding the error
except:
    pass

# Almost as bad is not giving any hint at what it was
except:
    print "there was an error!"

# And even showing the original error can be unhelpful if the error is
# something like an IndexError which could come from anywhere
except Exception, e:
    raise MyException("there was an error: %r" %(e, ))

Now, there is a situation where a naked except: can be used safely. Exactly one.

1. The except: block is terminated with a raise

For example, when some cleanup needs to be done before leaving the function:

cxn = open_connection()
try:
    use_connection(cxn)
except:
    close_connection(cxn)
    raise

(note that, usually, the finally: block should be used for this kind of cleanup, but there are some situations where the code above makes more sense)
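
A minimal sketch of the finally: alternative mentioned above, written for Python 3. The Connection class and the fail flag are hypothetical stand-ins for whatever open_connection() and use_connection() really do; the only point is that the cleanup runs whether or not an exception is raised, and the exception still propagates.

```python
class Connection:
    """Hypothetical stand-in for a real connection object."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def use_connection(cxn, fail=False):
    # Stand-in for real work; optionally fails to exercise the error path.
    if fail:
        raise RuntimeError("connection dropped")
    return "ok"


def run(cxn, fail=False):
    try:
        return use_connection(cxn, fail=fail)
    finally:
        # Runs on success and on any exception, and never swallows the error.
        cxn.close()
```

The except:/raise form differs in that its cleanup runs only on failure, which is what you want when, for example, the connection is handed back to the caller on success.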

Every other situation should use except Exception, e: instead, in one of the following two ways:

2. A new exception is raised but the original stack trace is used

For example:

try:
    parsed = parse(file_name)
except Exception, e:
    raise ParseError("error parsing %r: %r" %(file_name, e)), None, sys.exc_info()[2]

A few things to note: first, the three-expression form of raise is used, and its third expression is the current stack trace. This means that the stack trace will point to the original source of the error:

File "my_program.py", line 9, in <module>
  parse(file_name)
File "parser.py", line 2, in parse
  for lineno, line in enumerate(open(file_name), "rb"):
ParseError: error parsing 'input.bin': TypeError("'str' object cannot be interpreted as an index",)

Instead of the (less helpful) line which re-raised the error:

File "my_program.py", line 11, in <module>
  raise ParseError("error parsing %r: %r" %(file_name, e))
ParseError: error parsing 'input.bin': TypeError("'str' object cannot be interpreted as an index",)

Second, the error includes the file name and original exception, which will make debugging significantly easier. When I'm writing particularly fragile code I'll often wrap the entire block in a try/except which will include as much state as is sensible in the error. For example, the main loop of the parse function might be:

def parse(file_name):
    lineno = -1
    current_foo = None
    try:
        f = open(file_name)
        for lineno, line in enumerate(f):
            current_foo = line.split()[0]
            ...
    except Exception, e:
        raise ParseError("error while parsing %r (line %r; current_foo: %r): %r"
                         %(file_name, lineno, current_foo, e)), None, sys.exc_info()[2]
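
The three-expression raise is Python 2 syntax. As a rough sketch of how the same idea might look under Python 3 (a translation that is not part of the original post), raise ... from e keeps the original exception, traceback included, attached as __cause__. The ParseError class and the deliberately fragile parse_line helper are made up for illustration.

```python
import traceback


class ParseError(Exception):
    pass


def parse_line(line):
    # Deliberately fragile: an empty line has no first field.
    return line.split()[0]


def parse(lines, name="<input>"):
    lineno = -1
    try:
        for lineno, line in enumerate(lines):
            parse_line(line)
    except Exception as e:
        # "from e" chains the original exception, so the printed traceback
        # shows both the failure inside parse_line and this ParseError.
        raise ParseError("error while parsing %r (line %r): %r"
                         % (name, lineno, e)) from e


def demo():
    try:
        parse(["a b", "", "c"])
    except ParseError as err:
        text = "".join(
            traceback.format_exception(type(err), err, err.__traceback__))
        return err, text
```

Calling demo() returns the ParseError together with its formatted traceback, which includes the frame in parse_line where the original error happened.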

3. The exception and stack trace are logged

For example, the main runloop of an application might be:

while 1:
    try:
        do_stuff()
    except Exception, e:
        log.exception("error in mainloop")
        time.sleep(1)

A few things to note: first, a naked except: should not be used here, as it will also catch KeyboardInterrupt and SystemExit exceptions, which is almost certainly a bad thing.

Second, log.exception is used, which includes a complete stack trace in the log (care should also be taken to make sure that these logs will be checked - for example by sending an email on exception logs).

Third, the time.sleep(1) ensures that the system won't get clobbered if the do_stuff() function immediately raises an exception.
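
A runnable sketch of that loop for Python 3, using the standard logging module. The loop is bounded and the delay is zero only so the example terminates quickly; flaky, the logger name and the in-memory stream are assumptions made for the demonstration.

```python
import io
import logging
import time


def run_mainloop(do_stuff, log, iterations=3, delay=0.0):
    # Bounded so the sketch terminates; a real main loop would run forever.
    for _ in range(iterations):
        try:
            do_stuff()
        except Exception:
            # log.exception logs at ERROR level and appends the traceback
            # of the exception currently being handled.
            log.exception("error in mainloop")
            time.sleep(delay)


def flaky():
    raise ValueError("boom")


stream = io.StringIO()
log = logging.getLogger("mainloop_demo")
log.addHandler(logging.StreamHandler(stream))
log.propagate = False  # keep the demo's output out of the root logger

run_mainloop(flaky, log, iterations=2)
output = stream.getvalue()
```

Because KeyboardInterrupt and SystemExit derive from BaseException rather than Exception, the except Exception clause lets both pass through, so Ctrl-C still stops the loop.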


Checking types in Python

September 26, 2011 at 01:53 PM | Python

A friend asked me recently when it's acceptable to check types in Python. Here is my reply:

It is almost never a good idea to check that function arguments are exactly the type you expect. For example, these two functions are very, very bad:

def add(a, b):
    if not isinstance(a, int):
        raise ValueError("a is not an int")
    if not isinstance(b, int):
        raise ValueError("b is not an int")
    return a + b

def sum(xs):
    if not isinstance(xs, list):
        raise ValueError("xs is not a list")
    base = 0
    for x in xs:
        base += x
    return base

There's no reason to impose those restrictions, and it makes life difficult if, for example, you want to add floats or sum an iterator:

>>> add(1.2, 3)
...
ValueError("a is not an int")
>>> sum(person.age for person in people)
...
ValueError("xs is not a list")
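
For contrast, a sketch of the same two functions with the checks simply removed. The sum function is renamed total here only to avoid shadowing the builtin, and the ages data is made up.

```python
def add(a, b):
    return a + b


def total(xs):
    base = 0
    for x in xs:
        base += x
    return base


ages = {"alice": 31, "bob": 27}

float_result = add(1.2, 3)                           # floats just work
iter_result = total(age for age in ages.values())    # so do generators
```

Nothing was added to support floats or iterators; removing the restriction was enough.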

Type checking to correctly handle different kinds of input is occasionally acceptable, but should be used carefully (e.g., for optimizations, or in situations where method overloading would be used in other languages). For example, these functions could be OK:

def contains_all(haystack, needles):
    if not isinstance(haystack, (set, dict)):
        haystack = set(haystack)
    return all(needle in haystack for needle in needles)

def ping_ip(addr):
    if isinstance(addr, numbers.Number):
        addr = numeric_ip_to_string(addr)
    # ping 'addr' which should be a string in "1.2.3.4" form
    ...
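
A short usage sketch for contains_all, repeated here so the example is self-contained. The isinstance check never rejects input; it only decides whether the haystack needs to be copied into a set first.

```python
def contains_all(haystack, needles):
    # Sets and dicts already support fast membership tests; anything else
    # is copied into a set once so repeated lookups stay cheap.
    if not isinstance(haystack, (set, dict)):
        haystack = set(haystack)
    return all(needle in haystack for needle in needles)


from_list = contains_all([1, 2, 3], [1, 3])
from_dict = contains_all({"a": 1}, ["a", "b"])
from_iterator = contains_all(iter("abc"), "ab")
```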

But it's almost always better to check for capabilities instead of checking for types. For example, if you want to make sure that add throws an error on invalid input, this would be a better way:

def add(a, b):
    if not (hasattr(a, "__add__") or hasattr(b, "__radd__")):
        raise ValueError("can't add a to b")
    return a + b

This would be equivalent to accepting an interface instead of an implementation in a statically typed language:

// This is equivalent to ``isinstance(xs, list)`` -- usually bad
public static int sum(ArrayList xs) {
    ...
}

// This is equivalent to ``hasattr(xs, "__iter__")`` -- almost always better
public static int sum(Collection xs) {
    ...
}

Or better yet, Just Do It and wrap any exceptions which pop up:

def add(a, b):
    try:
        return a + b
    except Exception, e:
        raise ValueError("cannot add %r and %r: %r" %(a, b, e)), None, sys.exc_info()[2]

In general, though, code should assume that function arguments will behave correctly, then let the caller use your documentation and Python's helpful stack traces and debugging facilities to figure out what they did wrong.


Python 2.X's str.format is unsafe

September 22, 2011 at 07:33 PM | Python, Unicode

I posted a tweet today when I learned that Python's %-string-formatting isn't actually a special case - the str class just implements the __mod__ method.

One side effect of this is that a few people commented that %-formatting is to be replaced with .format formatting... So I'd like to take this opportunity to explain why .format string formatting is unsafe in Python 2.X.

With %-formatting, if the format string is a str while one of the replacements is a unicode the result will be unicode:

>>> "Hello %s" %(u"world", )
u'Hello world'

However, .format will always return the same type of string (str or unicode) as the format string:

>>> "Hello {}".format(u"world")
'Hello world'

This is a problem in Python 2.X because unqualified string literals are instances of str, and the implicit encoding of unicode arguments will almost certainly explode at the least opportune moments:

>>> "Hello {}".format(u"\u263a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)

Of course, one possible solution to this is remembering to prefix all string literals with u:

>>> u"Hello {}".format(u"\u263a")
u'Hello \u263a'

But I prefer to simply use %-style formatting, because then I don't need to remember anything:

>>> "Hello %s" %(u"\u263a", )
u'Hello \u263a'
>>> print _.encode('utf-8')
Hello ☺

Of course, as you've probably noticed, this means that the format string is being implicitly decoded to unicode... But since my string literals generally don't contain non-ASCII characters it's not much of an issue.

Note that this is not a problem in Py 3k because string literals are unicode.
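
A quick sketch of that last point under Python 3, where the format string and the argument are both unicode str objects, so neither formatting style needs an implicit encode or decode:

```python
smiley = "\u263a"

formatted = "Hello {}".format(smiley)   # no UnicodeEncodeError here
percent = "Hello %s" % (smiley,)

# Encoding to bytes is an explicit step, done only at the output boundary.
encoded = formatted.encode("utf-8")
```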


Lies, More Lies and Python Packaging Documentation on `package_data`

July 15, 2011 at 08:35 PM | Python

My slice of Python packaging hell today was thanks to the lie that is package_data.

You see, I've been trying to create a package that includes non-Python files in the distribution... So I did what any good developer would do and hit the documentation:

Package data can be added to packages using the package_data keyword argument to the setup() function.

Distutils documentation

and

If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude them from installation), then you can also use the package_data keyword.

Distribute documentation

Over the last hour, though, I've learned that these statements are somewhere between “dangerously misleading” and “damn lies”.

This is because the primary type of Python package is a source package, and the canonical method for creating a source package is by using setup.py sdist. However, the data specified in package_data are not included in source distributions — they are only included in binary (setup.py bdist) distributions and installs (setup.py install).

The only way to get package data included in source packages is the MANIFEST.in file... Which will also include data in binary distributions and installs.

Which renders the package_data option useful only if sdist is not used… And dangerously misleading if sdist is used.

tl;dr: package_data is a lie. Ignore it. Only use MANIFEST.in.


Tips for Managing a Django Project

June 22, 2011 at 04:44 PM | Python, Django

During the time I've spent with Django, I've picked up a couple tricks for making life a little bit less terrible.

First, split the project into three (or more) parts, each with its own settings.py: the main project, the development environment and the production environment. For example, my current project, code named eos, has a directory structure something like this:

eos/
    .hg/
    .hgignore
    manage.py
    run
    eos/
        __init__.py
        settings.py
        templates/
        urls.py
        ...
    hacking/
        __init__.py
        settings.py
        db.sqlite3
        ...
    production/
        __init__.py
        settings.py
        run.wsgi
    ...

The eos/ directory is more or less a standard Django project (ie, created by django-admin.py startproject), except that eos/settings.py does not have any environment-specific information in it (for example, it doesn't have any database settings or URLs).

The hacking/ and production/ directories also contain settings.py files, except they define only environment specific settings. For example, hacking/settings.py looks a bit like this:

import os

from eos.settings import *
path = lambda *parts: os.path.join(os.path.dirname(__file__), *parts)

DATABASE_ENGINE = "sqlite3"
DATABASE_NAME = path("db.sqlite3")

DEBUG = True

While production/settings.py contains:

from eos.settings import *

DATABASE_ENGINE = "postgresql_psycopg2"
DATABASE_NAME = "eos"
DATABASE_USER = "eos"
DATABASE_PASSWORD = "secret"

DEBUG = False

Then, instead of configuring Django (i.e., calling setup_environ) on eos.settings, it is configured on either hacking.settings or production.settings. For example, manage.py contains:

...
import hacking.settings
execute_manager(hacking.settings)

And production/run.wsgi contains:

...
os.environ["DJANGO_SETTINGS_MODULE"] = "production.settings"
...

Second, every settings.py file should contain the path lambda:

path = lambda *parts: os.path.join(os.path.dirname(__file__), *parts)

It will make specifying paths relative to the settings.py file very easy, and completely do away with relative-path-related issues. For example:

MEDIA_ROOT = path("media/")
DATA_ROOT = path("data/")
DATABASE_NAME = path("db.sqlite3")
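
A small sketch of how the helper behaves. To keep the result predictable it is built from an explicit, hypothetical settings file location rather than from __file__, and it wraps that location in os.path.abspath, a common hardening (not in the original lambda) for cases where __file__ is a relative path.

```python
import os


def make_path(settings_file):
    # Anchor every path on the directory containing the settings file.
    base = os.path.dirname(os.path.abspath(settings_file))
    return lambda *parts: os.path.join(base, *parts)


# Hypothetical location; in a real settings.py this would be __file__.
path = make_path(os.path.join(os.sep, "srv", "eos", "hacking", "settings.py"))

DATABASE_NAME = path("db.sqlite3")
FIXTURE = path("data", "dev.json")   # several parts are joined in order
```

Both results are absolute paths under the hacking/ directory, regardless of the directory the process was started from.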

Third, there should be scripts for running, saving and rebuilding the environment. I use two scripts for this: run and dump_dev_data. By default, the run script calls ./manage.py runserver 8631 (specifying a port is useful so that web browsers can distinguish between different applications, keeping passwords, history, etc. separate). The run script can also be passed a reset argument, which will delete the development database and rebuild it from the dev fixtures. These fixtures are created by the dump_dev_data script, which calls ./manage.py dumpdata for each application, saving the data to fixtures named dev (these fixtures are committed alongside the code, so all developers can work off the same data).
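
The run script itself isn't shown, so the following is only one way its logic might be sketched in Python. The database path, the use of syncdb --noinput and loaddata dev to rebuild, and the choice to start the server after a reset are all assumptions. The function returns the commands rather than executing them.

```python
PORT = "8631"  # the port used by runserver in this post


def plan(argv, db_file="hacking/db.sqlite3"):
    """Return the commands a run script might execute, in order."""
    steps = []
    if "reset" in argv:
        # Assumed rebuild sequence: drop the dev database, recreate the
        # tables, then load the fixtures named "dev".
        steps.append(["rm", "-f", db_file])
        steps.append(["./manage.py", "syncdb", "--noinput"])
        steps.append(["./manage.py", "loaddata", "dev"])
    steps.append(["./manage.py", "runserver", PORT])
    return steps
```

A real script would pass each step to something like subprocess.check_call; keeping the plan separate from its execution makes the logic easy to inspect.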

So, for example, when I'm developing a new model, my workflow will look something like this:

... add new model to models.py ...
$ ./run reset # Reset the database adding the new model
... use the website to create data for the new model ...
$ ./dump_dev_data # Dump the newly created data
$ hg commit -m "Adding new model + test data"