The evils of `except:`

September 29, 2011 at 03:34 PM | Python

I had some discussion recently about the evils of using a naked except:. Here is a more complete description of the dangers, and of the correct solutions.

In short, except: is bad because it hides the source of the exception and frustrates debugging. For example, consider this code:

try:
    parsed = parse(file_name)
except:
    raise ParseError("can't parse file")

It will likely produce an error something like this:

$ ./
Traceback (most recent call last):
ParseError: can't parse file

This kind of error makes me want to high-five someone. In the face. With a chair.

Notice that it does not contain any information about:

  • The file which caused the error
  • The line which caused the error
  • The nature of the error (is it an expected error? A bug? Who knows!)

And tracking down the source of this error would likely involve some binary searching on the input file or dropping into a debugger.

These are some other, equally unhelpful, bits of code that I have seen:

# There isn't much worse than completely hiding the error
try:
    do_stuff()
except:
    pass

# Almost as bad is not giving any hint at what it was
try:
    do_stuff()
except:
    print "there was an error!"

# And even showing the original error can be unhelpful if the error is
# something like an IndexError which could come from anywhere
try:
    do_stuff()
except Exception, e:
    raise MyException("there was an error: %r" %(e, ))

Now, there is a situation where a naked except: can be used safely. Exactly one:

1. The except: block is terminated with a raise

For example, when some cleanup needs to be done before leaving the function:

cxn = open_connection()
try:
    use_connection(cxn)  # placeholder for the work that may raise
except:
    cxn.close()
    raise

(note that, usually, the finally: block should be used for this kind of cleanup, but there are some situations where the code above makes more sense)
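For comparison, here is a minimal sketch of the same cleanup written with finally:. FakeConnection, fetch and run_with_cleanup are hypothetical names made up for this illustration, and open_connection is stubbed out so the snippet runs on its own; it is not meant as the one right way to do it.

```python
opened = []  # records every connection handed out, so the cleanup can be inspected


class FakeConnection(object):
    """Hypothetical stand-in for whatever open_connection() returns."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def open_connection():
    # Hypothetical helper; a real one would open a socket, database handle, etc.
    cxn = FakeConnection()
    opened.append(cxn)
    return cxn


def fetch(cxn, fail=False):
    # Hypothetical unit of work that may raise.
    if fail:
        raise RuntimeError("connection reset")
    return "ok"


def run_with_cleanup(fail=False):
    cxn = open_connection()
    try:
        return fetch(cxn, fail)
    finally:
        # Runs on success and on failure; any exception keeps propagating.
        cxn.close()
```

The except:/raise version only runs the cleanup on failure, which is one of the situations where it can make more sense than finally:.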

Every other situation should use except Exception, e: instead:

2. A new exception is raised but the original stack trace is used

For example:

import sys

try:
    parsed = parse(file_name)
except Exception, e:
    raise ParseError("error parsing %r: %r" %(file_name, e)), None, sys.exc_info()[2]

A few things to note: first, the three-expression form of raise is used, with the current stack trace as the third expression. This means that the stack trace will point to the original source of the error:

File "", line 9, in <module>
File "", line 2, in parse
  for lineno, line in enumerate(open(file_name), "rb"):
ParseError: error parsing 'input.bin': TypeError("'str' object cannot be interpreted as an index",)

Instead of the (less helpful) line which re-raised the error:

File "", line 11, in <module>
  raise ParseError("error parsing %r: %r" %(file_name, e))
ParseError: error parsing 'input.bin': TypeError("'str' object cannot be interpreted as an index",)

Second, the error includes the file name and original exception, which will make debugging significantly easier. When I'm writing particularly fragile code I'll often wrap the entire block in a try/except which will include as much state as is sensible in the error. For example, the main loop of the parse function might be:

def parse(file_name):
    lineno = -1
    current_foo = None
    try:
        f = open(file_name)
        for lineno, line in enumerate(f):
            current_foo = line.split()[0]
    except Exception, e:
        raise ParseError("error while parsing %r (line %r; current_foo: %r): %r"
                         %(file_name, lineno, current_foo, e)), None, sys.exc_info()[2]
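The three-expression raise is Python 2 syntax. For anyone reading this on Python 3, a rough equivalent can be sketched with with_traceback(). This assumes a Python 3 interpreter, defines its own ParseError, and uses a hard-coded list of lines as a stand-in for a real file so that it runs by itself:

```python
import sys


class ParseError(Exception):
    pass


def parse(file_name):
    lineno = -1
    current_foo = None
    try:
        # In-memory lines stand in for a real file; the empty line makes
        # line.split()[0] raise an IndexError.
        for lineno, line in enumerate(["foo 1", "", "bar 2"]):
            current_foo = line.split()[0]
    except Exception as e:
        message = "error while parsing %r (line %r; current_foo: %r): %r" % (
            file_name, lineno, current_foo, e)
        # Python 3 spelling of `raise ParseError(...), None, sys.exc_info()[2]`:
        # attach the original traceback and suppress the implicit chaining.
        raise ParseError(message).with_traceback(sys.exc_info()[2]) from None
```

Python 3 also chains exceptions automatically, so a plain raise ParseError(message) from e would keep the original trace visible too; the version above just stays closest to the Python 2 code.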

3. The exception and stack trace are logged

For example, the main runloop of an application might be:

while 1:
    try:
        do_stuff()
    except Exception, e:
        log.exception("error in mainloop")
        time.sleep(1)

A few things to note: first, a naked except: should not be used here, as it will also catch KeyboardInterrupt and SystemExit exceptions, which is almost certainly a bad thing.

Second, log.exception is used, which includes a complete stack trace in the log (care should also be taken to make sure that these logs will be checked - for example by sending an email on exception logs).

Third, the time.sleep(1) ensures that the system won't get clobbered if the do_stuff() function immediately raises an exception.
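The three points above can be checked with a small sketch. It uses the standard logging module; the run function, its iterations bound and its delay parameter are assumptions added here so the loop terminates, and `except Exception as e` style syntax is used so it runs on both Python 2.6+ and Python 3:

```python
import logging
import time

log = logging.getLogger("mainloop")


def run(do_stuff, iterations, delay=1):
    """Bounded variant of the run loop, so the sketch terminates."""
    for _ in range(iterations):
        try:
            do_stuff()
        except Exception:
            # Logs the message plus the full stack trace at ERROR level.
            log.exception("error in mainloop")
            # Back off so a do_stuff() that fails instantly cannot spin the CPU.
            time.sleep(delay)


# KeyboardInterrupt and SystemExit derive from BaseException, not Exception,
# so `except Exception` lets Ctrl-C and sys.exit() through.
print(issubclass(KeyboardInterrupt, Exception))  # False
print(issubclass(SystemExit, Exception))  # False
```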


Checking types in Python

September 26, 2011 at 01:53 PM | Python

A friend asked me recently when it's acceptable to check types in Python. Here is my reply:

It is almost never a good idea to check that function arguments are exactly the type you expect. For example, these two functions are very, very bad:

def add(a, b):
    if not isinstance(a, int):
        raise ValueError("a is not an int")
    if not isinstance(b, int):
        raise ValueError("b is not an int")
    return a + b

def sum(xs):
    if not isinstance(xs, list):
        raise ValueError("xs is not a list")
    base = 0
    for x in xs:
        base += x
    return base

There's no reason to impose those restrictions, and it makes life difficult if, for example, you want to add floats or sum an iterator:

>>> add(1.2, 3)
ValueError("a is not an int")
>>> sum(person.age for person in people)
ValueError("xs is not a list")
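By contrast, here is a sketch of the same two functions with the checks simply removed. The second one is named total here only to avoid shadowing the built-in sum, and the ages list is a made-up stand-in for person.age values:

```python
def add(a, b):
    return a + b


def total(xs):
    base = 0
    for x in xs:
        base += x
    return base


print(add(1.5, 3))  # 4.5

ages = [31, 27, 45]  # hypothetical stand-in for person.age values
print(total(age for age in ages))  # 103
```

Anything that supports + or iteration now works, and anything that doesn't still fails with a TypeError pointing at the offending line.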

Type checking to correctly handle different kinds of input is occasionally acceptable, but should be used carefully (e.g., for optimizations, or in situations where method overloading would be used in other languages). For example, these functions could be ok:

def contains_all(haystack, needles):
    if not isinstance(haystack, (set, dict)):
        haystack = set(haystack)
    return all(needle in haystack for needle in needles)

def ping_ip(addr):
    if isinstance(addr, numbers.Number):
        addr = numeric_ip_to_string(addr)
    # ping 'addr' which should be a string in "" form

But it's almost always better to check for capabilities instead of checking for types. For example, if you want to make sure that add throws an error on invalid input, this would be a better way:

def add(a, b):
    if not (hasattr(a, "__add__") or hasattr(b, "__radd__")):
        raise ValueError("can't add a to b")
    return a + b

This would be equivalent to accepting an interface instead of an implementation in a statically typed language:

// This is equivalent to ``isinstance(xs, list)`` -- usually bad
public static int sum(ArrayList xs) {

// This is equivalent to ``hasattr(xs, "__iter__")`` -- almost always better
public static int sum(Collection xs) {
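Python's closest analogue to the Collection version is probably the Iterable abstract base class from the standard library. The sketch below is one way to write that check, not the only one; total is a name chosen here to avoid shadowing the built-in sum, and the import is wrapped so it works on both Python 2.6+ and Python 3:

```python
try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable  # Python 2.6+


def total(xs):
    # Checks the capability (can it be iterated?) rather than the concrete type.
    if not isinstance(xs, Iterable):
        raise ValueError("xs is not iterable: %r" % (xs,))
    base = 0
    for x in xs:
        base += x
    return base


print(total([1, 2, 3]))  # 6
print(total(x * 2 for x in (1, 2)))  # 6
```

One caveat: an object that only supports iteration through __getitem__ is not recognised by this check, even though a for loop would accept it.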

Or better yet, Just Do It and wrap any exceptions which pop up:

def add(a, b):
        return a + b
    except Exception, e:
        raise ValueError("cannot add %r and %r: %r" %(a, b, e)), None, sys.exc_info()[2]

In general, though, code should assume that function arguments will behave correctly, then let the caller use your documentation and Python's helpful stack traces and debugging facilities to figure out what they did wrong.


Python 2.X's str.format is unsafe

September 22, 2011 at 07:33 PM | Python, Unicode

I posted a tweet today when I learned that Python's %-string-formatting isn't actually a special case - the str class just implements the __mod__ method.
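A quick way to see this for yourself is to call __mod__ directly, or to define it on a class of your own. Template is a hypothetical class invented for this illustration:

```python
# The % operator on a string is just a call to str.__mod__.
print("Hello %s".__mod__(("world",)))  # Hello world


class Template(object):
    """Hypothetical class that opts into the % operator."""

    def __init__(self, text):
        self.text = text

    def __mod__(self, values):
        # Delegate to the underlying string's formatting, then upper-case it.
        return (self.text % values).upper()


print(Template("Hello %s") % ("world",))  # HELLO WORLD
```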

One side effect of this is that a few people commented that %-formatting is to be replaced with .format formatting... So I'd like to take this opportunity to explain why .format string formatting is unsafe in Python 2.X.

With %-formatting, if the format string is a str while one of the replacements is a unicode, the result will be unicode:

>>> "Hello %s" %(u"world", )
u'Hello world'

However, .format will always return the same type of string (str or unicode) as the format string:

>>> "Hello {}".format(u"world")
'Hello world'

This is a problem in Python 2.X because unqualified string literals are instances of str, and the implicit encoding of unicode arguments will almost certainly explode at the least opportune moments:

>>> "Hello {}".format(u"\u263a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u263a' in position 0: ordinal not in range(128)

Of course, one possible solution to this is remembering to prefix all string literals with u:

>>> u"Hello {}".format(u"\u263a")
u'Hello \u263a'

But I prefer to simply use %-style formatting, because then I don't need to remember anything:

>>> "Hello %s" %(u"\u263a", )
u'Hello \u263a'
>>> print _.encode('utf-8')
Hello ☺

Of course, as you've probably noticed, this means that the format string is being implicitly decoded to unicode... But since my string literals generally don't contain non-ASCII characters it's not much of an issue.

Note that this is not a problem in Py 3k because string literals are unicode.
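As a small check of that last point, the sketch below assumes a Python 3 interpreter, where both formatting styles return text and where mixing bytes with text fails immediately instead of being implicitly encoded:

```python
greeting = "Hello {}".format("\u263a")
print(type(greeting).__name__)  # str (text, not bytes)
print(greeting == "Hello %s" % ("\u263a",))  # True

# Mixing bytes and text is rejected up front rather than implicitly encoded.
try:
    b"Hello " + "\u263a"
except TypeError as e:
    print("TypeError:", e)
```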
