Python Brain-Teaser

May 29, 2008 at 11:14 AM | Python, Play

I'm working on pulling some functionality out of one object and putting it in another, and I came across this interesting problem:

class Foo:
    me = "foo"
class Bar:
    me = "bar"
    def get_me(self):
        return self.me
Foo.get_me = Bar.get_me
x = Foo()
print x.get_me()

What does this print?

And, next question, why is that?

After lunch I'll post my thoughts :-)

Until now, I had assumed that the semi-magic self variable was set on method calls (i.e., when x.get_me() is called)... But apparently it's set when the object is instantiated (which makes perfect sense, otherwise getattr(Bar(), 'get_me')() would not work).

So I can only presume that something equivalent to this happens when an object is instantiated:

from functools import partial
new_obj = Class()
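# Methods live in the class's __dict__, not the instance's, so loop over the class
for (key, val) in vars(Class).items():
    if not callable(val): continue
    # Pre-fill self so that new_obj.key() needs no explicit instance argument
    setattr(new_obj, key, partial(val, new_obj))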
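
That presumption is easy to poke at. Here's a minimal sketch (the names b, func, and bound are mine, and it's written to run on Python 2 or 3) which suggests the binding actually happens at attribute lookup, through the descriptor protocol, rather than at instantiation:

```python
class Foo(object):
    me = "foo"

class Bar(object):
    me = "bar"
    def get_me(self):
        return self.me

b = Bar()

# Instantiation copied nothing onto the instance: get_me lives only on the class.
assert "get_me" not in vars(b)

# The class stores a plain function. Looking it up through an instance calls
# the function's __get__, and that call is what produces the bound method.
func = Bar.__dict__["get_me"]
bound = func.__get__(b, Bar)
print(bound())        # bar

# Handing the plain function to Foo lets it bind to Foo instances instead.
Foo.get_me = func
x = Foo()
print(x.get_me())     # foo
```

Note that this hands Foo the plain function. The teaser's Foo.get_me = Bar.get_me is a different beast on Python 2, where Bar.get_me is an unbound method that still insists on a Bar instance.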

SSH Connection Sharing

May 17, 2008 at 09:35 AM | Play

This tip was originally posted to the python-dev mailing list, so I can't take one scrap of credit for it.

Basically, since OpenSSH 4, there has been an option to share connections -- that is, once you've opened one connection to a host, every subsequent connection to that host is tunneled through the same channel, completely removing the overhead of authentication!

It's quite simple. Just add this to ~/.ssh/config:

ControlMaster auto
ControlPath ~/.ssh/.%r@%h:%p

There can be problems if your machine crashes and a stale socket file is left lying around... So adding this to your crontab will fix that (the pattern matches the ControlPath above):

@reboot rm -f "$HOME"/.ssh/.*@*:*

Next time: tab completion with scp!


SVK + Unicode == :(

May 06, 2008 at 05:43 PM | Fixed-it, Work

I was not impressed when I tried to check out a UTF-8 encoded file with SVK, then got the helpful message Can't encode path as ascii:

$ svk up
Syncing //drp/trunk(/drp/trunk) in /home/wolever/Trunk to 5388.
Can't encode path as ascii.

I was even less impressed when I searched Google for svk unicode and this blog was the first hit.

Fortunately my Google-fu is high today, and I was able to find a page that gives a solution: making sure that your locale is set to something similar to en_US.UTF-8.

On Debian, here's how I do it:

$ sudo apt-get install locales # Ubuntu 6.06 didn't have it...
$ export LANG="en_US.UTF-8"
$ export LANGUAGE="$LANG"

And, of course, it may be good to put those two exports in ~/.bashrc.
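
SVK is written in Perl, so this isn't its code, but the complaint is easy to reproduce. A small Python illustration (the file name and variable names are assumptions for demonstration, and it should run on Python 2 or 3) of why an ASCII locale can't represent a non-ASCII path while a UTF-8 one can:

```python
path = u"utf8_\u7231\u7684"   # a file name with two non-ASCII characters

try:
    path.encode("ascii")       # roughly what a C/POSIX locale asks for
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf8_bytes = path.encode("utf-8")   # what an en_US.UTF-8 locale asks for

print(ascii_ok)           # False
print(len(utf8_bytes))    # 11
```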

Oh, but wait!

$ svk up
Syncing //drp/trunk(/drp/trunk) in /home/wolever/Trunk to 5388.
Can't encode path as ascii.

It still doesn't work!

It turns out that, for whatever reason, something was upset. Eventually I got it working by deleting the offending directory, then using svk revert to revert it:

$ rm -r hacking/
$ svk revert -R hacking
Reverted hacking/utf8_爱的



Encoding and Decoding Text in Python (or: "I didn't ask you to use the 'ascii' codec!")

May 01, 2008 at 01:27 PM | Unicode

When dealing with Unicode in Python, it doesn't take long to get the dreaded 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128).

You never see it coming. It doesn't make any sense. You didn't even ask for ascii!

So what's the deal?

I'm glad you asked. I will demonstrate:

>>> s = file("data").read()
>>> s
'SGVsbG8sIHdvcmxkIQ==\n'

If you guessed that s is a hunk of base64 encoded data, you'd be right! Give yourself a gold star.

Now, if we want to do anything useful with this data, it needs to be decoded:

>>> s.decode('base64')
'Hello, world!'

We have just taken an encoded hunk of data and decoded it to get a useful hunk of data.

>>> s.decode('base64').replace('world', 'Marguerite')
'Hello, Marguerite!'
>>> _.encode('base64')
'SGVsbG8sIE1hcmd1ZXJpdGUh\n'

Now we can take that useful hunk of data (the English in 7-bit ASCII), do something useful with it (in this case, replace 'world' with 'Marguerite'), and finally encode the data.
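
The 'base64' codec on strings is a Python 2 convenience. Here's a hedged sketch of the same decode, use, encode round trip using the standard base64 module, which should behave the same on Python 2 and 3 (the variable names are mine):

```python
import base64

encoded = b"SGVsbG8sIHdvcmxkIQ=="

decoded = base64.b64decode(encoded)                 # decode the opaque hunk
changed = decoded.replace(b"world", b"Marguerite")  # do something useful
reencoded = base64.b64encode(changed)               # encode it again

print(decoded == b"Hello, world!")                  # True
print(reencoded == b"SGVsbG8sIE1hcmd1ZXJpdGUh")     # True
```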

So how does all this relate back to Unicode and ascii error messages?

I have used base64 encoded data here, but the same concept applies when dealing with Unicode data:

  1. Hunk of opaque data comes in (but we know that it contains some sort of Unicode text)
  2. Hunk of opaque data is decoded, creating a unicode object
  3. The unicode object is used for something useful
  4. The unicode object is encoded and saved (to disk, to a database, or sent to a browser)

(of course, in the Real World, you've got to figure out which encoding was used on the data (UTF-8, Latin1, etc)... But that's a topic for another post.)
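
The four steps above can be sketched in a few lines. This assumes the incoming bytes are UTF-8 (in real life you'd have to know or guess that), and the variable names are mine:

```python
incoming = b"Ol\xc3\xa1, mundo!"                 # 1. opaque hunk of bytes comes in
text = incoming.decode("utf-8")                  # 2. decode it into a unicode object
text = text.replace(u"mundo", u"Marguerite")     # 3. use it for something useful
outgoing = text.encode("utf-8")                  # 4. encode it again before saving

print(outgoing == b"Ol\xc3\xa1, Marguerite!")    # True
```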

OK, back to the 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128) error. It should be fairly clear that this error is coming up because Python is trying to decode a bunch of bytes as 7-bit ASCII, but some of them are out of that range (i.e., they have a value over 127).

I know what you're saying, "but I never asked Python to decode anything! I'm just trying to turn it into unicode!"

>>> unicode("Ol\xc3\xa1, mundo!")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

Two questions arise here: First, "Where is the 'ascii' coming from?" Second, "How do I make it work?"

To answer the first question, it's important to think about what's happening when the call to unicode(...) is made. The unicode function accepts an encoded string, decodes it, and creates a unicode object. In this case, though, we haven't given the function any indication of which decoder it should use, so it falls back to Python's default encoding (the value of sys.getdefaultencoding()): ascii.
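
To see where the 'ascii' comes from, here's a small sketch that does explicitly what unicode(...) does implicitly on Python 2. It should run on Python 2 or 3; the variable names are mine:

```python
import sys

raw = b"Ol\xc3\xa1, mundo!"

# 'ascii' on Python 2; Python 3 reports 'utf-8' and never decodes implicitly.
print(sys.getdefaultencoding())

try:
    raw.decode("ascii")    # the decode Python 2 performs when no codec is given
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

The message printed at the end is the same complaint as in the traceback above, just asked for on purpose this time.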

So how can you make it work? Tell unicode which encoding to use:

>>> unicode("Ol\xc3\xa1, mundo!", 'utf8')
u'Ol\xe1, mundo!'

(now, as I mentioned before, figuring out which encoding to use is another huge problem... But I'll leave that for another day)
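
As a taste of why picking the encoding matters, here's a hedged sketch (variable names are mine) of what happens when the same bytes are decoded with the wrong codec. Nothing fails; the text is just quietly wrong:

```python
raw = b"Ol\xc3\xa1, mundo!"

right = raw.decode("utf-8")     # two bytes become one character
wrong = raw.decode("latin-1")   # succeeds too, since every byte is valid Latin-1

print(len(right))                        # 11
print(len(wrong))                        # 12
print(wrong == u"Ol\xc3\xa1, mundo!")    # True: two junk characters, not one accented letter
```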

Another problem I run into quite often is this:

>>> "Ol\xc3\xa1, mundo!".encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

And, by now, the cause of this should be painfully obvious: I've given Python an encoded string, so I should be decoding it, not encoding it again.

But why the confusing error message? Well, I'm not entirely sure, but my guess is that the UTF-8 encoder expects a unicode object, so it tries to convert the input (in this case, "Ol\xc3...") to Unicode before encoding it.
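
If that guess is right, calling encode on a byte string amounts to a decode with the default codec followed by the encode that was asked for. A minimal sketch of the idea, where py2_style_encode is a hypothetical stand-in of mine and not anything in the standard library:

```python
def py2_style_encode(raw, encoding, default="ascii"):
    """Hypothetical stand-in for Python 2's str.encode(): decode first, then encode."""
    return raw.decode(default).encode(encoding)

# Pure ASCII survives the hidden decode, so nothing looks wrong.
print(py2_style_encode(b"Hello, world!", "utf-8") == b"Hello, world!")   # True

# Anything over 127 trips the hidden decode, hence a *decode* error from encode().
try:
    py2_style_encode(b"Ol\xc3\xa1, mundo!", "utf-8")
except UnicodeDecodeError as exc:
    error = str(exc)
    print(error)
```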

Is there any end to this insanity?!

Yes! Python 3000 will have two distinct classes: one for strings (str), one for hunks of data (bytes). Whenever data is read, it will come in as a "hunk of data". It will have to be explicitly decoded to a string before it can be used as such. Hopefully that will make life a little bit less painful.
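
For a feel of that split, here's a small sketch written against Python 3 as eventually released (Python 3 only; on Python 2 the same concatenation would attempt an implicit ascii decode instead). The variable names are mine:

```python
raw = b"Ol\xc3\xa1, mundo!"      # a hunk of data (bytes)
text = raw.decode("utf-8")       # explicitly decoded into a string (str)

try:
    raw + text                   # mixing the two is refused outright
    mixed = True
except TypeError:
    mixed = False

print(type(raw).__name__, type(text).__name__)   # bytes str
print(mixed)                                     # False
```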
