Lies, More Lies and Python Packaging Documentation on `package_data`

July 15, 2011 at 08:35 PM | Python

My slice of Python packaging hell today was thanks to the lie that is package_data.

You see, I've been trying to create a package that includes non-Python files in the distribution... So I did what any good developer would do and hit the documentation:

Package data can be added to packages using the package_data keyword argument to the setup() function.

Distutils documentation

and

If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude them from installation), then you can also use the package_data keyword.

Distribute documentation

Over the last hour, though, I've learned that these statements are somewhere between “dangerously misleading” and “damn lies”.

This is because the primary type of Python package is a source package, and the canonical method for creating a source package is by using setup.py sdist. However, the data specified in package_data are not included in source distributions — they are only included in binary (setup.py bdist) distributions and installs (setup.py install).
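If you'd rather check this against your own project than take my word for it, here is a minimal sketch that opens the archive produced by setup.py sdist and reports which of the files you expected are missing. The function name `missing_from_sdist` and the example paths are made up for illustration, and it assumes the sdist is a tarball with the usual top-level `name-version/` directory.

```python
import tarfile


def missing_from_sdist(sdist_path, expected_files):
    """Return the expected files that are absent from an sdist tarball.

    Paths in expected_files are relative to the project root, e.g.
    "mypkg/templates/index.html" (a hypothetical name).
    """
    present = set()
    with tarfile.open(sdist_path) as archive:
        for name in archive.getnames():
            # Assumption: members sit under a top-level "name-version/"
            # directory, so drop the first path component before comparing.
            _, _, relative = name.partition("/")
            present.add(relative)
    return [path for path in expected_files if path not in present]
```

Calling it with something like `missing_from_sdist("dist/mypkg-0.1.tar.gz", ["mypkg/templates/index.html"])` (hypothetical paths) gives back an empty list when everything made it into the archive, and the list of absent files otherwise.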

The only way to get package data into source packages is to list it in the MANIFEST.in file... which will also include the data in binary distributions and installs.
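If you already have a package_data dictionary lying around, one way to move it over is to turn each pattern into a MANIFEST.in `include` line. This is only a rough sketch: `manifest_lines` and the `mypkg` patterns are hypothetical, and it assumes packages live directly under the project root (no `package_dir` remapping) and that no pattern is registered under the empty-string key.

```python
def manifest_lines(package_data):
    """Turn a package_data-style dict into MANIFEST.in "include" lines."""
    lines = []
    for package in sorted(package_data):
        # Assumption: the dotted package name maps straight onto a directory
        # under the project root, e.g. "a.b" lives in "a/b".
        package_dir = package.replace(".", "/")
        for pattern in package_data[package]:
            lines.append("include %s/%s" % (package_dir, pattern))
    return lines


# Hypothetical example data; substitute your own package and patterns.
PACKAGE_DATA = {"mypkg": ["templates/*.html", "data/*.json"]}

if __name__ == "__main__":
    # Prints:
    #   include mypkg/templates/*.html
    #   include mypkg/data/*.json
    print("\n".join(manifest_lines(PACKAGE_DATA)))
```

Paste the printed lines into MANIFEST.in, then rebuild the sdist and look inside the archive to confirm the files are really there.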

Which renders the package_data option useful only if sdist is not used… And dangerously misleading if sdist is used.

tl;dr: package_data is a lie. Ignore it. Only use MANIFEST.in.