Ian Bicking: a blog :: 2012

Python Application Package

I’ve been thinking some more about deployment of Python web applications, and deployment in general (in part leading up to the Web Summit). And I’ve got an idea.

I wrote about this roughly a year ago and recently revised some notes on a proposal, but I’ve been thinking about something a bit more basic: a way to simply ship server applications, bundles of code. Web applications are just one use case for this.

For now let’s call this a "Python application package". It has these features:

There is an application description: this tells the environment about the application. (This is sometimes called "configuration" but that term is very confusing and overloaded; I think "description" is much clearer.)
Given the description, you can create an execution environment to run code from the application and acquire objects from the application. So there would be a specific way to set up sys.path, and a way to indicate any libraries that are required but not bundled directly with the application.
The environment can inject information into the application. (Also this sort of thing is sometimes called "configuration", but let’s not do that either.) This is where the environment could indicate, for instance, what database the application should connect to (host, username, etc).
There would be a way to run commands and get objects from the application. The environment would look in the application description to get the names of commands or objects, and use them in some specific manner depending on the purpose of the application. For instance, WSGI web applications would point the environment to an application object. A Tornado application might simply have a command to start itself (with the environment indicating what port to use through its injection).
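
To make "get objects from the application" a bit more concrete, here is a minimal sketch of how an environment might resolve a name like the ones a description would carry. The "module:attribute" notation and the load_object name are assumptions for illustration, not a settled format:

```python
import importlib


def load_object(spec):
    """Resolve 'package.module:attribute' to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute names after the colon, if there are any.
    for name in (attr_path.split(".") if attr_path else []):
        obj = getattr(obj, name)
    return obj


# A WSGI application would be named something like
# "myapp.wsgiapp:application"; a stdlib name stands in here.
join = load_object("os.path:join")
```

A command-style application (the Tornado case) would not need this at all; the environment would just run the named command.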

There are a lot of things you can build from these pieces, and in a sophisticated application you might use a bunch of them at once. You might have some WSGI, maybe a separate non-WSGI server to handle Web Sockets, something for a Celery queue, a way to accept incoming email, etc. In pretty much all cases I think a basic application lifecycle is needed: commands to run when an application is first installed, to verify the environment is acceptable, to back up its data, and to uninstall it.
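
As a rough sketch of the lifecycle idea, an environment could look up a command for each event in the description and run it. The "lifecycle" key, the event names, and run_lifecycle are all hypothetical; nothing here is a proposed format:

```python
import shlex
import subprocess
import sys

# Hypothetical description fragment: lifecycle event -> command line.
description = {
    "lifecycle": {
        "install": "python -c \"print('creating tables')\"",
        "check_environment": "python -c \"print('environment ok')\"",
    }
}


def run_lifecycle(description, event, cwd=None):
    """Run the command registered for a lifecycle event, if there is one."""
    command = description.get("lifecycle", {}).get(event)
    if command is None:
        return None  # the application does not handle this event
    argv = shlex.split(command)
    if argv[0] == "python":
        argv[0] = sys.executable  # use the interpreter the environment chose
    return subprocess.run(argv, cwd=cwd, capture_output=True,
                          text=True, check=True)


result = run_lifecycle(description, "install")
```

An application that has nothing to do on, say, backup simply leaves that event out, and the environment moves on.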

There are also some things that all environments should set up the same way or inject into the application. E.g., $TMPDIR should point to a place where the application can keep its temporary files. Or, every application should have a directory (perhaps specified in another environment variable) where it can write log files.
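
Here is one way the injection could look if it were done through environment variables. Only $TMPDIR comes from the text above; APP_LOG_DIR, the APP_ prefix, and build_environment are made-up names for the sake of the sketch:

```python
import os
import subprocess
import sys
import tempfile


def build_environment(instance_dir, settings):
    """Return the environment variables an environment might inject."""
    tmp_dir = os.path.join(instance_dir, "tmp")
    log_dir = os.path.join(instance_dir, "logs")
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)
    env = dict(os.environ)
    env["TMPDIR"] = tmp_dir
    env["APP_LOG_DIR"] = log_dir  # hypothetical variable name
    # Hypothetical convention: each injected setting becomes APP_<NAME>.
    for key, value in settings.items():
        env["APP_" + key.upper()] = str(value)
    return env


instance_dir = tempfile.mkdtemp()
env = build_environment(instance_dir, {"db_host": "localhost", "port": 8080})
# The application process reads back what the environment handed it.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['APP_PORT'])"],
    env=env, capture_output=True, text=True, check=True,
)
```

The point is only that the application never decides these values itself; it asks, and the environment answers.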

Details?

To get more concrete, here’s what I can imagine from a small application description; probably YAML would be a good format:

platform: python, wsgi
require:
  os: posix
  python: <3
  rpm: m2crypto
  deb: python-m2crypto
  pip: requirements.txt
python:
  paths: vendor/
wsgi:
  app: myapp.wsgiapp:application

I imagine platform as kind of a series of mixins. This system doesn’t really need to be Python-specific; when creating something similar for Silver Lining I found PHP support relatively easy to add (handling languages that aren’t naturally portable, like Go, might be more of a stretch). So python is one of the features this application uses. You can imagine lots of modularization for other features, but it would be easy and unproductive to get distracted by that.

The application has certain requirements of its environment, like the version of Python and the general OS type. The application might also require libraries, ideally only libraries that are not portable (M2Crypto being an example). Modern package management works pretty nicely for this stuff, so I believe relying on system packages as a first try is best (I’d offer requirements.txt as a fallback, not as the primary way to handle dependencies).
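
Checking the os and python requirements could be as simple as the following sketch. It assumes the description has already been parsed into a dict and that version specs are a single comparison like "<3"; version_matches and unmet_requirements are illustrative names:

```python
import operator
import os
import re
import sys

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


def version_matches(spec, version):
    """Check a version tuple against a spec such as '<3' or '>=2.6'."""
    match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\d+(?:\.\d+)*)\s*", spec)
    if match is None:
        raise ValueError("unsupported version spec: %r" % spec)
    op, number = match.groups()
    wanted = tuple(int(part) for part in number.split("."))
    # Compare only as many components as the spec names, so that
    # Python 2.7 satisfies '<3' and Python 3.0 does not.
    return _OPS[op](tuple(version[:len(wanted)]), wanted)


def unmet_requirements(require, os_name=os.name, version=sys.version_info):
    """List the requirements this environment fails to satisfy."""
    problems = []
    if "os" in require and require["os"] != os_name:
        problems.append("os: need %s, have %s" % (require["os"], os_name))
    if "python" in require and not version_matches(require["python"], version):
        problems.append("python: need %s" % require["python"])
    return problems
```

Calling unmet_requirements({"os": "posix", "python": "<3"}) on the current interpreter returns a list of complaints, empty when everything is acceptable. The rpm/deb/pip entries are deliberately left out; those need the system package manager.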

I think it’s much more reliable if applications primarily rely on bundling their dependencies directly (i.e., using a vendor directory). The tool support for this is a bit spotty, but I believe this package format could clarify the problems and solutions. Here is an example of how you might set up a virtualenv environment for managing vendor libraries (you then do not need virtualenv to use those same libraries), and do so in a way where you can check the results into source control. It’s kind of complicated, but works (well, almost works – bin/ files need fixing up). It’s a start at least.
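
On the consuming side, using a vendor directory needs very little. A minimal sketch, assuming the description's paths are relative to the application directory (add_vendor_paths is an illustrative name):

```python
import os
import sys


def add_vendor_paths(app_dir, paths):
    """Put the application's bundled directories at the front of sys.path."""
    added = []
    for relative in paths:
        full = os.path.abspath(os.path.join(app_dir, relative))
        if os.path.isdir(full) and full not in sys.path:
            # Insert after earlier additions so the listed order is kept.
            sys.path.insert(len(added), full)
            added.append(full)
    return added
```

With the description above this would be called as add_vendor_paths(app_dir, ["vendor/"]). It does not process .pth files; site.addsitedir does, but it appends to sys.path rather than putting the directory first.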

Support Library

On the environment side we need a good support library. pywebapp has some of the basic features, though it is quite incomplete. I imagine a library looking something like this:

import subprocess

from apppackage import AppPackage

app = AppPackage('/var/apps/app1.2012.02.11')
# Maybe a little Debian support directly
# (the description above lists a single package name):
subprocess.call(['apt-get', 'install',
                 app.config['require']['deb']])
# Or fall back to virtualenv/pip
app.create_virtualenv('/var/app/venvs/app1.2012.02.11')
app.install_pip_requirements()
# Load the WSGI application object named in the description
wsgi_app = app.load_object(app.config['wsgi']['app'])

You can imagine building hosting services on this sort of thing, or setting up continuous integration servers (app.run_command(app.config['unit_test'])), and so forth.

Local Development

If designed properly, I think this format is as usable for local development as it is for deployment. It should be able to run directly from a checkout, with the "development environment" being an environment just like any other.

This rules out, or at least makes less exciting, the use of zip files or tarballs as a package format. The only justification I see for using such archives is that they are easy to move around; but we live in the FUTURE and there are many ways to move directories around and we don’t need to cater to silly old fashions. If that means a script that creates a tarball, FTPs it to another computer, and there it is unzipped, then fine – this format should not specify anything about how you actually deliver the files. But let’s not worry about copying WARs.

Git-as-sync, not source-control-as-deployment

I don’t like systems that use git push for deployment (Heroku et al). Why? I do a lot of this:

$ git push deploy
... realize I forgot a domain name ...
$ git commit -m "fix domain name" -a ; git push deploy
... realize I didn't do something right with the database setup ...
$ git commit -m "configure database right" -a ; git push deploy
... dammit, I didn't fix it quite right ...
$ git commit -m "typo" -a ; git push deploy

And then maybe I’d actually like to keep my config out of my source control, or have a build process that I run locally, or any number of things. I’d like to be able to test deployment, but every deployment is a commit, and I like to commit tested work. I think I could use git rebase but I lack the discipline to undo my work so I can do it correctly. This is why I don’t do continuous commits.

There’s a whole different level of weirdness when you use GitHub Pages, as you aren’t pushing to a deployment-specific remote, you are pushing to a deployment-specific branch.

So I’ve generally thought: git deployment is wrong.

Then I was talking to some other people at Mozilla and they mentioned that ops was using git for simply moving files around even though the stuff they were deploying was itself in Mercurial. They had a particular site with a very large number of files, and it was faster to use git than rsync (git has more metadata than rsync; rsync has to look at everything every time you sync). And that all seemed very reasonable; git is a fine way to sync things.

But I kind of forgot about it all, and just swore to myself as I did too many trivial commits and wrote too many meaningless commit messages.

Still… it isn’t so hard to separate these concerns, is it? So I wrote up a quite small command called git-sync. The basic idea: copy the working directory to a new location (minus .git/), commit that, and push the result to your deployment remote. You can send modified and untracked files, and you can run a build script before committing and push the result of the build script, all without sullying your "real" source control. And you end up with a history of deployments, which is also nice.
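
The basic idea can be sketched in a few lines of Python. This is an illustration of the copy/commit/push steps only, not how git-sync itself is written; mirror_worktree, sync, the persistent staging directory, and the master branch name are all assumptions (and dirs_exist_ok needs Python 3.8 or later):

```python
import os
import shutil
import subprocess


def mirror_worktree(source, staging):
    """Make staging's files match source, leaving each side's .git/ alone."""
    os.makedirs(staging, exist_ok=True)
    # Clear out the previous snapshot so deleted files disappear too.
    for name in os.listdir(staging):
        if name == ".git":
            continue
        path = os.path.join(staging, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    # Copy everything, tracked or not, except anything named .git.
    shutil.copytree(source, staging, symlinks=True,
                    ignore=shutil.ignore_patterns(".git"),
                    dirs_exist_ok=True)


def sync(source, staging, remote, message="deploy"):
    """Commit a snapshot of source in the staging repo and push it."""
    mirror_worktree(source, staging)

    def git(*args):
        subprocess.run(["git", *args], cwd=staging, check=True)

    if not os.path.isdir(os.path.join(staging, ".git")):
        git("init")
    git("add", "--all")
    # --allow-empty records a deployment even when nothing changed.
    git("commit", "--allow-empty", "-m", message)
    git("push", remote, "HEAD:refs/heads/master")  # branch name is assumed
```

A build step would slot in between the mirror and the commit, which is what keeps build output out of the "real" repository.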

I’ve only used this a little bit, but I’ve enjoyed it when I have used it, and it makes me feel much better/clearer about my actual commits. It’s really short right now, and probably gets some things entirely wrong (e.g., moving over untracked files). But it works well enough to be improved (winkwinknudgenudge).

So check it out: https://github.com/ianb/git-sync

February 2012

Python Application Package

2012 02 29

Git-as-sync, not source-control-as-deployment

2012 02 14
