I’ve been thinking some more about deployment of Python web applications, and deployment in general (in part leading up to the Web Summit). And I’ve got an idea.
I wrote about this about a year ago and recently revised some notes on a proposal, but I’ve been thinking about something a bit more basic: a way to simply ship server applications, bundles of code. Web applications are just one use case for this.
For now let’s call this a "Python application package". It has these features:
- There is an application description: this tells the environment about the application. (This is sometimes called "configuration" but that term is very confusing and overloaded; I think "description" is much clearer.)
- Given the description, you can create an execution environment to run code from the application and acquire objects from the application. So there would be a specific way to set up sys.path, and a way to indicate any libraries that are required but not bundled directly with the application.
- The environment can inject information into the application. (Also this sort of thing is sometimes called "configuration", but let’s not do that either.) This is where the environment could indicate, for instance, what database the application should connect to (host, username, etc).
- There would be a way to run commands and get objects from the application. The environment would look in the application description to get the names of commands or objects, and use them in some specific manner depending on the purpose of the application. For instance, WSGI web applications would point the environment to an application object. A Tornado application might simply have a command to start itself (with the environment indicating what port to use through its injection).
There are a lot of things you can build from these pieces, and in a sophisticated application you might use a bunch of them at once. You might have some WSGI, maybe a separate non-WSGI server to handle WebSockets, something for a Celery queue, a way to accept incoming email, etc. In pretty much all cases I think a basic application lifecycle is needed: commands to run when an application is first installed, when you want to verify the environment is acceptable, when you want to back up its data, and when you want to uninstall it (a sketch of such hooks appears after the description example below).
There are also some things that all environments should set up the same way or inject into the application. E.g., $TMPDIR should point to a place where the application can keep its temporary files. Or, every application should have a directory (perhaps specified in another environment variable) where it can write log files.
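As a minimal sketch of what that injection might look like from the application’s side (the variable and key names here are assumptions, not part of any spec):

import json
import os

# Hypothetical convention: the environment writes injected settings to a
# JSON file and passes its path in an environment variable.  Every name
# here is invented for illustration.
with open(os.environ['APP_SETTINGS']) as f:
    settings = json.load(f)

db_host = settings['database']['host']  # e.g., which database to connect to
tmp_dir = os.environ['TMPDIR']          # scratch space the environment provides
log_dir = os.environ['APP_LOG_DIR']     # invented: where the app may write logs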
Details?
To get more concrete, here’s what I imagine a small application description might look like; YAML is probably a good format:
platform: python, wsgi
require:
  os: posix
  python: <3
  rpm: m2crypto
  deb: python-m2crypto
  pip: requirements.txt
python:
  paths: vendor/
wsgi:
  app: myapp.wsgiapp:application
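The lifecycle commands mentioned earlier could hang off the same description. This is purely a sketch; every key name below is invented:

lifecycle:
  install: bin/setup-db
  check-environment: bin/check-env
  backup: bin/backup-data
  uninstall: bin/teardown

The environment would run these at the appropriate points (first install, pre-flight checks, backup, removal) without having to know anything application-specific.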
I imagine platform as kind of a series of mixins. This system doesn’t really need to be Python-specific; when creating something similar for Silver Lining I found PHP support relatively easy to add (handling languages that aren’t naturally portable, like Go, might be more of a stretch). So python is one of the features this application uses. You can imagine lots of modularization for other features, but it would be easy and unproductive to get distracted by that.
The application has certain requirements of its environment, like the version of Python and the general OS type. The application might also require libraries, ideally only libraries that are not portable (M2Crypto being an example). Modern package management handles this stuff pretty nicely, so relying on system packages as a first try is, I believe, best (I’d offer requirements.txt as a fallback, not as the primary way to handle dependencies).
I think it’s much more reliable if applications primarily bundle their dependencies directly (i.e., using a vendor/ directory). The tool support for this is a bit spotty, but I believe this package format could clarify the problems and solutions. For example, you can set up a virtualenv environment for managing vendor libraries (you then do not need virtualenv to use those same libraries), and do so in a way where you can check the results into source control. It’s kind of complicated, but it works (well, almost works – bin/ files need fixing up). It’s a start at least.
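A simpler sketch of the same idea, using pip’s --target option (available in recent pip versions) to populate a vendor/ directory you can commit:

import subprocess
import sys

# Install the dependencies listed in requirements.txt into vendor/
# instead of site-packages; the result is plain files you can check in.
subprocess.check_call(
    ['pip', 'install', '--target=vendor/', '-r', 'requirements.txt'])

# At runtime no virtualenv is needed; the environment just puts vendor/
# on sys.path, matching the "python: paths: vendor/" entry above.
sys.path.insert(0, 'vendor/')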
Support Library
On the environment side we need a good support library. pywebapp has some of the basic features, though it is quite incomplete. I imagine a library looking something like this:
import subprocess

from apppackage import AppPackage

app = AppPackage('/var/apps/app1.2012.02.11')

# Maybe a little Debian support directly:
subprocess.call(['apt-get', 'install'] + app.config['require']['deb'])

# Or fall back to virtualenv/pip:
app.create_virtualenv('/var/app/venvs/app1.2012.02.11')
app.install_pip_requirements()

wsgi_app = app.load_object(app.config['wsgi']['app'])
You can imagine building hosting services on this sort of thing, or setting up continuous integration servers (app.run_command(app.config['unit_test'])), and so forth.
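The load_object call is the only piece with any real magic. A minimal sketch of how a “module.path:attr” string (the convention the wsgi: app entry uses above) might resolve:

import importlib

def load_object(spec):
    # Split "myapp.wsgiapp:application" into a module path and an
    # attribute name, import the module, and return the attribute.
    module_name, _, attr = spec.partition(':')
    module = importlib.import_module(module_name)
    return getattr(module, attr)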
Local Development
If designed properly, I think this format is as usable for local development as it is for deployment. It should be able to run directly from a checkout, with the "development environment" being an environment just like any other.
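For instance, reusing the hypothetical AppPackage library from above, a development server might treat the checkout itself as the package:

from wsgiref.simple_server import make_server

from apppackage import AppPackage  # hypothetical, as sketched above

# The checkout directory is the package; no archive or build step needed.
app = AppPackage('.')
wsgi_app = app.load_object(app.config['wsgi']['app'])
make_server('localhost', 8080, wsgi_app).serve_forever()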
This rules out, or at least makes less exciting, the use of zip files or tarballs as a package format. The only justification I see for such archives is that they are easy to move around; but we live in the FUTURE and there are many ways to move directories around, and we don’t need to cater to silly old fashions. If that means a script that creates a tarball, FTPs it to another computer, and unpacks it there, then fine – this format should not specify anything about how you actually deliver the files. But let’s not worry about copying WARs.
Comments
You should strongly consider some of those fields being expressions rather than plain text. For example, how would I say Python 2.6+ and 3.3+ (i.e., not 3.0/3.1/3.2 or 2.5)? How about dependency 'A' (version >6.3) or 'B' (version <1)? What about: if 'C' is present then 'D' must be too? How about flags like Gentoo's USE – e.g., how do you indicate Kerberos must be used on some systems, but can be optional on others? The ast module makes it really easy to deal with expressions: to safely evaluate and understand them.
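A minimal sketch of that idea, assuming an invented expression syntax; ast only parses here, nothing is executed:

import ast

# Whitelist of node types for a tiny, safe requirement-expression language.
ALLOWED = (ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
           ast.Name, ast.Load, ast.Num, ast.Tuple,
           ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq)

def parse_requirement(expr):
    # Parse something like "python >= (2, 6) or python >= (3, 3)" and
    # reject any syntax outside the whitelist; evaluation would happen
    # later, against values the environment supplies.
    tree = ast.parse(expr, mode='eval')
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError('disallowed syntax: %r' % node)
    return tree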
If I was solving the problem as you are, I'd very strongly consider making the format something that can be trivially translated into .deb, .rpm or .msi. Rather than fighting the existing platform packaging solutions, make it the smallest divergence from them possible and make it actually possible to use them. (Remember the vast majority only want one version of things and don't need multiple versions scattered all over the place. It is also possible for virtualenv and friends to understand .deb/rpm/msi and do a virtualized local install of the contents.)
Separately, I am the author of APSW, which is a wrapper around SQLite. It has proven very difficult to put into package repositories. The first major issue is that it requires a C compiler, which most Windows machines don't have. That leads to this: http://code.google.com/p/apsw/downloads/list
The second is what to do about SQLite. In general the latest version of SQLite is required. My setup.py script is quite happy to download and use the latest source for you, but you may want to use the system library instead. And if downloaded source is used, then several flags are available for extensions, omitting or including bits of SQLite functionality, etc. And some won't want setup.py having internet access, while others won't want SQLite distributed with the APSW source. It is hard to come up with something that satisfies the majority.
Certainly expressions can be embedded in text ;) Requiring MySQL or PostgreSQL is a pretty common example of needing an expression.
Of course this could be translated into a system package (deb, etc.), but system packages alone aren’t expressive enough for deployments. They are good for making certain things available, but they generally leave the “doing” part to the sysadmin and configuration. This format is intended for really doing things. That said, I would hope that custom repositories (à la PPA) would be possible to set up with this format as well, or as a kind of supplement – so that when you really need a cutting-edge package (which just happens sometimes) you can do the work to provide that package The Right Way (as opposed to sshing in and running ./configure; make; sudo make install). Anyway, I like the idea of pushing build stuff off to the systems instead of trying to embrace it directly in this tool.

…and the race is on to release a Python deployment tool named “smear”. Oh yes, I just went there.
I was going to go there, too — you beat me to it. Kudos. But, seriously, can we do better in the Python app name department? ‘Fabric’ sucks. South is okay. Django’s pretty good. Flask isn’t bad, but boo for Bottle. Celery? Lettuce? Are we making salad? Virtualenv would be great if Python had been called Indenteddynamiclang. Gunicorn is gawful. We already have pip and pep. Now PAP? Poop.
I’ve been working along very similar lines. You are definitely onto something. Here are my thoughts about it:
1. The “description” metadata should describe a project in a general enough way to be useful to anyone else thinking of using the project for any purpose.
2. This generality means the description metadata can live either inside the project or outside it (or both!). Joe’s useful library doesn’t have a description file? That’s ok, Good Samaritan has published a description of Joe’s package at goodsamaritan.org! Add that site as a source of description data.
3. Separating description from source tree encourages good design of the format. The description should be designed to work with the huge number of commonly used libraries out there that use things like GNU tools (./configure; make; make install), cmake, ant/maven, or even a simple “cp -r * {target}”. The point of writing a description is to tell a machine how to perform common operations on a package: get, build, test, install, start, stop, uninstall, “discover available versions”, “discover dependencies”, etc.
4. Separating description from source tree and using a good format design encourages adoption. If you have good tools and good descriptions to work with, you can make effective use of a package without any cooperation from the package’s author. People can write and publish descriptions for the many, many tools already in common use that haven’t been quick to adopt new tools. That’s no small part of the success of things like puppet, chef, and homebrew.
5. “Package Repositories” are not necessary, especially if source (rather than compiled binaries) is what you want. Does the project have a presence on GitHub, or Bitbucket, Gitorious, SourceForge, Google Code, or any simple git, mercurial, or (shudder) subversion repository anywhere else with a URL? Just identify it by its URL and distribute directly from there. Tagging a commit == cutting a new release. Grabbing a package == getting set up to make code changes and immediately contribute them back to the repository.
6. If you point at a 3rd-party package repository and a new version shows up in the source repo, you don’t necessarily have the new version. If you point at a 3rd-party description which points back at the source repo and a new version shows up in the source repo, you do have the new version, and the description likely applies just as well to the new version as it did to the last one.
7. I think this could be a great use of semantic web “linked data” concepts. Tools in that space are all about crawling linked data and constructing graphs to do useful things with them. This is a perfect fit for the task of crawling package descriptions, building a dependency graph, and performing operations on the graph to resolve dependencies. Also, using the web language of URIs keeps things universal, so the packages in your dependency graph can use different languages and toolchains and still reference one another in the common namespace of URIs.
What you are describing sounds to me more like buildout than what I’m thinking of. Using the system I describe, if you want to integrate a foreign tool (that doesn’t have a description), you literally take its files, put them into the directory, and describe them. You don’t point at things out on the net. If you want to use tools to manage that (e.g., git submodule) then sure; but when you deploy you have the actual files in hand: no indirection, no pointers.
There is however a need for another higher level of coordination among these applications. It is not uncommon for application A to need to directly contact and work with application B, and vice versa. Or they might need to share a bit of information (a shared secret), or know each other’s hostnames, etc. I think that’s simply another layer on top though, and it’s possible to skip that for now.
In particular I do not agree with 3; this system should not account for things like ./configure; make; make install – let the system packages deal with that; they have been banging their heads on that for a long time and can do it well.

I don’t mean to suggest that you never want a locally cached copy of a project for things like reliable deployment; I’m just saying I would like a universal way to describe software projects supporting a set of common actions.
I have a hard time seeing a difference between “system packages” and any other packages. To me they are all projects that ultimately have a source tree somewhere and may be mixed and matched into running systems.
But maybe we just don’t want quite the same thing :)
I was unaware of buildout. I’ll have to check that out; thanks for telling me about it.
P.S. I’m currently developing what I described, and it works well enough that it’s my config management system now (but not so well that all the concerns are fully separated yet). It definitely makes heavy use of local repositories.
I like being able to version control a lightweight collection of descriptions, keeping the actual 3rd party projects outside the repo. The scripts simply keep them in an unversioned work area, fetching them first if necessary.
Interesting. We are currently deploying our Django apps using another Django app that ties together git, the boto Amazon API, and Fabric to set up and update deployments.
It would be nice if there were a more standard deployment solution. Interested in how this develops.