Ian Bicking: a blog :: A Python Web Application Package and Format (we should make one)

{ 2011 03 31 }

A Python Web Application Package and Format (we should make one)

At PyCon there was an open space about deployment, and the idea of drop-in applications (Java-WAR-style).

I generally get pessimistic about 80% solutions, and dropping in a WAR file feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer (which I think is specifically a project that got WARs on people’s minds), and in a lot of ways that installer is nice, but it’s also kind of wonky. It makes configuration unclear: it’s not always clear when it installs or configures itself through the web and when you have to do this at the system level, nor is it clear where it puts files and data. So a great initial experience doesn’t feel like a great ongoing experience to me, and it doesn’t have to be that way. If those were necessary compromises, sure, but they aren’t. And because we don’t have WAR files, if we’re proposing to make something new, then we have every opportunity to make things better.

So the question then is what we’re trying to make. To me: we want applications that are easy to install, that are self-describing, self-configuring (or at least guide you through configuration), reliable with respect to their environment (not dependent on system tweaking), upgradable, and respectful of persistence (the data that outlives the application install). A lot of this can be done by the "container" (to use Java parlance; or "environment") — if you just have the app packaged in a nice way, the container (server environment, hosting service, etc) can handle all the system-specific things to make the application actually work.

At which point I am of course reminded of my Silver Lining project, which defines something very much like this. Silver Lining isn’t just an application format, and things aren’t fully extracted along these lines, but it’s pretty close and it addresses a lot of important issues in the lifecycle of an application. To be clear: Silver Lining is an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time. It is a bunch of things, maybe too many things, so it is not unreasonable to pick out a smaller subset to focus on. Maybe an easy place to start (and good for Silver Lining itself) would be to separate at least the application format (and tools to manage applications in that state, e.g., installing new libraries) from the tools that make use of such applications (deploy, etc).

Some opinions I have on this format, exemplified in Silver Lining:

It’s not zipped or a single file, unlike WARs. Uploading zip files is not a great API. Geez. I know there’s this desire to "just drop in a file"; but there’s no getting around the fact that "dropping a file" becomes a deployment protocol and it’s an incredibly impoverished protocol. The format is also not subtly git-based (à la Heroku) because git push is not a good deployment protocol.
But of course there isn’t really any deployment protocol implied by a format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool that deploys should take as an argument a directory, not a single file. (If the tool then zips it up and uploads it, fine!)
Configuration "comes from the outside". That is, an application requests services, and the container tells the application where those services are. For Silver Lining I’ve used environmental variables. I think this one point is really important — the container tells the application. As a counter-example, an application that comes with a Puppet deployment recipe is essentially telling the server how to arrange itself to suit the application. This will never be reliable or simple!
The application indicates what "services" it wants; for instance, it may want to have access to a MySQL database. The container then provides this to the application. In practice this means installing the actual packages, but also creating a database and setting up permissions appropriately. The alternative is never having any dependencies, meaning you have to use SQLite databases or ad hoc structures, etc. But in fact installing databases really isn’t that hard these days.
All persistence has to use a service of some kind. If you want to be able to write to files, you need to use a file service. This means the container is fully aware of everything the application is leaving behind. All the various paths an application should use are given in different environmental variables (many of which don’t need to be invented anew, e.g., $TMPDIR).
It uses vendor libraries exclusively for Python libraries. That means the application bundles all the libraries it requires. Nothing ever gets installed at deploy-time. This is in contrast to using a requirements.txt list of packages at deployment time. If you want to use those tools for development that’s fine, just not for deployment.
There is also a way to indicate other libraries you might require; e.g., you might need lxml, or even something that isn’t quite a library, like git (if you are making a github clone). You can’t do those as vendor libraries (they include non-portable binaries). Currently in Silver Lining the application description can contain a list of Ubuntu package names to install. Of course that would have to be abstracted some.
You can ask for scripts or a request to be invoked for an application after an installation or deployment. It’s lame to try to test if is-this-app-installed on every request, which is the frequent alternative. Also, it gives the application the chance to signal that the installation failed.
It has a very simple (possibly/probably too simple) sense of configuration. You don’t have to use this if you make your app self-configuring (i.e., build in a web-accessible settings screen), but in practice it felt like some simple sense of configuration would be helpful.
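
To make the "container tells the application" point concrete, here is a minimal sketch in Python of an application reading a requested MySQL service and its temporary directory from the environment. The variable names (APP_MYSQL_HOST and friends), the MissingService exception, and both helper functions are hypothetical, invented for illustration; they are not Silver Lining’s actual names. Only $TMPDIR comes from the list above.

```python
import os
import tempfile

# Hypothetical variable names, chosen for illustration only; they are not
# taken from Silver Lining or any other container.
REQUIRED_MYSQL_VARS = (
    "APP_MYSQL_HOST",
    "APP_MYSQL_DBNAME",
    "APP_MYSQL_USER",
    "APP_MYSQL_PASSWORD",
)


class MissingService(Exception):
    """Raised when the container did not provide a requested service."""


def mysql_settings(environ=None):
    """Collect MySQL settings that the container placed in the environment."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_MYSQL_VARS if name not in environ]
    if missing:
        # The application does not guess or fall back to a config file;
        # providing the service is the container's job.
        raise MissingService("container did not provide: " + ", ".join(missing))
    return {
        "host": environ["APP_MYSQL_HOST"],
        "dbname": environ["APP_MYSQL_DBNAME"],
        "user": environ["APP_MYSQL_USER"],
        "password": environ["APP_MYSQL_PASSWORD"],
    }


def temp_dir(environ=None):
    """Use $TMPDIR when the container sets it, else the platform default."""
    environ = os.environ if environ is None else environ
    return environ.get("TMPDIR") or tempfile.gettempdir()
```

The application never looks for a config file or guesses a hostname. If the container did not set up the service, the app fails loudly at startup instead of half-working.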

Things that could be improved:

There are some places where you might be encouraged to use routines from the silversupport package. There are very few! But maybe an alternative could be provided for these cases.
A little convention-over-configuration is probably suitable for the bundled libraries; silver includes tools to manage things, but it gets a little twisty. When creating a new project I find myself creating several .pth files, special customizing modules, etc. Managing vendor libraries is also not obvious.
Services are IMHO quite important and useful, but also need to be carefully specified.
There’s a bunch of runtime expectations that aren’t part of the format, but in practice would be part of how the application is written. For instance, I make sure each app has its own temporary directory, and that it is cleared on update. If you keep session files in that location, and you expect the environment to clean up old sessions — well, either all environments should do that, or none should.
The process model is not entirely clear. I tried to simply define one process model (unthreaded, multiple processes), but I’m not sure that’s suitable — most notably, multiple processes have a significant memory impact compared to threads. An application should at least be able to indicate what process models it accepts and prefers.
Static files are all convention over configuration — you put static files under static/ and then they are available. So static/style.css would be at /style.css. I think this is generally good, but putting all static files under one URL path (e.g., /media/) can be good for other reasons as well. Maybe there should be conventions for both.
Cron jobs are important. Though maybe they could just be yet another kind of service? Many extra features could be new services.
Logging is also important; Silver Lining attempts to handle that somewhat, but it could be specified much better.
Silver Lining also supports PHP, which seemed to cause a bit of stress. But just ignore that. It’s really easy to ignore.
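
The static/ convention from the list above could be sketched roughly like this in Python. The function name resolve_static and the app_dir argument are assumptions for illustration, not part of Silver Lining; a real container would more likely let the web server do this mapping.

```python
import os


def resolve_static(app_dir, url_path):
    """Return the file under app_dir/static/ that serves url_path, or None."""
    static_root = os.path.realpath(os.path.join(app_dir, "static"))
    candidate = os.path.realpath(
        os.path.join(static_root, url_path.lstrip("/"))
    )
    # Refuse anything that escapes static/ (for example via "..").
    if os.path.commonpath([static_root, candidate]) != static_root:
        return None
    # Directories and missing files are not served by this convention.
    return candidate if os.path.isfile(candidate) else None
```

With this, a request for /style.css maps to static/style.css inside the application directory, and anything not found there falls through to the application itself. Supporting a /media/ prefix as a second convention would only mean stripping that prefix before the lookup.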

There is a description of the configuration file for apps. The environmental variables are also notably part of the application’s expectations. The file layout is explained (together with a bunch of Silver Lining-specific concepts) in Development Patterns. Besides all that there is admittedly some other stuff that is only really specified in code; but in Silver Lining’s defense, specified in code is better than unspecified ;) App Engine provides another example of an application format, and would be worth using as a point of discussion or contrast (I did that myself when writing Silver Lining).

Discussing WSGI stuff with Ben Bangert at PyCon he noted that he didn’t really feel like the WSGI pieces needed that much more work, or at least that’s not where the interesting work was — the interesting work is in the tooling. An application format could provide a great basis for building this tooling. And I honestly think that the tooling has been held back more by divergent patterns of development than by the difficulty of writing the tools themselves; and a good, general application format could fix that.

Posted by Ian on Thursday, March 31st, 2011, at 10:01 am, and filed under Packaging, Programming, Python, Web.

18 Comments

Alan Franzoni says:

March 31, 2011 at 10:33 am

I think it’s a good idea. While WAR may not be a great deployment container, many other features from the Java web world are pretty good regarding packaging and APIs.

That’s probably the main target: make it possible to deploy a webapp to a standard container, independently of the framework/technology it uses. This would mean creating something like the Servlet API and/or other J2EE APIs.
- Ian says:
  
  March 31, 2011 at 10:37 am
  
  WSGI is akin to the Servlet API, only better ;) I feel like the Python world flirted with copying J2EE and other Java stuff for a while, and it all turned out badly. But… inspiration in terms of scope would be fine ;)
  
  Well… thinking about it: WSGI doesn’t cover everything, which is why this notion of multiple platforms makes sense. It’s not just that Python is a platform, but Python+WSGI would probably be the inaugural platform. Other server setups (primarily async systems that don’t use WSGI) would be new platforms — obviously sharing a large amount with Python+WSGI, but with some different details and different constraints on the containers (async being generally harder to support than sync-aka-WSGI, since one can be implemented in the other but not vice versa).
Eric Larson says:

March 31, 2011 at 10:50 am

I’m definitely in favor of getting a better application package in Python, but I do wonder if the focus is on the wrong things. Honestly, tools like dpkg and rpm do a perfectly suitable job of handling the filesystem aspects. What seems more difficult is how do you specify how to run the actual application? How do you stop it? Does this work on Windows? What status do you get from an application, and how? It seems like if these questions about how you manage a deployed Python application were settled, you’d begin to see some really interesting things happen, because the deployment question would be answered.

I do think the Web Site Process Bus (http://www.cherrypy.org/wiki/WSPBSpec) is one tool that leans towards the idea of a standard means of distributing and deploying Python web apps. Also, in terms of packaging with dpkg/rpm, my point is not to suggest “just use dpkg”, but rather to suggest that the goal of having a directory to deploy to maps very well to a tarball + meta data for uninstalling. If folks don’t want to use rpm or dpkg, the models they both prescribe seem correct in what they specify a package do.
- Ian says:
  
  March 31, 2011 at 11:23 am
  
  It would not be unreasonable to create a deb or rpm out of this format, and that could be part of a deployment process — but the format goes much further than deb or rpm. (Debian packaging policy does go into greater depth in particular domains, but AFAIK has not in the case of web apps.) Configuration in particular is not generally handled at the Linux package level, but is essential to this system (e.g., configuring a database to accept connections from the web application, with appropriate permissions).
  
  There is a specification of how to run the application. Stopping and starting are implicitly left to the container. There’s no notification, but it would be reasonable to have such notifications — though the idea of “the app is going down” is too expansive IMHO — only “this process is coming to a halt”.
  
  Windows is irrelevant in my opinion: it has ceased to be a serious deployment target, and I don’t believe supporting it only for the purpose of development machines is worth the effort. But that’s just my opinion and the specification need not be particularly opinionated in this regard. But I think it would be counterproductive to try to create an abstraction above both Windows and Linux; it would be better to simply put the burden on Windows people to translate Linux concepts. Supporting Windows isn’t worth the drag on development.
  
  It would be nice to have a declared way to ping an application for its “status”: maybe at the most minimal, noting that it is alive and responding, though it could also do other tests. The number of tests it does affects the reasonable frequency of pinging, so there’d have to be some thought, or maybe just two kinds of ping, alive and status, checked at two different intervals.
  
  Silver Lining does I think show that this covers most of the important aspects of deploying an application (with [a bunch of todo items too](http://cloudsilverlining.org/todo.html)). Development of tooling in parallel with the format would be essential, as in my experience there are many details that are best revealed through actual use.
  - Eric Larson says:
    
    March 31, 2011 at 12:41 pm
    
    I do agree that the package should have more info. I suppose my point is that just as setup.py/.cfg is a known file and /etc is for known configuration, a package can have the same specified paths. I think we are agreeing there ;)
    
    I’m also perfectly fine avoiding Windows. I only bring it up because Python as a platform (typically) supports Windows, therefore I’d imagine some support would be desired. If you have no desire to make this concept blessed within the scope of Python as a language (again fine by me), then no Windows support is definitely preferred.
    
    The reasoning behind the communicative aspects is that deployments range from pushing an app to a machine to configuring a deployment amongst hundreds (thousands?) of machines. The result being that often times you need more information other than just “are you up or not”. Likewise, if an application does fail, what do you do? You probably look to the logs, check things like disk I/O, network I/O, data stores. A load balancer might be able to switch traffic to other nodes. It would be nice to have a means of communicating with the environment to see if it can help you debug the problem where it happened. For example, running the application tests.
    
    I’m not trying to say you should support all these details, but rather suggesting that without a known way of communicating with the deployed environment, these are all ad-hoc solutions. A simple API could make a lot of this less painful. Imagine if there was an assumption that when an app crashes on a server you should easily be able to tell the env to run a suite of diagnostics and tests to see what external issues could be the problem. The word “assumption” is important here. In support channels there are assumptions such as “send me a stack trace” or “can you send me a screen shot”. These kinds of assumptions for web applications have typically been limited, but there is no reason they have to be if you can converge on a system of communication.
    
    Hopefully that idea makes some sense and it doesn’t seem like I’m just rambling ;)
    - Ian says:
      
      March 31, 2011 at 12:53 pm
      
      I personally don’t like putting stuff in /etc because it is a singleton on the system, and singletons are bad both generally and for a number of specific reasons in this case, like fast switchovers during updates (not taking a server down, updating, and then bringing it up, but doing a complete deployment and then pointing the server to the new instance). Anyway, an application should never look for config files! Then it’s just a decision for the deployment tool.
      
      APIs to introspect the container are going to be tricky, as they often expose implementation details that should be abstracted from the application. Ideally when these use cases come up the pressure will be on the container to implement something, or to add tooling around the general system, rather than pushing applications to essentially debug or monitor themselves on an individual basis. When that’s not possible (and it’s not always) then probably another API should be considered. An example might be load information — the container can tell something about load, but the application itself might also have information to share, so there should be an agreed-upon way for that to happen (maybe as simple as writing to a specified file with a particular format). Mostly someone should try to do this sort of thing and see where it hurts ;)
      - Eric Larson says:
        
        March 31, 2011 at 1:01 pm
        
        I mean /etc as in a known path in linux the same way $appdir/conf/connections.yaml might be a known path in deployment.
        
        I agree on the communication format as well. An environment can check some known file and return it and let app specific tools understand that format. The logging.statistics idea is similar ( http://www.aminus.org/blogs/index.php/2010/11/19/logging-statistics?blog=2) in that the dictionary of stats could be anything.
  - david says:
    
    March 31, 2011 at 8:11 pm
    
    By itself, there is no reason for .deb packages not to support database configuration. There is already a whole framework for configuration in debian, and there are some helpers to configure e.g. databases for an application. It is pretty horrible to use, though, because it is ancient and using shell for this kind of stuff is painful.
    
    IMO, that’s not really an issue of the packaging format (war vs .deb vs .rpm vs…), but of the infrastructure around it that does the hard stuff: configuring databases, users, etc… When I started dealing with those issues, I was expecting to find something like a python library to help configure databases (mysql user/db/password, this kind of thing) in a relatively db/platform independent kind of way, but did not find anything.
    
    In that sense, an effort to clearly separate those issues and having one library for each sounds better to me than an all-encompassing solution like silverlining.
Bryan says:

March 31, 2011 at 11:39 am

I don’t know, seems like a potential step backwards to me. I’m more a fan of “rsync” or file copy deployments. That’s one thing PHP has right and it’s one of the reasons PHP has been so successful. I’m not ruling it out, but I’d prefer the rsync deployment problem was solved first.
Brent Tubbs says:

March 31, 2011 at 12:28 pm

I really like the idea of defining an interface between packages and containers.

I’m trying to go a similar direction with Silk (http://pypi.python.org/pypi/silk-deployment), though it has some distance to go, having started more with Fabric scripts and inspiration from Google App Engine (and some of your early Silver Lining posts).

I’ll take a look at Silver Lining’s package format and think about what it would take to deploy a Silver-Lining-configured application on Silk’s Nginx/Gunicorn/Supervisord stack. Maybe it’s not at the level you want, but Supervisord’s XML RPC interface has been nice for getting process state.
- Ian says:
  
  March 31, 2011 at 12:42 pm
  
  Cool! While I have been opinionated about things in some places in Silver Lining, I’ve tried fairly hard to keep other parts separated, so for instance, an application should not know or care if it is running under mod_wsgi or gunicorn, etc. It would be a nice test of this if Silk used the same format (which itself is simple enough that it’s not a big deal if you don’t even use any of Silver Lining’s code; reimplementing shouldn’t be a barrier). [This app](https://bitbucket.org/ianb/silverlining/src/tip/docs/examples/simple-python/) is a very simple example, but could be a start for a test. It would be cool to have an app that tested its container (probably through a combination of self-testing once it is uploaded, and then running some tool that makes a bunch of requests and then tests that the side effects all worked).
  
  It might be nice if the development and deployment tools were separated. E.g., silver serve (which serves up an app locally for development) doesn’t really have any relation to deployment.
  - Brent Tubbs says:
    
    March 31, 2011 at 4:19 pm
    
    Good point about separating deployment from devserver. If the package is standardized then the devserver is just another thing that can be outside the project’s scope (so long as it complies with the package spec). Then we can even have competing implementations of the devserver… maybe even a pretty GUI one like App Engine has on Windows and Mac. That would make the designers I work with a lot happier.
    - Ian says:
      
      March 31, 2011 at 4:25 pm
      
      Something I never got past fantasizing about, but would be really cool, would be a way of taking an application like this and building a cloud-hosted development server with development tools. Plug that into your version control post-commit so it keeps the app up to date, then point a web designer (or novice developer or what have you) to this cloud instance and they can edit things in-place with no personal setup. Or take all those same tools and build something that creates a virtual machine instance from that, and then a new developer can just start up that VM instance and develop on that.
      
      All these sorts of things are the rich set of tools I can imagine being built.
Artur says:

March 31, 2011 at 4:59 pm

I find it completely unpythonic. You want to build an abstract FRAMEWORK. And the reason why I think Python is a wonderful tool is that it conforms to the philosophy of Unix: it can be easily scripted and integrated with other OS tools, unlike Java, which builds its own world.

I have worked for years with Java webapps and in practice WARs and EARs are inconvenient. You have to deal with large files even for simple changes. They work on only a single app server unless special attention and testing are given. Working with unpacked archives, which is convenient for development work, doesn’t work well and isn’t standardized.

I also don’t see a real need for such archives: 99% of web applications are developed in-house and don’t need an abstract deployment protocol. And for such cases home-grown scripts for deployment are the best: you have full control and don’t rely on some abstract mechanism, which unix admins can’t deal with.
Adrian says:

April 1, 2011 at 7:04 am

Hm, regular python packaging is a solved problem now, so we move on to web packaging? Talk about getting ahead of oneself.

Who would have thought that something as simple as manipulating a list (sys.path) could be so difficult.
Paul Winkler says:

April 1, 2011 at 9:15 am

I haven’t tried silverlining since around the time it changed its name from toppcloud, but the thing that turned me off about it was that I found the end result of deploying rather opaque. I understood my local development rig, but the deployment result on the server was rather different and I had a hard time finding things and understanding how it was tied together, so when things went wrong it was difficult to figure out.

I’m suggesting that having a smart container that eases deployment is great, but the implementation needs to be documented, and be as simple as possible to inspect and troubleshoot without special tools. And I think there’s value in having the production system be similar to the development system – or to flip that around, every difference between them is a potential source of hard-to-fix trouble.

I wish I could remember the details. I wonder how much of the stuff that confused me fell under the header of “stuff that is only really specified in code”. It felt rather like an “80% solution” to me, but it’s unfair to say that about something whose docs prominently say “don’t use this yet” :)

Meanwhile, most deployments are simple enough that I’m quite happy with the old workflow of pip install -r requirements.txt && edit-some-config-files.
jking says:

April 1, 2011 at 9:15 am

I like Chef.

Personally I think web applications involve too many layers and moving parts to be packaged like typical software packages. Overcoming this problem is going to inherit the complexities of those moving parts, their configuration, and so on. It seems like barking up the wrong tree… but maybe I’m just lazy.

One could use Fabric I guess if staying pure-Python really matters to you. Chef is better though in a lot of ways. Fabric could be better if there were more contributors. :)

I’d be interested in knowing if someone can actually bypass the complexities I’ve noted and actually create a package solution for installing web applications.
Paul Winkler says:

April 1, 2011 at 9:24 am

Sorry, didn’t mean to be a downer. I will definitely be keeping an eye on silverlining. Mostly just wanted to pass along the thoughts about troubleshooting deployment results.
