Ian Bicking: a blog :: Silver Lining

{ Category Archives }

Silver Lining

Python Application Package

I’ve been thinking some more about deployment of Python web applications, and deployment in general (in part leading up to the Web Summit). And I’ve got an idea.

I wrote about this about a year ago and recently revised some notes on a proposal but I’ve been thinking about something a bit more basic: a way to simply ship server applications, bundles of code. Web applications are just one use case for this.

For now lets call this a "Python application package". It has these features:

There is an application description: this tells the environment about the application. (This is sometimes called "configuration" but that term is very confusing and overloaded; I think "description" is much clearer.)
Given the description, you can create an execution environment to run code from the application and acquire objects from the application. So there would be a specific way to setup sys.path, and a way to indicate any libraries that are required but not bundled directly with the application.
The environment can inject information into the application. (Also this sort of thing is sometimes called "configuration", but let’s not do that either.) This is where the environment could indicate, for instance, what database the application should connect to (host, username, etc).
There would be a way to run commands and get objects from the application. The environment would look in the application description to get the names of commands or objects, and use them in some specific manner depending on the purpose of the application. For instance, WSGI web applications would point the environment to an application object. A Tornado application might simply have a command to start itself (with the environment indicating what port to use through its injection).

There’s a lot of things you can build from these pieces, and in a sophisticated application you might use a bunch of them at once. You might have some WSGI, maybe a seperate non-WSGI server to handle Web Sockets, something for a Celery queue, a way to accept incoming email, etc. In pretty much all cases I think basic application lifecycle is needed: commands to run when an application is first installed, something to verify the environment is acceptable, when you want to back up its data, when you want to uninstall it.

There’s also some things that all environments should setup the same or inject into the application. E.g., $TMPDIR should point to a place where the application can keep its temporary files. Or, every application should have a directory (perhaps specified in another environmental variable) where it can write log files.

Details?

To get more concrete, here’s what I can imagine from a small application description; probably YAML would be a good format:

platform: python, wsgi
require:
os: posix
python: <3
rpm: m2crypto
deb: python-m2crypto
pip: requirements.txt
python:
paths: vendor/
wsgi:
app: myapp.wsgiapp:application

I imagine platform as kind of a series of mixins. This system doesn’t really need to be Python-specific; when creating something similar for Silver Lining I found PHP support relatively easy to add (handling languages that aren’t naturally portable, like Go, might be more of a stretch). So python is one of the features this application uses. You can imagine lots of modularization for other features, but it would be easy and unproductive to get distracted by that.

The application has certain requirements of its environment, like the version of Python and the general OS type. The application might also require libraries, ideally one libraries that are not portable (M2Crypto being an example). Modern package management works pretty nicely for this stuff, so relying on system packages as a first try I believe is best (I’d offer requirements.txt as a fallback, not as the primary way to handle dependencies).

I think it’s much more reliable if applications primarily rely on bundling their dependencies directly (i.e., using a vendor directory). The tool support for this is a bit spotty, but I believe this package format could clarify the problems and solutions. Here is an example of how you might set up a virtualenv environment for managing vendor libraries (you then do not need virtualenv to use those same libraries), and do so in a way where you can check the results into source control. It’s kind of complicated, but works (well, almost works – bin/ files need fixing up). It’s a start at least.

Support Library

On the environment side we need a good support library. pywebapp has some of the basic features, though it is quite incomplete. I imagine a library looking something like this:

from apppackage import AppPackage
app = AppPackage('/var/apps/app1.2012.02.11')
# Maybe a little Debian support directly:
subprocess.call(['apt-get', 'install'] +
app.config['require']['deb'])
# Or fall back of virtualenv/pip
app.create_virtualenv('/var/app/venvs/app1.2012.02.11')
app.install_pip_requirements()
wsgi_app = app.load_object(app.config['wsgi']['app'])

You can imagine building hosting services on this sort of thing, or setting up continuous integration servers (app.run_command(app.config['unit_test'])), and so forth.

Local Development

If designed properly, I think this format is as usable for local development as it is for deployment. It should be able to run directly from a checkout, with the "development environment" being an environment just like any other.

This rules out, or at least makes less exciting, the use of zip files or tarballs as a package format. The only justification I see for using such archives is that they are easy to move around; but we live in the FUTURE and there are many ways to move directories around and we don’t need to cater to silly old fashions. If that means a script that creates a tarball, FTPs it to another computer, and there it is unzipped, then fine – this format should not specify anything about how you actually deliver the files. But let’s not worry about copying WARs.

2012 02 29

Packaging
Python
Silver Lining
Web

Comments (9)

Permalink

Silver Lining: More People!

OK… so I said before Silver Lining is for collaborators not users. And that’s still true… it’s not a polished experience where you can confidently ignore the innards of the tool. But it does stuff, and it works, and you can use it. So… I encourage some more of you to do so.

Now would be a perfectly good time, for instance, to port an application you use to the system. Almost all Python applications should be portable. The requirements are fairly simple:

The application needs a WSGI interface.
It needs to be Python 2.6 compatible.
Any libraries that aren’t pure-Python need to be available as deb packages in some form.
Any persistence needs to be provided as a service; if the appropriate service isn’t already available you may need to write some code.

Also PHP applications should work (though you may encounter more rough edges), with these constraints:

No .htaccess files, so you have to implement any URL rewriting in PHP (e.g., for WordPress).
Source code is not writable, so self-installers that write files won’t work. (Self-installing plugins might be workable, but that hasn’t been worked out yet.)
And the same constraints for services.

So… take an application, give it a try, and tell me what you encounter.

Also I’d love to get feedback and ideas from people with more sysadmin background, or who know Ubuntu/Debian tricks. For instance, I’d like to handle some of the questions packages ask about on installation (right now they are all left as defaults, not always the right answer). I imagine there’s some non-interactive way to handle those questions but I haven’t been able to find it.

2010 04 21

Python
Silver Lining
Web

Comments (5)

Permalink

Core Competencies, Silver Lining, Packaging

I’ve been leaning heavily on Ubuntu and Debian packages for Silver Lining. Lots of "configuration management" problems are easy when you rely on the system packages… not for any magical reason, but because the package maintainers have done the configuration management for me. This includes dependencies, but also things like starting up services in the right order, and of course upgrades and security fixes. For this reason I really want everything that isn’t part of Silver Lining’s core to be in the form of a package.

But then, why isn’t everything a package? Arguably some pieces really should be, I’m just too lazy and developing those pieces too quickly to do it. But more specifically, why aren’t applications packages? It’s not the complexity really — the tool could encapsulate all the complexity of packaging if I chose. I always knew intuitively it didn’t make sense, but it took me a while to decide quite why.

There are a lot of specific reasons, but the overriding reason is that I don’t want to outsource a core function. Silver Lining isn’t a tool to install database servers. That is something it does, but that’s not its purpose, and so it can install a server with apt-get install mysql-server-5.1. In doing so it’s saving a lot of work, it’s building on the knowledge of people more interested in MySQL than I, but it’s also deferring a lot of control. When it comes to actually deploying applications I’m not willing to have Silver Lining defer to another system, because deploying applications is the entire point of the tool.

There are many specific reasons. I want multiple instances of an application deployed simultaneously. I can optimize the actual code delivery (using rsync) instead of delivering an entire bundle of code. The setup is specific to the environment and configuration I’ve set up on servers, it’s not a generic package that makes sense on a generic Ubuntu system. I don’t want any central repository of packages or a single place where releases have to go through. I want to allow for the possibility of multiple versions of an application running simultaneously. I’m coordinating services even as I deploy the application, something which Debian packages try to do a little, but don’t do consistently or well. But while there’s a list of reasons, it doesn’t matter that much — there’s no particular size of list that scared me off, and if I’m misinformed about the way packages work or if there are techniques to avoid these problems it doesn’t really matter to me… the real reason is that I don’t want to defer control over the one thing that Silver Lining must do well.

2010 04 09

Packaging
Silver Lining

Comments Off

Permalink

The Web Server Benchmarking We Need

Another WSGI web server benchmark was published. It’s a decent benchmark, despite some criticisms. But it benchmarks what everyone benchmarks: serving up a trivial app really really quickly. This is not very useful to me. Also, performance is not to me the most important differentiation of servers.

In Silver Lining we’re using mod_wsgi. Silver Lining isn’t tied to mod_wsgi (applications can’t really tell), and we may revisit that decision (mostly because of memory concerns), but it is a deliberate choice. mod_wsgi is one of the few multiprocess WSGI servers, and it manages its children (the same way Apache manages all its children). So if a child stops responding, it gets taken out of the pool and killed (brutal efficiency! Or at least brutal terminology). Child processes are also recycled, guarding against memory leaks or other peculiarities. Sometimes these kinds of things are dismissed for covering up bugs, but (a) production is a lousy time to learn about bugs, (b) it’s like a third tier of garbage collection, and (c) the bugs you are avoiding are often bugs you can’t fix anyway (for instance, if your mysql driver leaks memory, is that the application developer’s fault?)

I wish there was competition among servers not to see who can tweak their performance for entirely unrealistic situations, but to see who can implement the most fail-safe server. We’re missing good benchmarks. Unfortunately benchmarks are a pain in the butt to write and manage.

But I hope someone writes a benchmark like that. Here’s some things I’d like to see benchmarked:

A "realistic" CPU-bound application. for i in xrange(10000000): pass is a reasonable start.
An application that generates big responses, e.g., "x"*100000.
An I/O bound application. E.g., one that reads a big file.
A simply slow application (time.sleep(1)).
Applications that wedge. while 1: pass perhaps? Or lock = threading.Lock(); lock.acquire(); lock.acquire(). Wedging in C and wedging in Python are different, so a bunch of different kinds of wedging.
Applications that segfault. ctypes is specially designed for this.
Applications that leak memory like a sieve, e.g., global_var.extend(['x']*10000).
Large uploads.
Slow uploads, like a client that takes 30 seconds to upload 1Mb.
Also slow downloads.
In each case it is interesting what happens when something bad happens to just a portion of requests. E.g., if 1% of requests wedge hard. A good container will serve the other 99% of requests properly. A bad container will have its worker pool exhausted and completely stop.
Mixing and matching these could be interesting. For instance Dave Beazley found some bad GIL results mixing I/O and CPU-bound code.
Add ideas in the comments and I’ll copy them into this list.

The hardest part of writing this is not the applications (they are simple). One annoyance is wiring up the applications, but handily Nicholas covers that well in his benchmark. You also have to make sure to clean up, as many servers will not exit cleanly from some of the tests. Another nuisance is that some of these require funny clients. These aren’t too hard to write, but you can’t just use ab. Then you have to report.

Anyway: I would love it if someone did this, and packaged it as repeatable/runnable code/scripts. I’ll help some, but I can’t lead. I’d both really like to see the results, and in my ideal world people writing servers would start using these benchmarks to make their servers more robust.

2010 03 16

Programming
Python
Silver Lining
Web

Comments (23)

Permalink

Configuration management: push vs. pull

Since I’ve been thinking about deployment I’ve been thinking a lot more about what "configuration management" means, how it should work, what it should do.

I guess my quick summary of configuration management is that it is setting up a server correctly. "Correct" is an ambiguous term, but given that there are so many to configuration management the solutions are also ambiguous.

Silver Lining includes configuration management of a sort. It is very simple. Right now it is simply a bunch of files to rsync over, and one shell script (you can see the files here and the script here — at least until I move them and those links start 404ing). Also each "service" (e.g., a database) has a simple setup script. I’m sure this system will become more complicated over time, but it’s really simple right now, and I like that.

The other system I’ve been asked about the most about lately is Puppet. Puppet is a real configuration management system. The driving forces are very different: I’m just trying to get a system set up that is in all ways acceptable for web application deployment. I want one system set up for one kind of task; I am completely focused on that end, and I care about means only insofar as I don’t want to get distracted by those means. Puppet is for people who care about the means, not just the ends. People who want things to work in a particular way; I only care that they work.

That’s the big difference between Puppet and Silver Lining. The smaller difference (that I want to talk about) is "push" vs. "pull". Grig wrote up some notes on two approaches. Silver Lining uses a "push" system (though calling it a "system" is kind of overselling what it does) while Puppet is "pull". Basically Silver Lining runs these commands (from your personal computer):

$ rsync -r <silverlining>/server-files/serverroot/ root@server:/
$ ssh root@server "$(cat <silverlining>/server-files/update-server-script.sh)"

This is what happens when you run silver setup-node server: it pushes a bunch of files over to the server, and runs a shell script. If you update either of the files or the shell script you run silver setup-node again to update the server. This is "push" because everything is initiated by the "master" (in this case, the developer’s personal computer).

Puppet uses a pull model. In this model there is a daemon running on every machine, and these machines call in to the master to see if there’s any new instructions for them. If there are, the daemon applies those instructions to the machine it is running on.

Grig identifies two big advantages to this pull model:

When a new server comes up it can get instructions from the master and start doing things. You can’t push instructions to a server that isn’t there, and the server itself is most aware of when it is ready to do stuff.
If a lot of servers come up, they can all do the setup work on their own, they only have to ask the master what to do.

But… I don’t buy either justification.

First: servers don’t just do things when they start up. To get this to work you have to create custom images with Puppet installed, and configured to know where the master is, and either the image or the master needs some indication of what kind of server you intended to create. All this is to avoid polling a server to see when it comes online. Polling a server is lame (and is the best Silver Lining can do right now), but avoiding polling can be done with something a lot simpler than a complete change from push to pull.

Second: there’s nothing unscalable about push. Look at those commands: one rsync and one ssh. The first is pretty darn cheap, and the second is cheap on the master and expensive on the remote machine (since it is doing things like installing stuff). You need to do it on lots of machines? Then fork a bunch of processes to run those two commands. This is not complicated stuff.

It is possible to write a push system that is hard to scale, if the master is doing lots of work. But just don’t do that. Upload your setup code to the remote server/slave and run it there. Problem fixed!

What are the advantages of push?

Easy to bootstrap. A bare server can be setup with push, no customization needed. Any customization is another kind of configuration, and configuration should be automated, and… well, this is why it’s a bootstrap problem.
Errors are synchronous: if your setup code doesn’t work, your push system will get the error back, you don’t need some fancy monitor and you don’t need to check any logs. Weird behavior is also synchronous; can’t tell why servers are doing something? Run the commands and watch the output.
Development is sensible: if you have a change to your setup scripts, you can try it out from your machine. You don’t need to do anything exceptional, your machine doesn’t have to accept connections from the slave, you don’t need special instructions to keep the slave from setting itself up as a production machine, there’s no daemon that might need modifications… none of that. You change the code, you run it, it works.
It’s just so damn simple. If you don’t start thinking about push and pull and other design choices, it simply becomes: do the obvious and easy thing.

In conclusion: push is sensible, pull is needless complexity.

2010 03 10

Programming
Silver Lining

Comments (19)

Permalink

toppcloud renamed to Silver Lining

After some pondering at PyCon, I decided on a new name for toppcloud: Silver Lining. I’ll credit a mysterious commenter "david" with the name idea. The command line is simply silver — silver update has a nice ring to it.

There’s a new site: cloudsilverlining.org; not notably different than the old site, just a new name. The product is self-hosting now, using a simple app that runs after every commit to regenerate the docs, and with a small extension to Silver Lining itself (to make it easier to host static files). Now that it has a real name I also gave it a real mailing list.

Silver Lining also has its first test. Not an impressive test, but a test. I’m hoping with a VM-based libcloud backend that a full integration test can run in a reasonable amount of time. Some unit tests would be possible, but so far most of the bugs have been interaction bugs so I think integration tests will have to pull most of the weight. (A continuous integration rig will be very useful; I am not sure if Silver Lining can self-host that, though it’d be nice/clever if it could.)

2010 03 03

Packaging
Programming
Python
Silver Lining
Web

Comments (7)

Permalink

Why toppcloud (Silver Lining) will not be agnostic

I haven’t received a great deal of specific feedback on toppcloud (update: renamed Silver Lining), only a few people (Ben Bangert, Jorge Vargas) seem to have really dived deeply into it. But — and this is not unexpected — I have already gotten several requests about making it more agnostic with respect to… stuff. Maybe that it not (at least forever) require Ubuntu. Or maybe that it should support different process models (e.g., threaded and multiple processes). Or other versions of Python.

The more I think about it, and the more I work with the tool, the more confident I am that toppcloud should not be agnostic on these issues. This is not so much about an "opinionated" tool; toppcloud is not actually very opinionated. It’s about a well-understood system.

For instance, Ben noticed a problem recently with weird import errors. I don’t know quite why mod_wsgi has this particular problem (when other WSGI servers that I’ve used haven’t), but the fix isn’t that hard. So Ben committed a fix and the problem went away.

Personally I think this is a bug with mod_wsgi. Maybe it’s also a Python bug. But it doesn’t really matter. When a bug exists it "belongs" to everyone who encounters it.

toppcloud is not intended to be a transparent system. When it’s working correctly, you should be able to ignore most of the system and concentrate on the relatively simple abstractions given to your application. So if the configuration reveals this particular bug in Python/mod_wsgi, then the bug is essentially a toppcloud bug, and toppcloud should (and can) fix it.

A more flexible system can ignore such problems as being "somewhere else" in the system. Or, if you don’t define these problems as someone else’s problem, then a more flexible system is essentially always broken somewhere; there is always some untested combination, some new component, or some old component that might get pushed into the mix. Fixes for one person’s problem may introduce a new problem in someone else’s stack. Some fixes aren’t even clear. toppcloud has Varnish in place, so it’s quite clear where a fix related to Django and Varnish configuration goes. If these were each components developed by different people at different times (like with buildout recipes) then fixing something like this could get complicated.

So I feel very resolved: toppcloud will hardcode everything it possibly can. Python 2.6 and only 2.6! (Until 2.7, but then only 2.7!). Only Varnish/Apache/mod_wsgi. I haven’t figured out threads/processes exactly, but once I do, there will be only one way! And if I get it wrong, then everyone (everyone) will have to switch when it is corrected! Because I’d much rather have a system that is inflexible than one that doesn’t work. With a clear and solid design I think it is feasible to get this to work, and that is no small feat.

Relatedly, I think I’m going to change the name of toppcloud, so ideas are welcome!

2010 02 10

Programming
Python
Silver Lining
Web

Comments (23)

Permalink

toppcloud (Silver Lining) and Django

I wrote up instructions on using toppcloud (update: renamed Silver Lining) with Django. They are up on the site (where they will be updated in the future), but I’ll drop them here too…

Creating a Layout

First thing you have to do (after installing toppcloud of course) is create an environment for your new application. Do that like:

$ toppcloud init sampleapp

This creates a directory sampleapp/ with a basic layout. The first thing we’ll do is set up version control for our project. For the sake of documentation, imagine you go to bitbucket and create two new repositories, one called sampleapp and another called sampleapp-lib (and for the examples we’ll use the username USER).

We’ll go into our new environment and use these:

$ cd sampleapp
$ hg clone http://bitbucket.org/USER/sampleapp src/sampleapp
$ rm -r lib/python/
$ hg clone http://bitbucket.org/USER/sampleapp-lib lib/python
$ mkdir lib/python/bin/
$ echo "syntax: glob
bin/python*
bin/activate
bin/activate_this.py
bin/pip
bin/easy_install*
" > lib/python/.hgignore
$ mv bin/* lib/python/bin/
$ rmdir bin/
$ ln -s lib/python/bin bin

Now there is a basic layout setup, with all your libraries going into the sampleapp-lib repository, and your main application in the sampleapp repository.

Next we’ll install Django:

$ source bin/activate
$ pip install Django

Then we’ll set up a standard Django site:

$ cd src/sampleapp
$ django-admin.py sampleapp

Also we’d like to be able to import this file. It’d be nice if there was a setup.py file, and we could run pip -e src/sampleapp, but django-admin.py doesn’t create that itself. Instead we’ll get that on the import path more manually with a .pth file:

$ echo "../../src/sampleapp" > lib/python/sampleapp.pth

Also there’s the tricky $DJANGO_SETTINGS_MODULE that you might have had problems with before. We’ll use the file lib/python/toppcustomize.py (which is imported everytime Python is started) to make sure that is always set:

$ echo "import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'sampleapp.settings'
" > lib/python/toppcustomize.py

Also we have a file src/sampleapp/sampleapp/manage.py, and that file doesn’t work quite how we’d like. Instead we’ll put a file into bin/manage.py that does the same thing:

$ rm sampleapp/manage.py
$ cd ../..
$ echo '#!/usr/bin/env python
from django.core.management import execute_manager
from sampleapp import settings
if __name__ == "__main__":
execute_manager(settings)
' > bin/manage.py
$ chmod +x bin/manage.py

Now, if you were just using plain Django you’d do something like run python manage.py runserver. But we’ll be using toppcloud serve instead, which means we have to set up the two other files toppcloud needs: app.ini and the runner. Here’s a simple app.ini:

$ echo '[production]
app_name = sampleapp
version = 1
runner = src/sampleapp/toppcloud-runner.py
' > src/sampleapp/toppcloud-app.ini
$ rm app.ini
$ ln -s src/sampleapp/toppcloud-app.ini app.ini

The file must be in the "root" of your application, and named app.ini, but it’s good to keep it in version control, so we set it up with a symlink.

It also refers to a "runner", which is the Python file that loads up the WSGI application. This looks about the same for any Django application, and we’ll put it in src/sampleapp/toppcloud-runner.py:

$ echo 'import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
' > src/sampleapp/toppcloud-runner.py

Now if you want to run the application, you can:

$ toppcloud serve .

This will load it up on http://localhost:8080, and serve up a boring page. To do something interesting we’ll want to use a database.

Setting Up A Database

At the moment the only good database to use is PostgreSQL with the PostGIS extensions. Add this line to app.ini:

service.postgis =

This makes the database "available" to the application. For development you still have to set it up yourself. You should create a database sampleapp on your computer.

Next, we’ll need to change settings.py to use the new database configuration. Here’s the lines that you’ll see:

DATABASE_ENGINE = '' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = '' # Or path to database file if using sqlite3.
DATABASE_USER = '' # Not used with sqlite3.
DATABASE_PASSWORD = '' # Not used with sqlite3.
DATABASE_HOST = '' # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = '' # Set to empty string for default. Not used with sqlite3.

First add this to the top of the file:

import os

Then you’ll change those lines to:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = os.environ['CONFIG_PG_DBNAME']
DATABASE_USER = os.environ['CONFIG_PG_USER']
DATABASE_PASSWORD = os.environ['CONFIG_PG_PASSWORD']
DATABASE_HOST = os.environ['CONFIG_PG_HOST']
DATABASE_PORT = ''

Now we can create all the default tables:

$ manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
...

Now we have an empty project that doesn’t do anything. Let’s make it do a little something (this is all really based on the Django tutorial).

$ manage.py startapp polls

Django magically knows to put the code in src/sampleapp/sampleapp/polls/ — we’ll setup the model in src/sampleapp/sampleapp/polls/models.py:

from django.db import models

class Poll(models.Model):
question = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published')

class Choice(models.Model):
poll = models.ForeignKey(Poll)
choice = models.CharField(max_length=200)
votes = models.IntegerField()

And activate the application by adding 'sampleapp.polls' to INSTALLED_APPS in src/sampleapp/sampleapp/settings.py. Also add 'django.contrib.admin' to get the admin app in place. Run manage.py syncdb to get the tables in place.

You can try toppcloud serve . and go to /admin/ to login and see your tables. You might notice all the CSS is broken.

toppcloud serves static files out of the static/ directory. You don’t actually put static in the URLs, these files are available at the top-level (unless you create a static/static/ directory). The best way to put files in there is generally symbolic links.

For Django admin, do this:

$ cd static
$ ln -s ../lib/python/django/contrib/admin/media admin-media

Now edit src/sampleapp/sampleapp/settings.py and change ADMIN_MEDIA_PREFIX to '/admin-media'.

(Probably some other links should be added.)

One last little thing you might want to do; replace this line in
settings:

SECRET_KEY = 'ASF#@$@#JFAS#@'

With this:

from tcsupport.secret import get_secret
SECRET_KEY = get_secret()

Then you don’t have to worry about checking a secret into version control.

You still don’t really have an application, but the rest is mere "programming" so have at it!

2010 02 05

Programming
Python
Silver Lining
Web

Comments (2)

Permalink

A new way to deploy web applications

Deployment is one of the things I like least about development, and yet without deployment the development doesn’t really matter.

I’ve tried a few things (e.g. fassembler), built a few things (virtualenv, pip), but deployment just sucked less as a result. Then I got excited about App Engine; everyone else was getting excited about "scaling", but really I was excited about an accessible deployment process. When it comes to deployment App Engine is the first thing that has really felt good to me.

But I can’t actually use App Engine. I was able to come to terms with the idea of writing an application to the platform, but there are limits… and with App Engine there were simply too many limits. Geo stuff on App Engine is at best a crippled hack, I miss lxml terribly, I never hated relational databases, and almost nothing large works without some degree of rewriting. Sometimes you can work around it, but you can never be sure you won’t hit some wall later. And frankly working around the platform is tiring and not very rewarding.

So… App Engine seemed neat, but I couldn’t use it, and deployment was still a problem.

What I like about App Engine: an application is just files. There’s no build process, no fancy copying of things in weird locations, nothing like that; you upload files, and uploading files just works. Also, you can check everything into version control. Not just your application code, but every library you use, the exact files that you installed. I really wanted a system like that.

At the same time, I started looking into "the cloud". It took me a while to get a handle on what "cloud computing" really means. What I learned: don’t overthink it. It’s not magic. It’s just virtual private servers that can be requisitioned automatically via an API, and are billed on a short time cycle. You can expand or change the definition a bit, but this definition is the one that matters to me. (I’ve also realized that I cannot get excited about complicated solutions; only once I realized how simple cloud computing is could I really get excited about the idea.)

Given the modest functionality of cloud computing, why does it matter? Because with a cloud computing system you can actually test the full deployment stack. You can create a brand-new server, identical to all servers you will create in the future; you can set this server up; you can deploy to it. You get it wrong, you throw away that virtual server and start over from the beginning, fixing things until you get it right. Billing is important here too; with hourly billing you pay cents for these tests, and you don’t need a pool of ready servers because the cloud service basically manages that pool of ready servers for you.

Without "cloud computing" we each too easily find ourselves in a situation where deployments are ad hoc, server installations develop over time, and servers and applications are inconsistent in their configuration. Cloud computing makes servers disposable, which means we can treat them in consistent ways, testing our work as we go. It makes it easy to treat operations with the same discipline as software.

Given the idea from App Engine, and the easy-to-use infrastructure of a cloud service, I started to script together something to manage the servers and start up applications. I didn’t know what exactly I wanted to do to start, and I’m not completely sure where I’m going with this. But on the whole this feels pretty right. So I present the provisionally-named: toppcloud (Update: this has been renamed Silver Cloud).

How it works: first you have a directory of files that defines your application. This probably includes a checkout of your "application" (let’s say in src/mynewapp/), and I find it also useful to use source control on the libraries (which are put in lib/python/). There’s a file in app.ini that defines some details of the application (very similar to app.yaml).

While app.ini is a (very minimal) description of the application, there is no description of the environment. You do not specify database connection details, for instance. Instead your application requests access to a database service. For instance, one of these services is a PostgreSQL/PostGIS database (which you get if you put service.postgis in your app.ini file). If you ask for that then there will be evironmental variables, CONFIG_PG_DBNAME etc., that will tell your application how to connect to the database. (For local development you can provide your own configuration, based on how you have PostgreSQL or some other service installed.)

The standard setup is also a virtualenv environment. It is setup so every time you start that virtualenv environment you’ll get those configuration-related environmental variables. This means your application configuration is always present, your services always available. It’s available in tests just like it is during a request. Django accomplishes something similar with the (much maligned) $DJANGO_SETTINGS_MODULE but toppcloud builds it into the virtualenv environment instead of the shell environment.

And how is the server setup? Much like with App Engine that is merely an implementation detail. Unlike App Engine that’s an implementation detail you can actually look at and change (by changing toppcloud), but it’s not something you are supposed to concern yourself with during regular application development.

The basic lifecycle using toppcloud looks like:

toppcloud create-node: Create a new virtual server; you can create any kind of supported server, but only Ubuntu Jaunty or Karmic are supported (and Jaunty should probably be dropped). This step is where the "cloud" part actually ends. If you want to install a bare Ubuntu onto an existing physical machine that’s fine too — after toppcloud create-node the "cloud" part of the process is pretty much done. Just don’t go using some old Ubuntu install; this tool is for clean systems that are used only for toppcloud.
toppcloud setup-node: Take that bare Ubuntu server and set it up (or update it) for use with toppcloud. This installs all the basic standard stuff (things like Apache, mod_wsgi, Varnish) and some management script that toppcloud runs. This is written to be safe to run over and over, so upgrading and setting up a machine are the same. It needs to be a bare server, but
toppcloud init path/to/app/: Setup a basic virtualenv environment with some toppcloud customizations.
toppcloud serve path/to/app: Serve up the application locally.
toppcloud update --host=test.example.com path/to/app/: This creates or updates an application at the given host. It edits /etc/hosts so that the domain is locally viewable.
toppcloud run test.example.com script.py: Run a script (from bin/) on a remote server. This allows you to run things like django-admin.py syncdb.

There’s a few other things — stuff to manage the servers and change around hostnames or the active version of applications. It’s growing to fit a variety of workflows, but I don’t think its growth is unbounded.

So… this is what toppcloud. From the outside it doen’t do a lot. From the inside it’s not actually that complicated either. I’ve included a lot of constraints in the tool but I think it offers an excellent balance. The constraints are workable for applications (insignificant for many applications), while still exposing a simple and consistent system that’s easier to reason about than a big-ball-of-server.

Some of the constraints:

Binary packages are supported via Ubuntu packages; you only upload portable files. If you need a library like lxml, you need to request that package (python-lxml) to be installed in your app.ini. If you need a version of a binary library that is not yet packaged, I think creating a new deb is reasonable.
There is no Linux distribution abstraction, but I don’t care.
There is no option for the way your application is run — there’s one way applications are run, because I believe there is a best practice. I might have gotten the best practice wrong, but that should be resolved inside toppcloud, not inside applications. Is Varnish a terrible cache? Probably not, but if it is we should all be able to agree on that and replace it. If there are genuinely different needs then maybe additional application or deployment configuration will be called for — but we shouldn’t add configuration just because someone says there is a better practice (and a better practice that is not universally better); there must be justifications.
By abstracting out services and persistence some additional code is required for each such service, and that code is centralized in toppcloud, but it means we can also start to add consistent tools usable across a wide set of applications and backends.
All file paths have to be relative, because files get moved around. I know of some particularly problematic files (e.g., .pth files), and toppcloud fixes these automatically. Mostly this isn’t so hard to do.

These particular compromises are ones I have not seen in many systems (and I’ve started to look more). App Engine I think goes too far with its constraints. Heroku is close, but closed source.

This is different than a strict everything-must-be-a-package strategy. This deployment system is light and simple and takes into account reasonable web development workflows. The pieces of an application that move around a lot are all well-greased and agile. The parts of an application that are better to Do Right And Then Leave Alone (like Apache configuration) are static.

Unlike generalized systems like buildout this system avoids "building" entirely, making deployment a simpler and lower risk action, leaning on system packages for the things they do best. Other open source tools emphasize a greater degree of flexibility than I think is necessary, allowing people to encode exploratory service integration into what appears to be an encapsulated build (I’m looking at you buildout).

Unlike requirement sets and packaging and versioning libraries, this makes all the Python libraries (typically the most volatile libraries) explicit and controlled, and can better ensure that small updates really are small. It doesn’t invalidate installers and versioning, but it makes that process even more explicit and encourages greater thoughtfulness.

Unlike many production-oriented systems (what I’ve seen in a lot of "cloud" tools) this encorporates both the development environment and production environment; but unlike some developer-oriented systems this does not try to normalize everyone’s environment and instead relies on developers to set up their systems however is appropriate. And unlike platform-neutral systems this can ensure an amount of reliability and predictability through extremely hard requirements (it is deployed on Ubuntu Jaunty/Karmic only).

But it’s not all constraints. Toppcloud is solidly web framework neutral. It’s even slightly language neutral. Though it does require support code for each persistence technique, it is fairly easy to do, and there are no requirements for "scalable systems"; I think unscalable systems are a perfectly reasonable implementation choice for many problems. I believe a more scalable system could be built on this, but as a deployment and development option, not a starting requirement.

So far I’ve done some deployments using toppcloud; not a lot, but some. And I can say that it feels really good; lots of rough edges still, but the core concept feels really right. I’ve made a lot of sideways attacks on deployment, and a few direct attacks… sometimes I write things that I think are useful, and sometimes I write things that I think are right. Toppcloud is at the moment maybe more right than useful. But I genuinely believe this is (in theory) a universally appropriate deployment tool.

Alright, so now you think maybe you should look more at toppcloud…

Well, I can offer you a fair amount of documentation. A lot of that documentation refers to design, and a bit of it to examples. There’s also a couple projects you can look at; they are all small, but :

Frank (will be interactivesomerville.org) which is another similar Django/Pinax project (Pinax was a bit tricky). This is probably the largest project. It’s a Django/Pinax volunteer-written application for collecting community feedback the Boston Greenline project, if that sounds interesting to you might want to chip in on the development (if so check out the wiki).
Neighborly, with minimal functionality (we need to run more sprints) but an installation story.
bbdocs which is a very simple bitbucket document generator, that makes the toppcloud site.
geodns which is another simple no-framework PostGIS project.

Now, the letdown. One thing I cannot offer you is support. THERE IS NO SUPPORT. I cannot now, and I might never really be able to support this tool. This tool is appropriate for collaborators, for people who like the idea and are ready to build on it. If it grows well I hope that it can grow a community, I hope people can support each other. I’d like to help that happen. But I can’t do that by bootstrapping it through unending support, because I’m not good at it and I’m not consistent and it’s unrealistic and unsustainable. This is not a open source dead drop. But it’s also not My Future; I’m not going to build a company around it, and I’m not going to use all my free time supporting it. It’s a tool I want to exist. I very much want it to exist. But even very much wanting something is not the same as being an undying champion, and I am not an undying champion. If you want to tell me what my process should be, please do!

If you want to see me get philosophical about packaging and deployment and other stuff like that, see my upcoming talk at PyCon.

2010 01 29

Packaging
Programming
Python
Silver Lining
Web

Comments (13)

Permalink

Ian Bicking: a blog

Silver Lining

Python Application Package

Details?

Support Library

Local Development

2012 02 29

Silver Lining: More People!

2010 04 21

Core Competencies, Silver Lining, Packaging

2010 04 09

The Web Server Benchmarking We Need

2010 03 16

Configuration management: push vs. pull

2010 03 10

toppcloud renamed to Silver Lining

2010 03 03

Why toppcloud (Silver Lining) will not be agnostic

2010 02 10

toppcloud (Silver Lining) and Django

Creating a Layout

Setting Up A Database

2010 02 05

A new way to deploy web applications

2010 01 29

Home

About

Archives

Categories

Recent Posts

Recent Comments