Ian Bicking: a blog :: 2010

The Web Server Benchmarking We Need

Another WSGI web server benchmark was published. It’s a decent benchmark, despite some criticisms. But it benchmarks what everyone benchmarks: serving up a trivial app really really quickly. This is not very useful to me. Also, performance is not to me the most important differentiation of servers.

In Silver Lining we’re using mod_wsgi. Silver Lining isn’t tied to mod_wsgi (applications can’t really tell), and we may revisit that decision (mostly because of memory concerns), but it is a deliberate choice. mod_wsgi is one of the few multiprocess WSGI servers, and it manages its children (the same way Apache manages all its children). So if a child stops responding, it gets taken out of the pool and killed (brutal efficiency! Or at least brutal terminology). Child processes are also recycled, guarding against memory leaks or other peculiarities. Sometimes these kinds of things are dismissed for covering up bugs, but (a) production is a lousy time to learn about bugs, (b) it’s like a third tier of garbage collection, and (c) the bugs you are avoiding are often bugs you can’t fix anyway (for instance, if your mysql driver leaks memory, is that the application developer’s fault?)

I wish there was competition among servers not to see who can tweak their performance for entirely unrealistic situations, but to see who can implement the most fail-safe server. We’re missing good benchmarks. Unfortunately benchmarks are a pain in the butt to write and manage.

But I hope someone writes a benchmark like that. Here’s some things I’d like to see benchmarked:

A "realistic" CPU-bound application. for i in xrange(10000000): pass is a reasonable start.
An application that generates big responses, e.g., "x"*100000.
An I/O bound application. E.g., one that reads a big file.
A simply slow application (time.sleep(1)).
Applications that wedge. while 1: pass perhaps? Or lock = threading.Lock(); lock.acquire(); lock.acquire(). Wedging in C and wedging in Python are different, so a bunch of different kinds of wedging.
Applications that segfault. ctypes is specially designed for this.
Applications that leak memory like a sieve, e.g., global_var.extend(['x']*10000).
Large uploads.
Slow uploads, like a client that takes 30 seconds to upload 1Mb.
Also slow downloads.
In each case it is interesting what happens when something bad happens to just a portion of requests. E.g., if 1% of requests wedge hard. A good container will serve the other 99% of requests properly. A bad container will have its worker pool exhausted and completely stop.
Mixing and matching these could be interesting. For instance Dave Beazley found some bad GIL results mixing I/O and CPU-bound code.
Add ideas in the comments and I’ll copy them into this list.

The hardest part of writing this is not the applications (they are simple). One annoyance is wiring up the applications, but handily Nicholas covers that well in his benchmark. You also have to make sure to clean up, as many servers will not exit cleanly from some of the tests. Another nuisance is that some of these require funny clients. These aren’t too hard to write, but you can’t just use ab. Then you have to report.

Anyway: I would love it if someone did this, and packaged it as repeatable/runnable code/scripts. I’ll help some, but I can’t lead. I’d both really like to see the results, and in my ideal world people writing servers would start using these benchmarks to make their servers more robust.

2010 03 16

Programming
Python
Silver Lining
Web

Comments (23)

Permalink

What Does A WebOb App Look Like?

Lately I’ve been writing code using WebOb and just a few other small libraries. It’s not entirely obvious what this looks like, so I thought I’d give a simple example.

I make each application a class. Instances of the class are "configured applications". So it looks a little like this (for an application that takes one configuration parameter, file_path):

class Application(object):
def __init__(self, file_path):
self.file_path = file_path

Then the app needs to be a WSGI app, because that’s how I roll. I use webob.dec:

from webob.dec import wsgify
from webob import exc
from webob import Response

class Application(object):
def __init__(self, file_path):
self.file_path = file_path
@wsgify
def __call__(self, req):
return Response('Hi!')

Somewhere separate from the application you actually instantiate Application. You can use Paste Deploy for that, configure it yourself, or just do something ad hoc (a lot of mod_wsgi .wsgi files are like this, basically).

I use webob.exc for things like exc.HTTPNotFound(). You can raise that as an exception, but I mostly just return the object (to the same effect).

Now you have Hello World. I then sometimes do something terrible, I start handling URLs like this:

@wsgify
def __call__(self, req):
if req.path_info == '/':
return self.index(req)
elif req.path_info.startswith('/view/'):
return self.view(req)
return exc.HTTPNotFound()

This is lazy and a very bad idea. So you want a dispatcher. There are several (e.g., selector). I’ll use Routes here… the latest release makes it a bit easier (though it could still be streamlined a bit). Here’s a pattern I think makes sense:

from routes import Mapper

class Application(object):
map = Mapper()
map.connect('index', '/', method='index')
map.connect('view', '/view/{item}', method='view')

def __init__(self, file_path):
self.file_path = file_path

@wsgify
def __call__(self, req):
results = self.map.routematch(environ=req.environ)
if not results:
return exc.HTTPNotFound()
match, route = results
link = URLGenerator(self.map, req.environ)
req.urlvars = ((), match)
kwargs = match.copy()
method = kwargs.pop('method')
req.link = link
return getattr(self, method)(req, **kwargs)

def index(self, req):
...
def view(self, req, item):
...

Another way you might do it is to skip the class, which means skipping a clear place for configuration. I don’t like that, but if you don’t care about that, then it looks like this:

def index(self, req):
...
def view(self, req, item):
...

map = Mapper()
map.connect('index', '/', view=index)
map.connect('view', '/view/{item}', view=view)

@wsgify
def application(req):
results = map.routematch(environ=req.environ)
if not results:
return exc.HTTPNotFound()
match, route = results
link = URLGenerator(map, req.environ)
req.urlvars = ((), match)
kwargs = match.copy()
view = kwargs.pop('view')
req.link = link
return view(req, **kwargs)

Then application is pretty much boilerplate. You could put configuration in the request if you wanted, or use some other technique (like Contextual).

I talked some with Ben Bangert about what he’s trying with these patterns, and he’s doing something reminiscent of Pylons controllers (but without the rest of Pylons) and it looks more like this (with my own adaptations):

class BaseController(object):
special_vars = ['controller', 'action']

def __init__(self, request, link, **config):
self.request = request
self.link = link
for name, value in config.items():
setattr(self, name, value)

def __call__(self):
action = self.request.urlvars.get('action', 'index')
if hasattr(self, '__before__'):
self.__before__()
kwargs = req.urlsvars.copy()
for attr in self.special_vars
if attr in kwargs:
del kwargs[attr]
return getattr(self, action)(**kwargs)

class Index(BaseController):
def index(self):
...
def view(self, item):
...

class Application(object):
map = Mapper()
map.connect('index', '/', controller=Index)
map.connect('view', '/view/{item}', controller=Index, action='view')

def __init__(self, **config):
self.config = config

@wsgify
def __call__(self, req):
results = self.map.routematch(environ=req.environ)
if not results:
return exc.HTTPNotFound()
match, route = results
link = URLGenerator(self.map, req.environ)
req.urlvars = ((), match)
controller = match['controller'](req, link, **self.config)
return controller()

That’s a lot of code blocks, but they all really say the same thing ;) I think writing apps with almost-no-framework like this is pretty doable, so if you have something small you should give it a go. I think it’s especially appropriate for applications that are an API (not a "web site").

2010 03 12

Programming
Python
Web

Comments (30)

Permalink

Configuration management: push vs. pull

Since I’ve been thinking about deployment I’ve been thinking a lot more about what "configuration management" means, how it should work, what it should do.

I guess my quick summary of configuration management is that it is setting up a server correctly. "Correct" is an ambiguous term, but given that there are so many to configuration management the solutions are also ambiguous.

Silver Lining includes configuration management of a sort. It is very simple. Right now it is simply a bunch of files to rsync over, and one shell script (you can see the files here and the script here — at least until I move them and those links start 404ing). Also each "service" (e.g., a database) has a simple setup script. I’m sure this system will become more complicated over time, but it’s really simple right now, and I like that.

The other system I’ve been asked about the most about lately is Puppet. Puppet is a real configuration management system. The driving forces are very different: I’m just trying to get a system set up that is in all ways acceptable for web application deployment. I want one system set up for one kind of task; I am completely focused on that end, and I care about means only insofar as I don’t want to get distracted by those means. Puppet is for people who care about the means, not just the ends. People who want things to work in a particular way; I only care that they work.

That’s the big difference between Puppet and Silver Lining. The smaller difference (that I want to talk about) is "push" vs. "pull". Grig wrote up some notes on two approaches. Silver Lining uses a "push" system (though calling it a "system" is kind of overselling what it does) while Puppet is "pull". Basically Silver Lining runs these commands (from your personal computer):

$ rsync -r <silverlining>/server-files/serverroot/ root@server:/
$ ssh root@server "$(cat <silverlining>/server-files/update-server-script.sh)"

This is what happens when you run silver setup-node server: it pushes a bunch of files over to the server, and runs a shell script. If you update either of the files or the shell script you run silver setup-node again to update the server. This is "push" because everything is initiated by the "master" (in this case, the developer’s personal computer).

Puppet uses a pull model. In this model there is a daemon running on every machine, and these machines call in to the master to see if there’s any new instructions for them. If there are, the daemon applies those instructions to the machine it is running on.

Grig identifies two big advantages to this pull model:

When a new server comes up it can get instructions from the master and start doing things. You can’t push instructions to a server that isn’t there, and the server itself is most aware of when it is ready to do stuff.
If a lot of servers come up, they can all do the setup work on their own, they only have to ask the master what to do.

But… I don’t buy either justification.

First: servers don’t just do things when they start up. To get this to work you have to create custom images with Puppet installed, and configured to know where the master is, and either the image or the master needs some indication of what kind of server you intended to create. All this is to avoid polling a server to see when it comes online. Polling a server is lame (and is the best Silver Lining can do right now), but avoiding polling can be done with something a lot simpler than a complete change from push to pull.

Second: there’s nothing unscalable about push. Look at those commands: one rsync and one ssh. The first is pretty darn cheap, and the second is cheap on the master and expensive on the remote machine (since it is doing things like installing stuff). You need to do it on lots of machines? Then fork a bunch of processes to run those two commands. This is not complicated stuff.

It is possible to write a push system that is hard to scale, if the master is doing lots of work. But just don’t do that. Upload your setup code to the remote server/slave and run it there. Problem fixed!

What are the advantages of push?

Easy to bootstrap. A bare server can be setup with push, no customization needed. Any customization is another kind of configuration, and configuration should be automated, and… well, this is why it’s a bootstrap problem.
Errors are synchronous: if your setup code doesn’t work, your push system will get the error back, you don’t need some fancy monitor and you don’t need to check any logs. Weird behavior is also synchronous; can’t tell why servers are doing something? Run the commands and watch the output.
Development is sensible: if you have a change to your setup scripts, you can try it out from your machine. You don’t need to do anything exceptional, your machine doesn’t have to accept connections from the slave, you don’t need special instructions to keep the slave from setting itself up as a production machine, there’s no daemon that might need modifications… none of that. You change the code, you run it, it works.
It’s just so damn simple. If you don’t start thinking about push and pull and other design choices, it simply becomes: do the obvious and easy thing.

In conclusion: push is sensible, pull is needless complexity.

2010 03 10

Programming
Silver Lining

Comments (19)

Permalink

Joining Mozilla

As of last week, I am now an employee of Mozilla! Thanks to everyone who helped me out during my job search.

I’ll be working both with the Mozilla Web Development (webdev) team, and Mozilla Labs.

The first thing I’ll be working on is deployment. In part because I’ve been thinking about deployment lately, in part because streamlining deployment is just generally enabling of other work (and a personal itch to be scratched), and because I think there is the possibility to fit this work into Mozilla’s general mission, specifically Empowering people to do new and unanticipated things on the web. I think the way I’m approaching deployment has real potential to combine the discipline and benefits of good development practices with an accessible process that is more democratic and less professionalized. This is some of what PHP has provided over the years (and I think it’s been a genuinely positive influence on the web as a result); I’d like to see the same kind of easy entry using other platforms. I’m hoping Silver Lining will fit both Mozilla’s application deployment needs, as well as serving a general purpose.

Once I finish deployment and can move on (oh fuck what am I getting myself into) I’ll also be working with the web development group who has adopted Python for many of their new projects (e.g., Zamboni, a rewrite of the addons.mozilla.org site), and with Mozilla Labs on Weave or some of their other projects.

In addition my own Python open source work is in line with Mozilla’s mission and I will be able to continue spending time on those projects, as well as entirely new projects.

I’m pretty excited about this — it feels like there’s a really good match with Mozilla and what I’m good at, and what I care about, and how I care about it.

2010 03 10

Mozilla
Non-technical
Programming
Python

Comments (32)

Permalink

toppcloud renamed to Silver Lining

After some pondering at PyCon, I decided on a new name for toppcloud: Silver Lining. I’ll credit a mysterious commenter "david" with the name idea. The command line is simply silver — silver update has a nice ring to it.

There’s a new site: cloudsilverlining.org; not notably different than the old site, just a new name. The product is self-hosting now, using a simple app that runs after every commit to regenerate the docs, and with a small extension to Silver Lining itself (to make it easier to host static files). Now that it has a real name I also gave it a real mailing list.

Silver Lining also has its first test. Not an impressive test, but a test. I’m hoping with a VM-based libcloud backend that a full integration test can run in a reasonable amount of time. Some unit tests would be possible, but so far most of the bugs have been interaction bugs so I think integration tests will have to pull most of the weight. (A continuous integration rig will be very useful; I am not sure if Silver Lining can self-host that, though it’d be nice/clever if it could.)

2010 03 03

Packaging
Programming
Python
Silver Lining
Web

Comments (7)

Permalink

Throw out your frameworks! (forms included)

No, I should say forms particularly.

I have lots of things to blog about, but nothing makes me want to blog like code. Ideas are hard, code is easy. So when I saw Jacob’s writeup about dynamic Django form generation I felt a desire to respond. I didn’t see the form panel at PyCon (I intended to but I hardly saw any talks at PyCon, and yet still didn’t even see a good number of the people I wanted to see), but as the author of an ungenerator and as a general form library skeptic I have a somewhat different perspective on the topic.

The example created for the panel might display that perspective. You should go read Jacob’s description; but basically it’s a simple registration form with a dynamic set of questions to ask.

I have created a complete example, because I wanted to be sure I wasn’t skipping anything, but I’ll present a trimmed-down version.

First, the basic control logic:

from webob.dec import wsgify
from webob import exc
from formencode import htmlfill

@wsgify
def questioner(req):
questions = get_questions(req) # This is provided as part of the example
if req.method == 'POST':
errors = validate(req, questions)
if not errors:
... save response ...
return exc.HTTPFound(location='/thanks')
else:
errors = {}
## Here's the "form generation":
page = page_template.substitute(
action=req.url,
questions=questions)
page = htmlfill.render(
page,
defaults=req.POST,
errors=errors)
return Response(page)

def validate(req, questions):
# All manual, but do it however you want:
errors = {}
form = req.POST
if (form.get('password')
and form['password'] != form.get('password_confirm')):
errors['password_confirm'] = 'Passwords do not match'
fields = questions + ['username', 'password']
for field in fields:
if not form.get(field):
errors[field] = 'Please enter a value'
return errors

I’ve just manually handled validation here. I don’t feel like doing it with FormEncode. Manual validation isn’t that big a deal; FormEncode would just produce the same errors dictionary anyway. In this case (as in many form validation cases) you can’t do better than hand-written validation code: it’s shorter, more self-contained, and easier to tweak.

After validation the template is rendered:

page = page_template.substitute(
action=req.url,
questions=questions)

I’m using Tempita, but it really doesn’t matter. The template looks like this:

<form action="{{action}}" method="POST">
New Username: <input type="text" name="username"><br />
Password: <input type="password" name="password"><br />
Repeat Password:
<input type="password" name="password_confirm"><br />
{{for question in questions}}
{{question}}: <input type="text" name="{{question}}"><br />
{{endfor}}
<input type="submit">
</form>

Note that the only "logic" here is to render the form to include fields for all the questions. Obviously this produces an ugly form, but it’s very obvious how you make this form pretty, and how to tweak it in any way you might want. Also if you have deeper dynamicism (e.g., get_questions start returning the type of response required, or weird validation, or whatever) it’s very obvious where that change would go: display logic goes in the form, validation logic goes in that validate function.

This just gives you the raw form. You wouldn’t need a template at all if it wasn’t for the dynamicism. Everything else is added when the form is "filled":

page = htmlfill.render(
page,
defaults=req.POST,
errors=errors)

How exactly you want to calculate defaults is up to the application; you might want query string variables to be able to pre-fill the form (use req.params), you might want the form bare to start (like here with req.POST), you can easily implement wizards by stuffing req.POST into the session to repeat a form, you might read the defaults out of a user object to make this an edit form. And errors are just handled automatically, inserted into the HTML with appropriate CSS classes.

A great aspect of this pattern if you use it (I’m not even sure it deserves the moniker library): when HTML 5 Forms finally come around and we can all stop doing this stupid server-side overthought nonsense, you won’t have overthought your forms. Your mind will be free and ready to accept that the world has actually become simpler, not more complicated, and that there is knowledge worth forgetting (forms are so freakin’ stupid!) If at all possible, dodging complexity is far better than cleverly responding to complexity.

2010 03 01

HTML
Programming
Python
Web

Comments (27)

Permalink

Ian Bicking: a blog

March 2010

The Web Server Benchmarking We Need

2010 03 16

What Does A WebOb App Look Like?

2010 03 12

Configuration management: push vs. pull

2010 03 10

Joining Mozilla

2010 03 10

toppcloud renamed to Silver Lining

2010 03 03

Throw out your frameworks! (forms included)

2010 03 01

Home

About

Archives

Categories

Recent Posts

Recent Comments