Ian Bicking: a blog

toppcloud renamed to Silver Lining

After some pondering at PyCon, I decided on a new name for toppcloud: Silver Lining. I’ll credit a mysterious commenter "david" with the name idea. The command line is simply silver — silver update has a nice ring to it.

There’s a new site: cloudsilverlining.org; not notably different than the old site, just a new name. The product is self-hosting now, using a simple app that runs after every commit to regenerate the docs, and with a small extension to Silver Lining itself (to make it easier to host static files). Now that it has a real name I also gave it a real mailing list.

Silver Lining also has its first test. Not an impressive test, but a test. I’m hoping with a VM-based libcloud backend that a full integration test can run in a reasonable amount of time. Some unit tests would be possible, but so far most of the bugs have been interaction bugs so I think integration tests will have to pull most of the weight. (A continuous integration rig will be very useful; I am not sure if Silver Lining can self-host that, though it’d be nice/clever if it could.)

2010 03 03

Packaging
Programming
Python
Silver Lining
Web

Comments (7)

Permalink

Throw out your frameworks! (forms included)

No, I should say forms particularly.

I have lots of things to blog about, but nothing makes me want to blog like code. Ideas are hard, code is easy. So when I saw Jacob’s writeup about dynamic Django form generation I felt a desire to respond. I didn’t see the form panel at PyCon (I intended to but I hardly saw any talks at PyCon, and yet still didn’t even see a good number of the people I wanted to see), but as the author of an ungenerator and as a general form library skeptic I have a somewhat different perspective on the topic.

The example created for the panel might display that perspective. You should go read Jacob’s description; but basically it’s a simple registration form with a dynamic set of questions to ask.

I have created a complete example, because I wanted to be sure I wasn’t skipping anything, but I’ll present a trimmed-down version.

First, the basic control logic:

from webob.dec import wsgify
from webob import exc
from formencode import htmlfill

@wsgify
def questioner(req):
questions = get_questions(req) # This is provided as part of the example
if req.method == 'POST':
errors = validate(req, questions)
if not errors:
... save response ...
return exc.HTTPFound(location='/thanks')
else:
errors = {}
## Here's the "form generation":
page = page_template.substitute(
action=req.url,
questions=questions)
page = htmlfill.render(
page,
defaults=req.POST,
errors=errors)
return Response(page)

def validate(req, questions):
# All manual, but do it however you want:
errors = {}
form = req.POST
if (form.get('password')
and form['password'] != form.get('password_confirm')):
errors['password_confirm'] = 'Passwords do not match'
fields = questions + ['username', 'password']
for field in fields:
if not form.get(field):
errors[field] = 'Please enter a value'
return errors

I’ve just manually handled validation here. I don’t feel like doing it with FormEncode. Manual validation isn’t that big a deal; FormEncode would just produce the same errors dictionary anyway. In this case (as in many form validation cases) you can’t do better than hand-written validation code: it’s shorter, more self-contained, and easier to tweak.

After validation the template is rendered:

page = page_template.substitute(
action=req.url,
questions=questions)

I’m using Tempita, but it really doesn’t matter. The template looks like this:

<form action="{{action}}" method="POST">
New Username: <input type="text" name="username"><br />
Password: <input type="password" name="password"><br />
Repeat Password:
<input type="password" name="password_confirm"><br />
{{for question in questions}}
{{question}}: <input type="text" name="{{question}}"><br />
{{endfor}}
<input type="submit">
</form>

Note that the only "logic" here is to render the form to include fields for all the questions. Obviously this produces an ugly form, but it’s very obvious how you make this form pretty, and how to tweak it in any way you might want. Also if you have deeper dynamicism (e.g., get_questions start returning the type of response required, or weird validation, or whatever) it’s very obvious where that change would go: display logic goes in the form, validation logic goes in that validate function.

This just gives you the raw form. You wouldn’t need a template at all if it wasn’t for the dynamicism. Everything else is added when the form is "filled":

page = htmlfill.render(
page,
defaults=req.POST,
errors=errors)

How exactly you want to calculate defaults is up to the application; you might want query string variables to be able to pre-fill the form (use req.params), you might want the form bare to start (like here with req.POST), you can easily implement wizards by stuffing req.POST into the session to repeat a form, you might read the defaults out of a user object to make this an edit form. And errors are just handled automatically, inserted into the HTML with appropriate CSS classes.

A great aspect of this pattern if you use it (I’m not even sure it deserves the moniker library): when HTML 5 Forms finally come around and we can all stop doing this stupid server-side overthought nonsense, you won’t have overthought your forms. Your mind will be free and ready to accept that the world has actually become simpler, not more complicated, and that there is knowledge worth forgetting (forms are so freakin’ stupid!) If at all possible, dodging complexity is far better than cleverly responding to complexity.

2010 03 01

HTML
Programming
Python
Web

Comments (27)

Permalink

Why toppcloud (Silver Lining) will not be agnostic

I haven’t received a great deal of specific feedback on toppcloud (update: renamed Silver Lining), only a few people (Ben Bangert, Jorge Vargas) seem to have really dived deeply into it. But — and this is not unexpected — I have already gotten several requests about making it more agnostic with respect to… stuff. Maybe that it not (at least forever) require Ubuntu. Or maybe that it should support different process models (e.g., threaded and multiple processes). Or other versions of Python.

The more I think about it, and the more I work with the tool, the more confident I am that toppcloud should not be agnostic on these issues. This is not so much about an "opinionated" tool; toppcloud is not actually very opinionated. It’s about a well-understood system.

For instance, Ben noticed a problem recently with weird import errors. I don’t know quite why mod_wsgi has this particular problem (when other WSGI servers that I’ve used haven’t), but the fix isn’t that hard. So Ben committed a fix and the problem went away.

Personally I think this is a bug with mod_wsgi. Maybe it’s also a Python bug. But it doesn’t really matter. When a bug exists it "belongs" to everyone who encounters it.

toppcloud is not intended to be a transparent system. When it’s working correctly, you should be able to ignore most of the system and concentrate on the relatively simple abstractions given to your application. So if the configuration reveals this particular bug in Python/mod_wsgi, then the bug is essentially a toppcloud bug, and toppcloud should (and can) fix it.

A more flexible system can ignore such problems as being "somewhere else" in the system. Or, if you don’t define these problems as someone else’s problem, then a more flexible system is essentially always broken somewhere; there is always some untested combination, some new component, or some old component that might get pushed into the mix. Fixes for one person’s problem may introduce a new problem in someone else’s stack. Some fixes aren’t even clear. toppcloud has Varnish in place, so it’s quite clear where a fix related to Django and Varnish configuration goes. If these were each components developed by different people at different times (like with buildout recipes) then fixing something like this could get complicated.

So I feel very resolved: toppcloud will hardcode everything it possibly can. Python 2.6 and only 2.6! (Until 2.7, but then only 2.7!). Only Varnish/Apache/mod_wsgi. I haven’t figured out threads/processes exactly, but once I do, there will be only one way! And if I get it wrong, then everyone (everyone) will have to switch when it is corrected! Because I’d much rather have a system that is inflexible than one that doesn’t work. With a clear and solid design I think it is feasible to get this to work, and that is no small feat.

Relatedly, I think I’m going to change the name of toppcloud, so ideas are welcome!

2010 02 10

Programming
Python
Silver Lining
Web

Comments (23)

Permalink

Weave: valuable client-side data

I’ve been looking at Weave some lately. The large-print summary on the page is Synchronize Your Firefox Experience Across Desktop and Mobile. Straight forward enough.

Years and years ago I stopped really using bookmarks. You lose them moving from machine to machine (which Weave could help), but mostly I stopped using them because it was too hard to (a) identify interesting content and place it into a taxonomy and (b) predict what I would later be interested in. If I wanted to refer to something I’d seen before there’s a good chance I wouldn’t have saved it, while my bookmarks would be flooded with things that time would show were of transient interest.

So… synchronizing bookmarks, eh. Saved form data and logins? Sure, that’s handy. It would make browsing on multiple machines nicer. But it feels more like a handy tweak.

All my really useful data is kept on servers, categorized and protected by a user account. Why is that? Well, of course, where else would you keep it? In cookies? Ha!

Why not in cookies? So many reasons… because cookies are opaque and can’t hold much data, can’t be exchanged, and probably worst of all they just disappear randomly.

What if cookies weren’t so impossibly lame for holding important data? Suddenly sync seems much more interesting. Instead of storing documents and data on a website, the website could put all that data right into your browser. And conveniently HTML 5 has an API for that. Everyone thinks about that API as a way of handling off-line caching, because while it handles many problems with cookies it doesn’t handle the problem of data disappearing as you move between computers and browser. That’s where Weave synchronization could change things. I don’t think this technique is something appropriate for every app (maybe not most apps), but it could allow a new class of applications.

Advantages: web development and scaling becomes easy. If you store data in the browser scaling is almost free; serving static pages is a Solved Problem. Development is easier because development and deployment of HTML and Javascript is pretty easy. Forking is easy — just copy all the resources. So long as you don’t hardcode absolute links into your Javascript, you can even just save from the browser and get a working local copy of the application.

Disadvantages: another silo. You might complain about Facebook keeping everyone’s data, but the data in Facebook is still more transparent than data held in files or locally with a browser. Let’s say you create a word processor that uses local storage for all its documents. If you stored that document online sharing and collaboration would be really easy; but with it stored locally the act of sharing is not as automatic, and collaboration is very hard. Sure, the "user" is in "control" of their data, but that would be more true on paper than in practice. Building collaboration on top of local storage is hard, and without that… maybe it’s not that interesting?

Anyway, there is an interesting (but maybe Much Too Hard) problem in there. (DVCS in the browser?)

Update: in this video Aza talks about just what I talk about here. A few months ago. The Weave APIs also allude to things like this, including collaboration. So… they are on it!

2010 02 06

HTML
Web

Comments (2)

Permalink

toppcloud (Silver Lining) and Django

I wrote up instructions on using toppcloud (update: renamed Silver Lining) with Django. They are up on the site (where they will be updated in the future), but I’ll drop them here too…

Creating a Layout

First thing you have to do (after installing toppcloud of course) is create an environment for your new application. Do that like:

$ toppcloud init sampleapp

This creates a directory sampleapp/ with a basic layout. The first thing we’ll do is set up version control for our project. For the sake of documentation, imagine you go to bitbucket and create two new repositories, one called sampleapp and another called sampleapp-lib (and for the examples we’ll use the username USER).

We’ll go into our new environment and use these:

$ cd sampleapp
$ hg clone http://bitbucket.org/USER/sampleapp src/sampleapp
$ rm -r lib/python/
$ hg clone http://bitbucket.org/USER/sampleapp-lib lib/python
$ mkdir lib/python/bin/
$ echo "syntax: glob
bin/python*
bin/activate
bin/activate_this.py
bin/pip
bin/easy_install*
" > lib/python/.hgignore
$ mv bin/* lib/python/bin/
$ rmdir bin/
$ ln -s lib/python/bin bin

Now there is a basic layout setup, with all your libraries going into the sampleapp-lib repository, and your main application in the sampleapp repository.

Next we’ll install Django:

$ source bin/activate
$ pip install Django

Then we’ll set up a standard Django site:

$ cd src/sampleapp
$ django-admin.py sampleapp

Also we’d like to be able to import this file. It’d be nice if there was a setup.py file, and we could run pip -e src/sampleapp, but django-admin.py doesn’t create that itself. Instead we’ll get that on the import path more manually with a .pth file:

$ echo "../../src/sampleapp" > lib/python/sampleapp.pth

Also there’s the tricky $DJANGO_SETTINGS_MODULE that you might have had problems with before. We’ll use the file lib/python/toppcustomize.py (which is imported everytime Python is started) to make sure that is always set:

$ echo "import os
os.environ['DJANGO_SETTINGS_MODULE'] = 'sampleapp.settings'
" > lib/python/toppcustomize.py

Also we have a file src/sampleapp/sampleapp/manage.py, and that file doesn’t work quite how we’d like. Instead we’ll put a file into bin/manage.py that does the same thing:

$ rm sampleapp/manage.py
$ cd ../..
$ echo '#!/usr/bin/env python
from django.core.management import execute_manager
from sampleapp import settings
if __name__ == "__main__":
execute_manager(settings)
' > bin/manage.py
$ chmod +x bin/manage.py

Now, if you were just using plain Django you’d do something like run python manage.py runserver. But we’ll be using toppcloud serve instead, which means we have to set up the two other files toppcloud needs: app.ini and the runner. Here’s a simple app.ini:

$ echo '[production]
app_name = sampleapp
version = 1
runner = src/sampleapp/toppcloud-runner.py
' > src/sampleapp/toppcloud-app.ini
$ rm app.ini
$ ln -s src/sampleapp/toppcloud-app.ini app.ini

The file must be in the "root" of your application, and named app.ini, but it’s good to keep it in version control, so we set it up with a symlink.

It also refers to a "runner", which is the Python file that loads up the WSGI application. This looks about the same for any Django application, and we’ll put it in src/sampleapp/toppcloud-runner.py:

$ echo 'import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
' > src/sampleapp/toppcloud-runner.py

Now if you want to run the application, you can:

$ toppcloud serve .

This will load it up on http://localhost:8080, and serve up a boring page. To do something interesting we’ll want to use a database.

Setting Up A Database

At the moment the only good database to use is PostgreSQL with the PostGIS extensions. Add this line to app.ini:

service.postgis =

This makes the database "available" to the application. For development you still have to set it up yourself. You should create a database sampleapp on your computer.

Next, we’ll need to change settings.py to use the new database configuration. Here’s the lines that you’ll see:

DATABASE_ENGINE = '' # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'oracle'.
DATABASE_NAME = '' # Or path to database file if using sqlite3.
DATABASE_USER = '' # Not used with sqlite3.
DATABASE_PASSWORD = '' # Not used with sqlite3.
DATABASE_HOST = '' # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = '' # Set to empty string for default. Not used with sqlite3.

First add this to the top of the file:

import os

Then you’ll change those lines to:

DATABASE_ENGINE = 'postgresql_psycopg2'
DATABASE_NAME = os.environ['CONFIG_PG_DBNAME']
DATABASE_USER = os.environ['CONFIG_PG_USER']
DATABASE_PASSWORD = os.environ['CONFIG_PG_PASSWORD']
DATABASE_HOST = os.environ['CONFIG_PG_HOST']
DATABASE_PORT = ''

Now we can create all the default tables:

$ manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
...

Now we have an empty project that doesn’t do anything. Let’s make it do a little something (this is all really based on the Django tutorial).

$ manage.py startapp polls

Django magically knows to put the code in src/sampleapp/sampleapp/polls/ — we’ll setup the model in src/sampleapp/sampleapp/polls/models.py:

from django.db import models

class Poll(models.Model):
question = models.CharField(max_length=200)
pub_date = models.DateTimeField('date published')

class Choice(models.Model):
poll = models.ForeignKey(Poll)
choice = models.CharField(max_length=200)
votes = models.IntegerField()

And activate the application by adding 'sampleapp.polls' to INSTALLED_APPS in src/sampleapp/sampleapp/settings.py. Also add 'django.contrib.admin' to get the admin app in place. Run manage.py syncdb to get the tables in place.

You can try toppcloud serve . and go to /admin/ to login and see your tables. You might notice all the CSS is broken.

toppcloud serves static files out of the static/ directory. You don’t actually put static in the URLs, these files are available at the top-level (unless you create a static/static/ directory). The best way to put files in there is generally symbolic links.

For Django admin, do this:

$ cd static
$ ln -s ../lib/python/django/contrib/admin/media admin-media

Now edit src/sampleapp/sampleapp/settings.py and change ADMIN_MEDIA_PREFIX to '/admin-media'.

(Probably some other links should be added.)

One last little thing you might want to do; replace this line in
settings:

SECRET_KEY = 'ASF#@$@#JFAS#@'

With this:

from tcsupport.secret import get_secret
SECRET_KEY = get_secret()

Then you don’t have to worry about checking a secret into version control.

You still don’t really have an application, but the rest is mere "programming" so have at it!

2010 02 05

Programming
Python
Silver Lining
Web

Comments (2)

Permalink

A new way to deploy web applications

Deployment is one of the things I like least about development, and yet without deployment the development doesn’t really matter.

I’ve tried a few things (e.g. fassembler), built a few things (virtualenv, pip), but deployment just sucked less as a result. Then I got excited about App Engine; everyone else was getting excited about "scaling", but really I was excited about an accessible deployment process. When it comes to deployment App Engine is the first thing that has really felt good to me.

But I can’t actually use App Engine. I was able to come to terms with the idea of writing an application to the platform, but there are limits… and with App Engine there were simply too many limits. Geo stuff on App Engine is at best a crippled hack, I miss lxml terribly, I never hated relational databases, and almost nothing large works without some degree of rewriting. Sometimes you can work around it, but you can never be sure you won’t hit some wall later. And frankly working around the platform is tiring and not very rewarding.

So… App Engine seemed neat, but I couldn’t use it, and deployment was still a problem.

What I like about App Engine: an application is just files. There’s no build process, no fancy copying of things in weird locations, nothing like that; you upload files, and uploading files just works. Also, you can check everything into version control. Not just your application code, but every library you use, the exact files that you installed. I really wanted a system like that.

At the same time, I started looking into "the cloud". It took me a while to get a handle on what "cloud computing" really means. What I learned: don’t overthink it. It’s not magic. It’s just virtual private servers that can be requisitioned automatically via an API, and are billed on a short time cycle. You can expand or change the definition a bit, but this definition is the one that matters to me. (I’ve also realized that I cannot get excited about complicated solutions; only once I realized how simple cloud computing is could I really get excited about the idea.)

Given the modest functionality of cloud computing, why does it matter? Because with a cloud computing system you can actually test the full deployment stack. You can create a brand-new server, identical to all servers you will create in the future; you can set this server up; you can deploy to it. You get it wrong, you throw away that virtual server and start over from the beginning, fixing things until you get it right. Billing is important here too; with hourly billing you pay cents for these tests, and you don’t need a pool of ready servers because the cloud service basically manages that pool of ready servers for you.

Without "cloud computing" we each too easily find ourselves in a situation where deployments are ad hoc, server installations develop over time, and servers and applications are inconsistent in their configuration. Cloud computing makes servers disposable, which means we can treat them in consistent ways, testing our work as we go. It makes it easy to treat operations with the same discipline as software.

Given the idea from App Engine, and the easy-to-use infrastructure of a cloud service, I started to script together something to manage the servers and start up applications. I didn’t know what exactly I wanted to do to start, and I’m not completely sure where I’m going with this. But on the whole this feels pretty right. So I present the provisionally-named: toppcloud (Update: this has been renamed Silver Cloud).

How it works: first you have a directory of files that defines your application. This probably includes a checkout of your "application" (let’s say in src/mynewapp/), and I find it also useful to use source control on the libraries (which are put in lib/python/). There’s a file in app.ini that defines some details of the application (very similar to app.yaml).

While app.ini is a (very minimal) description of the application, there is no description of the environment. You do not specify database connection details, for instance. Instead your application requests access to a database service. For instance, one of these services is a PostgreSQL/PostGIS database (which you get if you put service.postgis in your app.ini file). If you ask for that then there will be evironmental variables, CONFIG_PG_DBNAME etc., that will tell your application how to connect to the database. (For local development you can provide your own configuration, based on how you have PostgreSQL or some other service installed.)

The standard setup is also a virtualenv environment. It is setup so every time you start that virtualenv environment you’ll get those configuration-related environmental variables. This means your application configuration is always present, your services always available. It’s available in tests just like it is during a request. Django accomplishes something similar with the (much maligned) $DJANGO_SETTINGS_MODULE but toppcloud builds it into the virtualenv environment instead of the shell environment.

And how is the server setup? Much like with App Engine that is merely an implementation detail. Unlike App Engine that’s an implementation detail you can actually look at and change (by changing toppcloud), but it’s not something you are supposed to concern yourself with during regular application development.

The basic lifecycle using toppcloud looks like:

toppcloud create-node: Create a new virtual server; you can create any kind of supported server, but only Ubuntu Jaunty or Karmic are supported (and Jaunty should probably be dropped). This step is where the "cloud" part actually ends. If you want to install a bare Ubuntu onto an existing physical machine that’s fine too — after toppcloud create-node the "cloud" part of the process is pretty much done. Just don’t go using some old Ubuntu install; this tool is for clean systems that are used only for toppcloud.
toppcloud setup-node: Take that bare Ubuntu server and set it up (or update it) for use with toppcloud. This installs all the basic standard stuff (things like Apache, mod_wsgi, Varnish) and some management script that toppcloud runs. This is written to be safe to run over and over, so upgrading and setting up a machine are the same. It needs to be a bare server, but
toppcloud init path/to/app/: Setup a basic virtualenv environment with some toppcloud customizations.
toppcloud serve path/to/app: Serve up the application locally.
toppcloud update --host=test.example.com path/to/app/: This creates or updates an application at the given host. It edits /etc/hosts so that the domain is locally viewable.
toppcloud run test.example.com script.py: Run a script (from bin/) on a remote server. This allows you to run things like django-admin.py syncdb.

There’s a few other things — stuff to manage the servers and change around hostnames or the active version of applications. It’s growing to fit a variety of workflows, but I don’t think its growth is unbounded.

So… this is what toppcloud. From the outside it doen’t do a lot. From the inside it’s not actually that complicated either. I’ve included a lot of constraints in the tool but I think it offers an excellent balance. The constraints are workable for applications (insignificant for many applications), while still exposing a simple and consistent system that’s easier to reason about than a big-ball-of-server.

Some of the constraints:

Binary packages are supported via Ubuntu packages; you only upload portable files. If you need a library like lxml, you need to request that package (python-lxml) to be installed in your app.ini. If you need a version of a binary library that is not yet packaged, I think creating a new deb is reasonable.
There is no Linux distribution abstraction, but I don’t care.
There is no option for the way your application is run — there’s one way applications are run, because I believe there is a best practice. I might have gotten the best practice wrong, but that should be resolved inside toppcloud, not inside applications. Is Varnish a terrible cache? Probably not, but if it is we should all be able to agree on that and replace it. If there are genuinely different needs then maybe additional application or deployment configuration will be called for — but we shouldn’t add configuration just because someone says there is a better practice (and a better practice that is not universally better); there must be justifications.
By abstracting out services and persistence some additional code is required for each such service, and that code is centralized in toppcloud, but it means we can also start to add consistent tools usable across a wide set of applications and backends.
All file paths have to be relative, because files get moved around. I know of some particularly problematic files (e.g., .pth files), and toppcloud fixes these automatically. Mostly this isn’t so hard to do.

These particular compromises are ones I have not seen in many systems (and I’ve started to look more). App Engine I think goes too far with its constraints. Heroku is close, but closed source.

This is different than a strict everything-must-be-a-package strategy. This deployment system is light and simple and takes into account reasonable web development workflows. The pieces of an application that move around a lot are all well-greased and agile. The parts of an application that are better to Do Right And Then Leave Alone (like Apache configuration) are static.

Unlike generalized systems like buildout this system avoids "building" entirely, making deployment a simpler and lower risk action, leaning on system packages for the things they do best. Other open source tools emphasize a greater degree of flexibility than I think is necessary, allowing people to encode exploratory service integration into what appears to be an encapsulated build (I’m looking at you buildout).

Unlike requirement sets and packaging and versioning libraries, this makes all the Python libraries (typically the most volatile libraries) explicit and controlled, and can better ensure that small updates really are small. It doesn’t invalidate installers and versioning, but it makes that process even more explicit and encourages greater thoughtfulness.

Unlike many production-oriented systems (what I’ve seen in a lot of "cloud" tools) this encorporates both the development environment and production environment; but unlike some developer-oriented systems this does not try to normalize everyone’s environment and instead relies on developers to set up their systems however is appropriate. And unlike platform-neutral systems this can ensure an amount of reliability and predictability through extremely hard requirements (it is deployed on Ubuntu Jaunty/Karmic only).

But it’s not all constraints. Toppcloud is solidly web framework neutral. It’s even slightly language neutral. Though it does require support code for each persistence technique, it is fairly easy to do, and there are no requirements for "scalable systems"; I think unscalable systems are a perfectly reasonable implementation choice for many problems. I believe a more scalable system could be built on this, but as a deployment and development option, not a starting requirement.

So far I’ve done some deployments using toppcloud; not a lot, but some. And I can say that it feels really good; lots of rough edges still, but the core concept feels really right. I’ve made a lot of sideways attacks on deployment, and a few direct attacks… sometimes I write things that I think are useful, and sometimes I write things that I think are right. Toppcloud is at the moment maybe more right than useful. But I genuinely believe this is (in theory) a universally appropriate deployment tool.

Alright, so now you think maybe you should look more at toppcloud…

Well, I can offer you a fair amount of documentation. A lot of that documentation refers to design, and a bit of it to examples. There’s also a couple projects you can look at; they are all small, but :

Frank (will be interactivesomerville.org) which is another similar Django/Pinax project (Pinax was a bit tricky). This is probably the largest project. It’s a Django/Pinax volunteer-written application for collecting community feedback the Boston Greenline project, if that sounds interesting to you might want to chip in on the development (if so check out the wiki).
Neighborly, with minimal functionality (we need to run more sprints) but an installation story.
bbdocs which is a very simple bitbucket document generator, that makes the toppcloud site.
geodns which is another simple no-framework PostGIS project.

Now, the letdown. One thing I cannot offer you is support. THERE IS NO SUPPORT. I cannot now, and I might never really be able to support this tool. This tool is appropriate for collaborators, for people who like the idea and are ready to build on it. If it grows well I hope that it can grow a community, I hope people can support each other. I’d like to help that happen. But I can’t do that by bootstrapping it through unending support, because I’m not good at it and I’m not consistent and it’s unrealistic and unsustainable. This is not a open source dead drop. But it’s also not My Future; I’m not going to build a company around it, and I’m not going to use all my free time supporting it. It’s a tool I want to exist. I very much want it to exist. But even very much wanting something is not the same as being an undying champion, and I am not an undying champion. If you want to tell me what my process should be, please do!

If you want to see me get philosophical about packaging and deployment and other stuff like that, see my upcoming talk at PyCon.

2010 01 29

Packaging
Programming
Python
Silver Lining
Web

Comments (13)

Permalink

WebOb decorator

Lately I’ve been writing a few applications (e.g., PickyWiki and a revisiting a request-tracking application VaingloriousEye), and I usually use no framework at all. Pylons would be a natural choice, but given that I am comfortable with all the components, I find myself inclined to assemble the pieces myself.

In the process I keep writing bits of code to make WSGI applications from simple WebOb -based request/response cycles. The simplest form looks like this:

from webob import Request, Response, exc

def wsgiwrap(func):
def wsgi_app(environ, start_response):
req = Request(environ)
try:
resp = func(req)
except exc.HTTPException, e:
resp = e
return resp(environ, start_response)
return wsgi_app

@wsgiwrap
def hello_world(req):
return Response('Hi %s!' % (req.POST.get('name', 'You')))

But each time I’d write it, I change things slightly, implementing more or less features. For instance, handling methods, or coercing other responses, or handling middleware.

Having implemented several of these (and reading other people’s implementations) I decided I wanted WebOb to include a kind of reference implementation. But I don’t like to include anything in WebOb unless I’m sure I can get it right, so I’d really like feedback. (There’s been some less than positive feedback, but I trudge on.)

My implementation is in a WebOb branch, primarily in webob.dec (along with some doctests).

The most prominent way this is different from the example I gave is that it doesn’t change the function signature, instead it adds an attribute .wsgi_app which is WSGI application associated with the function. My goal with this is that the decorator isn’t intrusive. Here’s the case where I’ve been bothered:

class MyClass(object):
@wsgiwrap
def form(self, req):
return Response(form_html...)

@wsgiwrap
def form_post(self, req):
handle submission

OK, that’s fine, then I add validation:

@wsgiwrap
def form_post(self, req):
if req not valid:
return self.form
handle submission

This still works, because the decorator allows you to return any WSGI application, not just a WebOb Response object. But that’s not helpful, because I need errors…

@wsgiwrap
def form_post(self, req):
if req not valid:
return self.form(req, errors)
handle submission

That is, I want to have an option argument to the form method that passes in errors. But I can’t do this with the traditional wsgiwrap decorator, instead I have to refactor the code to have a third method that both form and form_post use. Of course, there’s more than one way to address this issue, but this is the technique I like.

The one other notable feature is that you can also make middleware:

@wsgify.middleware
def cap_middleware(req, app):
resp = app(req)
resp.body = resp.body.upper()
return resp

capped_app = cap_middleware(some_wsgi_app)

Otherwise, for some reason I’ve found myself putting an inordinate amount of time into __repr__. Why I’ve done this I cannot say.

2009 05 22

Programming
Python
Web

Comments (11)

Permalink

Modern Web Design, I Renounce Thee!

I’m not a designer, but I spend as much time looking at web pages as the next guy. So I took interest when I came upon this post on font size by Wilson Miner, which in turn is inspired by the 100e2r (100% easy to read) standard by Oliver Reichenstein.

The basic idea is simple: we should have fonts at the "default" size, about 16px, no smaller. This is about the size of text in print, read at a reasonable distance (typically closer up than a screen):

https://ianbicking.org/wp-content/uploads/images/typesize_comparison2.jpg

Also it calls out low-contrast color schemes, which I think are mostly passe, and I will not insult you, my reader, by suggesting you don’t entirely agree. Because if you don’t agree, well, I’m afraid I’d have to use some strong words.

I think small fonts, low contrast, huge amounts of whitespace, are a side effect of the audience designers create for.

This makes me think of Modern Architecture:

https://ianbicking.org/wp-content/uploads/images/300px-seagram.jpg

This is a form of architecture popular for skyscapers and other dramatic structures, with their soaring heights and other such dramatic adjectives. These are buildings designed for someone looking at the building from five hundred feet away. They are not designed for occupants. But that’s okay, because the design isn’t sold to occupants, it is sold to people who look at the sketches and want to feel very dramatic.

Similarly, I think the design pattern of small fonts is something meant to appeal to shallow observation. By deemphasizing the text itself, the design is accentuated. Low-contrast text is even more obviously the domination of design over content. And it may very well look more professional and visually pleasing. But web design isn’t for making sites visually pleasing, it is for making the experience of the content more pleasing. Sites exist for their content, not their design.

In 100e2r he also says let your text breathe. You need whitespace. If you view my site directly, you’ll notice I don’t have big white margins around my text. When you come to my site, it’s to see my words, and that’s what I’m going to give you! When I want to let my text breathe with lots of whitespace this is what I do:

https://ianbicking.org/wp-content/uploads/images/500px-my-white-desktop.jpg

Is a huge block of text hard to read? It is. And yeah, I’ve written articles like that. But the solution?

WRITE BETTER

Similarly, it’s hard to read text if you don’t use paragraphs, but the solution isn’t to increase your line height until every line is like a paragraph of its own.

The solution to the drudgery of large swathes of text is:

Make your blocks of text smaller.
Use something other than paragraphs of text.

Throw in a list. Do some indentation. Toss in even a stupid picture. Personally I try to throw in code examples, because that’s how we roll on this blog.

That’s good writing, that’s content that is easy to read. It’s not easy to write, and I’m sure I miss the mark more often than not. But you can’t design your way to good content. If you want to write like this, if you want to let the flow of your text reflect the flow of your ideas, you need room. Huge margins don’t give you room. They are a crutch for poor writing, and not even a good crutch.

So in conclusion: modern design be damned!

2009 01 14

HTML
Non-technical
Web

Comments (21)

Permalink

Atompub as an alternative to WebDAV

I’ve been thinking about an import/export API for PickyWiki; I want something that’s sensible, and works well enough that it can be the basic for things like creating restorable snapshots, integration with version control systems, and being good at self-hosting documentation.

So far I’ve made a simple import/export system based on Atom. You can export the entire site as an Atom feed, and you can import Atom feeds. But whole-site import/export isn’t enough for the tools I’d like to write on top of the API.

WebDAV would seem like a logical choice, as it lets you get and put resources. But it’s not a great choice for a few reasons:

It’s really hard to implement on the server.
Even clients are hard to implement.
It uses GET to get resources. This is probably its most fatal flaw. There is no CMS that I know of (except maybe one) where the thing you view the browser is the thing that you’d actually edit. To work around this CMSes use User-Agent sniffing or an alternate URL space.
WebDAV is worried about "collections" (i.e., directories). The web basically doesn’t know what "collections" are, it only knows paths, and paths are strings.
(In summary) WebDAV uses HTTP, but it is not of the web.

I don’t want to invent something new though. So I started thinking of Atom some more, and Atompub.

The first thought is how to fix the GET problem in WebDAV. A web page isn’t an editable representation, but it’s pretty reasonable to put an editable representation into an Atom entry. Clients won’t necessarily understand extensions and properties you might add to those entries, but I don’t see any way around that. An entry might look like:

<entry>
<content type="html">QUOTED HTML</content>
... other normal metadata (title etc) ...
<privateprop:myproperty xmlns:privateprop="URL" name="foo" value="bar" />
</entry>

While there is special support for HTML, XHTML, and plain text in Atom, you can put any type of content in <content>, encoded in base64.

To find the editable representation, the browser page can point to it. I imagine something like this:

The actual URL (in this example this-url?format=atom) can be pretty much anything. My one worry is that this could be confused with feed detection, which looks like:

The only difference is "; type=entry", which I’m betting a lot of clients don’t pay attention to.

The Atom entries then can have an element:

This is a location where you can PUT a new entry to update the resource. You could allow the client to PUT directly over the old page, or use this-url?format=atom or whatever is convenient on the server-side. Additionally, DELETE to the same URL would delete.

This handles updates and deletes, and single-page reads. The next issue is creating pages.

Atompub makes creation fairly simple. First you have to get the Atompub service document. This is a document with the type application/atomsvc+xml and it gives the collection URL. It’s suggested you make this document discoverable like:

This document then points to the "collection" URL, which for our purposes is where you create documents. The service document would look like:

<service xmlns="http://www.w3.org/2007/app"
xmlns:atom="http://www.w3.org/2005/Atom">
<workspace>
<atom:title>SITE TITLE</atom:title>
<collection href="/atomapi">
<atom:title>SITE TITLE</atom:title>
<accept>*/*</accept>
<accept>application/atom+xml;type=entry</accept>
</collection>
</workspace>
</service>

Basically this indicates that you can POST any media to /atomapi (both Atom entries, and things like images).

To create a page, a client then does a POST like:

POST /atomapi
Content-Type: application/atom+xml; type=entry
Slug: /page/path

<entry xmlns="...">...</entry>

There’s an awkwardness here, that you can suggest (via the Slug header) what the URL for the new page is. The client can find the actual URL of the new page from the Location header in the response. But the client can’t demand that the slug be respected (getting an error back if it is not), and there’s lots of use cases where the client doesn’t just want to suggest a path (for instance, other documents that are being created might rely on that path for links).

Also, "slug" implies… well, a slug. That is, some path segment probably derived from the title. There’s nothing stopping the client from putting a complete path in there, but it’s very likely to be misinterpreted (e.g. translating /page/path to /2009/01/pagepath).

Bug I digress. Anyway, you can post every resource as an entry, base64-encoding the resource body, but Atompub also allows POSTing media directly. When you do that, the server puts the media somewhere and creates a simple Atom entry for the media. If you wanted to add properties to that entry, you’d edit the entry after creating it.

The last missing piece is how to get a list of all the pages on a site. Atompub does have an answer for this: just GET /atomapi will give you an Atom feed, and for our purposes we can demand that the feed is complete (using paging so that any one page of the feed doesn’t get too big). But this doesn’t seem like a good solution to me. GData specifies a useful set of queries to for feeds, but I’m not sure that this is very useful here; the kind of queries a client needs to do for this use case aren’t things GData was designed for.

The queries that seem most important to me are queries by page path (which allows some sense of "collections" without being formal) and by content type. Also to allow incremental updates on the client side, filtering these queries by last-modified time (i.e., all pages created since I last looked). Reporting queries (date of creation, update, author, last editor, and custom properties) of course could be useful, but don’t seem as directly applicable.

Also, often the client won’t want the complete Atom entry for the pages, but only a list of pages (maybe with minimal metadata). I’m unsure about the validity of abbreviated Atom entries, but it seems like one solution. Any Atom entry can have something like:

This indicates where the entry exists, though it doesn’t suggest very forcefully that the actual entry is abbreviated. Anyway, I could then imagine a feed like:

<feed>
<entry>

<content type="some/content-type" />
<link rel="self" href="..." />
<updated>YYYYMMDDTHH:MM:SSZ</updated>
<entry>
...
</feed>

This isn’t entirely valid, however — you can’t just have an empty <content> tag. You can use a src attribute to use indirection for the content, and then add Yet Another URL for each page that points to its raw content. But that’s just jumping through hoops. This also seems like an opportunity to suggest that the entry is incomplete.

To actually construct these feeds, you need some way of getting the feed. I suggest that another entry be added to the Atompub service document, something like:

<cmsapi:feed href="URI-TEMPLATE" />

That would be a URI Template that accepted several known variables (though frustratingly, URI Templates aren’t properly standardized yet). Things like:

content-type: the content type of the resource (allowing wildcards like image/*)
container: a path to a container, i.e., /2007 would match all pages in /2007/...
path-regex: some regular expression to match the paths
last-modified: return all pages modified at the given date or later

All parameters would be ANDed together.

So, open issues:

How to strongly suggest a path when creating a resource (better than Slug)
How to rename (move) or copy a page (it’s easy enough to punt on copy, but I’d rather move by a little more formal than just recreating a resource in a new location and deleting the original)
How to represent abbreviated Atom entries

With these resolved I think it’d be possible to create a much simpler API than WebDAV, and one that can be applied to existing applications much more easily. (If you think there’s more missing, please comment.)

2009 01 11

HTML
Programming
Web

Comments (26)

Permalink

Avoiding Silos: “link” as a first-class object

One of the constant annoyances to me in web applications is the self-proclaimed need for those applications to know about everything and do everything, and only spotty ad hoc techniques for including things from other applications.

An example might be blog navigation or search, where you can only include data from the application itself. Or "Recent Posts" which can only show locally-produce posts. What if I post something elsewhere? I have to create some shoddy placeholder post to refer to it. Bah! Underlying this the data is usually structured in a specific way, with the HTML being a sort of artifact of the database, the markup transient and a slave to the database’s structure.

An example of this might be a recent post listing like:

<ul>
for post in recent_posts:
<li>
<a href="/post/{{post.year}}/{{post.month}}/{{post.slug}}">
{{post.title}}</a>
</li>
</ul>

There’s clearly no room for exceptions in this code. I am thus proposing that any system like this should have the notion of a "link" as a first-class object. The code should look like this:

<ul>
for post in recent_posts:
<li>
{{post.link()}}
</li>
</ul>

Just like with changing IDs to links in service documents, the template doesn’t actually look any more complicated than it did before (simpler, even). But now we can use simple object-oriented techniques to create first-class links. The code might look like:

class Post(SomeORM):
def url(self):
if self.type == 'link':
return self.body
else:
base = get_request().application_url
return '%s/%s/%s/%s' % (
base, self.year, self.month, self.slug)

def link(self):
return html('<a href="%s">%s</a>') % (
self.url(), self.title)

The addition of the .url() method has the obvious effect of making these offsite links work. Using a .link() method has the added advantage of allowing things like HTML snippets to be inserted into the system (even though that is not implemented here). By allowing arbitrary HTML in certain places you make it possible for people to extend the site in little ways — possibly adding markup to a title, or allowing an item in the list that actually contains two URLs (e.g., <a href="url1">Some Item</a> (<a href="url2">via</a>)).

In the context of Python I recommend making these into methods, not properties, because it allows you to later add keyword arguments to specialize the markup (like post.link(abbreviated=True)).

One negative aspect of this is that you cannot affect all the markup through the template alone, you may have to go into the Python code to change things. Anyone have ideas for handling this problem?

2008 12 27

HTML
Programming
Python
Web

Comments (13)

Permalink

Ian Bicking: a blog

Web

toppcloud renamed to Silver Lining

2010 03 03

Throw out your frameworks! (forms included)

2010 03 01

Why toppcloud (Silver Lining) will not be agnostic

2010 02 10

Weave: valuable client-side data

2010 02 06

toppcloud (Silver Lining) and Django

Creating a Layout

Setting Up A Database

2010 02 05

A new way to deploy web applications

2010 01 29

WebOb decorator

2009 05 22

Modern Web Design, I Renounce Thee!

2009 01 14

Atompub as an alternative to WebDAV

2009 01 11

Avoiding Silos: “link” as a first-class object

2008 12 27

Home

About

Archives

Categories

Recent Posts

Recent Comments