Ian Bicking: a blog :: 2008

{ Monthly Archives }

October 2008

pyinstall is dead, long live pip!

I’ve finished renaming pyinstall to its new name: pip. The name pip is an acronym and a declaration: pip installs packages.

I’ve also added a small feature intended for Google App Engine users, allowing you to zip and unzip packages in an environment. For instance:

$ pip zip --list
In ./lib/python2.5/site-packages:
No zipped packages.
Unzipped packages:
paste (98 files)
pygments (64 files)
tempita (7 files)
weberror (31 files)
webob (22 files)
webtest (9 files)
nose (43 files)
setuptools-0.6c9-py2.5.egg (43 files)
simplejson (28 files)
$ pip zip webob
Zip webob (in ./lib/python2.5/site-packages/webob)

Right now this doesn’t work well with egg directories (i.e., packages installed with easy_install), though that shouldn’t be too hard to resolve. pip install itself does not install packages into egg directories (it does install eggs, which is to say it installs all the egg metadata and works fine with pkg_resources).

I don’t really use buildout myself, but I would like to throw it out there that I think someone should create a pip recipe as an alternative to zc.recipe.egg. There’s not really a stable programmatic API in pip at this point, but with no consumers of the API it feels like premature design to settle on something now. Integrate with pip and we can figure out what that stable API should be. If you integrate buildout, another useful feature would probably be an option to have pip freeze write the packages out to a setting in your buildout.cfg.

2008 10 28

Programming


Hypertext-driven URLs

Roy T. Fielding, author of the REST thesis, wrote an article recently: REST APIs must be hypertext-driven. I liked this article; it fit with an intuition I’ve had. Then he wrote an article explaining that he wouldn’t really explain the other articles because, I guess, he wanted a conversation with the specialists, and it seems like a kind of invitation to reinterpret his writing. So since others are doing it I figured I’d do it too.

I’d summarize his argument thus:

Focus on media types, i.e., resource formats, i.e., document formats. The protocol will flow from these if they are well specified.
URL structures are not a media type. They are some kind of server layout. You can’t hold them, you can’t pass them around, there is no notion of CRUD. Media types have all sorts of advantages that URL structures do not.

An example of a protocol based on a URL structure would be something like:

Do GET /articles/ to get a JSON list of all the article ids, with a response like [1, 2, 3]
Do a GET /articles/{id} to get the representation of a specific article.

JSON is a reasonable structure for a media type. It is not itself a fully explained type, because it’s just a container for data, just like XML. In this example you have a document, [1, 2, 3] which isn’t self-describing and just isn’t very useful. A more appropriate protocol would be:

You start with a container, in our example /articles/. Do GET /articles/ to get a JSON document listing the URLs of all the articles. These URLs are relative to the container URL. You’ll get a response like ["./1", "./2", "./3"] (actually ["1", "2", "3"] would be fine too).
Do GET {article-url} to get the article representation.

It’s a small difference. Heck, the communication could look identical in practice, but by putting URLs in the JSON document instead of this abstract "id" notion you’ve created a more flexible and self-describing system. You could probably give a name to that list of URLs, and then just talk about that name.
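A minimal sketch of the client side of that second protocol, written for Python 3. The container URL and the listings are hypothetical; the point is only that the client resolves whatever references the document contains against the container URL, instead of building URLs from ids:

```python
import json
from urllib.parse import urljoin


def article_urls(container_url, listing_json):
    """Resolve each reference in the JSON listing against the container URL."""
    return [urljoin(container_url, ref) for ref in json.loads(listing_json)]


# Hypothetical container and response bodies, for illustration only.
container = 'http://example.com/articles/'
print(article_urls(container, '["./1", "./2", "./3"]'))
print(article_urls(container, '["1", "/archive/2", "http://other.example.org/3"]'))
```

Because the client only follows what it is given, the server is free to hand out references that point to another path or another host without the client changing at all.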

An example in Atompub is rel="edit". An Atom entry can look like:

Instead of the client just somehow knowing where to go to edit an entry, it’s made explicit. Thus you can move the entry around, while still pointing back to the canonical location to edit that entry.
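To make the rel="edit" idea concrete, here is a minimal sketch of a client finding the edit location from the entry itself. The entry below is a hypothetical one made up for illustration (its title, id, and URLs are assumptions), and the helper name find_link is mine:

```python
import xml.etree.ElementTree as ET

ATOM = '{http://www.w3.org/2005/Atom}'

# A hypothetical entry; note the edit link lives on a different path
# than the public (alternate) location.
ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>An example entry</title>
  <id>urn:example:entry-1</id>
  <link rel="alternate" href="http://example.com/blog/an-example-entry"/>
  <link rel="edit" href="http://example.com/atom/edit/1"/>
</entry>
"""


def find_link(entry_xml, rel):
    """Return the href of the first <link> with the given rel, or None."""
    entry = ET.fromstring(entry_xml)
    for link in entry.findall(ATOM + 'link'):
        if link.get('rel') == rel:
            return link.get('href')
    return None


print(find_link(ENTRY, 'edit'))
```

The client never constructs the edit URL; it reads it out of the document it already holds.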

There’s nothing really that complicated about this, the rule is really quite simple: link to other things, don’t just expect the client to know or guess where those other things are.

For a more concrete example of where this linking works well, OpenID uses <link rel="openid.server" href="..."> and <link rel="openid.delegate" href="...">, which allows you to add a little information to any HTML homepage so that the login can happen at a third location. If OpenID used something like looking at {homepage}/openid for an OpenID server then you couldn’t select whatever OpenID service you liked, or change services, or apply OpenID to hosted locations where you couldn’t install an OpenID server.
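As a rough sketch of that kind of discovery (not a real OpenID client), the following reads the link elements out of a homepage using only the standard library. The page content and its URLs are hypothetical, and the names LinkRelParser and discover_links are mine:

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect rel -> href for every <link> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'link':
            return
        attrs = dict(attrs)
        rel, href = attrs.get('rel'), attrs.get('href')
        if rel and href:
            # rel may hold several space-separated values; keep the first href seen.
            for name in rel.split():
                self.links.setdefault(name, href)


def discover_links(html):
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical homepage that delegates login to a third-party service.
PAGE = """\
<html><head>
<link rel="openid.server" href="https://openid.example.net/server">
<link rel="openid.delegate" href="https://openid.example.net/users/bob">
</head><body>My homepage</body></html>
"""

links = discover_links(PAGE)
print(links.get('openid.server'))
print(links.get('openid.delegate'))
```

Switching services means editing two href values in the page; nothing about the homepage's own URL layout is involved.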

I’ll add my own little opinion in here: this is why the URL structure of applications doesn’t affect their RESTfulness, nor is URL structure all that important of a concern generally. Pretty URL structures are a nice thing to do, like indenting your code in a pleasant way, but it has nothing to do with your API, and if you can’t use a crappy URL structure with that same API then probably something is wrong with that API.

2008 10 24

Programming
Web


Decorators and Descriptors

So, decorators are neat (maybe check out a new tutorial on them). Descriptors are neat but may seem hard (though they hardly take a long time to describe). Sometimes these two things intersect, and this post describes how.

Here’s an example decorator where this comes up. First, we want something that looks like a WSGI application:

def application(environ, start_response):
    start_response('200 OK', [('content-type', 'text/html')])
    return ['hi!']

But we want to use WebOb, like:

from webob import Request, Response

def application(environ, start_response):
    req = Request(environ)
    resp = Response('hi!')
    return resp(environ, start_response)

(We don’t use req in this example, but of course you probably would in a real WSGI application)

Now req = Request(environ) is boilerplate, and it’d be nicer to just do return resp instead of return resp(environ, start_response). So let’s make a decorator to do that:

class wsgiapp(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, environ, start_response):
        resp = self.func(Request(environ))
        return resp(environ, start_response)

@wsgiapp
def application(req):
    return Response('hi!')

If you don’t understand what happened there, go read up on decorators.
Now, what if you want to decorate a method? For instance:

class Application(object):
    def __init__(self, text):
        self.text = text
    @wsgiapp
    def __call__(self, req):
        return Response(self.text)

application = Application('hi!')

This won’t quite work, because @wsgiapp will call Application.__call__(req) — with no self argument. This is generally a problem with any decorator that changes the signature, because the signature for methods has this extra self argument. Descriptors can handle this. First we’ll have the same wsgiapp definition as we had before, but we’ll add the magic descriptor method __get__:

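class wsgiapp(object):
    def __init__(self, func):
        self.func = func
    def __call__(self, environ, start_response):
        # Turn the WSGI call into a call that takes a single request
        resp = self.func(Request(environ))
        return resp(environ, start_response)
    def __get__(self, obj, type=None):
        # Accessed on the class itself: return the descriptor unchanged
        if obj is None:
            return self
        # Accessed on an instance: bind the wrapped function to that
        # instance, then wrap the resulting bound method
        new_func = self.func.__get__(obj, type)
        return self.__class__(new_func)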

So, to explain:

When you get an attribute from an instance, like Application().__call__, Python will check if the object that was fetched has a __get__ method. If it does, it will call that method and use the result of that method.

This part:

if obj is None:
    return self

is what happens when you do Application.__call__, in other words, when you get a class attribute. In that case obj (the instance) will be None, and it will just return the descriptor (it could do something else, like in this example, but usually it doesn’t).

Functions already have a __get__ method. You can try it yourself:

>>> def example(*args):
...     print 'got', args
...
>>> example_bound = example.__get__(1)
>>> example_bound('test')
got (1, 'test')

So in the example with wsgiapp we are just changing the decorator to wrap the new bound function instead of the old unbound function. This allows wsgiapp to be compatible with both plain functions and methods. In fact, it would probably be preferable to always call func.__get__(obj, type) (even if obj is None), as then we could also wrap class methods or other kinds of descriptors.
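Here is a self-contained sketch that puts the pieces together, written for Python 3 rather than the Python 2 of the snippets above. To keep it runnable without WebOb, Request and Response are tiny hypothetical stand-ins (they are not WebOb's classes), and start_response is a do-nothing placeholder:

```python
class Request(object):
    """Hypothetical stand-in for webob.Request; it only keeps the environ."""
    def __init__(self, environ):
        self.environ = environ


class Response(object):
    """Hypothetical stand-in for webob.Response; a minimal WSGI application."""
    def __init__(self, body):
        self.body = body

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [self.body.encode('utf-8')]


class wsgiapp(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, environ, start_response):
        resp = self.func(Request(environ))
        return resp(environ, start_response)

    def __get__(self, obj, type=None):
        if obj is None:
            return self
        # Bind the wrapped function to the instance, then re-wrap it.
        return self.__class__(self.func.__get__(obj, type))


@wsgiapp
def hello(req):
    return Response('hi!')


class Application(object):
    def __init__(self, text):
        self.text = text

    @wsgiapp
    def __call__(self, req):
        return Response(self.text)


def start_response(status, headers):
    """Placeholder start_response, just enough to call the apps directly."""


print(hello({}, start_response))
print(Application('hello from a method')({}, start_response))
```

The same decorator now serves both the plain function and the method, because in the method case __get__ hands back a wsgiapp wrapped around the already-bound method.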

2008 10 24

Python


The Philosophy of Deliverance

I’ll be attending PloneConf this year again, giving a talk about Deliverance. I’ve been working on Deliverance lately for work, but the hard part about it is that it’s not obviously useful. To help explain it I wrote the philosophy of Deliverance, which I will copy here, to give you an idea of what I’ve been doing:

Why is Deliverance? Why was it made, what purpose does it serve, why should you use it, how can it change the way you do web development?

On the Subject of Platforms

Right now we live in an age of platforms. Developers (or management or coincidence) decide on a platform, and that serves as the basis for all future development. Usually there are some old things from a previous platform (or a primordial pre-platform age: I’m looking at you, formmail.pl!) The goal is always to eliminate all of these old pieces, rewriting them for the new platform. That goal is seldom attained in a timely manner, and even before it is accomplished you may be moving to the next platform.

Why do you have to port everything forward to the newest platform? Well, presumably it is better engineered. The newest platform is presumably what people are most familiar with. But if those were the only reasons it would be hard to justify a rewrite of working software. Often the real push comes because your systems don’t work together. It’s hard to keep templates in sync across all the platforms. Multiple logins may be required. Navigation is inconsistent and incomplete. Functionality that cross-cuts pages — comments, login status, shopping cart status, etc — isn’t universally available.

A similar conflict arises when you consider how to add new functionality to a site. For example, you may want to add a blog. Do you:

1. Use the best blogging software available?
2. Use something native to your platform?
3. Write something yourself?

The answer is probably 2 or 3, because it would be too hard to integrate something foreign to your platform. This form of choice means that every platform has some kind of "blog", but the users of that blog are likely to only be a subset of the users of the parent platform. This makes it difficult for winners to emerge, or for a well-developed piece of software to really be successful. Platform-based software is limited by the adoption of the platform.

Not all software has a platform. These tend to be the most successful web applications, things like Trac, WordPress, etc.

"Aha!" you think "I’ll just use those best-of-breed applications!" But no! Those applications themselves turn into platforms. WordPress is practically a CMS. Trac too. Extensible applications, if successful, become their own platform. This is not to place blame, they aren’t necessarily any worse than any other platform, just an acknowledgment that this move to platform can happen anywhere.

Beyond Platforms, or A Better Platform

One of the major goals of Deliverance is to move beyond platforms. It is an integration tool, to allow applications from different frameworks or languages to be integrated gracefully.

There are only a few core reasons that people use platforms:

1. A common look-and-feel across the site.
2. Cohesive navigation.
3. Indexing of the entire site.
4. Shared authentication and user accounts.
5. Cross-cutting functionality (e.g., commenting).

Deliverance specifically addresses 1, providing a common look-and-feel across a site. It can provide some help with 2, by allowing navigation to be more centrally managed, without relying purely on per-application navigation (though per-application navigation is still essential to navigating the individual applications). 3, 4, and 5 are not addressed by Deliverance (at least not yet).

Deliverance applies a common theme across all the applications in your site. Its basic unit of abstraction is HTML. It doesn’t use a particular templating language. It doesn’t know what an object is. HTML is something every web application produces. Deliverance’s means of communication is HTTP. It doesn’t call functions or create request objects [*]. Again, everything speaks HTTP.

Deliverance also allows you to include output from multiple locations. In all cases there’s the theme, a plain HTML page, and the content, whatever the underlying application returns. You can also include output from other parts of the site, most commonly navigation content that you can manage separately. All of these pieces can be dynamic — again, Deliverance only cares about HTML and HTTP, it doesn’t worry about what produces the response.

This is all very similar to systems built on XSLT transforms, except without the XSLT [†], and without XML. Strictly speaking you can apply XSLT to any parseable markup, even HTML, but the most common (or at least most talked about) way to apply XSLT is using "semantic" XML output that is transformed into HTML. Deliverance does not try to understand the semantics of applications, and instead expects them to provide appropriate presentation of whatever semantics the underlying application possesses. Presentation is more universal than semantics.

While Deliverance does its best to work with applications as-they-exist, without making particular demands on those applications, it is not perfect. Conflicting CSS can be a serious problem. Some applications don’t have very good structure to work with. You can’t generate any content in Deliverance, you can only manipulate existing content, and often that means finding new ways to generate content, or making sure you have a place to store your content (as in the case of navigation). This is why arguably Deliverance does not remove the need for a platform, but is just its own platform. In so far as this is true, Deliverance tries to be a better platform, where "better" is "more universal" rather than "more powerful". Most templating systems are more powerful than Deliverance transformations. It can be useful to have access to the underlying objects used to produce the markup. But Deliverance doesn’t give you these things, because it only implements things that can be applied to any source of content. Static files are entirely workable in Deliverance, just as any application written in Python, PHP, or even an application hosted on an entirely separate service is usable through Deliverance.

The Missing Parts

As mentioned before, three important benefits of a platform are missing from Deliverance. I’ll try to describe what I believe are the essential aspects. I hope at some time that Deliverance or some complementary application will be able to satisfy these needs. Also, I suggest some lines of development that might be easier than others.

Indexing The Entire Site

Typically each application has a notion of what all the interesting pages in that application are. Most applications have a set of uninteresting pages, or transient pages. A search result is transient, as an example. An application also knows when new pages appear, and when other pages disappear. A site-wide index of these pages would allow things like site maps, cross-application search, and cross-application reporting to be done.

An interesting exception to the knowledge an application has of itself: search results are generally boring. But a search result based on a category might still be interesting. The difference between a "search" and a "report" is largely in the eye of the beholder. An important feature is that the application shouldn’t be the sole entity allowed to mark interesting pages. Manually-managed lists of resources that may point to specific applications can allow people to usefully and easily tweak the site. Ideally even fully external resources could be included, such as a resource on an entirely different site.

To do indexing you need both events (to signal the creation, update, or deletion of an entity/page), and a list of entities (so the index can be completely regenerated). A simple way of giving a list of entities would be the Google Site Map XML resource. Signaling events is much more complex, so I won’t go into it in any greater depth here, but we’re working on a product called Cabochon to handle events.

One thing that indexing can provide is a way to use microformats. Right now microformats are interesting, but for most sites they are largely useless. You can mark up your content, but no one will do anything interesting with that markup. If you could easily code up an indexer that could keep up-to-date on all the content on your site, you could produce interesting results like cross-application mapping.

Shared Authentication And User Accounts

Authentication is one of the most common and annoying integration tasks when crossing platform boundaries. Systems like OpenID offer the ability to unify cross-site authentication, but they don’t actually solve the problem of a single site with multiple applications.

There is a basic protocol in HTTP for authentication, one that is workable for a system like Deliverance, and there are already several existing products (like repoze.who) that work this way. It works like this:

The logged-in username is sent in some header, e.g., X-Remote-User. Some kind of signing is necessary to really trust this header (Deliverance could filter out that header in incoming requests, but if you removed Deliverance from the stack you’d have a security hole).
If the user isn’t logged in, and the application wants them to log in, the application responds with a 401 Unauthorized response. It is supposed to set the WWW-Authenticate header, probably to some value indicating that the intermediary should determine the authentication type. In some cases a kind of HTTP authentication is required (typically Basic or Digest) because cookie-based logins are too stateful (e.g., in APIs, or for WebDAV access).
The intermediary catches the 401 and initiates the login process. This might mean a redirect to a login page, and setting a cookie on successful login. The login page and setting the cookie could potentially be done by an application outside of the intermediary; the intermediary only has to do the appropriate redirects and setting of headers.
In the case when a user is logged in but isn’t permitted, the application simply sends a 403 Forbidden response. The intermediary shouldn’t actually do anything in this case (though maybe it could usefully add a logout link to that message). I only mention this because some systems use 401 for Forbidden, which causes no end of problems.
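The steps above can be sketched as a small piece of WSGI middleware (Python 3). This is an illustration under stated assumptions, not how Deliverance or repoze.who implement it: get_user is a hypothetical callable that maps a request environ to a username or None (for example by checking a login cookie), login_url and the came_from parameter are made up, header signing is left out, and the wrapped application is assumed to call start_response before it returns:

```python
from urllib.parse import quote


class AuthIntermediary(object):
    """Sketch of the intermediary role: set the user header, catch 401s."""

    def __init__(self, app, get_user, login_url='/login'):
        self.app = app
        self.get_user = get_user    # hypothetical: environ -> username or None
        self.login_url = login_url  # hypothetical login page

    def __call__(self, environ, start_response):
        environ = dict(environ)
        # Never trust a client-supplied header: drop it, then set our own.
        environ.pop('HTTP_X_REMOTE_USER', None)
        user = self.get_user(environ)
        if user is not None:
            environ['HTTP_X_REMOTE_USER'] = user

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the status back so a 401 can be intercepted.
            # (The legacy WSGI write() callable is not supported here.)
            captured['status'] = status
            captured['headers'] = headers

        body = self.app(environ, capture)

        if captured['status'].startswith('401'):
            if hasattr(body, 'close'):
                body.close()
            # Start the login process, remembering where the user was going.
            path = quote(environ.get('PATH_INFO', '/'), safe='')
            location = self.login_url + '?came_from=' + path
            start_response('302 Found', [('Location', location)])
            return [b'']

        # Everything else, including 403 Forbidden, passes through untouched.
        start_response(captured['status'], captured['headers'])
        return body
```

Keeping the login page itself out of the middleware matches the point made above: the intermediary only needs to handle the redirect and the header.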

While some applications allow for this kind of authentication scheme, many do not. However, the scheme is general enough that I think it is justifiable that applications could be patched to work like this.

This handles shared authentication, but the only information handed around is a username. Information about the user (the real name, email, homepage, permission roles, etc.) is not shared in this model.

You could add something like an internal location to the username. E.g.: X-Remote-User: bob; info_url=http://mysite.com/users/bob.xml. It would be the application’s responsibility to make a subrequest to fetch that information. This can be somewhat inefficient, though with appropriate caching perhaps it would be fine. But many applications want very much to have a complete record of all users. Changing this is likely to be much harder than changing the authentication scheme. A more feasible system might be something on the order of what is described in Indexing the Entire Site: provide a complete listing of the site as well as events when users are created, updated, or deleted, and allow applications to maintain their own private but synced databases of users.
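On the application side, reading such a header could be as simple as the sketch below. The header format is only the one suggested above, not an existing standard, and the function name parse_remote_user is an assumption of mine; fetching and caching info_url is left out:

```python
def parse_remote_user(value):
    """Split 'bob; info_url=http://...' into ('bob', {'info_url': 'http://...'})."""
    parts = [part.strip() for part in value.split(';')]
    username = parts[0]
    params = {}
    for part in parts[1:]:
        if '=' in part:
            # Split on the first '=' only, so URLs with query strings survive.
            key, _, val = part.partition('=')
            params[key.strip()] = val.strip()
    return username, params


username, params = parse_remote_user(
    'bob; info_url=http://mysite.com/users/bob.xml')
print(username, params.get('info_url'))
```

An application that only cares about the username can ignore the parameters entirely, which keeps the extra information optional.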

A common permission system is another level of integration. One way of handling this would be if applications had a published set of actions that could be performed, and the person integrating the application could map actions to roles/groups on the system.

Cross-cutting Functionality

This item requires a bit of explanation. This is functionality that cuts across multiple parts of the site. An example might be comments, where you want a commenting system to be applicable to a variety of entities (though probably not all entities). Or you might want page-update notification, or to provide a feed of changes to the entity.

You might also want to include some request logger like Google Analytics to all pages, but this is already handled well by Deliverance theming. Deliverance’s aggregation handles universal content well, but it doesn’t handle content (or subrequests) that should only be present in a portion of pages.

One possible way to address this is transclusion, where a page can specifically request some other resource to be included in the page. A simple subrequest could accomplish this, but many applications make it relatively easy to include some extra markup (e.g., by editing their templates) but not so easy to do something like a subrequest. We’ve written a product Transcluder to use an HTML format to indicate transclusion.

It’s also possible using Deliverance that you could implement this functionality without any application modification, though it means added configuration — an application written to be inserted into a page via Deliverance, and a Deliverance rule that plugs everything together (but if written incorrectly would have to be debugged).

Other Conventions

In addition to this, other platform-like conventions would make the life of the integrator much easier.

Template Customization

While Deliverance handles the look-and-feel of a page, it leaves the inner chunk of content to the application. If you want to tweak something small you will still need to customize the template of the application.

It would be wonderful if applications could report on what files were used in the construction of a request, and used a common search path so you could easily override those files.

Backups and Other Maintenance

Process management can be handled by something like Supervisor, and maybe in the future Deliverance will even embed Supervisor.

But even then, regular backups of the system are important. Typically each application has its own way of producing a backup. Conventions for producing backups would be ideal. Additional conventions for restoring backups would be even better.

Many systems also require periodic maintenance — compacting databases, checking for any integrity problems, etc. Some unified cron-like system might be handy, though it’s also workable for applications to handle this internally in whatever ad hoc way seems appropriate.

Common Error Reporting

With a system where one of many components can fail, it’s important to keep track of these problems. If errors just end up in one of 10 log files, it’s unlikely anyone is closely tracking them.

One product we’re working on to help with this is ErrorEater, which works along with Supervisor. Applications have to be modified to emit errors in a specific format that Supervisor understands, but this is generally not too difficult.

Farming

Application farming is when one instance of an application can support many "sites". These might be sites with their own domains, or just distinct projects. Examples are Trac, which supports multiple projects in one instance, or WordPress MU which supports many WordPress instances running off a single database and code base.

It would be nice if you could add a simple header to a request, like X-Project-Name: foo and that would be used by all these products to select the site (or sub-site or project or any other organization unit). Then mapping domain names, paths, or other aspects of a request to the project could be handled once and the applications could all consistently consume it.
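A minimal sketch of the "handled once" part, as WSGI middleware in Python 3. The class name ProjectSelector and the host-to-project mapping are hypothetical; the only thing taken from the text is the X-Project-Name header:

```python
class ProjectSelector(object):
    """Map the request's Host to a project and pass it on as X-Project-Name."""

    def __init__(self, app, host_to_project):
        self.app = app
        # Hypothetical configuration, e.g. {'foo.example.org': 'foo'}
        self.host_to_project = host_to_project

    def __call__(self, environ, start_response):
        environ = dict(environ)
        # Drop any value supplied by the client before setting our own.
        environ.pop('HTTP_X_PROJECT_NAME', None)
        host = environ.get('HTTP_HOST', '').split(':')[0].lower()
        project = self.host_to_project.get(host)
        if project is not None:
            environ['HTTP_X_PROJECT_NAME'] = project
        return self.app(environ, start_response)


def show_project(environ, start_response):
    """Stand-in for a farmed application that reads the header."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ.get('HTTP_X_PROJECT_NAME', '(none)').encode('utf-8')]


app = ProjectSelector(show_project, {'foo.example.org': 'foo'})
```

Path-based or any other mapping would slot into the same place; the applications behind it only ever look at the one header.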

(Internally for openplans.org we’re using X-OpenPlans-Project and custom patches to several projects to support this, but it’s all ad hoc.)

Footnotes

[*]	This isn’t entirely true, Deliverance internally uses WSGI which is a Python-level abstraction of HTTP calls.

[†]	At different times in the past, in an experimental branch right now, and potentially integrated in the future, Deliverance has been compiled down to XSLT rules. So Deliverance could even be seen as a simple transformation language that compiles down to XSLT.

2008 10 06

HTML
Programming
Python
Web


The Poverty Of Our National Debate

We had a debate party tonight for the Biden-Palin debate. It’s nice to watch it in a group of like-minded people. Taking the Democrat/Republican debate seriously is a bullshit game and I don’t have any desire to bring this farce into my normal life.

After the debate was over, I wanted to discuss the debate. After all, it’s weird to watch something for an hour and a half and then just ignore that we spent that time watching it. The problem is that I hate the punditry. No one actually said "did Palin do what she had to do?" (I probably would have screamed) but it’s just really hard not to talk about "what will people think of this debate?" And part of that is because we all know what we think. We saw through Palin deliberately ignoring the questions and reading her already-prepared speech. We all had a basic understanding of what is fact and what is a lie or misrepresentation. It’s nice to share little stories (like stories from the article about how McCain is a jerk). But it’s so damn hard not to fall into a discussion about the horserace, about what other people will think. Why is it so hard to talk about what we think? Not what we analyze, but what we actually believe? Instead of predicting something that will come to pass regardless of our predictions, shouldn’t we be developing our own beliefs? That seems far more relevant to our lives.

There’s probably a lot of reasons for that. It’s intimidating to be entirely genuine, to speak without irony. And all the news is about the horserace, so we are all well informed, it makes it easy to talk.

I think a large part of the problem is that the spectrum of opinions is so narrow (even if also bifurcated) that it’s hard to have an interesting discussion of political issues. Lacking anything of real substance to discuss, we discuss the discussion, we make predictions instead of forming real opinions. While I’m willing to blame many things on the Republicans, this is the product of both parties, of the narrow ignorance of "conventional wisdom." For instance, the debate about the economic bailout has been rich with rhetoric but starved of any real ideas. I didn’t even realize how limited the debate was until I listened to this interview where Steve Fraser kind of says, well, we can do whatever we want. That is to say, we can actually make collective decisions about the direction of our economy, instead of the impotent position that is assumed in all current debates, where we can only poke lightly at the economy (and it’s implied anything more would destroy it).

We can’t really talk about what kind of healthcare system we’d like, because the system nearly everyone wants is not an acceptable part of conventional wisdom. Socialized healthcare is the only reasonable option, but of course there’s lots of ways it could work, there’s lots of room for genuine and important discussion. But instead we have a staggeringly horrible proposal, and a merely not quite as bad as the current situation proposal. Given this set of options you can’t have real discussion.

In the end our own happiness is mostly in our own hands. The choices we make for ourselves are more significant than the choices made by the government (the choices we make collectively). But our collective choices do matter. We certainly haven’t figured out happiness. And maybe government does best when it has the least effect on our lives, but while that’s one end of the bifurcated conventional wisdom, as an idea it remains largely uninspected. When I consider many of the pleasant conveniences in my life, government is part of a lot of them. It doesn’t do much to make me more spiritually fulfilled, but the idea that government is a hopeless place to look for our collective happiness is a truism that lacks real consideration.

Political discussion is stuck in a terrible intellectual rut. Blame falls equally on both parties. They hold on greedily to their monopoly of political thought. It’s like religious doctrine, something to which politicians must submit before being allowed to progress, a sign of submission to a larger system of power. I have this hope that Obama is going through the rites with discipline but without true belief, that he is being subversive, diving straight to the belly of the beast. But this is only speculation, perhaps a naive dream, a desire to project my hopes onto a figure of vague and general hope.

I don’t really want to spend too much time discussing all the things that are wrong. This is the depressing comfort zone of the left. I want to talk about how things could be right, about how we can make a world that isn’t just less unjust but a world that is more beautiful, more wonderful, more full of life and freedom and passion. I want to exult in the potential of the future.

2008 10 03

Non-technical
Politics


pyinstall pybundles

Update: pyinstall has been renamed to pip (per the suggestions in the comments to this post)

I added pybundles to pyinstall very shortly before I announced pyinstall. I hadn’t actually tried it out that much. Since then I’ve made three more minor releases of pyinstall, and I think the bundle support is working pretty decently.

A .pybundle file is just a bunch of source code, all the source code you need to install some package(s). For instance, for Deliverance I’ve created a bundle file so you can do:

$ easy_install pyinstall virtualenv
$ pyinstall.py -E DeliveranceTest/ \
>     http://deliverance.openplans.org/dist/Deliverance-snapshot-latest.pybundle

This creates a virtualenv environment in DeliveranceTest/, unpacks all the source from the bundle, and installs all of it. It’s not magical — it still has to compile the source and move the files around — but it does mean just a single download, and the versions of everything that is installed aren’t going to change unless that bundle file is regenerated.

I’ve been thinking about some other features for pybundles, like post installation scripts. All of this has raised a problem though: pyinstall needs to be a two-level command, with commands like:

pyinstall.py install X
pyinstall.py bundle Y
pyinstall.py freeze req.txt
and of course the not-yet-implemented:
pyinstall.py remove Z

But pyinstall.py install does not read well. It’s not too late to rename the package (yet again), or just rename the script. Ideas?

2008 10 01

Python

