HTTP(ish) all the way down

So what would a RESTish web framework look like? I don't see much point to focusing on PUT and DELETE or URI design, simply because it doesn't achieve anything new. That stuff can lead to non-browser architectures, which means that there will have to be an intermediary, and that intermediary will need a web framework, and the premise here is that we're thinking about web frameworks.

Maybe it's better to just think about the goals of REST... loosely coupled, less code, decentralized, scalable, maybe even vaguer ideas like democratic, hackable, avoiding consulting-ware, etc. Small pieces.

So... Anders Pearson posted about his RESTful stack, a concrete set of actual small pieces. It's not one beautiful diamond, but just stuff. Hopefully useful. The architecture is interesting when thinking about them working in concert, not reflecting on the beauty of any one piece. Many of the pieces are incredibly boring, in fact. Tagging, okay, but a RESTful hash table? (Well, he's not the only one doing that -- but clearly it takes a bit of imagination to see the potential; the queue stuff Amazon is doing is also pretty abstract.)

What's interesting -- and hard -- is doing something with this mix. How will we keep the architecture manageable?

The direction I've been focused on for a while now is how to use this style internally in an application. Because yes, you may want to move some RESTful piece of your app onto another server, onto another service, into another environment. But first you have to make something useful -- scaling tools is boring, and for most of us scalability isn't even the reason for REST. Only working web applications have to scale.

So how can you handle a simple application (like, say, a blog) built from a dozen RESTful components? Lots of mod_rewrite rules, several startup scripts, some XML configuration, a couple app servers... oh just shoot me now. Setting up systems gets geometrically more annoying and error-prone as you increase the number of systems working in concert. Probably half of why people like the ASP model is just because installation is so damned hard. You could just use S3 from the beginning... but I find that a non-starter as well, when the distributed hash table is just one piece among many. And then if you think about automated testing... ASP-only just isn't an option.

But you can manage these pieces. That's what the small apps post was about. And that's what the architecture of Paste is about. I say the "architecture", because Paste includes several examples of the kind of small application/component that you would use in this situation. By sticking to WSGI -- kind of an HTTP-in-Python -- any piece can be fully a peer, and Paste itself doesn't have a privileged position. A larger system of these kinds of components is in the works, like Beaker for sessions and caching. For an example of how small an application might be, in Paste there's an application for sending a single file (paste.fileapp.FileApp), or an application for sending 404 messages (httpexceptions.HTTPNotFound.wsgi_application). It sounds tedious, but those applications are as simple to invoke as they are to describe (at least once you have the WSGI invocation part down).
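To give a feel for how small such an application can be, here is a minimal sketch in plain WSGI. It is not Paste's code: `make_file_app` and `not_found_app` are hypothetical names that only illustrate the idea behind a single-file application and a 404 application, written in present-day Python 3 with no dependencies.

```python
def make_file_app(path, content_type="text/plain"):
    """Return a WSGI app that always sends the file at `path` (hypothetical helper)."""
    def file_app(environ, start_response):
        # Read the whole file; fine for a sketch, not for large files.
        with open(path, "rb") as f:
            body = f.read()
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    return file_app


def not_found_app(environ, start_response):
    """A WSGI app whose only job is to answer 404."""
    body = b"Not Found\n"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Each piece is just a callable taking `environ` and `start_response`, which is what lets any of them act as a peer of any other.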

In addition to these endpoints, there are lots of intermediaries that rely on the transparency of the request. These get called "middleware"... but ignore the terminology. An example is code that checks IP addresses against rules (paste.auth.grantip), or validates HTML (paste.debug.wdg_validate). The line between intermediaries and end-point applications is even fuzzy -- paste.auth.open_id does a little of both, annotating some requests and completely intercepting others (since OpenID logins span several requests). In the context of WSGI, authentication is solved. It goes in REMOTE_USER (recognizing the standards we already have, even if those standards are sometimes undervalued due to poor implementations in the past). How's REMOTE_USER get there? Out of scope! (Like REST, WSGI is as much about what it doesn't promise as what it does promise.)
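As a rough illustration of such an intermediary, the sketch below wraps an application and fills in REMOTE_USER based on the client address. It is not the interface of paste.auth.grantip; `grant_by_ip`, `whoami_app` and the address table are assumptions made up for this example.

```python
def grant_by_ip(app, ip_to_user):
    """Wrap `app`; set REMOTE_USER when the client address is in `ip_to_user`.

    Hypothetical middleware, not Paste's actual implementation.
    """
    def middleware(environ, start_response):
        user = ip_to_user.get(environ.get("REMOTE_ADDR"))
        if user is not None:
            environ["REMOTE_USER"] = user
        # The request passes through otherwise untouched.
        return app(environ, start_response)
    return middleware


def whoami_app(environ, start_response):
    """End-point app: it only reads REMOTE_USER and never asks how it got there."""
    user = environ.get("REMOTE_USER", "anonymous")
    body = ("hello, %s\n" % user).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


# Example wiring; the address and user name are made up.
app = grant_by_ip(whoami_app, {"127.0.0.1": "admin"})
```

The wrapped application stays ignorant of the authentication mechanism, so swapping the IP check for something else should not require touching it.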

Paste Deploy is then a way to bring these pieces together and configure them, since a website will be built from a bunch of pieces. The actual invocation looks rather primitive, but the important part is that configuration is another syntax for Python function calls. So when you build a cohesive application from these pieces, you put them all in a single process and build them at fixed locations. You don't configure them independently; instead you provide a single cohesive view of configuration, and programmatically configure the sub-pieces.
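The idea that configuration is just another syntax for function calls can be sketched without Paste Deploy at all. Everything below is an assumption for illustration: the `CONFIG` layout, `build_site`, and the prefix dispatcher are hypothetical and do not reflect Paste Deploy's actual file format or API.

```python
def make_greeting_app(greeting="hello"):
    """Factory: keyword arguments are this component's configuration."""
    def app(environ, start_response):
        body = (greeting + "\n").encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [body]
    return app


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found\n"]


def make_prefix_dispatcher(mounts, fallback):
    """Send each request to the app mounted at the longest matching prefix."""
    ordered = sorted(mounts.items(), key=lambda item: len(item[0]), reverse=True)

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        for prefix, app in ordered:
            if path == prefix or path.startswith(prefix + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path[len(prefix):]
                return app(environ, start_response)
        return fallback(environ, start_response)
    return dispatcher


# One cohesive view of configuration for the whole site (made-up values).
CONFIG = {
    "/blog": {"factory": make_greeting_app, "greeting": "blog here"},
    "/tags": {"factory": make_greeting_app, "greeting": "tags here"},
}


def build_site(config):
    """Turn each config entry into a plain function call: factory(**settings)."""
    mounts = {}
    for prefix, settings in config.items():
        settings = dict(settings)
        factory = settings.pop("factory")
        mounts[prefix] = factory(**settings)
    return make_prefix_dispatcher(mounts, not_found)


site = build_site(CONFIG)
```

All the sub-pieces live in one process at fixed locations, and the only thing an operator edits is `CONFIG`.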

Another issue with a highly granular REST stack is the HTTP overhead. This may be a case of premature optimization, but latency in particular is something that might bite you later. But because WSGI maps so closely to HTTP, you can make equivalent REST calls purely with WSGI and no network connection; this just turns into some function calls. Later on you can break it up, scale it out, configure pieces to be in different locations. If you need to... and you probably won't, and you may not know which pieces should be broken off to begin with.
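A minimal sketch of what "purely with WSGI and no network connection" might look like, using only the standard library's `wsgiref`. The helper name `call_in_process` and the toy `tags_app` are assumptions for this example, not part of Paste.

```python
import io
from wsgiref.util import setup_testing_defaults


def call_in_process(app, path, method="GET", body=b""):
    """Invoke a WSGI app directly and return (status, headers, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)

    captured = {}
    written = []  # collects data sent through the legacy write() callable

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers
        return written.append

    result = app(environ, start_response)
    try:
        data = b"".join(written) + b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], data


def tags_app(environ, start_response):
    """Toy component standing in for a small RESTful service."""
    body = ("tags for %s\n" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


status, headers, data = call_in_process(tags_app, "/item/foo/")
```

The "request" here costs a couple of function calls; the same component could later sit behind a real HTTP server without changing its code.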

Anyway, that's why I think WSGI and Paste have some potential to work well with REST; not because they respect some idea of the purity of HTTP, but because they facilitate realistic architectures that are highly decoupled, with HTTP (or a looks-like-HTTP WSGI call) bringing those pieces together.

Created 26 Apr '06

Comments:

Concur, this is the idea that finally made me buy into WSGI a while back: lots of modular components talking (like) HTTP up and down, perhaps minimally functional on their own or just web-like intermediaries.

In a (yet another) small side project that will probably never get beyond the barely functional prototype let alone see the outside world, I'm experimenting with implementing a REST-like interface to a relational database as a WSGI app, outputting simplistic XHTML representations of the data (rows/records), and then I have a minimal intermediary that queries this app to combine/restrict datasets or follow link references for parent-detail display. If I go much further in splitting things out (separate apps for each relational algebra operator?) I'll probably end up with a great example of what NOT to do with WSGI, but it's liberating in some ways.

Definitely some mental mismatch at this point in doing all subprocessing against the HTML output of upstream apps, but the less extreme cases for compositing miniapps using WSGI are completely right in my mind now.

# Luke Opperman

Clark Evans, who has been a Paste contributor, was using it for an application exactly like you describe -- SQL over HTTP. It's not open source, but I thought he was trying to get it open sourced... but I don't know if that worked out ultimately.

# Ian Bicking

Yeah, configuration/deployment and testing have been the hardest problems I've come up against with this architecture and ones that I don't feel like I have a good solution to yet.

The WSGI/Paste approach is interesting. Unfortunately, my apps still need to work in a very heterogeneous environment so if I put everything into just a WSGI stack, I lose the ability to reuse it across languages as easily. Unified configuration is tempting enough that I might go for it sometime though.

The idea of building your app as a bunch of WSGI components and later breaking them out into separate REST apps as needed has crossed my mind as well. My thought on it was to go the other way though and have a library and registry that abstracts the difference away. So you would build all your components as WSGI components but call them through a fully HTTP looking API. Then, there would be a registry somewhere that knows which urls map to actual separate applications and which can be silently converted to WSGI calls. So you do something like:

from magic_REST_WSGI_library import GET
tags = GET("http://tasty.example.com/item/foo/")

and a registry that magic_REST_WSGI_library reads knows that "tasty.example.com" can really be mapped to a WSGI component and called in process. That way, when that component does need to be moved out to its own machine or written in a different language or something, no code has to be changed; just an update to the registry telling it that requests to that service now have to be proper HTTP requests. It's sort of a "have your cake and eat it too" approach. You get the flexibility and loose coupling of REST components but, if you happen to write all your components in Python and keep them WSGI compatible, you can keep the performance and centralized configuration of having them all in-process. Maybe this is something that Paste could help with.
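One way such a registry might be sketched with only the standard library is shown below. The library described above is hypothetical, so every name here (`LOCAL_APPS`, `register`, `GET`, `tasty_app`) is an assumption; hosts without a registered app fall through to a real HTTP request via `urllib`.

```python
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen
from wsgiref.util import setup_testing_defaults

# Hypothetical registry: host name -> WSGI app living in this process.
LOCAL_APPS = {}


def register(host, app):
    LOCAL_APPS[host] = app


def GET(url):
    """Fetch `url`, in process if its host is registered, over HTTP otherwise."""
    parts = urlsplit(url)
    app = LOCAL_APPS.get(parts.netloc)
    if app is None:
        # Not registered: make a proper HTTP request.
        with urlopen(url) as response:
            return response.read()

    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "REQUEST_METHOD": "GET",
        "HTTP_HOST": parts.netloc,
        "SERVER_NAME": parts.hostname or "",
        "PATH_INFO": unquote(parts.path),
        "QUERY_STRING": parts.query,
        "wsgi.url_scheme": parts.scheme,
    })
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)
        return lambda data: None  # legacy write() is ignored in this sketch

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()
    if not statuses[0].startswith("2"):
        raise RuntimeError("%s returned %s" % (url, statuses[0]))
    return body


def tasty_app(environ, start_response):
    """Stand-in for the tagging component."""
    body = ("tags for %s\n" % environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]


register("tasty.example.com", tasty_app)
tags = GET("http://tasty.example.com/item/foo/")
```

Moving the component to its own machine would then amount to dropping its `register` call, with callers left unchanged.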

The other observation I have on performance of applications built up out of small REST applications is that asynchronous requests are your friend. Latency does add up quickly and the way to avoid it is to avoid synchronous communication any time you can. Looking at how Erlang applications are structured is advisable.

# anders

The WSGI/Paste approach is interesting. Unfortunately, my apps still need to work in a very heterogeneous environment so if I put everything into just a WSGI stack, I lose the ability to reuse it across languages as easily. Unified configuration is tempting enough that I might go for it sometime though.

You'd still be making HTTPish calls between applications. WSGI (and even Paste) doesn't relate to that; it just makes it more manageable when you have a bunch of WSGI/Paste pieces. Well, WSGI does give you a backdoor to make requests on behalf of the original user, since it has a slightly larger concept of a request than what can be embedded in HTTP, including things like trusted attributes (keys that don't start with HTTP_).

Of course, you can opt not to use these things and stick to what can be represented in HTTP. But it can be very tempting.

And of course, middleware is breaking out of what HTTP can give you. Though it'd be neat if you could run PHP with FastCGI under a WSGI stack.

Anyway, there's nothing stopping you from sticking to language-neutral constructs when using WSGI, and when WSGI-specific constructs get used it's fairly explicit so you know what you are getting into.

The idea of building your app as a bunch of WSGI components and later breaking them out into separate REST apps as needed has crossed my mind as well. My thought on it was to go the other way though and have a library and registry that abstracts the difference away. So you would build all your components as WSGI components but call them through a fully HTTP looking API. Then, there would be a registry somewhere that knows which urls map to actual separate applications and which can be silently converted to WSGI calls.

Titus' WSGI intercept does just this transparently for anything using urllib or urllib2.

For something that requires less configuration and is probably a bit faster of a path, paste.recursive offers a model where you phrase it as a WSGI request. By adding to that, you could translate the request into an actual HTTP request if it was outside of the WSGI stack. paste.recursive isn't exactly the right match (and paste.proxy isn't exactly right for making a WSGI request into an HTTP request), but it would be very close to that. You wouldn't need configuration, because by keeping note of the request on the way in you can automatically determine where the root of the WSGI server is. Though it could be tricky if you have fancy dispatching, like virtual host dispatching, which means that the server is capable of internally responding to more requests than you might think.

The other observation I have on performance of applications built up out of small REST applications is that asynchronous requests are your friend.

I like the idea of using callbacks more, though I haven't tried it in practice. Something like "do this, then when you are done POST to this URL with the results". Of course, many UIs need synchronous actions (people like to know that when they do something it is actually done), so you can't solve all latency problems this way.

# Ian Bicking

It's somewhat off-topic, but I've been trying to figure out how to do simple, RESTful, yet secure authentication for XML-RPC.

Take, for example, the Blogger API. It sends passwords in plaintext with each API hit. RESTful, but not secure. You can make it more secure with a back-and-forth digest authentication model, but then it's not really RESTful.

Thoughts on that?

# Ken Kinder

FWIW, the RESTful hashtable (pita) is pretty handy for building stateful flash apps - that is, thicker clients that want to be able to save/load, and just stash their per-user state someplace. Slap on some standard metadata, authentication policy, and data-pocket mgmt and you have yourself a really useful mini-app.

No, this would not be used for lots and lots of frequent hashtable lookups within a program.

While we are on the topic, I have also written a post on why this design is important for people concerned with free culture.

# Jonah

Ian Bicking: the old part of his blog
