Ian Bicking: a blog :: 2011

A Python Web Application Package and Format (we should make one)

At PyCon there was an open space about deployment, and the idea of drop-in applications (Java-WAR-style).

I generally get pessimistic about 80% solutions, and dropping in a WAR file feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer (which I think is specifically a project that got WARs on people’s minds), and in a lot of ways that installer is nice, but it’s also kind of wonky, it makes configuration unclear, it’s not always clear when it installs or configures itself through the web, and when you have to do this at the system level, nor is it clear where it puts files and data, etc. So a great initial experience doesn’t feel like a great ongoing experience to me — and it doesn’t have to be that way. If those were necessary compromises, sure, but they aren’t. And because we don’t have WAR files, if we’re proposing to make something new, then we have every opportunity to make things better.

So the question then is what we’re trying to make. To me: we want applications that are easy to install, that are self-describing, self-configuring (or at least guide you through configuration), reliable with respect to their environment (not dependent on system tweaking), upgradable, and respectful of persistence (the data that outlives the application install). A lot of this can be done by the "container" (to use Java parlance; or "environment") — if you just have the app packaged in a nice way, the container (server environment, hosting service, etc) can handle all the system-specific things to make the application actually work.

At which point I am of course reminded of my Silver Lining project, which defines something very much like this. Silver Lining isn’t just an application format, and things aren’t fully extracted along these lines, but it’s pretty close and it addresses a lot of important issues in the lifecycle of an application. To be clear: Silver Lining is an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time. It is a bunch of things, maybe too many things, so it is not unreasonable to pick out a smaller subset to focus on. Maybe an easy place to start (and good for Silver Lining itself) would be to separate at least the application format (and tools to manage applications in that state, e.g., installing new libraries) from the tools that make use of such applications (deploy, etc).

Some opinions I have on this format, exemplified in Silver Lining:

It’s not zipped or a single file, unlike WARs. Uploading zip files is not a great API. Geez. I know there’s this desire to "just drop in a file"; but there’s no getting around the fact that "dropping a file" becomes a deployment protocol and it’s an incredibly impoverished protocol. The format is also not subtly git-based (ala Heroku) because git push is not a good deployment protocol.
But of course there isn’t really any deployment protocol inferred by a format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool that deploys should take as an argument a directory, not a single file. (If the tool then zips it up and uploads it, fine!)
Configuration "comes from the outside". That is, an application requests services, and the container tells the application where those services are. For Silver Lining I’ve used environmental variables. I think this one point is really important — the container tells the application. As a counter-example, an application that comes with a Puppet deployment recipe is essentially telling the server how to arrange itself to suit the application. This will never be reliable or simple!
The application indicates what "services" it wants; for instance, it may want to have access to a MySQL database. The container then provides this to the application. In practice this means installing the actual packages, but also creating a database and setting up permissions appropriately. The alternative is never having any dependencies, meaning you have to use SQLite databases or ad hoc structures, etc. But in fact installing databases really isn’t that hard these days.
All persistence has to use a service of some kind. If you want to be able to write to files, you need to use a file service. This means the container is fully aware of everything the application is leaving behind. All the various paths an application should use are given in different environmental variables (many of which don’t need to be invented anew, e.g., $TMPDIR).
It uses vendor libraries exclusively for Python libraries. That means the application bundles all the libraries it requires. Nothing ever gets installed at deploy-time. This is in contrast to using a requirements.txt list of packages at deployment time. If you want to use those tools for development that’s fine, just not for deployment.
There is also a way to indicate other libraries you might require; e.g., you might lxml, or even something that isn’t quite a library, like git (if you are making a github clone). You can’t do those as vendor libraries (they include non-portable binaries). Currently in Silver Lining the application description can contain a list of Ubuntu package names to install. Of course that would have to be abstracted some.
You can ask for scripts or a request to be invoked for an application after an installation or deployment. It’s lame to try to test if is-this-app-installed on every request, which is the frequent alternative. Also, it gives the application the chance to signal that the installation failed.
It has a very simple (possibly/probably too simple) sense of configuration. You don’t have to use this if you make your app self-configuring (i.e., build in a web-accessible settings screen), but in practice it felt like some simple sense of configuration would be helpful.

Things that could be improved:

There are some places where you might be encouraged to use routines from the silversupport package. There are very few! But maybe an alternative could be provided for these cases.
A little convention-over-configuration is probably suitable for the bundled libraries; silver includes tools to manage things, but it gets a little twisty. When creating a new project I find myself creating several .pth files, special customizing modules, etc. Managing vendor libraries is also not obvious.
Services are IMHO quite important and useful, but also need to be carefully specified.
There’s a bunch of runtime expectations that aren’t part of the format, but in practice would be part of how the application is written. For instance, I make sure each app has its own temporary directory, and that it is cleared on update. If you keep session files in that location, and you expect the environment to clean up old sessions — well, either all environments should do that, or none should.
The process model is not entirely clear. I tried to simply define one process model (unthreaded, multiple processes), but I’m not sure that’s suitable — most notably, multiple processes have a significant memory impact compared to threads. An application should at least be able to indicate what process models it accepts and prefers.
Static files are all convention over configuration — you put static files under static/ and then they are available. So static/style.css would be at /style.css. I think this is generally good, but putting all static files under one URL path (e.g., /media/) can be good for other reasons as well. Maybe there should be conventions for both.
Cron jobs are important. Though maybe they could just be yet another kind of service? Many extra features could be new services.
Logging is also important; Silver Lining attempts to handle that somewhat, but it could be specified much better.
Silver Lining also supports PHP, which seemed to cause a bit of stress. But just ignore that. It’s really easy to ignore.

There is a description of the configuration file for apps. The environmental variables are also notably part of the application’s expectations. The file layout is explained (together with a bunch of Silver Lining-specific concepts) in Development Patterns. Besides all that there is admittedly some other stuff that is only really specified in code; but in Silver Lining’s defense, specified in code is better than unspecified ;) App Engine provides another example of an application format, and would be worth using as a point of discussion or contrast (I did that myself when writing Silver Lining).

Discussing WSGI stuff with Ben Bangert at PyCon he noted that he didn’t really feel like the WSGI pieces needed that much more work, or at least that’s not where the interesting work was — the interesting work is in the tooling. An application format could provide a great basis for building this tooling. And I honestly think that the tooling has been held back more by divergent patterns of development than by the difficulty of writing the tools themselves; and a good, general application format could fix that.

Javascript on the server AND the client is not a big deal

All the cool kids love Node.js. I’ve used it a little, and it’s fine; I was able to do what I wanted to do, and it wasn’t particularly painful. It’s fun to use something new, and it’s relatively straight-forward to get started so it’s an emotionally satisfying experience.

There are several reasons you might want to use Node.js, and I’ll ignore many of them, but I want to talk about one in particular:

Javascript on the client and the server!

Is this such a great feature? I think not…

You only need to know one language!

Sure. Yay ignorance! But really, this is fine but unlikely to be relevant to any current potential audience for Node.js. If you are shooting for an very-easy-to-learn client-server programming system, Node.js isn’t it. Maybe Couch or something similar has that potential? But I digress.

It’s not easy to have expertise at multiple languages. But it’s not that hard. It’s considerably harder to have expertise at multiple platforms. Node.js gives you one language across client and server, but not one platform. Node.js programming doesn’t feel like the browser environment. They do adopt many conventions when it’s reasonable, but even then it’s not always the case — in particular because many browser APIs are the awkward product of C++ programmers exposing things to Javascript, and you don’t want to reproduce those same APIs if you don’t have to (and Node.js doesn’t have to!) — an example is the event pattern in Node, which is similar to a browser but less obtuse.

You get to share libraries!

First: the same set of libraries is probably not applicable. If you can do it on the client then you probably don’t have to do it on the server, and vice versa.

But sometimes the same libraries are useful. Can you really share them? Browser libraries are often hard to use elsewhere because they rely on browser APIs. These APIs are frequently impossible to implement in Javascript.

Actually they are possible to implement in Javascript using Proxies (or maybe some other new and not-yet-standard Javascript features). But not in Node.js, which uses V8, and V8 is a pretty conservative implementation of the Javascript language. (Update: it is noted that you can implement proxies — in this case a C++ extension to Node)

Besides these unimplementable APIs, it is also just a different environment. There is the trivial: the window object in the browser has a Node.js equivalent, but it’s not named window. Performance is different — Node has long-running processes, the browser might. Node can have blocking calls, which are useful even if you can’t use them at runtime (e.g., require()); but you can’t really have any of these at any time on the browser. And then of course all the system calls, none of which you can use in the browser.

All these may simply be surmountable challenges, through modularity, mocking, abstractions, and so on… but ultimately I think the motivation is lacking: the domain of changing a live-rendered DOM isn’t the same as producing bytes to put onto a socket.

You can work fluidly across client and server!

If anything I think this is dangerous rather than useful. The client and the server are different places, with different expectations. Any vagueness about that boundary is wrong.

It’s wrong from a security perspective, as the security assumptions are nearly opposite on the two platforms. The client trusts itself, and the server trusts itself, and both should hold the other in suspicion (though the client can be more trusting because the browser doesn’t trust the client code).

But it’s also the wrong way to treat HTTP. HTTP is pretty simple until you try to make it simpler. Efforts to make it simpler mostly make it more complicated. HTTP lets you send serialized data back and forth to a server, with a bunch of metadata and other do-dads. And that’s all neat, but you should always be thinking about sending information. And never sharing information. It’s not a fluid boundary, and code that touches HTTP needs to be explicit about it and not pretend it is equivalent to any other non-network operation.

Certainly you don’t need two implementation languages to keep your mind clear. But it doesn’t hurt.

You can do validation the same way on the client and server!

One of the things people frequently bring up is that you can validate data on the client and server using the same code. And of course, what web developer hasn’t been a little frustrated that they have to implement validation twice?

Validation on the client is primarily a user experience concern, where you focus on bringing attention to problems with a form, and helping the user resolve those problems. You may be able to avoid errors entirely with an input method that avoids the problem (e.g., if a you have a slider for a numeric input, you don’t have to worry about the user inputing a non-numeric value).

Once the form is submitted, if you’ve done thorough client-side validation you can also avoid friendly server-side validation. Of course all your client-side validation could be avoided through a malicious client, but you don’t need to give a friendly error message in that case, you can simply bail out with a simple 400 Bad Request error.

At that point there’s not much in common between these two kinds of validation — the client is all user experience, and the server is all data integrity.

You can do server-side Javascript as a fallback for the client!

Writing for clients without Javascript is becoming increasingly less relevant, and if we aren’t there yet, then we’ll certainly get there soon. It’s only a matter of time, the writing is on the wall. Depending on the project you might have to put in workarounds, but we should keep those concerns out of architecture decisions. Maintaining crazy hacks is not worth it. There’s so many terrible hacks that have turned into frameworks, and frameworks that have justified themselves because of the problems they solved that no longer matter… Node.js deserves better than to be one of those.

In Conclusion Or Whatever

I’m not saying Node.js is bad. There are other arguments for it, and you don’t need to make any argument for it if you just feel like using it. It’s fun to do something new. And I’m as optimistic about Javascript as anyone. But this one argument, I do not think it is very good.

Ian Bicking: a blog

March 2011

A Python Web Application Package and Format (we should make one)

2011 03 31

Javascript on the server AND the client is not a big deal

You only need to know one language!

You can work fluidly across client and server!

You can do validation the same way on the client and server!

You can do server-side Javascript as a fallback for the client!

In Conclusion Or Whatever

2011 03 30

Home

About

Archives

Categories

Recent Posts

Recent Comments