Ian Bicking: a blog :: 2007

Tempita

I mentioned a templating language I put into Paste a while ago; since then I’ve extracted it into a separate package called Tempita. I think the documentation is fairly complete (it’s a small language), but I’ll describe it briefly here.

I wanted a text-substitution language, because I wanted something to be used to generate Python files, config files, etc. I also didn’t want a complex API, with search paths and components or something that interacts with import machinery, or any of that. string.Template is almost good enough, but not quite.
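To make the "almost good enough" point concrete, here is a minimal sketch of generating a config file with string.Template. The config keys and the render_config helper are made up for illustration. Substitution works fine, but there’s no conditional inside the template, so any "only include this line if..." logic has to be pushed out into Python:

```python
from string import Template

# Hypothetical config template; the keys are illustrative only.
CONFIG = Template('[server]\nhost = $host\nport = $port\n$debug_line')


def render_config(host, port, debug=False):
    """Render the config text with string.Template.

    string.Template only substitutes names, so the optional line
    is computed in Python and passed in as just another variable.
    """
    debug_line = 'debug = true\n' if debug else ''
    return CONFIG.substitute(host=host, port=port, debug_line=debug_line)


if __name__ == '__main__':
    print(render_config('localhost', 8080, debug=True))
```

That works, but the template no longer shows what the output looks like, which is the gap a small {{if}}-style language fills.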

I started with the idea of something vaguely like Django Templates, though I didn’t care about more advanced templating features like blocks, which didn’t apply to my use cases. You do variable substitution with {{var|filter}}, and there’s no escape character, and that’s about where the similarity ends.

I realized there was no real reason to use anything but {{...}}, so it’s just {{if expr}}, {{endif}}, etc. There’s an escape for arbitrary Python, similar to how Kid does it: you can have blocks of Python code, but the Python code can only prepare variables and functions, it can’t write anything. I think this gives a nice escape for complex logic (for times when you can’t put the logic in a .py file), without the jumbled mish-mash of languages like PHP where you can truly mix functions and output.

Because it allows Python expressions everywhere, special tags don’t seem so necessary. Instead you can just provide functions to do whatever you need. I wrote a couple little ones as a start. There’s a few things that are awkward still, because there’s no way to define a block of template as a function, or pass the output of a block to a function. I haven’t actually needed these yet, but I can imagine needing this (e.g., when creating nested structures).

I wouldn’t suggest using this templating language in a web application, but I think it can be quite helpful for all the cases where you have to generate text and you aren’t writing a web application (e.g., a Framework Component). In my experience the web templating languages tend to be complex to invoke and understand in these contexts (and Buffet unfortunately doesn’t help in my mind, as its loading system is so vague).

2007 08 06

Programming
Python

Atompub & OpenID

One of the things I would like to do is to interact with Atompub (aka Atom Publishing Protocol) stores in Javascript through the browser. Since this is effectively the browser itself interacting with the Atompub server, browser-like authentication methods would be nice. But services like Atompub don’t work nicely with the kinds of authentication methods that normal websites use. One of these is OpenID, which is particularly browser-focused.

From the perspective of a client, OpenID basically works like this:

You need to login. You tell the original server what your OpenID URL is, somehow.
The original server does some redirects, maybe some popups, etc.
Your OpenID server (attached to your OpenID URL) authenticates you in some fashion, and then tells the original server.
The original server probably sets a signed cookie so that in subsequent requests you stay logged in. You cannot do this little redirection dance for every request, since it’s actually quite intrusive.

So what happens when I have an XMLHttpRequest that needs to be authenticated? Neither the XMLHttpRequest nor Javascript generally can do the authentication. Only the browser can, with the user’s interaction.

One thought I have is a 401 Unauthorized response, with a header like:

WWW-Authenticate: Cookie location="http://original.server/login.html"

Which means I need to open up http://original.server/login.html and have the user log in, and the final result is that a cookie will be set. XMLHttpRequest sends cookies automatically I believe, so once the browser has the cookie then all the Javascript requests get the same cookie and hence authentication.
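On the server side, the idea could look something like the following WSGI middleware. This is only a sketch: the Cookie scheme is my proposal and not any standard, the cookie name auth is made up, and a real version would verify the cookie’s signature instead of just checking that it exists.

```python
from http.cookies import SimpleCookie

# The login page from the header example above.
LOGIN_URL = 'http://original.server/login.html'


def require_cookie(app, cookie_name='auth'):
    """Wrap a WSGI app; answer 401 with a Cookie challenge if not logged in.

    cookie_name is a hypothetical name. This sketch only checks that the
    cookie is present; it does not validate any signature.
    """
    def wrapper(environ, start_response):
        cookies = SimpleCookie(environ.get('HTTP_COOKIE', ''))
        if cookie_name in cookies:
            return app(environ, start_response)
        body = b'Login required\n'
        start_response('401 Unauthorized', [
            ('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body))),
            ('WWW-Authenticate', 'Cookie location="%s"' % LOGIN_URL),
        ])
        return [body]
    return wrapper
```

A Javascript client that sees the 401 would read the location parameter, open that page for the user, and retry once the cookie is set.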

One problem, though, is that you have to wait around for a while for the login to succeed, then continue on your way. A typical situation is that you have to return to the original page you were requesting, and people often do something like /login?redirect_to=original_url. In this case we might want something like /login?opener_call=reattempt_request, where when the login process is over we call window.opener.reattempt_request() in Javascript.

Maybe it would make sense for that location variable to be a URI Template, with some predefined variables, like opener, back, etc.
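As a rough sketch of what expanding such a template might involve, here is a tiny expander that only handles plain {name} variables. It is not an implementation of any URI Template specification. The query parameter names come from the examples above, and the expand_location function is hypothetical.

```python
import re
from urllib.parse import quote


def expand_location(template, variables):
    """Replace each {name} in template with its percent-encoded value.

    Unknown variables expand to an empty string. Only the simplest
    {name} form is handled; this is an illustration, not a full
    URI Template implementation.
    """
    def replace(match):
        value = variables.get(match.group(1), '')
        return quote(str(value), safe='')
    return re.sub(r'\{(\w+)\}', replace, template)


if __name__ == '__main__':
    template = ('http://original.server/login.html'
                '?opener_call={opener}&redirect_to={back}')
    print(expand_location(template, {'opener': 'reattempt_request',
                                     'back': '/entries/1?x=y'}))
```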

For general backward compatibility, would it be reasonable to send 307 Temporary Redirect plus WWW-Authenticate, and let XMLHttpRequests or other service clients sort it out, while normal browser requests do the normal login redirect?

Update: Another question/thought: is it okay to send multiple WWW-Authenticate headers, to give the client options for how it wants to do authentication? It seems vaguely okay, according to RFC 2616 14.47.
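If multiple headers are allowed, the client side of "pick the one you can handle" might look like this sketch. The pick_challenge function and the Basic realm value are assumptions for illustration, and it only handles one challenge per header value, not several comma-separated challenges inside a single header.

```python
def pick_challenge(header_values, preferred):
    """Choose an authentication challenge from several WWW-Authenticate values.

    header_values -- one string per WWW-Authenticate header received
    preferred     -- scheme names the client supports, best first
    Returns (scheme, params) for the first preferred scheme the server
    offered, or None if there is no overlap.
    """
    offered = {}
    for value in header_values:
        scheme, _, params = value.strip().partition(' ')
        # Scheme names are case-insensitive; keep the first of each.
        offered.setdefault(scheme.lower(), params.strip())
    for scheme in preferred:
        if scheme.lower() in offered:
            return scheme, offered[scheme.lower()]
    return None
```

A browser-based script would list Cookie first, while a command-line Atompub client might prefer Basic and ignore the Cookie challenge entirely.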

2007 08 06

Javascript
Programming
Web

Zonbu & S3

I read Edd Dumbill’s post on the Zonbu computer with interest. The Zonbu is a small and inexpensive computer, reminiscent of the Mac Mini but running Linux. The disk is fairly small (4Gb flash) and is intended to serve more as a cache for your network storage than as your primary store.

The network store is a frontend on Amazon S3. This is interesting but confusing, because Zonbu is selling the computer at a price of $99 if you agree to a two year contract for storage at $12.95 a month (about $300 over two years).

The underlying S3 storage is pretty cheap: $0.15 per Gb-month, and $0.10/$0.18 per Gb-upload/download (discounts for higher quantities, which probably Zonbu can get but an individual user couldn’t). So if you are storing, say, 10Gb of data, and retrieving about 10Gb per month (including all the syncing, cache misses, etc), that comes to about $3 per month. Zonbu costs between $0.50 and $0.20 per Gb-month, depending on the plan, and you pay for capacity, not what you actually use (S3 only charges for what you really use). I assume there are bandwidth limits but they aren’t published.
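The arithmetic behind those numbers, as a small sketch. Prices are the ones quoted above, kept in whole cents to avoid floating-point noise. Upload charges are left out because the example only counts storage and retrieval, and the function name is mine.

```python
# Prices quoted in the post, in cents.
S3_STORAGE_CENTS_PER_GB_MONTH = 15
S3_DOWNLOAD_CENTS_PER_GB = 18
ZONBU_PLAN_CENTS_PER_MONTH = 1295
CONTRACT_MONTHS = 24


def s3_monthly_cents(stored_gb, downloaded_gb):
    """Monthly S3 cost in cents for storage plus downloads (uploads ignored)."""
    return (stored_gb * S3_STORAGE_CENTS_PER_GB_MONTH
            + downloaded_gb * S3_DOWNLOAD_CENTS_PER_GB)


if __name__ == '__main__':
    s3 = s3_monthly_cents(10, 10)
    zonbu_total = ZONBU_PLAN_CENTS_PER_MONTH * CONTRACT_MONTHS
    print('S3, 10Gb stored + 10Gb retrieved: $%.2f/month' % (s3 / 100))
    print('Zonbu storage over the contract: $%.2f' % (zonbu_total / 100))
```

So the "about $3" is really $3.30, and the "about $300" is $310.80.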

As an aside, I was looking for backup systems for my dad a few months ago, and looked at some of the backup systems that included network storage. They were often in the range of $10-20 per month, and weren’t very high capacity. I came upon S3 Backup, which is a fairly simple Windows program to upload to S3. The price of S3 is way better than any of the other commercial solutions. The billing and account setup isn’t as simple as other systems (since it’s not intended to be), but this seems like something that should be fixed. There should be a consumer version of S3. It could make it easier for software developers to make services for people without actually having to maintain infrastructure. Or maybe more accurately, it would make this possible for open source developers, since we have no interest in being the intermediary for anything as that’s all liability with no payoff. (Or maybe it’s the opposite — only by being an intermediary can you get payoff? The economics of open source get confusing.)

Zonbu, as a device and company, appeals to me. But I can’t help but feel frustrated about the network storage pricing, even though those prices are completely reasonable (and it seems without draconian cancellation fees like mobile phones). Still there’s something about the equation that I just hate — loss leaders, unnecessarily intermediated transactions, hidden costs, and a price structure that depends on people not fully utilizing what they pay for. And I really like the S3 pricing — you pay for what you use and the pricing is completely transparent. What I like about it is that at no point is Amazon expecting you to act irrationally, and for Amazon to profit from your irrational choices. They aren’t expecting you to reserve more than you need. They aren’t going to punish you if you don’t reserve enough.

Another part of why I like S3’s structure is that Amazon (well, Amazon Web Services) owns this particular space in terms of services, and it’s not because of advertising or because they cornered the market or used proprietary anything to restrict choices or made secret business deals with anyone. They simply are providing a service with enough quality and efficiency that no one else can compete (at least at the moment). When quality and efficiency drives market choices it makes me feel all fuzzy and capitalist. This happens infrequently enough that perhaps I get a little overly excitable about resellers with different price structures.

2007 08 04

Non-technical

Fast CGI that isn’t FastCGI

There’s a bunch of techniques for doing deployments of long-running processes (Zope, Python server, Rails, etc). A pretty good technique is to do HTTP proxying. There’s some details and conventions I’d like to see for HTTP, but that’s not my concern here.

HTTP proxying isn’t great for commodity hosting. Mostly you need to set up a new long-running process, and commodity hosts don’t make that easy or reliable. FastCGI offers one solution to that, essentially putting the process management into Apache or whatever web server you are using.

The problem with FastCGI is that it is finicky. There’s lots of configuration parameters, lots of parts don’t work right, and there seems to be a golden path where things actually work but it’s hard to know exactly what that is.

Another technique that has been used in the past instead of FastCGI is a very small CGI script. One example in SCGI is called cgi2scgi. This small script is fast to run (it compiles to 12kb), and all it does is take the CGI request and turn it into an SCGI request to a long-running server.

This is a nice start, and easy to deploy, except it doesn’t handle long-running processes. A great feature to add to something like this would be simple process management. I imagine something where if the socket (named or a port) that the cgi2scgi script connects to isn’t up or working, it runs a script that will start the server. If another request comes in while the server is starting up, it shouldn’t try to start the server twice. If the server is randomly killed (as is common on commodity hosters) then the next request will try to bring the server up.
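The start-if-down logic could be sketched like this. It’s in Python only to show the shape of it; the real thing would live in the small C program. Every name here is hypothetical, and the lock file is one possible way to keep two requests from starting the server twice. I haven’t tried to handle a stale lock left behind by a crashed starter.

```python
import os
import socket
import time


def server_is_up(address, timeout=0.5):
    """Return True if something accepts TCP connections at (host, port)."""
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False


def ensure_server(is_up, start, lock_path, wait=5.0, poll=0.05):
    """Start the backend if it is down; only one caller does the starting.

    is_up -- callable returning True once the backend accepts requests
    start -- callable that launches the backend (e.g. runs a startup script)
    Returns True if the backend came up before `wait` seconds passed.
    """
    if is_up():
        return True
    try:
        # O_EXCL makes creation atomic, so exactly one process gets the lock.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        fd = None  # another request is already starting the server
    try:
        if fd is not None:
            start()
        deadline = time.monotonic() + wait
        while time.monotonic() < deadline:
            if is_up():
                return True
            time.sleep(poll)
        return is_up()
    finally:
        if fd is not None:
            os.close(fd)
            os.unlink(lock_path)
```

A request that loses the race just waits for the socket to come up instead of running the startup script itself.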

Unlike FastCGI, this won’t try to handle different process models or anything fancy. It’s up to the startup script to set everything up properly, start multiple worker processes if necessary, etc. There’s probably some tricky details I haven’t thought of, and it’s slightly annoying to write all this in C (but necessary, since it’s part of the CGI script, which must be small). But I think it can be done better than existing in-the-wild FastCGI implementations.

And when we’re done, I think we could have something that would be a really good basis for commodity hosting of a whole bunch of non-PHP frameworks. You can distribute the Linux binaries, as all the Commodity Hosts That Matter can run those (even the BSD ones should be fine). Easy application installation practically falls right out of that.

2007 08 03

Programming
Web

Pronouncing “Django”

I’m not saying this to anyone in particular, but I’ve heard people pronounce Django incorrectly way too often. The "dj" in Django is a hard J, like in the word "jury" or "jolly". You don’t pronounce the D.

Update: Alex Limi tells me I’m wrong too, and it’s a soft J, like… damn, I can’t think of a word that uses a soft J in English.

I’m not sure I can use that pronunciation, I’m afraid I’ll sound all Frenchy and weird. I’ll give it a go. Zhango zhango zhango… hmm…

Another update: confirming my original pronunciation, Adrian says it is a hard J. Alex is just too European for his own good. Does the debate rage on? Hopefully not.

2007 08 02

Programming
Python

Environmental Theater

If you read Bruce Schneier, as any good geek should, you probably are familiar with the term "security theater": measures that provide the feeling of security while doing little or nothing to actually provide security.

OK, digression. We had this recycling program in Chicago where we put our recyclables in blue bags into the trash, and they pick the blue bags out of the trash. One imagines fancy computerized systems. In reality I think there’s just some people who watch trash go by on a conveyor belt.

This all seemed fishy, but I hate waste on principle so I dutifully recycled my trash, washed out containers, all that stuff. You’d sometimes hear an environmentalist criticize the program because there was little perceived benefit, and so people didn’t actually recycle much. The system seemed a little improbable to me too, but then I also realized that recycling is a balance and it’s easy to put more effort into recycling programs than is saved through the recycling itself. So maybe this was efficient, all things considered.

Then I learned that actually only 8% of recycling in blue bags is recovered. 92% of the time when I clean things out and put them carefully in their own container, I might as well have just thrown them away. This really pissed me off, because it made it obvious that there never was an honest attempt to reduce waste through recycling. Blue bags were just what they would give people to make them stop complaining about recycling.

The irony is that the environmentalists didn’t complain about the recovery rates (which always were estimated at a low amount). They complained about how many people were recycling. Of course with a recovery rate that low it didn’t matter how many people were recycling. The entire program was a total farce. Now that the program is going away there doesn’t seem to be much anger about how deceptive the program was, and I don’t know if anyone is paying attention to the actual environmental impact of the new program.

Even if they recover the recycling it might still just be a game. Recycling is filled with farce. Metal recycling is great. That’s why there’s trucks that roam the alleys around Chicago looking for scrap metal. There’s a market and someone is willing to pay for the results. There’s not much of a market for anything else; maybe some glass, maybe a little plastic.

People actually get angry when recycling programs restrict the plastics they will take. It doesn’t occur to them that some plastics are simply garbage. They are worthless, and moving them around in special recycling containers just wastes everyone’s time. They are angry because they want to pretend they aren’t being wasteful. They aren’t getting enough environmental theater.

A more concerning kind of environmental theater is ethanol. With an EROI (energy produced relative to energy invested) that hovers just above one, it’s not helping the environment. Biofuels on the whole seem quite questionable. Brazil has more efficient ethanol, but it’s paired with deforestation. A similar thing happens when trees for palm oil replace natural forests. And of course in all these cases, if plants weren’t grown for fuel then plants would be grown for some other purpose. So I can’t really see any advantage in terms of CO2 emissions, and when you consider the relative inefficiency compared to obtaining fossil fuels, the net effect of biofuels is probably worse.

Now that environmental concern is mainstream I think we need to be on the watch for environmental theater. Many of the people who play their parts in this theater are well meaning, which can make it awkward. These are people who believe that The Important Thing Is To Raise Awareness. But awareness has been raised, so the time for that kind of bullshit is past. Lying about solutions, exaggerating specific problems, being fuzzy about facts — that’s always been bullshit, and I’ve never found it acceptable. But it’s unfortunately become the norm among advocates of all sorts in these times. The irony is that the advocacy has been done, the case has been made, enough people are convinced, but it may be hard to move beyond the theater to meaningful action. Especially as the well-meaning people are replaced with cynics out to make money.

2007 08 02

Non-technical
Politics

Atom Models

I’ve been doing a bit more with Atom lately.

First, I started writing a library to manipulate Atom feeds and entries. For the moment this is located in atom.py. It uses lxml, as does everything markup related I do these days.

I came upon a revelation of sorts when I was writing the library. I first started with a library that looked like this:

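from lxml import etree

atom_ns = 'http://www.w3.org/2005/Atom'

class Feed(object):
    def __init__(self, title, ...):
        self.title = title
        ...

    @classmethod
    def parse(cls, xml):
        # Accept either serialized XML or an already-parsed element
        if isinstance(xml, basestring):
            xml = etree.XML(xml)
        # findtext gives the text of the first matching child;
        # xpath() returns a list, which has no .text
        title = xml.findtext('{%s}title' % atom_ns)
        ...
        return cls(title, ...)

    def serialize(self):
        # Rebuild the XML from the object's attributes
        el = etree.Element('{%s}feed' % atom_ns)
        title = etree.Element('{%s}title' % atom_ns)
        title.text = self.title
        el.append(title)
        ...
        return el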

Obviously there’s ways to improve this and make it less verbose, and I went down that path for a while. But then I decided the whole path was wrong. Atom is XML. It’s not the representation of some object I’m creating. If I have something that can’t be represented in XML, it isn’t Atom, and it doesn’t belong in my Atom-related objects.

So instead I started making lxml more convenient when using Atom. I don’t keep any information except what is in the markup, I just make it more convenient to access that information.

I used lots of descriptors to do this, as the same patterns happened over and over. For instance, the Feed object is fairly simple:

class Feed(AtomElement):
    entries = _findall_property('entry')
    author = _element_property('author')

Which basically means that feed.entries returns all <entry> elements, and feed.author returns the single author element.
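To give a feel for how little these descriptors have to do, here is a rough sketch. It is not the code from atom.py: I’ve used the standard library’s ElementTree in place of lxml so it stands alone, and the bodies of _findall_property and _element_property, plus the parse and _factory helpers, are guesses at the idea and not the real implementation.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'


def _qname(tag):
    """Return the namespace-qualified form of an Atom tag name."""
    return '{%s}%s' % (ATOM_NS, tag)


class _findall_property(object):
    """Descriptor returning all direct children with the given Atom tag."""
    def __init__(self, tag):
        self.tag = tag

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.findall(_qname(self.tag))


class _element_property(object):
    """Descriptor returning the first child with the given tag, or None."""
    def __init__(self, tag):
        self.tag = tag

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.find(_qname(self.tag))


class AtomElement(ET.Element):
    """An ordinary element; subclasses only add convenience accessors."""


class Feed(AtomElement):
    entries = _findall_property('entry')
    author = _element_property('author')


# Hypothetical tag-to-class mapping used while parsing.
_CLASSES = {_qname('feed'): Feed}


def _factory(tag, attrib):
    return _CLASSES.get(tag, AtomElement)(tag, attrib)


def parse(text):
    """Parse Atom XML so that known tags come back as the classes above."""
    parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=_factory))
    parser.feed(text)
    return parser.close()
```

The Feed object here is still just the element. The descriptors read straight from the tree every time, so there’s nothing to get out of sync.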

There’s also accessors for text elements (like <id>) and date-containing elements (like <updated>), and for accessing XML attributes as Python attributes.

There’s a number of advantages:

No hidden state.
No deferred errors, since everything is always represented in the XML infoset.
All XML extensions work, even though my classes don’t know anything in particular about them. There’s a full API for manipulating the XML that you can use, you don’t have to use my APIs.
Even more obscure kinds of extensions work fine, like a custom attribute on an element. There’s absolutely zero normalization that happens.
I only have to write the parts where the normal XML (lxml) APIs are inconvenient, so the implementation stays simple.
There’s no confusion over which object I might be talking about in my code. There’s no distinction between the XML object and the domain object.

Since then I’ve been working on a Javascript library for handling Atom. It’s not as elegant. I am trying to keep to this same principle, but of course I can’t actually extend the DOM and so I can’t add convenience methods. So instead I’m making a class that lightly wraps the DOM objects, with explicit getters and setters that simply read and modify those DOM objects.

One thing that I have found very useful in my development on the Javascript side is doctest-style testing. You can see the test, but to run it you have to check it out (it uses some svn:externals which you don’t get through the direct svn access). After using that testing some more and being pleased with the result, I decided to package the Javascript doctest runner a bit better. I removed the framework dependencies, did a bit of renaming (now it is doctestjs or doctest.js instead of jsdoctest), wrote up fairly comprehensive docs, and uploaded it to JSAN (though at the moment the trunk from svn is probably better to use). I think it’s an excellent way of doing unit testing in Javascript, much better than any of the alternatives I’ve seen. It even has some notable advantages over Python’s doctest, like if you are using Firebug (which you must if you do Javascript development) you get a console session that runs in the same namespace as your tests, so you can easily do inspection of the objects if there’s a failure.

I’m not sure about JSAN. It’s nice to have an index. But I think they copy stuff from CPAN a bit too much. Why should you have a text README file? That’s just silly; of course Javascript documentation should be HTML. They batch-process uploads. Processing one package a day on the fly shouldn’t be overwhelming. They want a MANIFEST file. The standard metadata file is YAML, not JSON. This should all be a little more Javascripty in my opinion. But they also accept any kind of upload, so there’s nothing stopping you from ignoring what you don’t care about. I’ll probably improve the packaging of doctestjs a bit in the future, and still ignore the parts I think are silly.

2007 08 02

Javascript
Programming
Python

Old Archives

2007 08 01

Programming

New Blog Software

I’ve switched my software over to WordPress. This was long overdue, as anyone who ever wanted to read anything at all on this site probably knows. Sometime I should really write an article reflecting on the failures of my previous blog software. Let’s just say that flat files aren’t so hot either.

Now that my software doesn’t suck, I have lots of posts I have been embarrassed to write because every new post potentially introduced new people to my crappy site.

Hopefully everything is set up correctly: redirects, archives, and the new feed.

My one worry is WordPress comments, which suck a bit. They shouldn’t collect the horrible quantity of spam that the old site has, so that’s good, but I hate disconnected streams of comments. I’ve tried to modify the theme on this site to be more roomy, with less of the excessive whitespace that has become the norm. I hope this whitespace kick goes the way of Creating Killer Websites Using Table Based Layout. I.e., it’ll soon look dated and everyone will move on. So I hope you’ll have more than two inches of width to comment in. Honestly I wonder if I should just ditch WordPress comments and use something else entirely, like some kind of forum software and rig in some way of including the comments in the theme. I wanted to install threaded comments, but the installation process is rather invasive.

For editing I turned TinyMCE off (ugh), and installed a reStructuredText plugin. It took a while to figure out, since I have to include .. -*- mode: rst -*- in the header of each post. Oh well, a minor inconvenience. I used Text Control to enable Markdown in comments, but I had to replace the actual markdown.php it used, which was broken.

2007 08 01

Non-technical
