Ian Bicking: a blog :: 2007

Doctest for Ruby

Finally, someone wrote a version of doctest for Ruby.

Recently I’ve been writing most of my tests using stand-alone doctest files. It’s a great way to do TDD — mostly because the cognitive load is so low. Also, I write my examples but don’t write my output, then copy the output after visually confirming it is correct. So the basic pattern is:

Figure out what I want to do
Figure out how I want to test it
Automate my conditions
Manually inspect whether the output is correct (i.e., implement and debug)
Copy the output so that in the future the manual process is automated (doctest-mode for Emacs makes this particularly easy)

The result is a really good balance of manual and automated testing, I think giving you the benefit of both processes — the ease of manual testing, and the robustness of automated testing.
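To make that loop concrete, here is a minimal sketch using Python's doctest module. The slugify function and the "slugify.txt" name are made up for illustration, and the doctest text is held in a string only so the example is self-contained; in practice it would sit in its own file. The expected outputs are the part you would paste in after eyeballing a first run.

```python
import doctest


def slugify(title):
    """Hypothetical function under test: turn a title into a URL slug."""
    words = "".join(c if c.isalnum() else " " for c in title.lower()).split()
    return "-".join(words)


# What would normally live in a stand-alone file such as slugify.txt.
DOCTEST_TEXT = """
Slugs are lower-case words joined by hyphens:

    >>> slugify("Doctest for Ruby")
    'doctest-for-ruby'

Punctuation and extra whitespace are dropped:

    >>> slugify("  Defaults & Inheritance ")
    'defaults-inheritance'
"""


def run_doctest_text(text, globs):
    """Parse and run the examples in text; returns (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, dict(globs), "slugify.txt", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_doctest_text(DOCTEST_TEXT, {"slugify": slugify})
```

With a real file on disk, doctest.testfile("slugify.txt") does the parsing and running in one call.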

Another good thing about doctest is it doesn’t let you hide any boilerplate and setup. If it’s easy to use doctest, it’s probably easy to use the library.

There’s nothing Python-specific about doctest (e.g., doctestjs), so it’s good to see it moving to other languages. Even if the language doesn’t have a REPL, IMHO it’s worth inventing it just for this.

2007 08 23

Javascript
Programming
Python
Ruby

Comments (5)


The Shrinking Python Web Framework World

When I was writing the summary of differences between WebOb and other request objects, to remind myself of web frameworks I might have forgotten I went to the WebFrameworks page on the Python wiki.

Looking through that page I’m reminded how many framework options there have been. And I was further reminded of how few relevant options there are now. From all this, there have emerged just a few options: Django, Pylons, TurboGears, Zope. No offense to anyone left out of that list — I know there’s some other actively developed frameworks out there. But frankly they aren’t serious choices; they might be fine internal tools, or interesting experiments, but they are clearly on a different tier (and they all have questionable futures).

And now that TurboGears 2 will be based on Pylons the list looks smaller still.

For a long, long time (longer than most of those frameworks have existed) people have complained about the proliferation of web frameworks in Python. Those of us involved in developing web frameworks in Python haven’t been able to respond all that well. Complaining doesn’t magically lead to solutions, and you can’t just will there to be a single Python web framework. You can work towards that, but that’s what we’ve been doing… mostly people don’t seem to notice. It’s just not an easy thing to work towards; the problem space for a web framework isn’t well defined, its end goal is far more vague than most people immediately realize, and it involves consensus, which makes everything much harder. We said the market would decide, which is kind of a cop out (the market decides through the decisions of developers) but that’s the best answer we had.

But after all this time, it seems clear that we are getting much closer to that goal. If you squint really hard, you can almost imagine we are there. The total list of frameworks only gets longer over time — that’s how open source works — but the list of choices has become quite compact.

How we get to the next level is a little less clear. We’ve gotten this way largely through attrition, but that’s not going to get us any further. I’ll at least assure people that we are discussing this stuff — it’s slow going, but everyone is interested. And if anyone actually wants to do some leg work to move this forward, a lot of the work is actually technical, not political, so don’t be afraid to jump in.

2007 08 21

Programming
Python
Web

Comments (33)


WebOb

I’ve had it in my head to extract/rewrite parts of Paste lately. Tempita was one example.

The request and response functions in Paste grew very organically. I wasn’t trying to create a framework, so I studiously avoided anything that might look like a request or response object. I felt that would be stepping on toes or something. Eventually, though, Ben Bangert really wanted a request object for Pylons, and it went in paste.wsgiwrappers. And at a certain point I decided that the class-based access was really just fine, and doing lots of function(environ, ...) was no better than Request(environ).function(...).

So I started WebOb. WebOb has Request, Response, and some exceptions, incorporating the functionality of Paste’s paste.request, paste.response, paste.wsgilib, paste.httpexceptions, and paste.httpheaders. And some extra stuff.

I’ve included a comparison with a few other framework request/response objects. What this doesn’t note, though, is that WebOb has much larger Request and Response objects. I’ve taken almost all the HTTP headers and mapped them to parsed attributes. So req.if_modified_since returns a datetime object, and req.if_none_match returns a somewhat set-like object, as a few examples. I created a lot of view-like objects for this, representing the canonical form of the information in several other forms (the WSGI request environment, and the status/headers/body of the response).
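The idea of parsed attributes that are views onto the WSGI environment can be sketched with the standard library alone. MiniRequest below is a hypothetical stand-in, not WebOb's Request, and its parsing is much cruder than WebOb's; the point is only that the environ stays the canonical form and the attributes are computed from it on each access.

```python
from email.utils import parsedate_to_datetime


class MiniRequest:
    """Hypothetical stand-in for a request object; not WebOb's API."""

    def __init__(self, environ):
        # The WSGI environ remains the single source of truth.
        self.environ = environ

    @property
    def if_modified_since(self):
        """The If-Modified-Since header as a datetime, or None if absent."""
        raw = self.environ.get("HTTP_IF_MODIFIED_SINCE")
        if raw is None:
            return None
        return parsedate_to_datetime(raw)

    @property
    def if_none_match(self):
        """The If-None-Match header as a frozenset of entity tags."""
        raw = self.environ.get("HTTP_IF_NONE_MATCH", "")
        return frozenset(tag.strip() for tag in raw.split(",") if tag.strip())


req = MiniRequest({
    "HTTP_IF_MODIFIED_SINCE": "Sat, 18 Aug 2007 12:00:00 GMT",
    "HTTP_IF_NONE_MATCH": '"abc", "def"',
})
```

Because nothing is cached on the object, changing req.environ changes what the attributes return, which is roughly what "view-like" means here.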

It’s fairly well tested and includes almost everything I think it should include, but I reserve the right to change the API any way I want until 1.0; this means if you have any opinion on the API I have nothing to stop me from taking your opinions into account.

Oh, and it has docs, really. They may not be the best docs, but they mention most everything and are automatically tested for accuracy. If you just want a sense of the feel, maybe the file-serving example would be a good place to start (though really you’ll only read about the Response object there).

2007 08 18

Programming
Python
Web

Comments (15)


DictMixin

Quite some time ago I gave a little presentation on DictMixin at ChiPy. If you haven’t used DictMixin before, it’s a class that implements all the derivative methods of dictionaries so you only have to implement the most minimal set: __getitem__, __setitem__, __delitem__, and keys. It’s a lot better than subclassing dict directly, where you have to implement a lot more, and where dict implies a specific kind of storage. With DictMixin you can get the information from anywhere.

I thought of a couple examples, and wrote some doctests for them; I thought satisfying the doctests would itself be the presentation. I’m not sure how it worked; it was a fairly experienced crowd, but the switch from code to test can be disorienting.

One of the examples I used was a filesystem access layer. Representing a filesystem as a dictionary is nothing new, but the simplicity of the representation worked well. Here’s how it works:

An FSDict represents one directory.
The keys are the filenames in the directory.
The values are the contents of the files (strings).
When there is a subdirectory, it is another FSDict instance.
When you assign a dictionary-like object to a key, it creates a FSDict from that object.

Dictionaries have lots of methods, like items(), update(), etc. But using DictMixin you just implement the four methods. First, the setup:

import os
from UserDict import DictMixin  # Python 2 location of DictMixin

class FSDict(DictMixin):
    def __init__(self, path):
        self.path = path

Creation of a dictionary is not part of the dictionary interface. This seems a little strange at first, but the dict class interface isn’t the same as the dictionary instance interface. So FSDict.__init__ doesn’t bear any particular relation to dict.__init__.

Now the other methods… in each case, strings and dictionaries (files and directories) are treated differently.

    def __getitem__(self, item):
        fn = os.path.join(self.path, item)
        if not os.path.exists(fn):
            raise KeyError("File %s does not exist" % fn)
        if os.path.isdir(fn):
            return self.__class__(fn)
        f = open(fn, 'rb')
        c = f.read()
        f.close()
        return c

Note the use of self.__class__(fn) instead of FSDict(fn). This makes the class subclassable if you retain the FSDict.__init__ signature. This way subclasses will create new instances using the subclass. Note also that KeyError is part of the dictionary interface (an important part!), so we can’t raise IOError.

Now, assignment…

    def __setitem__(self, item, value):
        if item in self:
            del self[item]
        fn = os.path.join(self.path, item)
        if isinstance(value, str):
            f = open(fn, 'wb')
            f.write(value)
            f.close()
        else:
            # Assume it is a dictionary
            os.mkdir(fn)
            f = self[item]
            f.update(value)

Note that with subdirectories (represented as nested dictionaries) we let DictMixin.update do all the hard work, and just create an empty directory to be filled.

Deletion…

    def __delitem__(self, item):
        fn = os.path.join(self.path, item)
        if not os.path.exists(fn):
            raise KeyError("File %s does not exist" % fn)
        if os.path.isdir(fn):
            ## one way...
            self[item].clear()
            os.rmdir(fn)
            ## another way...
            #shutil.rmtree(fn)
        else:
            os.unlink(fn)

Enumeration…

    def keys(self):
        return os.listdir(self.path)

So, to recursively copy '/foo/bar' to '/dest/path/bar' you do:

FSDict('/dest/path')['bar'] = FSDict('/foo')['bar']

It doesn’t really matter if '/foo/bar' is a directory or file. There’s a number of other clever things that come out of this. I think it’s an example of the power of a closed set: dictionaries are expressible from these four operations, and all the other methods can be derived from there. If you find this interesting, you might want to read the source for DictMixin; it’s only about 95 lines.
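For readers on Python 3, where the UserDict module no longer provides DictMixin, the same idea can be sketched on top of collections.abc.MutableMapping. This is an adaptation under a few assumptions rather than the code from the presentation: MutableMapping wants __iter__ and __len__ in place of keys, file contents are bytes because the files are opened in binary mode, and the demo function and its file names are invented for the example.

```python
import os
import tempfile
from collections.abc import MutableMapping


class FSDict(MutableMapping):
    """One directory as a mapping: filenames to bytes or nested FSDicts."""

    def __init__(self, path):
        self.path = path

    def __getitem__(self, item):
        fn = os.path.join(self.path, item)
        if not os.path.exists(fn):
            raise KeyError(item)
        if os.path.isdir(fn):
            return self.__class__(fn)
        with open(fn, "rb") as f:
            return f.read()

    def __setitem__(self, item, value):
        if item in self:
            del self[item]
        fn = os.path.join(self.path, item)
        if isinstance(value, bytes):
            with open(fn, "wb") as f:
                f.write(value)
        else:
            # Assume a mapping: make the directory, let update() fill it.
            os.mkdir(fn)
            self[item].update(value)

    def __delitem__(self, item):
        fn = os.path.join(self.path, item)
        if not os.path.exists(fn):
            raise KeyError(item)
        if os.path.isdir(fn):
            self[item].clear()
            os.rmdir(fn)
        else:
            os.unlink(fn)

    def __iter__(self):
        return iter(os.listdir(self.path))

    def __len__(self):
        return len(os.listdir(self.path))


def demo():
    """Build a small tree in a temp directory and copy it recursively."""
    with tempfile.TemporaryDirectory() as tmp:
        root = FSDict(tmp)
        root["src"] = {"a.txt": b"hello", "sub": {"b.txt": b"world"}}
        root["dest"] = {}
        root["dest"]["copy"] = root["src"]  # the recursive copy
        copied = root["dest"]["copy"]["sub"]["b.txt"]
        names = sorted(root["dest"]["copy"])
        del root["src"]  # recursive delete via clear()
        remaining = sorted(root)
    return copied, names, remaining
```

The inherited update(), clear(), items() and the in operator all come from the mixin, so the recursive copy and delete need no code of their own.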

My article on templating via dict wrappers has some other similar dict tricks.

2007 08 17

Programming
Python

Comments (3)


Reflection and Description Of Meaning

After writing my last post I thought I might follow up with a bit of cognitive speculation. Since the first comment was exactly about the issue I was thinking about writing on, I might as well follow up quickly.

Jeff Snell replied:

You parse semantic markup in rich text all the time. When formatting changes, you apply a reason. RFC’s don’t capitalize MUST and SHOULD because the author is thinking in upper-case versus lower-case. They’re putting a strong emphasis on those words. As a reader, you take special notice of those words being formatted that way and immediately recognize that they contain a special importance. So I think that readers do parse writing into semantic markup inside their brains.

Emphasis not added. Wait, bold isn’t emphasis, it’s strong! So sorry, STRONG not added.

I think the reasoning here is flawed, in that it supposes that reflection on how we think is an accurate way of describing how we think.

A few years ago I got interested in cognition for a while and particularly some of the new theories on consciousness. One of the parts that really stuck with me was the difference in how we think about thinking, and how thinking really works (as revealed with timing experiments). That is, our conscious thought (the thinking-about-thinking) happened after the actual thought; we make up reasons for our actions when we’re challenged, but if we aren’t challenged to explain our actions there’s no consciousness at all (of course, you can challenge yourself to explain your reasoning — but you usually won’t). And then we revise history so that our reasoning precedes our decision, but that’s not always very accurate. This gets around the infinite-loop problem, where either there’s always another level of meta-consciousness reasoning about the lower level of consciousness, or there’s a potentially infinite sequence of whys that have to be answered for every decision. And of course sometimes we really do make rational decisions and there are several levels of why answered before we commit. But this is not the most common case, and there’s always a limit to how much reflection we can do. There are always decisions made without conscious consideration — if only to free ourselves to focus on the important decisions.

And so as both a reader and a writer, I think in terms of italic and bold. As a reader and a writer there is of course translation from one form to another. There’s some idea inside of me that I want to get out in my writing, there’s some idea outside of me that I want to understand as a reader. But just because I can describe some intermediate form of semantic meaning, it doesn’t mean that that meaning is actually there. Instead I invent things like "strong" and "emphasis" when I’m asked to decide why I chose a particular text style. But the real decision is intuitive — I map directly from my ideas to words on the page, or vice versa for reading.

Obviously this is not true for all markup. But my intuition as both a reader and a writer about bold and italic is strong enough that I feel confident there’s no intermediary representation. This is not unlike the fact I don’t consider the phonetics of most words (though admittedly I did when trying to spell "phonetics"); common words are opaque tokens that I read in their entirety without consideration of their component letters. And a good reader reads text words without consideration of their vocal equivalents (though as a writer I read my own writing out loud… is that typical? I’m guessing it is). A good reader can of course vocalize if asked, but that doesn’t mean the vocalization is an accurate representation of their original reading experience.

Though it’s kind of an aside, I think the use of MUST and SHOULD in RFCs fits with this theory. By using all caps they emphasize the word over the prose, they make the reader see the words as tokens unique from "must" and "should", with special meanings that are related to but also much more strict than their usual English meaning. The caps are a way of disturbing our natural way of determining meaning because they need a more exact language.

2007 08 14

HTML
Non-technical

Comments (7)


Of Microformats and the Semantic Web

I was talking a little with Daniel Krech (author of rdflib) about Semantic Web stuff and microformats and what they all mean. And he was saying that microformats were nice, because you could do something with them, but it would be nice to see that generalized.

By "generalized" I think he meant a general way of expressing arbitrary relationships. As an example, in hCard you can do:

<div class="tel">
  <span class="type">home</span>:
  <span class="value">773-555-3821</span>
</div>

The hCard specification (itself leaning heavily on vCard) defines tel, type, and there’s a general pattern of what value means. But if you want to describe some new kind of structure, there’s no way to do that really; there’s no marital status format, for instance (which would be useful for a singles search engine, as an example).
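As a rough illustration of "doing something" with a microformat, here is a sketch that pulls telephone numbers out of hCard-style markup using only the standard library. The sample markup, the TelParser class, and the parse_tels function are assumptions made for this example; a real hCard parser handles far more of the specification (and void elements such as br, which this simple stack would mishandle).

```python
from html.parser import HTMLParser

# Assumed sample markup following the hCard tel/type/value pattern.
SAMPLE = """
<div class="vcard">
  <span class="fn">Example Person</span>
  <div class="tel">
    <span class="type">home</span>:
    <span class="value">773-555-3821</span>
  </div>
</div>
"""


class TelParser(HTMLParser):
    """Collect (type, value) pairs from elements with class "tel"."""

    def __init__(self):
        super().__init__()
        self.stack = []      # class lists of the currently open elements
        self.current = None  # dict being filled while inside a "tel"
        self.phones = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        if "tel" in classes:
            self.current = {"type": "", "value": ""}

    def handle_endtag(self, tag):
        classes = self.stack.pop() if self.stack else []
        if "tel" in classes and self.current is not None:
            self.phones.append((self.current["type"], self.current["value"]))
            self.current = None

    def handle_data(self, data):
        if self.current is None or not self.stack:
            return
        # Only text directly inside a "type" or "value" element counts.
        for key in ("type", "value"):
            if key in self.stack[-1]:
                self.current[key] += data.strip()


def parse_tels(html):
    parser = TelParser()
    parser.feed(html)
    parser.close()
    return parser.phones
```

The parser only works because the meaning of tel, type, and value is agreed on in advance; nothing in the markup itself says what they are.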

So I started thinking: can you really generalize it? And I started to think about Joe Gregorio’s attack on WADL:

Here is the very first example in the WADL specification.

That WADL file is a description of a search interface. But here is how you should really do it. That’s an OpenSearch document, that also describes a search interface.

Q: What’s the difference?

A: A mime-type.

Q: That doesn’t seem like much, does it make a difference?

A: Yes, it makes a big difference. When you get an OpenSearch document there is a whole data model and a set of interactions you know are possible because you read the OpenSearch specification. By reading that spec you know how to construct search queries. When I get a WADL document it might describe anything, from how to construct a search, to the APP, to JEP, to XML-RPC.

…

So when I say the difference is a ‘mime-type’, what I mean is that there is an entire spec somewhere which describes what that document means, and that meaning may include hypertext functionality, ala (X)HTML, XForms, and OpenSearch.

This made me think of shared understanding more than explicit descriptions. OpenSearch, APP, and Atom are very well described, but I think that’s only half of it: they are useful when they describe something that many people already understand.

Digressing slightly, one "semantic markup" ideal that still bugs me is <strong> and <em> vs. <b> and <i>. When I compose text I choose to make some words bold and some italic. I have no idea what "strong" and "emphasis" are even supposed to mean. When I’m composing text, I don’t actually know why I choose one or the other. If I sat down and thought about it I’m sure I could come up with a set of rules that describe when bold is appropriate and when italic is appropriate. But that is reflecting on my choice, it is not describing my choice. There is no intermediate semantic meaning between what I am saying and bold and italic. I think in bold and italic. Readers in turn find meaning in the text itself; they do not parse my writing into semantic markup in their brain.

I think there’s some connection between this and the shared understanding that microformats represent, and that a more generalized RDF model does not. I know what hCard means; not just in an intellectual way, but I can imagine a dozen functional uses of it without hardly trying, and of course I am entirely clear on what contact information means. Moreover, I know what it means without actually figuring out what it means; if you asked me to articulate what contact information means I’d have to think a little, and I’m sure many people would come up with bad answers or be stumped. And yet they all actually understand what it means.

Bringing this back to Joe’s post, if I write something that produces or consumes Atom, Atompub, or OpenSearch, I understand the why of my code. With both WADL and RDF my code is divorced from the why. This isn’t about my personal understanding either; explaining it to me doesn’t serve any purpose, because with any exchange format it has to make sense to many many people to be useful. Even an education campaign won’t fix this: education by description is far inferior to education by doing, and there’s no "doing" to WADL and RDF right now.

That said, what is sufficiently obvious in the future may not be obvious now. Maybe we’ll all get smarter. Maybe someone will pioneer this stuff in a way that is really useful (Facebook?), and grow the public’s intuition about describing relationships in an abstract way. But until then I think microformats are going about this the right way, describing those things that are most easily describable.

2007 08 14

Programming
Web

Comments (7)


Atom Publishing Protocol: Atompub

Doing stuff with the Atom Publishing Protocol, I’ve noticed that it goes by two (shortened) names: APP and Atompub. I’d become used to calling it APP, but I’ve decided to make a conscious effort to call it Atompub from now on, and I encourage you all to do the same. You cannot usefully search for "APP", and its pronunciation is ambiguous. Atompub is a much better name.

And as long as we’re talking about names, I’ll note that the Cheese Shop is now called PyPI again. I think we are supposed to pronounce it pih-pee, distinct from PyPy which is pie-pie. (Blast, PyPI is down; the Zope guys have been making a static stripped-down mirror for use with Setuptools, over here)

2007 08 12

Python
Web

Comments (3)


Defaults & Inheritance

I thought I’d note a way I try to make classes reasonably customizable without creating lots of classes, but letting other people create classes if they want.

Here’s a common technique; I’m going to use a class from WSGIProxy as an example, because that’s where I was about to use this technique when I thought it might make an okay post.

In this example there’s a WSGI application that forwards requests to another HTTP server. There’s different ways to forward requests, depending on what kind of data you want to give the remote server about the original request. One example is Zope’s VirtualHostMonster, which takes requests like /VirtualHostBase/http/example.org:80/rootdir/VirtualHostRoot/path. The idea is that the server can then realize that the original request was for http://example.org/path (and should ignore any Host headers), and that Zope is supposed to serve that from the internal path /rootdir/path.

There’s a problem with this particular pattern: there’s no way to mount, say, /blog onto some Zope /sitename/blog-application path, because VirtualHostMonster has nothing like the SCRIPT_NAME of WSGI or CGI (the base path of the request). It only handles the base host. So I didn’t want to settle on just that.

I’m kind of inclined to prefer headers, like X-Script-Name: /blog, X-Forwarded-Server: example.org, etc. But I want to support both forms.

The common way to do this is:

class WSGIProxyApp(object):

    def __init__(self, host): ...

    def __call__(self, environ, start_response):
        # actual application interface...
        # Constructs the base request:
        request = self.construct_request(environ)
        # Uses one of these conventions:
        self.update_headers(environ, request)
        # ... do stuff with request ...

    def update_headers(self, orig_environ, request):
        raise NotImplementedError

class VirtualHostMonsterApp(WSGIProxyApp):

    def update_headers(self, orig_environ, request):
        # The WSGI key for the scheme is 'wsgi.url_scheme'
        request.environ['SCRIPT_NAME'] = (
            '/VirtualHostBase/%(wsgi.url_scheme)s/%(HTTP_HOST)s/VirtualHostRoot/'
            % orig_environ)

class HeaderSetterApp(WSGIProxyApp):

    def update_headers(self, orig_environ, request):
        request.environ['HTTP_X_SCRIPT_NAME'] = orig_environ['SCRIPT_NAME']
        # and so on...

Then you use one of the subclasses depending on your needs. Personally I think this really sucks. For one thing, you may have to determine which class to use based on some configuration parameter, which can get awkward. And you might want to subclass the class to change the functionality some yourself, but you have to subclass both of them. There’s patterns to handle this, with policies and factories and other crap; but it’s not a hard problem, and those patterns are hard solutions to a problem that shouldn’t be hard.

Also, it’s harder to inform people about the options available to them, and somewhat harder to use these classes. So I tend to do something like:

class WSGIProxyApp(object):
    default_forwarding_style = 'headers'

    def __init__(self, host, forwarding_style=None):
        ...
        if forwarding_style is None:
            forwarding_style = self.default_forwarding_style
        self.forwarding_style = forwarding_style

    def __call__(self, environ, start_response):
        ...
        method = self.forwarding_style
        if isinstance(method, str):
            method = getattr(self, 'forward_'+self.forwarding_style)
        method(environ, request)
        ...

    def forward_headers(self, orig_environ, request): ...
    def forward_virtual_host_monster(self, orig_environ, request): ...

This way it’s just a simple parameter to change the style. You can pass in your own function, or use one of the named methods already available. The default_forwarding_style class variable lets you change the default in subclasses. If the default were in the function signature it would be much more awkward to change it, because you’d have to override the method and its signature with just that one change, then delegate back to the superclass method.
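Here is a runnable sketch of that pattern with the elided parts filled in by stand-ins. ProxySketch, ZopeProxySketch, build_request, the remote_root parameter, and the plain dict used as a request are all assumptions made for the example; only the class-level default and the string-or-callable dispatch follow the outline above.

```python
class ProxySketch:
    """Hypothetical stand-in for WSGIProxyApp; shows only the dispatch."""

    default_forwarding_style = "headers"

    def __init__(self, host, forwarding_style=None, remote_root=""):
        self.host = host
        self.remote_root = remote_root  # assumed parameter, e.g. "/rootdir"
        if forwarding_style is None:
            forwarding_style = self.default_forwarding_style
        self.forwarding_style = forwarding_style

    def build_request(self, environ):
        # A plain dict stands in for whatever request object the proxy builds.
        request = {"path": environ.get("PATH_INFO", ""), "headers": {}}
        method = self.forwarding_style
        if isinstance(method, str):
            # A name selects one of the built-in forward_* methods.
            method = getattr(self, "forward_" + method)
        method(environ, request)
        return request

    def forward_headers(self, orig_environ, request):
        headers = request["headers"]
        headers["X-Script-Name"] = orig_environ.get("SCRIPT_NAME", "")
        headers["X-Forwarded-Server"] = orig_environ["HTTP_HOST"]

    def forward_virtual_host_monster(self, orig_environ, request):
        request["path"] = "/VirtualHostBase/%s/%s%s/VirtualHostRoot%s" % (
            orig_environ["wsgi.url_scheme"], orig_environ["HTTP_HOST"],
            self.remote_root, request["path"])


class ZopeProxySketch(ProxySketch):
    # A subclass changes the default without touching __init__'s signature.
    default_forwarding_style = "virtual_host_monster"


def forward_nothing(orig_environ, request):
    """A caller-supplied style: pass the request through untouched."""


environ = {"wsgi.url_scheme": "http", "HTTP_HOST": "example.org:80",
           "SCRIPT_NAME": "/blog", "PATH_INFO": "/path"}
by_headers = ProxySketch("backend:8080").build_request(environ)
by_vhm = ZopeProxySketch("backend:8080",
                         remote_root="/rootdir").build_request(environ)
untouched = ProxySketch("backend:8080",
                        forwarding_style=forward_nothing).build_request(environ)
```

The style can come from a configuration string, a subclass default, or a function passed in by the caller, and none of those three routes requires a new class hierarchy.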

2007 08 10

Programming
Python

Comments (1)


Opening Python Classes

So, I was reading through comments to despam my old posts before archiving them, and came upon this old reply to this old post of mine which was a reply to this much older post.

I won’t reply to that post much, because it’s mostly… well, not useful to respond to. But people often talk about the wonders of Open Classes in Ruby. For Python people who aren’t familiar with what that means, you can do:

# Somehow acquire SomeClassThatAlreadyExists
class SomeClassThatAlreadyExists
  def some_method(blahblahblah)
    stuff
  end
end

And SomeClassThatAlreadyExists has a some_method added to it (or if that method already exists, then the method is replaced with the new implementation).

In Python when you do this, you’ve defined an entirely new class that just happens to have the name SomeClassThatAlreadyExists. It doesn’t actually affect the original class, and probably will leave you confused because of the two very different classes with the same name. In Ruby when you define a class that already exists, you are extending the class in-place.

You can change Python classes in-place, but there’s no special syntax for it, so people either think you can’t do it, or don’t realize that you are doing the same thing as in Ruby but without the syntactic help. I guess this will be easier with class decorators, but some time ago I also wrote a recipe using normal decorators that looks like this:

@magic_set(SomeClassThatAlreadyExists)
def some_method(self, blahblahblah):
    stuff

The only thing that is even slightly magic about the setting is that I look at the first argument of the function to determine if you are adding an instance, class, or static method to an object, and let you add it to classes or instances. It’s really not that magic, even if it is called magic_set.
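A small sketch of what changing a class in-place looks like without any helper, followed by a much-simplified decorator. The Greeter class and add_method are invented for this example; add_method only handles plain instance methods and is not the magic_set recipe. The last part shows the contrast case, where repeating a class statement creates a new class instead of extending the old one.

```python
class Greeter:
    def __init__(self, name):
        self.name = name


early = Greeter("early")  # created before the class is extended


# 1. Plain assignment: a method is just a function stored on the class.
def shout(self):
    return self.name.upper() + "!"


Greeter.shout = shout


# 2. The same thing wrapped in a decorator (hypothetical, simplified).
def add_method(cls):
    def decorator(func):
        setattr(cls, func.__name__, func)
        return func
    return decorator


@add_method(Greeter)
def greet(self, other):
    return "Hello %s, I am %s" % (other, self.name)


# 3. Repeating the class statement rebinds the name to a brand-new class.
Original = Greeter


class Greeter:
    def whisper(self):
        return "..."
```

After this runs, early.shout() and early.greet("Bob") both work because the original class was modified in-place, while the second Greeter is unrelated to it: it has whisper but neither shout nor greet, and early is not an instance of it.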

I think with class decorators you could do this:

@extend(SomeClassThatAlreadyExists)
class SomeClassThatAlreadyExists:
    def some_method(self, blahblahblah):
        stuff

Implemented like this:

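def extend(class_to_extend):
    def decorator(extending_class):
        # A new-style class __dict__ is a read-only proxy with no update(),
        # so copy the attributes over with setattr instead.
        for name, value in extending_class.__dict__.items():
            if name not in ('__dict__', '__weakref__', '__module__', '__doc__'):
                setattr(class_to_extend, name, value)
        return class_to_extend
    return decorator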

2007 08 08

Programming
Python

Comments (21)


XO B4

I recently received a Beta-4 XO laptop. I won’t describe the hardware on the whole, but probably a number of readers here have seen the B2 laptops so I thought I’d write up a quick description of the changes I’ve noticed. If you haven’t seen the XO in person, then the minutiae of this post may be boring.

First and most substantially, the CPU, memory, and disk have all been upgraded. It now has 256MB RAM, 1GB of flash disk, and a 433MHz Geode processor. This makes a very significant impact on the speed.

It features a big colored XO on the back. Laptops will get different random combinations of X and O colors, so you can tell one laptop from another. I’m a little disappointed to have coincidentally received an X with the same color as the laptop’s green.

The screen now tilts back a bit further than it used to. It’s now comfortable to have it on a table or my lap, where before I liked to have it higher up. Putting the B2 and B4 side-by-side the change in tilt doesn’t seem significant, but using them it’s quite noticeable.

The antennae ("ears") are now rubber. This is intended to increase their durability when dropped (apparently the laptop can sustain a 1.5 meter drop onto its antennae). Unfortunately along the way the latching mechanism became stiffer, so I don’t let people puzzle out how to open it anymore; it requires too much force to guess.

The handle is now textured. I never had any problem keeping a grip on it before, but the dots look nice. A cute detail is that around the edge the dots turn into X’s, making little XO figures.

The keyboard has had a few changes. Instead of a slider for the backlight and another slider for the volume, they have been combined into one key with four sensors. The slider that had been used for the backlight is now free to be used by applications. The chat button changed appearances a bit, and it looks like the camera/voice button has been turned into a zoom button. The mouse buttons now have an X on the left button and an O on the right button, to make it easier to refer to them in instructions. The keyboard also is generally more responsive; the spacebar doesn’t seem to have any dead spots anymore, and the keys are more reliable when tapped. It’s still a very small keyboard if you try to touch type, but it’s not impossible (at some point I seem to have lost the ability to hunt and peck, but I can get by).

There are now small white LEDs under the plastic for both the microphone and camera. Whenever these are in use, the light turns on. This is done in hardware as a security measure, so malicious software can’t surreptitiously record things. The plastic around the screen is also now a light gray instead of white; from what I understand this is to make the screen seem higher contrast, I suppose because the white of the plastic could otherwise overpower the white of the screen.

The laptop also came with an LiFePO4 battery, which is lighter and higher capacity than the NiMH batteries used before. The total difference in weight isn’t very noticeable. (Li-Ion batteries haven’t been an option in the XO because of safety concerns.)

The software has had more changes, but that’s an entirely different topic.

2007 08 07

OLPC

Comments Off

