Javascript

Why doctest.js is better than Python’s doctest

I’ve been trying, not too successfully I’m afraid, to get more people to use doctest.js. There are probably a few reasons people don’t. They are all wrong! Doctest.js is the best!

One issue in particular is that people (especially people in my Python-biased circles) are perhaps thrown off by Python’s doctest. I think Python’s doctest is pretty nice, I enjoy testing with it, but there’s no question that it has a lot of problems. I’ve even thought about trying to fix doctest, and even made a repository, but I only really got as far as creating a list of issues I’d like to fix. But, like so many before me, I never actually made those fixes. Doctest has, in its life, only really had a single period of improvement (in the time leading to Python 2.4). That’s not a recipe for success.

Of course doctest.js takes inspiration from Python’s doctest, but I wrote it as a real test environment, not for a minimal use case. In the process I fixed a bunch of issues with doctest, and in places Javascript has also provided helpful usability.

Some issues:

Doctest.js output is predictable

The classic pitfall of Python’s doctest is printing a dictionary:


>>> print {"one": 1, "two": 2}
{'two': 2, 'one': 1}
 

The print order of a dictionary is arbitrary, based on a hash algorithm that can change, or mix things up as items are added or removed. And to make it worse, the output is usually stable, so you can write tests that turn out to be unexpectedly fragile. But there’s no reason why dict.__repr__ must use an arbitrary order. Personally I take it as a bit of unfortunate laziness.

If doctest had used pprint for all of its printing it would have helped some. But not enough, because this kind of code is fairly common:


def __repr__(self):
    return '<ThisClass attr=%r>' % self.attr
 

and that %r invokes a repr() that cannot be overridden.

In doctest.js I always try to make output predictable. One reason this is fairly easy is that there’s nothing like repr() in Javascript, so doctest.js has its own implementation. It’s like I started with pprint and no other notion existed.
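
For instance, a minimal sketch using the comment-style expectations described further down (and assuming, as I recall, that doctest.js prints object keys in sorted order):

print({two: 2, one: 1});
// => {one: 1, two: 2}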

Good matching

In addition to unpredictable output, there’s also just hard-to-match output. Output might contain blank lines, for instance, and Python’s doctest requires a very ugly <BLANKLINE> token to handle that. Whitespace might not be normalized. Maybe there’s boring output. Maybe there’s just a volatile item like a timestamp.

Doctest.js includes, by default, ellipsis: ... matches any length of text. But it also includes another wildcard, ?, which matches just one number or word. This avoids cases where ... swallows up too much when you just wanted to match a single word.
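
For example, here is the sort of case where ? is handy (a sketch; elapsed stands in for some timing value in your own code):

print("Finished in " + elapsed + " seconds");
// => Finished in ? seconds

An ... there could just as happily swallow the rest of the line, or several more lines of output; ? insists on matching a single number or word.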

Also doctest.js doesn’t use ... for other purposes. In Python’s doctest ... is used for continuation lines, meaning you can’t just ignore output, like:


>>> print who_knows_what_this_returns()
...
 

Or even worse, you can’t ignore the beginning of an item:


>>> print some_request
...
X-Some-Header: foo
...
 

In the style I prefer for doctest.js there isn’t any continuation-line symbol at all (and when there is one, it’s >).

Also doctest.js normalizes whitespace, normalizes " and ', and just generally tries to be reasonable.

Doctest.js tests are plain Javascript

Not many editors know how to syntax-highlight and check doctests, with their >>> in front of each line and so forth. And the whole thing is tweaky: you need to use a continuation (...) on some lines, and start statements with >>>. It’s an awkward way to compose.

Doctest.js started out with the same notion, though with different symbols ($ and >). But recently with the rise of a number of excellent parsers (I used Esprima) I’ve moved my own tests to another pattern:


print(something())
// => expected output
 

This is already a fairly common way to write examples. Just as you may have read pre-Python pseudocode and thought "that looks like Python!", doctest.js looks like example pseudocode.

Doctest.js tests are self-describing

Python’s doctest has some options, some important options that affect the semantics of the test, that you can only turn on in the runner. The most important option is ELLIPSIS. Either your test was written to use ELLIPSIS or it wasn’t – that a test can’t self-describe its requirements means that test running is fragile.

I made the hackiest package ever to get around this in Python, but it’s hacky and lame.

Exception handling isn’t special

Python’s doctest treats exceptions differently from other output. So if you print something before the exception, it is thrown away, never to be seen. And you can’t use some of the same matching techniques.

Doctest.js just prints out exceptions, and it’s matched like anything else.

This particular case is one of several places where it feels like Python’s doctest is just being obstinate. Doing it the right way isn’t harder. Python’s doctest makes debugging exception cases really hard.
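
Something like this sketch, for instance (I am not being precise about how doctest.js formats the exception text itself, only illustrating that the printed line before it isn’t thrown away):

print("about to call the broken thing");
null.foo;
// => about to call the broken thing
// TypeError: ...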

Doctest.js has a concept of "abort"

I’m actually pretty okay with Python doctest’s notion that you just run all the tests, even when one fails. Getting too many failures is a bit of a nuisance, but it’s not that bad. But there’s no way to just give up, and there needs to be. If you are relying on something to be importable, or some service to be available, there’s no point in going on with the tests.

Doctest.js lets you call Abort() and further tests are cancelled.
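
So a test file that depends on some library being available can bail out early; a sketch (SomeRequiredLibrary is just a stand-in for whatever you depend on):

if (typeof SomeRequiredLibrary == "undefined") {
  // Nothing after this can possibly pass, so give up on the remaining tests
  Abort();
}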

Distinguishing between debugging output and deliberate output

Maybe it’s my own fault for being a programming troglodyte, but I use a lot of print for debugging. This becomes a real problem with Python’s doctest, as it captures all that printing and causes tests to fail.

Javascript has something specifically for printing debugging output: console.log(). Doctest.js doesn’t mess with that, it adds a new function print(). Only stuff that is printed (not logged) is treated as expected output. It’s like console.log() goes to stderr and print() goes to stdout.
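
So a test can carry plenty of debugging noise without breaking; a sketch (computeAnswer() and its result are made up):

var result = computeAnswer();
console.log("intermediate state:", result);  // debugging; never matched
print(result);                               // deliberate output; matched below
// => 42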

Doctest.js also forces the developer to print everything they care about. For better or worse Javascript has many more expressions than Python (including assignments), so the mere presence of an expression isn’t a good clue for whether you care about its result. I’m not sure this is better, but it’s part of the difference.

Doctest.js also groups your printed statements according to the example you are in (an example being a block of code and an expected output). This is much more helpful than watching a giant stream of output go to the console (the browser console or terminal).

Doctest.js handles async code

This admittedly isn’t that big a deal for Python, but for Javascript it is a real problem. Not a problem for doctest.js in particular, but a problem for any Javascript test framework. You want to test return values, but lots of functions don’t "return", instead they call some callback or create some kind of promise object, and you have to test for side effects.

Doctest.js I think has a really great answer for this, which is not so much to say that Python’s doctest is so much worse, but in the context of Javascript doctest.js has something really useful and unique. If callback-driven async code had ever been very popular in Python then this sort of feature would be nice there too.
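
To give a taste of it here (this is the same Spy pattern I show in the "Doctest.js & Callbacks" post further down): the spy stands in for the callback, the test waits on it, and its invocation is matched like any other output.

$.ajax({
  url: '/search',
  data: {q: "query unlikely to match anything"},
  dataType: "json",
  success: Spy("search.success", {wait: true, ignoreThis: true})
});
// => search.success([])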

The browser is a great environment

A lot of where doctest.js is much better than Python’s doctest is simply that it has a much more powerful environment for displaying results. It can highlight failed or passing tests. When there’s a wildcard in expected output, it can show the actual output without adding any particular extra distraction. It can group console messages with the tests they go with. It can show both a simple failure message, and a detailed line-by-line comparison. All these details make it easy to identify what went wrong and fix it. The browser gives a rich and navigable interface.

I’d like to get doctest.js working well on Node.js (right now it works, but is not appealing), but I just can’t bring myself to give up the browser. I have to figure out a good hybrid.

Python’s doctest lacks a champion

This is ultimately the reason Python’s doctest has all these problems: no one cares about it, no one feels responsible for it, and no one feels empowered to make improvements to it. And to make things worse there is a cadre of people who will respond to suggestions with their own criticisms that doctest should never be used beyond its original niche, that its constraints are features.

Doctest is still great

I’m ragging on Python’s doctest only because I love it. I wish it was better, and I made doctest.js in a way I wish Python’s doctest was made. Doctest, and more generally example/expectation oriented code, is a great way to explain things, to make tests readable, to make test-driven development feasible, to create an environment that errs on the side of over-testing instead of under-testing, and to make failures and resolutions symmetric. It’s still vastly superior to BDD, avoiding all BDD’s aping of readability while still embracing the sense of test-as-narrative.

But, more to the point: use doctest.js, read the tutorial, or try it in the browser. I swear, it’s really nice to use.

Javascript
Mozilla
Programming
Python


Javascript on the server AND the client is not a big deal

All the cool kids love Node.js. I’ve used it a little, and it’s fine; I was able to do what I wanted to do, and it wasn’t particularly painful. It’s fun to use something new, and it’s relatively straight-forward to get started so it’s an emotionally satisfying experience.

There are several reasons you might want to use Node.js, and I’ll ignore many of them, but I want to talk about one in particular:

Javascript on the client and the server!

Is this such a great feature? I think not…

You only need to know one language!

Sure. Yay ignorance! But really, this is fine but unlikely to be relevant to any current potential audience for Node.js. If you are shooting for a very-easy-to-learn client-server programming system, Node.js isn’t it. Maybe Couch or something similar has that potential? But I digress.

It’s not easy to have expertise at multiple languages. But it’s not that hard. It’s considerably harder to have expertise at multiple platforms. Node.js gives you one language across client and server, but not one platform. Node.js programming doesn’t feel like the browser environment. They do adopt many conventions when it’s reasonable, but even then it’s not always the case — in particular because many browser APIs are the awkward product of C++ programmers exposing things to Javascript, and you don’t want to reproduce those same APIs if you don’t have to (and Node.js doesn’t have to!) — an example is the event pattern in Node, which is similar to a browser but less obtuse.

You get to share libraries!

First: the same set of libraries is probably not applicable. If you can do it on the client then you probably don’t have to do it on the server, and vice versa.

But sometimes the same libraries are useful. Can you really share them? Browser libraries are often hard to use elsewhere because they rely on browser APIs. These APIs are frequently impossible to implement in Javascript.

Actually they are possible to implement in Javascript using Proxies (or maybe some other new and not-yet-standard Javascript features). But not in Node.js, which uses V8, and V8 is a pretty conservative implementation of the Javascript language. (Update: it is noted that you can implement proxies — in this case a C++ extension to Node)

Besides these unimplementable APIs, it is also just a different environment. There is the trivial: the window object in the browser has a Node.js equivalent, but it’s not named window. Performance is different — Node has long-running processes; the browser might, but you can’t count on it. Node can have blocking calls, which are useful even if you can’t use them at runtime (e.g., require()); but you can’t really have any of these at any time in the browser. And then of course there are all the system calls, none of which you can use in the browser.

All these may simply be surmountable challenges, through modularity, mocking, abstractions, and so on… but ultimately I think the motivation is lacking: the domain of changing a live-rendered DOM isn’t the same as producing bytes to put onto a socket.

You can work fluidly across client and server!

If anything I think this is dangerous rather than useful. The client and the server are different places, with different expectations. Any vagueness about that boundary is wrong.

It’s wrong from a security perspective, as the security assumptions are nearly opposite on the two platforms. The client trusts itself, and the server trusts itself, and both should hold the other in suspicion (though the client can be more trusting because the browser doesn’t trust the client code).

But it’s also the wrong way to treat HTTP. HTTP is pretty simple until you try to make it simpler. Efforts to make it simpler mostly make it more complicated. HTTP lets you send serialized data back and forth to a server, with a bunch of metadata and other doodads. And that’s all neat, but you should always be thinking about sending information, and never about sharing information. It’s not a fluid boundary, and code that touches HTTP needs to be explicit about it and not pretend it is equivalent to any other non-network operation.

Certainly you don’t need two implementation languages to keep your mind clear. But it doesn’t hurt.

You can do validation the same way on the client and server!

One of the things people frequently bring up is that you can validate data on the client and server using the same code. And of course, what web developer hasn’t been a little frustrated that they have to implement validation twice?

Validation on the client is primarily a user experience concern, where you focus on bringing attention to problems with a form, and helping the user resolve those problems. You may be able to avoid errors entirely with an input method that avoids the problem (e.g., if you have a slider for a numeric input, you don’t have to worry about the user entering a non-numeric value).

Once the form is submitted, if you’ve done thorough client-side validation you can also avoid friendly server-side validation. Of course all your client-side validation could be avoided through a malicious client, but you don’t need to give a friendly error message in that case, you can simply bail out with a simple 400 Bad Request error.

At that point there’s not much in common between these two kinds of validation — the client is all user experience, and the server is all data integrity.

You can do server-side Javascript as a fallback for the client!

Writing for clients without Javascript is becoming increasingly less relevant, and if we aren’t there yet, then we’ll certainly get there soon. It’s only a matter of time, the writing is on the wall. Depending on the project you might have to put in workarounds, but we should keep those concerns out of architecture decisions. Maintaining crazy hacks is not worth it. There’s so many terrible hacks that have turned into frameworks, and frameworks that have justified themselves because of the problems they solved that no longer matter… Node.js deserves better than to be one of those.

In Conclusion Or Whatever

I’m not saying Node.js is bad. There are other arguments for it, and you don’t need to make any argument for it if you just feel like using it. It’s fun to do something new. And I’m as optimistic about Javascript as anyone. But this one argument, I do not think it is very good.

Javascript
Programming
Web


Doctest.js & Callbacks

Many years ago I wrote a fairly straight-forward port of Python’s doctest to Javascript. I thought it was cool, but I didn’t really talk about it that much. Especially because I knew it had one fatal flaw: it was very unfriendly towards programming with callbacks, and Javascript uses a lot of callbacks.

On a recent flight I decided to look at it again, and realized fixing that one flaw wasn’t actually a big deal. So now doctest.js really works. And I think it works well: doctest.js.

I have yet to really use doctest.js on more than a couple real cases, and as I do (or you do?) I expect to tweak it more to make it flow well. But having tried a couple of examples I am particularly liking how it can be used with callbacks.

Testing with callbacks is generally a tricky thing. You want to make assertions, but they happen entirely separately from the test runner’s own loop, and your callbacks may not run at all if there’s a failure.

I came upon some tests recently that used Jasmine, a BDD-style test framework. I’m not a big fan of BDD but I’m fairly new to serious Javascript development so I’m trying to withhold judgement. The flow of the tests is a bit peculiar until you realize that it’s for async reasons. I’ll try to show something that roughly approximates a real test of an XMLHttpRequest API call:


it("should give us no results", function() {
  runs(function () {
    var callback = createSpy('callback for results');
    $.ajax({
      url: '/search',
      data: {q: "query unlikely to match anything"},
      dataType: "json",
      success: callback
    });
  });
  waits(someTimeout);
  runs(function () {
    expect(callback).toHaveBeenCalled();
    expect(callback.mostRecentCall.args[0].length).toEqual(0);
  });
});
 

So, the basic pattern is it() creates a group of tests, and each call to runs() is a block of code to run sequentially. Then between these runs() blocks you can have signals to the runner to wait for some result, either a timeout (which is fragile), or you can set up specific conditions.

Another popular test runner is QUnit; it’s popular particularly because it’s what jQuery uses, and my own impression is that QUnit is just very simple and so least likely to piss you off.

QUnit has its own style for async:


test("should give us no results", function () {
  stop();
  expect(1);
  $.ajax({
    url: '/search',
    data: {q: "query unlikely to match anything"},
    dataType: "json",
    success: function (result) {
      ok(result.length == 0, 'No results');
      start();
    }
  });
});
 

stop() confused me for a bit until I realized that what it really refers to is stopping the test runner; of course the function continues on regardless. What will happen is that the function will return, but nothing will have really been tested — the success callback will not have been run, and cannot run until all Javascript execution stops and control is given back to the browser. So the test runner will use setTimeout to let time pass before the test continues. In this case it will continue once start() is called. And expect() also makes the test fail if it didn’t get at least one assertion during that interval — it would otherwise be easy to simply miss an assertion (though in this example it would be okay, because if the success callback isn’t invoked then start() will never be called, and the runner will time out and signal that as a failure).

So… now for doctest.js. Note that doctest.js isn’t "plain" Javascript, it looks like what an interactive Javascript session might look like (I’ve used shell-style prompts instead of typical console prompts, because the consoles didn’t exist when I first wrote this, and because >>>/... kind of annoy me anyway).


$ success = Spy('success', {writes: true});
$ $.ajax({
>   url: '/search',
>   data: {q: "query unlikely to match anything"},
>   dataType: "json",
>   success: success.func
> });
$ success.wait();
success([])
 

With doctest.js you still get a fairly linear feel — it’s similar to how Jasmine works, except every $ prompt is potentially a place where the loop can be released so something async can happen. Each prompt is equivalent to runs() (though unless you call wait, directly or indirectly, everything will run in sequence).

There’s also an implicit assertion for each stanza: anything that is written must be matched ({writes: true} makes the spy/mock object write out any invocations). This makes it much harder to miss something in your tests.

Update: just for the record, doctest has changed some, and while that example still works, this would be the "right" way to do it now:


$.ajax({
  url: '/search',
  data: {q: "query unlikely to match anything"},
  dataType: "json",
  success: Spy("search.success", {wait: true, ignoreThis: true})
});
// => search.success([])
 

There is a new format that I now prefer with plain Javascript and "expected output" in comments. Spy("search.success", {wait: true, ignoreThis: true}) causes the test to wait on the Spy immediately (though the same pattern as before is also possible and sometimes preferable), and in all likelihood jQuery will set this to something we don’t care about, so ignoreThis: true keeps it from being printed. (Or maybe you are interested in it, in which case you’d leave that out)

Anyway, back to the original conclusion (update over)…

I’ve never actually found Python’s doctest to be a particularly good way to write docs, and I don’t expect any different from doctest.js, but I find it a very nice way to write and run tests… and while Python’s doctest is essentially abandoned and lacks many features to make it a more humane testing environment, maybe doctest.js can do better.

Javascript
Programming
Testing
Web


Javascript Status Message Display

In a little wiki I’ve been playing with I’ve been trying out little ideas that I’ve had but haven’t had a place to actually implement them. One is how notification messages work. I’m sure other people have done the same thing, but I thought I’d describe it anyway.

A common pattern is to accept a POST request and then redirect the user to some page, setting a status message. Typically the status message is either set in a cookie or in the session, then the standard template for the application has some code to check for a message and display it.

The problem with this is that this breaks all caching — at any time any page can have some message injected into it, basically for no reason at all. So I thought: why not do the whole thing in Javascript? The server will set a cookie, but only Javascript will read it.

The code goes like this; on the server (easily translated into any framework):


resp.set_cookie('flash_message', urllib.quote(msg))
 

I quote the message because it can contain characters unsafe for cookies, and URL quoting is a particularly easy quoting to apply.

Then I have this Javascript (using jQuery):


$(function () {
    // Anything in $(function...) is run on page load
    var flashMsg = readCookie('flash_message');
    if (flashMsg) {
        flashMsg = unescape(flashMsg);
        var el = $('<div id="flash-message">'+
          '<div id="flash-message-close">'+
          '<a title="dismiss this message" '+
          'id="flash-message-button" href="#">X</a></div>'+
          flashMsg + '</div>');
        $('a#flash-message-button', el).bind(
          'click', function () {
            $(this.parentNode.parentNode).remove();
        });
        $('#body').prepend(el);
        eraseCookie('flash_message');
    }
});
 

Note that I’ve decided to treat the flash message as HTML. I don’t see a strong risk of injection attack in this case, though I must admit I’m a little unclear about what the normal policies are for cross-domain cookie setting.

I use these cookie functions because oddly I can’t find cookie handling functions in jQuery. It’s always weird to me how primitive document.cookie is.
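
For reference, the helpers amount to roughly this (a sketch of the usual document.cookie pattern, not the exact code linked above):

function readCookie(name) {
    // document.cookie is one big "a=1; b=2" string; scan it for our name
    var parts = document.cookie.split(';');
    for (var i = 0; i < parts.length; i++) {
        var part = parts[i].replace(/^\s+/, '');
        if (part.indexOf(name + '=') === 0) {
            return part.substring(name.length + 1);
        }
    }
    return null;
}

function eraseCookie(name) {
    // An already-expired date makes the browser drop the cookie
    document.cookie = name + '=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/';
}

Anyway, the CSS looks like this: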


#flash-message {
  margin: 0.5em;
  border: 2px solid #000;
  background-color: #9f9;
  -moz-border-radius: 4px;
  text-align: center;
}

#flash-message-close {
  float: right;
  font-size: 70%;
  margin: 2px;
}

a#flash-message-button {
  text-decoration: none;
  color: #000;
  border: 1px solid #9f9;
}

a#flash-message-button:hover {
  border: 1px solid #000;
  background-color: #009;
  color: #fff;
}
 

This doesn’t have non-Javascript fallback, but I think that’s okay. This isn’t something that a spider would ever see (since spiders shouldn’t be submitting forms that result in update messages). Accessible browsers generally implement Javascript so that’s also not particularly a problem, though there may be additional hints I could give in CSS or Javascript to help make this more readable (if there’s a message, it should probably be the first thing read on the page).

Another common component of pages that varies separate from the page itself is logged-in status, but that’s more heavily connected to your application. Get both into Javascript and you might be able to turn caching way up on a lot of your pages.

Javascript
Programming
Web


Prism

I’ve seen talk of MS Silverlight and Adobe AIR. People talk them up like the future of web applications or something. I don’t know much about them, but I am almost completely certain I don’t want anything to do with them.

Here’s a general rule I have: I don’t accept anything made by people who hate the web. If you hate the web and you want to improve the web, I don’t want anything to do with you. If you think the web is some kind of implementation detail then I probably don’t care what you are doing. If you still think the web is a fad, then you are just nuts; if all you can think of is reasons why the web is stupid and awkward, and you think it’s some giant step backward (from what?), then you haven’t thought very deeply about what’s happened in the world of technology and why.

To me Silverlight and AIR reek of a distaste for the web.

So it is with great delight that I read the announcement of Mozilla Prism: a bridge between the desktop and the web, written by people who don’t hate the web.

The premise of Prism, from what I can tell, is this: to make a web application into a desktop application, you just give the browser a shallowly separate identity. It lives in its own window, has its own icon, its own launcher. Maybe it runs in its own process for reliability. You take away the chrome (URL bars, back button, etc). Unlike previous ideas for Mozilla (like xulrunner) you don’t add chrome back in with XUL, you just write all the controls with HTML and Javascript.

All the things that Prism isn’t is what makes it great, because it’s so damn simple. The only thing that really seems weird to me is that it is very separate from the browser itself; hopefully this is just temporary, and by the time it’s really "done" you’ll be able to just make any page an application directly from Firefox.

This doesn’t make web applications perfect, of course. There’s the offline issue. There’s usability issues, like keybindings. There’s lots of issues, but all of those issues apply just as much to the web as to a desktop application, and should be solved both places. Those things can be worked on orthogonally (most interestingly in Web Forms 2.0 and Web Applications 1.0 at WHAT-WG). Still, HTML and Javascript right now are totally workable for most applications.

Javascript
Web


Doctest for Ruby

Finally, someone wrote a version of doctest for Ruby.

Recently I’ve been writing most of my tests using stand-alone doctest files. It’s a great way to do TDD — mostly because the cognitive load is so low. Also, I write my examples but don’t write my output, then copy the output after visually confirming it is correct. So the basic pattern is:

  • Figure out what I want to do
  • Figure out how I want to test it
  • Automate my conditions
  • Manually inspect whether the output is correct (i.e., implement and debug)
  • Copy the output so that in the future the manual process is automated (doctest-mode for Emacs makes this particularly easy)

The result is a really good balance of manual and automated testing, I think giving you the benefit of both processes — the ease of manual testing, and the robustness of automated testing.

Another good thing about doctest is it doesn’t let you hide any boilerplate and setup. If it’s easy to use doctest, it’s probably easy to use the library.

There’s nothing Python-specific about doctest (e.g., doctestjs), so it’s good to see it moving to other languages. Even if the language doesn’t have a REPL, IMHO it’s worth inventing it just for this.

Javascript
Programming
Python
Ruby


Atompub & OpenID

One of the things I would like to do is to interact with Atompub (aka Atom Publishing Protocol) stores in Javascript through the browser. Since this is effectively the browser itself interacting with the Atompub server, browser-like authentication methods would be nice. But services like Atompub don’t work nicely with the kinds of authentication methods that normal websites use. One of these is OpenID, which is particularly browser-focused.

From the perspective of a client, OpenID basically works like this:

  • You need to login. You tell the original server what your OpenID URL is, somehow.
  • The original server does some redirects, maybe some popups, etc.
  • Your OpenID server (attached to your OpenID URL) authenticates you in some fashion, and then tells the original server.
  • The original server probably sets a signed cookie so that in subsequent requests you stay logged in. You cannot do this little redirection dance for every request, since it’s actually quite intrusive.

So what happens when I have an XMLHttpRequest that needs to be authenticated? Neither the XMLHttpRequest nor Javascript generally can do the authentication. Only the browser can, with the user’s interaction.

One thought I have is a 401 Unauthorized response, with a header like:


WWW-Authenticate: Cookie location="http://original.server/login.html"
 

Which means I need to open up http://original.server/login.html and have the user log in, and the final result is that a cookie will be set. XMLHttpRequest sends cookies automatically I believe, so once the browser has the cookie then all the Javascript requests get the same cookie and hence authentication.

One problem, though, is that you have to wait around for a while for the login to succeed, then continue on your way. A typical situation is that you have to return to the original page you were requesting, and people often do something like /login?redirect_to=original_url. In this case we might want something like /login?opener_call=reattempt_request, where when the login process is over we call window.opener.reattempt_request() in Javascript.
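
To make that flow concrete, here is a rough sketch of how the Javascript side might react to the proposed 401 response (reattempt_request and the header parsing are hypothetical, just following the scheme above):

function authenticatedGet(url, onSuccess) {
    var req = new XMLHttpRequest();
    req.open('GET', url, true);
    req.onreadystatechange = function () {
        if (req.readyState != 4) return;
        if (req.status == 401) {
            // Look for the proposed WWW-Authenticate: Cookie location="..." header
            var header = req.getResponseHeader('WWW-Authenticate') || '';
            var match = /Cookie location="([^"]+)"/.exec(header);
            if (match) {
                // Let the login window call window.opener.reattempt_request() when done
                window.reattempt_request = function () {
                    authenticatedGet(url, onSuccess);
                };
                window.open(match[1] + '?opener_call=reattempt_request');
                return;
            }
        }
        onSuccess(req);
    };
    req.send(null);
}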

Maybe it would make sense for that location variable to be a URI Template, with some predefined variables, like opener, back, etc.

For general backward compatibility, would it be reasonable to send 307 Temporary Redirect plus WWW-Authenticate, and let XMLHttpRequests or other service clients sort it out, while normal browser requests do the normal login redirect?

Update: Another question/thought: is it okay to send multiple WWW-Authenticate headers, to give the client options for how it wants to do authentication? It seems vaguely okay, according to RFC 2616 14.47.

Javascript
Programming
Web


Atom Models

I’ve been doing a bit more with Atom lately.

First, I started writing a library to manipulate Atom feeds and entries. For the moment this is located in atom.py. It uses lxml, as does everything markup related I do these days.

I came upon a revelation of sorts when I was writing the library. I first started with a library that looked like this:


class Feed(object):
    def __init__(self, title, ...):
        self.title = title
        ...
    @classmethod
    def parse(cls, xml):
        if isinstance(xml, basestring):
            xml = etree.XML(xml)
        title = xml.xpath('//title')[0].text
        ...
        return cls(title, ...)
    def serialize(self):
        el = etree.Element('{%s}feed' % atom_ns)
        title = etree.Element('{%s}title' % atom_ns)
        title.text = self.title
        el.append(title)
        ...
        return el
 

Obviously there’s ways to improve this and make it less verbose, and I went down that path for a while. But then I decided the whole path was wrong. Atom is XML. It’s not the representation of some object I’m creating. If I have something that can’t be represented in XML, it isn’t Atom, and it doesn’t belong in my Atom-related objects.

So instead I started making lxml more convenient when using Atom. I don’t keep any information except what is in the markup, I just make it more convenient to access that information.

I used lots of descriptors to do this, as the same patterns happened over and over. For instance, the Feed object is fairly simple:


class Feed(AtomElement):
    entries = _findall_property('entry')
    author = _element_property('author')
 

Which basically means that feed.entries returns all <entry> elements, and feed.author returns the single author element.

There’s also accessors for text elements (like <id>) and date containing elements (like <updated>) and just to access XML attributes as Python attributes.

There’s a number of advantages:

  • No hidden state.
  • No deferred errors, since everything is always represented in the XML infoset.
  • All XML extensions work, even though my classes don’t know anything in particular about them. There’s a full API for manipulating the XML that you can use, you don’t have to use my APIs.
  • Even more obscure kinds of extensions work fine, like a custom attribute on an element. There’s absolutely zero normalization that happens.
  • I only have to write the parts where the normal XML (lxml) APIs are inconvenient, so the implementation stays simple.
  • There’s no confusion over which object I might be talking about in my code. There’s no distinction between the XML object and the domain object.

Since then I’ve been working on a Javascript library for handling Atom. It’s not as elegant. I am trying to keep to this same principle, but of course I can’t actually extend the DOM and so I can’t add convenience methods. So instead I’m making a class that lightly wraps the DOM objects, with explicit getters and setters that simply read and modify those DOM objects.
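
A rough sketch of the sort of light wrapper I mean (the names here are illustrative, not the actual library’s API):

var ATOM_NS = 'http://www.w3.org/2005/Atom';

function Feed(element) {
    // The <feed> DOM element is the only state we keep
    this.element = element;
}

Feed.prototype.getTitle = function () {
    var nodes = this.element.getElementsByTagNameNS(ATOM_NS, 'title');
    return nodes.length ? nodes[0].textContent : null;
};

Feed.prototype.setTitle = function (value) {
    var nodes = this.element.getElementsByTagNameNS(ATOM_NS, 'title');
    var title = nodes.length ? nodes[0]
        : this.element.appendChild(
              this.element.ownerDocument.createElementNS(ATOM_NS, 'title'));
    title.textContent = value;
};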

One thing that I have found very useful in my development on the Javascript side is doctest-style testing. You can see the test, but to run it you have to check it out (it uses some svn:externals which you don’t get through the direct svn access). After using that testing some more and being pleased with the result, I decided to package the Javascript doctest runner a bit better. I removed the framework dependencies, did a bit of renaming (now it is doctestjs or doctest.js instead of jsdoctest), wrote up fairly comprehensive docs, and uploaded it to JSAN (though at the moment the trunk from svn is probably better to use). I think it’s an excellent way of doing unit testing in Javascript, much better than any of the alternatives I’ve seen. It even has some notable advantages over Python’s doctest, like if you are using Firebug (which you must if you do Javascript development) you get a console session that runs in the same namespace as your tests, so you can easily do inspection of the objects if there’s a failure.

I’m not sure about JSAN. It’s nice to have an index. But I think they copy stuff from CPAN a bit too much. Why should you have a text README file? That’s just silly; of course Javascript documentation should be HTML. They batch process uploads; processing one package a day on the fly shouldn’t be overwhelming. They want a MANIFEST file. The standard metadata file is YAML, not JSON. This should all be a little more Javascripty in my opinion. But they also accept any kind of upload, so there’s nothing stopping you from ignoring what you don’t care about. I’ll probably improve the packaging of doctestjs a bit in the future, and still ignore the parts I think are silly.

Javascript
Programming
Python
