Why doctest.js is better than Python’s doctest

I’ve been trying, not too successfully I’m afraid, to get more people to use doctest.js. There are probably a few reasons people don’t. They are all wrong! Doctest.js is the best!

One issue in particular is that people (especially people in my Python-biased circles) are perhaps thrown off by Python’s doctest. I think Python’s doctest is pretty nice, I enjoy testing with it, but there’s no question that it has a lot of problems. I’ve thought about trying to fix doctest, and even made a repository, but I only really got as far as creating a list of issues I’d like to fix. But, like so many before me, I never actually made those fixes. Doctest has, in its life, only really had a single period of improvement (in the time leading up to Python 2.4). That’s not a recipe for success.

Of course doctest.js takes inspiration from Python’s doctest, but I wrote it as a real test environment, not for a minimal use case. In the process I fixed a bunch of issues with doctest, and in places Javascript itself made better usability possible.

Some issues:

Doctest.js output is predictable

The classic pitfall of Python’s doctest is printing a dictionary:


>>> print {"one": 1, "two": 2}
{'two': 2, 'one': 1}
 

The print order of a dictionary is arbitrary, based on a hash algorithm that can change, or mix things up as items are added or removed. And to make it worse, the output is usually stable, so you can write tests that turn out to be unexpectedly fragile. But there’s no reason why dict.__repr__ must use an arbitrary order. Personally I take it as a bit of unfortunate laziness.

If doctest had used pprint for all of its printing it would have helped some. But not enough, because this kind of code is fairly common:


def __repr__(self):
    return '<ThisClass attr=%r>' % self.attr
 

and that %r invokes a repr() that cannot be overridden.

In doctest.js I always try to make output predictable. One reason this is fairly easy is that there’s nothing like repr() in Javascript, so doctest.js has its own implementation. It’s like I started with pprint and no other notion existed.

Good matching

In addition to unpredictable output, there’s also just hard-to-match output. Output might contain blank lines, for instance, and Python’s doctest requires a very ugly <BLANKLINE> token to handle that. Whitespace might not be normalized. Maybe there’s boring output. Maybe there’s just a volatile item like a timestamp.

Doctest.js includes, by default, ellipsis: ... matches any length of text. But it also includes another wildcard, ?, which matches just one number or word. This avoids cases where ... swallows up too much output when you only wanted to skip a single word or number.
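
For instance, something like this (a sketch in the comment-based style described later in this post, with made-up values, and assuming print and the default wildcards behave as described above):

var buildNumber = 42;
var startTime = new Date().toString();
print("Build " + buildNumber + " finished at " + startTime);
// => Build ? finished at ...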

Also doctest.js doesn’t use ... for other purposes. In Python’s doctest ... is also used for continuation lines, meaning you can’t use it to just ignore output, like:


>>> print who_knows_what_this_returns()
...
 

Or even worse, you can’t ignore the beginning of an item:


>>> print some_request
...
X-Some-Header: foo
...
 

The way I prefer to use doctest.js there isn’t any continuation-line symbol at all (and when there is one, it’s >).

Also doctest.js normalizes whitespace, normalizes " and ', and just generally tries to be reasonable.

Doctest.js tests are plain Javascript

Not many editors know how to syntax highlight and check doctests, with their >>> in front of each line and so forth. And the whole thing is tweaky: you need to use a continuation (...) on some lines, and start statements with >>>. It’s an awkward way to compose.

Doctest.js started out with the same notion, though with different symbols ($ and >). But recently with the rise of a number of excellent parsers (I used Esprima) I’ve moved my own tests to another pattern:


print(something())
// => expected output
 

This is already a fairly common way to write examples. Just as you may have read pseudocode before you knew Python and thought "that looks like Python!", doctest.js looks like example pseudocode.

Doctest.js tests are self-describing

Python’s doctest has some options, some important options that affect the semantics of the test, that you can only turn on in the runner. The most important option is ELLIPSIS. Either your test was written to use ELLIPSIS or it wasn’t – that a test can’t self-describe its requirements means that test running is fragile.

I made the hackiest package ever to get around this in Python, but it’s hacky and lame.

Exception handling isn’t special

Python’s doctest treats exceptions differently from other output. So if you print something before the exception, it is thrown away, never to be seen. And you can’t use some of the same matching techniques.

Doctest.js just prints out exceptions, and the output is matched like anything else.
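
A sketch of what that looks like (the function here is hypothetical, using the comment style shown above, and assuming the thrown error is written out as Error: message):

function requireConfig(config) {
  if (! config) {
    throw new Error("no config given");
  }
  return config;
}
requireConfig(null);
// => Error: no config given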

This particular case is one of several places where it feels like Python’s doctest is just being obstinate. Doing it the right way isn’t harder. Python’s doctest makes debugging exception cases really hard.

Doctest.js has a concept of "abort"

I’m actually pretty okay with Python doctest’s notion that you just run all the tests, even when one fails. Getting too many failures is a bit of a nuisance, but it’s not that bad. But there’s no way to just give up, and there needs to be. If you are relying on something to be importable, or some service to be available, there’s no point in going on with the tests.

Doctest.js lets you call Abort() and further tests are cancelled.
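
A minimal sketch (SomeRequiredLibrary is a hypothetical global; Abort() is the doctest.js call just mentioned):

if (typeof SomeRequiredLibrary == "undefined") {
  // Nothing else can pass without it, so give up on the remaining tests:
  Abort();
}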

Distinguishing between debugging output and deliberate output

Maybe it’s my own fault for being a programming troglodyte, but I use a lot of print for debugging. This becomes a real problem with Python’s doctest, as it tracks all that printing and it causes tests to fail.

Javascript has something specifically for printing debugging output: console.log(). Doctest.js doesn’t mess with that, it adds a new function print(). Only stuff that is printed (not logged) is treated as expected output. It’s like console.log() goes to stderr and print() goes to stdout.
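
A quick sketch of the difference (the function under test is made up):

function search(query) {
  return [];  // pretend this is a real implementation
}
var results = search("query unlikely to match anything");
console.log("raw results:", results);  // debugging noise; not matched
print(results.length);                 // deliberate output; must match below
// => 0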

Doctest.js also forces the developer to print everything they care about. For better or worse Javascript has many more expressions than Python (including assignments), so the value of an expression isn’t a good clue for whether you actually care about it. I’m not sure this is better, but it’s part of the difference.

Doctest.js also groups your printed statements according to the example you are in (an example being a block of code and an expected output). This is much more helpful than watching a giant stream of output go to the console (the browser console or terminal).

Doctest.js handles async code

This admittedly isn’t that big a deal for Python, but for Javascript it is a real problem. Not a problem for doctest.js in particular, but a problem for any Javascript test framework. You want to test return values, but lots of functions don’t "return", instead they call some callback or create some kind of promise object, and you have to test for side effects.

Doctest.js I think has a really great answer for this, which isn’t to say that Python’s doctest is that much worse here, but in the context of Javascript doctest.js has something really useful and unique. If callback-driven async code had ever been very popular in Python then this sort of feature would be nice there too.

The browser is a great environment

A lot of where doctest.js is much better than Python’s doctest is simply that it has a much more powerful environment for displaying results. It can highlight failed or passing tests. When there’s a wildcard in expected output, it can show the actual output without adding any particular extra distraction. It can group console messages with the tests they go with. It can show both a simple failure message, and a detailed line-by-line comparison. All these details make it easy to identify what went wrong and fix it. The browser gives a rich and navigable interface.

I’d like to get doctest.js working well on Node.js (right now it works, but is not appealing), but I just can’t bring myself to give up the browser. I have to figure out a good hybrid.

Python’s doctest lacks a champion

This is ultimately the reason Python’s doctest has all these problems: no one cares about it, no one feels responsible for it, and no one feels empowered to make improvements to it. And to make things worse there is a cadre of people that will respond to suggestions with their own criticisms that doctest should never be used beyond its original niche, that its constraints are features.

Doctest is still great

I’m ragging on Python’s doctest only because I love it. I wish it was better, and I made doctest.js in a way I wish Python’s doctest was made. Doctest, and more generally example/expectation oriented code, is a great way to explain things, to make tests readable, to make test-driven development feasible, to create an environment that errs on the side of over-testing instead of under-testing, and to make failures and resolutions symmetric. It’s still vastly superior to BDD, avoiding all BDD’s aping of readability while still embracing the sense of test-as-narrative.

But, more to the point: use doctest.js, read the tutorial, or try it in the browser. I swear, it’s really nice to use.

Javascript
Mozilla
Programming
Python


Python Application Package

I’ve been thinking some more about deployment of Python web applications, and deployment in general (in part leading up to the Web Summit). And I’ve got an idea.

I wrote about this about a year ago and recently revised some notes on a proposal, but I’ve been thinking about something a bit more basic: a way to simply ship server applications, bundles of code. Web applications are just one use case for this.

For now let’s call this a "Python application package". It has these features:

  1. There is an application description: this tells the environment about the application. (This is sometimes called "configuration" but that term is very confusing and overloaded; I think "description" is much clearer.)
  2. Given the description, you can create an execution environment to run code from the application and acquire objects from the application. So there would be a specific way to set up sys.path, and a way to indicate any libraries that are required but not bundled directly with the application.
  3. The environment can inject information into the application. (Also this sort of thing is sometimes called "configuration", but let’s not do that either.) This is where the environment could indicate, for instance, what database the application should connect to (host, username, etc).
  4. There would be a way to run commands and get objects from the application. The environment would look in the application description to get the names of commands or objects, and use them in some specific manner depending on the purpose of the application. For instance, WSGI web applications would point the environment to an application object. A Tornado application might simply have a command to start itself (with the environment indicating what port to use through its injection).

There’s a lot of things you can build from these pieces, and in a sophisticated application you might use a bunch of them at once. You might have some WSGI, maybe a separate non-WSGI server to handle Web Sockets, something for a Celery queue, a way to accept incoming email, etc. In pretty much all cases I think a basic application lifecycle is needed: commands to run when an application is first installed, to verify the environment is acceptable, to back up its data, and to uninstall it.

There’s also some things that all environments should setup the same or inject into the application. E.g., $TMPDIR should point to a place where the application can keep its temporary files. Or, every application should have a directory (perhaps specified in another environmental variable) where it can write log files.

Details?

To get more concrete, here’s what I imagine a small application description might look like; probably YAML would be a good format:


platform: python, wsgi
require:
  os: posix
  python: <3
  rpm: m2crypto
  deb: python-m2crypto
  pip: requirements.txt
python:
  paths: vendor/
wsgi:
  app: myapp.wsgiapp:application
 

I imagine platform as kind of a series of mixins. This system doesn’t really need to be Python-specific; when creating something similar for Silver Lining I found PHP support relatively easy to add (handling languages that aren’t naturally portable, like Go, might be more of a stretch). So python is one of the features this application uses. You can imagine lots of modularization for other features, but it would be easy and unproductive to get distracted by that.

The application has certain requirements of its environment, like the version of Python and the general OS type. The application might also require libraries, ideally only libraries that are not portable (M2Crypto being an example). Modern package management works pretty nicely for this stuff, so relying on system packages as a first try I believe is best (I’d offer requirements.txt as a fallback, not as the primary way to handle dependencies).

I think it’s much more reliable if applications primarily rely on bundling their dependencies directly (i.e., using a vendor directory). The tool support for this is a bit spotty, but I believe this package format could clarify the problems and solutions. Here is an example of how you might set up a virtualenv environment for managing vendor libraries (you then do not need virtualenv to use those same libraries), and do so in a way where you can check the results into source control. It’s kind of complicated, but works (well, almost works – bin/ files need fixing up). It’s a start at least.

Support Library

On the environment side we need a good support library. pywebapp has some of the basic features, though it is quite incomplete. I imagine a library looking something like this:


import subprocess

from apppackage import AppPackage

app = AppPackage('/var/apps/app1.2012.02.11')
# Maybe a little Debian support directly:
subprocess.call(['apt-get', 'install'] +
                app.config['require']['deb'])
# Or fall back to virtualenv/pip:
app.create_virtualenv('/var/app/venvs/app1.2012.02.11')
app.install_pip_requirements()
wsgi_app = app.load_object(app.config['wsgi']['app'])
 

You can imagine building hosting services on this sort of thing, or setting up continuous integration servers (app.run_command(app.config['unit_test'])), and so forth.

Local Development

If designed properly, I think this format is as usable for local development as it is for deployment. It should be able to run directly from a checkout, with the "development environment" being an environment just like any other.

This rules out, or at least makes less exciting, the use of zip files or tarballs as a package format. The only justification I see for using such archives is that they are easy to move around; but we live in the FUTURE and there are many ways to move directories around and we don’t need to cater to silly old fashions. If that means a script that creates a tarball, FTPs it to another computer, and there it is unzipped, then fine – this format should not specify anything about how you actually deliver the files. But let’s not worry about copying WARs.

Packaging
Python
Silver Lining
Web


Git-as-sync, not source-control-as-deployment

I don’t like systems that use git push for deployment (Heroku et al). Why? I do a lot of this:


$ git push deploy
... realize I forgot a domain name ...
$ git commit -m "fix domain name" -a ; git push deploy
... realize I didn't do something right with the database setup ...
$ git commit -m "configure database right" -a ; git push deploy
... dammit, I didn't fix it quite right ...
$ git commit -m "typo" -a ; git push deploy
 

And then maybe I’d actually like to keep my config out of my source control, or have a build process that I run locally, or any number of things. I’d like to be able to test deployment, but every deployment is a commit, and I like to commit tested work. I think I could use git rebase but I lack the discipline to undo my work so I can do it correctly. This is why I don’t do continuous commits.

There’s a whole different level of weirdness when you use GitHub Pages as you aren’t pushing to a deployment-specific remote, you are pushing to a deployment-specific branch.

So I’ve generally thought: git deployment is wrong.

Then I was talking to some other people at Mozilla and they mentioned that ops was using git for simply moving files around, even though the stuff they were deploying was itself in Mercurial. They had a particular site with a very large number of files, and it was faster to use git than rsync (git has more metadata than rsync; rsync has to look at everything every time you sync). And that all seemed very reasonable; git is a fine way to sync things.

But I kind of forgot about it all, and just swore to myself as I did too many trivial commits and wrote too many meaningless commit messages.

Still… it isn’t so hard to separate these concerns, is it? So I wrote up a quite small command called git-sync. The basic idea: copy the working directory to a new location (minus .git/), commit that, and push the result to your deployment remote. You can send modified and untracked files, and you can run a build script before committing and push the result of the build script, all without sullying your "real" source control. And you get a history of deployments as a nice bonus.

I’ve only used this a little bit, but I’ve enjoyed when I have used it, and it makes me feel much better/clearer about my actual commits. It’s really short right now, and probably gets some things entirely wrong (e.g., moving over untracked files). But it works well enough to be improved (winkwinknudgenudge).

So check it out: https://github.com/ianb/git-sync

Programming
Web


My Unsolicited Advice For PyPy

I think the most interesting work in programming languages right now is about the runtime, not syntax or even the languages themselves. Which places PyPy in an interesting position, as they have put a great deal of effort into abstracting out the concept of runtime from the language they are implementing (Python).

There are of course other runtime environments available to Python. The main environment has been and continues to be CPython — the runtime developed in parallel with the language, and with continuous incremental feedback and improvement by the Python developer community. It is the runtime that informs and is informed by the language. It’s also the runtime that is most easy-going about integrating with C libraries, and by extension it is part of the vague but important runtime environment of "Unix". There’s also Jython and IronPython. I frankly find these completely uninteresting. They are runtimes controlled by companies, not communities, and the Python implementations are neither natural parts of their runtime environments, nor do the runtimes include many concessions to make themselves natural for Python.

PyPy is somewhere different. It still has a tremendous challenge because Python was not developed for PyPy. Even small changes to the language seem impossible — something as seemingly innocuous as making builtins static seems to be stuck in a conservative reluctance to change. But unlike Jython and IronPython they aren’t stuck between a rock and a hard place; they just have to deal with the rock, not the hard place.

So here is my unsolicited advice on what PyPy-the-runtime should consider. Simple improvements to performance and the runtime are fine, but being incrementally better than CPython only goes so far, and I personally doubt it will ever make a big impact on Python that way.

PyPy should push hard on concurrency and reliability. If it is fast enough then that’s fine; that’s done as far as I’m concerned. I say this because I’m a web programmer, and speed is uninteresting to me. Certainly opinions will differ. But to me speed (as it’s normally defined) is really really uninteresting. When or if I care about speed I’m probably more drawn to Cython. I do care about latency, memory efficiency, scalability/concurrency, resource efficiency, and most of all worst cases. I don’t think a JIT addresses any of these (and can even make things worse). I don’t know of benchmarks that measure these parameters either.

I want a runtime with new and novel features; something that isn’t just incrementally better than CPython. This itself might seem controversial, as the only point to such novel features would be for people to implement at least some code intended for only PyPy. But if the features are good enough then I’m okay with this — and if I’m not drawn to write something that will only work on PyPy, I probably won’t be drawn to use PyPy at all; natural conservatism and inertia will keep me (and most people) on CPython indefinitely.

What do I want?

  • Microprocesses. Stackless and greenlets have given us micro-threads, but it’s just not the same. Which is not entirely a criticism — it shows that unportable features are interesting when they are good features. But I want the next step, which is processes that don’t share state. (And implicitly I don’t just want standard async techniques, which use explicit concurrency and shared state.)
  • Shared objects across processes with copy-on-write; then you can efficiently share objects (like modules!) across concurrent processes without the danger of shared state, but without the overhead of copying everything you want to share. Lack of this is hurting PHP, as you can’t have a rich set of libraries and share-nothing without killing your performance.
  • I’d rather see a break in compatibility for C extensions to support this new model, than to abandon what could be PyPy’s best feature to support CPython’s C extension ecosystem. Being a web programmer I honestly don’t need many C modules, so maybe I’m biased. But if the rest of the system is good enough then the C extensions will come.
  • Make sure resource sharing that happens outside of the Python environment is really solid. C libraries are often going to be unfriendly towards microprocesses; make sure what is exposed to the Python environment is solid. That might even mean a dangerous process mode that can handle ctypes and FFI and where you carefully write Python code that has extra powers, so long as there’s a strong wall between that code and "general" code that makes use of those services.
  • Cython — it’s doing a lot of good stuff, and has a much more conservative but also more predictable path to performance (through things like type annotation). I think it’s worth leaning on. I also have something of a hunch that it could be a good way to do FFI in a safe manner, as Cython already supports multiple targets (Python 2 and 3) from the same codebase. Could PyPy be another target?
  • Runtime introspection of the runtime. We have great language introspection (probably much to the annoyance of PyPy developers who have to copy this) but currently runtime introspection is poor-to-nonexistent. What processes are running? How much memory is each using? Where? Are they holding on to resources? Are they blocking on some non-Python library? How much CPU have they been using? Then I want to be able to kill processes, send them signals, adjust priorities, etc.

And I guess it doesn’t have to be "PyPy", but a new backend for PyPy to target; it doesn’t have to be the only path PyPy pursues.

With a runtime like this PyPy could be an absolutely rocking platform for web development. Python could be as reliable as, oh… PHP? Sorry, I probably won’t win arguments that way ;) As good as Erlang! Maybe we could get the benefits of async without the pain of callbacks or Deferreds. And these are features people would use. Right now I’m perceiving a problem where there’s lots of people standing on the sidelines cheering you on but not actually using PyPy.

So: I wouldn’t tell anyone what to do, and if someone tries this out I’ll probably only be on the sidelines cheering you on… but I really think this could be awesome.

Update: there’s some interesting comments on Hacker News as well.

Programming
Python


A Python Web Application Package and Format (we should make one)

At PyCon there was an open space about deployment, and the idea of drop-in applications (Java-WAR-style).

I generally get pessimistic about 80% solutions, and dropping in a WAR file feels like an 80% solution to me. I’ve used the Hudson/Jenkins installer (which I think is specifically a project that got WARs on people’s minds), and in a lot of ways that installer is nice, but it’s also kind of wonky, it makes configuration unclear, it’s not always clear when it installs or configures itself through the web, and when you have to do this at the system level, nor is it clear where it puts files and data, etc. So a great initial experience doesn’t feel like a great ongoing experience to me — and it doesn’t have to be that way. If those were necessary compromises, sure, but they aren’t. And because we don’t have WAR files, if we’re proposing to make something new, then we have every opportunity to make things better.

So the question then is what we’re trying to make. To me: we want applications that are easy to install, that are self-describing, self-configuring (or at least guide you through configuration), reliable with respect to their environment (not dependent on system tweaking), upgradable, and respectful of persistence (the data that outlives the application install). A lot of this can be done by the "container" (to use Java parlance; or "environment") — if you just have the app packaged in a nice way, the container (server environment, hosting service, etc) can handle all the system-specific things to make the application actually work.

At which point I am of course reminded of my Silver Lining project, which defines something very much like this. Silver Lining isn’t just an application format, and things aren’t fully extracted along these lines, but it’s pretty close and it addresses a lot of important issues in the lifecycle of an application. To be clear: Silver Lining is an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time. It is a bunch of things, maybe too many things, so it is not unreasonable to pick out a smaller subset to focus on. Maybe an easy place to start (and good for Silver Lining itself) would be to separate at least the application format (and tools to manage applications in that state, e.g., installing new libraries) from the tools that make use of such applications (deploy, etc).

Some opinions I have on this format, exemplified in Silver Lining:

  • It’s not zipped or a single file, unlike WARs. Uploading zip files is not a great API. Geez. I know there’s this desire to "just drop in a file"; but there’s no getting around the fact that "dropping a file" becomes a deployment protocol and it’s an incredibly impoverished protocol. The format is also not subtly git-based (ala Heroku) because git push is not a good deployment protocol.
  • But of course there isn’t really any deployment protocol inferred by a format anyway, so maybe I’m getting ahead of myself ;) I’m saying a tool that deploys should take as an argument a directory, not a single file. (If the tool then zips it up and uploads it, fine!)
  • Configuration "comes from the outside". That is, an application requests services, and the container tells the application where those services are. For Silver Lining I’ve used environmental variables. I think this one point is really important — the container tells the application. As a counter-example, an application that comes with a Puppet deployment recipe is essentially telling the server how to arrange itself to suit the application. This will never be reliable or simple!
  • The application indicates what "services" it wants; for instance, it may want to have access to a MySQL database. The container then provides this to the application. In practice this means installing the actual packages, but also creating a database and setting up permissions appropriately. The alternative is never having any dependencies, meaning you have to use SQLite databases or ad hoc structures, etc. But in fact installing databases really isn’t that hard these days.
  • All persistence has to use a service of some kind. If you want to be able to write to files, you need to use a file service. This means the container is fully aware of everything the application is leaving behind. All the various paths an application should use are given in different environmental variables (many of which don’t need to be invented anew, e.g., $TMPDIR).
  • It uses vendor libraries exclusively for Python libraries. That means the application bundles all the libraries it requires. Nothing ever gets installed at deploy-time. This is in contrast to using a requirements.txt list of packages at deployment time. If you want to use those tools for development that’s fine, just not for deployment.
  • There is also a way to indicate other libraries you might require; e.g., you might need lxml, or even something that isn’t quite a library, like git (if you are making a github clone). You can’t do those as vendor libraries (they include non-portable binaries). Currently in Silver Lining the application description can contain a list of Ubuntu package names to install. Of course that would have to be abstracted some.
  • You can ask for scripts or a request to be invoked for an application after an installation or deployment. It’s lame to have to check is-this-app-installed on every request, which is the frequent alternative. Also, it gives the application the chance to signal that the installation failed.
  • It has a very simple (possibly/probably too simple) sense of configuration. You don’t have to use this if you make your app self-configuring (i.e., build in a web-accessible settings screen), but in practice it felt like some simple sense of configuration would be helpful.

Things that could be improved:

  • There are some places where you might be encouraged to use routines from the silversupport package. There are very few! But maybe an alternative could be provided for these cases.
  • A little convention-over-configuration is probably suitable for the bundled libraries; silver includes tools to manage things, but it gets a little twisty. When creating a new project I find myself creating several .pth files, special customizing modules, etc. Managing vendor libraries is also not obvious.
  • Services are IMHO quite important and useful, but also need to be carefully specified.
  • There’s a bunch of runtime expectations that aren’t part of the format, but in practice would be part of how the application is written. For instance, I make sure each app has its own temporary directory, and that it is cleared on update. If you keep session files in that location, and you expect the environment to clean up old sessions — well, either all environments should do that, or none should.
  • The process model is not entirely clear. I tried to simply define one process model (unthreaded, multiple processes), but I’m not sure that’s suitable — most notably, multiple processes have a significant memory impact compared to threads. An application should at least be able to indicate what process models it accepts and prefers.
  • Static files are all convention over configuration — you put static files under static/ and then they are available. So static/style.css would be at /style.css. I think this is generally good, but putting all static files under one URL path (e.g., /media/) can be good for other reasons as well. Maybe there should be conventions for both.
  • Cron jobs are important. Though maybe they could just be yet another kind of service? Many extra features could be new services.
  • Logging is also important; Silver Lining attempts to handle that somewhat, but it could be specified much better.
  • Silver Lining also supports PHP, which seemed to cause a bit of stress. But just ignore that. It’s really easy to ignore.

There is a description of the configuration file for apps. The environmental variables are also notably part of the application’s expectations. The file layout is explained (together with a bunch of Silver Lining-specific concepts) in Development Patterns. Besides all that there is admittedly some other stuff that is only really specified in code; but in Silver Lining’s defense, specified in code is better than unspecified ;) App Engine provides another example of an application format, and would be worth using as a point of discussion or contrast (I did that myself when writing Silver Lining).

Discussing WSGI stuff with Ben Bangert at PyCon he noted that he didn’t really feel like the WSGI pieces needed that much more work, or at least that’s not where the interesting work was — the interesting work is in the tooling. An application format could provide a great basis for building this tooling. And I honestly think that the tooling has been held back more by divergent patterns of development than by the difficulty of writing the tools themselves; and a good, general application format could fix that.

Packaging
Programming
Python
Web


Javascript on the server AND the client is not a big deal

All the cool kids love Node.js. I’ve used it a little, and it’s fine; I was able to do what I wanted to do, and it wasn’t particularly painful. It’s fun to use something new, and it’s relatively straight-forward to get started so it’s an emotionally satisfying experience.

There are several reasons you might want to use Node.js, and I’ll ignore many of them, but I want to talk about one in particular:

Javascript on the client and the server!

Is this such a great feature? I think not…

You only need to know one language!

Sure. Yay ignorance! But really, this is fine but unlikely to be relevant to any current potential audience for Node.js. If you are shooting for a very-easy-to-learn client-server programming system, Node.js isn’t it. Maybe Couch or something similar has that potential? But I digress.

It’s not easy to have expertise at multiple languages. But it’s not that hard. It’s considerably harder to have expertise at multiple platforms. Node.js gives you one language across client and server, but not one platform. Node.js programming doesn’t feel like the browser environment. They do adopt many conventions when it’s reasonable, but even then it’s not always the case — in particular because many browser APIs are the awkward product of C++ programmers exposing things to Javascript, and you don’t want to reproduce those same APIs if you don’t have to (and Node.js doesn’t have to!) — an example is the event pattern in Node, which is similar to a browser but less obtuse.

You get to share libraries!

First: the same set of libraries is probably not applicable. If you can do it on the client then you probably don’t have to do it on the server, and vice versa.

But sometimes the same libraries are useful. Can you really share them? Browser libraries are often hard to use elsewhere because they rely on browser APIs. These APIs are frequently impossible to implement in Javascript.

Actually they are possible to implement in Javascript using Proxies (or maybe some other new and not-yet-standard Javascript features). But not in Node.js, which uses V8, and V8 is a pretty conservative implementation of the Javascript language. (Update: it is noted that you can implement proxies — in this case a C++ extension to Node)

Besides these unimplementable APIs, it is also just a different environment. There is the trivial: the window object in the browser has a Node.js equivalent, but it’s not named window. Performance is different — Node has long-running processes, while a browser page may or may not stick around that long. Node can have blocking calls, which are useful even if you can’t use them at runtime (e.g., require()); but you can’t really have any of these at any time on the browser. And then of course all the system calls, none of which you can use in the browser.

All these may simply be surmountable challenges, through modularity, mocking, abstractions, and so on… but ultimately I think the motivation is lacking: the domain of changing a live-rendered DOM isn’t the same as producing bytes to put onto a socket.

You can work fluidly across client and server!

If anything I think this is dangerous rather than useful. The client and the server are different places, with different expectations. Any vagueness about that boundary is wrong.

It’s wrong from a security perspective, as the security assumptions are nearly opposite on the two platforms. The client trusts itself, and the server trusts itself, and both should hold the other in suspicion (though the client can be more trusting because the browser doesn’t trust the client code).

But it’s also the wrong way to treat HTTP. HTTP is pretty simple until you try to make it simpler. Efforts to make it simpler mostly make it more complicated. HTTP lets you send serialized data back and forth to a server, with a bunch of metadata and other do-dads. And that’s all neat, but you should always be thinking about sending information. And never sharing information. It’s not a fluid boundary, and code that touches HTTP needs to be explicit about it and not pretend it is equivalent to any other non-network operation.

Certainly you don’t need two implementation languages to keep your mind clear. But it doesn’t hurt.

You can do validation the same way on the client and server!

One of the things people frequently bring up is that you can validate data on the client and server using the same code. And of course, what web developer hasn’t been a little frustrated that they have to implement validation twice?

Validation on the client is primarily a user experience concern, where you focus on bringing attention to problems with a form, and helping the user resolve those problems. You may be able to avoid errors entirely with an input method that avoids the problem (e.g., if you have a slider for a numeric input, you don’t have to worry about the user inputting a non-numeric value).

Once the form is submitted, if you’ve done thorough client-side validation you can also avoid friendly server-side validation. Of course all your client-side validation could be avoided through a malicious client, but you don’t need to give a friendly error message in that case, you can simply bail out with a simple 400 Bad Request error.

At that point there’s not much in common between these two kinds of validation — the client is all user experience, and the server is all data integrity.

You can do server-side Javascript as a fallback for the client!

Writing for clients without Javascript is becoming increasingly less relevant, and if we aren’t there yet, then we’ll certainly get there soon. It’s only a matter of time, the writing is on the wall. Depending on the project you might have to put in workarounds, but we should keep those concerns out of architecture decisions. Maintaining crazy hacks is not worth it. There’s so many terrible hacks that have turned into frameworks, and frameworks that have justified themselves because of the problems they solved that no longer matter… Node.js deserves better than to be one of those.

In Conclusion Or Whatever

I’m not saying Node.js is bad. There are other arguments for it, and you don’t need to make any argument for it if you just feel like using it. It’s fun to do something new. And I’m as optimistic about Javascript as anyone. But this one argument, I do not think it is very good.

Javascript
Programming
Web


Doctest.js & Callbacks

Many years ago I wrote a fairly straight-forward port of Python’s doctest to Javascript. I thought it was cool, but I didn’t really talk about it that much. Especially because I knew it had one fatal flaw: it was very unfriendly towards programming with callbacks, and Javascript uses a lot of callbacks.

On a recent flight I decided to look at it again, and realized fixing that one flaw wasn’t actually a big deal. So now doctest.js really works. And I think it works well: doctest.js.

I have yet to really use doctest.js on more than a couple real cases, and as I do (or you do?) I expect to tweak it more to make it flow well. But having tried a couple of examples I am particularly liking how it can be used with callbacks.

Testing with callbacks is generally a tricky thing. You want to make assertions, but they happen entirely separately from the test runner’s own loop, and your callbacks may not run at all if there’s a failure.

I came upon some tests recently that used Jasmine, a BDD-style test framework. I’m not a big fan of BDD but I’m fairly new to serious Javascript development so I’m trying to withhold judgement. The flow of the tests is a bit peculiar until you realize that it’s for async reasons. I’ll try to show something that roughly approximates a real test of an XMLHttpRequest API call:


it("should give us no results", function() {
  runs(function () {
    var callback = createSpy('callback for results');
    $.ajax({
      url: '/search',
      data: {q: "query unlikely to match anything"},
      dataType: "json",
      success: callback
    });
  });
  waits(someTimeout);
  runs(function () {
    expect(callback).toHaveBeenCalled();
    expect(callback.mostRecentCall.args[0].length).toEqual(0);
  });
});
 

So, the basic pattern is that it() creates a group of tests, and each call to runs() is a set of items to call sequentially. Then between these runs blocks you can have signals to the runner to wait for some result, either a timeout (which is fragile), or you can set up specific conditions.

Another popular test runner is QUnit; it’s popular particularly because it’s what jQuery uses, and my own impression is that QUnit is just very simple and so least likely to piss you off.

QUnit has its own style for async:


test("should give us no results", function () {
  stop();
  expect(1);
  $.ajax({
    url: '/search',
    data: {q: "query unlikely to match anything"},
    dataType: "json",
    success: function (result) {
      ok(result.length == 0, 'No results');
      start();
    }
  });
});
 

stop() confused me for a bit until I realized they were really referring to stopping the test runner; of course the function continues on regardless. What will happen is that the function will return, but nothing will have really been tested — the success callback will not have been run, and cannot run until all Javascript execution stops and control is given back to the browser. So the test runner will use setTimeout to let time pass before the test continues. In this case it will continue once start() is called. And expect() also makes it fail if it didn’t get at least one assertion during that interval — it would otherwise be easy to simply miss an assertion (though in this example it would be okay, because if the success callback isn’t invoked then start() will never be called, and the runner will time out and signal that as a failure).

So… now for doctest.js. Note that doctest.js isn’t "plain" Javascript; it looks like what an interactive Javascript session might look like (I’ve used shell-style prompts instead of typical console prompts, because the consoles didn’t exist when I first wrote this, and because >>>/... kind of annoy me anyway).


$ success = Spy('success', {writes: true});
$ $.ajax({
>   url: '/search',
>   data: {q: "query unlikely to match anything"},
>   dataType: "json",
>   success: success.func
> });
$ success.wait();
success([])
 

With doctest.js you still get a fairly linear feel — it’s similar to how Jasmine works, except every $ prompt is potentially a place where the loop can be released so something async can happen. Each prompt is equivalent to run() (though unless you call wait, directly or indirectly, everything will run in sequence).

There’s also an implicit assertion for each stanza, which is that anything that is written must be matched ({writes: true} makes the spy/mock object write out any invocations). This makes it much harder to miss something in your tests.

Update: just for the record, doctest has changed some, and while that example still works, this would be the "right" way to do it now:


$.ajax({
  url: '/search',
  data: {q: "query unlikely to match anything"},
  dataType: "json",
  success: Spy("search.success", {wait: true, ignoreThis: true})
});
// => search.success([])
 

There is a new format that I now prefer with plain Javascript and "expected output" in comments. Spy("search.success", {wait: true, ignoreThis: true}) causes the test to wait on the Spy immediately (though the same pattern as before is also possible and sometimes preferable), and in all likelihood jQuery will set this to something we don’t care about, so ignoreThis: true keeps it from being printed. (Or maybe you are interested in it, in which case you’d leave that out)

Anyway, back to the original conclusion (update over)…

I’ve never actually found Python’s doctest to be a particularly good way to write docs, and I don’t expect any different from doctest.js, but I find it a very nice way to write and run tests… and while Python’s doctest is essentially abandoned and lacks many features to make it a more humane testing environment, maybe doctest.js can do better.

Javascript
Programming
Testing
Web


Net Neutrality: forcing companies to pay attention to their networks

When it comes to software licensing, I get annoyed at GPL critics. Mostly they argue that a permissive license is more hassle-free. But all licensing hassles come from proprietary licenses. All of them. Open source licenses are simple, well-understood, and if you are doing open source stuff you don’t need to negotiate, you don’t need lawyers. The deal is laid out and it’s more like technical machinery than a business deal. Open source has just a few deals, and we have names for them (BSD, GPL, etc); the alternative is the ever-expanding number of deals that proprietary licenses represent, seldom clear, unnamed but still poised to mess things up.

But this is an introduction for a discussion of net neutrality! Net neutrality is one deal: simple, obvious, straight-forward. The opposite isn’t one deal, like proprietary licensing it is an ever-expanding complexity of deals, different pricing structures, opaque, and with salespeople using information-scarcity to manipulate sales at every opportunity.

There’s an absurd argument against net neutrality, that it would add regulatory complexity. This is absurd because neutrality is the default, regulation only comes into effect when someone messes with something, when some connectivity provider starts adding complexity to the system.

There are net neutrality advocates that ask: what if Fox gets preference over MSNBC? A poor argument: this kind of politically-motivated network bias seems implausible to me. The plausible result of not having network neutrality is all kinds of deals. Weird media deals. Deals with companies that have an influx of investment and want to bootstrap their audience. Providers that build their own content networks. How much will all this matter? Probably not much. Whatever the providers do will be just terrible; they seem to be inevitably bad at both idea and execution. For everyone else it will just be a competition tax, a way to turn money into a competitive advantage, though with hints of the prisoner’s dilemma.

Mostly network bias just adds complexity to the system. It’s a whole new opportunity to make deals. Maybe different groups will come out ahead, but maybe not… my best guess is that implementing and justifying a biased network will be more trouble than it’s worth, and technology will make the issue moot before too long.

The people who will really get out ahead are the deal-mongers, the executives and lawyers and salespeople. These kinds of deals are opaque, complex, and it’s easier to manipulate analysis and perception than to actually provide a valuable agreement. But deal-making professionals come out ahead with every contract and every negotiation.

The companies providing infrastructure (Verizon, Comcast, AT&T, etc) can take two approaches to maximizing profit. One approach is to pursue engineering and operational excellence, to provide a great network, and compete strongly. Or they can get "creative". Network neutrality makes creativity hard — it doesn’t block any creativity in providing their core service, but the leadership of these companies provide only deal-making leadership; to them the core service is an afterthought. I wonder if this is the worst effect of consolidation — every corporate consolidation requires all kinds of negotiation and further exalts the leadership of the deal-makers over the people that are good at managing operations.

For all the complaints (and we complain about these companies a lot), these companies actually do provide good service. Reliability and speed keep improving. It could be better, but there are also lots of people doing a good job keeping a complex system working well. Those are the people I want to see empowered, they are the ones that should be the stars in their companies. I think network neutrality will help do that, it will help focus infrastructure providers on providing infrastructure. And make an exception for wireless? They are the most in need of focus.

Non-technical
Politics


Surveillance, Security, Privacy, Politics

I hang around people who talk about security and privacy and activists quite a bit. When talking security beyond the typical attackers — people committing identity theft, simple vandals, spammers, etc. — there’s the topic of government surveillance and legal attacks, and privacy as a way to defend political activists against the powers-that-be. I want to talk about this security question in particular.

(Nothing I say here relates to China or Iran or other places with overtly oppressive political systems and without basic legal rights. I don’t think it’s worth trying to generalize that far.)

I’m not sure we are getting this stuff right. I don’t think the political attacks that are imagined are serious risks, and the attacks that are taking place are far less sophisticated than we imagine.

Background

I’m taking these lessons primarily from the experiences of my sister, who along with 7 others is currently facing felony conspiracy charges in Minnesota (felony conspiracy to riot with a dangerous weapon and to commit property damage). These charges are specifically for organizing protests in the lead up to the 2008 RNC convention in St. Paul. It’s only one data point, but in these matters there’s only a handful of cases that inform the discussion.

The city of St. Paul and other local governments received over $50 million for security for the RNC, and some of that money was quickly put into hiring informants to infiltrate organizations, anarchist organizations in particular. My sister, among others, was part of an organization known as the RNC Welcoming Committee. In total three informants were highly involved in the organization, each of them attending literally hundreds of hours of meetings. The Committee primarily worked on things like promoting the protests against the RNC, acquiring meeting space and internet access for people, finding housing and food for people visiting for the protests, and distributing logistical information like where protests would occur.

"Anarchism" means "without rulers": in line with their anarchist principles they didn’t try to prescribe how people would protest, they felt people should make their own choices about how to protest. The choices people made were widespread, ranging from staying in a "free speech zone" to a few permitted marches, some unpermitted marches, some civil disobedience, some blockading, and in a very small number of cases some people committed property damage. The Welcoming Committee did not advocate any particular kind of protest, they would not be their brother’s keeper, nor did they want to disparage any kind of protest as too timid. Each person should act on their own conscience.

Immediately before the RNC started the 8 were arrested and held for the duration of the convention before being charged and released on bail. Their houses and cars were searched. Nothing interesting was found, though at the time the Sheriff misrepresented things like bike inner tubes as possible slingshot material, or that having paint thinner in the basement, rags in the laundry, and empty bottles in the pantry constituted Molotov cocktail ingredients.

The Evidence

The case has progressed very slowly, but with recent hearings more of the prosecution’s case has been coming out. It’s been over a year and a half and only now are we getting any indication of what the real claims are against the defendants, though the prosecution continues to avoid presenting any real case or plausible complaint.

From the hearings we’re also learning something about the form of the investigation. The FBI was closely involved with the case and recruited the most active informant, and the primary investigator was previously with the Secret Service (which somewhat oddly has computer-related duties), and at the time there was a great deal of national attention on the convention. So presumably they had the resources to investigate seriously if they wished to do so.

From the perspective of online security the case is very boring. The defendants have been given all the evidence collected during the investigation (including benign or even helpful evidence). It’s a huge amount of evidence, and hard for them to understand or sort through, but some kinds of investigation aren’t there. No email accounts were subpoenaed. Their computers were all confiscated, and will no doubt be kept until after the trial, but there’s nothing high-tech about that. Some of them had whole-disk encryption, and there is no indication it was broken nor were they even asked to provide passwords. There’s also no evidence of sniffing internet connections, tapping phones, breaking into email… nothing fancy was done.

From what we can tell the evidence against them will be primarily from informants’ testimony about open meetings, widely distributed literature, a video posted on YouTube, a password-protected but essentially open wiki (the wiki provider was not subpoenaed, despite things like edit history being potentially interesting).

If they had been any more security-conscious it would have worked against them — it would have been out of line with their ideals and would have made them less effective and transparent in their organizing efforts. The biggest danger now is that they’ll be demonized, that they’ll be judged based on caricatures of their actual beliefs, privacy only makes this worse.

Credit Where Credit Is Due

Perhaps one reason the surveillance was low-tech and subpoenaed evidence is not playing a large part in the case is that it’s just too hard. They used Riseup for many services, which is a set of online services for activists, who take privacy very seriously, log as little as possible, and try to host everything outside of the country so regardless of an activist’s locality it will be a bureaucratic challenge to get access to the servers.

Outside of the core group most people acted anonymously, so the prosecution would not be able to follow up on most of what they found anyway. Even if they got all the logs and email from everything the Welcoming Committee touched, I’m not sure they could make use of it. If they could somehow relate all that anonymous information, they’d still have to explain those techniques and convince a jury. Data mining and other data-driven techniques could be useful if they were trying to catch people who had actually done something wrong. You can use surveillance to find the smoking gun, and once you’ve found it you don’t have to justify the techniques you used in the process. But only if there’s a smoking gun. It’s a peculiar situation where the prosecution doesn’t appear to actually believe they did anything demonstrably wrong; I fear they plan a case where they redefine "wrong".

Privacy

Besides the security issue there’s the privacy issue, and privacy is big on the internet these last few months. One of the oft-claimed benefits of privacy is to allow political dissent. And maybe that makes sense in China, but I don’t know how it relates to the things in the U.S. or Europe.

Political beliefs held in private don’t much matter. Complaining about politics in private situations is fine, because it just doesn’t matter. So sure, you are safe from political persecution if your privacy is maintained… but it’s because you are impotent, not because privacy is some part of a political struggle.

This reminds me of a playground sense of privacy. On the playground you might say you like They Might Be Giants and the playground bully says "that’s so gay," and you think "I shouldn’t have said anything." But it doesn’t really matter how much you reveal in that situation, it doesn’t matter what you say you like — the bully isn’t making a pointed critique of your preferences, they are just trying to hurt you. The only way privacy will help you is if you are so quiet that the bully doesn’t notice you at all and picks on someone else instead. That’s a pathetic stance.

Ramsey County (where the RNC 8 are being charged) is a bully. They decided before the Welcoming Committee even existed that people were going to be arrested and charges were going to be made. The Welcoming Committee stuck their necks out further than anyone else. The problem isn’t that they made themselves vulnerable; the problem is that Sheriff Fletcher is a bully and County Attorney Gaertner is some kind of automaton who doesn’t give a shit about justice.

And Lastly A Personal Plea

So… while there are general lessons, this case also specifically really sucks for my sister Monica, her significant other Eryn (another member of the Welcoming Committee), and the other six, all of whom I know and all of whom are really nice people who don’t deserve any of this shit. They have to spend their evenings reading through evidence or listening to the tapes of their meetings (which were boring enough to listen to the first time around). There’s a certain stigma to having pending felony charges; I know my sister, at least, has lost a job because of it. And they each have to have their own lawyer, and even though the lawyers aren’t charging them what would be the full rate, it’s still a lot of money (like a quarter of a million dollars). Depressingly, in some sense this is all the government has to do; the trial is punishment enough to deter people from being activist organizers.

So I wish a donation was equivalent to Sticking It To The Man, but really it’s just adding some balance because The Man Is Already Sticking It To Them On Your Behalf.

Still, your support would be really helpful.

If it gives you any satisfaction, County Attorney Susan Gaertner’s run for the Democratic gubernatorial nomination never went anywhere, I suspect in large part because she wasn’t brave enough to show her face at public events in the Twin Cities, where she was consistently protested over this case. I doubt this has influenced the prosecution (at least in any positive way), but it’s satisfying.

Non-technical
Politics
Security


The Browser Desktop, developer tools

I find myself working in a Windows environment due to some temporary problems with my Linux installation. In terms of user experience Windows is not terrible. But more notably, things mostly just feel the same. My computing experience is not very dependent on the operating system… almost. Most of what I do is in a web browser — except programming itself. Probably a lot of you have the same experience: web browser, text editor, and terminal are pretty much all I need. I occasionally play with other tools, but none of them stick. Of course underlying the terminal and text editor UI is a whole host of important software — interpreters, version control tools, checkouts of all my projects, etc. So really there are two things keeping us from a browser-only world: a few bits of UI, and a whole bunch of tools. Can we bridge this? I’m thinking (more speculatively than as an actual plan): could I stay on Windows without ever having to "use" Windows?

Browsers are clearly capable of providing a good UI for a terminal or editor; not a trivial endeavor, but not impossible. We need a way of handling the tools. The obvious answer in that case is a virtual machine. The virtual machine would certainly be running Linux, as there’s clear consensus that if you remove the UI and hardware considerations and just consider tools then Linux is by far the best choice — who uses Mac servers? And Windows is barely worth mentioning. I worked in a Linux VM for a while but found it really unsatisfying — but that was using the Linux UI through a VMware interface.

So instead imagine: you start up a headless VM (remembering the tools are not about UI, so there’s no reason to have a graphical user interface on the VM), you point your browser at this VM, and you use a browser-based developer environment that mediates all the tools (the lightest kind of mediation is just simulating a terminal and using existing console-based interfaces). Look at your existing setup and just imagine a browser window in place of each not-browser-window app you are using.
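To make that concrete, here is a minimal sketch of that lightest kind of mediation. It is purely hypothetical: it assumes Node.js and the third-party ws package running on the headless VM, and the port and choice of shell are arbitrary. The VM exposes a shell over a web socket, and the browser just renders whatever comes back:

var WebSocketServer = require('ws').Server;
var spawn = require('child_process').spawn;

// Tiny service on the headless VM: each browser connection gets its own
// shell; shell output streams back over the socket, input goes to stdin.
var wss = new WebSocketServer({ port: 8022 });
wss.on('connection', function (ws) {
  var shell = spawn('/bin/bash', ['-i']);
  shell.stdout.on('data', function (chunk) { ws.send(chunk.toString()); });
  shell.stderr.on('data', function (chunk) { ws.send(chunk.toString()); });
  ws.on('message', function (line) { shell.stdin.write(line); });
  ws.on('close', function () { shell.kill(); });
});

A real version would want a proper pseudo-terminal and some authentication in front of it, but the shape is the point: the VM holds the tools, the browser holds the UI.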

I’m intrigued then by the idea of adding more to these interfaces, incrementally. Like HTML in the console, or applications lightly wrapping individual tools. IDEs never stick for me, maybe in part because I can’t commit, and also there are collaboration issues with these tools (I’m never on a team where we would be able to agree on a single environment). But incremental decentralized improvements seem genuinely workable — improvement more in the style of the web, the browser providing the central metaphor.

I call this a Browser Desktop because it’s a fairly incremental change at this point and other terms (Web OS, Cloud OS) are always presented with unnecessary hyperbole. What "operating system" you are using in this imagined system is a somewhat uninteresting semantic question; the OS hasn’t disappeared, it’s just boring. "The Cloud" is fine, but too easy to overthink, and there are many technical reasons to use a hybrid of local and remote pieces. "Internet Operating System" is more a framing concept than a thing-that-can-be-built. Chromium OS is essentially the same idea… I’m not really sure how they categorize themselves.

What would be painful right now? Good Javascript terminals exist. Bespin is hard at work on an editor worthy of being used by programmers. The browser needs to be an extremely solid platform; Google Chrome has done a lot in this direction, and Firefox is moving in the same direction with the Electrolysis project. It’s okay to punt for now on all the "consumer" issues like music and media handling… and anyway, other people are hard at work on those things. Web sockets will help with some kinds of services that would ideally connect directly to a port; it’s not the same as a raw socket, but I feel like there’s potential for small intermediaries (e.g., imagine a Javascript app that connects to a locally-hosted server-side app that proxies to ssh). Also add-ons can be used when necessary (e.g., ChatZilla <https://addons.mozilla.org/en-US/firefox/addon/16>).
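For the browser side of such an intermediary, the counterpart is small too. Something like this hypothetical page (the hostname and the bare <pre> element are just illustrative) would be enough to talk to a shell service like the one sketched above:

// Connect to the VM's shell service and dump its output into the page.
var term = document.createElement('pre');
document.body.appendChild(term);

var sock = new WebSocket('ws://dev-vm.local:8022/');  // hostname is made up
sock.onmessage = function (event) { term.textContent += event.data; };

// A real terminal would capture keystrokes; this helper stands in for that.
function run(command) { sock.send(command + '\n'); }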

I’d like much better management of all these "apps" aka pages aka windows or tabs — things like split screens and workspaces. Generally I think using such a system heavily will create all sorts of interesting UI tensions. Which might be annoying for the user, but if it’s a constructive annoyance…

On the whole… this seems doable. It’s navel gazing in a sense — programmers thinking about programming — but one good thing about navel gazing is that programmers have traditionally been quite good at navel gazing, and while some results aren’t generally applicable (e.g., VM management) the exercise will certainly create many generally applicable side products. It would encourage interesting itch-scratching. There’s lots of other "web OS" efforts out there, but I’ve never really understood them… they copy desktop metaphors, or have weird filesystem metaphors, or create an unnecessarily cohesive experience. The web is not cohesive, and I’m pretty okay with that; I don’t expect my experiences in this context to be any more cohesive than my tasks are cohesive. In fact it’s exactly the lack of cohesiveness that interests me in this exercise — the browser mostly gives me the level of cohesiveness I want, and I’m open to experimentation on the rest. And maybe the biggest interest for me is that I am entirely convinced that traditional GUI applications are a dead end; they rise and fall (mobile apps being a current rise) but I can’t seriously imagine long term (10 year) viability for any current or upcoming GUI system. I’m certain the browser is going to be along for the long haul. Doing this would let us Live The Future ;)

Mozilla
Programming
Web
