I’ve been trying, not too successfully I’m afraid, to get more people to use doctest.js. There’s probably a few reasons people don’t. They are all wrong! Doctest.js is the best!
One issue in particular is that people (especially people in my Python-biased circles) are perhaps thrown off by Python’s doctest. I think Python’s doctest is pretty nice, I enjoy testing with it, but there’s no question that it has a lot of problems. I’ve thought about trying to fix doctest, and even made a repository for it, but I only really got as far as creating a list of issues I’d like to fix. But, like so many before me, I never actually made those fixes. Doctest has, in its life, only really had a single period of improvement (in the time leading up to Python 2.4). That’s not a recipe for success.
Of course doctest.js takes inspiration from Python’s doctest, but I wrote it as a real test environment, not for a minimal use case. In the process I fixed a bunch of issues with doctest, and in places Javascript itself made better usability possible.
Some issues:
Doctest.js output is predictable
The classic pitfall of Python’s doctest is printing a dictionary:
>>> print {"one": 1, "two": 2}
{'two': 2, 'one': 1}
The print order of a dictionary is arbitrary, based on a hash algorithm that can change, or that can mix things up as items are added or removed. And to make it worse, the output is usually stable, so you can write tests that turn out to be unexpectedly fragile. But there’s no reason why dict.__repr__ must use an arbitrary order. Personally I take it as a bit of unfortunate laziness.
If doctest had used pprint for all of its printing it would have helped some. But not enough, because this kind of code is fairly common:
def __repr__(self):
    return '<ThisClass attr=%r>' % self.attr
and that %r invokes a repr() that cannot be overridden.
In doctest.js I always try to make output predictable. One reason this is fairly easy is that there’s nothing like repr() in Javascript, so doctest.js has its own implementation. It’s like I started with pprint and no other notion existed.
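For example, this sort of output is something I expect to stay stable (a sketch from memory: I believe the built-in repr sorts object keys, but treat the exact formatting as an assumption):
print({two: 2, one: 1});
// => {one: 1, two: 2}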
Good matching
In addition to unpredictable output, there’s also just hard-to-match output. Output might contain blank lines, for instance, and Python’s doctest requires a very ugly <BLANKLINE> token to handle that. Whitespace might not be normalized. Maybe there’s boring output. Maybe there’s just a volatile item like a timestamp.
Doctest.js includes, by default, ellipsis: ... matches any length of text. But it also includes another wildcard, ?, which matches just one number or word. This avoids the cases where ... swallows up too much output when you only wanted to skip a single word.
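For instance (a sketch; the values are made up just to show the two wildcards):
var elapsed = 123;  // pretend this came from a timer, so it varies run to run
print("request took " + elapsed + " ms");
// => request took ? ms
var users = ["alice", "bob", "carol", "zoe"];
print(users.join(", "));
// => alice, ..., zoe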
Also doctest.js doesn’t use ... for other purposes. In Python’s doctest ... is used for continuation lines, meaning you can’t just ignore output, like:
>>> print who_knows_what_this_returns()
...
Or even worse, you can’t ignore the beginning of an item:
>>> print some_request
...
X-Some-Header: foo
...
The way I prefer to use doctest.js, there isn’t any continuation-line symbol at all (but if there is one, it’s >).
Also doctest.js normalizes whitespace, normalizes " and ', and just generally tries to be reasonable.
Doctest.js tests are plain Javascript
Not many editors know how to syntax highlight and check doctests, with their >>> in front of each line and so forth. And the whole thing is tweaky: you need to use a continuation marker (...) on some lines, and start statements with >>>. It’s an awkward way to compose.
Doctest.js started out with the same notion, though with different symbols ($ and >). But recently with the rise of a number of excellent parsers (I used Esprima) I’ve moved my own tests to another pattern:
print(something())
// => expected output
This is already a fairly common way to write examples. Just as you may have read pre-Python pseudocode and thought that looks like Python!, doctest.js looks like example pseudocode.
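Here’s a slightly fuller sketch in that style; the point is that the whole thing is ordinary Javascript that any editor can highlight and any parser can read:
function add(a, b) {
  return a + b;
}
print(add(2, 2));
// => 4
print(typeof add);
// => function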
Doctest.js tests are self-describing
Python’s doctest has some options, some important options that affect the semantics of the test, that you can only turn on in the runner. The most important option is ELLIPSIS. Either your test was written to use ELLIPSIS or it wasn’t – that a test can’t self-describe its requirements means that test running is fragile.
I made the hackiest package ever to get around this in Python, but it’s hacky and lame.
Exception handling isn’t special
Python’s doctest treats exceptions differently from other output. So if you print something before the exception, it is thrown away, never to be seen. And you can’t use some of the same matching techniques.
Doctest.js just prints out exceptions, and they are matched like any other output.
This particular case is one of several places where it feels like Python’s doctest is just being obstinate. Doing it the right way isn’t harder. Python’s doctest makes debugging exception cases really hard.
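As a sketch of how this plays out in doctest.js (I’m assuming the thrown error shows up as an Error: line, and that expected output can continue across the following comment lines):
print("got this far");
throw new Error("deliberate failure");
// => got this far
// Error: deliberate failure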
Doctest.js has a concept of "abort"
I’m actually pretty okay with Python doctest’s notion that you just run all the tests, even when one fails. Getting too many failures is a bit of a nuisance, but it’s not that bad. But there’s no way to just give up, and there needs to be. If you are relying on something to be importable, or some service to be available, there’s no point in going on with the tests.
Doctest.js lets you call Abort() and further tests are cancelled.
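A sketch of the kind of guard I have in mind (the feature check is only illustrative):
if (typeof JSON === "undefined") {
  // Nothing below can pass without JSON support, so give up now.
  Abort();
}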
Distinguishing between debugging output and deliberate output
Maybe it’s my own fault for being a programming troglodyte, but I use a lot of print statements for debugging. This becomes a real problem with Python’s doctest, since it captures all that printing and turns it into test failures.
Javascript has something specifically for printing debugging output: console.log(). Doctest.js doesn’t mess with that; instead it adds a new function, print(). Only stuff that is printed (not logged) is treated as expected output. It’s like console.log() goes to stderr and print() goes to stdout.
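So a test might look like this sketch, where the console.log() call is debugging noise that stays out of the comparison and only the print() call is matched:
var total = [1, 2, 3].reduce(function (a, b) { return a + b; }, 0);
console.log("intermediate total:", total);  // debugging only, never matched
print(total);
// => 6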
Doctest.js also forces the developer to print everything they care about. For better or worse Javascript has many more expression statements than Python (including assignments), so the bare fact that something is an expression isn’t a good clue that you care about its result. I’m not sure this is better, but it’s part of the difference.
Doctest.js also groups your printed statements according to the example you are in (an example being a block of code and an expected output). This is much more helpful than watching a giant stream of output go to the console (the browser console or terminal).
Doctest.js handles async code
This admittedly isn’t that big a deal for Python, but for Javascript it is a real problem. Not a problem for doctest.js in particular, but a problem for any Javascript test framework. You want to test return values, but lots of functions don’t “return”; instead they call some callback or create some kind of promise object, and you have to test for side effects.
I think doctest.js has a really great answer for this. That’s not to say Python’s doctest is so much worse here, but in the context of Javascript doctest.js offers something really useful and unique. If callback-driven async code had ever been very popular in Python then this sort of feature would be nice there too.
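As a very rough sketch of what an async example can look like (this assumes, from my reading of the doctest.js tutorial, a wait() helper that holds the example open until its callback returns true; treat the name and signature as an assumption):
var result = null;
setTimeout(function () {
  result = "finished";
  print(result);
}, 10);
wait(function () { return result !== null; });  // assumed helper: polls until true
// => finished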
The browser is a great environment
A lot of where doctest.js is much better than Python’s doctest is simply that it has a much more powerful environment for displaying results. It can highlight failed or passing tests. When there’s a wildcard in expected output, it can show the actual output without adding any particular extra distraction. It can group console messages with the tests they go with. It can show both a simple failure message, and a detailed line-by-line comparison. All these details make it easy to identify what went wrong and fix it. The browser gives a rich and navigable interface.
I’d like to get doctest.js working well on Node.js (right now it works, but is not appealing), but I just can’t bring myself to give up the browser. I have to figure out a good hybrid.
Python’s doctest lacks a champion
This is ultimately the reason Python’s doctest has all these problems: no one cares about it, no one feels responsible for it, and no one feels empowered to make improvements to it. And to make things worse there is a cadre of people who will respond to suggestions with their own criticism that doctest should never be used beyond its original niche, that its constraints are features.
Doctest is still great
I’m ragging on Python’s doctest only because I love it. I wish it were better, and I made doctest.js the way I wish Python’s doctest had been made. Doctest, and more generally example/expectation oriented code, is a great way to explain things, to make tests readable, to make test-driven development feasible, to create an environment that errs on the side of over-testing instead of under-testing, and to make failures and resolutions symmetric. It’s still vastly superior to BDD, avoiding all of BDD’s aping of readability while still embracing the sense of test-as-narrative.
But, more to the point: use doctest.js, read the tutorial, or try it in the browser. I swear, it’s really nice to use.
I think I ought to bring Manuel to your attention, since it fixes at least some of your issues with Python’s doctest (like supporting Sphinx markup).
A few thoughts…
I’m not sure I understand the “not self-describing” complaint. What part of “# doctest: +ELLIPSIS” (etc) doesn’t allow a test to specify the required features? (although there’s an unfortunate bug in the current docs generation where these are being stripped from the examples in the section that is supposed to explain them: http://bugs.python.org/issue12947)
The dict/set ordering issue will also become significantly harder to get wrong inadvertently now that most hashes will be randomised (and collections.OrderedDict provides a nicer way to display sorted dictionaries rather than relying on lists of 2-tuples).
Doesn’t writing your debugging messages to sys.stderr have the same effect as using console.log()?
doctest has “# doctest: +NORMALIZE_WHITESPACE” to ignore whitespace details and “REPORT_ONLY_FIRST_FAILURE” to trim subsequent failures from the output.
I do like the idea of an alternate “# => output” notation for embedded doctest results in a copy-pasteable format. A “???” wild card (that can also be used for blank line matching) would also be interesting.
However, if the folks that prefer doctest to unittest for their testing needs, and find the current state of doctest to be inadequate, aren’t prepared to contribute to it, how is it supposed to be improved? unittest languished for years too, until Michael Foord took it over. Now it’s to the point where all my 2.6 tests depend on unittest2, because the 2.7+ unittest is so much better than the 2.6 provided one. A doctest2 could easily serve the same purpose (and yes, in the past, I’ve been guilty of not wanting to let doctest out of the niche I considered it suited to. I’ve learned better).
Simply that you have to do it on each test, and it’s both tedious and distracting. I use ... on probably half my tests. print func() # doctest: +ELLIPSIS literally doubles the length of the line, and is not unrepresentative (and personally I find it hard to even get that comment quite right).
Harder to get wrong, but no easier to get right. I suppose collections.OrderedDict(sorted(my_dict.iteritems())) would give a consistent repr. Not exactly appealing! And if anyone uses %r you are left with no resolution. If you allowed global replacement of repr() then that would make tests easier to write. sys.displayhook() is half of the solution, but isn’t nearly universal enough.
No. I seem to remember it actually being swallowed in some cases, and I’ve had to write to sys.__stderr__. Also the print statements I used elsewhere don’t normally print to stderr, so I’d have to adjust my “normal” debugging to be “doctest” debugging. Lastly they aren’t grouped in any sensible way; normally stdout and stderr are mixed, and mixed in a fairly useful way – with doctest they are disassociated completely.
Somehow I’d missed NORMALIZE_WHITESPACE. REPORT_ONLY_FIRST_FAILURE is useful for some cases, but not when you want to quickly fail due to failed prerequisites. The browser’s more navigable interface replaces REPORT_ONLY_FIRST_FAILURE, while Abort() handles prerequisite failures.
It would be great if doctest was improved. Some people have considered it, who knows, maybe someone will take this as inspiration to actually do it. There are a few places where Javascript has some advantages, often more through convention than the language itself; but there are more than enough cases where doctest could be fixed just for Python itself. And with just a little help from Python core it could be way better (especially the ability to replace repr()).
But in order for someone to be inspired to improve doctest they also have to see that it can be more than what it is. Maybe this will outline some ways it could be more. But I’m no longer the person to do that. (At this point I’m even doing functional testing of my Python code using doctest.js.)
In my opinion, doctest is fit for only one, rather narrow use case: checking that code examples sprinkled into narrative documentation don’t contain errors or typos. Or, to put it the other way around, I think doctest (the paradigm, regardless how good its mechanical execution might be) is a rather bad tool for writing either tests or documentation.
Reasons against writing tests with doctest:
Too big units / no isolation. The format tilts you towards writing stories, i.e. later things building upon stuff that has happened before. In tests, I want to specify lots of small, disjunct pieces of functionality, and the doctest format makes that difficult to impossible. (Even if I’d create a whole bunch of separate test files — which I’d consider tedious to manage — I’d still have no doctest-y way of sharing setup between them.)
Comparing representations is not as useful as comparing intentions. One, many domain objects might not have a suitable repr, then I either have to provide one specifically for the tests, or am reduced to “foo == bar // => true”, which somewhat defeats the purpose. Two, you’ll have to repeat the literals, which is both unnecessarily repetitive and loses the intention: for example, I have a timestamp “before” something occurred, and one “after”. Either I have to spell out the comparison as “something.timestamp // => datetime(2012, 10, 3)”, which doesn’t tell me at a glance that this is the “before” timestamp, or I’m back to “something.timestamp == before // => true”.
Reasons against writing (most) documentation with doctest: (These are weaker than those against testing, since they boil down to, writing documentation is all about concepts and natural language, no tool in the world will help you with that.)
While doctest is good for examples, it doesn’t help you with the conceptual level, which is the most important level of documentation. You might even be tempted to not think about the conceptual level at all.
The things you want to show are in danger of getting lost in the noise of setting up the environment (or you have to extract the setup, which makes it invisible and thus unclear, which also is not ideal for documentation).
Wolfgang, you might be interested in Manuel’s isolation plugin and its ability to let you easily write your own comparisons for test results.
Regarding your documentation point 1, I agree in spirit. Documentation should be about communicating first and foremost. My hope is that people can use Manuel to implement their own tests of domain-specific concepts.
For example, if I were documenting a large network’s layout with the Sphinx Graphviz plugin (http://sphinx.pocoo.org/ext/graphviz.html) I would want tests that could read those layouts and verify that they matched reality.
Regarding your documentation point 2, Manuel’s footnote functionality (http://packages.python.org/manuel/#footnotes) and the ability to hide parts of a document in reST comments (also available in stock doctests) help quite a bit.
As Marius mentioned, Manuel is my attempt at taking doctest to the next level. It does so by making doctest one of a suite of document-oriented testing devices that you can use simultaneously.
The intent is that since doctest is essentially frozen, Manuel lets you write doctest extensions or write non-doctest document test mechanisms. For example the footnote extension lets you move incidental code (like setup code) out of the flow of the main document and it works with doctest tests or any other Manuel plugin.
The Manuel docs are tested using Manuel and provide a good example of how it can be used: (click on the “Show Source” link to see the markup).
Apparently I suck at markdown. Here is the Manuel docs link: http://packages.python.org/manuel/
Ian, you might be interested in doctest’s normalization functionality. It lets you pre-process the test output so as to replace extraneous bits (like the addresses in reprs) with placeholders. zope.testing includes a regular expression normalizer that is really nice (and should probably be packaged separately): http://svn.zope.org/checkout/zope.testing/trunk/src/zope/testing/renormalizing.txt
In the past I’ve done a lot of experiments with OutputChecker implementations, which seems to be how renormalization also works. It has potential, but it never seemed to stick for me. The inability for a doctest to install its own checker was a particular problem – it means the test isn’t very portable, and wiring up the checkers and whatnot is nontrivial, especially depending on your runner environment. It should really work like:
Same thing for doctest options. Make it so in Manuel! ;)
Like everyone else I have tried monkeypatching the OutputChecker on exactly this issue. Oddly doctest is just inconsistent in how it passes things like OutputCheckers around, and could be cleaned up. I just punted on actually using my monkeypatches in real life.
https://github.com/lifeisstillgood/mikado.oss.doctest_additions
If Manuel is open to it I would be interested in porting this
Paul, if I am understanding the code correctly, you should be able to write a Manuel plugin that would do this. Take a look at the plugins that come with Manuel for inspiration. If you have any questions, feel free to contact me. My email address is [email protected].
Your post inspired me. I love doctest, but I think it could be considerably more awesome. I went ahead and forked your doctest2 repo (https://github.com/wapcaplet/doctest2) to begin working on it. I’m willing to manage this effort if anyone else would like to get involved.
It sounds to me like contributing to Manuel would be the best option.
Manuel does look nice, based on a brief perusal of the docs–I’ll have to give it a try. But I must say, one of the best things about doctest is that it’s in the standard library included with Python. Is there reason to believe that patches to the built-in doctest would not be accepted? (referring to Benji’s comment “since doctest is essentially frozen…”) I’ve never contributed to core Python development before, although I have been looking for an excuse to. If doctest needs a champion, then I would like to be that champion, or even just one of the champion’s faithful sidekicks. :-)
I’m sure improvements to doctest can be made and accepted into core, though some of the more ambitious things may meet resistance, certainly anything that would break current tests. I tend to just fork stuff, but that’s not always the right thing to do – you’ll certainly help a broader audience by working in the standard library.
Can I suggest a two-pronged approach?
I would really like to solve the OutputChecker issue, and have some monkeypatching code as a hacky solution already, so that might make a good first stage.
I would suggest we try and provide a doctest2 that is a managed patching of doctest, with no breaking of current tests (!) – that is, “pip install doctest2” will effectively monkeypatch the doctest code with the changed code.
All the effort is in quietly and reversibly patching, rather than fixing the doctest code.
So – if the changes get approved and put in batteries, we simply slip that set of patches out of doctest2; otherwise there is an upgrade path for those wanting doctest to be more awesome.
Paul: That sounds like a good approach. Rather than fill up Ian’s blog with further discussion, I suggest we take this over to github. I started a conversation in an issue where we can continue this topic: https://github.com/ianb/doctest2/issues/22