Ian Bicking: a blog

{ 2007 11 27 }

Java BDD

I notice there’s another Behavior Driven Development framework for Java called Instinct (via). I have commented on BDD before.

Here’s an example test:

```
import static com.googlecode.instinct.expect.Expect.expect;
import com.googlecode.instinct.marker.annotate.BeforeSpecification;
import com.googlecode.instinct.marker.annotate.Context;
import com.googlecode.instinct.marker.annotate.Specification;

public final class AnEmptyStack {
    private Stack<Object> stack;

    @BeforeSpecification
    void setUp() {
        stack = new StackImpl<Object>();
    }

    @Specification
    void mustBeEmpty() {
        expect.that(stack.isEmpty()).equalTo(true);
    }
}
```

Yeah, that’s… great. What would it look like in a doctest?

```
>>> Stack<Object> stack = new StackImpl<Object>();
>>> stack.isEmpty()
true
```

Of course you have to invent a REPL for Java, but I’m sure that’s not very hard.

What does all that class infrastructure, setUp, mustBeEmpty and the weird DSLish stuff give you? Beats me. Doctest started in Python but now also exists in Ruby and Javascript. Someone needs to port the concept to Java too. People ported SUnit all over the place, so there’s no reason a good idea can’t spread. I can’t help feeling that BDD is a case of a bad idea spreading; the motivations for BDD are fine (a change in developer testing workflow), but the technique they use to try to reach the desired workflow is totally bizarre.
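For comparison, here is roughly what the same specification looks like as a real Python doctest. This is only a sketch: the `Stack` class and its `push` and `is_empty` methods are made up for illustration and have nothing to do with Instinct's `StackImpl`.

```python
class Stack:
    """A minimal last-in, first-out stack (hypothetical, for illustration).

    A new stack is empty:

    >>> stack = Stack()
    >>> stack.is_empty()
    True

    Pushing an item makes it non-empty:

    >>> stack.push(1)
    >>> stack.is_empty()
    False
    """

    def __init__(self):
        # Items are kept in a plain list; the end of the list is the top.
        self._items = []

    def push(self, item):
        self._items.append(item)

    def is_empty(self):
        return not self._items
```

Saved as a module, the examples in the docstring can be run with `python -m doctest` followed by the file name; there is no output when everything passes.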


23 Comments

Al says:

November 27, 2007 at 6:12 pm

I just assumed that Java programmers liked to type a lot? ;p
- Nathan says:
  
  May 25, 2009 at 9:46 am
  
  The degree of typing in order to achieve a goal is hardly the point. Ruby on Rails does a lot of work for you, but it’s all magical and mystical, and when stuff breaks, it breaks HARD, and heaven help you in finding the bug with any ease. By contrast, us insane, lame-ass Java folks love being able to debug with relative ease. So, what say we just keep down the ridicule before we actually understand why some people prefer certain languages/environments, and be a little less “fanboy”-like.
  - Ian Bicking says:
    
    May 25, 2009 at 11:02 am
    
    Just because Java does something doesn’t mean it is some sensible side-effect of static typing. In this case there’s nothing more lax about my suggestion compared to the original test. I think Java people have become numb to boilerplate, and that boilerplate takes a lot more forms than just static typing.
  - Paddy3118 says:
    
    May 25, 2009 at 9:47 pm
    
  I guess it may not be clear, but the doctest code is in Python, not Ruby. In Python we too tend to take a more restrained view on what should be alterable. Compare the Django framework in Python, rather than Rails. On whether the amount to type is relevant, I think that the Java example given is verbose, but note that I am not saying that the least number of characters to do the job is necessarily the best either – you need a balance between readability and terseness too.
Phillip J. Eby says:

November 27, 2007 at 7:55 pm

Why create a REPL for Java, when you can just use Jython + doctest? :)
Ian Bicking says:

November 27, 2007 at 8:16 pm

Jython with a doctest did occur to me after I wrote this, but I think it’s a useful enough pattern that you should also be able to do it natively in any language. Of course Java programmers should learn Python ;) — but even if they don’t they still should have documentation-driven programming available to them.
Tom Adams says:

November 27, 2007 at 8:24 pm

As you correctly state, the example on the 2 minutes tutorial is a bit verbose. With a bit more magic you can get rid of the before method (setup) also. If you annotate the stack field as the subject (the thing you’re “testing”), the framework can auto-create it, freeing you from some more infrastructure.

There are two things going on here: the “test” method (the spec) and the stuff that needs to happen before the spec method. By convention, initialisation of the subject happens in the before spec method; in fact, all configuration goes here. The second thing is that you state expectations of how your code should behave; this is what’s done in the spec method.

The idea behind the expectation DSL is that it should read like how you would think or say it: “I expect that stack.isEmpty() is equalTo true”. True, it’s more typing than the equivalent assertTrue(stack.isEmpty()), but with this assertion style, you need to translate it in your head. It gets worse when you use the assertEquals(expected, actual) syntax.

Also, you don’t type much of this stuff, an IDE will auto-complete the matchers (equalTo(), etc.) based on the type of the thing you pass into the that() method. So booleans get equalTo, notEqualTo, etc., strings get matchesRegex, containsString, etc. This is really nice in practice when you are creating the expectations.

Actually, all you need to get this going is the following:
```
class AnEmptyStack {
    @Subject private Stack stack;

    void mustBeEmpty() {
        expect.that(stack.isEmpty()).equalTo(true);
    }
}
```
You then get a nice readable description of what the class should do: “An empty stack must be empty”:
```
AnEmptyStack
- mustBeEmpty
```
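To make the idea of type-driven matchers concrete, here is a rough Python sketch of a fluent expectation API. It is not Instinct's implementation: the names `Expectation`, `StringExpectation`, `equal_to` and `contains_string` are invented for illustration, and the sketch picks the matcher set at runtime, which only approximates the compile-time typing that lets a Java IDE auto-complete the right methods.

```python
class Expectation:
    """Matchers available for any value (hypothetical names)."""

    def __init__(self, actual):
        self.actual = actual

    def equal_to(self, expected):
        if self.actual != expected:
            raise AssertionError(f"expected {expected!r}, got {self.actual!r}")

    def not_equal_to(self, unexpected):
        if self.actual == unexpected:
            raise AssertionError(f"did not expect {unexpected!r}")


class StringExpectation(Expectation):
    """Extra matchers that only make sense for strings."""

    def contains_string(self, fragment):
        if fragment not in self.actual:
            raise AssertionError(f"{self.actual!r} does not contain {fragment!r}")


class expect:
    """Entry point, so that calls read as expect.that(value).equal_to(...)."""

    @staticmethod
    def that(actual):
        # Choose the matcher set from the type of the value under test.
        if isinstance(actual, str):
            return StringExpectation(actual)
        return Expectation(actual)


# Usage: both lines pass silently; a failed expectation raises AssertionError.
expect.that([] == []).equal_to(True)
expect.that("an empty stack").contains_string("stack")
```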
Adam V says:

November 27, 2007 at 9:24 pm

Even in Java/C# land, how is:
expect.that(stack.isEmpty()).equalTo(true);

better than xUnit style:
Assert.isTrue(stack.isEmpty());

The “.equalTo(true)” kills some of the readability.
Luke Stebbing says:

November 27, 2007 at 9:39 pm

This is the first time I’ve ever heard of BDD. My initial reaction was “this is laughably verbose”, but I decided to poke around and came across http://dannorth.net/introducing-bdd , which did a good job of showing me the rationale behind BDD.

BDD seems to have three goals: 1) explain the purpose of TDD in a more intuitive way, 2) organize tests into narratives (a technique pioneered by doctest), and 3) make the tests/expectations more natural to write and read.

I agree that doctests do a better job with #3. Compare
```
expect.that(stack.isEmpty()).equalTo(true);
```
with
```
>>> stack.isEmpty()
true
```
On operator readability: I think Python already strikes a good balance between English and symbols. I want a == b, not a.isEqualTo(b), and t in s, not s.containsString(t). Most languages seem to err on the side of too many symbols.
Ian Bicking says:

November 27, 2007 at 9:47 pm

Note that in a doctest every statement is an implicit assertEqual. So simply for brevity it’s really helpful in tests. Especially if you have some kind of simple wildcard matching on the output (this is a bit more awkward in Python’s doctest than it should be).
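As a small illustration of both points, the sketch below runs a doctest session from a string. Each expression is compared against the line that follows it, and the `+ELLIPSIS` directive lets `...` stand in for the memory address, which changes from run to run. The `run_session` helper is just a convenience for this example; `DocTestParser`, `DocTestRunner` and the `ELLIPSIS` option are part of the standard `doctest` module.

```python
import doctest

SESSION = """
>>> stack = []
>>> stack.append(object())
>>> len(stack)
1
>>> stack[0]  # doctest: +ELLIPSIS
<object object at 0x...>
"""


def run_session(text):
    """Run the doctest examples in text and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Start from an empty namespace; the session defines everything it needs.
    test = parser.get_doctest(text, {}, "session", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_session(SESSION)
```

Without the directive, the last example would fail on almost every run because the address in the repr differs.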

What the small examples don’t really point out is the balance between small and large tests. For small tests — like the example — you just want to do the test. You don’t need to enumerate every detail of what you test for a report. It should all go in “test stack-like operations”. If you do care about the details, then you should read the actual test code. Detail-oriented people should read details, and there’s no detail like code detail.

For larger tests about the behavior of a piece of code some narrative can be helpful to introduce the purpose for the test, what the test shows, and maybe boundary conditions or whatever else doesn’t fit into the test code itself. ItsHardToDoThatInMixedCase. It’s easy to do it in a paragraph of text. In a doctest test moving between code and commentary on the code is simple:
```
The StackImpl implements all the stack-like stuff:

  >>> stack.isEmpty();
  true
  >>> stack.add(1);
  >>> stack.isEmpty();
  false

So why a new stack implementation?  Because my stack works
on unhashable items:

  >>> obj = new Object();
  >>> obj.hash();
  Traceback:
      ...
  AttributeError: no hash method defined
  >>> stack.add(obj);
  >>> stack.contains(obj);
  true
```
Excuse my simplistic version of Java, which just consists of me adding semicolons.

> The idea behind the expectation DSL is that it should read like how you would think or say it: “I expect that stack.isEmpty() is equalTo true”

In a kind of minimalistic way, doctest is a DSL. I find it reads (and writes) very much like thinking, at least fairly low-level thinking about tests. I think high level stuff is just too divorced from runnable code to be easy to express in automated tests. At least I haven’t seen a successful example of that.

Of course, unlike BDD a doctest is just code. == is ==, it’s not should.be or whatever particular words you choose. But runnable expectations are code. You already have one language in the mix (whatever that language might be), why add another? Because some non-programmer can read it? I find that highly unlikely. Formal programs are formal, and that makes them hard. You can fuzz it up with lots of garbage words or other “friendly” syntax (ala [HyperTalk](http://en.wikipedia.org/wiki/HyperTalk)), but history hasn’t shown that to be successful. You can’t hide the formal and disciplined nature of code.

You can make it easier to code well, and you can lower the bar to entry. This is a valid goal of BDD, which attempts to do that for TDD. But good goals are just goals, they aren’t results.
Kevin Teague says:

November 27, 2007 at 10:14 pm

From http://behaviour-driven.org/ comes this description of the purpose of BDD: “It aims to help focus development on the delivery of prioritised, verifiable business value by providing a common vocabulary (also referred to as a UbiquitousLanguage) that spans the divide between Business and Technology.”

Which is to say the point of BDD is that your test suites can be read, understood and commented on by your non-technical clients. It is about focusing on requirements and functional testing first. One produces “executable requirements” the other produces “executable documentation”. Both doc testing and BDD have a focus on story-driven development, but doc testing is usually developer stories, while BDD is requirements stories.

Doc testing has that wonderful simplicity that makes it easy for developers not used to testing to get started writing tests. If I’m trying to write better software for a client or customer, I think BDD would be a good approach. It’s probably also useful if you are doing a thorough evaluation of software that you want to deploy as an application. If I’m wanting to consume someone else’s software as a developer, I’d much rather see an interfaces.py and an interfaces.txt to give me a high-level overview of the actual interfaces of the components in a package and a story of how those interfaces and implementations are intended to be used.

Another interesting article discussing BDD among other things:

http://www.infoq.com/news/2007/11/tdd-or-tdr
Tom Adams says:

November 27, 2007 at 11:19 pm

I must admit to being shamefully unaware of doctest. It looks interesting, but still quite low level. I’m not sure you could show the results to a business-type person; perhaps it’s just the examples that make it look this way, though.

The syntax is also not quite as readable (IMO) as the equivalent BDD code, from the site, it looks to be more checking the return values after calling code than stating things about them. BDD examples are usually more flexible and descriptive than this. I think you can probably do the same thing in doctest as BDD, however a quick read from the site doesn’t show this.

It’s not quite true to say that BDD is just for non-technical clients. BDD is meant to provide readable specifications for whoever is the client of the code; that could be a non-technical person, or it could be you. If it’s you as a developer, you’d want something that was less verbose and wordy, but still readable.

Historically, there have been two “levels” for BDD, the story level for non-technical consumers and code level for technical consumers. These boundaries are now blurring, and frameworks such as RSpec have integrated these and allow you to use either depending on what you need at the time.
Luke Stebbing says:

November 27, 2007 at 11:57 pm

A doctest is a tool for testing that lends itself to telling stories. Those stories can be about the implementation, but they can also be about behavior/requirements.

The more I read, the more I think these should_equal / isEqualTo methods have nothing to do with BDD. Symbols are used for things that come up so frequently that they are easier to read and write as symbols. Software isn’t difficult to understand because it uses a handful of funny-looking symbols instead of English: it’s difficult because it is extraordinarily detailed. The previously mentioned [UbiquitousLanguage](http://domaindrivendesign.org/discussion/messageboardarchive/UbiquitousLanguage.html), that [TDR article](http://www.infoq.com/news/2007/11/tdd-or-tdr), and the [BDD blog post](http://dannorth.net/introducing-bdd) all advocate telling stories about high level concepts and working your way down from there. The high level can then be consumed by the business people.
Paddy3118 says:

November 28, 2007 at 12:01 am

If we were writing a Python stack then you wouldn’t write the doctest:
```
 >>> stack.isEmpty()
 true
```
It would be:
```
 >>> len(stack)
 0
```
I find it difficult to believe Kevin Teague’s point (8.); that Business types who might find a doctest hard to read, might find a BDD test easier to understand.

Tom Adams (4.) makes the point that “you don’t type much of this stuff, an IDE will auto-complete the matchers”. That point never sits well with me. If the computer can do this stuff then we should move towards not having to see it at all – it’s just noise. I moved from C because I was wasting too much time chasing pointers, and along came other languages that handled that for me.

- Paddy.

Oh, another doctest introductory [link](http://en.wikipedia.org/wiki/Doctest)
Ian Bicking says:

November 28, 2007 at 12:17 pm

I think business people (or domain experts, or whatever kind of non-programmer) do best reading things when the text was written with that person as an intended reader (though not necessarily the only intended reader). These people don’t consume code, and that’s true in a doctest or a BDD test. In a BDD test they consume the report output, trusting that it accurately describes what is being tested. In a doctest they’ll skim over the code blocks and read the text.

In BDD, you embed this readable text in the names of your functions and classes, and in RSpec there are opportunities to use strings with arbitrary text (that look a little like docstrings). In no case does the computer interpret these strings, or confirm the programmer actually lived up to what they said they’d do. The only thing really consumable by non-programmers is the non-code text. So I see two things that we are attempting to accomplish: (a) provide a good space for narrative text about requirements, and (b) attach that to testing code so that the programmer can live up to those specifications, keep them in sync, and for the requirements document to be a useful and functional tool to help the programmer.

Doctests (at least the [tests that don't go in docstrings](http://python.org/doc/current/lib/doctest-simple-testfile.html)) really are editable by non-programmers — from their perspective the format is pretty trivial. I don’t think BDD systems like RSpec really are editable by non-programmers, as the text is embedded in the syntax of the language (which is pretty fragile); doctest embeds the code in the text.

[FITnesse](http://fitnesse.org/) is a more business-oriented (and less programmer-oriented) approach that is very close to doctest. I personally think its infrastructure is too complex and there’s too much indirection. The amount of test data that can usefully go in a table is fairly low in my experience. It’s high enough that I think it can still be useful, but not high enough to build an entire test framework on it. If I was writing doctests where tabular data was appropriate, I might just write a table reader that can be used by the doctest, driving the test off the data while keeping the all important code close to the functional specification as well. In Python this would unfortunately expose a flaw in doctest: the framework doesn’t expose enough to the tests. I’d like to provide hints about failures and what table data I was using, but there’s no way to do that. But doctest is old, and it really needs some more love. I would hope that new implementations would build in a few more features.
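A table reader of that kind could be very small. The following is a minimal sketch under assumed conventions: the pipe-separated layout, the column names `a`, `b` and `expected`, and the helpers `read_table` and `check_rows` are all invented for illustration. Returning the failing rows as a list is one way to work around the lack of failure hints, since a doctest that expects `[]` will show the offending rows in its diff.

```python
def read_table(text):
    """Parse a pipe-separated table into a list of row dicts.

    The first non-blank line is the header; every cell is a stripped string.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def check_rows(rows, func):
    """Apply func to columns a and b of each row; return failure messages."""
    failures = []
    for number, row in enumerate(rows, start=1):
        actual = func(int(row["a"]), int(row["b"]))
        if actual != int(row["expected"]):
            failures.append(f"row {number}: {row} gave {actual}")
    return failures


TABLE = """
| a | b | expected |
| 1 | 2 | 3        |
| 2 | 2 | 4        |
"""


def add(a, b):
    return a + b


# In a doctest file this would be a single example expecting [].
failures = check_rows(read_table(TABLE), add)
```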
Tom Adams says:

November 28, 2007 at 4:36 pm

This has been a really good commentary so far, you guys have thought about these problems quite a bit, that’s quite refreshing to see, your take is more mature than mine.

In reply to Paddy3118, I wasn’t intending to imply by my IDE remark that the code becomes noise. One of the things Instinct (not BDD per se) aims to get rid of is noisy infrastructure. The point I was trying to make was that it is more verbose, but by using tools, the verboseness doesn’t add any extra cost in creation, and the benefits you get from the readability are worth it regardless (of whether it’s harder or easier to create).

In fact the Instinct expectation API is type-safe, making it easier to write these more verbose statements (the IDE leads you to the correct methods via the type) than their less verbose equivalents.

As always, there’s a tradeoff between verboseness and terseness. BDD aims at producing spec code that is readable, and in an argument between readability and terseness I’d take readability every time.

I don’t believe the Instinct expectation API is just noise, it’s been specifically crafted to minimise noise (in the eyes of its developers admittedly). The API aims at readability and accurately reflecting the intent of the author. So for example there are several ways to say a collection/list/array/string has a certain length, you pick the one that most accurately reflects your intent.

Picking on doctest again, the example given:
```
>>> len(stack)
0
```
Is not as readable to me as the Instinct (or RSpec) equivalent. Perhaps that’s because I haven’t used Python for 7 years or so, or perhaps because I like the ability to read the expectation aloud and have it make sense. “len stack 0” doesn’t read as well to me as “expect that stack is of size 0” or “stack should be empty” (as it would be in RSpec) though they both assert the same thing.

In the end it comes down to the audience and how much implicit conversion you do in your head between what’s written and what it means.
Ian Bicking says:

November 28, 2007 at 5:43 pm

Perhaps some of the difference in opinion is due to the fact that, to a Python programmer, a console session feels very natural. That doctest has existed for a while also means there’s lots of examples of it, adding to the familiarity. It’s very difficult, for instance, to read len(stack) out loud (I find). Or even worse, if obj.get(key, default) == value: — how do you read that? “if obj [uck, I can't even pronounce 'obj'] get key comma default equals value” ? I don’t know how to say it, but I can read it very naturally. It’s something I (and everyone here) learned to do. And that’s what makes us programmers ;) In the same way, doctest reads very naturally to me, even though I can’t read it out loud. All these things require some learning before they really sink in.
Paddy3118 says:

November 29, 2007 at 12:20 am

Hi Tom, Ian. I am not so sure that having the needs of a non-specialist business manager as a primary goal is a good thing. Having a concise way to define tests (concise meaning clear, comprehensive, and short) from the domain specialist’s point of view might be more productive.

I guess as an example I will switch to the design and verification of digital chips, which is the field I am in. We have always had a written design spec that both the verification team and the design team interpret. Lately we have been using assertion languages such as [PSL](http://www.doulos.com/knowhow/psl/) to accurately and unambiguously define what the spec says.

The spec may say: “Signal apply_power occurs after the breaks are released”.

Which in PSL becomes: assert always {rose(apply_power)} |-> { ! brake_applied };

Which re-interpreted in English becomes something like: “Ensure that whenever the signal apply_power rises, the signal brake_applied is not high at that same time”

You can formally check the PSL against the design. You can simulate the design and ensure it does not contradict the assertion. Domain specialists can accurately reason about the spec’s interpretation using PSL. ( And pedants can flag the spelling mistakes :-)
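For readers without a PSL tool, the same check can be approximated against a simulation trace in a few lines of Python. This is only a sketch under stated assumptions: each signal is a list of 0/1 samples, one per clock cycle, the `rose` helper imitates PSL's `rose()` by comparing a sample with the previous one (treating the first cycle as "no rise"), and the trace values are made up.

```python
def rose(samples, i):
    """True if the signal is high at cycle i and was low at cycle i - 1."""
    return i > 0 and bool(samples[i]) and not samples[i - 1]


def violations(apply_power, brake_applied):
    """Cycles where apply_power rises while brake_applied is high.

    Mirrors: assert always {rose(apply_power)} |-> { ! brake_applied };
    """
    return [
        i
        for i in range(len(apply_power))
        if rose(apply_power, i) and brake_applied[i]
    ]


# Hypothetical trace: power rises at cycles 2 and 5.
apply_power = [0, 0, 1, 1, 0, 1]
brake_good = [1, 0, 0, 0, 1, 0]   # brake released at both rises
brake_bad = [1, 0, 0, 0, 1, 1]    # brake still applied at cycle 5

ok = violations(apply_power, brake_good)
bad = violations(apply_power, brake_bad)
```

An empty list means the trace never contradicts the assertion; otherwise the list names the offending cycles.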

Management has to either learn PSL, get someone they trust who can explain PSL to them, monitor the process of creating and using PSL, or probably do all of the above to be a good manager. The domain specialist works together with management so the process of PSL creation and monitoring ends up as accurate reports and graphs of progress, but PSL and other assertion languages are optimised for the domain. They need to be writeable and maintainable by the domain specialists.

To get back to programming, I think we should be going down a similar route; what is maintainable, precise, and readable for the programmer creating tests will create better software, and, I think, prove easier for the manager in the long run.
Tom Adams says:

November 29, 2007 at 6:39 pm

Hi Ian, Paddy3118,

So the question is, does making the mental translation between what you see and what you interpret it as affect your ability to understand the intent of what is being expressed? I agree with you, I can do the translation in my head quickly also (for the tools I’m used to working with), however I find a natural DSL type syntax to be easier to write (good tool support) and more readable than the equivalent. So I get a better process and a better result.

Having a non-specialist as a target audience makes sense for top level “is this done” and “does it do what I want” type of tests (acceptance/functional/story tests). I don’t think anyone is trying to dumb things down to a point where anyone can read and reason about code. We should strive to make it as simple and clear as possible, but clearly there will always be complexity and a barrier to entry. What we are trying to do is use the same language as what a non-specialist would use, so that they can at least get an idea of what we’re up to (is it what I asked for, etc.) as well as helping us to not have to translate between how we think of things and how they think of things, this is a recipe for disaster. If your code and your “tests” all use the same language as that of the domain, things become a lot simpler, leaving you to worry about other things.
George Paci says:

December 23, 2007 at 10:46 pm

I hate to say it, Ian, but this post is kind of cranky.

You picked an example spec with the greatest possible overhead-to-meaningful-code ratio: 9 lines of setup (much of which, like the import statements, is forced by Java), and only 3 lines of an actual spec (one of which is just an annotation saying that it is, in fact, a spec, so maybe that’s overhead, too).

What the framework gives you, like xUnit, is the ability to do some setup in the setUp() method, then share it among some other spec methods (test methods, in xUnit). It’s pretty rare that there’s only one spec method.

And only the code is shared: the actual instance (or whatever other context you set up) is different for each spec method, which helps keep one problem from kicking up a cloud of spurious follow-on problems to hide itself. I’m not sure how you’d go about that in doctest without writing the equivalent of a setUp() method and calling it repeatedly.
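One possible answer, sketched in Python: give every doctest its own freshly built namespace, so the setup code is shared but the instances are not. The spec texts, the `fresh_context` function and the `run_specs` helper below are hypothetical; `DocTestParser.get_doctest` and `DocTestRunner` are standard `doctest` APIs. (The standard library's `DocTestSuite` and `DocFileSuite` also accept `setUp` and `tearDown` callables for the same purpose.)

```python
import doctest

# Two independent specs that both rely on a name called `stack`.
SPECS = {
    "an_empty_stack": """
>>> len(stack)
0
""",
    "a_stack_after_one_push": """
>>> stack.append(1)
>>> len(stack)
1
""",
}


def fresh_context():
    """Plays the role of setUp(): build a brand new context for each spec."""
    return {"stack": []}


def run_specs(specs):
    """Run each spec against its own context; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    runner = doctest.DocTestRunner(verbose=False)
    for name, text in specs.items():
        test = parser.get_doctest(text, fresh_context(), name, None, 0)
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_specs(SPECS)
```

Because each spec gets its own list, the push in the second spec cannot leak into the first, whatever order they run in.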

The motivations for BDD are to get people to do TDD right. That means ditching anachronistic test-related names that date from back when we thought this stuff was unit testing. It’s not: it’s design, and we should use words related to design instead. (If there’s one thing XP people agree on, it’s that names actually matter.)

Finally, I think you see a conflict between doctest and BDD. I don’t: doctest is a tool, BDD is an approach, and you could clearly use doctest to help you do BDD. (Of course, you could, instead, use it as after-the-fact unit tests, functional tests, executable documentation, and possibly a dessert/floor wax.)
Margaret M. Gille says:

December 26, 2007 at 3:43 pm

Ian, are you a descendant of Frederick Bicking? I am working on my family tree and am looking for relatives from this line.

From Wikipedia, the free encyclopedia: Frederick Bicking was born in Winterburg, a municipality in the district of Bad Kreuznach in Rhineland-Palatinate, in western Germany and came probably first to Philadelphia and then to East Brandywine Township, Chester County, Pennsylvania, before the revolution.

In Pennsylvania he owned and operated a paper mill, establishing the Bicking paper dynasty that would last well into the 19th century. The Continental Congress allocated funds to purchase Bicking’s paper for currency production. He is mentioned in several of the minutes of the Continental Congress and in the George Washington Papers.

Frederick Bicking married Mary Catherine Unverzagt of Otwiller, Germany on 26 May 1752 at St. Michael’s & Zion Lutheran Church in Germantown, Pennsylvania. Mary Catherine Unverzagt, daughter of Johannes Unverzagt, was also a German Palatine.

Of Frederick Bicking’s five sons, three were paper makers in Pennsylvania.

John Bicking had a paper mill near present day Fisherville.

Retrieved from “http://en.wikipedia.org/wiki/Frederick_Bicking”
Laurent Szyster says:

December 26, 2007 at 7:27 pm

“Someone needs to port the concept to Java too.”

Here’s a bit more than a starting point:

http://svn.berlios.de/svnroot/repos/less4j/trunk/src/org/less4j/protocols/Doctest.java

For sample output see:

http://laurentszyster.be/less4j/doc/#org.less4j.protocols.IRTD2.digested

The implementation is simple: I hijacked javadoc and used the best known REPL for Java: Rhino.

Enjoy …
Kragen Sitaker says:

January 26, 2008 at 10:32 am

BDD is just new names for TDD. doctest is a fine framework for doing BDD. Maybe Instinct is too, but it kind of suffers from the method definition overhead; it’s less painful in Smalltalk, where the design of SUnit (that it apes) comes from. But that’s Java’s fault, not Instinct’s.
