Ian Bicking: the old part of his blog

Python nit, chapter 1

I like Python. But there are little things that aren't quite right to me. In this extended series I will gripe about them.

First up, string.join (or str.join as we should now call it). We all know the basic gripe. This looks funny:

>>> sep = '\n'
>>> doc = sep.join(lines)
It should be:
>>> doc = join(lines, sep)
But I realized the real problem is about typography as much as the order of the arguments. Because most of the time when I'm joining things, I'm doing it like:
>>> doc = '\n'.join(lines)
The ' doesn't line up with the .join at all. Other string methods aren't a problem, because you never call them on string literals (except maybe split, but then only to save a few keystrokes).

I can understand why Guido didn't want to add a join method to sequences. And some people do tend to panic when they consider more builtins (mostly purists who like the idea of a small language). Anyway, it's still a gripe for me.

Created 24 Oct '03
Modified 14 Dec '04


You can always tweak Python to your own taste by adding this one-liner to your script:

def join(alist,sep='\n'): return sep.join(alist)

# Tommy Sundström

I hate one-liners. One person does join(alist, sep='\n'), another does join(alist, sep=' '), another does join(alist, sep=''). No good for sharing.
# Ian Bicking

I fully agree. Python is fun and intuitive. I love convincing people of how far it goes. Each time I use sep.join(seq) I feel ashamed, and no longer able to argue that Python is intuitive. I think a lot of us feel that way, and that it should be changed.

Thanks for publishing that thought.
# Vivian De Smedt

I agree. Some builtins are simply useful enough to add.
# Tim

I agree. str.join is jarring.
# Brandon

join is a function that has string inputs and string output. As a builtin, it would be lost in a list of more general functions. By making it a string method, it is placed where it is easy to find. Yes, its operation is somewhat unexpected, but it makes sense when you think about it.
# Lloyd

>>> str.join(' ', ('Polly','wants','a','cracker'))
'Polly wants a cracker'
>>> join = str.join
>>> join(' ', ('this','works','too'))
'this works too'

If you don't like people doing sep.join(iterator), you may start your own crusade -- oh, you've already done it :)
# Baczek

>>> from string import join
>>> doc = join(["foo", "bar"], "\n")
>>> print doc
foo
bar


The string module is on its way out, so that won't be valid forever.
# Brett

The string module isn't going anywhere. Don't believe everything you read on comp.lang.python.
# Fredrik Lundh

Also note that "join" wasn't made a string method to "make it easy to find"; it's a string method because we had to figure out some way to make join work on multiple string types back when Unicode was added. At that time, we imagined that Python might grow even more string types (how about encoded strings to save space, or binary buffers?), and it wasn't obvious how to create a "join" primitive that would find the right implementation, without having to know about all available types. We finally decided that dispatching on the separator made more sense than, say, dispatching on the first list item.

Given this, the obvious solution was to make the "join(seq, sep)" function call "sep.__join__(seq)". Changing __join__ to join was a pretty small step; after all, there might be cases where it would make sense to write sep.join(seq) in application code, at least if you happened to have the separator in a variable with a suitable name.
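A minimal sketch of that dispatch idea in present-day Python 3, where the two string types are str and bytes rather than the 8-bit and Unicode strings of the time. No `__join__` hook was ever added to Python, so this hypothetical `join` function simply forwards to the separator's public `join` method:

```python
def join(seq, sep):
    """Hypothetical free function: the separator's type picks the implementation."""
    return sep.join(seq)


# A str separator dispatches to str.join.
text = join(["foo", "bar"], "\n")

# A bytes separator dispatches to bytes.join.
data = join([b"foo", b"bar"], b"\n")
```

The function itself never has to know which string types exist; that knowledge lives in each type's own method.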

The "sep.join(seq) is more pythonic" argument is a much later concept.

And for what it's worth, the "let's dispatch on the separator" approach didn't work in practice; in order to handle sequences with both 8-bit and unicode strings, both implementations now know about the other string type.

So instead of a single function that does the right thing (but has to be taught about each new string type), we now have two separate join methods that both know about the other string type. If we add another string type, we'll end up with three implementations, each of which has to know about two different types. And so on.
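Present-day Python 3 sidesteps the mixed-sequence case by refusing it: `str.join` and `bytes.join` each raise `TypeError` on an item of the other type rather than converting. A small sketch of that behaviour, assuming Python 3 (the helper name `try_join` is made up for illustration):

```python
def try_join(sep, items):
    """Return the joined result, or the name of the exception that was raised."""
    try:
        return sep.join(items)
    except TypeError as exc:
        return type(exc).__name__


# Homogeneous sequences join normally.
same_text = try_join(" ", ["a", "b"])
same_bytes = try_join(b" ", [b"a", b"b"])

# Mixed sequences are rejected by both implementations.
mixed_text = try_join(" ", ["a", b"b"])
mixed_bytes = try_join(b" ", [b"a", "b"])
```

Each method still has to recognise the other type, if only to reject it.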

But who cares about new string types these days; it's not like anyone's actually using strings now that we have iterators ;-)
# Fredrik Lundh