Why Small Packages Matter

With Eggs, and with simpler installation and dependency handling, there are more opportunities to distribute smaller packages and to split large packages into pieces.

I was reading this post on Zope:

I think this [splitting up of packages] is a big deal and too hope that we can "explode" Zope 3 into many eggs soon. Jim wants to do it for Zope 3.4, yay! Having smaller, more easily distributable Zope packages will also reduce the buy-in into Zope as a platform. For example, you won't have to ship with the ZMI anymore if you not only think it sucks but also find it disturbing to develop with.

Of course, I like this direction. But not because of small distributions; there's a small number of places where that matters (e.g., mobile phones), but for most of us download time and disk space aren't a big deal. And after all, just because one app doesn't require the ZMI doesn't mean you don't have it installed -- if it is already there you aren't saving any disk space.

The advantages I see in breaking up pieces are discipline, extensibility, and creating a hierarchy to the concepts in your library or framework.

Discipline:

Given a big package, developers will sometimes say that the package is loosely coupled on the inside, and if you want to use foo.x but not foo.y that's fine, because they don't depend on each other.

How do you know they don't depend on each other? How do you know they won't depend on each other in the future? How do you know someone won't read about DRY and factor out pieces that are shared between the two modules? How do you know someone won't "fix" a bug in foo.z that both modules use, breaking something you depended on in some interface?

If you use two different distributions for these two modules, you actually have lots of ways of detecting these problems and truly keeping the modules decoupled. It's not automatic -- there are always opportunities to break things. But the discipline of distribution boundaries (and other separations like separate release schedules) will tend to keep you honest about coupling.
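
As a rough illustration of the kind of check a distribution boundary gives you for free, here is a minimal sketch that scans a module's source for imports of a sibling it claims not to depend on. The module names (foo.x, foo.y, foo.z) follow the example above; the helper names and the sample source strings are made up for illustration. With separate distributions you would not need this at all, since the import would simply fail when foo.y is not installed.

```python
import ast

def imported_modules(source):
    """Return the dotted names of everything the given source imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from foo import y" may name a submodule, so record both forms.
            found.add(node.module)
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found

def depends_on(source, forbidden):
    """True if the source imports `forbidden` or one of its submodules."""
    return any(
        name == forbidden or name.startswith(forbidden + ".")
        for name in imported_modules(source)
    )

# Hypothetical contents of foo/x.py before and after someone "applies DRY".
FOO_X_BEFORE = "import os\nfrom foo import z\n"
FOO_X_AFTER = "import os\nfrom foo.y import shared_helper\n"

print(depends_on(FOO_X_BEFORE, "foo.y"))
print(depends_on(FOO_X_AFTER, "foo.y"))
```

This only catches static, absolute imports; it is a stand-in for the stronger guarantee you get when the two modules are installed separately.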

Extensibility:

One argument for keeping things packaged together is that it allows for optional integration, so that people who want to use all the features get a more convenient tool.

This means that, for instance, an object may have a method that binds it to another module in the package. But it's optional, because you don't have to call the method. The programmer asked to trust that this "optional" feature is truly optional may consider all the questions raised under Discipline, but imagine that these issues are addressed. So what's the problem?

In this case, the optional integration has a privileged position. The original author's libraries get special hooks, but the developer using those libraries doesn't get the same access. You could monkey patch your own extension, but you'll only have created a horrible coupled mess. You could avoid the extension entirely, of course, but if the original author thought it was sufficiently useful to create the extension it is likely that another user of the library will feel the same.
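
One way to avoid the privileged position is for the core object to expose a registration hook that the original author's integration uses too. The sketch below is only an illustration under that assumption; Document, register_renderer, and both renderers are hypothetical names, not taken from any real library.

```python
class Document:
    """Hypothetical core object. It knows nothing about any specific renderer."""

    _renderers = {}

    def __init__(self, text):
        self.text = text

    @classmethod
    def register_renderer(cls, name, func):
        # The only extension hook, used by the author and by everyone else.
        cls._renderers[name] = func

    def render(self, name):
        return self._renderers[name](self)


# The original author's "optional" integration: it registers through the
# public hook instead of being a method baked into Document.
def render_html(doc):
    return f"<p>{doc.text}</p>"

Document.register_renderer("html", render_html)


# A third-party extension gets exactly the same access, with no monkey patching.
def render_upper(doc):
    return doc.text.upper()

Document.register_renderer("upper", render_upper)

doc = Document("hi")
print(doc.render("html"))
print(doc.render("upper"))
```

If the author's integration lived in its own distribution, it would be forced through a hook like this, and the hook would then exist for other developers as well.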

Hierarchy of Concepts:

Ideally a system will be laid out with a nice hierarchy of concepts:
Library-A      Library-B
  |    \        /
  |     \      /
  |      \    /
Library-C \  /
           \/
        Library-D
           |
           |
        Library-E
To understand Library A you have to understand all of Library C, D, and E. To understand Library D you only need to understand Library E.
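
The "what do I have to understand" question is just the transitive closure of the dependency graph. Here is a small sketch of that, with the edges read off the diagram above; treating Library-C as having no dependencies of its own is an assumption about the drawing, and the names DEPENDS_ON and must_understand are made up for the example.

```python
# Direct dependencies, as read from the diagram (Library-C's edges are assumed).
DEPENDS_ON = {
    "Library-A": ["Library-C", "Library-D"],
    "Library-B": ["Library-D"],
    "Library-C": [],
    "Library-D": ["Library-E"],
    "Library-E": [],
}

def must_understand(library, graph=DEPENDS_ON):
    """Return every library reachable from `library`, excluding itself."""
    seen = set()
    stack = list(graph[library])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen

for name in sorted(DEPENDS_ON):
    print(name, "->", sorted(must_understand(name)))
```

The lower you enter the hierarchy, the smaller the set gets, which is the whole point of keeping the layers separately usable.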

Given a hierarchy like this, there's actually an advantage to not using the entire framework/system. You don't need to understand nearly as much, and learning a library is probably the biggest overhead to using a library.

It can be argued that if you want to use Library A, you only need to read about Library A. If the documentation is very good, this is somewhat true. It is true if you use it perfectly and write no buggy code and the libraries themselves have no bugs and you don't need to do anything that goes outside the bounds of what Library A provides. This isn't my experience programming, and isn't typical when using F/OSS.

There is also a hierarchy of stability. If Library E is a moving target then you are just plain hosed. If someone keeps making API changes in Library D you are also hosed. If stability does not increase as you move down your stack then the stack is a big ball of mud, even if at one isolated moment it might seem like an elegant and stable system.

For all these reasons, when someone claims their framework is all spiffy and decoupled, but they just don't care to package it as separate pieces... I become quite suspicious. Packaging doesn't fix everything. And it can introduce real problems if you split your packages the wrong way. But doing it right is a real sign of a framework that wants to become a library, and that's a sign of Something I'd Like To Use.

I think you've formulated the concerns quite well. Zope (2.x at least) seems similar in some ways to various Java frameworks where one is presented with a huge number of .jar files, a big configuration file with lots of sections, and documentation which glosses over the dependencies within the system. Consequently, as a developer, one has no confidence in being able to remove stuff which is either blatantly superfluous or worrying from a security perspective. Zope's "through the Web" management is nice, but I imagine that many people have wondered whether it could be detached and omitted from certain kinds of production systems.

Of course, one of the Python community's favourite megaframeworks, Twisted, went through its own process of decoupling a while back, and that may have made some of the components more popular. The prospect of having to adopt a huge blob which dwarfs one's own project is often something which invigorates the reinventive tendencies of many Python developers. Depending on or bundling some part of Twisted is better than having to monitor the effects of a complicated dependency on the whole thing.

From what I've seen, Twisted's refactoring is largely a failure. It's not compatible with eggs, and approximately zero people use the components separately -- everyone seems to still use the sumo distribution.

I think the only thing it solved is the complaint that it's not split up; I haven't seen that thread in a long time. It may have also helped a bit with the release management, but that matters to very few people. Supporting eggs would be a huge benefit to application and library developers, but they don't seem particularly interested in doing it.

I think Twisted's refactoring was kind of like Zope 3's refactoring, which happened in theory previously but no one used it. Actually getting usable isolated packages out of Zope 3 was hard, not just for technical reasons, but also because the people doing the work didn't have much incentive to deal with the issues like getting the tarballs pushed somewhere and whatnot. Now they are taking a different tack with Eggs, and I think the result is potentially useful where the previous one wasn't. I think entry points are also really important here, because they're an actual novel feature which makes packaging matter as more than just a way to deflect criticism.
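
For readers who haven't used entry points: a distribution advertises named objects under a group in its packaging metadata, and any other code can discover them without importing the providers by name. Below is a minimal sketch using the standard library's importlib.metadata, which reads the same metadata that setuptools' pkg_resources read at the time. The group name "myframework.plugins" and the load_plugins helper are hypothetical.

```python
from importlib.metadata import entry_points

def load_plugins(group):
    """Load every object that installed distributions advertise under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later expose a selection API.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}

# "myframework.plugins" is a made-up group; with nothing installed that
# registers under it, the result is simply empty.
plugins = load_plugins("myframework.plugins")
print(sorted(plugins))
```

The framework never names its plugins; whichever distributions happen to be installed decide what shows up, which is what makes the packaging boundary do real work.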

I also think people will really use the new Zope refactoring, if it is done well -- especially extracting small things (like, say, the transaction manager) instead of trying to extract "optional" things like the ZMI. The transaction manager is useful on its own, the ZMI is not. Similarly, the things Twisted should extract are things like the Deferred stuff, not the optional pieces like the protocol handlers. Right now I think the lack of a solid idiom for async programming (barring the use of the entire Twisted stack, or ad hoc callbacks) really keeps people from using that kind of programming in places where it might make sense.

I'm in the midst of a project using TurboGears. I distribute my application as several separate eggs, split along the major fault lines in the concepts: a data layer, management, shop, plugins, migration, etc. It has been hugely beneficial to do this because of the aforementioned decoupling, and also because it enables decoupled updates to components with reduced risk to the whole.

Ian Bicking: the old part of his blog
