James Bennett recently wrote an article on Python packaging and installation, and Setuptools. There are a lot of issues, and writing up my thoughts could take a long time, but I thought at least I should correct some errors, specifically category errors. Figuring out where all the pieces in Setuptools (and pip and virtualenv) fit is difficult, so I don't blame James for making some mistakes, but in the interest of clarifying the discussion…
I will start with a kind of glossary:
- Distribution:
- This is something-with-a-setup.py: a tarball, a zip file, a checkout, etc. Distributions have names; this is the name in setup(name="...") in the setup.py file. They have some other metadata too (description, version, etc.), and Setuptools adds some metadata of its own. Distutils doesn't make it very easy to add to the metadata; it'll whine a little about things it doesn't know, but won't do anything with that extra data. Fixing this problem in Distutils is an important aspect of Setuptools, and part of what makes Distutils itself unsuitable as a basis for good library management.
- package/module:
- This is something you import. It is not the same as a distribution, though usually a distribution will have the same name as a package. In my own libraries I try to name the distribution with mixed case (like Paste) and the package with lower case (like paste). Keeping the terminology straight here is very difficult; usually it doesn't matter, but sometimes it does.
- Setuptools The Distribution:
- This is what you install when you install Setuptools. It includes several pieces that Phillip Eby wrote, which work together but are not strictly a single thing.
- setuptools The Package:
- This is what you get when you do import setuptools. Setuptools largely works by monkeypatching distutils, so simply importing setuptools activates its functionality from then on. This package is entirely focused on installation and package management; it is not something you should use at runtime (unless you are installing packages as your runtime, of course).
- pkg_resources The Module:
- This is also included in Setuptools The Distribution, and is for use at runtime. This is a single module that provides the ability to query what distributions are installed, metadata about those distributions, and information about where they are installed. It also allows distributions to be "activated". A distribution can be available but not activated. Activating a distribution means adding its location to sys.path, and you've probably noticed how long sys.path is when you use easy_install. Almost everything that allows different libraries to be installed, or allows different versions of libraries, does it through some management of sys.path. pkg_resources also allows for generic access to "resources" (i.e., non-code files), and lets those resources be in zip files (a short usage sketch follows this glossary). pkg_resources is safe to use; it doesn't do any of the funny stuff that people get annoyed with.
- easy_install:
- This is also in Setuptools The Distribution. The basic functionality it provides is that, given a name, it can search for a package with that distribution name, also satisfying a version requirement if one is given. It then downloads the package and installs it (using setup.py install, but with the setuptools monkeypatches in place). After that, it checks the newly installed distribution to see if it requires any other libraries that aren't yet installed, and if so it installs them.
- Eggs the Distribution Format:
- These are zip files that Setuptools creates when you run python setup.py bdist_egg. Unlike a tarball, these can be binary packages, containing compiled modules, and they generally contain .pyc files (which are portable across platforms, but not across Python versions). This format only includes files that will actually be installed; as a result it does not include doc files or setup.py itself. All the metadata from setup.py that is needed for installation is put into files in an EGG-INFO/ directory.
- Eggs the Installation Format:
- Eggs the Distribution Format are a subset of the Installation Format. That is, if you put an Egg zip file on the path, it is installed; no other process is necessary. But the Installation Format is more general. To have an egg installed, you either need something like DistroName-X.Y.egg/ on the path with an EGG-INFO/ directory under that holding the metadata, or a path like DistroName.egg-info/ with the metadata directly in that directory. This metadata can exist anywhere, and doesn't have to be directly alongside the actual Python code. Egg directories are required for pkg_resources to activate and deactivate distributions, but otherwise they aren't necessary.
- pip:
- This is an alternative to easy_install. It works somewhat differently than easy_install, but not much. Mostly it is better than easy_install, in that it has some extra features and is easier to use. Unlike easy_install, it downloads all distributions up-front, and generates the metadata up-front to read each distribution's version requirements. It uses Setuptools to generate this metadata from a setup.py file, and uses pkg_resources to parse it. It then installs packages with the setuptools monkeypatches applied; it just happens to use the option python setup.py install --single-version-externally-managed, which gets Setuptools to install packages in a flatter manner, with Distro.egg-info/ directories alongside the packages. Pip installs eggs! (Eggs the Installation Format, that is.) I've heard the many complaints about easy_install (and I've had many myself), but ultimately I think pip does well by just fixing a few small issues. Pip is not a repudiation of Setuptools or the basic mechanisms that easy_install uses.
- PoachEggs:
- This is a defunct package that had some of the features of pip (particularly requirement files) but used easy_install for installation. Don’t bother with this, it was just a bridge to get to pip.
- virtualenv:
- This is a little hack that creates isolated Python environments. It’s based on virtual-python.py, which is something I wrote based on some documentation notes PJE wrote for Setuptools. Basically virtualenv just creates a bin/python interpreter that has its own value of sys.prefix, but uses the system Python and standard library. It also installs Setuptools to make it easier to bootstrap the environment (because bootstrapping Setuptools is itself a bit tedious). I’ll add pip to it too sometime. Using virtualenv you don’t have to worry about different library versions, because for any one environment you will probably only need one version of a library. On any one machine you probably need different versions, which is why installing packages system-wide is problematic for most libraries. (I’ve been meaning to write a post on why I think using system packaging for libraries is counter-productive, but that’ll wait for another time.)
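To make the pkg_resources entry above concrete, here is a minimal sketch of runtime use. The distribution name Paste is just an example, and the data.txt resource is hypothetical; adjust for whatever you actually have installed:

```python
import pkg_resources

# Query an installed distribution and its metadata:
dist = pkg_resources.get_distribution('Paste')
print(dist.project_name)  # 'Paste'
print(dist.version)       # e.g. '1.7.1'
print(dist.location)      # where it lives on disk

# Activate a distribution satisfying a version requirement
# (this is what adds egg directories to sys.path):
pkg_resources.require('Paste>=1.0')

# Generic access to non-code resources; works even from zipped eggs
# (data.txt is a hypothetical file inside the paste package):
data = pkg_resources.resource_string('paste', 'data.txt')
```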
So… those are the pieces involved, at least the ones I can remember now. And I haven't really discussed .pth files, entry points, sys.path trickery, site.py, distutils.cfg… sadly this is a complex state of affairs, but it was also complex before Setuptools.
There are a few things that I think people really dislike about Setuptools.
First, zip files. Setuptools prefers zip files, for reasons that won't mean much to you, and maybe are more historical than anything. When a distribution doesn't indicate whether it is zip-safe, Setuptools looks at the code and sees if it uses __file__, and if not it presumes that the code is probably zip-safe. The specific problem James cites is what appears to be a bug in Django: Django looks for code and can't traverse into zip files in the same way that Python itself can. Setuptools didn't itself add anything to Python to make it import zip files; that functionality (zipimport) was added to Python some time before, back in Python 2.3. The zipped eggs that Setuptools installs use existing (standard!) Python functionality.
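As a quick illustration of that standard functionality, assuming a hypothetical library.zip with a module.py at its top level:

```python
import sys

# zipimport, in the standard library since Python 2.3, lets any
# zip file on sys.path act like a directory of importable modules:
sys.path.insert(0, 'library.zip')
import module  # loaded from inside the zip, no unpacking required
```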
That said, I don’t think zipping libraries up is all that useful, and while it should work, it doesn’t always, and it makes code harder to inspect and understand. So since it’s not that useful, I’ve disabled it when pip installs packages. I also have had it disabled on my own system for years now, by creating a distutils.cfg file with [easy_install] zip_ok = False in it. Sadly App Engine is forcing me to use zip files again, because of its absurdly small file limits… but that’s a different topic. (There is an experimental pip zip command mostly intended for App Engine.)
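For reference, that configuration file is just two lines; the same section also works in a per-user ~/.pydistutils.cfg, if I recall the lookup locations correctly:

```
[easy_install]
zip_ok = False
```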
Another pain point is version management with setup.py and Setuptools. Indeed it is easy to get things messed up, and it is easy to piss people off by overspecifying, and sometimes things can get in a weird state for no good reason (often because of easy_install’s rather naive leap-before-you-look installation order). Pip fixes that last point, but it also tries to suggest more constructive and less painful ways to manage other pieces.
Pip requirement files are an assertion of versions that work together. setup.py requirements (the Setuptools requirements) should contain two things: first, all the libraries used by the distribution (without which there's no way it'll work); and second, exclusions of the versions of those libraries that are known not to work. setup.py requirements should not be viewed as an assertion that by satisfying those requirements everything will work, just that it might work. Only the end developer, testing the system together, can figure out if it really works. Then pip gives you a way to record that working set (using pip freeze), separate from any single distribution or library.
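A sketch of that division of labor, with purely illustrative package names and versions: setup.py states the minimum plus known-bad exclusions, while the requirements file pins the set you actually tested.

```python
# setup.py (illustrative): the floor, not the tested set
from setuptools import setup

setup(
    name='MyApp',
    version='0.1',
    install_requires=[
        'WebOb',             # needed at all, any version might work
        'Paste>=1.0,!=1.6',  # 1.6 known not to work (example exclusion)
    ],
)
```

The tested working set then goes in a requirements file, typically captured with pip freeze:

```
WebOb==0.9.6
Paste==1.7.1
```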
There are also a lot of conflicts between Setuptools and package maintainers. This is kind of a proxy war between developers and sysadmins, who have very different motivations. It deserves a post of its own, but the conflicts are about more than just how Setuptools is implemented.
I'd love it if there were a language-neutral library installation and management tool that really worked. Linux system package managers are absolutely not that tool; frankly it is absurd to even consider them as an alternative. So for now we do our best in our respective language communities. If we're going to move forward, we'll have to acknowledge what's come before, and the reasoning for it.
I am new to pip, so I hope I don’t write silly things ;)
It's a good idea to have the requirements outside the package, instead of what Setuptools does, so you can share them amongst several packages.
So basically, pip's requirements files are what Zope calls the "Known Good Set" and what TurboGears 2 does by maintaining its own PyPI server to distribute TG2 packages: a list of versions that are known to interact well in the same environment, right?
But it’s not really different from setuptools there, except that you change the requirements in a different place if something goes wrong.
So to simplify the problem, couldn't we have just ONE requirements file for the whole of Python?
This could be the clue for OS packagers: they would be able to tweak this file, while developers would be able to try out their package over different requirements files ("the Debian etch Python requirements file", "the Debian unstable Python requirements file", etc.). And for specific, isolated stuff, using virtualenv would allow developers to have their own custom requirements files.
Can you perhaps elaborate on that? It seems that apt, for instance, is exactly that. It's language-neutral. It installs and manages libraries from various languages, and according to the majority of the packages on my server, it works. The only two downsides to something like apt are A) it's platform-specific and B) they include new things on a timeline, not constantly.
Love to get some more info on this from you.
Another problem Linux system package managers might face is the concept of environments. I’m not sure how they deal with this other than maybe allowing an installation prefix, but even then, system-wide you can probably only install one version of the package at one prefix. I know with BSD ports you can at least have more than one version installed, but you have to activate/deactivate them individually. There needs to be a way to have completely independent environments, but I’m not experienced enough to know if this is currently possible or convenient. Anyone?
Justin, I don’t think it’s possible to have multiple versions of the same library installed via apt, correct?
Ian (and others) definitely need that ability, as it’s one of the main purposes of virtualenv. ;)
That’s a simple example with a single requirement, but imagine that each of those apps/projects/sites all had their own complicated stacks. I don’t know how you’d manage that with apt.
Another benefit is, like you said, the timeline. As Python developers it's unlikely that most of us roll our own kernel, or compile most of our own C-level system libraries or what have you, so whatever our distribution provides is good enough for 99.9% of the use cases. But when it comes to my Python app, I'm much more familiar with it, and may know that the psycopg version that's available fixes many problems over the one included in the latest Ubuntu, and so I'm more likely to want to manage that myself. It's higher on the stack, where small version changes matter more to me.
As Brett points out, apt and other package managers are inappropriate because they don't allow multiple versions of packages to be installed, nor local environments or ad hoc packages.
The Linux package managers do good things, but two are very important. One is that they record the existence of files so you can check the package is correctly installed, hasn’t been altered, and can uninstall. (Do a Google search for easy_install uninstall to see how people get frustrated at being unable to uninstall). Secondly they hook into update mechanisms so updates happen as appropriate for security and functionality reasons. If those important capabilities are ignored then we’ll end up with the Windows situation with detritus all over your system and numerous update daemons running that care only about their little world.
The exact same situation arose with CPAN vs package managers. It would be worthwhile surveying that to see what worked and what didn’t. The solutions mostly seemed to be some sort of wrapper that would turn CPAN packages into native platform packages, although that doesn’t solve the update issue.
Perhaps PyPI should be doing something like Ubuntu/Launchpad PPAs where packages can be automatically built for multiple platforms and trivially integrated into Debian/Ubuntu/apt style package managers. The same approach would work for RPM and even Windows although one updater would need to be picked as the blessed updater mechanism since there isn’t a standard one.
Ian, the Django case is one where, for most of the operations involved, we use standard __import__ and imp.find_module, but for one specific case (having determined where custom commands are, get a list of all of them) we fall back to an old-fashioned os.listdir. And that's where a zipped egg trips it up, and a place where setuptools can't recognize there's a problem (since there's no reference to __file__ in the relevant code). I'm not sure that (on the one hand) special-casing zip files and using the zipfile module to introspect them or (on the other) trying to use the associated APIs coming from the setuptools frameworks would be the right solution there.

And as I've said, I much prefer pip's requirements files to setuptools' attempt to graft dependencies into setup.py; pointing people at a file which lists everything they need, and which a tool can use as the basis for a repeatable install, seems to me a much more sane way to do things.
In an attempt to rid myself of easy_install, I'm trying to understand some things. First, pip isn't intended as a Setuptools replacement, right? I mean, it actually uses Setuptools components, if I'm not mistaken. The way I understand it is that it's meant to be a replacement for the easy_install functionality of Setuptools. So things like install_requires in the setup command are still ok to use? If not, what is the preferred alternative? I just want to understand what I should be staying away from.
Hey Ian,
This is an excellent and unbiased summary. More or less the missing manual of the current state of packaging under Python.
I need to add that I really like pip. It doesn't change much compared to easy_install, but it is day and night to me. Thanks for making it available.
Jeremy: correct, pip uses Setuptools (quite a bit). I also recommend using install_requires, as mentioned in the article: use it to indicate the absolute minimum of other things you need for the library to work. This will get a new developer on a new project up and running with something workable, avoiding unnecessary installation confusion or long recipes. As you share or deploy your project, building up a requirements file will let other people reproduce your work.
You probably meant "The specific problem James sites is what appears to be a bug in Django" to be "The specific problem James cites is what appears to be a bug in Django".
Your friendly neighbourhood grammar nazi. [Fixed!]
For those wondering, the Debian tools do allow you to install packages into numerous different places: that’s how tools like pbuilder manage to function. And with fakeroot and fakechroot you can do so without actually being root.
As for apt, yum and so on being the answer, the issue isn’t whether those tools provide a solution themselves, but whether they provide guidance on how solutions may be reached. It doesn’t surprise me that the Ruby community is experiencing the same kind of friction that the Python packaging community experienced with the Debian maintainers. After all, “opinionated” software development (ignoring the lessons of the past) was made popular by certain parts of the Ruby scene.
can we make PYTHONPATH work the way it's documented to work in Python? This would make me very happy. I.e., any paths on PYTHONPATH are used first for locating packages, regardless of the difference between paths/eggs on the systemwide path. It means it would work the same as CLASSPATH in Java, PERL5LIB in Perl, and PYTHONPATH in… non-setuptools Python. I find setuptools' decision in this regard pretty arbitrary (PJE of course disagrees).
The only generic solution I can think of for installing multiple libraries is vesta (http://www.vestasys.org/).
As a Plone and Zope packager for RHEL myself, one that has worked hard to understand your distribution practices and built packages that work just fine despite Zopists and Plonists actually working to make that job hard for us, I am saddened to see that your ignorance of packaging systems for distributions makes you say that.
Sentences like "Linux system package managers are absolutely not that tool; frankly it is absurd to even consider them as an alternative." not only reveal the depth of your ignorance regarding (or your unwillingness to learn about) the capabilities of deployed package managers, they are actually an insult and a spitball in the face of the efforts of many people out there, efforts that have yielded systems like dpkg and RPM that are consistently superior and offer more functionality than these half-assed "solutions" in the vein of setuptools.
In the meantime, instead of being polemic and offering your uninformed opinion as “fact”, would you please help us help you in packaging your latest releases for the operating systems out there?
There’s just no reason to package libraries for an OS. Applications should be packaged, that’s what people actually want to install. Why would any user want to install Paste, or Pylons, or WebOb, etc? I would entirely support the packaging of applications in sensible ways. That shouldn’t include packaging the libraries as separate entities. I know the reasons that are stated for packaging libraries — mostly security updates — but I find that to be a poor excuse for a lack of proper tools on the part of the packagers.
For actual development, package managers just don’t make any sense at all. I just can’t fathom how the workflow would go. For a small number of libraries that are very stable and somewhat hard to install (e.g., PIL or MySQLdb) it can be useful.
But OK, if you really think package managers make sense for developers, please show how you would reproduce this workflow:
etc/requirements.txt specifies a set of libraries to be installed, including libraries that will be checked out from source control and should be editable in-place. This does not use root. This does not affect anything else on the system; everything is encapsulated in mysite/. This installs libraries that are available on PyPI, source code directly available in repositories, or private repositories. It uses conventions already in place for Python code; I don't have to convince anyone to change their packages to make this work (that wasn't always the case, but it is now). I can easily repeat this process in another directory to do work on a branch. Library versions do not have to match up across directories.

I'll give you a pass on one feature that my suggested workflow has: it works on rpm, deb, Gentoo, and Mac systems consistently, and it's quite typical to have coworkers who use a diversity of those systems. Though if your workflow actually works like mine, you should be able to install rpm and repeat the workflow on any of those platforms (rpm after all installs on any of those, it just isn't the primary packaging system).
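For the curious, a sketch of the kind of commands the workflow above implies; the project name, paths, and repository URL are hypothetical, but pip's -r and -e options are the real mechanisms:

```
virtualenv mysite
cd mysite
source bin/activate
pip install -r etc/requirements.txt
```

where etc/requirements.txt mixes pinned releases with editable source checkouts, for example:

```
WebOb==0.9.6
-e svn+http://svn.example.com/myapp/trunk#egg=myapp
```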
Your argument that packaging libraries should not be done is absurd — tell that to the GTK+ or Qt guys then try to avoid the shitstorm that would ensue. There are many reasons to package libraries for an OS. One of them is that, if your users want to install an application that depends on your library (or even a particular version of it), then the users can use their package managers to install them in one shell line or a couple of clicks. Packaging libraries is extremely easy — it’s basically python setup.py bdist_rpm and it’s there, so there is no excuse besides ignorance and laziness.
Developers are a minority. You can develop all you want using your “package managers”, but development is just one step, and distribution is another. No one is arguing that packaging here is good for the developers — it is good for the users who then can install your libraries and applications with zero effort. The users in this context are simply those who want to use the applications that require the libraries. And the workflow is, again, stupidly simple: python setup.py bdist_rpm, problem solved. Debian is kind of out of luck there, but that does not mean that you should also exclude RPM distros from the loop.
If you had actually poured effort into specifying the dependencies in the setup.cfg file, instead of building this gigantic infrastructure that basically duplicates a Yum repository, your workflow would be braindead easy to install.
Now, your workflow seems to have confused application code and data with an instance of your application. Shared code and shared data — to be run in a production setting — should be installed in the appropriate directories. Your workflow just mishmashes shared code, shared data, and instance data in a single directory. This is a crass violation of elementary system administration principles. I’m fine with you wanting to do that, but I (as an end user and a system administrator) retch when I see that, because it makes my job so much harder.
Yes, it does not use root — but that is not a concern if you are the system administrator. Yes, it does not affect anything else in the system, but then well-written libraries should not affect anything else in the system either. Yes, it encapsulates in mysite/, and this is precisely what is wrong — mishmashing shared code/data with instance data. Yes, it downloads from PyPI — wasted effort because no one has bothered to make a distribution repository for PyPI, and people just don’t want to easy_install random shit on their OS-managed /usr directories. Yes, you do not have to convince anyone to change their packages to make this work, but as you yourself recognized, people did change them in the past, and I see no problem with people filling out their setup.cfg’s now. Yes, you can easily repeat this process, but sadly this process just won’t integrate into the Install/Remove Software that every modern distribution has. Yes, library versions do not have to match up across directories, but this is not a concern for people who just want to run the latest stable versions of compatible software.
Yes, it works on all distributions consistently, but tell me: how do I audit the files using the package manager? how do I cleanly uninstall a particular package? how do I upgrade them to their latest stable versions using yum? how do I tell which files have been modified and when?
TLDR:
In short: get with the program already, what I’m asking is not a huge effort nor does it require you to abandon your preferred workflow either.
Just as an addendum to show you the error of your philosophy.
Your workflow requires me to type five different commands after creating some files. To you, of course, this is probably something you know by rote now, but to me, understanding it is likely to take me several hours of reading and research — that I don’t even have to do in principle, because I just want to use the software, not develop it further.
This is why, for users, your workflow simply cannot (and will never) compete with yum install plone, or with clicking Add/Remove Software on the menu, checking the Plone checkmark, then Apply.
Do you get it now? SAVE ME, your user, TIME. THAT is what packaging is for.
I am admittedly biased to the workflow of web developers. Though I also explicitly said that real applications work just fine as packages, and that’s cool. I actually have lots of ideas that I think would be useful about assisting in the packaging of applications, though they don’t necessarily jive with Linux package management standards.
For web developers the workflow I presented is a reasonable workflow, both for development and production, and the one you suggest is not reasonable. If you don’t understand this, I’m afraid you don’t understand web development. I’m not really sure how to bridge this divide.
There’s no “user” for web applications. There are “sysadmins”, “developers” and somewhere in between “integrators”. There is also no “application” like you suggest. There are “sites” and “development environments” and “production environments”. None of these map to what Linux packages think of as an application. They also don’t map one-to-one to machines.
Also, quoting your earlier post:
Here you imply that I just don’t know the tools. So I presented a real set of constraints in web development that apply to both development and production deployment, giving you an opportunity to educate me in how to use the tools; I figured there was a real possibility that the tools had features I wasn’t aware of. But instead of educating me on the tools, you told me my constraints were wrong. I’m afraid that was the answer I was expecting, because that’s the answer all the Linux package people seem to give. This is a big part of why there’s no progress: you won’t listen to us, instead you just reject our use cases.
Unless someone wants to take a serious shot at educating me (and other readers) about how to use packaging tools like this, I’ll have to read this as a concession that in fact they aren’t capable of handling my (very typical!) use case.
Shooting with a handicap.
I think the biggest issue at hand here is that you both fail to see each others’ argument. Both of you, I think, are exactly correct while managing to be more or less wrong. Or at least, not willing to accept the constraints offered.
The weird thing is that both the developer and packager have exactly the same set of requirements: “Reproduce this environment for me so that applications Just Plain Work (TM)”.
To address the packager, I’d think something like pip+virtualenv would be a godsend. Just make things as easy as “Correct version of Python? Check. Download app.” Brain dead simple. What’s the trade off? You waste a bit of disk space and modules using C code aren’t handled yet. Solution? Help with the making compiling stuff easier. That’d make everyone happy.
To address the dev people, we need to handle both the active development of a project along with things like deploying to elastic compute clouds. Requirements change during a project and can't be predicted. Allowing for scripted installation of entire environments is necessary in big production systems. Perhaps they appear to be goals at odds with each other, but I'd argue that putting a package into a package manager and into cloud production have the same requirements. Basically the ability to checkpoint the app and say "I knight thee, Version Pink". After dealing with the ensuing argument about why that version has to be Pink and that it wants to trade with Brown, the same rules apply. Make It Fucking Work. (TM)
That’s pretty much my argument. So this is my conclusion.
I'll throw another example out there, and a very big one. Say my application is built on TurboGears 1.0, which uses CherryPy 2.x, and is totally broken with CherryPy 3.0, because CP3 totally broke itself (for a good reason). So how will the package manager solve that case? As Ian pointed out, the "latest" version isn't the correct solution; that is why we as developers had to build workarounds against it, the best example being virtualenv. Yes, it's a nightmare for a sysadmin, but it will ensure that MY application will work, instead of getting broken because someone thought that upgrading to the latest version was a good idea. Following that line, if upgrading is such a good idea, why aren't all the package managers at Python 3.0 right now? After all, it is "library code", and if your application is coded right it should work. Also, I really dislike the argument that developers are a minority; of course we are, but why should we, the producers of the applications, suffer?
As for GNOME/KDE/any big system: I have to say that I have not contributed to several applications hosted in their environments, especially because I can't run a development version and the stable version at the same time. The closest example I have is Hamster, which is a very nice time tracker for GTK. But since I'm a freelancer, my income depends on running the stable version and not killing my database; and since it works with the "Linux package manager" there is simply no way for me to run both, and I can't afford the risk of running from svn.
I could go on, but the problem is evident: one version is only good for end users; this doesn't apply to development and is horrible for web development. Just take a look at how most PHP libraries are installed on real production servers.
Rudd-O, yum install plone is easy, but usually completely useless. I haven't seen anybody who actually runs a packaged Zope and Plone on a production system. One deal breaker is that it gets installed only once and only in one specific way. If you want to install it differently, for example with several clients in a ZEO setup, you can't use it. If you want more than one server, you can't use it. (Or well, you can, but you then have two different setups that are set up and configured in different ways, which is double the work.)
Not to mention the many questions on Zope and Plone mailing lists we get when people have troubles with these setups. It would probably be a good idea not to package Zope or Plone in this way at all.
These should serve as some examples of why Linux packaging isn't always perfect, even for applications. It works for some cases, and there it works great, but it does not work for everything. The other main example is developers, which you just brush off by claiming they are irrelevant. Maybe they are, to Linux packagers. But in the Python community, they aren't.
The last three posts (#20, #21, #22) nail it for me. I use Ubuntu, and “apt-get install plone” does not solve my problem. It seems that the people that do the packaging do a fine job at putting together a consistent ball of stuff that can be installed and works – in the sense that it shows up in your web browser – but it’s mostly unusable. In the end I prefer to install from source, that’s what I did recently for a lot of stuff (Twiki, Plone, etc.).
But it's too easy to put the blame on the packagers. I think that the problem is much deeper and lies in the way the developers work (and yes, I am a part-time developer myself, so I take some of the blame).
Most web applications (or web frameworks, as you wish) around are very far from being a "packaged product". They tend to miss basic administrative functions that would be necessary to make them easily customizable. A programmer, or web developer, can work around it by using the few available command line tools (if any), moving files by hand, changing permissions, and editing configuration files. BTW, having an incomplete command line interface (one that does not expose all functionality and requires more direct fiddling and fine tuning) is sometimes worse than having none. But I digress.
That’s fine if you are a developer. But most users are not developers. That’s something that the packagers get right.
Now there's a question: does it make sense for the developer to think about end users? First of all, decide who your real "users" are. If you write code to be used by developers, then packaging may not make much sense. Just tell people how to grab your source tree, via tgz or Subversion. But notice that the barrier to entry is very high. And there's a lot of manual work to be done every time someone starts a new project. Developers tend to brush it off, especially for their own frameworks, because they "just know" what they need to do, and assume that it's too easy for everyone else. But again, I digress.
At this point, when a developer takes one such application and "packages" it, he's doing a disservice, because the result is not usable by end users and also not usable by developers. Implicit in the packaging process there are dozens of decisions (where to put stuff, defaults, etc.). The decisions made by the packager get kind of "hard coded" in the package (how much depends on the way the packaging is done) and make the framework too rigid and difficult to customize for both groups.
Now, if the developer wants his application to be usable by end users, then thinking about the entire installation process makes sense. And also, it makes sense to provide a complete set of administration tools. It can be command line, but it would be better if it were web based. Think of it as a "bootstrapper" for the entire installation. The developer must write it in such a way that he can easily start a new project, just by installing a new package. He has to think as a packager, and also as an end user. If done properly, the developer can make his own life much easier (by automating some of his own tasks). I believe this idea has very long legs and could lead to a very long and productive discussion.
I think your separation between packagers and developers is incorrect. I am a packager. I make loads of packages. Currently I am a co-manager of 19 different packages on PyPI. And most of these are dead easy to use in Plone; you just add a line or two to your buildout and rerun it. This is of course how packaging should be done. The release manager makes the release by running the script that builds the packages.
The thing is that the Linux packaging tools don't provide many of the functionalities that web frameworks and developers need. Ian has listed them above. That, together with the fact that the Linux packaging tools aren't cross-platform, is why Python has developed its own system. Or well, a bunch of them in fact. :)
Ian: your example in comment #16 (or the equivalent with zc.buildout) doesn’t “just work” when one (or more) of the libraries in your environment has C extension modules. Then you need to go back to your OS packaging tool and install the C compiler, and the header files for all the needed C libraries, manually.
Other than that, yes, it is very convenient for the developer to install an isolated environment with all the needed libraries, and being a developer myself I appreciate it. It's also seductively convenient for deployment, although it places a rather large burden on the sysadmin to manually keep watch for security issues in hundreds of little libraries scattered in several isolated environments, and to backport the fix to all the different versions that are being used.
Jorge: the solution for the distribution packager is to correctly specify the dependencies so that incompatible packages are not installed. Of course, PyPI already provides for that, so there is no reason why those dependencies can’t map 1:1 to distribution dependencies, so I don’t know what your argument is.
Lennart: I use Plone packaged that way. And the argument that “no one does that, so I couldn’t possibly imagine why people would” is not an argument at all. Just because people don’t do it, does not mean it is a bad idea.
Carlos: just because apt-get install plone does not cover your scenario, does not follow that it is not a desirable one. Plone is a set of libraries and a series of management commands. There is no reason why packagers can’t package them in a reusable, effective manner, just as I have done for RHEL. Keep in mind that my Plone packages have full support for eggs, so if you want to use them in your instance, my packages certainly do not impede you.
Lennart: the Python packaging system is very convenient, but it creates system management nightmares.
Marius: CORRECT!
Your repetitions of "It works for me" and "I cannot understand why someone would want an RPM package" merely go to show that almost all of you are wearing the hat of a bleeding-edge software developer (for whom easy_install and equivalents are great, ftw!), and you have not even made a feeble attempt at actually wearing the hat of a system administrator or an end user using your applications. There would be absolutely nothing cooler than you doing that. But I'm not holding my hopes high, since your actions and comments here really have demonstrated your one-track mind regarding the packaging issue, and you simply don't want to think about anybody else who might actually have valid concerns, despite the fact that you all could have solved these concerns for distributors merely by providing a setup.cfg manifest with the correct dependencies for your eggs.
For those of you who would rather collaborate in making your packages widely available in distributions, contact me at rudd-o dot com and I will personally help you get your packages prepared for RPM distribution and even collect them on my own RPM repository.
That is all. Please remove me from the comment broadcast list, since I have exhausted my interest in the subject, and I don't really care about comments from people who don't actually have an interest in the arguments I am presenting, or empathy for users and system administrators by profession; I'd rather stab myself in the eye.
Well, I guess I get the last word since Rudd-O is giving up on the argument.
For the record, I and many of the people who have responded to Rudd-O have regularly done system administration. I happily use package managers for many tasks. But I don’t use package managers for web development, and I wouldn’t advocate anyone else use them either.
Web development, or integration of web tools, should be done in an environment where there is no barrier to the deployment of multiple versions of software. When you do your first deployment a single version will work fine, but this falls apart later when you want to support working installations that don’t need to be updated, alongside new software that is built with different library versions. Because the problems with using a Linux package don’t appear immediately, I think it is good to actually discourage their use so as to avoid future frustration for users. I’m not saying that current (language-specific) tools are great, but they have a reasonable workflow for deploying web applications, while a packaging system that requires system-wide consistency is not reasonable. All this is motivated by sysadmin concerns, not even developer concerns.
Though I will also add: if you are a professional, and unless you work with absolutely zero programmers (even anyone who might occasionally touch some code), you must also have a development process that matches up with your deployment process. Developers need to be able to create local working installations, and they must be able to edit those installations and share that work with others. Installed Linux packages don’t work that way. This isn’t just a problem for web development, [GTK has an alternate build system](http://www.gnome.org/~jamesh/jhbuild.html) for similar reasons.
Rudd-O: No, but for most usage of Plone it is a bad idea. If it works for you, great. For most Plone users, it doesn’t, for reasons mentioned earlier.
I would like to know what the nightmares in question are. I haven’t noticed them.
“and you have not even made a feeble attempt at actually wearing the hat of a system administrator or an end-user using your applications”
You failed.
ian: I’m very happy to see you’ve hit the nail on the head with this:
If everyone would keep in mind that there’s more than one audience for different stages in an app’s lifetime, then I think even Rudd-O and you could at least see how your views interlock.
[1] Note that instead of seeing this as a war by proxy, I look at it as a contract negotiation with a mediator. The two parties involved are system administrators and developers. The poor packagers are the mediator that has to carry requirements and messages from one side to the other.
The Developer
There are many, many ways to make this work using packaging tools. However, the appropriate methods for the different audiences are not the same. If the audience is the developer who has just installed a new workstation and needs to start working on the next generation of the website, using virtualenv is the correct thing to do. However, what it's doing is creating a customized environment for running a particular application. This is anathema to making a system administrator's job easy.
The Deployment Engineer
After the developer has gotten the application to a certain level of functionality, some engineer has to start making sure it runs on the platforms the company cares to support (either internally or externally). They can create a local install in a directory using the platform's package repositories and check in changes to the application code to support that platform. In open source, the Developer and Packager often share the Deployment Engineer role.
The Packager
The packager may be part of mycompany or a Linux distribution but they are building the software specifically for a distribution and therefore care about the conventions that a distribution uses to make their users’ (system administrators) lives easier.
The System Administrator
The ideal for the system administrator is to run a single command that downloads and installs the application and all its dependencies. By having the application inside a distro package they also have the ability to do these things:
Notes
Not all applications have all audiences explicitly. There are many web applications that are not meant to be distributed, for instance. However, even in a one-company-only application, it's important to remember that the system administrator audience exists even if it is the same person as the developer. If your web site becomes popular, you may need to start load balancing it. Your current server may go belly up. The operating system you're using may reach its end of life and you have to port to a new version. You may leave the company, or your company may grow and hire a system admin.
[ed note: URLs mangled to keep WordPress from mangling them more]
sorry for the long overdue response.
jorge: Distributions handle the CherryPy2/3 issue with various means. Last I checked, in Debian they have an explicit Conflict so that CherryPy3 and TurboGears-1 apps cannot be installed together on the same box.

In Fedora we've made use of the multiple-version functionality of setuptools (and run up against limitations in its design) so that you can have both versions of CherryPy installed at the same time.

Note that from a distribution POV, we'd love to have a versioning scheme that works flawlessly. Debian, where the idea of doing compat packages for Python modules was rejected, versions each of their C library packages so you can have separate versions. Fedora, which doesn't like to do them for maintainability reasons (what do we do when a security issue arises in CherryPy2 and the devs say, "CP2 is no longer supported, we're only releasing a security fix for CherryPy3"?), will do compat packages like the CherryPy2 package when the value of keeping the apps outweighs the maintenance costs. ATM, only strategies like the one psycopg uses (changing the module name from psycopg to psycopg2 on a major, API-changing update) work 100%. The setuptools method has promise, though… if only it would mature a bit more.
I should say that I think packaging Trac (ideally with a package that includes nearly all its dependencies) seems quite reasonable. That is an "application" in a way that makes sense. Though even then it is difficult, as shown by Plone: because Plone involves lots of plugins/products (and while Plone moves along at a reasonably slow pace, the plugins do not), packaging just doesn't keep up, and people end up reverting to methods better supported by upstream developers. It could work, but I think it doesn't work because there are intermediaries. Only if Plone itself had the infrastructure set up to create and distribute its own RPM or deb packages (and packages for any plugin anyone wants to make available) would it be a working system. And though Trac is more interesting in its base state than Plone, it will suffer the same problems.
There's no reason Trac or any other application needs to depend on separately packaged Python libraries, and I do think that's specifically problematic. But if yum install mysite doesn't involve any dependencies (except for very conservative packages) then that could work pretty well. I'd still be pretty worried about migrations, though… but arguably developers should work harder at making data migration work with less interaction. Having specific support for data migration (mostly meaning clear policies for how backup and migration and reverts should work) would also be an improvement for people creating packages.

Of course it's also common for web developers to have two versions of the exact same application simultaneously deployed on a single box. I find the way Linux package managers deal with versions to simply be… dumb. As in not-technically-advanced-in-a-way-that-actively-hurts. There's also problems about the development build process being very different from the deployment process, which leads to bugs in one or the other. These bugs cause a lot of social problems, as they aggravate the already difficult relationship between sysadmin and developer (a relationship that can be difficult even when they are the same person ;).
What we need is a great language and platform independent package manager that can work throughout the development and deployment lifecycle. I know of no such thing at the moment. (I thought [rpath](http://www.rpath.com/corp/) might be like that, but from what I can tell it is not.)
Marius said: “It’s also seductively convenient for deployment, although it places a rather large burden on the sysadmin to manually keep watch for security issues in hundreds of little libraries scattered in several isolated environments, and to backport the fix to all the different versions that are being used.”
Rudd-O said: “Marius: CORRECT!”
This is an admission that the packaging tools are NOT adequate for developer needs. Why does having isolated environments preclude simple administration? If the package management tools were in charge of managing environments, then they would know where to check for installed package versions, even if I had hundreds scattered about my home directory.
The worldviews are very different indeed. It’s a complex problem that has many roots.
1) Distributions are an attempt to freeze the world in one entirely self-consistent image. Packagers' priorities are completeness (no broken dependencies), self-consistency, and stability. That's why they prefer to package the latest stable version. But many times, the stable version of one particular program depends on the unstable version of a library (that's only the first problem).
2) Packagers are also concerned about space, due to disk and bandwidth concerns. That means eliminating redundancy. Most packagers hate the idea of having multiple versions of a single library “just because” one app needs it.
3) Last but not least, packagers are concerned about security. Again, having multiple versions of a library tends to make their work harder due to the potential combinations and security threats.
That’s not a problem for conventional applications, because most applications evolve slowly, and even when that’s not the case, most users do not need the latest upgrades – unless there’s a security issue, which are addressed by minimal patches that only fix the problem and do not add any functionality (so dependencies keep relatively unchanged).
Development is about bringing new functionality at the fastest pace possible. It means that:
Developers are comfortable about using the latest libraries, even if they are not considered stable (and that leads to problem #1 mentioned above).
Developers need a completely different environment to work than the one recommended to run the final application.
To make things more confusing, the world is not black and white. There's a middle ground between the two. Trac, for instance: most of the time I can work with an older version, so I don't mind installing the packaged one. And it will work out of the box for most users. But for other packages, it's much harder to come up with a sensible installation out of a generic package.
One of my particular gripes is with documentation. Take some package (for example, Twiki or Plone). Install it and try to customize it, following the instructions on the original website. Most of the time*, there are no instructions on how to install the "packaged version", and some of the instructions that apply to the "tgz version" do not make sense or are not applicable to the packaged one. Locations differ, some of the parameters are already customized, and that just makes life more difficult for the users.
(*) In some cases the instructions are there but are incomplete or out of date.
This is why Gentoo portage comes with a deployment tool called webapp-config.
Let's say you install Trac using Portage with emerge =trac-0.11.2; the files will just be placed into /usr/share/webapps/trac/0.11.2/. You will then use webapp-config to deploy it to the desired web server(s) or vhost(s).

It is far from perfect (e.g. it doesn't support virtualenv and doesn't support nginx), but at least it partially solves the problem where more than one copy and/or more than one version of a webapp are required to be running on the same machine.
Btw, it is also “backward-compatible” with other less capable packaging tools, such that you can configure it to auto-deploy to your localhost whenever you emerge a webapp package, if you only need just one copy and one version of that webapp.
Anyway, to get anything done at work, we just use a combination of virtualenv, custom distcmds, and a local eggbasket. A great language and platform independent package manager that can work throughout the development and deployment lifecycle would certainly be very welcome :)
I think there is a problem with coding in your blog. I’m not able to see a part of the text from its left side.
I’ve got the feeling that one thing has totally been neglected. One of the origins that makes the clash between the two types of administrators (and their needs) so severe:
Python is a general purpose language. But here everybody just seems to be talking about web development!
In web development you’ve got to distinguish the users from the consumers. The consumers do not do any administration, that’s only the users who are often/mostly also the administrators of that server/host/machine. There one can much more easily get away with pulling up an alternative management system in parallel to the distribution’s own system without doing any major harm.
But with most other applications of Python it’s a bit different. Whenever you use Python for system administrative tools (Ubuntu has got quite a few already), for libraries/bindings to libraries, or for application level (excluding web applications) it’s all a different ball game. In all these cases the system’s package management becomes much more important than Python’s own attempts to cater for the developer and/or web admin needs.
Boiling down to this: I can fully understand both sides. Unfortunately there’s quite some clash between them as described. The challenges ahead would be now to find a Good Solution (TM) that is well suitable to bridge the gap and cater for the non-web needs. Maybe even some sort of a PyPI mirror that offers eggs as pre-rolled .deb/.rpm packages for different distributions.
Quick thoughts:
Debian Python packaging toolchains reuse Distutils. As a Python developer, I want to write a setup.py; Debian wants a makefile. Well, the debian/rules makefile uses the setup.py script with appropriate command-line options. (Dependencies in the Python system would be a must, though; for now they are in a Debian-specific file.) See, no conflict between the two systems.
Integrating PyPI and APT isn’t just a technical issue: Debian packages are DFSG-free. (Of course, Debian means “main”-only.) I am totally in favor of decentralisation for VCS for instance, but I like the centralized control of the freeness of software allowed by using Debian archives.
APT can work with local repositories (i.e. lines starting with “deb file:///” in the sources.list file). A local repository (or an HTTP one, for that matter) is really a bunch of deb packages and a few meta-files. Automation can be done with the help of mini-dinstall, pbuilder, dnotify, *-buildpackage. (After all, pip is “just” automation of such things ;)
Well, I know my points only work if you're already happily using a deb-based system. I don't know whether and how Python developers should compensate for Windows' flaws. And if you like setup.py develop, I don't know how much coding would be required to automate these steps using a local APT repository, but that would be worth trying.
Cheers, Merwok
This is all very interesting. The last post was from February 2009. I would like to know if any progress was made in light of Python 3. If I'm not mistaken, still only distutils is in the standard library. Another interesting idea would be to look at this in the era of virtualization. Web applications running in different versions on the same machine: is that still viable? Shouldn't they at least be running on a different virtual host?
Standard library, yes. But many of the modules above are available for Python 3 as well.
It’s perfectly viable to run different versions of both Python and web applications on one machine. No problems at all, in fact, if you know what you are doing. I like virtual machines, but use them more to separate customers than web servers. :-)