Since I’ve been thinking about deployment I’ve been thinking a lot more about what "configuration management" means, how it should work, what it should do.
I guess my quick summary of configuration management is that it is setting up a server correctly. "Correct" is an ambiguous term, but given that there are so many approaches to configuration management, the solutions are also ambiguous.
Silver Lining includes configuration management of a sort. It is very simple. Right now it is simply a bunch of files to rsync over, and one shell script (you can see the files here and the script here — at least until I move them and those links start 404ing). Also each "service" (e.g., a database) has a simple setup script. I’m sure this system will become more complicated over time, but it’s really simple right now, and I like that.
The other system I’ve been asked about the most lately is Puppet. Puppet is a real configuration management system. The driving forces are very different: I’m just trying to get a system set up that is in all ways acceptable for web application deployment. I want one system set up for one kind of task; I am completely focused on that end, and I care about means only insofar as I don’t want to get distracted by those means. Puppet is for people who care about the means, not just the ends. People who want things to work in a particular way; I only care that they work.
That’s the big difference between Puppet and Silver Lining. The smaller difference (that I want to talk about) is "push" vs. "pull". Grig wrote up some notes on two approaches. Silver Lining uses a "push" system (though calling it a "system" is kind of overselling what it does) while Puppet is "pull". Basically Silver Lining runs these commands (from your personal computer):
$ rsync -r <silverlining>/server-files/serverroot/ root@server:/
$ ssh root@server "$(cat <silverlining>/server-files/update-server-script.sh)"
This is what happens when you run silver setup-node server: it pushes a bunch of files over to the server, and runs a shell script. If you update either of the files or the shell script you run silver setup-node again to update the server. This is "push" because everything is initiated by the "master" (in this case, the developer’s personal computer).
Puppet uses a pull model. In this model there is a daemon running on every machine, and these machines call in to the master to see if there are any new instructions for them. If there are, the daemon applies those instructions to the machine it is running on.
Grig identifies two big advantages to this pull model:
- When a new server comes up it can get instructions from the master and start doing things. You can’t push instructions to a server that isn’t there, and the server itself is most aware of when it is ready to do stuff.
- If a lot of servers come up, they can all do the setup work on their own; they only have to ask the master what to do.
But… I don’t buy either justification.
First: servers don’t just do things when they start up. To get this to work you have to create custom images with Puppet installed, and configured to know where the master is, and either the image or the master needs some indication of what kind of server you intended to create. All this is to avoid polling a server to see when it comes online. Polling a server is lame (and is the best Silver Lining can do right now), but avoiding polling can be done with something a lot simpler than a complete change from push to pull.
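Polling doesn’t have to be elaborate, either. A minimal sketch of the "lame but workable" polling approach (the retry counts, the delay, and the placeholder hostname `server` are all illustrative):

```shell
# wait_for: retry a command until it succeeds or we run out of tries.
# Usage: wait_for <tries> <delay-seconds> <command...>
wait_for() {
    local tries=$1 delay=$2 i
    shift 2
    for i in $(seq 1 "$tries"); do
        "$@" && return 0
        sleep "$delay"
    done
    return 1
}

# Poll until the new server answers SSH, then push the setup to it:
# wait_for 30 5 ssh -o ConnectTimeout=5 -o BatchMode=yes root@server true \
#     && silver setup-node server
```

The point being: a dozen lines of shell on the pushing side gets you the "act when the server is ready" behavior, without a daemon on every machine.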
Second: there’s nothing unscalable about push. Look at those commands: one rsync and one ssh. The first is pretty darn cheap, and the second is cheap on the master and expensive on the remote machine (since it is doing things like installing stuff). You need to do it on lots of machines? Then fork a bunch of processes to run those two commands. This is not complicated stuff.
It is possible to write a push system that is hard to scale, if the master is doing lots of work. But just don’t do that. Upload your setup code to the remote server/slave and run it there. Problem fixed!
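To make that concrete, here’s a hedged sketch of the fan-out; the two commands per host are the same rsync and ssh as above, while the host names, the file paths, and the helper names are made up for illustration:

```shell
# push_one: the same two commands as above, for a single host.
# The paths stand in for the real <silverlining>/server-files locations.
push_one() {
    rsync -r server-files/serverroot/ "root@$1:/" &&
    ssh "root@$1" 'bash -s' < server-files/update-server-script.sh
}

# push_all: run a per-host command against every host in parallel,
# returning the number of hosts that failed.
push_all() {
    local cmd=$1 pids="" pid failed=0
    shift
    for host in "$@"; do
        "$cmd" "$host" &
        pids="$pids $!"
    done
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# push_all push_one web1 web2 web3   # three pushes, all at once
```

The master’s work per host is one fork and one cheap rsync; the expensive part (installing packages) happens on each slave, in parallel.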
What are the advantages of push?
- Easy to bootstrap. A bare server can be set up with push, no customization needed. Any customization is another kind of configuration, and configuration should be automated, and… well, this is why it’s a bootstrap problem.
- Errors are synchronous: if your setup code doesn’t work, your push system gets the error back; you don’t need some fancy monitor and you don’t need to check any logs. Weird behavior is also synchronous: can’t tell why servers are doing something? Run the commands and watch the output.
- Development is sensible: if you have a change to your setup scripts, you can try it out from your machine. You don’t need to do anything exceptional, your machine doesn’t have to accept connections from the slave, you don’t need special instructions to keep the slave from setting itself up as a production machine, there’s no daemon that might need modifications… none of that. You change the code, you run it, it works.
- It’s just so damn simple. If you don’t start thinking about push and pull and other design choices, it simply becomes: do the obvious and easy thing.
In conclusion: push is sensible, pull is needless complexity.
I used an rsync-based automated deployment tool before (slack, created at google), but it was a ‘pull’ system, where the client machines were rsync-ing from a central master. We ran into issues when we had to rsync thousands of files and there were hundreds of clients involved.
In your case, you’re pushing the rsync from the master to all your other nodes, and the push is under your control, so you won’t run into this issue necessarily, but it’s something to keep in mind.
Grig
I am, admittedly, still working on attaining just the first order of magnitude (1 server to 10), and have not even gotten close to the next order of magnitude. I have no idea how far this can go… I guess I could spin up a hundred servers and try, but… eh.
Naah, wait for your user reports to come in ;-) They’ll run into scenarios you can never even start to imagine.
Also, speaking of large scale file synchronization problems, here’s how twitter does it: http://github.com/lg/murder + Capistrano. Interesting use of BitTorrent.
This is essentially the “user-data” that you can push in to a Debian or Ubuntu image starting up on EC2 http://alestic.com/2009/06/ec2-user-data-scripts , something it gets right. If the first two characters of the user-data are “#!” it treats it as a script and executes it, and the stdout is directed to /var/log/syslog, which you can tail and make sure everything worked (use “set -e -x”!). This is enormously flexible and powerful, and cheap and easy. I much prefer having a nice, tidy library of useful Bash functions that install packages and sed/awk various config scripts, that I understand exactly what they are going to do, than going, okay, I need a centralized config server, an AMI which has the client — before I can do anything — ugh, my heart sinks …
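Paul’s description can be sketched on the pushing side. The helper name `check_user_data` and the sample payload below are made up for illustration; the one real rule (from the EC2 behavior he describes) is that the payload must start with "#!" to be executed as a script:

```shell
# check_user_data: refuse to launch with a payload EC2 would treat as
# opaque data rather than a script (i.e. one not starting with "#!").
check_user_data() {
    head -c 2 "$1" | grep -q '^#!'
}

# A minimal user-data script along the lines Paul describes:
cat > /tmp/user-data.sh <<'EOF'
#!/bin/bash
set -e -x                      # stop on first error, trace to syslog
apt-get update -qq
apt-get install -y -qq nginx   # hypothetical package to install
EOF

check_user_data /tmp/user-data.sh && echo "user-data looks like a script"
```

On a real instance the traced commands end up in /var/log/syslog, which is what makes the "tail it and make sure everything worked" workflow possible.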
Couldn’t agree more. This is why I used a push model for collective.hostout. I wanted something that made as few assumptions as possible about the server, its firewalls, and what other servers exist (e.g. svn/git/puppet servers). You create a VPS and push the bootstrap and code up.
Push systems are called “agentless” in the world of Enterprise (i.e. expensive, bloated yet underwhelming) network and systems monitoring. Commercial vendors tend to disparage agentless systems, usually because they like the idea of selling lots of licenses for the agents. Granted, in the world of Windows where SSH is not ubiquitous, you need agents to bootstrap the monitoring infrastructure.
Congratulations on the Mozilla job, BTW!
Unfortunately Ian, you’re completely wrong. I don’t think I’ve ever read a bigger misunderstanding of Puppet. It’s for people who want things to work “a certain way”? What the hell does this even mean? I’m sure for setting up one or two servers something like Silver Lining might work, but it won’t scale at all, and it covers only the very, very, very basics of what configuration management is. (Hello, dependencies? What do you do when one service needs another service before it can start?) Calling pull “needless complexity” is such a dumb thing to say that I had to forward this entry around to a few of my guys for a good laugh. Thanks for that at least.
I think I was pretty clear that Silver Lining is a very simple and limited approach to the problem of configuration management. It actually “handles” things like dependencies and starting services in the right order, because Ubuntu and Debian maintainers have been doing that work for ages now, and Silver Lining uses that work. If you are willing to use and trust a quality distribution like Ubuntu, then configuration management becomes a much more approachable project. If you want to be distribution- or OS-agnostic, or you disagree with the way the packagers arranged those packages, then you have to do a lot more work. Puppet does that work. I don’t consider the tradeoff worth it, but I can understand why some people would feel differently. That’s a [different argument](https://ianbicking.org/2010/02/10/why-toppcloud-not-agnostic/) for a different time.
Puppet also uses a pull model. It would be entirely possible to use a push model to the same effect.
There is one additional advantage for the push model, at least for me: security. With pull it’s somehow the production server that must connect to the developers’ citadel. I might be a bit paranoid, but this always means you must leave a way for the production server to connect to the citadel, even if it’s just a little anonymous ftp account or an open port in the firewall. This means that if the production server is compromised it has a channel that could be abused to try and compromise the citadel.
I don’t feel at ease with this scheme. I’d rather have my completely firewalled developers’ citadel that connects to the production server. (BTW this holds for the backup as well. I never have the deployment server connect to the backup server, but the other way round.)
I think flexibility is the key point here.
Especially at the moment of wanting to build a server from scratch. Sometimes you do not need an extremely efficient (yet overcomplicated) piece of software that can handle 1,000 servers at the same time, when all you really want/need is to get one up and running.
I like simplicity.
I like getting things done.
And Puppet seems neither simple nor easy to get started with.
For a few months I have been developing something along the lines of Silver Lining, but more oriented to internal infrastructure System Administration (nothing to do with the Cloud) called Pacha (http://code.google.com/p/pacha)
Wish I could have stayed for the Silver Lining sprint after PyCon…
Sorry about being slightly off topic, but how do you view Silver lining’s overlap with fabric?
It’s in the docs http://cloudsilverlining.org/design.html#fabric
IMO Fabric’s problem is that it expects you to run a lot of stuff on the server, which makes it a messier approach.
Interesting Usenix paper from 1998. They swear by the pull mechanism too:
http://www.infrastructures.org/papers/bootstrap/bootstrap.html
Totally agree. This is the main reason I was first turned on to toppcloud/silverlining: having push combined with “a setup that does one thing and does it well” is an excellent combination. Will it scale? We’ll see when we need it. For now we are more than happy running each app entirely on one box.
A pull-based system is useful for test machines that constantly reboot to ensure a clean test environment. Wouldn’t it be easier to have the machine come up, pull its configuration, and start testing than to have it come up and wait for another server to configure it and start the tests?
Your example makes more sense to me than most use cases I’ve heard. The way Paul Smith mentions that EC2 instances can be started with a script to be run on boot seems kind of like a hybrid. It’s not exactly push, as the commands are executed offline on restart. But it’s not pull, in that everything is initiated when the server is requisitioned, and initiated by the requisitioner. I guess it’s async push…?
There are certainly unscalable aspects of push and pull, but in fact they have different optimizations. Push is easier on the server, since it can more easily manage its resource caching, but it is highly unpredictable as you don’t know whether the clients are available or not. Moreover, a server is usually equipped to deal with the load, whereas a client isn’t. With push you are letting one system drive the availability of many, which makes adaptation and fault tolerance very difficult. Pull can automatically perform distributed load balancing, and each machine can download stuff at its convenience. This especially applies to hosts that are down at the moment of push.
Another problem with push is that the clients have no say in what they get from the “pusher”. This is a potential security disaster waiting to happen. Clients are in a better position to decide when and what to download than a central pusher is to decide when and what they need from its single viewpoint. So adaptive behaviour makes a strong case for pull based (subscription based) services.
As for the rest, Cfengine introduced pull and is of course far superior to Puppet both in capabilities and implementation ;-)