With the recent talk on the blogosphere about deployment (and for Django, and lots of other posts too), people are thinking about PHP a bit more analytically. I think people mostly get it wrong.
There are several different process models for the web:
- CGI, where every request creates a new process, and the process handles only one request. (You could also fork a new process each request with mostly the same result.)
- Worker processes, where a process handles one request at a time, but the process is long-lived and will handle another request after finishing the first.
- Threaded, where a process handles multiple requests, each in its own thread. Threads may or may not be reused (depending on whether you use a thread pool), and that’s kind of like the difference between CGI and worker processes. But it’s not a big difference because either way the process is long lived, including all your objects.
- Asynchronous, where a process handles multiple requests without threads. Typically your code is event driven: a request is an event, but so are things like "database results returned", or "file read from disk". Code takes the form of short snippets between these events.
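To make the difference between the first two models concrete, here is a minimal Python sketch. It is only an illustration, not how Apache or PHP do it: `HANDLER_SOURCE`, `cgi_style`, and `worker_style` are names I made up, and the "requests" are just path strings passed to a child interpreter.

```python
import subprocess
import sys

# Hypothetical request handler used by both models. It counts requests in a
# module-level (global) variable, so we can see how long that state lives.
HANDLER_SOURCE = """
import sys
seen = 0
def handle(path):
    global seen
    seen += 1
    return f"{path}: request #{seen} in this process"
"""


def cgi_style(paths):
    """Model 1: start a fresh interpreter for every single request."""
    responses = []
    for path in paths:
        script = HANDLER_SOURCE + f"\nprint(handle({path!r}))"
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, check=True,
        )
        responses.append(result.stdout.strip())
    return responses


def worker_style(paths):
    """Model 2: one long-lived interpreter handles requests one at a time."""
    script = HANDLER_SOURCE + (
        "\nfor line in sys.stdin:\n"
        "    print(handle(line.strip()))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        input="\n".join(paths) + "\n",
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(cgi_style(["/a", "/b"]))     # every request sees "request #1"
    print(worker_style(["/a", "/b"]))  # the counter survives between requests
```

In the CGI-style run the global counter starts over for every request; in the worker-style run it keeps climbing, which is exactly the state that can leak or get corrupted over a long-lived process.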
What people don’t realize is that PHP is effectively a CGI model of execution. People don’t appreciate this because PHP is implemented with mod_php, an Apache module. There are many other modules like mod_perl (the first of these mod_language modules), mod_python, mod_ruby, etc. None of these other modules are like mod_php, and missing that difference has led many a commentator astray. The difference exists because the PHP language was written for mod_php. Perl, Python, Ruby: none of them were written to be used as an Apache module. You can’t take one of these existing languages and just retrofit it to be like PHP or like mod_php.
Why is it important that PHP has a CGI like model? Mostly because it lets two groups separate their work: system administration types (including hosting companies) set up the environment, and developer types use the environment, and they don’t have to interact much. The developers are empowered, and the administrators are not bothered.
Some details:
- PHP processes can leak memory like crazy. It doesn’t matter because they only leak memory for one request.
- PHP processes can be easily killed if they act badly (sucking up too much memory or if they get caught in an infinite loop).
- PHP code has no global state. It has lots of global variables, but since they only live for one request they are relatively harmless. (Not entirely harmless, but what’s a major sin in another language is only a minor sin in PHP.) The process cannot become corrupted.
- There’s no global set of installed libraries (besides what PHP is compiled with). If you want to include a library you include files, typically relative paths. You won’t include someone else’s library accidentally. (This has changed in PHP, but the search path remains largely unused.)
- Most of the language is implemented in C, in a shared library. In comparison Python has "batteries included", but those batteries are largely written in Python. Python code is not shareable, and can take time to load up. So while a single Python CGI script might be small, it probably imports lots of code which would have to be loaded each request. PHP scripts actually are small. (Stuff like PEAR changes this by adding substantial libraries written in PHP, but also seriously affects PHP performance.)
- What few system-dependent libraries and C extensions that are required are compiled into PHP, and hosting companies have kind of figured out a consistent set of these libraries (stuff like database drivers, an image processing library, and XML parsing). Individual developers write only PHP (and that’s all they are typically allowed to write).
These features are all contrary to the design of these other languages, and even more contrary to the conventions in other language communities. Python programmers could write all their libraries in C and start to make CGI scripts feasible, but they don’t and they won’t.
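The isolation points in the list above (leaks only last one request, and a misbehaving process can simply be killed) can be sketched in Python like this. This is my own illustration under stated assumptions, not what mod_php does internally: `run_isolated` is a hypothetical helper, and it pays the full interpreter startup cost on every call, which is the very cost the post says makes Python CGI unattractive.

```python
import subprocess
import sys


def run_isolated(source, timeout_seconds=5.0):
    """Run one request's code in a throwaway interpreter.

    Returns (ok, output). Whatever the child leaked or corrupted
    disappears when the child exits or is killed.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True, text=True, timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return False, "killed: exceeded time limit"
    if result.returncode != 0:
        return False, "failed with exit code %d" % result.returncode
    return True, result.stdout.strip()


if __name__ == "__main__":
    print(run_isolated("print('hello')"))
    print(run_isolated("while True: pass", timeout_seconds=0.5))  # runaway loop
    print(run_isolated("print('still fine')"))  # later requests are unaffected
```

The runaway request is killed and reported, and the next request runs as if nothing happened, because nothing from the bad request survives in the parent.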
Focusing on mod_php is a goose chase. You have to focus on the features it provides.
Here are the features I think are important:
- The failure cases are isolated. Processes never get wedged. One user doesn’t take out another user’s application. Even one bad page in an application won’t take out the entire application.
- File-based deployment. There isn’t a build process for deploying a PHP application. You put the files in the right place and they are deployed. (Configuration in PHP is fuzzy, but PHP is not perfect.)
- Minimal global dependencies. If you use libraries then you package those libraries with your application. You don’t worry about the administrator upgrading something on the system and breaking your application. (At least not much; there are still things like upgrading mod_php that can break your application, or changing settings in php.ini. PHP is not perfect.)
- It’s pretty easy to do multiple deployments for development. You just drop another copy of the files somewhere else and change your database settings.
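For comparison, the "package your libraries with your application" idea from the list above can be approximated in Python by putting an application-local directory at the front of the import path. This is only a sketch: `use_bundled_libraries` and the `vendor` directory name are assumptions of mine, not a convention the post describes.

```python
import sys
from pathlib import Path


def use_bundled_libraries(app_dir, vendor_name="vendor"):
    """Put the application's own library directory first on sys.path.

    Modules found in <app_dir>/<vendor_name> then win over any copy that an
    administrator installed system-wide.
    """
    vendor_dir = str(Path(app_dir).resolve() / vendor_name)
    if vendor_dir not in sys.path:
        sys.path.insert(0, vendor_dir)
    return vendor_dir


if __name__ == "__main__":
    # Assumes this script sits at the top of the application directory.
    print(use_bundled_libraries(Path(__file__).parent))
```

A second deployment is then just another copy of the directory, with its own bundled libraries, which is close to the "drop another copy of the files somewhere else" workflow.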
And of course the high-level features:
- Working applications stay working. (Even half-working applications at least stay half-working.)
- Administrators aren’t being hassled to make little changes and fixes all the time.
- Developers can do what they need to do without going through the administrators.
People or organizations that are hybrid developer/administrators don’t care about this so much. And that makes sense; this is all about separating those two roles. Though even developer/administrators would benefit from a better story here, because it would let them concentrate more on being developers while the administration part starts taking care of itself.
Solving this is all the harder because of how it interacts with the language itself. But it’s not impossible. Process model 2 (worker processes) is feasible for most languages. You just need a really good process manager (including process setup for isolation, and process monitoring to mitigate problems). Apache alone is not this manager. mod_fastcgi could be this manager, but it’s not. Maybe mod_wsgi will become this manager for Python.
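As a rough idea of one thing such a process manager does, here is a minimal Python sketch that keeps a single worker alive and replaces it after a fixed number of requests, so leaks and corrupted state have a bounded lifetime. It is not how mod_fastcgi or mod_wsgi are implemented: `RecyclingManager`, `WORKER_SOURCE`, and the line-based "protocol" are all hypothetical, and a real manager would also handle timeouts, memory limits, and multiple workers.

```python
import subprocess
import sys

# Hypothetical worker: answers one line per request and reports how many
# requests this particular process has served so far.
WORKER_SOURCE = """
import os, sys
served = 0
for line in iter(sys.stdin.readline, ""):
    served += 1
    print(f"{os.getpid()} {served} {line.strip()}", flush=True)
"""


class RecyclingManager:
    """Keep one worker alive, replacing it after max_requests requests."""

    def __init__(self, max_requests=3):
        self.max_requests = max_requests
        self.worker = None
        self.served = 0

    def _spawn(self):
        self.worker = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )
        self.served = 0

    def _retire(self):
        if self.worker is not None:
            self.worker.stdin.close()  # the worker sees EOF and exits
            self.worker.wait()
            self.worker.stdout.close()
            self.worker = None

    def handle(self, request):
        # Clean up a worker that died on its own, then make sure one exists.
        if self.worker is not None and self.worker.poll() is not None:
            self._retire()
        if self.worker is None:
            self._spawn()
        self.worker.stdin.write(request + "\n")
        self.worker.stdin.flush()
        reply = self.worker.stdout.readline().strip()
        self.served += 1
        if self.served >= self.max_requests:
            self._retire()  # recycle before leaks can pile up
        return reply

    def close(self):
        self._retire()


if __name__ == "__main__":
    manager = RecyclingManager(max_requests=2)
    for name in ["a", "b", "c", "d"]:
        print(manager.handle(name))
    manager.close()
```

Setting `max_requests=1` degenerates into the CGI-like behaviour described earlier, and a large value approaches a plain long-lived worker, so the knob is a trade between isolation and startup cost.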
I think that mod_wsgi’s concept of a “Daemon Mode” will end up being this really good process manager for Python web apps. It seems to have been designed specifically with shared hosting in mind and has some relatively simple configuration directives.
For my own part, I would rather see some sort of setup that makes it easy to install application servers, be they mongrels, pasters, or any other flavor, as well as providing the web server config for apache, lighttpd, nginx, and so on. Seems a bit more extensible than mod_wsgi.
You sort of understand but you don’t understand how difficult it is to deploy mod_perl in a multi-user environment on a shared-hosting situation. It is hard and difficult, very problematic because of the features mod_perl provides which PHP can’t.
Sandboxing and deployment are 2 big reasons mod_php caught on. Not this silly argument that php was written for mod_php.
> Sandboxing and deployment are 2 big reasons mod_php caught on. Not this silly argument that php was written for mod_php.
Abraham: I think you’re also being led astray. First, “good enough” performance (vs. CGI) also played a big part in why PHP caught on ( http://www.trachtenberg.com/PHP.pdf does a fair job of documenting the history, if you can see past the PHP advocacy ), but Ian’s point is spot on: PHP was designed to handle memory allocation and error conditions in a manner which is tied directly to HTTP requests, something you don’t get by just embedding Perl/Ruby/Python straight into Apache. There’s a description of the design here, although it’s relatively low level: http://devzone.zend.com/article/1021-Extension-Writing-Part-I-Introduction-to-PHP-and-Zend#Heading3 (Lifecycles and Memory Allocation). What mod_php does in its “RSHUTDOWN” phase (request shutdown) is what is unique.
Abraham: I’m basically agreeing with you; mod_perl doesn’t work in multi-user environments or on shared hosts, and it has to do with the way the language interpreter is designed. It also has to do with the hooks mod_perl gives you into the Apache process, but even if you decreased the number of hooks that alone wouldn’t be enough. Well, you could make a setup where the Perl process essentially started and stopped for each request, and then you’d have what’s effectively just a Perl CGI script. PHP is a middle ground where they get most of the advantage of CGI and still maintain reasonable performance.
Why build process management into a web server module rather than as part of the runtime? :(
What if frameworks could be error proof “microkernels” of interacting managed/restartable processes ala Erlang “instance as a process” ?
http://bayfp.org/talks/slides/yarivsadan6dec07_bayfp.pdf
Even has hot code swap!
> system administration types (including hosting companies) set up the environment, and developer types use the environment
I wish this were true… What actually happens is that sysadmin types set up the environment, and developers yell that they need version N+1, and sysadmin types rebuild it and developer types then decide that they also need it built with modules X, Y, and Z too. And now sysadmin types have to manually track upstream updates for PHP and a build and testing environment for it.
You know, exactly the same thing that happens with Python. I can’t off hand think of a single serious python developer that is happy with the version of Python which is shipped in the enterprise distros.
It makes the baby panda cry.
Sean
The great thing about using php for a framework is that it allows nearly anyone to use their existing hosting to “install” new software by uploading “Drupal”, “WordPress”, or “Joomla”. That is true to PHP’s omnipresent, permissive mentality, and great for putting hundreds of sites on the same box; it is inexpensive.
While modwsgi is designed with commodity shared hosting in mind, it could be used for dedicated hosts just as well. Using modwsgi on shared hosting largely depends on whether those admins choose to adopt modwsgi in a way similar to what has made modphp and php omnipresent. Furthermore, modwsgi apps will still only be able to serve up so many simultaneous sessions on a one-application-per-site model; if each site has its own process, it will be limited. I’ve tested my python app, which has a complex templating module, in both modwsgi daemon and embedded mode, and at around 30-40 simultaneous sessions it noticeably starts to grind really slowly. (Perhaps I don’t have things configured correctly, by using threading?…) 100 sessions pretty much locked the app up until the sessions ended. I’m also on a VPS w/ 128 MB of RAM for testing. To scale well, I imagine there would need to be several apps that interact with each other, and some interesting things could be possible with modwsgi daemon mode. Each app could take care of many different sites, and be able to spawn child processes if it needs to. One of those apps could write previously requested pages to disk, so as to be able to bypass all the other apps and serve a static file directly; this would enable huge performance gains for serving those pages to thousands of people simultaneously. As for the user login side, each logged-in user could have a duplicate site process dedicated to them. This could allow them to make site-wide changes without ever committing them live, and see them as if they were live. When they are finished they can merge that data with the site data and make the updates, and the static process could then recreate the changed pages. All in all, modwsgi, and/or a webserver with an equivalent function of daemon management of python apps via wsgi, has lots of potential.
One of the downsides is that Apache must be restarted frequently for changes in configuration and updates to scripts, although I could be missing something… as I haven’t tried configuring such behavior yet or dug into how modwsgi works. Perhaps something built with Erlang could provide a better webserver for this kind of thing, and allow python to have an omnipresence among hosting, with lower-level control of the system. This would enable far more complex functionality outside the scope of the framework’s language and design, and more efficiently. A framework thus isn’t its own world, but would provide more interfaces to interact with any software on the system. Python can be great for this.
I agree with your premise 100%. For web-based development, since PHP was written for that exact environment, PHP is excellent: in a framework it reduces development of complex sites from months to days, and outside of a framework PHP’s memory management and speed make its scalability amazing. However, calling PHP a “CGI-based” process may be OK for the masses, but I nearly stopped reading after that sentence because it’s not accurate and does PHP a disservice.
I get your point, PHP cleans up after itself and is fast and that’s super-critical in web environments. I did an eval of a Java based enterprise-class system (many machines each with 32 Gig) with scalability issues because they had many scripts that had to run per scale-level (each script did something different). Java likes to pre-compile and load all its stuff and scripts into memory and keep it there. That’s just the nature of Java. I’m not dissing Java in general – I’d use it instead of PHP in other applications – but it was the wrong tool for the job for a complex, web-based, frequently changing, many many calculation-based, enterprise-level system. The final implementation? A Java-PHP hybrid succeeding where Java by itself failed. But I digress.
My complaint is your use of the term CGI model. CGI (common gateway interface) is just an API for how to call standalone programs. CGI processes can be called from a server running in threaded or worker mode. PHP can be called in CGI mode or via mod_php. A badly written non-PHP CGI script won’t clean up after itself. In one case I saw a badly written OS (not Linux), run on a badly written web server (not Apache), calling a badly written CGI script (not PHP) and it ran out of total memory causing the server to have to reboot.
Again I agree, a badly written PHP script will not have those memory issues. PHP cleans up after itself, loads amazingly quickly (even faster with optimizers), etc., etc.; but calling it CGI-based grates against the actual definition of CGI and kills me. Instead of calling it CGI-based I think you should call it something else. I don’t have a good term: “requestional-based?” “session-y-ness?” “session-envelope-based?” “web-transactional?” “web-module-based?” :) ANYTHING but CGI.
I’ll have to nominate this post for “100% right” post of the year! I have been trying to formalize all these aspects (to coworkers and managers, mostly) for a long time, but without ever reaching a reasonably clear explanation. I do not mind too much about the details of the CGI vs. worker process models, as long as the huge advances in saving developers’ time are spelled out:
automagic resource, variable and memory teardown PER REQUEST (this is teh magic, baby)
This is what gives imho php a bad rep on slashdot: you can code sloppy and get away with it! global vars are ok!
no shared state (almost, and you can put that in the db, generally).
I had a “revelation” moment once when I rewrote a C app running on text only wireless terminals: turning the quite-standard app into a simple “browser” that a – sent queries to the server and b – always displayed back the received results as ‘a page’ (plus handled a couple of error codes) shrank the codebase to one third, eliminated bugs and most importantly made all subsequent evolutions trivial (just change your code inside the db / server, no more recompiling or deploying ever!). State is your enemy!
nice process / application isolation is a huge bonus, too, even though it is still quite common and not so unlikely to bring a server to its knees (a tight never-ending loop, allocating huge chunks of memory or a huge number of db connections, or filling up your partition with logs, etc).
All in all my only wish is for even more sandboxing and isolation, eg. running code at different security levels, only allowing a subset of functions/operations, checking it for correctness before executing etc. Then we would need no more stinking template libraries!
Great writeup of the various flavors of process models.
PHP definitely has some big advantages in this realm.
Excellent excellent article, this is exactly why I prefer to use PHP to any other language.
If you use mod_fastcgi+perl but add a ‘die’ statement at the end of your (fastcgi) script, your FastCGI processes still prespawn, but each one services only one request.
Doesn’t that get you pretty close to mod_php’s sandboxing/etc?
I had to do something like this to address a memory leak. I ‘died’ my script after more than one request, but the same principle applies.
From what I’ve seen WebFaction doesn’t make any changes to the software. They just run each tool’s HTTP server (most come with one) behind their main Apache server. Their control panel makes it easy to configure which domain should go to which tool. It’s very simple but from my experience it’s rock solid and still very fast.
Not directly related to your point; your article prompted me to blurb about the differences between mod_(perl|php|python) from a closer to the metal standpoint on my blog.
It is very easy to get a PHP installation up and running with adequate debugging for simple scripts, or for simple customisations of existing ones.
This also means that PHP apps (WordPress for example) end up having a huge range of extensions/plugins etc. (getting away with sloppy coding helps).
Greetings,
Thanks to articles and comments like these, we will keep working with PHP. I have been developing with PHP for a long time, and so far I have not found anything better for building my web applications. I hope it stays that way for a long time. Nice one, Bicking!
Ian, what do you think: can Lua (built on C libraries like PHP, but more flexible than PHP) fill its place on the Web? Potentially it’s possible. (But Lua, I suppose, aims to approach the Web in another way…)
About mod_perl: if you want to sandbox your mod_perl stuff, shouldn’t you use the “PerlOptions +Parent” option, like this:
http://perl.apache.org/docs/2.0/user/config/config.html#C_Parent_
You will then have your own Perl interpreter (or pool of interpreters) for every virtual host, or even for specific directories under one vhost.
If you need the simplicity of PHP, and just want to put your stuff under the document root directory and start coding, HTML::Mason is a good choice with very gentle learning curve. When you need more (for example to separate your DB logic to a separate “model” .pm), it allows you to use any native Perl module, as all Mason templates/components are just Perl.
In a multi-site environment, just set up a separate INC path for every site/customer and run the interpreters in their own sandboxes (as described above). For MVC, use Catalyst. You can still use Mason as a View.
Continuing the link trail following your post on my blog.
http://frankkoehl.com/2009/01/jeff-atwood-still-wrong-about-php
I’ve never heard the argument (for or against) couched in terms of PHP’s relationship with Apache, a la mod_php. It brings a new perspective to what constitutes the pros and cons.