Ian Bicking: a blog

Monkeypatching and dead ends

Bill de hÓra and then Patrick Logan picked up on an old post of mine about monkeypatching.

Patrick’s reply:

I know next to nothing about the specific problems the Ruby and Python folks are encountering with "monkeypatching". However this capability is nothing new for dynamic languages. And it is a frequent desire for me when I program in C-like languages. If you become frustrated using static "utility" methods, for example in Java, that work with "closed" classes (say, String or Object), then you have at least some desire for these "monkeypatches".

See the thing is this capability is a cool feature in many Lisp and most Smalltalk systems. Sorry, dear readers who hate my Smug Lisp Weeniness. But it is true. Not only is it "cool," moreover it is pragmatic.

The truly good implementations of dynamic languages recognize the advantages of these kinds of extensions, and they’ve supported them with good tools for decades. Learn from it, don’t run from it.

Sharp tools are good. I would not want monkeypatching removed from Python. Still, it’s best not to leave sharp tools lying around. It’s best not to mix your butter knives with your steak knives. I don’t resent the safety guards on circular saws.

And sorry, Lisp Weenies: your experiences are not so novel anymore. The Python community isn’t new to this dynamic typing thing. We’ve taken some hits and we’ve learned from them. And frankly, the problems with runtime patching of methods can’t be specific to Python or Ruby. It only took the Ruby community a couple of years to start catching on. Are you telling me Lisp and Smalltalk programmers still haven’t figured this out? Everything you value about modularity is at risk when you monkeypatch code. That risk can be worth it, of course! But do you really need me to explain the benefits of modularity? What’s next, a recap of the problems with GOTO?
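A minimal Python sketch of the modularity problem: a method patched at runtime is seen by code that never asked for the change. `Greeter`, `library_code`, and `shouty_greet` are hypothetical names made up for this illustration. The second function uses `unittest.mock.patch.object`, one common way to keep a patch scoped and visibly temporary; it is offered as an example, not as a remedy this post prescribes.

```python
from unittest import mock


class Greeter:
    """Hypothetical class owned by some other module."""

    def greet(self, name):
        return f"Hello, {name}"


def library_code():
    # Hypothetical unrelated code that relies on Greeter's original behaviour.
    return Greeter().greet("world")


_original_greet = Greeter.greet


def shouty_greet(self, name):
    # Replacement method: wraps the original and changes its result.
    return _original_greet(self, name).upper()


def demo_global_patch():
    """Patch the class for the whole process, undo it by hand, report both results."""
    before = library_code()
    Greeter.greet = shouty_greet         # every caller in the process now sees this
    try:
        during = library_code()
    finally:
        Greeter.greet = _original_greet  # forget this line and the patch leaks
    return before, during


def demo_scoped_patch():
    """Same patch, confined to a with-block and restored automatically."""
    with mock.patch.object(Greeter, "greet", shouty_greet):
        inside = library_code()
    outside = library_code()
    return inside, outside
```

The hand-rolled version depends on someone remembering the `finally`; the `with` block makes the patch's lifetime explicit, though while it is active every caller in the process is still affected.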

One of the things that I think distinguishes Python among the popular dynamically typed languages of the day is that it’s built, language and libraries alike, on a great deal of concrete experience: experience developing with Python. There was a time when people tended to define Python as a delta from Java or Perl or C. We don’t need to do that anymore. Sure, closed classes in Java suck. But Python isn’t a reaction to Java’s suckiness, and the fact that we can do something Java can’t doesn’t get me excited. Monkeypatching has to stand up on its own, and while its use is sometimes justified, those cases are few and far between. That’s what we’ve learned: monkeypatching was not dismissed out of hand, and it was not dismissed because of anything in Java; it was dismissed because people used it without acknowledging it as a hack, and it sucked.

Of course the use cases are still there, which is why people are trying new things to address these problems. One benefit of experience is that you know some paths are dead ends. We still haven’t figured out The One Right Path (and we never will), and maybe we’ve only traced out the longest path in a very long dead end in this maze of ideas we are traversing. Since I doubt the maze has any exit (nirvana?), it’s a fair question where we are trying to get at all. That said, I suspect we’ve out-explored Lisp. Lisp has been a worthy mentor, an intrepid explorer in his time, but he’s old, doesn’t get out much, and only tells stories of where he’s been in the past. There are still things to be learned there, wisdom to be dug out of that environment, but Lisp and Python are not peers.

2008 03 21

Programming
Python
Ruby

What PHP Deployment Gets Right

With the recent talk in the blogosphere about deployment (including posts about Django, and plenty of others), people are thinking about PHP a bit more analytically. I think they mostly get it wrong.

There are several different process models for the web:

CGI, where every request creates a new process, and the process handles only one request. (You could also fork a new process each request with mostly the same result.)
Worker processes, where a process handles one request at a time, but the process is long-lived and will handle another request after finishing the first.
Threaded, where a process handles multiple requests, each in its own thread. Threads may or may not be reused (depending on whether you use a thread pool), and that’s kind of like the difference between CGI and worker processes. But it’s not a big difference, because either way the process is long-lived, including all your objects.
Asynchronous, where a process handles multiple requests without threads. Typically your code is event driven: a request is an event, but so are things like "database results returned", or "file read from disk". Code takes the form of short snippets between these events.
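To make the asynchronous model concrete, here is a small sketch using Python's `asyncio`. It assumes `asyncio.sleep` as a stand-in for waiting on a database; the handler and request names are hypothetical. Two requests are served by one thread, and each handler's code runs as short snippets between events.

```python
import asyncio


async def handle_request(name, db_delay, log):
    # Snippet 1: runs when the "request arrived" event happens.
    log.append(f"{name}: request received")
    await asyncio.sleep(db_delay)  # stand-in for waiting on a database query
    # Snippet 2: runs when the "database results returned" event happens.
    log.append(f"{name}: database results returned")
    return f"{name}: done"


async def serve_two(log):
    # One process, one thread, two requests in flight at once.
    return await asyncio.gather(
        handle_request("A", 0.05, log),
        handle_request("B", 0.01, log),
    )


if __name__ == "__main__":
    events = []
    print(asyncio.run(serve_two(events)))
    print(events)
```

The log interleaves the two requests: B's shorter "database" wait finishes first, even though A arrived first.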

What people don’t realize is that PHP effectively uses a CGI model of execution. People don’t appreciate this because PHP is implemented with mod_php, an Apache module. There are many other modules like mod_perl (the first of these mod_language modules), mod_python, mod_ruby, etc. None of these other modules are like mod_php, and commentators who don’t get this are led astray. The difference is that the PHP language was written for mod_php. Perl, Python, Ruby: none of them were written to be used as an Apache module. You can’t take one of these existing languages and just retrofit it to be like PHP or like mod_php.

Why is it important that PHP has a CGI-like model? Mostly because it lets two groups separate their work: system administration types (including hosting companies) set up the environment, and developer types use the environment, and they don’t have to interact much. The developers are empowered, and the administrators are not bothered.

Some details:

PHP processes can leak memory like crazy. It doesn’t matter because they only leak memory for one request.
PHP processes can be easily killed if they act badly (sucking up too much memory or if they get caught in an infinite loop).
PHP code has no global state. It has lots of global variables, but since they only live for one request they are relatively harmless. (Not entirely harmless, but what’s a major sin in another language is only a minor sin in PHP.) The process cannot become corrupted.
There’s no global set of installed libraries (besides what PHP is compiled with). If you want to include a library you include files, typically relative paths. You won’t include someone else’s library accidentally. (This has changed in PHP, but the search path remains largely unused.)
Most of the language is implemented in C, in a shared library. In comparison, Python has "batteries included", but those batteries are largely written in Python. Python code isn’t shared between processes the way a C shared library is, and it can take time to load. So while a single Python CGI script might be small, it probably imports lots of code which would have to be loaded on each request. PHP scripts actually are small. (Stuff like PEAR changes this by adding substantial libraries written in PHP, but that also seriously affects PHP performance.)
The few system-dependent libraries and C extensions that are required are compiled into PHP, and hosting companies have kind of figured out a consistent set of these libraries (stuff like database drivers, an image processing library, and XML parsing). Individual developers write only PHP (and that’s all they are typically allowed to write).
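A rough Python sketch of why per-request processes make leaks and runaway code relatively harmless. `PAGE`, `cgi_request`, and `InProcessWorker` are hypothetical: `cgi_request` starts a fresh interpreter for every request (CGI-style), while `InProcessWorker` loads the code once and keeps its globals between requests, standing in for a long-lived worker process. It also hints at the Python load-time cost mentioned above, since every CGI-style request pays for a new interpreter.

```python
import subprocess
import sys

# Hypothetical "page": module-level state plus a handler that leaks into it.
PAGE = '''
CACHE = []                      # created once per process, when the code is loaded

def handle():
    CACHE.append("x" * 10_000)  # a leak: grows on every request this process serves
    return len(CACHE)
'''


def cgi_request(source, timeout=10.0):
    """CGI-style: a brand-new interpreter process for every request."""
    script = source + "\nprint(handle())\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so a runaway
        # request takes down only its own process.
        return "killed"
    if proc.returncode != 0:
        return "error"
    return proc.stdout.strip()


class InProcessWorker:
    """Worker-style: load the code once, then serve many requests with shared globals."""

    def __init__(self, source):
        self.namespace = {}
        exec(source, self.namespace)

    def request(self):
        return str(self.namespace["handle"]())
```

Three CGI-style requests each report `1`, because the leaked cache dies with its process; three requests to the worker report `1`, `2`, `3`. A page whose `handle()` loops forever, run through `cgi_request` with a short `timeout`, comes back as `"killed"`.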

These features are all contrary to the design of these other languages, and even more contrary to the conventions in other language communities. Python programmers could write all their libraries in C and start to make CGI scripts feasible, but they don’t and they won’t.

Focusing on mod_php is a wild goose chase. You have to focus on the features it provides.

Here are the features I think are important:

The failure cases are isolated. Processes never get wedged. One user doesn’t take out another user’s application. Even one bad page in an application won’t take out the entire application.
File-based deployment. There isn’t a build process for deploying a PHP application. You put the files in the right place and they are deployed. (Configuration in PHP is fuzzy, but PHP is not perfect.)
Minimal global dependencies. If you use libraries then you package those libraries with your application. You don’t worry about the administrator upgrading something on the system and breaking your application. (At least not much; there are still things like upgrading mod_php that can break your application, or changing settings in php.ini. PHP is not perfect.)
It’s pretty easy to do multiple deployments for development. You just drop another copy of the files somewhere else and change your database settings.

And of course the high-level features:

Working applications stay working. (Even half-working applications at least stay half-working.)
Administrators aren’t being hassled to make little changes and fixes all the time.
Developers can do what they need to do without going through the administrators.

People or organizations that are hybrid developer/administrators don’t care about this so much. And that makes sense; this is all about separating those two roles. Still, even developer/administrators would benefit from a better story here, because it would let them concentrate on being developers while the administration part takes care of itself.

Solving this is all the harder because of how it interacts with the language itself. But it’s not impossible. Process model 2, worker processes, is feasible for most languages. You just need a really good process manager (including process setup for isolation, and process monitoring to mitigate problems). Apache alone is not this manager. mod_fastcgi could be this manager, but it’s not. Maybe mod_wsgi will become this manager for Python.
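Some of what such a process manager has to do can be sketched in a few dozen lines of Python. This is a toy under stated assumptions, not how mod_fastcgi or mod_wsgi work: `ProcessManager` and the line-based `WORKER_SOURCE` protocol are hypothetical, there is a single worker, and a hung worker would block `request()` because there is no read timeout. It shows two monitoring jobs: recycling a worker after a fixed number of requests (so leaks stay bounded) and replacing a worker that dies.

```python
import subprocess
import sys

# Hypothetical worker program: serves one request per input line and answers
# "<n>:<payload>", where n counts the requests served by THIS process.
WORKER_SOURCE = r'''
import sys
served = 0
while True:
    line = sys.stdin.readline()
    if not line:
        break                      # manager closed our stdin: shut down
    payload = line.strip()
    if payload == "crash":
        sys.exit(1)                # simulate a worker dying mid-request
    served += 1
    print(f"{served}:{payload}", flush=True)
'''


class ProcessManager:
    """Toy manager for one worker process: recycle it, and replace it if it dies."""

    def __init__(self, worker_source, max_requests=100):
        self.worker_source = worker_source
        self.max_requests = max_requests   # recycle after this many requests
        self.proc = None
        self.served = 0
        self.starts = 0

    def _start(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", self.worker_source],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )
        self.served = 0
        self.starts += 1

    def _stop(self):
        if self.proc is None:
            return
        try:
            self.proc.stdin.close()        # EOF tells a healthy worker to exit
        except OSError:
            pass
        try:
            self.proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            self.proc.kill()               # a worker that won't exit gets killed
            self.proc.wait()
        self.proc.stdout.close()
        self.proc = None

    def request(self, payload):
        # Start a worker if there is none, it has exited, or it is due for recycling.
        if self.proc is None or self.proc.poll() is not None or self.served >= self.max_requests:
            self._stop()
            self._start()
        try:
            self.proc.stdin.write(payload + "\n")
            self.proc.stdin.flush()
            reply = self.proc.stdout.readline()
        except BrokenPipeError:
            reply = ""
        if not reply:                      # EOF: the worker died on this request
            self._stop()
            return "502 worker died"
        self.served += 1
        return reply.strip()

    def close(self):
        self._stop()
```

With `max_requests=2`, requests `a`, `b`, `c` get `1:a`, `2:b`, `1:c`: the third lands on a fresh process, so whatever the old one leaked is gone. A real manager would also need read timeouts, several workers, and resource limits.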

2008 01 12

Programming
Python
Ruby
Web

Doctest for Ruby

Finally, someone wrote a version of doctest for Ruby.

Recently I’ve been writing most of my tests using stand-alone doctest files. It’s a great way to do TDD, mostly because the cognitive load is so low. Also, I write my examples but not their output; I copy the output in after visually confirming it is correct. So the basic pattern is:

Figure out what I want to do
Figure out how I want to test it
Automate my conditions
Manually inspect whether the output is correct (i.e., implement and debug)
Copy the output so that in the future the manual process is automated (doctest-mode for Emacs makes this particularly easy)
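Here is that loop sketched with Python's standard `doctest` module, running stand-alone doctest text from a string rather than a file. `run_doctest_text`, `DRAFT`, and `FINAL` are illustrative names. `DRAFT` has the example but no output; running it fails, and the failure report shows what was actually produced. After checking it by eye, that output is pasted in to make `FINAL`. For a real file on disk, `doctest.testfile("example.txt")` does the same job.

```python
import doctest


def run_doctest_text(text, name="example"):
    """Run the examples in a stand-alone doctest text; return (failed, attempted, report)."""
    test = doctest.DocTestParser().get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []                                  # collect the failure report instead of printing it
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted, "".join(report)


# Step 1: write the example, but not its output.
DRAFT = """
>>> from fractions import Fraction
>>> Fraction(1, 3) + Fraction(1, 6)
"""

# Step 2: after inspecting the "Got:" section of the report, paste the output in.
FINAL = """
>>> from fractions import Fraction
>>> Fraction(1, 3) + Fraction(1, 6)
Fraction(1, 2)
"""
```

`run_doctest_text(DRAFT)` reports one failure whose report contains `Fraction(1, 2)`; `run_doctest_text(FINAL)` passes, and from then on the check is automatic.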

The result is a really good balance of manual and automated testing, giving you, I think, the benefit of both: the ease of manual testing and the robustness of automated testing.

Another good thing about doctest is it doesn’t let you hide any boilerplate and setup. If it’s easy to use doctest, it’s probably easy to use the library.

There’s nothing Python-specific about doctest (doctestjs, for example, brings it to JavaScript), so it’s good to see it moving to other languages. Even if a language doesn’t have a REPL, IMHO it’s worth inventing one just for this.
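The core mechanism is small, which is part of why it ports. As a rough illustration (a deliberate simplification, not how Python's `doctest` module is implemented), this hypothetical `mini_doctest` runs each single-line `>>> ` example, captures what it prints or returns, and compares that to the lines that follow. A port to another language needs the same two pieces: a way to evaluate a snippet in a persistent environment, and a way to capture its output.

```python
import contextlib
import io


def mini_doctest(text):
    """Hypothetical, simplified doctest: single-line '>>> ' examples only.

    Returns a list of (source, passed, actual_output_lines).
    """
    env = {}                      # one namespace shared by all examples, like a REPL session
    results = []
    lines = text.strip().splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        i += 1
        if not line.startswith(">>> "):
            continue              # prose between examples is ignored
        source = line[4:]
        expected = []             # expected output: following non-blank, non-prompt lines
        while i < len(lines) and lines[i].strip() and not lines[i].strip().startswith(">>> "):
            expected.append(lines[i].strip())
            i += 1
        try:
            code = compile(source, "<example>", "eval")   # expression: echo its repr
            is_expression = True
        except SyntaxError:
            code = compile(source, "<example>", "exec")   # statement: run it for effect
            is_expression = False
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            if is_expression:
                value = eval(code, env)
                if value is not None:
                    print(repr(value))
            else:
                exec(code, env)
        actual = buffer.getvalue().strip().splitlines()
        results.append((source, actual == expected, actual))
    return results
```

For example, `mini_doctest(">>> 1 + 1\n3\n")` returns `[("1 + 1", False, ["2"])]`, which is exactly the information you need to either fix the code or paste in the real output.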

2007 08 23

Javascript
Programming
Python
Ruby
