Ian Bicking: the old part of his blog

Another plan: SQLObject 0.7

A while ago I wrote some thoughts on what SQLObject 0.6 might look like. Well, a lot of those things didn't happen (actually nearly all of them), though the plans were pretty speculative so it's no surprise it didn't happen. But I'm going through another set of refactoring for SQLObject 0.7 (and I'm already part way through it) so I can say better what will happen for that version.

Inheritance

SQLObject normally implements concrete table inheritance, mostly because that was the absolute minimum I could do so that it support any kind of inheritance at all. It was the best I could do to avoid the issue.

But some people like other kinds of inheritance. Daniel Savard implemented a patch to add another kind of inheritance, kind of like class table inheritance -- except that each row is actually typed as part of the most-parent-table.

Oleg Broytmann took over the patch, and after refactoring it a bit it's now part of SQLObject. You have to subclass from InheritableSQLObject to actually use this kind of inheritance (I still couldn't bear bringing this into the main class -- I just don't like imposing object-oriented concepts on a relational database).

Extra Table Information

First, I'll be implementing extra table information, pretty much in the way I described. This is already well along in the repository, and looks like:

class MyTable(SQLObject):
    class sqlmeta:
        table = 'my_funny_table_name'
        style = MixedCaseStyle()
    col = StringCol()
    ...

So every subclass of SQLObject gets its own sqlmeta class. They are buddy classes, with a separate sqlmeta subclass (always with the same name) for each SQLObject subclass. And they are buddy instances too -- anInstance.sqlmeta is an instance of ATable.sqlmeta. Confusing? Hopefully not, I think it makes sense when you are using it.

This sqlmeta class/instance/attribute will hold much of the public API for SQLObject. Your extensions will add custom methods and attributes (/columns) to the SQLObject subclass, but that class will be pretty much yours and yours alone, as all the SQLObject accounting is being moved to sqlmeta.

With this we'll finally add stable, documented APIs for all sorts of introspection, like listing all the columns in a table.

Backward compatibility is being maintained, with settings to control how many warnings you want to get about using the old style of class definitions and attributes. By default there will be lots of warnings, so people new to SQLObject who are working off older documentation will be alerted to the new style early on.

Simplified Metaclass

SQLObject now uses a very simple metaclass:

class DeclarativeMeta(type):

    def __new__(meta, class_name, bases, new_attrs):
        cls = type.__new__(meta, class_name, bases, new_attrs)
        if new_attrs.has_key('__classinit__'):
            cls.__classinit__ = staticmethod(cls.__classinit__.im_func)
        cls.__classinit__(cls, new_attrs)
        return cls

This is along the lines of a conservative metaclass, and all the interesting logic is in SQLObject.__classinit__ (a method which I am also trying to whittle down). Hopefully this will make the code seem more accessible -- while it doesn't change much, it simplifies something that many people find intimidating (metaclasses) and moves the bulk of that logic into a much simpler method (__classinit__).

Active Attributes

Along the lines of what I proposed for setonce earlier, __classinit__ will check for "magic attributes". Right now it looks specifically for subclasses of Col (which are column descriptions) and a couple other special classes. After this change it will look for a special method on attributes: __addtoclass__, and calls that method with the class the attribute is being bound to, and the name it is being given. That method can replace the attribute. It gets a little complicated, because we usually replace the original object with one that is closely tied to the class it is being bound to -- this way the original object can be reused -- but it really comes down to a fairly simple soClass.addColumn(attributeWeJustCreated).

This will be something that other people can use to extend SQLObject. For instance, there's more kinds of joins, like a situation where two tables are related with some string indicating the type of relation (e.g., a person might be related to an object as an owner, author, watcher, etc). I think these active attributes will be easy to write, and allow people to express some of the things they want to express without adding too many ideas and policies into SQLObject itself.

Descriptors

As well as becoming active attributes, column objects will become descriptors. This means property will be used less, and the dynamically-created lambdas will also go away (they are now just being created with eval(), a bit of a bad smell).

The current way of overriding getters and setters (by defining _get_columnName and _set_columnName) will go away, and instead you'll be able to add methods to the column descriptors. These methods will be a bit easier to understand and use.

Descriptors will also do double-duty as sqlbuilder objects, so you don't have to use MyTable.q.column in queries, instead you can just use MyTable.column.

I also plan to add hooks for various events, like when a row is created, updated, or deleted. So maybe a you could implement something where a row was always updated to indicate the last time the row was editor. Or CASCADE could be implemented as a hook on the delete event for ForeignKey.

This last stuff might not make it into 0.7; I'm still working on the active attributes, and some form of descriptor should happen, but the events might be a stretch.

Anyway, that's the plan I have now, if you are curious about where SQLObject is going.

Created 23 Feb '05

Comments:

Sounds great! I'm glad you're going with full-blown descriptors, especially; it makes additional functionality that much easier for developers.

You mentioned:

I also plan to add hooks for various events, like when a row is created, updated, or deleted. So maybe you could implement something where a row was always updated to indicate the last time the row was edited.

That's something I've been trying to finagle into the object layer in my ORM lately. The descriptors are great for firing triggers when each value is modified, but not useful for parallel triggers. For example, if you have, oh... let's say a project-management webapp, you want to send a notification to the project manager when a due date is changed. What happens when both the due date and the owner are changed within the same submission? Ideally, there'd be a way to do the updates first and fire any triggers second (that doesn't require the UI developer to manually serialize). I haven't yet found a good idiom for that.

Of course, the "last time edited" could be handled when you flush the objects at the end of the request, but there are many more complicated interactions I've been running into lately...I'd be interested to hear your opinions.

# Robert Brewer

I am using SQLObject for some weeks now and am very pleased with it. I'm happy to see that is actively being developed. Keep up the good work!

would it be a big task (for me?) to create an Oracle connector?

# Guyon Morée

I wouldn't think it'd be hard, but seeing that everyone starts it (or at least says they have started) but doesn't complete it, maybe there's something mysterious and hard about Oracle support. Especially if you ignore things like _fromDatabase (which is fine for the first version of support) it should be very easy indeed.

# Ian Bicking

Yes, it would be a hard job. Oracle doesn't have a way to quote strings (escape special characters), so one needs to pass a list (or dictionary) of parameters along with parametrized query string. This requires changing the way SQLObject generates queries. I've started to work on this, but there is a long road in front of me. Please join if you are really interested to help.

# Oleg Broytmann

While this will cause a problem for BLOBs and some other types without a SQL literal representation, Oracle does have literal representation of most types of data. It's suboptimal to translate literals into SQL for Oracle to later parse, but it's doable and the way all the other database backends are working at the moment. Most parts of SQLObject would be easy to implement for Oracle, with some border cases that are going to be more difficult.

# Ian Bicking

To a degree its easier here, because SQLObject already centralizes a lot of these operations -- so the descriptors are actually thin delegates back to fixed SQLObject methods. The triggers and events would then also be centrally managed, and would actually be registered during __addtoclass__. There's still some complexity to multiple triggers being fired (and then what order they get fired in, especially when one invalidates the other)... but I'll have to wait to figure that out when the infrastructure is in place to add that complexity ;)

# Ian Bicking

Maybe it's a bit late, but you already don't need to use eval for the dynamically generated lambdas.

e.g. this:

getter = eval('lambda self: self._SO_loadValue(%s)' % repr(instanceName(name)))

could be done like this:

getter = lambda self: self._SO_loadValue(repr(instanceName(name)))

or even done without using nested_scopes at all:

getter = lambda self, name=repr(instanceName(name)): self._SO_loadValue(name)

# Andrew

"I also plan to add hooks for various events, like when a row is created, updated, or deleted. So maybe a you could implement something where a row was always updated to indicate the last time the row was editor. Or CASCADE could be implemented as a hook on the delete event for ForeignKey."

If that sort of thing really matters to your application, then wouldn't you be using a database that had native triggers anyway?

Drawing the line between what you put in app logic and what you put in the database is always tricky. But I tend towards the view that basic data integrity self-defence is more reliable if done in/by the database.

You might put it in the application because: - it's portable across databases (but how often in real life do people decide to switch the database behind a serious application that their business depends on?) - it works for databases that don't support those things natively (but why would you use one of those if data integrity was a serious concern?) - you can see and maintain all your logic in one place

You might put it in the database because: - it's probably probably more efficient. Assume people building databases know more about how to do this sort of stuff quickly and reliably than you (for any likely value of "you") do. - it can't be bypassed by people accessing the same database from a different client application, or even by somebody else writng a different part of the same application

I notice that even MySQL might have triggers soon: http://rootprompt.org/article.php3?article=8289

# Alan Little

In general, I agree, a company probably doesn't switch databases very often. I started my personal project using sqlite and moved over to mysql after development was under way. All it took was a change in the connection string ( and some refactoring to use datetime instead of mx.DateTime which doesn't count ). SQLObject's ability to easily switch backend stores helps me a lot - it makes it easier for me to develop and choose a provider when the time comes. But this is probably true more for small efforts than corporate IT infrastructure projects.

# Kevin Dahlhausen

If you're writing an in-house app, or one targeted at only one DB, then putting the triggers in the DB can make sense (because you don't change databases often, as you pointed out). But if you're writing an app on top of SQLObject which you then want others to use with many different databases, it's nice to write the triggers once, rather than once for each potential target DB.

# Robert Brewer

Well, like other people said, SQLObject can be used at a lower level than any single deployment -- SQLObject can be part of a project which may get installed anywhere. This just came up for a user: http://www.crummy.com/2005/02/18/0 -- and it will come up all the time for open source developers that distribute projects based on SQLObject.

But even then, database triggers are limited; they can operate on the database, but not otherwise. So, if your dependent object isn't a SQLObject instance then a database trigger won't accomplish anything. E.g., if you want a FileCol which only stores a filename in the database, and that file needs to be deleted along with the row. Or if you have two separate databases that have dependencies across the databases.

And even when you are only dealing with the database, database triggers still aren't sufficient, because they don't give SQLObject any feedback, so it doesn't know how to update its cache. SQLObject, for reasons tied to its very ORMness, has problems when the database gets updated behind its back. I'd like to fix some of those problems (and some of these features could help), but in this case SQLObject could still cache aggressively in the presence of in-process cascades.

# Ian Bicking

Good points, Ian. I think I detect a fairly fundamental difference in worldview/emphasis/(don't say the p*gm word!) between:

the application as the primary thing and the db as a dumb persistence store

versus ...

the data as the primary thing and the dbms as its primary guardian; triggers (etc) as the immune system that protect it from things malicious/miguided applications might try to get up to.

It's a design decision about at what layer in the stack you want to have your basic data integrity defence mechanisms, with different valid answers in different circumstances. Although what you seem to be saying/implying is that SQLObject won't be able to cope if you put them anywhere other than in the app's persistence layer. I would expect that (with all due respect) to be very significantly slower and less reliable than native triggers in any mature production rdbms - especially if the app isn't running on the same machine. And even if you can live with that, what if you aren't the only application - or the only instance of the same application - that's talking to the same database?

So you would appear to be limited to one instance of one application that "owns" the database entirely. Which of course is perfectly adequate in many circumstances, but severely limits what you can do if you want to load balance your app / run with existing production databases that already have integrity protections in place / etc.

# Alan Little

Martin Fowler called one style an Application Database (what SQLObject prefers) vs. an Integration Database (where the database is authoritative and the communication medium for multiple applications).

I should really write a more thorough summary of why it's hard to be reasonably efficient with an ORM while also being compatible with an Integration Database style. Not impossible -- it's just a matter of what your expectations are. Hmm... I've never properly written up the performance issues of an ORM; I should do that so I can refer to it at times like these...

# Ian Bicking