
SQLObject API redesign

A tentative plan, 20 Jan 2004

(discussion on the mailing list, sqlobject-discuss@lists.sf.net)

Introduction

During vacation I thought about some changes that I might like to make to SQLObject. Several of these change the API, but not too drastically, and I think they change it for the better. And we're not at 1.0 yet, so changes are still allowed! Here are my ideas...

Editing Context

Taken from Modeling, the "editing context" is essentially a transaction, though it also encompasses some other features. Typically it is used to distinguish between separate contexts in a multi-threaded program.

This is intended to separate several distinct concepts that are currently bundled together in SQLObject's pseudo-connections.

Example:

from SQLObject import EditingContext
ec = EditingContext()
# every editing context automatically picks up all the SQLObject
# classes, all magic like.
person = ec.Person.get(1) # by ID
ec2 = EditingContext() # separate transaction
person2 = ec2.Person.get(1)
assert person is not person2
assert person.id == person2.id
assert person.fname == 'Guy'
person.fname = 'Gerald'
assert person2.fname == 'Guy'
ec.commit() # SQL is not sent to the server until this commit
assert person2.fname == 'Guy' # Doesn't see changes
person2.fname = 'Norm'
# raises exception if locking is turned on; overwrites if locking
# is not turned on.  (Locking enabled on a per-class level)

I'm not at all sure about that example. Mostly the confusing parts relate to locking and when the database lookup occurs (and how late a conflict exception may be raised).

Somewhere in here, process-level transactions might fit in. That is, even on a backend that doesn't support transactions, we can still delay SQL statements until a commit/rollback is performed. In turn, we can create temporary "memory" objects, that is, objects which haven't been committed to the database in any way. To do this we'll need sequences -- to preallocate IDs -- which MySQL and SQLite don't really provide :(
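
To make that concrete, here is a minimal sketch of how delayed statements might work. Everything in it -- the class name and the execute/commit/rollback methods -- is hypothetical, not current SQLObject API:

class DelayedConnection:
    # Hypothetical sketch: queue SQL until commit/rollback, giving
    # process-level transactions even on backends that lack them.
    def __init__(self, realConnection):
        self.realConnection = realConnection
        self.pending = []  # (sql, params) pairs not yet sent
    def execute(self, sql, params=()):
        self.pending.append((sql, params))
    def commit(self):
        for sql, params in self.pending:
            self.realConnection.execute(sql, params)  # everything goes now
        self.pending = []
    def rollback(self):
        self.pending = []  # nothing was sent, so there's nothing to undo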

Nested transactions...? Maybe they'd fall out of this fairly easily: if we define a global context, with global caches etc., then further levels of context will come for free.

We still need to think about an auto-commit mode. Maybe the global context would be auto-commit.

Caching

Really doing transactions right means making caching significantly more complex. If the cache is purely transaction-specific, then we'll really be limiting the effectiveness of the cache. With that in mind, a copy-on-write style of object is really called for -- when you fetch an object in a transaction, you can use the globally cached instance until you write to the object.

Really this isn't copy-on-write; it's more like a proxy object. Until the object is changed, it can delegate all its columns to the global object for which it is a proxy. Of course, traversal via foreign keys or joins must also return proxied objects. As the object is changed -- perhaps on a column-by-column basis, or as a whole on the first change -- the object takes on the personality of a full SQLObject instance.

When the transaction is committed, this transactional object copies itself to the global object and becomes a pure proxy again. These transactional caches should themselves be pooled, so that when another transaction comes along you have a potentially useful set of proxy objects already created for you. This is a common pattern in web applications, which have lots of short, often very repetitive transactions.
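
As a rough illustration of the proxying idea -- all names here are made up, and a real implementation would track columns specifically, not arbitrary attributes:

class ProxyRow:
    # Hypothetical copy-on-write proxy: reads delegate to the globally
    # cached row until the first write copies values locally.
    def __init__(self, globalRow):
        self.__dict__['_global'] = globalRow
        self.__dict__['_local'] = None  # stays None until the first write
    def __getattr__(self, name):
        local = self.__dict__['_local']
        if local is not None and name in local:
            return local[name]
        return getattr(self.__dict__['_global'], name)
    def __setattr__(self, name, value):
        if self.__dict__['_local'] is None:
            self.__dict__['_local'] = {}  # first write: no longer a pure proxy
        self.__dict__['_local'][name] = value
    def commitToGlobal(self):
        # On commit, push local changes to the shared instance and
        # revert to being a pure proxy.
        for name, value in (self.__dict__['_local'] or {}).items():
            setattr(self.__dict__['_global'], name, value)
        self.__dict__['_local'] = None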

In addition to this, there should be more cache control. This means explicit ways to control things like:

  1. Caching of instances:
    • Application/process-global definition.
    • Database-level definition.
    • Transaction/EditingContext-level definition.
    • Class-level definition.
  2. Caching of columns:
    • Class-level.
  3. Cache sweep frequency:
    • Application/process-global.
    • Database-level.
    • Class-level.
    • Doesn't need to be as complete as 1; maybe on the class level you could only indicate that a certain class should not be swept.
    • Sweep during a fetch (e.g., every 100 fetches), by time or fetch frequency, or sweep with an explicit call (e.g., to do sweeps in a separate thread).
  4. Cache sweep policy:
    • Maximum age.
    • Least-recently-used (actually, least-recently-fetched).
    • Random (the current policy).
    • Multi-level (randomly move objects to a lower-priority cache, raise level when the object is fetched again).
    • Target cache size (keep trimming until the cache is small enough).
    • Simple policy (if enough objects qualify, cache can be of any size).
    • Percentage culling (e.g., kill 33% of objects for each sweep; this is the current policy).
  5. Batching of updates (whether updates should immediately go to the database, or whether it would be batched until a commit or other signal).
  6. Natural expiring of objects. Even if an object must persist because there are still references to it, we could expire it so that future accesses re-query the database, to avoid stale data.
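
I have no API in mind for these knobs yet; purely for illustration, a class-level declaration might look something like this (CacheSettings and every keyword in it are invented for the example):

class Address(SQLObject):
    _cacheSettings = CacheSettings(
        instances=True,     # 1. cache instances of this class
        columns=True,       # 2. cache column values
        sweepEvery=100,     # 3. sweep during every 100th fetch
        policy='lru',       # 4. or 'random', 'max-age', 'multi-level'...
        targetSize=1000,    #    ...keep trimming until this size
        batchUpdates=True,  # 5. hold UPDATEs until commit
        expireAfter=300)    # 6. seconds before a live object re-queries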

Expose some methods of the cache, like getting all objects currently in memory. These would probably be exposed on a class level, e.g., all the Addresses currently in memory via Address.cache.current() or something. What about when there's a cached instance in the parent context, but not in the present transaction?

Columns as Descriptors

Each column will become a descriptor. That is, Col and subclasses will return an object with __get__ and __set__ methods. The metaclass will not itself generate methods.

A metaclass will still be used so that the descriptor can be tied to its name; e.g., with fname = StringCol(), the resulting descriptor will know that it is bound to the name fname.

By using descriptors, introspection should become a bit easier -- or at least more uniform with respect to other new-style classes. Various class-wide indexes of columns will still be necessary, but these should be able to remain mostly private.
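
A bare-bones sketch of the mechanics (the _SO_getValue/_SO_setValue storage hooks are placeholders, not a committed interface):

class Col(object):
    # Sketch: a column descriptor that knows its own attribute name.
    def __init__(self, **kw):
        self.name = None  # filled in by the metaclass below
    def __get__(self, obj, cls=None):
        if obj is None:
            return self  # class-level access returns the descriptor
        return obj._SO_getValue(self.name)  # placeholder storage hook
    def __set__(self, obj, value):
        obj._SO_setValue(self.name, value)  # placeholder storage hook

class ColumnMeta(type):
    # Ties each Col to the attribute name it was assigned under,
    # so fname = StringCol() knows it is bound to 'fname'.
    def __new__(meta, name, bases, d):
        for attr, value in d.items():
            if isinstance(value, Col):
                value.name = attr
        return type.__new__(meta, name, bases, d)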

To customize getters or setters (which you currently do by defining a _get_columnName or _set_columnName method), you will pass arguments to the Col object, like:

def _get_name(self, dbGetter):
    return dbGetter().strip()

name = StringCol(getter=_get_name)

This gets rid of _SO_get_columnName as well. We can transitionally add something to the metaclass to signal an error if a spurious _get_columnName method is sitting around.

Construction and Fetching

Currently you fetch an object with class instantiation, e.g., Address(1). This may or may not create a new instance, and does not create a table row. If you want to create a table row, you do something like Address.new(city='New York', ...). This is somewhat in contrast to normal Python, where class instantiation (calling a class) will create a new object, while objects are fetched otherwise (with no particular standard interface).

To make SQLObject classes more normal in this case, new will become __init__ (more or less), and classes will have a get method that fetches an already-existing row, e.g., Address.get(1) vs. Address(city='New York', ...). This is perhaps the most significant change in SQLObject usage. Because of the different signatures, if you forget to make a change someplace you will get an immediate exception, so updating code should not be too hard.
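
In code, the change looks like this:

# Current API:
address = Address(1)                    # fetch the row with id 1
address = Address.new(city='New York')  # insert a new row

# Proposed API:
address = Address.get(1)                # fetch the row with id 1
address = Address(city='New York')      # insert a new row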

Extra Table Information

People have increasingly used SQLObject to create tables, and while it can generate a significant range of schemas, there are several extensions to table generation that people occasionally want. Since these needs tend to appear later in development, it would be convenient if SQLObject could grow as the programs using it grow in complexity. Examples include custom MySQL table types, table permissions, naming schemes, and primary key configuration.

Some of these may be globally defined, or defined for an entire database. For example, typically you'll want to use a common MySQL table type for your entire database, even though it's defined on a per-table basis. And while MySQL allows global permission declarations, Postgres does not, requiring tedious repetition of the permissions for each table -- so while permissions are applied per-table, a per-database declaration is (at least to some degree) called for. Naming schemes are also usually database-wide.

As these accumulate -- and by partitioning this list differently, the list could be even longer -- it's messy to do these all as special class variables (_idName, etc.). It also makes the class logic and its database implementation details difficult to distinguish. Some of these can be handled elegantly, like id = StringCol() or id = ("fname", "lname"). But the others should perhaps be put into a single class variable, perhaps itself a class:

class Address(SQLObject):
    class SQLMeta:
        mysqlType = 'InnoDB'
        naming = Underscore
        permission = {'bob': ['select', 'insert'],
                      'joe': ['select', 'insert', 'update'],
                      'public': ['select']}
    street = StringCol()
    ....

The metadata is found by its name (SQLMeta), and is simply a container. The class syntax is easier to write and read than a dictionary-like syntax. Or, it could be a proper class/instance and provide a partitioned way to handle introspection. E.g., Address.SQLMeta.permission.get('bob') or Address.SQLMeta.columns. In this case values that weren't overridden would be calculated from defaults (like the default naming scheme and so on).

I'm not at all certain about how this should look, or if there are other things that should go into the class-meta-data object.

Joins, Foreign Keys

First, the poorly-named MultipleJoin and RelatedJoin (which are rather ambiguous) will be renamed ManyToOneJoin and ManyToManyJoin. OneToOneJoin will also be added, while ForeignKey remains the related column type. (Many2Many? Many2many? many2many?)

ForeignKey will be driven by a special validator/converter. (But will this make ID access more difficult?)

Joins will return smart objects that can be iterated across. These smart objects will be related to SelectResults, and will support the same features, like ordering. In both cases, an option to retrieve IDs instead of objects will be provided.

These smarter objects will allow, in the case of ManyToManyJoin, set-like operations to relate (or unrelate) objects. For ManyToOneJoin the list/set operations are not really appropriate, because they would reassign the relation, not just add or remove relations.
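
A sketch of how the renamed joins and smart results might read -- the declarations and method names (ids, add, remove) are illustrative, not settled, and someGroup stands in for a previously fetched Group:

class Person(SQLObject):
    employer = ForeignKey('Company')      # unchanged
    addresses = ManyToOneJoin('Address')  # was MultipleJoin
    groups = ManyToManyJoin('Group')      # was RelatedJoin
    spouse = OneToOneJoin('Person')       # new

person = Person.get(1)
byCity = person.addresses.orderBy('city')  # smart, orderable result
groupIDs = person.groups.ids()             # retrieve IDs instead of objects
person.groups.add(someGroup)               # set-like: relate an object
person.groups.remove(someGroup)            # set-like: unrelate an object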

It would be nice to make the Join protocol more explicit and public, so other kinds of joins (e.g., three-way) could be more accessible.

Created 19 Jan '04
Modified 14 Dec '04

Comments:

Hi Ian:)

This really sounds strange to me. Although I have used Modeling for over 2 years now, I'm wondering why you want to have the same thing in SQLObject.

Should SQLObject have the same API as Modeling, or MiddleKit?

I really think SQLObject is good at what it is meant to do: fetching rows from the DB, caching results, and saving them when needed. Modeling is far more complex, with its validation scheme, nested editing contexts, and so on...

Besides, I'm not a SQLObject developer, but I don't really understand why you are again trying to _clone_ EOF. If you really want this kind of stuff, why don't you help the Modeling guy (Big), who (I really think) would be happy to count a new developer? (The team is quite small right now.)


-- Random things: --
- In EOF there are qualifiers for fetching objects, so ec.Person.get(1) really sounds strange.
- The major reason I like SQLObject is that it does what it should. There is nothing magic about it: simply caching by IDs. If you try to write an application that uses a DB API which does a lot of magic, like 'nested' editing contexts, things get far more complex, because you need to think about 'which context am I in', 'is this object in my parent EC?' and so on. This is why I now use SQLObject over Modeling: I need something simple and thread-safe.
- And anyway, if you really want to do something like this, take a closer look at the Modeling API. To insert a new object in the DB, simply add it to the EditingContext and ec.save() it; no new() magic.


Bye Bye..
# Jkx

The only change I'm suggesting from Modeling is the editing context, which seems like a more elegant way to deal with transactions than SQLObject's current implementation of pseudo-connections (which is a kind of confusing API). This doesn't change SQLObject dramatically, and wouldn't affect most of the code internally. And given that change, SQLObject would still be very different from Modeling.
# Ian Bicking

But the EditingContext is the core of Modeling. Caching, fetching, etc. are done in the EditingContext. Could you explain what the difference will be?
# Jkx

I'm not planning on taking the EditingContext from Modeling wholesale, it just happens to be a good metaphor. The actual API will probably be significantly different. And, in some cases (like getting and storing objects) I specifically don't plan to do it like Modeling.

The actual API I will implement? I don't know, I haven't figured that out yet ;)
# Ian Bicking

Can the ID be a compound object? For example, for replication I could create a primary key of (DB_instance_ID, row_ID). Can SQLObject work with such a table?
# dvb

Greetings - I just cooked up my first example objects with the current .5x stream of SQLObject and was nicely rewarded. I love the table schema generation - one of the things I've always loathed was maintaining the table/object mapping metadata in two different places.

I'm really shocked at how liberating this is! ;)

I've done a number of low- to middle-complexity solutions using "ORM" (tuxweb.de) as well as Titus Brown's Cucumber solution (which makes elegant use of PostgreSQL inheritance, but is therefore not portable to other systems). I rather like the API in SQLObject; a quick comparison of current performance shows it to be somewhat more efficient at some common tasks, and easy to sidestep when required.

Just going to check out the new work in subversion and see how that compares.
Mike
# Mike Watkins

For the extra table information, I think it would also be useful to have a way to change the character set. I'm not sure if that is handled by InnoDB, but I thought I'd mention it just in case.
# Brian Cooley-Gilliom