Monday, June 4, 2007

Transaction Oriented Architecture (aka The TOA of Tom)

I know, I know. It's spelled T-A-O. 

I'm willing to go out on a limb and say most new programming in the last 15 years has used object-oriented languages and relational databases. I hear complaints already. True, there are exceptions, but however big you think they are they're only rounding errors compared to the military-industrial complex which has become the object-relational mega market featuring Java, C#, PHP, Python and other OO programming languages. 

I'll go out on another limb (while I'm here) and claim most of those systems are horribly designed. I know that because a) they didn't ask for my help and b) everyone's complaining the offspring of OO/RDB shotgun weddings have contorted features. Design patterns for persistence and object hibernation have become more most-popular-kludges and less best-practice how-tos. 

What is a transaction? 

For us, a transaction is anything that might happen in our system for which we want to provide security and an audit trail. Everything that changes our system is recorded with a user name, a program name, a post date, an effective date, and a transaction type. Financial transactions include amounts and account numbers. Data changes the object and field changed, and the old and new values. Right or wrong, whatever happens in our system leaves a trail. 

That trail provides two important features. First, no matter what happens to the system we know how it got that way. If an error is made we know how it happened and how to correct it. It's important to understand we don't back out transactions. We add new transactions reversing the negative (incorrect) affects of the errors. Pretending bad things didn't happen by backing-out transactions and resetting the data to pre-mistake values doesn't make them go away. Erasing data only creates an opportunity for the database to get out-of-sync with the audit trail and weakens the system's integrity if the popular mechanism for correcting mistakes is erasing the evidence. 

In the movie Clear and Present Danger, the President of the United States is worried the press will discover one of his best friends was laundering money for a Columbian drug cartel. Presidential advisers recommend putting distance between the president and his murdered friend. Responding to the suggestions the press won't find out, the president responds saying, "They will. They always do." CIA analyst Jack Ryan (Harrison Ford) recommends the opposite approach saying, "There's no sense defusing a bomb after it's already gone off." 

Always forward, never backward. 

The second major feature is the ability to tell users (and auditors) what the system looked like at any point in time. It's easy to tell users what it looks like now, but what about last month? Or the second quarter last year? Or how about comparing this year-to-date with last-year-to-date? Transactions make that possible. 

Related to that second feature is the fact that however the system appears today it is merely the end result of all the transactions posted since our current system bootstrapped October 1, 2002. Most of the database can be recreated by reposting transaction history. For example, today we could lose the entire account table's contents and derive the correct ending balances from history. 

Database integrity is paramount and transaction history is an important ingredient. If the database is incorrect or its integrity lost, everything else is cosmetic. 

More on transactions later. 

Wagging the database 

If the database and its integrity are so important, when should its design be influenced by programming language? In a word, never. To believe otherwise suggests a database' design should change whenever the application programming language changes and that applications written in other languages can not be properly accommodated with a database designed for another. Conceiving of that kind of dependence is counter-axiomatic to tenants OO designers strive for: high cohesion within a module and low coupling between modules. Making database persistence an innate responsibility to every business object destroys each object's cohesion and tightly couples those same objects to something they shouldn't be co-dependent with-a database. The result is objects with low-cohesion and tight-coupling. To make matters worse, the contagion isn't another object in the OOPL that can be easily coded-away but a remote object that throws exceptions, is often network-remote, and is affected by many more external influences to design and performance than the objects attempting to integrate its utility with their persistence models. 

".. like another hole in the head" 

In our own system, the database is just as easily accessed from PHP and Smalltalk as it is from Sybase's isql or the open-source is (written in C). Even were the database inclined to favor object oriented languages like PHP and Smalltalk it must still treat other paradigms equitably. I can think of few other languages as far apart in paradigm as Smalltalk and C. 

Think of it as a due-process clause protecting the rights of applications no matter their language, paradigm, compilation, interpretation, or generation, or OS origins. 

It's difficult to imagine all these products, frameworks, and seminars for something we want to pretend doesn't exist-the database. We don't need consultants-we need therapy. 

The first step of whatever 12-step program launches the next beverage revolution and meets Wednesday nights is to admit our denial. It's not that we don't know the database exists-it's that we want it to disappear. We want it to go away. When we rub our eyes hard, in-between the kaleidescope colors we imagine pure object databases undetectable to the untrained pointy-haired manager or offshore programmer. But when our eyes open we're reminded how horribly awkward, non-standard, and even more difficult-to-share-than-query object-oriented databases are (and wait 'til you see the billing rates those consultants get). 

What does it look like? 

Everything InStream does, from production support all the way down to development follows a paradigm, if you will, of transaction oriented processing. It's a way of looking at the purpose of your database and how programs interact with it that nullifies any impulse programmers may have to attempt to make the database reflect OO models or OO models reflecting the DB design. 

After the database has been designed and independent of whatever language will predominate application programming, stored procedures are created for adding, changing, and querying the database. With those procedures in place scripts can easily be written to populate the database with data so that tests can prove the database is complete and properly designed. It is important to begin exercising the database early in a development process to discover any relational awkwardness and to establish performance baselines for both data changes and queries. 

Using database stored procedures also insulates application code from database design changes from the trivial (renaming a column) to the extreme (redesigned tables). Procedures form the first and lowest-level Application Programming Interface (API) of a layered system. Additionally, most relational database provide mechanisms for finding inter-dependencies within the database, like which views depend on which tables, which tables contain which columns, and which procedures depend on which database objects, as well as dependencies between the procedures themselves. 

HINT: Don't embed SQL. Once in production you want to minimize the impact of any post-production changes. You only release a system once. Everything after that is an update. The more difficult updates are to deploy the more risk is involved attempting them which may lead to reluctance tackling them and the more traumatic they become. Keep 'em simple and your system will be able to grow without stretching your customers patience.

No comments:

Post a Comment

Follow @TomGagne