Monday, June 18, 2007

Databases as Objects: My schema is a class

In my previous article I wrote that the database is the biggest object in my system. If that is the case, I should be able to test the concept against the Gang of Four's Design Patterns to see how the idea holds up. 

But before doing that I need to define, in database terms, what classes are and what their instances may look like. 

In OO terms, a class is a template that defines what its instances look like. The Date class in Cincom's VisualWorks Smalltalk defines two instance variables, day and year. Given those two instance variables, any instance of Date can keep track of a date. 

My database has a schema. That schema can be executed as a sequence of data definition language (DDL) statements to create a new instance. In addition to our production database, we have multiple other instances, created from the same schema, that our developers and quality analysts use to test the system. 
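
To make that concrete, here's roughly what instantiating the schema looks like in Sybase-flavored SQL. The database name and script file are invented for illustration:

    create database efinnet_dev
    go
    use efinnet_dev
    go
    -- then replay the schema's DDL into the empty database, e.g.:
    -- isql -U sa -i schema.sql

Run the same script against a different database name and you have another instance of the same class.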

Part of a class's template defines its instances' methods. Which operations do they support? What behaviors can a user of any of a class's instances expect to be available? Inside a class hierarchy, classes inherit the behavior of their superclasses--the classes from which they derive their base behavior. A class can add new behavior or override inherited behavior to create an object with unique capabilities not available in any of its ancestors. 

Even before I extend any of my database's behaviors, it, too, has default behaviors. At the lowest level I can use SQL statements to introspect and interact with my database in all kinds of low-level ways. On their own, these low-level behaviors know nothing of my application or its unique abilities and requirements. Like a class, though, my database can add new behavior or even override default behavior, using stored procedures and views to provide unique capabilities that would otherwise be unavailable or impractical. 
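
For example (the table, view, and procedure names here are invented), a view or stored procedure layers application-specific behavior over the generic SQL level:

    -- a view overriding the "raw" table with an application-level notion
    create view active_accounts as
        select account_id, account_name
        from account
        where status = 'A'
    go

    -- a procedure adding behavior no base database has out of the box
    create procedure post_payment
        @account_id int,
        @amount     money
    as
        insert payment (account_id, amount, post_date)
        values (@account_id, @amount, getdate())
    go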

In the world of Sybase, every database inherits the attributes and behavior of a database named model. 

    model
      |
    Efinnet

By itself, this is beginning to look like a class tree--though a very shallow one. But something's membership in a tree isn't made more or less legitimate by the tree's depth (or lack of it). In fact, many OO designers advocate for shallower hierarchies. Either way, our database fits right in. 

We already talked about instance variables and methods, but what are some of the other OO-ish things my database can do? 

Persistence - One of its most important features is its ability to persist itself on disk and maintain its integrity. The entire state of my system is preserved and maintained inside my database object. 

Introspection - My database can tell me things about itself, its variables, and its methods (see the sketch after this list). 

Composition - My database is composed of other objects called tables. Some of the tables were inherited from its superclass; others were added to extend its functionality.

Singleton - Instances of my database exist as singletons. For each instance of my system one, and exactly one, instance of my database exists to preserve and protect the state of my system. 

Messages - The only way I can communicate with it is by sending messages to it. I cannot (and care not to) manipulate its data directly at a low level (disk) because that would risk its integrity--not referential integrity but disk-level consistency. 

Extensibility - I can extend my database's schema to define new variables (tables) and behaviors (procedures). Even better, I can apply the new schema to its instances. 
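
The introspection sketch promised above: in Sybase, every database carries system tables describing itself. Something like the following lists a database's variables (tables) and methods (procedures and views); only the table name account is invented:

    -- what variables and methods does this instance have?
    select name, type
    from sysobjects
    where type in ('U', 'P', 'V')   -- user tables, procedures, views
    order by type, name

    -- what are the fields (columns) of one of its variables?
    select c.name
    from syscolumns c, sysobjects o
    where o.name = 'account'
      and c.id   = o.id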

It's amazing it took me 20+ years to recognize the similarities between objects and databases. But now that I'm confident my database is an instance of my schema, and in other important respects is in fact an object (a singleton) in its own right, I can start visiting the GoF's patterns to see how well they apply. 

Monday, June 11, 2007

I remember my first time...

A recent ACM Queue Advice Column by Kode Vicious, called Advice to a Newbie, asked:
Do you remember the first time? The first time when, after struggling with a piece of code, you felt not only "I can do this!" but also "I love doing this!"
I still remember that rush. It was addictive. When it happened, I decided what I wanted to be when I grew up: a computer programmer.

In 1983 I was a senior at Troy High School in Michigan. I was taking a computer programming elective at the same time I was taking trigonometry. We were learning BASIC on Apple IIe computers. Our final assignment was to write a graphic animation of something. Anything. Mine was influenced both by being a high-school student and by America's 1981 return to space with NASA's shuttle program.

Using BASIC and the IIe's low-resolution graphics (pixels that seemed the size of a Tic-Tac), I simulated the launch of the Space Shuttle Columbian (did I mention I was in high school?). My rendering of the shuttle was as good as it could have been, considering the resolution, and included a 10-second countdown, smoke, flames, and a lift-off off the top of the screen. After that, the shuttle flew right-to-left, with the appearance of motion provided by stars in the background moving left-to-right. The loops were gigantic. Inside the loops the program made sure the stars disappeared behind the shuttle and reappeared at the appropriate time.

Then the pièce de résistance: a PacMan moved across the screen and gobbled the shuttle into nothingness.

I got an A.

But better than that, I triumphed over the task using BASIC and geometry. The loops moving the stars non-destructively behind the shuttle were nothing compared to the routines to open and close the PacMan's mouth as it moved across the screen. I remember how impressed my parents pretended to be when I showed them the print-out of the code.

I also remember how slowly the program ran. Everything seemed to be happening under water. I could almost make out each line of the PacMan's mouth closing, drawn yellow then black again to open it, as it devoured the space ship.

But then something amazing happened.

Our teacher, Mr. Ralph Nutter, who was my older brother's math teacher and swim coach a few years earlier, demonstrated all our projects in front of the entire class--but now they were compiled into machine language. The lift-off was smooth and the screen almost looked as though it were on fire. Most importantly, my PacMan moved across the screen so smoothly and cleanly the jagged resolution was invisible, and it seemed to race over the shuttle so gloriously I could hear the game's music playing inside my head.

And I was hooked.

That was 24 years ago, and to this day it is one of the biggest life-changing events of my life. Almost everything that's happened to me since turned on what happened that last May in 1983, 4th hour, in the closing days of my last year in school.

Wednesday, June 6, 2007

The database is the biggest object in my system

After I posted a link to my article The TOA of Tom, a couple of interesting discussions occurred inside comp.object. While responding to Bryce Jacobs, a better way of describing what we're doing came to me. It's buried in: 

In fact, after inspecting multiple C APIs for LDAP, iCal, and other libraries, it appears it's not even foreign to C programmers. Often a structure is used to hold state information for a connection to a resource, but the format of that information isn't exposed to API users except through the API. Even when a method may only be returning a value from the structure, API programmers know that level of indirection affords them flexibility inside the structure without negatively impacting all the API's users.

So a common pattern is used by both C and OO programmers. What my paper is promoting (and I'll try to do a better job explaining it) is that the same pattern be applied to how OO programs access the DB.


Essentially, my paper on transaction processing encourages thinking of the database as one big object with all the rules for data hiding and interfaces OO programmers are already acquainted with. 

Why shouldn't applications have embedded SQL? Because it's the same as accessing the private data members of an object. It shouldn't be done. OO programmers know the correct way to interface with an object is to use its method interface--not to attempt direct manipulation of the object's data. OO programmers' attempts to violate that rule are what cause so much frustration mapping the application's data graph into a relational database's tables, rows, and columns. Those things belong to the DB--not to the application. 
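
The difference is easy to see side by side. The names below are invented, but the shape is the point: the first form welds the application to the table design; the second gives the application a method to call.

    -- embedded SQL: the application reaches into the object's private data
    -- select balance from account where account_id = 42

    -- a method interface: the table design can change behind it
    create procedure get_balance
        @account_id int
    as
        select balance
        from account
        where account_id = @account_id
    go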

Now, OO programmers and system designers can return to their favorite Patterns books and reevaluate the lessons from a new perspective. Should make some interesting reading. 

Monday, June 4, 2007

Transaction Oriented Architecture (aka The TOA of Tom)

I know, I know. It's spelled T-A-O. 

I'm willing to go out on a limb and say most new programming in the last 15 years has used object-oriented languages and relational databases. I hear complaints already. True, there are exceptions, but however big you think they are, they're only rounding errors compared to the military-industrial complex that the object-relational mega-market has become, featuring Java, C#, PHP, Python, and other OO programming languages. 

I'll go out on another limb (while I'm here) and claim most of those systems are horribly designed. I know that because a) they didn't ask for my help and b) everyone's complaining that the offspring of OO/RDB shotgun weddings have contorted features. Design patterns for persistence and object hibernation have become less best-practice how-tos and more most-popular kludges. 

What is a transaction? 

For us, a transaction is anything that might happen in our system for which we want to provide security and an audit trail. Everything that changes our system is recorded with a user name, a program name, a post date, an effective date, and a transaction type. Financial transactions include amounts and account numbers. Data changes record the object and field changed, and the old and new values. Right or wrong, whatever happens in our system leaves a trail. 
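
A minimal sketch of what such a trail might look like as a table. The column names are invented (ours differ), but the ingredients are the ones listed above:

    create table transaction_log (
        tran_id    numeric(12,0) identity,
        tran_type  char(8)      not null,
        user_name  varchar(30)  not null,
        program    varchar(30)  not null,
        post_date  datetime     not null,  -- when it was recorded
        eff_date   datetime     not null,  -- when it takes effect
        account_id int          null,      -- financial transactions
        amount     money        null,
        old_value  varchar(255) null,      -- data changes
        new_value  varchar(255) null
    )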

That trail provides two important features. First, no matter what happens to the system we know how it got that way. If an error is made we know how it happened and how to correct it. It's important to understand we don't back out transactions. We add new transactions reversing the negative (incorrect) effects of the errors. Pretending bad things didn't happen by backing out transactions and resetting the data to pre-mistake values doesn't make them go away. Erasing data only creates an opportunity for the database to get out-of-sync with the audit trail, and it weakens the system's integrity if the popular mechanism for correcting mistakes is erasing the evidence. 

In the movie Clear and Present Danger, the President of the United States is worried the press will discover one of his best friends was laundering money for a Colombian drug cartel. Presidential advisers recommend putting distance between the president and his murdered friend. Responding to suggestions that the press won't find out, the president says, "They will. They always do." CIA analyst Jack Ryan (Harrison Ford) recommends the opposite approach, saying, "There's no sense defusing a bomb after it's already gone off." 

Always forward, never backward. 

The second major feature is the ability to tell users (and auditors) what the system looked like at any point in time. It's easy to tell users what it looks like now, but what about last month? Or the second quarter last year? Or how about comparing this year-to-date with last-year-to-date? Transactions make that possible. 

Related to that second feature is the fact that however the system appears today it is merely the end result of all the transactions posted since our current system bootstrapped October 1, 2002. Most of the database can be recreated by reposting transaction history. For example, today we could lose the entire account table's contents and derive the correct ending balances from history. 
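
Both features fall out of the same kind of query. Against the hypothetical transaction_log table sketched above, an as-of balance is just a sum with a date filter, and dropping the filter rebuilds today's balances from scratch:

    -- balances as of the end of Q2 last year
    select account_id, sum(amount) as balance
    from transaction_log
    where eff_date <= 'Jun 30 2006'
    group by account_id

    -- the same query without the date filter should reproduce
    -- the account table's current balances exactly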

Database integrity is paramount and transaction history is an important ingredient. If the database is incorrect or its integrity lost, everything else is cosmetic. 

More on transactions later. 

Wagging the database 

If the database and its integrity are so important, when should its design be influenced by a programming language? In a word, never. To believe otherwise suggests a database's design should change whenever the application programming language changes, and that applications written in other languages cannot be properly accommodated by a database designed for one of them. That kind of dependence is counter-axiomatic to the tenets OO designers strive for: high cohesion within a module and low coupling between modules. Making database persistence an innate responsibility of every business object destroys each object's cohesion and tightly couples those same objects to something they shouldn't be co-dependent with--a database. The result is objects with low cohesion and tight coupling. To make matters worse, the contagion isn't another object in the OOPL that can be easily coded away but a remote object that throws exceptions, is often network-remote, and is subject to many more external influences on design and performance than the objects attempting to integrate its utility with their persistence models. 

".. like another hole in the head" 

In our own system, the database is just as easily accessed from PHP and Smalltalk as it is from Sybase's isql or open-source C programs. Even were the database inclined to favor object-oriented languages like PHP and Smalltalk, it must still treat other paradigms equitably. I can think of few other languages as far apart in paradigm as Smalltalk and C. 

Think of it as a due-process clause protecting the rights of applications no matter their language, paradigm, compilation or interpretation, generation, or OS of origin. 

It's difficult to imagine all these products, frameworks, and seminars for something we want to pretend doesn't exist--the database. We don't need consultants--we need therapy. 

The first step of whatever 12-step program launches the next beverage revolution and meets Wednesday nights is to admit our denial. It's not that we don't know the database exists--it's that we want it to disappear. We want it to go away. When we rub our eyes hard, in between the kaleidoscope colors we imagine pure object databases undetectable to the untrained pointy-haired manager or offshore programmer. But when our eyes open we're reminded how horribly awkward, non-standard, and even more difficult-to-share-than-query object-oriented databases are (and wait 'til you see the billing rates those consultants get). 

What does it look like? 

Everything InStream does, from production support all the way down to development, follows a paradigm, if you will, of transaction-oriented processing. It's a way of looking at the purpose of your database and how programs interact with it that nullifies any impulse programmers may have to make the database reflect OO models, or OO models reflect the DB design. 

After the database has been designed, and independent of whatever language will predominate application programming, stored procedures are created for adding, changing, and querying the database. With those procedures in place, scripts can easily be written to populate the database with data so that tests can prove the database is complete and properly designed. It is important to begin exercising the database early in a development process to discover any relational awkwardness and to establish performance baselines for both data changes and queries. 
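
Exercising it early can be as simple as a throwaway script that pounds the procedures, not the tables (add_account here is an invented example):

    -- populate a test instance through the same interface
    -- the application will eventually use
    declare @i int, @name varchar(30)
    select @i = 1
    while @i <= 1000
    begin
        select @name = 'test account ' + convert(varchar(10), @i)
        exec add_account @name = @name, @branch_id = 1
        select @i = @i + 1
    end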

Using database stored procedures also insulates application code from database design changes, from the trivial (renaming a column) to the extreme (redesigned tables). Procedures form the first and lowest-level Application Programming Interface (API) of a layered system. Additionally, most relational databases provide mechanisms for finding inter-dependencies within the database, like which views depend on which tables, which tables contain which columns, and which procedures depend on which database objects, as well as dependencies between the procedures themselves. 
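
In Sybase those dependencies are a query (or a system procedure) away. Only the procedure name below is invented:

    -- everything the procedure post_payment depends on
    exec sp_depends post_payment

    -- or straight from the system tables
    select o2.name, o2.type
    from sysdepends d, sysobjects o1, sysobjects o2
    where o1.name = 'post_payment'
      and d.id    = o1.id
      and o2.id   = d.depid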

HINT: Don't embed SQL. Once in production you want to minimize the impact of any post-production changes. You only release a system once. Everything after that is an update. The more difficult updates are to deploy, the more risk is involved in attempting them, the more reluctant everyone becomes to tackle them, and the more traumatic they become. Keep 'em simple and your system will be able to grow without stretching your customers' patience.

Friday, June 1, 2007

Programming Rules

In the seven years our company has been around, the technological landscape has changed dramatically. Additionally, the way our system is assembled and the components it's assembled from have changed. Can a system go through that much change and still be consistent with an architecture described seven years earlier?

Yes.

Rules, as I've said in earlier postings, are more durable than technology. That is why rules are important. Design rules persist across technological smokestacks. Good rules are good rules regardless which database you use, regardless which language you develop in, regardless whether your application exists on the web, is client-server, or exists stand-alone in the back office.

I won't give up all the secret ingredients, but here are some of our favorites at InStream Services:

It's better to be explicit than implicit

The code our systems are built with, regardless of programming language, isn't susceptible to the fragile memories humans suffer. Code never forgets, but humans do. Has forgetting ever happened to you? It's happened to me. Considering how complicated software systems are and how many distractions humans are challenged with day to day, code that distributes its implementation across classes, procedures, and triggers doesn't help programmers understand how something works, and actually erects barriers to its enhancement. This is one of the reasons our system has only a single database trigger. Want to know what a procedure does? Everything it does is right inside the procedure's code.

If something is broken I don't want to be hunting around forever tracking it down.
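
As a sketch of the difference (invented names again): rather than an insert trigger quietly writing the audit row, the procedure does everything itself, in one readable place.

    create procedure apply_credit
        @account_id int,
        @amount     money
    as
    begin
        -- the work
        update account
        set balance = balance + @amount
        where account_id = @account_id

        -- the audit trail, explicitly, right here--not hidden in a trigger
        insert credit_log (account_id, amount, user_name, post_date)
        values (@account_id, @amount, suser_name(), getdate())
    end
    go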

No Polling

Besides wasting CPU, it's a cop-out to a work-flow problem. There's no excuse for not knowing when the next thing needs to be done.

No Parsing

We don't write compilers or invent languages. We're surrounded by technology that already knows how to parse grammars. Find the right tool and use it. There are better ways to know what's coming next.

Once and only once

I hate solving problems twice. Keeping both sections of code in-sync with each other and the business is like making sure twins both get the same amount of ice cream. By the time I've measured it perfectly it's already melting. If it seems like it needs to be solved in two different places chances are it should be moved somewhere else and solved once.

No harm running twice

Humans are fallible. Everyone knows it. Why write programs, scripts, fixes, or patches that depend on being run under perfect conditions? Fix scripts should make sure things are still broken before they try to fix something that isn't. Programs should know not to do anything if there isn't anything to do. Nearly everything we've written can be run as many times as people feel like running it without negative consequences. There are enough things that can go wrong that aren't under our control--let's not add to them with things we do control.
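
A fix script in that spirit looks something like this (names invented): it verifies things are still broken before touching anything, so a second or tenth run is harmless.

    -- re-activate an account a bug wrongly expired
    update account
    set status = 'A'
    where account_id = 42
      and status = 'X'      -- only if it's still in the broken state

    if @@rowcount = 0
        print 'nothing to fix--already corrected'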

Do one thing and do it well

Structured programming, refactoring, and the Unix shells have demonstrated how powerful a concept it is to do one thing and do it well. Narrowing the scope of a program reduces side effects and increases its utility. When all a system's programs do only one thing there is less chance of overlap and duplication. It also means that when something goes wrong there are fewer programs that require fixing--and fewer places to look in the first place.

Desirable undefined behavior

What happens when an unidentified transaction arrives? What happens when something occurs that wasn't planned for and stopping the system is an unacceptable option? You make sure there's default behavior that does something harmless, can be audited, and alerts the authorities.

One of the constraints of our system is to be able to import supply chain data from multiple external systems we don't control (see Database Prenuptials). Because 3rd-party data is sometimes incomplete, our systems need to do something reasonable with imperfect data, especially since the alternative risks the database's integrity.

In production it's important to know that when bugs appear, the likelihood of something bad happening is minimized because there are appropriate default behaviors. We know where our system is most at-risk and write our software with harmless fail-safes.
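
Sketched in SQL (the tables and the alerting procedure are invented stand-ins), the harmless default might look like:

    create procedure import_tran
        @tran_type char(8),
        @raw_data  varchar(255)
    as
    begin
        if not exists (select 1 from tran_type where code = @tran_type)
        begin
            -- harmless: park it where it can't hurt anything
            insert suspense_tran (tran_type, raw_data, post_date)
            values (@tran_type, @raw_data, getdate())

            -- auditable and alarming: tell somebody
            exec notify_operators 'unknown transaction type imported'
            return 0
        end

        -- ...normal posting continues here...
    end
    go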

These are some of our rules. At times we've been tempted to break them, but found ways to stick to them. As I wrote in the first Rules to Live By article, when multiple programmers and designers are working with little supervision it's best if they all know what the rules are so no one, especially me, is surprised with the implementation--except in a good way.