Monday, June 18, 2007

Databases as Objects: My schema is a class

In my previous article I wrote that the database is the biggest object in my system. If that is the case, I should be able to test the concept against the Gang of Four's Design Patterns to see how the idea holds up. 

But before doing that I need to define, in database terms, what classes are and what their instances may look like. 

In OO terms, a class is a template that defines what its instances look like. The Date class in Cincom's VisualWorks Smalltalk defines two instance variables, day and year. Given those two instance variables, any instance of Date can keep track of a date. 

My database has a schema. That schema can be executed as a sequence of data definition language (DDL) statements to create a new instance. In addition to our production database, we have multiple other instances, created with the same schema, that our developers and quality analysts use to test the system. 
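The parallel is easy to sketch with any SQL engine. Here is a minimal illustration using Python's sqlite3 (the tables and columns are hypothetical, not our actual schema):

```python
import sqlite3

# The schema is a sequence of DDL statements: a template for instances.
# Table and column names here are made up for illustration.
SCHEMA = """
CREATE TABLE account (
    account_no INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE txn (
    txn_id     INTEGER PRIMARY KEY,
    account_no INTEGER REFERENCES account,
    amount     REAL NOT NULL
);
"""

def new_instance():
    """Run the DDL to 'instantiate' a fresh database from the schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

# Production, development, and QA databases are all instances of one schema.
production  = new_instance()
development = new_instance()
```

Each call produces an independent instance of the same template, just as each `new` produces an independent instance of a class.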

Part of a class' template defines its instances' methods. Which operations do they support? What behaviors can a user of any of a class' instances expect to be available? Inside a class hierarchy, classes inherit the behavior of their superclasses--the classes from which they derive their base behavior. A class can add new behavior or override inherited behavior to create an object with unique capabilities not available in any of its ancestors. 

Even before I extend any of my database's behaviors, it, too, has default behaviors. At the lowest level I can use SQL statements to introspect and interact with my database in all kinds of low-level ways. On their own, these low-level behaviors know nothing of my application or its unique abilities and requirements. Like a class, though, my database lets me add new behavior, or even override default behavior, using stored procedures and views to provide unique capabilities that would otherwise be unavailable or impractical. 

In the world of Sybase, every database inherits the attributes and behavior of a database named Model. 

Model
  |
Efinnet

By itself, this is beginning to look like a class tree--though a very shallow one. But whether something belongs in a tree doesn't depend on the tree's depth (or lack of it). In fact, many OO designers advocate for shallower hierarchies. Either way, our database fits right in. 

We already talked about instance variables and methods, but what are some of the other OO-ish things my database can do? 

Persistence - One of its most important features is its ability to persist itself on disk and maintain its integrity. The entire state of my system is preserved and maintained inside my database object. 

Introspection - My database can tell me things about itself, its variables, and its methods. 

Composition - My database is composed of other objects called tables. Some of the tables were inherited from its superclass, others were added to extend its functionality.

Singleton - Instances of my database exist as singletons. For each instance of my system one, and exactly one, instance of my database exists to preserve and protect the state of my system. 

Messages - The only way I can communicate with it is by sending messages to it. I cannot (and care not to) manipulate its data directly at a low level (disk) because that would risk its integrity--not referential integrity, but disk-level consistency. 

Extensibility - I can extend my database's schema to define new variables (tables) and behaviors (procedures). Even better, I can apply the new schema to its instances. 
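Introspection, for instance, works through any SQL database's catalog--Sybase's sysobjects and syscolumns, or sqlite's sqlite_master. A sketch in Python's sqlite3 (the table is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, name TEXT)")

# Introspection: the database can tell me about its variables (tables)...
tables = [row[0] for row in
          db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# ...and about each table's columns.
columns = [row[1] for row in db.execute("PRAGMA table_info(account)")]

print(tables)   # ['account']
print(columns)  # ['account_no', 'name']
```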

It's amazing it took me 20+ years to recognize the similarities between objects and databases. But now that I'm confident my database is an instance of my schema, and in other important respects is in fact an object (a singleton) of its own, I can start visiting the GoF's patterns to see how well they apply. 

Monday, June 11, 2007

I remember my first time...

A recent ACM Queue Advice Column by Kode Vicious, called Advice to a Newbie, asked:
Do you remember the first time? The first time when, after struggling with a piece of code, you felt not only "I can do this!" but also "I love doing this!"
I still remember that rush. It was addictive. When it happened I decided what I wanted to be when I grew up: a computer programmer.

In 1983 I was a senior at Troy High School, in Michigan. I was taking a computer programming elective at the same time I was taking trigonometry. We were learning BASIC on Apple IIe computers. Our final assignment was to write a graphic animation of something. Anything. Mine was influenced by both being a high-school student and America's 1981 return to space with NASA's shuttle program.

Using BASIC and the IIe's low-resolution graphics (pixels that seemed the size of a Tic-Tac) I simulated the launch of the Space Shuttle Columbian (did I mention I was in high school?). My rendering of the shuttle was as good as it could have been, considering the resolution, and included a 10-second count-down, smoke, flames, and a lift-off off the top of the screen. After that the shuttle was flying right-to-left, with the appearance of motion provided by stars in the background moving left-to-right. The loops were gigantic. Inside the loops the program made sure the stars disappeared behind the shuttle and reappeared at the appropriate time.

Then the pièce de résistance, a PacMan moved across the screen and gobbled the shuttle into nothingness.

I got an A.

But better than that, I triumphed over the task using BASIC and geometry. The loops moving the stars non-destructively behind the shuttle were nothing compared to the routines to open and close the PacMan's mouth as it moved across the screen. I remember how impressed my parents pretended to be when I showed them the print-out of the code.

I also remember how slow the program ran. It seemed everything was happening under water. I could almost make out each line of the PacMan's mouth closing, drawn in yellow, then black again to open it, as it devoured the space ship.

But then something amazing happened.

Our teacher, Mr. Ralph Nutter, who was my older brother's math teacher and swim coach a few years earlier, demonstrated all our projects in front of the entire class--but now they were compiled into machine language. The lift-off was smooth and the screen almost looked as though it were on fire. Most importantly, my PacMan moved across the screen so smoothly and cleanly the jagged resolution was invisible, and it seemed to race over the shuttle so gloriously I could hear the game's music playing inside my head.

And I was hooked.

That was 24 years ago and to this day, it is one of the single biggest life-changing events of my life. Almost everything that's happened to me since turned on what happened that last May in 1983, 4th hour, in the closing days of my last year in school.

Wednesday, June 6, 2007

The database is the biggest object in my system

After I posted a link to my The TOA of Tom article, a couple of interesting discussions occurred inside comp.object. While responding to Bryce Jacobs, a better way of describing what we're doing came to me. It's buried in: 

In fact, after inspecting multiple C APIs for LDAP, iCAL, and other libraries it appears it's not even foreign to C programmers. Often a structure is used to hold state information for a connection to a resource, but the format of that information isn't exposed to API users except through an API. Even when a method may only be returning a value from the structure, API programmers know that level of indirection affords them flexibility inside the structure they may use without negatively impacting all the API's users.

So a common pattern is used by both C and OO programmers. What my paper is promoting (and I'll try to do a better job explaining) is that the same pattern be applied to how OO programs access the DB.


Essentially, my paper on transaction processing encourages thinking of the database as one big object with all the rules for data hiding and interfaces OO programmers are already acquainted with. 

Why shouldn't applications have embedded SQL? Because it's the same as accessing the private data members of an object. It shouldn't be done. OO programmers know the correct way to interface with an object is to use its method interface--not to attempt direct manipulation of the object's data. OO programmers' attempts to violate that rule are what cause so much frustration mapping the application's data graph into a relational database's tables, rows, and columns. Those things belong to the DB--not to the application. 

Now, OO programmers and system designers can return to their favorite Patterns books and reevaluate the lessons from a new perspective. Should make some interesting reading. 

Monday, June 4, 2007

Transaction Oriented Architecture (aka The TOA of Tom)

I know, I know. It's spelled T-A-O. 

I'm willing to go out on a limb and say most new programming in the last 15 years has used object-oriented languages and relational databases. I hear complaints already. True, there are exceptions, but however big you think they are, they're only rounding errors compared to the military-industrial complex the object-relational mega-market has become, featuring Java, C#, PHP, Python, and other OO programming languages. 

I'll go out on another limb (while I'm here) and claim most of those systems are horribly designed. I know that because a) they didn't ask for my help and b) everyone's complaining the offspring of OO/RDB shotgun weddings have contorted features. Design patterns for persistence and object hibernation have become more most-popular kludges than best-practice how-tos. 

What is a transaction? 

For us, a transaction is anything that might happen in our system for which we want to provide security and an audit trail. Everything that changes our system is recorded with a user name, a program name, a post date, an effective date, and a transaction type. Financial transactions include amounts and account numbers. Data changes include the object and field changed, and the old and new values. Right or wrong, whatever happens in our system leaves a trail. 

That trail provides two important features. First, no matter what happens to the system we know how it got that way. If an error is made we know how it happened and how to correct it. It's important to understand we don't back out transactions. We add new transactions reversing the negative (incorrect) effects of the errors. Pretending bad things didn't happen by backing out transactions and resetting the data to pre-mistake values doesn't make them go away. Erasing data only creates an opportunity for the database to get out of sync with the audit trail, and it weakens the system's integrity if the popular mechanism for correcting mistakes is erasing the evidence. 
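Correcting by reversal rather than erasure can be sketched like this, using Python's sqlite3 (the column names are illustrative, not our production schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Illustrative audit-trail table: who did what, to which account, for how much.
db.execute("""CREATE TABLE txn_history (
    user_name TEXT, txn_type TEXT, account_no INTEGER, amount REAL)""")

def post(user, txn_type, account, amount):
    db.execute("INSERT INTO txn_history VALUES (?, ?, ?, ?)",
               (user, txn_type, account, amount))

post("tgagne", "PAYMENT",  1001,  150.00)  # oops: should have been 105.00
post("tgagne", "REVERSAL", 1001, -150.00)  # reverse the mistake...
post("tgagne", "PAYMENT",  1001,  105.00)  # ...then post it correctly

balance = db.execute("SELECT SUM(amount) FROM txn_history "
                     "WHERE account_no = 1001").fetchone()[0]
rows = db.execute("SELECT COUNT(*) FROM txn_history").fetchone()[0]
# balance is 105.0, and all three rows survive as the audit trail
```

Nothing is deleted; the mistake, its reversal, and the correction are all part of the record.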

In the movie Clear and Present Danger, the President of the United States is worried the press will discover one of his best friends was laundering money for a Colombian drug cartel. Presidential advisers recommend putting distance between the president and his murdered friend. Responding to the suggestions the press won't find out, the president says, "They will. They always do." CIA analyst Jack Ryan (Harrison Ford) recommends the opposite approach, saying, "There's no sense defusing a bomb after it's already gone off." 

Always forward, never backward. 

The second major feature is the ability to tell users (and auditors) what the system looked like at any point in time. It's easy to tell users what it looks like now, but what about last month? Or the second quarter last year? Or how about comparing this year-to-date with last-year-to-date? Transactions make that possible. 

Related to that second feature is the fact that however the system appears today, it is merely the end result of all the transactions posted since our current system was bootstrapped on October 1, 2002. Most of the database can be recreated by reposting transaction history. For example, today we could lose the entire account table's contents and derive the correct ending balances from history. 
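A sketch of re-deriving account balances from transaction history, again with Python's sqlite3 (the tables are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE txn_history (account_no INTEGER, amount REAL)")
db.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, balance REAL)")

db.executemany("INSERT INTO txn_history VALUES (?, ?)",
               [(1, 100.0), (1, -25.0), (2, 40.0)])

# Disaster: the account table's contents are lost...
db.execute("DELETE FROM account")

# ...but the correct ending balances can be re-derived from history.
db.execute("""INSERT INTO account (account_no, balance)
              SELECT account_no, SUM(amount)
              FROM txn_history GROUP BY account_no""")

balances = dict(db.execute("SELECT account_no, balance FROM account"))
# balances == {1: 75.0, 2: 40.0}
```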

Database integrity is paramount and transaction history is an important ingredient. If the database is incorrect or its integrity lost, everything else is cosmetic. 

More on transactions later. 

Wagging the database 

If the database and its integrity are so important, when should its design be influenced by a programming language? In a word, never. To believe otherwise suggests a database's design should change whenever the application programming language changes, and that applications written in other languages cannot be properly accommodated by a database designed for one of them. That kind of dependence runs counter to the tenets OO designers strive for: high cohesion within a module and low coupling between modules. Making database persistence an innate responsibility of every business object destroys each object's cohesion and tightly couples those same objects to something they shouldn't be co-dependent with: a database. The result is objects with low cohesion and tight coupling. To make matters worse, the contagion isn't another object in the OOPL that can be easily coded away, but a remote object that throws exceptions, is often network-remote, and is affected by many more external influences on design and performance than the objects attempting to integrate its utility into their persistence models. 

".. like another hole in the head" 

In our own system, the database is just as easily accessed from PHP and Smalltalk as it is from Sybase's isql or the open-source is (written in C). Even were the database inclined to favor object oriented languages like PHP and Smalltalk it must still treat other paradigms equitably. I can think of few other languages as far apart in paradigm as Smalltalk and C. 

Think of it as a due-process clause protecting the rights of applications no matter their language, paradigm, compilation, interpretation, generation, or OS origins. 

It's difficult to imagine all these products, frameworks, and seminars for something we want to pretend doesn't exist: the database. We don't need consultants; we need therapy. 

The first step of whatever 12-step program launches the next beverage revolution and meets Wednesday nights is to admit our denial. It's not that we don't know the database exists; it's that we want it to disappear. We want it to go away. When we rub our eyes hard, in between the kaleidoscope colors we imagine pure object databases undetectable to the untrained pointy-haired manager or offshore programmer. But when our eyes open we're reminded how horribly awkward, non-standard, and even more difficult-to-share-than-query object-oriented databases are (and wait 'til you see the billing rates those consultants get). 

What does it look like? 

Everything InStream does, from production support all the way down to development, follows a paradigm, if you will, of transaction oriented processing. It's a way of looking at the purpose of your database, and at how programs interact with it, that nullifies any impulse programmers may have to make the database reflect OO models or to make OO models reflect the DB design. 

After the database has been designed, and independent of whatever language will predominate application programming, stored procedures are created for adding, changing, and querying the database. With those procedures in place, scripts can easily be written to populate the database with data so that tests can prove the database is complete and properly designed. It is important to begin exercising the database early in the development process to discover any relational awkwardness and to establish performance baselines for both data changes and queries. 

Using database stored procedures also insulates application code from database design changes, from the trivial (renaming a column) to the extreme (redesigned tables). Procedures form the first and lowest-level application programming interface (API) of a layered system. Additionally, most relational databases provide mechanisms for finding inter-dependencies within the database: which views depend on which tables, which tables contain which columns, and which procedures depend on which database objects, as well as dependencies between the procedures themselves. 
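Python's sqlite3 has no stored procedures, so in this sketch plain functions stand in for them; in Sybase each would be a CREATE PROCEDURE. The point is the same: callers use only the API, never ad hoc SQL (all names here are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (account_no INTEGER PRIMARY KEY, name TEXT)")

# These functions play the role of stored procedures: the lowest-level API.
def account_add(db, account_no, name):
    db.execute("INSERT INTO account VALUES (?, ?)", (account_no, name))

def account_get(db, account_no):
    row = db.execute("SELECT name FROM account WHERE account_no = ?",
                     (account_no,)).fetchone()
    return row[0] if row else None

# Population scripts use only the API, so a renamed column changes
# the procedures, not their callers.
for no, name in [(1, "Acme Supply"), (2, "InStream")]:
    account_add(db, no, name)

print(account_get(db, 2))  # InStream
```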

HINT: Don't embed SQL. Once in production you want to minimize the impact of any post-production changes. You only release a system once; everything after that is an update. The more difficult updates are to deploy, the more risk is involved in attempting them, the more reluctance there is to tackle them, and the more traumatic they become. Keep 'em simple and your system will be able to grow without stretching your customers' patience.

Friday, June 1, 2007

Programming Rules

In the seven years our company has been around, the technological landscape has changed dramatically. Additionally, the way our system is assembled, and the components it's assembled from, have also changed. Can a system go through that much change and still be consistent with an architecture described seven years earlier?

Yes.

Rules, as I've said in earlier postings, are more durable than technology. That is why rules are important. Design rules persist across technological smokestacks. Good rules are good rules regardless which database you use, regardless which language you develop in, regardless whether your application exists on the web, is client-server, or exists stand-alone in the back office.

I won't give-up all the secret ingredients, but here are some of our favorites at InStream Services:

It's better to be explicit than implicit

The code systems are built with, regardless of programming language, isn't susceptible to the same fragile memory humans suffer. Code never forgets, but humans do. Has that ever happened to you? It's happened to me. Considering how complicated software systems are, and how many distractions humans face day to day, code that distributes its implementation across classes, procedures, and triggers doesn't help programmers understand how something works and actually erects barriers to its enhancement. This is one of the reasons our system has only a single database trigger. Want to know what a procedure does? Everything it does is right inside the procedure's code.

If something is broken I don't want to be hunting around forever tracking it down.

No Polling

Besides wasting CPU, polling is a cop-out to a work-flow problem. There's no excuse for not knowing when the next thing needs to be done.

No Parsing

We don't write compilers or invent languages. We're surrounded by technology that already knows how to parse grammars. Find the right tool and use it. There are better ways to know what's coming next.

Once and only once

I hate solving problems twice. Keeping both sections of code in-sync with each other and the business is like making sure twins both get the same amount of ice cream. By the time I've measured it perfectly it's already melting. If it seems like it needs to be solved in two different places chances are it should be moved somewhere else and solved once.

No harm running twice

Humans are fallible. Everyone knows it. Why write programs, scripts, fixes, or patches that depend on being run under perfect conditions? Fix scripts should make sure things are still broken before they try to fix something that isn't. Programs should know not to do anything if there isn't anything to do. Nearly everything we've written can be run as many times as people feel like running it without negative consequences. There are enough things that can go wrong that aren't under our control--let's not add to them with things we do control.
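A sketch of a fix script with no harm running twice (the table and its "broken" status are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (invoice_no INTEGER, status TEXT)")
db.executemany("INSERT INTO invoice VALUES (?, ?)",
               [(1, "POSTED"), (2, "BROKEN")])

def fix_stuck_invoices(db):
    """First make sure things are still broken; fix only what needs fixing."""
    cur = db.execute("UPDATE invoice SET status = 'POSTED' "
                     "WHERE status = 'BROKEN'")
    return cur.rowcount  # how many rows actually needed fixing

print(fix_stuck_invoices(db))  # 1  -- the broken invoice is repaired
print(fix_stuck_invoices(db))  # 0  -- running it again does no harm
```

The WHERE clause is the whole trick: the script asks the database what's broken rather than assuming it was run under perfect conditions exactly once.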

Do one thing and do it well

Structured programming, refactoring, and the Unix shells have demonstrated how powerful a concept it is to do one thing and do it well. Narrowing the focus of a program reduces side effects and increases its utility. When all a system's programs do only one thing there is less chance of overlap and duplication. It also means that when something goes wrong there are fewer programs that require fixing--and fewer places to look in the first place.

Desirable undefined behavior

What happens when an unidentified transaction arrives? What happens when something occurs that wasn't planned for and stopping the system is an unacceptable option? You make sure there's default behavior that does something harmless, can be audited, and alerts the authorities.

One of the constraints of our system is to be able to import supply chain data from multiple external systems we don't control (see Database Prenuptials). Because 3rd-party data is sometimes incomplete, our systems need to do something reasonable with imperfect data, especially since the alternative risks the database's integrity.

In production it's important to know that when bugs appear, the likelihood of something bad happening is minimized because there are appropriate default behaviors. We know where our system is most at risk and write our software with harmless fail-safes.
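That kind of default behavior can be sketched as a dispatch with a harmless, audited fallback (the handler names and transaction types are hypothetical):

```python
audit_trail = []
alerts = []

def post_payment(txn):
    return "posted payment"

def post_unknown(txn):
    """Default behavior: do something harmless, audit it, alert someone."""
    audit_trail.append(("UNKNOWN", txn))
    alerts.append("unrecognized transaction type: %s" % txn["type"])
    return "held for review"

HANDLERS = {"PAYMENT": post_payment}

def post(txn):
    # Unplanned-for types fall through to the default instead of stopping.
    handler = HANDLERS.get(txn["type"], post_unknown)
    return handler(txn)

post({"type": "PAYMENT", "amount": 100})
result = post({"type": "GIZMO", "amount": 5})  # never planned for
# result == "held for review"; the system keeps running and someone is told
```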

These are some of our rules. At times we've been tempted to break them, but found ways to stick to them. As I wrote in the first Rules to Live By article, when multiple programmers and designers are working with little supervision it's best if they all know what the rules are so no one, especially me, is surprised with the implementation--except in a good way.

Thursday, May 31, 2007

Database Prenuptials

This is the third installment of my "Rules to Live By" series. The first entry was Everything you needed to know about software architecture you learned in kindergarten. It discusses the importance of sharing a vision with coworkers about where the system is going, the metaphor you're using to model after, and the rules it must follow.

The second installment, Production rules easy as 1, 2, 3--but without 2 and 3, talks about our production priorities, and how those priorities are reflected throughout development, testing, and assurance. It also talks about how knowing our priorities and what's important allows us to upgrade our system more aggressively and frequently than is possible for other software teams. The result is to deliver more features and correct problems as quickly as possible for both our users and ourselves.

This article is about our database design. It's intended for software designers and not DBAs. There's nothing here about relational theory, 3rd normal form, or performance tuning. The only thing a DBA may find interesting about our system is that the logical and physical designs are the same. No compromises were made translating our logical design's entities or relationships into real tables. Nothing was denormalized. It was a myth in the '90s, and it's still a myth today on even more powerful equipment, that normalized data doesn't perform. 

The basics

It should go without saying, but won't, that the first step to good database design is understanding your business. Sometimes that understanding comes from interviewing "experts." Our development staff has made it their goal to become experts in our niche of commercial finance, so our interviews feel more like consultations than interrogations. 

Fundamentally, we assume solid database design skills. Consistent table naming makes it easy to predict where data may be stored. Strict attribute naming requirements remove confusion about what a field's meaning is: there are never two meanings--fields are never reused.

Some of our rules may be specific to our industry, but I'm wary of provincialism. Believing we're unique would discourage us from both looking outside ourselves for ideas and from publishing what we've discovered improves our processes.

What's yours is yours

Our system tracks documents exchanged between trading partners. To maintain the system's integrity we never add to, change, or remove these documents if they exist "in the real world" and have been recorded in ours.

Furthermore we never create documents we don't know exist, even if it would make sense that they did. Suppliers don't normally ship without a purchase order and it's easy-enough to create a PO if one is obviously missing. But if we created the purchase order without evidence it exists then we've compromised the system's integrity. Better to be missing a document than to fabricate one.

Real world integrity is more important to our business than relational integrity. Besides, there are other ways to synthesize relational integrity to support these situations.

What's ours is ours

Rather than pollute real-world documents with our data, we keep them separate. Our system maintains proxies for customers' documents to which we add our own fields. This separation-of-stuff provides our system with a kind of flexibility otherwise unavailable. We're able to mold and bend our system any number of ways behind the scenes without impacting our customers' view of the world.

Other duties as assigned

Never reduce resolution. Once we've created or captured detail information we won't delete or summarize it away. We violated this rule once and haven't forgiven ourselves since (and yes, we knew it was a rule then).

Our code may be object oriented (Smalltalk and PHP) but our database is not. Forcing one to look like the other is a disaster waiting to happen--at least it is in financial systems. Resist the urge to treat your relational database as if it were an object database or to impose your code's object model onto your database. We use an approach we call Transaction Oriented Processing to marry the two. RDBs are from Mars. OO is from Venus. Transaction Oriented Processing is the medium between the two. There is no object-relational impedance mismatch if we let both systems do what they're best at. Using Transaction Oriented Processing to negotiate between OO and RDB technologies is straightforward but requires a separate article to explain. I promise to post one here.

It is better to be explicit than implicit. Nothing in our system is subtle. There are no hints, innuendo, or clues. Intuition is not necessary to find out what's going on. Everything is exactly what it says it is.

For instance, there is only a single trigger in the entire system. If a programmer wants to know what happens when a row is inserted or changed they need look no further than the stored procedure that inserts or changes the row.

You're going to be married to your database for a long time. It's best you learn how to get along with it.

Wednesday, May 30, 2007

Production rules as easy as 1, 2, 3--but without 2 and 3

There used to be a sign in our office listing our company's top priorities. It's been replaced with a much larger sign with vision and mission statements and a list of guiding principles. I miss the old sign because it listed only three easy-to-remember items, ordered by importance. The first item was "protect the cash."

I forget the other two.

I'm sure they were important but not as important to a finance company as protecting the cash. That's Rule #1. If you don't protect the cash it doesn't matter what the other two items are--they're cosmetic. They were not second and third because they were alphabetized. They were second and third because they were less important than Rule #1: Protect the Cash.

But Rule #1 doesn't hold the same imperative for programmers and system administrators as it does accountants, controllers, and credit analysts, so the technology group issued its own.

Rule #1: Protect the database

Like the original list, it doesn't much matter what rules two and three would have been because without a database everything else is cosmetic. But protecting the database goes way beyond security and backups. The database's integrity must be protected. Everything must be entered correctly, everything must balance, and audit trails must be complete. If the data is corrupted what good is having a backup of corrupted data?

Having rule #1 helps prioritize all other activities and allows us to add features and fix bugs more quickly. There are a select few programs and stored procedures that modify the database. Collectively, these are our posting programs. Any changes to them, for any purpose, bring with them commensurate scrutiny and testing, as though we were attempting gene therapy on our own children. It's that important. It's rule #1, and everything else is cosmetic. 

The best part of knowing your top priority is how much time it frees up for other purposes: fixing bugs, adding features, automating the system, and improving metrics. Basically, more time is available to respond more quickly to customers because we know what is and is not critical. 

Some managers believe everything is critical and deserves the same attention to detail as posting programs. That platitude is either propaganda for other departments' consumption or evidence of an inability or unwillingness to prioritize. 

If the purpose of everything-is-critical management is to make other departments feel their issues are given equal weight with all other issues then the practice is either dishonest or a ruse to avoid conflict. Telling someone their issue is a top priority when it isn't demonstrates both parties lack respect for each other.

Any other reason for a lack of prioritization doesn't fool programmers and quality analysts. They know priorities need to exist and that a failure to prioritize indicates a failure to lead. Everything can't be an emergency. Every issue isn't a crisis. They can't be--otherwise we wouldn't be able to recognize something even more important than the very important thing we're already working on.

And that's important.