Thursday, May 31, 2007

Database Prenuptials

This is the third installment of my "Rules to Live By" series. The first entry was Everything you needed to know about software architecture you learned in kindergarten. It discusses the importance of sharing a vision with coworkers about where the system is going, the metaphor you're using to model after, and the rules it must follow.

The second installment, Production rules easy as 1, 2, 3--but without 2 and 3, talks about our production priorities, and how those priorities are reflected throughout development, testing, and assurance. It also talks about how knowing our priorities and what's important allows us to upgrade our system more aggressively and frequently than is possible for other software teams. The result is to deliver more features and correct problems as quickly as possible for both our users and ourselves.

This article is about our database design. It's intended for software designers and not DBAs. There's nothing here about relational theory, 3rd normal form, or performance tuning. The only thing a DBA may find interesting about our system is that the logical and physical designs are the same. No compromises are made translating our logical design's entities or relationships into real tables. Nothing was denormalized. It was myth in the 90s and still is today, even on more powerful equipment, that normalized data doesn't perform.

The basics

It should go without saying, but won't, that the first step to good database design is understanding your business. Sometimes that understanding comes from interviewing "experts." Our development staff has made it their goal to become experts in our niche of commercial finance so our interviews have a more consultative feel than an interrogation.

Fundamentally, we assume solid database design skills. Consistent table naming makes it easy to predict where data may be stored. Strict attribute naming requirements remove confusion about what a field's meaning is: there are never two meanings--fields are never reused.

Some of our rules may be specific to our industry, but I'm wary of provincialism. Believing we're unique would discourage us from both looking outside ourselves for ideas and from publishing what we've discovered improves our processes.

What's yours is yours

Our system tracks documents exchanged between trading partners. To maintain the system's integrity we never add to, change, or remove these documents if they exist "in the real world" and have been recorded in ours.

Furthermore we never create documents we don't know exist, even if it would make sense that they did. Suppliers don't normally ship without a purchase order and it's easy-enough to create a PO if one is obviously missing. But if we created the purchase order without evidence it exists then we've compromised the system's integrity. Better to be missing a document than to fabricate one.

Real world integrity is more important to our business than relational integrity. Besides, there are other ways to synthesize relational integrity to support these situations.

What's ours is ours

Rather than pollute real-world documents with our data we keep them separate. Our system maintains proxies for customer's documents to which we add our own fields. This separation-of-stuff provides our system with a kind of flexibility otherwise unavailable. We're able to mold and bend our system any number of ways behind the scenes without impacting our customers' view of the world.

Other duties as assigned

Never reduce resolution. One we've created or captured detail information we won't delete or summarize it away. We violated this rule once and haven't forgiven ourselves since (and yes, we knew it was a rule then).

Our code may be object oriented (Smalltalk and PHP) but our database is not. Forcing one to look like the other is a disaster waiting to happen--at least it is in financial systems. Resist the urge to treat your relational database as if it were an object database or to impose your code's object model onto your database. We use an approach we call Transaction Oriented Processing to marry the two. RDBs are from Mars. OO is from Venus. Transaction Oriented Processing is the medium between the two. There is no object-relational impedance mismatch if we let both systems do what they're best at. Using Transaction Oriented Processing to negotiate between OO and RDB technologies is straightforward but requires a separate article to explain. I promise to post one here.

It is better to be explicit than implicit. Nothing in our system is subtle. There are no hints, innuendo, or clues. Intuition is not necessary to find out what's going on. Everything is exactly what it says it is.

For instance, there is only a single trigger in the entire system. If a programmer wants to know what happens when a row is inserted or changed they need look no further than the stored procedure that inserts or changes the row.

You're going to be married to your database for a long time. It's best you learn how to get along with it.

No comments:

Post a Comment

Follow @TomGagne