Anything worth doing is worth doing poorly: May 2007

Thursday, May 31, 2007

Database Prenuptials

This is the third installment of my "Rules to Live By" series. The first entry was Everything you needed to know about software architecture you learned in kindergarten. It discusses the importance of sharing a vision with coworkers about where the system is going, the metaphor you're using to model after, and the rules it must follow.

The second installment, Production rules easy as 1, 2, 3--but without 2 and 3, talks about our production priorities, and how those priorities are reflected throughout development, testing, and assurance. It also talks about how knowing our priorities and what's important allows us to upgrade our system more aggressively and frequently than is possible for other software teams. The result is to deliver more features and correct problems as quickly as possible for both our users and ourselves.

This article is about our database design. It's intended for software designers and not DBAs. There's nothing here about relational theory, 3rd normal form, or performance tuning. The only thing a DBA may find interesting about our system is that the logical and physical designs are the same. No compromises are made translating our logical design's entities or relationships into real tables. Nothing was denormalized. It was myth in the 90s and still is today, even on more powerful equipment, that normalized data doesn't perform.

The basics

It should go without saying, but won't, that the first step to good database design is understanding your business. Sometimes that understanding comes from interviewing "experts." Our development staff has made it their goal to become experts in our niche of commercial finance so our interviews have a more consultative feel than an interrogation.

Fundamentally, we assume solid database design skills. Consistent table naming makes it easy to predict where data may be stored. Strict attribute naming requirements remove confusion about what a field's meaning is: there are never two meanings--fields are never reused.

Some of our rules may be specific to our industry, but I'm wary of provincialism. Believing we're unique would discourage us from both looking outside ourselves for ideas and from publishing what we've discovered improves our processes.

What's yours is yours

Our system tracks documents exchanged between trading partners. To maintain the system's integrity we never add to, change, or remove these documents if they exist "in the real world" and have been recorded in ours.

Furthermore we never create documents we don't know exist, even if it would make sense that they did. Suppliers don't normally ship without a purchase order and it's easy-enough to create a PO if one is obviously missing. But if we created the purchase order without evidence it exists then we've compromised the system's integrity. Better to be missing a document than to fabricate one.

Real world integrity is more important to our business than relational integrity. Besides, there are other ways to synthesize relational integrity to support these situations.

What's ours is ours

Rather than pollute real-world documents with our data we keep them separate. Our system maintains proxies for customer's documents to which we add our own fields. This separation-of-stuff provides our system with a kind of flexibility otherwise unavailable. We're able to mold and bend our system any number of ways behind the scenes without impacting our customers' view of the world.

Other duties as assigned

Never reduce resolution. One we've created or captured detail information we won't delete or summarize it away. We violated this rule once and haven't forgiven ourselves since (and yes, we knew it was a rule then).

Our code may be object oriented (Smalltalk and PHP) but our database is not. Forcing one to look like the other is a disaster waiting to happen--at least it is in financial systems. Resist the urge to treat your relational database as if it were an object database or to impose your code's object model onto your database. We use an approach we call Transaction Oriented Processing to marry the two. RDBs are from Mars. OO is from Venus. Transaction Oriented Processing is the medium between the two. There is no object-relational impedance mismatch if we let both systems do what they're best at. Using Transaction Oriented Processing to negotiate between OO and RDB technologies is straightforward but requires a separate article to explain. I promise to post one here.

It is better to be explicit than implicit. Nothing in our system is subtle. There are no hints, innuendo, or clues. Intuition is not necessary to find out what's going on. Everything is exactly what it says it is.

For instance, there is only a single trigger in the entire system. If a programmer wants to know what happens when a row is inserted or changed they need look no further than the stored procedure that inserts or changes the row.

You're going to be married to your database for a long time. It's best you learn how to get along with it.

Wednesday, May 30, 2007

Production rules as easy as 1, 2, 3--but without 2 and 3

There used to be a sign in our office listing our company's top priorities. It's been replaced with a much larger sign with vision and mission statements and a list of guiding principles. I miss the old sign because it listed only three easy-to-remember items, ordered by importance. The first item was "protect the cash."

I forget the other two.

I'm sure they were important but not as important to a finance company as protecting the cash. That's Rule #1. If you don't protect the cash it doesn't matter what the other two items are--they're cosmetic. They were not second and third because they were alphabetized. They were second and third because they were less important than Rule #1: Protect the Cash.

But Rule #1 doesn't hold the same imperative for programmers and system administrators as it does accountants, controllers, and credit analysts, so the technology group issued its own.

Rule #1: Protect the database

Like the original list, it doesn't much matter what rules two and three would have been because without a database everything else is cosmetic. But protecting the database goes way beyond security and backups. The database's integrity must be protected. Everything must be entered correctly, everything must balance, and audit trails must be complete. If the data is corrupted what good is having a backup of corrupted data?

Having rule #1 helps prioritize all other activities and allows us to add features and fix bugs more quickly. There are a select few programs and stored procedures that modify the database. Collectively, these are our posting programs. Any changes to them, for any purpose, brings with it commensurate scrutinizing and testing as though we were attempting gene therapy on our own children. It's that important. It's rule #1, and everything else is cosmetic.

The best part of knowing your top priority is how much time it frees up for other purposes; like fixing bugs, adding features, automating the system, and increasing metrics. Basically, more time is available to respond quicker to customers because we know what is and is not critical.

Some managers believe everything is critical and deserves the same attention to detail as posting programs. That platitude is either propaganda for other departments' consumption or an inability or unwillingness to prioritize.

If the purpose of everything-is-critical management is to make other departments feel their issues are given equal weight with all other issues then the practice is either dishonest or a ruse to avoid conflict. Telling someone their issue is a top priority when it isn't demonstrates both parties lack respect for each other.

Any other reason for a lack of prioritization doesn't fool programmers and quality analysts. They know priorities need to exist and that a failure to prioritize indicates a failure to lead. Everything can't be an emergency. Every issue isn't a crisis. They can't be--otherwise we wouldn't be able to recognize something even more important than the very important thing we're already working on.

And that's important.

Thursday, May 24, 2007

Everything you need to know about software architecture you learned in kindergarten

Designing and building software can be easier than you think. What makes it easy is having a set of rules any decision can be tested against so the rules may design the system for you.

The idea isn't so crazy. In 1986 Craig Reynolds simulated the behavior of a flock of birds using only three rules. What was previously thought too complex to program without great effort was suddenly not just easier for programmers to construct, but easier for non-programmers to understand.

In InStream's case rules don't process transactions, but they provide the system design's conceptual integrity. Without enumerated rules the decisions a software designer makes can seem capricious, arbitrary, or simply black magic. As hard as it is to articulate gut feelings it's even harder to debate them or distribute decision making throughout a staff if a gut feeling is their only design guide.

When all a system's code is the result of a small set of rules its behavior is more easily predicted and its complexity more easily comprehended. Knowing what a system does and what it will do, and knowing the rules that created it, make extending and adapting it in unanticipated ways much easier, even by new programmers.

To really pick up development momentum the staff also shares a common metaphor. Few systems are truly ex nihilo and without precedent either man-made or natural. Inspiration for working software models can come from physics, biology, dams, post offices or dry cleaners. When everyone understands the metaphor then everyone knows how to marshal their efforts toward that goal without consulting a single oracle they would otherwise depend on.

So the first rule we learned in kindergarten: sharing. Share your rules and share your inspiration.

Next we'll look at InStream's rules for production, database design, and programming.