This is the third installment of my "Rules to Live By" series. The first entry was Everything you needed to know about software architecture you learned in kindergarten. It discusses the importance of sharing a vision with coworkers about where the system is going, the metaphor you're using to model after, and the rules it must follow.
The second installment, Production rules easy as 1, 2, 3--but without 2 and 3, talks about our production priorities, and how those priorities are reflected throughout development, testing, and assurance. It also talks about how knowing our priorities and what's important allows us to upgrade our system more aggressively and frequently than is possible for other software teams. The result is to deliver more features and correct problems as quickly as possible for both our users and ourselves.
This article is about our database design. It's intended for software designers and not DBAs. There's nothing here about relational theory, 3rd normal form, or performance tuning. The only thing a DBA may find interesting about our system is that the logical and physical designs are the same. No compromises are made translating our logical design's entities or relationships into real tables. Nothing was denormalized. It was myth in the 90s and still is today, even on more powerful equipment, that normalized data doesn't perform.
The basics
It should go without saying, but won't, that the first step to good database design is understanding your business. Sometimes that understanding comes from interviewing "experts." Our development staff has made it their goal to become experts in our niche of commercial finance so our interviews have a more consultative feel than an interrogation.
Fundamentally, we assume solid database design skills. Consistent table naming makes it easy to predict where data may be stored. Strict attribute naming requirements remove confusion about what a field's meaning is: there are never two meanings--fields are never reused.
Some of our rules may be specific to our industry, but I'm wary of provincialism. Believing we're unique would discourage us from both looking outside ourselves for ideas and from publishing what we've discovered improves our processes.
What's yours is yours
Our system tracks documents exchanged between trading partners. To maintain the system's integrity we never add to, change, or remove these documents if they exist "in the real world" and have been recorded in ours.
Furthermore we never create documents we don't know exist, even if it would make sense that they did. Suppliers don't normally ship without a purchase order and it's easy-enough to create a PO if one is obviously missing. But if we created the purchase order without evidence it exists then we've compromised the system's integrity. Better to be missing a document than to fabricate one.
Real world integrity is more important to our business than relational integrity. Besides, there are other ways to synthesize relational integrity to support these situations.
What's ours is ours
Rather than pollute real-world documents with our data we keep them separate. Our system maintains proxies for customer's documents to which we add our own fields. This separation-of-stuff provides our system with a kind of flexibility otherwise unavailable. We're able to mold and bend our system any number of ways behind the scenes without impacting our customers' view of the world.
Other duties as assigned
Never reduce resolution. One we've created or captured detail information we won't delete or summarize it away. We violated this rule once and haven't forgiven ourselves since (and yes, we knew it was a rule then).
Our code may be object oriented (Smalltalk and PHP) but our database is not. Forcing one to look like the other is a disaster waiting to happen--at least it is in financial systems. Resist the urge to treat your relational database as if it were an object database or to impose your code's object model onto your database. We use an approach we call Transaction Oriented Processing to marry the two. RDBs are from Mars. OO is from Venus. Transaction Oriented Processing is the medium between the two. There is no object-relational impedance mismatch if we let both systems do what they're best at. Using Transaction Oriented Processing to negotiate between OO and RDB technologies is straightforward but requires a separate article to explain. I promise to post one here.
It is better to be explicit than implicit. Nothing in our system is subtle. There are no hints, innuendo, or clues. Intuition is not necessary to find out what's going on. Everything is exactly what it says it is.
For instance, there is only a single trigger in the entire system. If a programmer wants to know what happens when a row is inserted or changed they need look no further than the stored procedure that inserts or changes the row.
You're going to be married to your database for a long time. It's best you learn how to get along with it.
Designing and building software can be easier than you think. What makes it easy is having a set of rules any decision can be tested against so the rules may design the system for you.
The idea isn't so crazy. In 1986 Craig Reynolds simulated the behavior of a flock of birds using only three rules. What was previously thought too complex to program without great effort was suddenly not just easier for programmers to construct, but easier for non-programmers to understand.
In InStream's case rules don't process transactions, but they provide the system design's conceptual integrity. Without enumerated rules the decisions a software designer makes can seem capricious, arbitrary, or simply black magic. As hard as it is to articulate gut feelings it's even harder to debate them or distribute decision making throughout a staff if a gut feeling is their only design guide.
When all a system's code is the result of a small set of rules its behavior is more easily predicted and its complexity more easily comprehended. Knowing what a system does and what it will do, and knowing the rules that created it, make extending and adapting it in unanticipated ways much easier, even by new programmers.
To really pick up development momentum the staff also shares a common metaphor. Few systems are truly ex nihilo and without precedent either man-made or natural. Inspiration for working software models can come from physics, biology, dams, post offices or dry cleaners. When everyone understands the metaphor then everyone knows how to marshal their efforts toward that goal without consulting a single oracle they would otherwise depend on.
So the first rule we learned in kindergarten: sharing. Share your rules and share your inspiration.
Next we'll look at InStream's rules for production, database design, and programming.