It’s almost a year since I read the “blue book”. It changed my view on how I write software and how I build software architecture. During my time as a software developer, I have tried many different approaches to building software. A lot of them include an anemic domain model. There is nothing wrong with building an anemic domain model, but for applications with more complex business logic, it might not be the best choice. You can end up with a lot of “spaghetti code” with high coupling between the different parts of the code. An anemic domain model has its business logic spread all over the code, so when a business rule changes you often have to make updates in multiple places. You want to avoid that. Keep this in mind when you code:
“Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live” – M. Golding
A classic rich domain model holds all the business logic inside the model, and most of the objects are connected to each other. It strives to be one perfect model that solves the domain logic for the whole business. And this is often where this way of writing software fails: you end up with a big model that eventually becomes unwieldy. You should rather think in bounded contexts, but that’s not the topic for this post.
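To make the difference concrete, here is a minimal sketch of a rich domain object. The `Order` class and its rule are hypothetical examples, not from the original post; the point is that the business rule lives inside the model instead of being repeated in every service that touches orders.

```python
from dataclasses import dataclass, field


class OrderAlreadyShippedError(Exception):
    """Raised when a business rule is violated."""


@dataclass
class Order:
    """A rich domain object: the business rule lives inside the model."""
    order_id: str
    shipped: bool = False
    lines: list = field(default_factory=list)

    def add_line(self, product_id: str, quantity: int) -> None:
        # The rule is enforced in one place, not scattered across services.
        if self.shipped:
            raise OrderAlreadyShippedError("Cannot change a shipped order")
        if quantity <= 0:
            raise ValueError("Quantity must be positive")
        self.lines.append((product_id, quantity))
```

With an anemic model, the `shipped` check would instead live in whichever services happen to modify orders, and a change to the rule would mean hunting down every one of them.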
Split the model
Instead of trying to solve everything in one model, split it into smaller parts and work with each part on its own. Create aggregates that naturally fit together. An example can be a car and a customer. In this example, both are aggregate roots. You should consider not modeling both objects and their aggregates into one model. Think of them as small, separate models inside the model. This makes the model much easier to handle when persisting and loading it into memory.
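A small sketch of the car/customer example, with hypothetical classes and fields chosen for illustration. The key move is that `Car` references its owner by identity only, so the two aggregates can be persisted and loaded independently:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Wheel:
    """Lives inside the Car aggregate; never loaded or saved on its own."""
    position: str


@dataclass
class Car:
    """Aggregate root: all access to its parts goes through the root."""
    vin: str
    owner_id: str  # reference to the Customer aggregate by identity only
    wheels: List[Wheel] = field(default_factory=list)


@dataclass
class Customer:
    """A separate aggregate root, persisted and loaded independently."""
    customer_id: str
    name: str
```

Because `Car` holds an `owner_id` rather than a `Customer` object, loading a car does not drag the whole customer (and everything the customer points to) into memory with it.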
Greg Young had a great example when he explained aggregate roots and how they should work: if a teacher is asked by the headmaster for an overview of his students, he will not bring all the students to the headmaster’s office; he will bring a list with the student information the headmaster wants. For the most part, there are no good arguments for putting everything in one big connected model.
The problem with relational databases
Nowadays many use ORM frameworks to persist data from the domain model to the database, and they use a relational database (are there others? ;-)). A problem with relational databases is that you have to compromise between read speed and write speed.
If you normalize the database, it will be better at inserting, updating and deleting data, but not at reading data. Part of the trade-off is indexes: in most cases they speed up your select operations but slow down insert, delete and update operations, since every index has to be maintained on each write.
Update: I have received some comments on this statement. What I was trying to say is: the indexes you add to speed up your selects add overhead to every write against the database. You have to compromise; either you read faster or you write faster.
If you decide to make the database better at reading data (selecting), one option is to de-normalize the database. A de-normalized database means tables with repetitive data and possibly tables with a number of columns from hell.
As developers we often don’t give much thought to this question: how will the solution scale? I think we should keep this question in the front of our minds when we create a solution that may have to serve a lot of users. I’m not saying you should try to predict the future, but with some simple measures from the start, you will not have to think much about it later.
Command and Query Responsibility Segregation
Most applications read data more frequently than they write data. Based on that, it would be a good idea to build a solution where you can easily add more databases to read from, right? So what if we set up a database dedicated just to reading? Even better: what if we design that database so it’s faster to read from? If you design your applications based on the patterns of the CQRS architecture, you will have a solution that is scalable and fast at reading data. Command Query Separation (CQS) is a principle first described by Bertrand Meyer, who applied it at the object level: a method should either be a command that changes state or a query that returns data, never both. Later this idea was lifted to the architectural level. I think it was Udi Dahan who first started talking about it as an architectural principle.
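Meyer’s object-level CQS can be shown in a few lines. The `BankAccount` class is a hypothetical illustration: the command mutates state and returns nothing, the query returns data and has no side effects.

```python
class BankAccount:
    """CQS at the object level: commands mutate and return nothing,
    queries return data and never mutate."""

    def __init__(self) -> None:
        self._balance = 0

    def deposit(self, amount: int) -> None:
        # Command: changes state, returns nothing.
        if amount <= 0:
            raise ValueError("Amount must be positive")
        self._balance += amount

    def balance(self) -> int:
        # Query: returns state, has no side effects.
        return self._balance
```

CQRS applies the same separation one level up: instead of one object with both kinds of methods, you get one side of the architecture for commands and another for queries.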
In this post I describe an architecture that is quite close to the one Udi is talking about. There are several implementations of this pattern, and one of them is described by Greg Young. Greg Young's flavor of the pattern describes CQRS and Event Sourcing. I won’t go into this flavor in this post. If you want to learn more about this flavor, Google for it.
A quick walkthrough of the CQRS architecture
A graphical overview of the architecture is shown in the figure below. Let’s do a quick walkthrough of it. A user opens an application and the first screen is loaded. The data is fetched from the query side. In this example it goes through a WCF service and uses, let’s say, NHibernate to read data from the database into a DTO that is returned to the GUI. The DTO is tailored to fit the screen the user is viewing. The query database is often de-normalized to make reading faster. The user may browse through different screens, and the process is the same: a DTO tailored to the user’s screen is returned from the database.

Eventually, the user wants to change the data in one of the screens. What happens then is that a command message is created based on the data that has changed in the view, and it is sent down the left side of the figure: the command side. The command message is sent into the domain model to be validated against the business rules. If any of the business rules fail, an error message is sent back to the client (how this is done can differ). If the message goes through the domain model without errors, the change is persisted to the write database and synchronized with the read database(s).

The command side should never return data other than the error message. If you follow this rule, the command side will be purely behavioral. This makes it very easy to log what happens in the domain model, and it’s quite easy to track what the user wanted to do, not only what the result of his actions was.

The approach I’m describing in this post is a bit different from the one Greg Young describes. His solution has no database on the command side; a database is only needed for reporting. This is a great approach, but it makes things a bit more complex, and it arguably requires a green-field project. The approach I’m describing can be used with a brown-field application without much change to the existing architecture. In most cases you only have to add a query side to the “old” architecture.
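The command side of the walkthrough can be sketched as follows. All names here are hypothetical, and plain dictionaries stand in for the write and read databases; the point is the flow: a command captures intent, the domain model validates it, the change is persisted to the write side and then synchronized to the read side, and the handler returns nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeCustomerAddressCommand:
    """A command captures what the user wanted to do, not just new values."""
    customer_id: str
    new_address: str


class Customer:
    """Domain model: validates the business rules for its own state."""

    def __init__(self, customer_id: str, address: str) -> None:
        self.customer_id = customer_id
        self.address = address

    def change_address(self, new_address: str) -> None:
        if not new_address.strip():
            raise ValueError("Address cannot be empty")
        self.address = new_address


class ChangeCustomerAddressHandler:
    def __init__(self, write_store: dict, read_store: dict) -> None:
        # Stand-ins for the write database and the de-normalized read database.
        self.write_store = write_store
        self.read_store = read_store

    def handle(self, command: ChangeCustomerAddressCommand) -> None:
        customer = self.write_store[command.customer_id]
        # The domain model enforces the business rules; it raises on failure.
        customer.change_address(command.new_address)
        # Persist on the write side, then synchronize the read side.
        self.read_store[command.customer_id] = {
            "customer_id": customer.customer_id,
            "address": customer.address,
        }
```

Because `handle` returns nothing, the command side stays purely behavioral; logging the commands themselves gives you a record of user intent for free.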
Why add query side DTOs?
In a traditional domain model, you create objects for solving domain logic, not for viewing. They have behavior rather than shape. To make domain objects more viewable, many developers map between the domain model and DTOs tailored for display. This results in a lot of mapping work for the developer, and in domain models that expose getters and setters (to learn why this is bad, do a Google search on “getters and setters are evil”).
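A query-side DTO, by contrast, is shaped by the screen rather than by the domain. This sketch uses hypothetical names and a plain dictionary as a stand-in for the de-normalized read database; one read returns everything the view needs, with no mapping from the domain model at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerOverviewDto:
    """Tailored to one specific screen: flat, read-only, no behavior."""
    customer_id: str
    display_name: str
    open_order_count: int


def fetch_customer_overview(read_db: dict, customer_id: str) -> CustomerOverviewDto:
    # One read against the query side fills the whole view.
    row = read_db[customer_id]
    return CustomerOverviewDto(
        customer_id=customer_id,
        display_name=row["display_name"],
        open_order_count=row["open_order_count"],
    )
```

The DTO is frozen because the query side never changes anything; all changes go through commands on the other side of the architecture.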
If you are using an ORM like NHibernate, you have to add getters to the domain model (and make the properties and methods virtual if you want to use lazy loading), and I think this is OK. The model is still protected against unwanted changes that could make it invalid. Every change has to happen through its command methods.
What you get from this architecture is: scalability on the read side of the application (where you normally get most of the load), the possibility to log everything that happens in the domain model (by tracking the commands and events), and a domain model that only has to worry about writing data, not act as a transporter of data from the database to the GUI. I guarantee that this alone will make things a lot easier for you. And one final thing: creating objects (DTOs) based on the view helps us avoid a lot of mapping and a lot of requests against the database to fill the view with data. I will go through the parts of the architecture in more detail in future posts.
Update: I have made an update to this figure in a new post.