Monday, July 14, 2008

DDD is for CRUD Apps

I wrote this piece for the Entity Framework Wiki where rage a number of folks with contrasting degrees of affection for and animosity toward Entity Framework. Maybe some of you will see it here first. :-)

When confronted with a brown-field application development scenario I often find myself in the camp that finds value in leveraging an existing database schema during the development of my domain model. By "leveraging" I mean that I construct some part of my domain model with the aid of an ORM tool that takes the database schema as one of its inputs and produces a conceptual model as one of its outputs. I am pleased to use that conceptual model to generate a portion of my domain model.

For some this approach is anathema. One expression of revulsion is to dismiss the tool and approach as suitable only for CRUD applications. Apparently a CRUD app is pretty low on the sophistication scale. I am often told that if this is "all" I'm going to do (read: all that I am capable of), I should stick to one of the less intellectually challenging design patterns, maybe ACTIVE RECORD, and leave OBJECT MAPPER and DOMAIN MODEL to the big thinkers.

It follows also, they would suggest, that this so-called "Data First" approach betrays an almost constitutional ignorance of modern design patterns and practices and is utterly incompatible with Domain Driven Design (DDD). I think this leap to judgment, while understandable, is unwarranted and prematurely terminates what could be productive discussions.

On this page I will argue that:

  • DDD is almost invariably demonstrated with a CRUD application
  • Data-firsters build behavior-rich Domain Models too
  • Ease of code generation undermines thought
  • The real question is "do you engage your design faculties or not?"
  • Data-first can assist DDD
  • DDD and "Data First" are compatible when the "Data Firster" uses his or her head

DDD for CRUD

Jimmy Nilsson shows us DDD in action in his highly regarded book, Applying Domain-Driven Design and Patterns [ADDP], by stepping through the design and development process of an application with the following requirements:

  1. List customers by applying a flexible and complex filter
  2. List the orders when looking at a specific customer
  3. An order can have many different lines
  4. Concurrency conflict detection is important
  5. A customer may not owe us more than a certain amount of money
  6. An order may not have a total value greater than a predetermined system-wide order limit
  7. Each order and customer should have a unique and user-friendly number
  8. A new customer is acceptable only after passing a credit check by an independent institution
  9. An order must have a customer; an order line must have an order
  10. Saving an order and its lines should be atomic
  11. Orders have an acceptance status that is changed by the user

This is a classic CRUD application. It is, in words typically uttered with total derision, "just a CRUD app."

It is also the richest investigation of DDD development that I have found. Jimmy devotes nearly half of the 500 pages of his book to this example. I have yet to see an example of DDD in practice that is as thorough as this one.

This is the only example in Jimmy's book. He never even hints that this CRUD app is inadequate to the task of demonstrating DDD. It is all he requires to show DDD's superiority relative to the way he used to build applications. Yes, the app is a toy and half-baked - as it must be for purposes of exposition. But it does what he wants it to do pedagogically.

I'm not trying to make Jimmy a saint or sinner. This post is not about Jimmy.

I do intend to make it hard for someone to say, "well, that's just a CRUD example and DDD is for more sophisticated applications." I figure if Jimmy wrote the app with DDD, it's a DDD app. If Jimmy thought we should use ACTIVE RECORD instead of Domain Model, he would say so.

I am also trying to discover if there is something distinctly different about applications that people are building with DDD. On the strength of this example, we all seem to be building the same kind of apps.

Are so-called "object first" applications somehow beyond the reach of "data first" techniques? Is there something Jimmy is doing here that we aren't doing routinely ourselves? Not that I can tell.

We Build Behavior-Rich Domain Models Too

A subset of the requirements for Jimmy's application is manifestly "behavioral":

  • Concurrency conflict detection is important
  • A customer may not owe us more than a certain amount of money
  • An order may not have a total value greater than a predetermined system-wide order limit
  • Each order and customer should have a unique and user-friendly number
  • A new customer is acceptable only after passing a credit check by an independent institution
  • Saving an order and its lines should be atomic
  • Orders have an acceptance status that is changed by the user

Guess what? You will find implementations of such requirements in my "data first" applications. These kinds of requirements are routine for us as well.
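To make that concrete, here is a minimal sketch (in Python, not Jimmy's code and not mine) of how requirements 5 and 6 might live inside the domain model rather than in the UI or the database. ORDER_LIMIT, BusinessRuleError, and the amount_owed field are hypothetical names invented for this illustration; a real system would load the limit from configuration and compute what the customer owes.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical system-wide order limit (requirement 6); assumed, not from the book.
ORDER_LIMIT = Decimal("10000")


class BusinessRuleError(Exception):
    """Raised when an operation would violate a domain rule."""


@dataclass
class Customer:
    name: str
    credit_limit: Decimal
    # Hypothetical: what the customer currently owes us.
    amount_owed: Decimal = Decimal("0")


@dataclass
class OrderLine:
    product: str
    quantity: int
    unit_price: Decimal

    @property
    def total(self) -> Decimal:
        return self.unit_price * self.quantity


@dataclass
class Order:
    customer: Customer  # requirement 9: an order must have a customer
    lines: list = field(default_factory=list)

    @property
    def total(self) -> Decimal:
        return sum((line.total for line in self.lines), Decimal("0"))

    def add_line(self, product: str, quantity: int, unit_price: Decimal) -> OrderLine:
        line = OrderLine(product, quantity, unit_price)
        new_total = self.total + line.total
        # Requirement 6: no order may exceed the system-wide limit.
        if new_total > ORDER_LIMIT:
            raise BusinessRuleError(f"order total {new_total} exceeds limit {ORDER_LIMIT}")
        # Requirement 5: the customer may not owe more than the credit limit.
        if self.customer.amount_owed + new_total > self.customer.credit_limit:
            raise BusinessRuleError("customer credit limit would be exceeded")
        self.lines.append(line)
        return line
```

The rules are checked before the line is appended, so a rejected line leaves the order unchanged.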

Look closer at my own applications and you will see Aggregates, Value Objects, Services, Repositories, Unit-of-Work, inheritance, etc. You'll see Dependency Injection and MVC/MVP too. I can't say that I have always been as clear and decisive with the purely DDD structures; I'm new to the DDD formalisms. One of the reasons DDD resonates so strongly with me is that (as with the GoF patterns) there is a shock of recognition when you first see them: the realization that DDD captures what you should have been doing, and were sort of doing, all along.
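As one example of those structures, here is a minimal, database-free sketch of a Unit of Work covering requirements 4 and 10: an order and its lines are saved all-or-nothing, and a stale version number signals a concurrency conflict. InMemoryStore, UnitOfWork, and ConcurrencyConflict are hypothetical names; a real ORM would do this against the database inside a transaction, commonly by comparing a version or timestamp column.

```python
from dataclasses import dataclass


class ConcurrencyConflict(Exception):
    """Raised when a row changed (or appeared) since it was read."""


@dataclass
class Row:
    data: dict
    version: int


class InMemoryStore:
    """Hypothetical stand-in for database tables, keyed by an id string."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        row = self.rows[key]
        # Return a copy plus the version that was read, as an ORM would track it.
        return dict(row.data), row.version


class UnitOfWork:
    """Collects pending writes and applies them all-or-nothing."""

    def __init__(self, store: InMemoryStore):
        self.store = store
        self.pending = []  # (key, new_data, version_read or None for inserts)

    def register_new(self, key, data):
        self.pending.append((key, data, None))

    def register_dirty(self, key, data, version_read):
        self.pending.append((key, data, version_read))

    def commit(self):
        # Phase 1: validate every change before touching the store.
        for key, _, version_read in self.pending:
            current = self.store.rows.get(key)
            if version_read is None and current is not None:
                raise ConcurrencyConflict(f"{key} already exists")
            if version_read is not None and (current is None or current.version != version_read):
                raise ConcurrencyConflict(f"{key} was modified by another user")
        # Phase 2: apply everything; nothing was mutated if phase 1 raised.
        for key, data, version_read in self.pending:
            new_version = 1 if version_read is None else version_read + 1
            self.store.rows[key] = Row(dict(data), new_version)
        self.pending.clear()
```

Validating first and writing second is what makes the commit atomic in this toy; with a real database the transaction provides that guarantee.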

It certainly seems to me that I approach these needs with the same attitude and catalog of "solutions" as any "object-first" architect. I just happened to get there with my "data-first" tools. And I didn't have to stand on my head or otherwise fight my own tools or predilections to do so.

Let's take "behavior" for example. Someone is always trying to tell me that "data firsters" don't understand the difference between data objects and objects with behavior.

There is some strange misconception, widely repeated, that "data firster" business objects lack behavior; that they are just stupid property bags straight from the ORM code generator. Where does this notion come from? Every generated domain object class file is paired with a custom class file. That's where we put our behaviors. We are going to enrich our domain model in order to satisfy the expectations coming from the business and we're going to do it in that custom class. Go ahead and decry the noise and alleged confusion that I must be experiencing because I have two class files to do the work of your single file (psst - I hardly notice). But why insist that I'm not writing behavior at all?
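Here is a minimal sketch of that pairing, using requirement 11 (order acceptance status) as the behavior. In .NET the two files are usually halves of one partial class; Python has no partial classes, so this sketch approximates the pairing with a generated base class and a hand-written subclass. The class names, the status values, and the allowed transitions are all hypothetical.

```python
# --- "generated" half: hypothetical output of a data-first code generator ---
# One attribute per column; regenerated when the schema changes, never hand-edited.
class OrderGenerated:
    def __init__(self, order_id: int, customer_id: int, status: str = "Pending"):
        self.order_id = order_id
        self.customer_id = customer_id
        self.status = status


# --- "custom" half: hand-written behavior that survives regeneration ---
class Order(OrderGenerated):
    # Hypothetical rule: only a pending order may be accepted or rejected.
    ALLOWED = {"Pending": {"Accepted", "Rejected"}, "Accepted": set(), "Rejected": set()}

    def accept(self) -> None:
        self._change_status("Accepted")

    def reject(self) -> None:
        self._change_status("Rejected")

    def _change_status(self, new_status: str) -> None:
        if new_status not in self.ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        self.status = new_status
```

Regenerating the first half after a schema change leaves accept and reject untouched, which is the whole point of keeping behavior in the custom file.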

When I look at Jimmy's classes - the ones he actually wrote - I don't see any important differences between what his code does and what my code does. You will find no less behavior in one of my business object classes than in one of Jimmy's classes.

Yes, there are differences; we won't write the same code. But once you set aside the code-gen and Persistence Awareness artifacts, what is left of genuine substance to fight about?

I must be quick to acknowledge that there are characteristic misbehaviors with the "data-first", code-generation approach that always show up in the code. Every technique is prone to its "signature" mistakes - the kinds of mistakes that are so easy to make that they always leave evidence behind. We'll talk about some of these shortly. But they are minor sins, easily expiated.

My point, in this section, is that, from the perspective of a consumer of the Domain Model, there is no fundamental reason for Data-firsters to produce a domain model that is appreciably different from the one produced by an Object-firster.

There are some differences in how we got to a given place. There are some differences in how we pursue development in subsequent iterations. But if we both consistently produce a domain model that delivers the same capabilities, with similar APIs, in a similar amount of time, with comparable quality, ... iteration after iteration ... then I cannot see why one camp must lord it over the other.

The burden of proof falls on whoever claims that we cannot achieve comparable outcomes.

What About Those Getters and Setters?

There is a faction within the DDD family that is at war with getters and setters in Domain Model classes. I think they make important points. I also think they overstate the benefits and understate the challenges of doing without getters and setters. Challenges begin in earnest when you have to present domain objects in the UI. If we are to move state between a domain object and the widgets on the screen, today's client technologies drive us to write intermediaries (e.g., DTOs) that do have properties. For a glimpse of the tedious care and feeding such intermediaries require, see Mats Helander's article in Jimmy's book [p. 431].

Aside: this has nothing to do with separated presentation per se. If domain model objects are property-less, the "Model" (in MVC or MVP, or embedded in a Presentation Model) cannot consist of domain model objects; there must be intermediaries. Perhaps this is a virtue, but it is won at a hard cost.
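A minimal sketch of that cost, assuming a "no properties" Customer and a screen that binds to plain fields. Customer.export_to, CustomerDto, load_screen, and save_screen are hypothetical names; this is not Mats Helander's code, just the general shape of the intermediary and the two-way mapping someone has to maintain.

```python
from dataclasses import dataclass
from decimal import Decimal


class Customer:
    """'No properties' style: state is private; only intent-revealing methods are public."""

    def __init__(self, name: str, credit_limit: Decimal):
        self._name = name
        self._credit_limit = credit_limit

    def rename(self, new_name: str) -> None:
        if not new_name.strip():
            raise ValueError("a customer must have a name")
        self._name = new_name

    def export_to(self, dto: "CustomerDto") -> None:
        # The object decides what it reveals; there is no public getter.
        dto.name = self._name
        dto.credit_limit = str(self._credit_limit)


@dataclass
class CustomerDto:
    """Intermediary with plain, bindable fields for the screen."""
    name: str = ""
    credit_limit: str = ""  # displayed read-only on this hypothetical screen


def load_screen(customer: Customer) -> CustomerDto:
    dto = CustomerDto()
    customer.export_to(dto)
    return dto


def save_screen(customer: Customer, dto: CustomerDto) -> None:
    # Hand-maintained mapping back into the domain; every new editable field adds code here.
    customer.rename(dto.name)
```

Every field the screen shows means touching export_to, the DTO, and possibly save_screen: three places instead of one bound property.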

If you believe getters and setters are bad, you will really hate the "data-first" ORM approach which emits properties in great abundance. Property generation is the strength of the "data-first" style. It is its strength ... and its weakness.

Before I pursue that thought, I must observe that the "no properties" faction appears to be a minority within DDD. Maybe Jimmy's example application is "old-school" DDD - it's so "2006" - but his domain model classes have plenty of properties and his associates, who build the UI on his domain models, are not shy about working directly with those domain objects and their properties. So I don't think DDD'ers are united in hating properties.

But the property resisters have a great point. DDD stresses the importance of designing and implementing in the language - the ubiquitous language (UL) - of the business domain. If "Get LastName" and "Change LastName" are sensible operations in the UL, the properties belong. But if "ShoeSize" is not a meaningful fact about a person in the UL, we should not have a ShoeSize property.

Data-firsters have the bad habit of acting as if every column in every table is directly expressible in the UL as a "get" and "set" operation. In other words, we have a tendency to expose every column as a property. It's just so easy to do.

The same is true for bi-directional relationships. Order.Customer? Customer.Orders? The ORM can generate them both; let 'er rip.

Again, it's so easy ... that we let the ORM generate these properties ... and now our domain model has unwanted behavior that clouds our vision, adds a point of failure, and consumes testing resources.

There is something insidious in this too. Our ubiquitous language may support setting the Last Name. But not necessarily at will. Not necessarily in isolation from other domain model state or rules. By blithely spitting out a LastName property, we gloss over the careful analysis that should have gone into the decision to expose a mutator of this value.

Do You Design Or Not

Let me stipulate: blind "data-first" thinking combined with rapid code generation is a formula for poor design.

I know the perverse delight of spewing out a "model" of 100 table-backed classes in fifteen minutes. I suppose this is like firing off a few thousand rounds from an assault rifle. Kind of cool on the range; not very cool if I do the same thing at ... I don't know, let's pick on the poor post office again.

Is it the tool's fault? Or is it my fault?

If I use the ORM this way, shame on me. I didn't have to.

I may have to cope with the fact that the legacy ShoeSize column is in my Person table ... and I am not allowed to get rid of it. But I don't have to expose ShoeSize publicly as a property. I don't have to expose both sides of a relation. And, if there is special business logic governing how to change LastName, I can bury the property and write a "message" method to mutate it properly.
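Here is a minimal Python sketch of two of those moves, assuming a legacy Person table with FirstName, LastName, and ShoeSize columns (hypothetical). ShoeSize stays mapped but private; LastName is readable but can only be changed through a change_last_name "message" method. The rule inside that method (reject blanks, remember previous names) is invented for illustration.

```python
class Person:
    """Hand-written domain class over a hypothetical legacy Person table."""

    def __init__(self, person_id: int, first_name: str, last_name: str, shoe_size=None):
        self.person_id = person_id
        self._first_name = first_name
        self._last_name = last_name
        # Legacy ShoeSize column: kept so the row round-trips, but not exposed publicly.
        self._shoe_size = shoe_size
        self._previous_last_names = []

    @property
    def last_name(self) -> str:
        # Read-only: no setter is defined, so "person.last_name = ..." raises AttributeError.
        return self._last_name

    def change_last_name(self, new_last_name: str) -> None:
        """The 'message' method: the single, rule-checked way to change the last name."""
        if not new_last_name.strip():
            raise ValueError("last name cannot be blank")
        self._previous_last_names.append(self._last_name)
        self._last_name = new_last_name

    def previous_last_names(self) -> tuple:
        return tuple(self._previous_last_names)
```

The same restraint applies to relations: if the ubiquitous language only ever navigates from Order to Customer, the Customer class simply gets no Orders collection.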

In short, if I just fire up the ORM and pull the trigger, does my tool make me an idiot? Or was I an idiot to begin with?

When Data-First Improves Design

A significant portion of Evans's seminal DDD book concerns how you determine what the domain model should be and how you align it with the business. The message, repeated in many forms, is "this is very hard."

If you've been a consultant, you know that learning your customer's requirements is wickedly difficult both because she isn't sure what she wants and because you don't understand her business well enough to understand her even if she explained it well.

You have to play anthropologist. An anthropologist listens to stories, yes, but he also looks at actual behavior and, in particular, at the artifacts of the culture he studies.

I suggest that an existing database is one of your most important sources of insight into the culture. That database didn't happen by accident. ShoeSize is in there because someone went to the trouble of putting it there. Just because your client didn't mention ShoeSize once during your interviews doesn't mean you can ignore it. Even if she says "we never use that", the experienced consultant retains the nagging suspicion that something important is missing ... and won't rest until the mystery of ShoeSize is resolved.

We used to say, "show me your data and I'll tell you what your application does." Flippant perhaps, but not altogether wrong.

I almost forgot this maxim because it seemed so obvious. Yet I can't remember it being mentioned in a single DDD book or article. I think you're missing an important design opportunity when you neglect to start from the existing data schema.

Conclusion: DDD is for Data-Firsters too

DDD is first and foremost about studiously matching the domain model to its business purpose. That's hard work. Data-Firsters cannot escape that work - even if running the ORM on auto-pilot seems at first to deliver good results. The schema is only one of the inputs to the ORM; our judgement - what tables and columns to model, how to expose columns or relationships as properties, what data should appear as Value Objects, etc. - is the more important input.

Of course domain model objects have behavior. Data-firsters are not satisfied with property-bag classes. They add behavior as they go ... just as Object-firsters do.

DDD describes structures and design patterns that favor an evolutionary domain model that serves the business. Those of us who are sometimes data-firsters build those same structures and follow those same patterns.

Unit testing is critical to the iterative process promoted by DDD. Our persistence infrastructures must facilitate unit testing. In particular, we must be able to test the model without connecting to a database. Some persistence infrastructures just missed this boat. Big mistake. Unfortunately, many of these infrastructures - I'm thinking of Entity Framework in particular - are associated with tools favored by data-firsters.

I'm going to claim that this is a spurious correlation. There is nothing about the data-first approach that requires an infrastructure that makes testing hard. A persistence aware infrastructure does not have to make unit testing hard. That many do is a correctable error. 
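Here is a minimal sketch of the kind of seam that keeps the model testable without a database, using requirement 7 (a unique, user-friendly order number) as the behavior under test. OrderRepository, InMemoryOrderRepository, and OrderService are hypothetical names, and nothing here is Entity Framework API; the assumption is that a production repository would implement the same interface over whatever ORM you use.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Order:
    customer_name: str
    number: int  # requirement 7: unique, user-friendly number


class OrderRepository(ABC):
    """What the domain needs from persistence; no database types appear here."""

    @abstractmethod
    def add(self, order: Order) -> None: ...

    @abstractmethod
    def highest_number(self) -> int: ...


class InMemoryOrderRepository(OrderRepository):
    """Test double backed by a list instead of a database connection."""

    def __init__(self):
        self._orders = []

    def add(self, order: Order) -> None:
        self._orders.append(order)

    def highest_number(self) -> int:
        return max((o.number for o in self._orders), default=0)


class OrderService:
    """Domain service that sees only the abstract repository."""

    def __init__(self, orders: OrderRepository):
        self._orders = orders

    def place_order(self, customer_name: str) -> Order:
        # Naive numbering: fine for a unit test, not safe under concurrent writers.
        order = Order(customer_name, self._orders.highest_number() + 1)
        self._orders.add(order)
        return order
```

A unit test constructs OrderService with the in-memory repository and runs in milliseconds; nothing about starting from the schema prevents this design.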

DDD emphasizes the importance of reducing friction in facilitating continual redesign and re-implementation. Friction discourages us from seeing and making the changes that improve the model. Our code-generating ORM tools undoubtedly introduce some friction into the process. The need to regenerate the domain model simply to change a persisted property's name or accessibility is among the more glaring examples of friction.

But I think it's also time for the object-firsters to come clean about the friction they introduce. The friction is not always in the domain model; it pops up elsewhere in the system because of what is not in the domain model classes. Jimmy's book is pretty fair in its recital of the ugliness in the "infrastructure ignorant" discipline. The "no properties" school introduces another, huge source of friction to the process - whatever the compensating benefits.

But I digress. The point I want to make is that, if data-firsters tame their unbridled exuberance for their tools and use them wisely, they too can practice DDD.

Then we can all build CRUD apps ... of any sophistication.

4 comments:

Shawnolius said...

Can you please help me understand what that problem is with Properties? Why is there a "no properties" camp? What's bad about properties?

Ward Bell said...

@ shawnolius. I will try to summarize here but you may want to follow some of the back and forth here.

DDD strongly advises that every public member of your domain model classes should reveal intent and conform to the domain's "ubiquitous language" (UL) understood by both non-tech domain experts and developers.

A property such as "LastName" is not absolutely clear in intent and may not be part of the UL.

We read it as "get LastName" and "set LastName". Is that really how domain experts talk about a Person? Or do they say "get the name" and "change the name". If so, then maybe we should not expose the inner elements (First and Last name) of a name.

Maybe we should also distinguish between initializing the name - and when that is permitted - as opposed to changing the name - and when that is permitted.

If this is a Passport management app you might well imagine that those operations are very different and subject to different validation rules. The "LastName" property conveys none of these semantics and, in fact, invites abuse.

It's a little harder to make the case against the getter but the reasoning follows the same lines.

See also Martin Fowler's marvelously nuanced discussion in his GetterEradicator post.

Apologies to Gregory Young if I butchered the summary.
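To illustrate the Passport example above, here is a minimal sketch in which the name is a single Value Object and initializing it is governed by different rules than changing it. Name, PassportHolder, and the "supporting document" rule are hypothetical, chosen only to show how intent-revealing methods can carry semantics that a bare LastName setter cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Value Object: the domain speaks of 'the name', not of its parts separately."""
    first: str
    last: str

    def __post_init__(self):
        if not self.first.strip() or not self.last.strip():
            raise ValueError("both parts of a name are required")


class PassportHolder:
    def __init__(self, name: Name):
        # Initializing the name: only a well-formed Name is required.
        self._name = name

    @property
    def name(self) -> Name:
        return self._name

    def change_name(self, new_name: Name, supporting_document: str) -> None:
        # Changing the name: a stricter, hypothetical rule requiring evidence.
        if not supporting_document:
            raise ValueError("a name change requires a supporting document")
        self._name = new_name
```

Because Name is immutable, the only way to alter a holder's name is through change_name, so the stricter rule cannot be bypassed by editing one part of the name.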

Jimmy Bogard said...

One of the tougher things about a data-first approach with an existing database is not so much playing anthropologist as playing a Mike Rowe / Dirty Jobs "sewage technician."

Existing databases provide some insight, but like any one source, they can be misleading. As databases are more difficult to change and evolve over time, you might see quite a bit of tacit business processes and swivel-chair integration. The skills involved in refactoring databases are not as well honed as those for refactoring code, so databases don't always keep up with what the business truly represents and needs.

I see some definite truth in "show me the data...", but existing schemas, like code comments, can lie about their true business intent. Of course, if you have something like CUST_DTL_FLAG_1 columns to deal with, it's all moot anyway.

Anonymous said...

Database design for large databases is an art, and I think that it should be managed from that perspective. Sure, in toy applications you can do it the other way around, but not in the reality of enterprise development. At least, not effectively. Performance is always an issue in large, high-volume/high-traffic apps, in both the database and the application. ORM tools are not, and have never claimed to be, the best way to ALWAYS access/update data. They save tons of money and keep the code clean with strongly typed members, commonality, and of course the much-undervalued IntelliSense. It is also important to be able to work outside the ORM box for performance tuning and for parts of the app that have custom requirements. Ignorance is the real culprit: people who won't stray from their set way of doing things, for example by using a datareader instead of a dataset, by writing directly to the HTTP response stream instead of using viewstate controls (to optimize performance and reduce page size), or by writing a custom proc for mass updates. There is no perfect model for a highly tuned application. ORM saves time and money, and it is as flexible as anything else; you can always go outside that box whenever you need to.