Tuesday, August 9, 2011

And the (ORM) survey said …

In mid-May 2011 I ran a short survey about ORM usage among .NET developers. I wrote a draft of this blog post back then … and forgot to publish it (doh!). I’m finally doing so now in August in hopes that it remains relevant. I don’t know if attitudes have changed (do you?) but what the heck … I might as well …

The Survey

The survey was prepared on and hosted by SurveyMonkey, a free/low-cost survey site. It ran from Monday, 9 May, to Wednesday, 18 May, 2011. It consisted of two questions. Here’s question #1:

[Image: survey question #1]

If … and only if … you answered “Entity Framework” did you get to see question #2:

[Image: survey question #2]

The survey permitted only one response per person, but you could come back later and revise your answer if you wished.

The Results

You can see the survey results on the SurveyMonkey website.

I downloaded all 896 individual responses as of May 18 into an Excel spreadsheet. The spreadsheet is useful if you want to correlate comments with question answers. The comments can be entertaining.
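If you open the spreadsheet to recompute the statistics or match comments to answers, keep the survey's branching in mind: only respondents who chose "Entity Framework" saw question #2, so question #2 percentages are shares of EF respondents, not of all 896 respondents. The Python sketch below is a minimal illustration under stated assumptions. The row layout, the answer labels, and the sample rows are all made up for the example; they are not the actual spreadsheet columns or survey data.

```python
from collections import Counter

# Hypothetical rows standing in for the exported spreadsheet:
# (question #1 answer, question #2 answer or None, comment).
# Labels and rows are invented for illustration; they are NOT survey data.
responses = [
    ("Entity Framework", "Database First", ""),
    ("Entity Framework", "Code First", "sample comment A"),
    ("Entity Framework", "Model First", ""),
    ("Entity Framework", "Code First", ""),
    ("NHibernate", None, "sample comment B"),
    ("No ORM", None, ""),
]

def shares(counter, total):
    """Convert raw counts to percentages of `total`, rounded to one decimal."""
    if total == 0:
        return {}
    return {answer: round(100 * n / total, 1) for answer, n in counter.items()}

def tally(rows):
    """Question #1 shares use every respondent as the denominator.
    Question #2 shares use only respondents who answered "Entity Framework",
    because nobody else ever saw question #2."""
    q1 = Counter(answer1 for answer1, _, _ in rows)
    ef_rows = [row for row in rows if row[0] == "Entity Framework"]
    q2 = Counter(answer2 for _, answer2, _ in ef_rows)
    return shares(q1, len(rows)), shares(q2, len(ef_rows))

def comments_by_answer(rows):
    """Group non-empty comments under the question #1 answer they came with."""
    grouped = {}
    for answer1, _, comment in rows:
        if comment:
            grouped.setdefault(answer1, []).append(comment)
    return grouped

q1_shares, q2_shares = tally(responses)
print(q1_shares)
print(q2_shares)
print(comments_by_answer(responses))
```

With these six invented rows, the question #1 shares are computed over all six rows while the question #2 shares are computed over the four EF rows only. Dividing question #2 counts by the full respondent total would understate every EF workflow.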

Here’s a picture of the statistics as displayed in the spreadsheet:

[Image: survey statistics as displayed in the spreadsheet]

Survey Goals

The survey seeks insight into how developers of .NET data-driven applications prefer to manage their data. More specifically:
  1. Would they use an ORM … or not.
  2. If they would use an ORM, would they use Microsoft’s Entity Framework; the popular, open source NHibernate; or some other ORM tool.
  3. If they would use Entity Framework (EF), which style would they use to develop their entity models. As of version 4.1, EF offers three styles called “Database First”, “Model First”, and “Code First”.

    For you readers who don’t know what those styles are, I recommend Julie Lerman’s summary of these styles (she calls them “workflows”) and her May 2011 MSDN article on the subject.

I wanted developers to respond as if they were:

  • building an application with a medium to large model
  • free to choose the data modeling/access technology
  • required to work with an existing relational database of many tables, filled with irreplaceable production data.

Survey Design

In this section I talk about how the questions were formed, how I found the respondents, and survey bias.

The questions

The survey only had two questions – and you couldn’t reach the second question unless you answered “Entity Framework” to the first. I was adamant about keeping it short. I hate long surveys and assume others do too.

I really wanted to ask demographic questions. I wanted to know about respondent backgrounds, what kinds of apps they built in which client technologies for what kinds of customers. Were they architects, trainers, or practitioners? Were they actually committed to building with the selected technologies or only hoping to do so or merely advising others to do so?

I backed off mostly because it would have bulked up the survey but also because I had no hypotheses at that time that required demographic information. As a general rule, flailing away with pointless questions invites spurious correlations. If the survey results suggest new hypotheses, someone can pursue them independently.

Some folks were miffed that they couldn’t answer the EF development style question unless they intended to use EF. Sorry … sort of. Question #2 is about how EF people would use EF. How an NHibernate person would develop an NHibernate model is interesting but out of scope. Moreover, I didn’t want to muddy the interpretation with advice to EF developers from people who wouldn’t use EF. The second question is not a vote on best practices or preferred technologies; it’s trying to surface the practices that EF people intend to use.

A number of respondents wanted to pick more than one answer. They wanted checkboxes instead of radio buttons. Sorry, but you’ll have to come up with your own survey. I deliberately forced you to choose … as you would have to choose when faced with one application to build. You were not granted the consultant’s luxury of saying “it depends”. I left you room to vacillate in the comments.

I have one regret about question #2. I failed to make clear that the “100+ table, RDB application” context that I described in question #1 should apply to question #2 as well. That was my intention, but I didn’t say so in the phrasing of the question, and some of the comments make it clear that many respondents weren’t sure about this. I’ll never know the degree to which that uncertainty skewed the results.

I’m going to do the only responsible thing and blame my survey reviewers for failing to raise the alarm before I launched the survey; shame on you guys!

The Sample

The sample is far from “scientific”. I did not ask an independent, qualified research group to identify and survey an unbiased sample. I simply announced the survey on Twitter on May 9th. I asked everyone I could think of to re-tweet it; many of them did. I learned that IdeaBlade announced it on the DevForce forum but no one, to my knowledge, sent email to any of our lists. I figure if you answered the survey, you learned about it on Twitter. Which means you probably follow me or follow someone who follows me.

It’s reasonable to suppose that we have interests, opinions, and experiences in common. We are in none of these respects the norm for the general .NET developer population. Accordingly, I’m reluctant to apply my interpretation to that broader population. Well … kind of reluctant.

Frankly, I am too cheap to do it properly. While I think the survey questions are defensible, I’ll concede that the sampling technique is not … and introduces a significant bias of some sort. Of what sort? I take that up next.

Survey Bias

Many of you took me to task on my approach to sampling … and were free with your conclusions about how that would bias the results. My secret pleasure in this is that you had so many different and conflicting notions about what that bias would be.

Some were sure that it biased the survey toward EF. Some said it biased the survey against CQRS and NoSQL. Some said it biased the survey against native ADO approaches. Some said it biased the survey in favor of egghead pontificators who have no practical experience.

Some people see a bias in the survey’s goals. I suppose there is an existential “bias” in one’s research interests. I’m pleased to ponder the significance of wanting to know one thing versus another. Don’t stop there; ask me why I’m running a survey about ORMs instead of world peace.

I’m more concerned about survey bias. Which factors in the questions, the sampled population, the conduct of the survey, and the interpretation of results undermine the survey’s avowed research goals? I’m sure there are plenty of such factors. I’m not sure where they point.

I was twice accused of confirmation bias, which in brief means “a tendency for people to favor information that confirms their preconceptions or hypotheses regardless of whether the information is true.”  I’d be more susceptible to this charge if I had preconceptions to confirm. Take question #2 for example. I don’t have a theory about which EF development style people will prefer. I guessed that “Code First” would win about 10% of the votes, not 42%. But I was no more invested in the outcome than I am when guessing the number of jelly beans in a jar.

Question #1 is a more delicate matter. My company placed a bet several years ago that EF would eventually dominate the .NET ORM space. Around 66% of survey respondents favored EF; NHibernate was a distant second at 18%. This outcome is certainly consistent with my expectations. Is that evidence of “confirmation bias”? You can’t call a survey “biased” simply because the results seem to favor the author’s prediction. You have to explain how the question or the conduct of the survey favored EF. I don’t understand how that could be in this case. The question seems innocuous to me. And while my Twitter-verse is indeed ORM friendly, it is not predisposed for or against EF; I’ve got a ton of EF-hating friends.

So you still think my audience tilts towards EF? Maybe it does. But is it significant? The gap between the EF preference (62%) and the preference for all other ORMs (18%) is 44 percentage points. How do you explain that?

Bias toward ORM users

As I see it, the gravest potential for misinterpretation would be the misguided assumption that most developers know about, care about, or use ORMs.

People reading this blog as well as people who took the survey are acquainted with ORMs. We either use one or have used one. We like them or we hate them. Remember, only 7% of respondents said they wouldn’t use an ORM.

The sample must be severely skewed if only 7% say they won’t use an ORM. I know that the majority of developers in the general population don’t use an ORM. How do I know? OK … I don’t have hard facts, but I have strong anecdotal evidence. Most of our customers and prospects have never used an ORM before. And I’ve been to a lot of user group meetings where, on several occasions, a show of hands revealed that only a tiny few of the attendees know squat about ORMs.

We have to be careful in our interpretation of this survey. We can’t say much about developers in general. We might have gleaned something useful about developers who use ORMs.

Interpreting the Results

I wasn’t surprised by EF’s dominance … as I noted above. It seems EF v.4 is finally over the bad press that greeted v.1 and is begrudgingly accepted as “viable” even by staunch NH supporters.

I was surprised at the affection for Code First. Few could yet have tried it. Evidently it taps into some disaffection with the “database first” workflow and its visual designer. People I deeply respect have tried it a few times and they like it a lot. They haven’t built anything significant with it yet … but what they like about it resonates strongly with me. I’ll have more to say in a future post.

On the other hand, “database first” has ardent fans (read the comments). It’s the majority choice and will remain so as long as most developers conceive of their applications primarily in terms of the data they store. Database management tooling is capable and the practice of simultaneously evolving schema and code is entrenched.

I’ve tried both “database first” and “code first” development. There isn’t an obvious productivity winner. One could pick a side on other grounds, but only a clear productivity edge will shift the balance among active developers.

So it looks like “Code First” will have a strong following much sooner than I expected. But that won’t hurt “Database First”, which will remain the popular choice.

My Commercial Agenda

I’m a curious guy … but I don’t run surveys for the fun of it. The survey was honest but it wasn’t innocent.

Consider the survey’s framing context: an application built to accommodate a 100-table existing database. My company, IdeaBlade, makes and sells a product called DevForce. DevForce helps you build data-driven, Rich Internet Applications. Our sweet spot is the customer who intends to migrate an existing Line-of-Business (LOB) application to Silverlight or WPF. The existing data store is almost always a relational database.

This customer is torn about how to get there. He’s acknowledged that the application will have to be rewritten back-to-front … which opens the door to a new data access architecture. But he’s not completely free to start from scratch. He has an existing business to run with real customers and real data. He’ll have to maintain the existing application while building its eventual replacement. He is naturally reluctant to rip out the database or even restructure it. I wanted the survey to elicit responses from professional developers facing a comparable situation.

EF First

The attention to Entity Framework is no coincidence either. DevForce is not an ORM and can work with any source of data; our value really kicks in after you’ve defined your entities and figured out how you will persist them.

But DevForce is cozy with the Entity Framework. Close alignment with EF was a business decision, not a technology judgment. It has worked out well for us and our customers. The survey results indicate that we made a sound choice … and shouldn’t bother integrating with NHibernate or another ORM; the commercial demand is simply not there.

You may feel differently. You’re not selling an infrastructure product. You may have good technical reasons to prefer a different ORM. I suggest, however, that you factor industry trends into your decision.

Code First

We really wanted to assess the appetite for the new EF “Code First” style. DevForce currently favors the “Database First” and “Model First” styles. We extend the EDM with our own attributes, the EDM Designer with our own properties, and we can generate our own classes from the conceptual model’s XML.

Should we stick with those workflows or add support for “Code First”? We suspected that “Code First” would attract an audience. We weren’t sure how big or when.

The survey tells us it’s going to be big soon; nearly 42% of EF respondents chose “Code First” as their preferred style, just 5 points below “Database First”. So we’re hustling on our Code First support. We just released the first preview, which gets us more than halfway home. You can read about it here and I'll have more to say in future blog posts.

Got an opinion? I want to hear it.

1 comment:

John Jones said...

Ward,

Interesting survey. You could have announced it here on your blog. I don't use twitter because it's too much noise. I'll share my experience with EF.

We are in the middle of re-implementing our UI with EF Code First. We have a 150+ table existing database with many schema issues. We picked EF after following the development of 4.0 and 4.1, using all the CTPs against our database since late 2009. We compared EF against Linq2SQL or no ORM. I can't say I'm really familiar with other ORMs like NHibernate, although matching an existing, very imperfect database is not the sweet spot for ORMs in general.

We also sweated over when EF 4.1 would be released. Linq2SQL wasn't happy with our database. A model generated from the database lacked the relationships that aren't declared in the database as foreign keys (we have lots). Working in the model designer was hard with 30 tables; 150 seemed unworkable. There were other issues, but "Code First" works far better for us. We also prefer the concise, code-style configuration (it's better now than in some CTPs), but that's only gravy.

We often had to consider that no-ORM might be the only way to get things done. We also found that generated SQL was often immensely bloated and had subpar performance (albeit not as bad as it looks). We addressed that by using stored procedures where performance was an issue (there weren't many such spots, and stored procedure support isn't completely integrated into Code First yet). Rumored performance improvements in EF v.next sound great, but are too late for our first release.

I think many people using an ORM want a model that exactly mirrors the objects they want to work with. I have friends who do this in Java and love it. That's not where we are; it wouldn't work with our database. We're looking for compile-time checking (very important), IntelliSense, and friendly objects we can use in our code. We also have large tables (1,000,000 rows is not uncommon in some tables), so performance can be a significant problem. We made this work in an application with plenty of user-friendly features, so I'm strongly with EF Code First.

JJ