Thursday, March 22, 2012

Squash Entity Framework startup time with pre-compiled views

In brief

Your application can stall for several minutes while Entity Framework gathers the information it needs to perform queries and saves, a lengthy process it performs twice: before the first query and before the first save. Those minutes pile up, wasting developer time and angering your customers. You can drastically reduce these delays by pre-compiling the Entity Framework’s “views” of your model … as I explain in this post and demonstrate in its 14 minute accompanying video.



Costly EF startup

If you’ve used Entity Framework for a line-of-business application model, you’ve suffered a lengthy delay before the first query completes and a similar delay before the first save completes. Subsequent queries and saves are much quicker, completing in an amount of time commensurate with the request.

The delay is a non-linear function of the number of entities in the model. It often feels exponential. You probably won’t notice it in a toy model (every demo you’ll ever see) because the delay is lost in the wash of everything else that you’re thinking and learning about. But when the model grows to normal size – 100 or more entities – the delay mushrooms to a minute, two minutes or more. And you suffer this delay every time you run the application … which you do all day, every day during development. Multiply that by the number of developers on the project and you’re wasting a lot of time … and money.

The cost is far worse than the time lost. Make a developer wait two or three minutes per iteration and she’s bound to forget why she ran the app in the first place. Two minutes is a long time. The mind wanders. The mind turns to email, Twitter, and Facebook. Productivity is shot.

Now I don’t think you should be going near a database during normal development iterations. I recommend that you toggle the app to run against an in-memory representation of your data layer such as the DevForce “Fake Backing Store”. But maybe you’ll disregard my suggestion. And everyone has to hit the database occasionally just to confirm that the app works end-to-end.

So the development cost is terrible no matter what you do … unless your developers’ time is free; perhaps you price them at zero dollars and your response to every productivity decline is to hire more developers. Your second instinct is to outsource.

What about your customers and internal end users? If the app runs 2-tier, they suffer the delay every time they launch the app. Does their time matter to you? I’ll bet someone will make sure it matters to you.

You won’t field customer complaints if your application runs n-tier (e.g., in a Silverlight application) because the Entity Framework runs on the server. The startup penalty is paid only by the first user to query and save. If you run n-tier and you don’t care about developer productivity, turn the page and move along.

Pre-compiled Views to the rescue

I’ve been wondering what to do about this for a long time. I’d heard that “Entity Framework Pre-compiled Views” might help. I had also heard that it was troublesome and might not work. It seemed like one more thing to get around to someday.

Then one of our professional services customers called and complained. His project had started fine but hit the wall at around 200 entity types. The first query and first save each took about 50 seconds on most machines. Team productivity had sunk, morale was sinking, and he was catching serious political flak internally. Our own staff confirmed that the problem was real. Since we (IdeaBlade) had recommended EF Code First, we had to do something.

My colleague, Steven Schmitt, did the legwork that proved EF pre-compiled views (a) work for Code First models, (b) are easy to create, and (c) improve performance dramatically: the 50-second first query dropped to seven seconds; the 50-second first save dropped to less than one second.

He deserves the credit … I’m taking the glory by blogging about it.

The accompanying 14 minute video shows EF’s slow launch times for a 200+ entity model, demonstrates how to create pre-compiled Views, and explains a bit about how they work.

I produced the video to spare you a parade of screen shots. I think it also conveys the seriousness of the problem and the practical benefit of pre-compiled Views more effectively than I can in spare prose.

The EF view generation tool does not work with EF 4.3 yet. Microsoft sources report that an update is in the works.

For those of you who want just the facts, here they are:
  1. Ensure that SQL Server Express is installed. You can get around it with a DefaultConnectionFactory but it’s such a pain. Save your energy for better things and just install the thing.

  2. In Visual Studio 2010, open the Extension Manager (Tools | Extension Manager).

  3. Search for “Entity Framework Power Tools”. The version as I write is “Entity Framework Power Tools CTP1 0.5.0.0”.

  4. [optional] Review the online information about it. These tools do more than pre-compile EF views.

  5. Locate your custom DbContext class in Solution Explorer [note: we’re describing how to pre-compile views for an EF Code First model. You follow a similar approach for an EDMX-based model although I haven’t tried it personally.]

  6. Make sure that your DbContext class has a public parameterless constructor … or the tool will fail in a mysterious way.

  7. Select your DbContext class, right-click, and select “Entity Framework”.

  8. Select the “Optimize Entity Data Model” sub-item.

  9. Wait … the tool takes a while to compile the “views”.

  10. When it’s done, your DbContext has a companion DbContext.Views class file.

  11. Build and run.
You should notice an immediate improvement in start time. There is still a delay before the first query completes. But it should be a fraction of the former delay … around 1/7th of the time. The delay for the first save should be gone; it takes no longer than the second save.

DevForce Developer Notes

Your DevForce application benefits from EF Pre-compiled views when you follow these steps. A DevForce Code First model doesn’t have to have a custom DbContext class … but you will have to create one to use this tool.

DevForce developers typically don’t define a parameterless constructor because DevForce wants a constructor that takes a connection string. Add the parameterless constructor anyway. Don’t worry, we will pick up the appropriate constructor at runtime.
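A minimal sketch of a context that satisfies both the Power Tools and DevForce, assuming a project that references the EntityFramework package of this era (namespace System.Data.Entity). The Customer entity and the ShopContext name are hypothetical; only the two constructors matter here.

```csharp
using System.Data.Entity;

// Hypothetical entity, kept trivial for the example.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical Code First context for a DevForce model.
public class ShopContext : DbContext
{
    // Public parameterless constructor: the Power Tools'
    // "Optimize Entity Data Model" command needs it. With no argument,
    // Code First builds its connection by convention through its default
    // connection factory, which targets SQL Server Express unless changed.
    public ShopContext()
    {
    }

    // Constructor taking a connection string, the one DevForce uses at runtime.
    public ShopContext(string connectionString)
        : base(connectionString)
    {
    }

    public DbSet<Customer> Customers { get; set; }
}
```

Keeping both constructors costs nothing: the tool instantiates the context through the parameterless one, while the application keeps using the connection-string overload.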

When the model changes

Entity Framework detects if your entity model classes have changed since you compiled the EF views class. When you attempt your first query, you’ll get a clear runtime exception telling you to re-compile the views class.

Only database-related changes to persisted data and navigation properties matter. You can add UI hint attributes (e.g., [Display…]) and non-persisted custom properties (e.g., FullName) without triggering an exception. Any change that would affect the mapping between your entity classes and the database will trigger the exception.
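To make the distinction concrete, here is a hypothetical Code First entity (Customer and its properties are made up) with comments marking which kinds of edits, per the rule above, would and would not force you to regenerate the views.

```csharp
using System.ComponentModel.DataAnnotations;

// Hypothetical Code First entity illustrating mapping vs. non-mapping changes.
public class Customer
{
    // Persisted scalar properties: adding, removing, renaming, or retyping
    // one of these changes the mapping, so the views must be regenerated.
    public int Id { get; set; }

    // [Display] is only a UI hint; adding or editing it leaves the mapping alone.
    [Display(Name = "First name")]
    public string FirstName { get; set; }

    [Display(Name = "Last name")]
    public string LastName { get; set; }

    // Computed, getter-only property: by convention Code First does not map
    // properties without a setter, so this one never reaches the database.
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }
}
```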

How does EF know that the model has changed? I’m not certain but I have a pretty good guess. Ignore the views class filename and look at the name of the views class itself. It will be something like “ViewsForBaseEntitySets72E6108A34B7DB042DBA3C465F35B967B4E3C76051DFBAB958B69CB0D23EA8B7”.

The hex suffix at the end looks like a hash. I’m guessing it is a hash of your entity model classes and that Entity Framework spends the initial seconds before the first query reflecting over and hashing your entity model classes before comparing that hash to this views class suffix. Inside the class itself are a couple more hash values. Maybe it's using those too or instead. Someday I'll find out. It's evident that it's doing some kind of comparison between the entity model classes and this views class to ascertain if there is a disconnect.
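The guess is easy to picture in code. The sketch below is hypothetical and is not Entity Framework’s actual algorithm: it hashes a made-up text description of a model with SHA-256 (chosen only because a SHA-256 digest prints as 64 hex characters, the length of the suffix above) and compares fingerprints to decide whether stored views are stale.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical "fingerprint the model, compare to the stored fingerprint" check.
// Illustrates the guess in the text; EF's real mechanism may differ.
public static class ModelFingerprint
{
    // Hash a textual description of the model into 64 uppercase hex characters.
    public static string Compute(string modelDescription)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(modelDescription));
            // 32 bytes -> "AB-CD-..." -> strip the dashes -> 64 hex chars.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Stale when the model no longer hashes to the fingerprint saved with the views.
    public static bool IsStale(string storedFingerprint, string currentModelDescription)
    {
        return !string.Equals(storedFingerprint, Compute(currentModelDescription),
                              StringComparison.Ordinal);
    }
}
```

Any mapping-relevant edit changes the description, hence the hash, hence the mismatch; that is the fail-fast behavior the post describes.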

Anyway, at runtime, if EF detects a difference, it throws an exception which should terminate your app. You'll encounter the exception quickly and unmistakably when your app first requests data. I presume that will be before you push to production :). Just re-run the tool and you should be back in business.

At IdeaBlade we’re looking into a way to detect the views/model incompatibility at build time and regenerate the pre-compiled views automatically.

Meanwhile, it’s good to know that EF fails fast when the pre-compiled views and your model are out of sync … and the remedy is as simple as re-running the tool.

Hope this helps real-world EF developers everywhere.

Update - March 23

My buddy Steve Schmitt reminds me of a few more points.
  • Rowan Miller and the EF team deserve credit for developing the EF Power Tools; we just downloaded it.
  • There’s a bit more info about the tool online.
  • If you don’t want to regenerate the views for whatever reason, you can just delete the views file and you’re back to “normal”.

Update - April 6

This month the EF team published an important white paper on performance in EF 4 and EF 5 that bears on pre-compiled views and on other tactics that could make a significant difference for your project.

Thursday, March 8, 2012

Synchronous tasks with Task<T>

I extracted this thought from an email by Microsoft’s Brad Wilson and circulated it within my company. Why not share it with you?

Brad starts with an important piece of advice: don’t make a synchronous activity async!

Ok, but how do you construct a Task<T> that you’ll consume within the context of a bundle of tasks? Brad shows how. Hey … thanks Brad!

-----
The following is an anti-pattern with tasks on a server:
return Task.Factory.StartNew(
    () => model.Deserialize(stream, null, type));
This will run your code on another thread, forcing a context switch, which is unnecessary because your code is fundamentally synchronous. If you’re going to run synchronously, you should just run synchronously, and return the Task of a TaskCompletionSource that’s populated with your result. For example:
object result = model.Deserialize(stream, null, type);
var tcs = new TaskCompletionSource<object>();
tcs.SetResult(result);
return tcs.Task;
If Deserialize might throw, then a version with try/catch would be a better implementation of the Task contract:
var tcs = new TaskCompletionSource<object>();

try
{
    object result = model.Deserialize(stream, null, type);
    tcs.SetResult(result);
}
catch (Exception ex)
{
    tcs.SetException(ex);
}

return tcs.Task;
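The try/catch version generalizes into a small helper. This is a sketch, not a framework API; SyncTasks and FromSync are hypothetical names. (On .NET 4.5 and later, Task.FromResult covers the success-only case in one call.)

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper wrapping the try/catch TaskCompletionSource pattern.
public static class SyncTasks
{
    // Runs func on the caller's thread and returns an already-completed
    // Task<T>: a normal return becomes RanToCompletion, an exception
    // becomes Faulted. No new thread and no context switch are involved.
    public static Task<T> FromSync<T>(Func<T> func)
    {
        var tcs = new TaskCompletionSource<T>();
        try
        {
            T result = func();
            tcs.SetResult(result);
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
        }
        return tcs.Task;
    }
}
```

With it, the server method reduces to `return SyncTasks.FromSync(() => model.Deserialize(stream, null, type));`, and the deserialization still runs on the calling thread; callers composing tasks see an ordinary Task<object>.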

What do you mean by "fundamentally synchronous"?


I can assure you that the Deserialize method in question is synchronous.
model.Deserialize(stream, null, type)
That expression blocks until it returns the deserialized object. You see the stream parameter and think "this should be an asynchronous method". Maybe it should be, smarty pants; come back when you have written a deserializer that can reliably produce an object graph without reading the entire stream first.

While we're waiting for your DeserializeAsync implementation, let's push on the proposition that we should not move the execution of Deserialize to another thread.

Clearly, if this method is reading a stream, it could take "a long time" to complete. If we are running on the client, we'll freeze the UI until the deserialization completes. That can't be good. And it isn't. If we're running on the client, you should consider moving the execution of this method to another thread.

In this case, we're running on the server. There is no user waiting for the method to return so we don't care about speed on any particular thread. I'm sure the client cares about a fast response but the response isn't coming until the work is done ... on one thread or another.

We do care about total server throughput. We gain nothing by moving execution to another thread; in fact, we lose because of the thread context switching cost. This is why Brad says that spawning a new task with "Task.Factory.StartNew" is "an anti-pattern ... on a server". It's cool on the client; not cool on the server.

I'm ready with DeserializeAsync; now what?

Should we invoke the async method within a delegate passed to "Task.Factory.StartNew"? No, we should not!

This surprised me too ... until someone walked me through it by asking, "What do you think will happen on the thread you spawn?" I realized that all I would do on that new thread is dream up some way to wait for DeserializeAsync to finish. Meanwhile DeserializeAsync does its own asynchronous work, possibly on yet another thread, so I've got an original thread waiting for my task thread, which is waiting for DeserializeAsync. That's a complete waste of time ... and a pointless, resource-wasting context switch.
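Here is the difference in code. Deserializer.DeserializeAsync is a hypothetical stand-in that reads a stream asynchronously and returns the text as its "object graph"; the sketch uses C# 5 async/await and StreamReader.ReadToEndAsync, which arrived with .NET 4.5.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for a truly asynchronous deserializer.
public static class Deserializer
{
    public static async Task<object> DeserializeAsync(Stream stream)
    {
        using (var reader = new StreamReader(stream))
        {
            string text = await reader.ReadToEndAsync();
            return text; // pretend this is the deserialized object graph
        }
    }
}

public static class Wrappers
{
    // Anti-pattern: a thread pool thread is occupied doing nothing but
    // blocking on .Result until the inner task finishes.
    public static Task<object> WrappedInStartNew(Stream stream)
    {
        return Task.Factory.StartNew(() => Deserializer.DeserializeAsync(stream).Result);
    }

    // Better: no extra thread; hand back the task DeserializeAsync already made.
    public static Task<object> PassedThrough(Stream stream)
    {
        return Deserializer.DeserializeAsync(stream);
    }
}
```

Dropping the .Result does not rescue the first version: Task.Factory.StartNew(() => Deserializer.DeserializeAsync(stream)) yields a Task<Task<object>>, which you would then have to Unwrap().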

What's the point of TaskCompletionSource?

We're in this situation because for some (good) reason we want to expose a method - synchronous or asynchronous - as a Task. We don't want or need to spawn a new thread to run that method. We just want to consume it as a Task. The TaskCompletionSource is the wrapper we need for this purpose. It lets us return a Task object with the Task API that we, like puppeteers, can manipulate while staying on the current thread.
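The puppeteer role is clearest when adapting a callback-style API. In this sketch, LegacyDeserializer.BeginDeserialize is a hypothetical callback API (it simply upper-cases its input on a thread pool thread); the adapter returns tcs.Task immediately and pulls the strings from inside the callback, so no thread sits blocked waiting for the result.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical callback-style API standing in for any legacy async method.
public static class LegacyDeserializer
{
    public static void BeginDeserialize(string payload, Action<object, Exception> callback)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (payload == null)
                callback(null, new ArgumentNullException("payload"));
            else
                callback(payload.ToUpperInvariant(), null);
        });
    }
}

public static class TaskAdapters
{
    // Exposes the callback API as a Task<object> without spawning a waiting thread.
    public static Task<object> DeserializeAsTask(string payload)
    {
        var tcs = new TaskCompletionSource<object>();
        LegacyDeserializer.BeginDeserialize(payload, (result, error) =>
        {
            // Complete the task from whatever thread the callback runs on.
            if (error != null) tcs.SetException(error);
            else tcs.SetResult(result);
        });
        return tcs.Task;
    }
}
```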

For another, perhaps better, and certainly excellent explanation of this, I recommend the following Phil Pennington video on TaskCompletionSource. Happy coding!