Proactively Lazy: 2013

Saturday, December 28, 2013

Creating an MVC framework part 2 - First Decisions and Code

In writing the first bit of code a quickly came across a few key decisions. The very first one was what sort of routing I wanted to include. This lead to a deeper question of what sort of applications I wanted the framework to be suitable for.

My career has largely been spent creating line of business applications and that is the priority focus of the framework. Eventually I would like to expand on that but with limited time you have to simplify or get no where. I also didn't intend for it to be particularly opinionated but I'm naturally getting pushed that way. Again I hope to make it more flexible in future but for now it's simply not a priority.

The first casualty of this is routing, otherwise known as pretty URL's. This feature is simply not that important in a line of business app and it simplifies things to not have to worry about it at this point. For now, the only route is /{controller}/{action}/{querystring}.

The other big decision I made is project structure. I'm sick of having various parts of an application strewn about the project, or several projects, based on type. The grouping of this framework is going to be done by function. A standard project layout will be along the lines of:

MvcProject
-\_Common
---_Layout.cshtml
-\Product
---_Controller.cs
---List.js
---List.cshtml

Let there be Light

So I created a new web application and in the global.asax.cs I put:

protected void Application_Start(object sender, EventArgs e) {
RouteTable.Routes.Add(new Route("{*url}", new RequestHandler()));
}

This tells ASP that we want to handle all incoming URL's with our request handler, which is all I want for now. RequestHandler looks like this:

public class RequestHandler : IRouteHandler, IHttpHandler {
IHttpHandler IRouteHandler.GetHttpHandler(RequestContext requestContext) { return this; }
bool IHttpHandler.IsReusable { get { return true; } }
void IHttpHandler.ProcessRequest(HttpContext context) {
context.Response.Write("Hello World!");
}
}

IRouteHandler and IHttpHandler are the interfaces we need to implement to recieve requests from ASP. I'm not sure if it's a good idea to combine them or not but it works for now. The real meat and potatoes starts with ProcessRequest though, it might not look like much but this is the entry point to our framework. This is where the adventure begins.

Wednesday, December 18, 2013

Creating an MVC framework part 1 - Why???

So I was working on an application framework, something that would take some of the drudgery out of creating a new application. I like things to be nicely structured, but setting up this architecture takes time. Usually by the time I've set it up that urge to solve the original problem had waned and my projects folder was filled with yet another orphan.

I just wanted to solve the most common things. Validation, logging, permissions are the obvious ones, I find the implementation of these in microsofts MVC framework to be pretty terrible, it's almost always one of the first things I replace. I like the command/query model, so I wanted that in place. I want to be able to offer desktop/mobile integration further down the line. I want easy integration with knockoutjs. Most importantly, I don't want to have to set all this up every single time.

So as I'm sporadically putting various parts together I noticed one thing. MS MVC is getting in my way and stopping me from creating this framework, It must go! So what was once going to be an application framework is now going to be MVC framework as well.

A touch of arrogance

I think if your going to replace something then you need to replace it with something better, which requires at least a modest level of arrogance. You have to know how what you want to do better and hopefully have a rough idea on how to get there.

As I got further through my exploratory and brainstorming phase I started to get more confident that yes, I can make a better MVC.

Controllers

The first thing I started looking at is what I wanted my controllers to look like. I came up with this:

public abstract class Controller {
[Route("/product/view/*/{Id}")]
public abstract object View(ProductDetailsQuery query);
public abstract object List(ProductListQuery query);
[Roles("ProductManager")]
[HttpPost]
public object Create(ProductCreateCommand command) {
System.Execute(command);
User.Message(MessageType.Success, "Product has been created");
return User.Redirect("/Product/List");
}
}

There are a number of interesting things here, First of all, the controller is responsible for defining a public API, so routes and permissions are defined by attributes. MS MVC has recently added attribute based routing as well.

A permissions system will be provided by the framework that will allow role based and finer grained permissions.

The controller is abstract. For many common actions like view and list in the above sample, all the controller does is delegate responsibility. Abstract actions will have the ability to be routed by default. Saving those couple of lines of code was one of the main drivers for my framework, yes I'm that anal.

The HttpPost may seem fairly standard, but by convention this will enlist a transaction.

All action return objects. There are no view results, json results or anything like that. It's one object in, one object out. The format of the result is determined by the caller.

We don't have to check if ModelState.IsValid on *every single function*. That is done for us.

It's just called Controller. I will mention the project layout at a later date.

There are a couple of big interfaces in there, User and System. These are basically wrappers for real objects. I really like this abstraction because it makes it explicit who where interacting with. Where not interacting with NotificationSubSystem, where interacting with the user. We aren't interacting with the command executor, we are interacting with the system.

Fingers Crossed

This is the current plan any way. Some ideas may turn out to be bad ones, some might be brilliant and there could be many more additions. Finding out which is which should be a fun journey.

Sunday, December 15, 2013

ORM vs SQL Part 3: Filtering

The final of the 3 basic scenarios that nearly every application will need is filtering, the where clause in an sql statement. An application will have to display a list of products in a category, filter a list of users by email address, get an order by it's id or something along those lines. It's practically impossible to write an application with out it.

The most interesting thing about filtering is that it's rather simple to write a query that can handle all your scenarios. The problem is that it renders all your carefully planned indexes irrelevant or worse.

Sql server can be quite fickle about using indexes, depending on the cardinality, execution plan and phase of the moon. I'll show just how stupid it can sometimes be.

Filtering with an ORM

Once again we can build our dynamic query with an ORM quite simply:

var query = Session.QueryOver<User>();
if (string.IsNullOrEmpty(filterId))
query = query.Where(x => x.Id == filterId);
if (string.IsNullOrEmpty(filterLastName))
query = query.Where(Restrictions.On<User>(x => x.LastName).IsLike(filterLastName));
if (string.IsNullOrEmpty(filterEmail))
query = query.Where(Restrictions.On<User>(x => x.Email).IsLike(filterEmail));
var result = query.List();

Once again, our ORM builds a prepared statement for every unique situation which can make use of the appropriate indexes if we have any. It's not neccessarily fast, that will depend on your performance tuning, but it's as fast as it can be.

Filtering with Sql

Dynamically filtering with sql is also quite simple. To get the same functionality as above:

select TOP(20) id, lastName, firstName, email
from users
where --lastName = '21250A22-6B3F-452C-9FE3-7EBBF216B13C'
(@id is null OR @id = @id)
AND
(@lastName is null or lastName Like @lastName + '%')
AND
(@email is null or email LIKE @email + '%')

This is much more elegeant than our solutions in the past two posts. The first problem is, what if you want some other logic, if you want to search by last name or email then this won't work. You'll either need something far more elaborate or you'll need several identical queries. The other problem is that it may not use an index, or even worse, will use the wrong index.

What index will this use?

Well the answer is it depends on a lot of things. The first query, collected statistics and moon phases being the primary influences. I don't know about you, but I like my systems to perform consistently. Assuming the above query is in a stored procedure:

execute the procedure with @id=null, @lastName='abc', @email=null
Query finishes in ~280ms. The index on lastName was used
execute the procedure with @id=null, @lastName=null, @email='abc'
Query finishes in ~11000ms, still using the index on lastName, not the index on email
execute the procedure with @id='C15BAFAA-B772-4863-8849-00413B531C29', @lastName=null, @email=null
Query finishes in ~11000ms, still using the index on lastName, not the primary key index
execute the procedure with @id=null, @lastName=null, @email=null
Query finishes in ~12000ms, still using the index on lastName

Deleting our indexes and repeating the queries in the same order takes ~50ms, ~60ms, ~20ms, 20ms.

So a query like this will not only render many of your indexes useless but can cause the wrong index to be used, which is much worse. Again the tailor made query that an ORM will generate will have much better performance than a generic one that you can create.

Tuesday, December 10, 2013

ORM vs SQL Part 2: Ordering

Most applications will need to to control the ordering of the data being presented to users, the exact requirements will differ depending on the application and situation, but virtually all will require it. Sometimes the same data will presented in several places, each requiring it's own order for natural reasons. Other times dynamic tables will be presented and users will control what order the view information in. Some even allow full control and let the users specify multiple orders.

In the later two cases ordering usually goes hand in hand with filtering, but I'm going to leave the filtering until part three. In part one I looked at paging.

Ordering with an ORM

Simple ordering is straight forward with an ORM (and with sql for that matter):

var users = session.QueryOver<User>().OrderBy(x => x.LastName).Asc.List();

In fact, at this level, sql is even simpler. An ORM starts to shine when you have some other logic controlling the ordering because you get a composable query object to work with, for example:

var query = session.QueryOver<User>();
if (orderByLastName == true)
query = query.OrderBy(x=>x.LastName).Asc;
if (orderByLastLogin == true)
query = query.OrderBy(x=>x.LastLogin).Asc;

By composing queries like this, any ordering that can be logically described can be performed, your only limit is your own code and how much power and flexibility you want to give the end users.

The ORM will generate the sql needed for that individual scenario and make full use of our indexes.

Ordering with Sql

Simple ordering in sql is extremely straight forward:

select TOP 10 * from users order by lastName

Most importantly, this will use the index on lastName. On my computer this will execute in ~5ms with 1,000,000 rows. This is essentially what our ORM will generate.

Now, let's say our application is showing a list of users in two places, the first we are trying to find users by last name and the second we want to see who were the latest to login. We can write two stored procedures, which will be identical apart from the order by clause. Or we can add some dynamic ordering to the query:

SELECT TOP 10 id, lastName, firstName
FROM users
order by CASE
WHEN @OrderBy = 'LastNameAsc' THEN lastName
WHEN @OrderBy = 'FirstName' THEN firstName
END

The drawbacks for this are that the data types have to be the same in the case statement, so we can't use the same query to order by lastLogin. Likewise, we can't mix ascending and descending. And even if we are ordering on lastName, which has an index in this scenario, the index will not get used. It also performs quite badly, 753ms which is more than two orders of magnitude slower for the same functionality.

In part one I said the new 2012 offset/next syntax performed quite slowly with dynamic ordering, this is why.

Ordering with Sql - Again

So being unable to mix datatypes is a deal breaker in our scenario. we want to order by either lastName or lastLogin. The easiest way to do this is to invite our old friend from part one, rownumber:

SELECT TOP 10 id, lastName, firstName, CASE
WHEN @OrderBy = 'LastNameAsc' THEN ROW_NUMBER() OVER(ORDER BY lastName)
WHEN @OrderBy = 'LastLoginDesc' THEN ROW_NUMBER() OVER(ORDER BY lastLogin desc)
END as rownumber
FROM users
order by rownumber

Finally, we can order by multiple data types and by ascending or descending order. But it comes at a hefty price, it's yet another order of magnitude slower, ~5800ms, way beyond what I would consider acceptable.

Ordering and Paging

So now let's look at the combination of paging and ordering:

SELECT TOP(@take) id, lastName, firstName, email, lastLogin
FROM (
SELECT id, lastName, firstName, email, lastLogin, CASE
WHEN @OrderBy = 'LastNameAsc' THEN ROW_NUMBER() OVER(ORDER BY lastName)
WHEN @OrderBy = 'LastLoginDesc' THEN ROW_NUMBER() OVER(ORDER BY lastLogin desc)
END as rownumber
FROM users
) as query
WHERE query.rownumber > @skip
ORDER BY rownumber

Our simply little select isn't so simple any more and the performance has suffered terribly (~8500ms), to the point where it is immediately noticeable to end users. All because you thought ORM's were too slow.

We are quickly approaching the limits of what we can do with sql. The ORM can easily handle ordering over multiple columns dynamically, to do the same with sql would require a fundamentally different, and probably much more complex, strategy.

Why are ORM's faster?

In this scenario our ORM is 3 orders of magnitude faster than a stored procedure with the same functionality. This is simply because it generates queries dynamically, it doesn't have to account for every possibility at programming time because it can adjust at run time instead.

Sql, despite frequently being championed as faster, can not even make use of indexes appropriately as soon as you throw a real world situation at it.

The final score, with paging and ordering is ORM: ~10ms, Sql: ~8500ms.

Next up, I'll take a look at dynamic filtering.

ORM vs SQL Part 1: Paging

In this series of posts I want to talk about the real world ramifications of not using an ORM for application development and why falling back to SQL and stored procedures is a poor choice.

Quite often the argument seems to be that an SQL select and a few joins is simple and effective (which it is), and that ORM's are a useless abstraction. The problem is that simple selects don't get you very far, even when creating the CRUDiest of applications.

In real applications, we have to do paging, sorting (dynamically), joining (dynamically) and filtering (dynamically). That's just on the read side, not even considering what an ORM can do for you when your writing data. I'm going to go through the abstractions an ORM provide and show how they make life easier for us.

Some caveats before I go on:

I'm talking about general application development, web, desktop or mobile doesn't really matter here.
I'm talking about general line of business applications. If your working on something more exotic you may have legitimate needs that an ORM doesn't cater for.
The ORM I'm using will be nHibernate, but the same applies to all ORM's.
I'll mostly be talking about Sql Server.
Stored procedures have there place, if your ORM creating a legitimate bottleneck you should feel free to work around it entirely.

Paging with an ORM

Here is an example of paging a list with nHibernate, it's simple and it works, I don't even have to bother explaining it:

Session.QueryOver<Product>().Skip(20).Take(10)

Paging with SQL

I'm going to assume Sql Server here but it varies from database to database. Before we go down that path I will also say that database independence is not a good reason to use an ORM.

A quick Google search for sql paging will give you quite a number of different possibilities. MySQL, for all it's faults, at least includes the ability to say "LIMIT 20, 10", which means skip 20, take 10. Those of use working with Sql Server aren't so lucky, it blows my mind that such a common scenario wan't at the forefront of developers minds when Sql Server was first created and they've had to add various hacks over the last decade. Now that we can (hopefully) pretend that Sql Server 2000 no longer exists the best option seems to be:

SELECT TOP (@take) id, name
FROM (
SELECT
id, name, ROW_NUMBER() OVER(ORDER BY name) as rownumber
FROM products) as query
WHERE query.rownumber > @skip
ORDER BY query.name

This has good performance, it's relatively simple and it works well with dynamic ordering, which I'll get to in part two. This is exactly the sql that nHibernate will generate for you. Another possibility, as of Sql Server 2012 is:

SELECT *
FROM products
ORDER BY name OFFSET @skip ROWS FETCH NEXT @take ROWS ONLY;

While still lacking the elegance of MySQL, it works and is faster than the previous example. Throw in dynamic ordering though (part 2) and the performance goes out the window. There are other possibilities I haven't covered, the pros and cons of each is worth it's own blog series, though I won't be the one writing it (if you know of one, let me know and I will link it here).

The dumbest one I've seen, unfortunately the most common also, is to page on the client side (client being the application, not end user). The idea seems to be that you cache entire result sets and filter/order it in application code. People actually think this performs better, but I'll save this for a longer rant later.

That doesn't seem too bad

Well it isn't too bad but it's not such a simple query anymore. This extra complexity goes into nearly every query in the application, so it's a non-trivial amount of extra code to maintain. In later posts we'll see how this complexity is multiplied by other, equally necessary features.

Finally, compare it with the simplicity of doing the same in an ORM, which will execute the exact same query and perform the same, all that extra code starts to look pretty meaningless.

In part two I'll take a look at dynamic ordering.

Thursday, September 19, 2013

Your checkin process is ruining your checkin process

Like most people I like to take the easy road, the path of least resistence. Fortunately most of the tools we use on a daily basis cater for that and makes life as easy as possible. One of those tools is source control, every SCM, TFS aside, makes it a straightforward process to checkout, checkin, branch and merge your code.

The requriements of an SCM (not the implimentation) are fairly modest. It has to keep track of our files, when, who and why they have changed. It has to keep a historical record of these changes for that inevitable day when someone ask "WTF was this done for?". An SCM will take care of the who and when for the most part, we just have to supply the why.

Unless your trying to find blame the why is the most important part. It's a little message to future you or future college what the reasoning behind that particular change was. This is one of the most invaluable features of source control.

Then the complicators get in, these guys are just like there architecture astronaut counterparts. The see a simple process and try to make it enterprisy. All of a sudden source control, instead of being the bedrock of the development process, becomes a management tool and prevents good practices instead of creating them.

Here are some common ways too much process can wreck your development bedrock.

"Every checkin shall hencforth require a ticket" - The Complicator

This is probably the most common for of SCM abuse. Every checkin requires a reference to a task or a bug so that every change is recorded and justified. It sounds like a good idea at the start but it has some pretty serious downsides that aren't immediately apparent.

Quite frequently we come across bugs and being the good developers we are, we like to fix things, we fix it while we're there. But then comes the paperwork, the mental overhead that we would rather not deal with. Should we create a bug for this single line fix? Should we role it into another checkin? Should we ignore it?

Creating bug ticket for a single line fix we just happened to come across is just more trouble than it's worth, we are developers after all, not bureaucrats. The 10 minutes or more to create a ticket for a 5 second is woefully inefficient.

Squeezing it in to another checkin is probably the most common result, but it undermines the value of source control and, ironically enough, the process that was put around it. The process that was meant to document the reason for the change suddenly has a random change with no apparent reason, which makes it pretty useless.

This also leads to another problem, monolithic checkins. A monolithic checkin is either more than a days work in a single checkin or several unrelated changes in a single checkin. The bigger the checkin the less likely it is that all the changes have been documented correctly. And the bigger the checkin the harder merging will be. So in this case too much checkin has undermined the value of source control.

The Inevitable Result

The final option left to the developer and probably the most common one is to ignore the problem in the first place. As I said, I follow the path of least resistance, and if your process requires more work than fixing a bug then we might just decide it's not worth while. Every body loses in this situation, the business has a worse product and the developers don't get to fix things. We like fixing things.

Sunday, March 31, 2013

NServicebus Configuration is a Nightmare

So I'm pulling my hair out at the moment trying to configure nServiceBus. I've spent a couple of days trying to get something working to no avail. The core issues seem to be the configuration and lack of documentation. My current trial and error approach is getting me nowhere.

I don't think it's too much to ask that a trivial pub/sub queue should be able to be setup in minutes, even if it's all in memory. I just want to be able to get on developing the application itself and to come back update the configuration later to something more robust.

The following rant may be completely wrong, but the documentation didn't exactly steer me in the right direction.

Why is the configuration so bad?

The main reason is because there is no single way to configure nServiceBus. The configuration is handled four different ways, the web/app config file, the fluent configuration, the declarative configuration and the "convention".

The application config method seems to be the outdated way to configure nServiceBus. Except for all those times and exception is thrown because of a missing config section. This leads you to adding config sections without knowing what they actually are, just to shut up the exception. I know this is my fault for adding stuff without realizing what it is, but I'm sure I'm not the only person to have done it.

I love the idea of fluent configuration, but the current implementation is lacking. Adding .MsmqTransport at application start up seems pretty simple right? But then you get random exceptions because of missing elements in the web.config. I think if your going to have a fluent API then it should be completely self contained, not half and half.

The declarative configuration seems to only be relevant if you are running the nServiceBus process host, but this wasn't entirely clear from the documentation. It basically involves creating a class with a marker interface and for this class to contain the configuration information. For my current project nServiceBus needs to be hosted in my own executable (to satisfy app harbor), but I'm not entirely sure if I need it or not.

Conventions are also used throughout nServiceBus, but it's not always clear what those conventions are.

And the Documentation?

The documentation isn't bad and there is quite a bit of it but some more step by step tutorials would be good. There is absolutely no documentation on how to configure nServiceBus to use ravendb, which is quite frustrating, because that's what I'm trying to use.

None of the documentation seems to tell you which version it was created with, so I never have any idea if what I'm reading is relevant or not.

The latest version of the tutorials is also completely useless as it uses the WYSIWYG editor, which for some reason won't install in my version of visual studio. Not that I want to use it anyway, I prefer the code first approach, for which there is no up to date documentation.

What I'd like?

I would love an entirely self contained fluent configuration that made it clear what was happening, something like: Configure.AsPublisher(x=> { x.Queue = myRavenQueue }) and Configure.AsSubscriber(x=> { x.Queue = myRavenQueue }).

Making it clear what is happening is important, because the current configuration options absolute do not.

For now I'm not sure what to do. Continue persevering Hard to do when I have no idea if I'm getting closer or not. Look for alternatives? Maybe Rhino service bus is nicer but I'm not sure of the future of the project. Avoid it all together? I don't particularly need a service bus right now, but less code, future scalability and built in fault tolerance are all pushing me towards one.

Sunday, January 27, 2013

App Harbor, Security and Configuration

By now we've all read the story about keys and passwords being kept in source control. It seems pretty obvious it's a bad idea but it tends to be our tools that force us down this path. At the moment this is being forced on me by app harbor and it's deployment model.

App Harbor deployment works by taking a web.config and applying the transformations in web.release.config. The problem with this is that all you configuration must be under source control, and everyone that works on the source has access to private security information.

Web.Config

As a general rule, I don't like to keep the Web.config file in source control at all and generally write an ignore rule for it. Having it in source control virtual guarantees that at some point someone will put private information in there accidentally. Nothing is more frustrating than pulling the latest version of the source and having someone overwrite all your settings.

With a traditional CI server setup I would have a config template and would generate a full Web.config based on settings in the CI server but App Harbor is very limited support for this style. This is a nice compromise, it's easy to setup a configuration for a new user or server and it keeps everyone from overwriting each others configuration files.

App Harbor however, forces you to have a Web.config file in source control. Forcing you to be just on blonde moment away from commiting private information.

Configuration Variables

App Harbor provides half a solution, it allows you to set configuration variables in the application manager interface. While this is okay for some simple key value settings, sometimes the configuration is more complex. Here is an example of a configuration section using the OAuth2 library:

<oauth2>
    <services>
      <add clientType="GoogleClient" enabled="true" clientId="00000000.apps.googleusercontent.com" clientSecret="aaaaaaaaaaaaaaaaaaa" scope="https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email" redirectUri="~/User/Auth" />
    </services>
</oauth2>

The only way to get such a configuration into App Harbor is to have it in a config file under source control, giving everyone access to private information.

What I'd Like

What I'd really like is rather simple. I want to store a config file in the application manager interface that will be copied into the project as part of the build. This way no configuration has to be under source control, no private information shared and simplicity remains.

Friday, January 25, 2013

How to turn away paying customers

I have a couple of recent examples of companies being unwilling to take my money.

The first is Amazon. I was working on an app and thought amazon S3 would be perfect for blob storage. After going through the most annoying, buggy sign up experience possible I was informed that Amazon, the worlds largest online retailer, is unable to process a card with a CCV.

The solution: Rackspace Cloudfiles. Which was actually closer to what I was looking for anyway.

The next one was nDepend. I've wanted this for a while and used the trial at a number of places. I finally decided to cough up and buy a legit copy. The only thing standing in my way is there reluctance to accept gmail accounts. Sorry guys, web mail is ubiquitous and you better get used to it, until then I'll just go without.

Update: As you might have seen in the comments, nDepend has some very pro active support and have enabled purchases with gmail addresses. So, I'm now a proud owner. Fortunately the current project only has one major issue but there are some others where I know nDepend will help out a lot.