Adrian Hofman's Blog

Archived brain dumps!

Debuggerless Development

As a teenager I taught myself C and C++ programming from a handful of online tutorials, starting out with nothing but notepad.exe and an ancient compiler (djgpp, if memory serves). In those heady days an IDE or a debugger was a totally alien concept to me. The only way to verify my code worked was to compile it, link it, run it and look at the output. If it didn’t do what I expected, I added extra printf statements that explicitly told me what certain code paths were doing and what the values of certain variables were. Little did I know at the time that I was engaging in a debugging technique I now call “printf debugging”.

Fast forward two decades. I now code in C# every day. I’ve used the Visual Studio debugger every day for years, and I know it inside out. Yet I now find myself forming a habit of abandoning the debugger and relying solely on log messages for debugging – just like the printf style debugging I did as a kid. What’s going on here?

For context: I’m developing a multi-tenanted SaaS product on Azure Service Fabric. We have a team of Site Reliability Engineers (SREs) whose job it is to deploy the product and keep it running smoothly. When there are issues in production, there is no way that our SREs are going to attach a debugger to a running service to try to diagnose it – the impacts that could have on the service are too great to justify. With an attached debugger you can do pretty much anything to the running process and its memory, with no safeguards in place to prevent you from doing potentially destructive things. Because this is SaaS, if you did do a potentially destructive thing, you could be affecting a significant portion of your user base – hence, the safest thing to do is just never use a debugger in production. Instead, our SREs rely on log messages emitted from the application for diagnosis.

Of course, Service Fabric has a one-node mode intended for local development, and it is indeed very easy to use the trusty debugger in this scenario. However, to do so would not be fair. Why? Because our SREs and Support staff cannot diagnose the application in this way. They are working with the multi-node cluster in production that they can't attach a debugger to. So, if they can't use a debugger, I won't use one either. By abandoning the debugger altogether, and relying solely on log messages for diagnosis, I am ensuring that the systems I'm building can be supported and maintained by our SREs and support staff. If I cheated and used the debugger, I might forget to put enough information in my log messages, and the SREs and support staff would have little recourse other than to escalate bugs and issues back to… me! Which I would prefer to avoid where possible 🙂
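
To make that concrete, here's a minimal sketch of what this looks like in practice – the class, parameters and messages are hypothetical, and I'm using Microsoft.Extensions.Logging purely as an illustration. The idea is that the values you would otherwise inspect in the watch window end up in the log instead:

using System;
using Microsoft.Extensions.Logging;

public class DocumentProcessor
{
   private readonly ILogger<DocumentProcessor> _logger;

   public DocumentProcessor(ILogger<DocumentProcessor> logger) => _logger = logger;

   public void Process(Guid tenantId, Guid documentId, long sizeInBytes)
   {
      // Record the inputs an SRE would need to reconstruct this code path.
      _logger.LogInformation(
         "Processing document {DocumentId} for tenant {TenantId} ({SizeInBytes} bytes)",
         documentId, tenantId, sizeInBytes);

      if (sizeInBytes == 0)
      {
         // The decision point you would have checked in the debugger goes in the log instead.
         _logger.LogWarning(
            "Skipping document {DocumentId} for tenant {TenantId}: empty content",
            documentId, tenantId);
         return;
      }

      // ... do the actual work ...
   }
}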

Naturally there will still be the odd occasion where I'll need to use the debugger again – memory corruption, for example – but these problems are few and far between. For bread-and-butter debugging of code paths, it's log messages only for the foreseeable future.

Piggybacking off the "Serverless Architecture" trend, I'm going to call this "Debuggerless Development" – let's see if it catches on 🙂

May 15, 2017 | Programming, Web

It’s OK To Retry Failed Tests

A firmly held belief in the automated testing community is that automatically retrying a failing test is just plain bad. In this post I will explore the idea of retrying a failing test, and identify a few cases where retries are actually a very sensible thing to do.

Let’s consider three different categories of tests – unit tests, API tests and UI tests.

Unit Tests

For unit tests, I completely agree. A unit test is totally isolated and not distributed in any way. If it needs to talk to an external component then it no longer falls under the definition of a unit test. Unit tests are entirely in-process and are not subject to the fallacies of distributed computing. Therefore they should be reliable and deterministic. There's no good reason for a unit test to fail occasionally – yes, even unit tests that exercise multi-threaded or asynchronous code. Following this line of reasoning, it doesn't make any sense to add a retry mechanism to unit tests, as this would hide an underlying issue that could cause other problems later on. Rather, the underlying cause of the occasional failures should be identified and fixed.

API Tests

API tests are a very different story altogether. These kinds of tests stand in contrast to unit tests in that the test is always talking to an external component – the web service API! The test is communicating with the API via a network, and is therefore subject to the fallacies of distributed computing – the most pertinent point being that the network is not reliable. The test could fail for reasons that are beyond the control of the system under test, or the test itself. Network congestion could lead to packet loss, which would ultimately lead to timeouts. In these cases the only sensible thing to do is wait some small period of time – a few seconds maybe – and just try the test again.

Note that it's important to only retry when a specific class of errors is encountered. Retrying a test if it fails for any reason is not appropriate – if an API returns HTTP 500 when it should return HTTP 200, that's a bug in the API and the test should immediately fail. The retries should only kick in for timeouts, or any other transient failure.
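
As a sketch of what that can look like – assuming Polly and HttpClient, with everything else hypothetical – the policy handles only timeouts and low-level network faults, while an unexpected status code still fails the test immediately:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

public static class TransientRetry
{
   // Retry only the failures that are outside the control of the test and the
   // system under test. An HTTP 500 comes back as a normal response and is
   // asserted on (and failed) by the test as usual.
   public static Task<HttpResponseMessage> SendWithRetryAsync(
      HttpClient client, Func<HttpRequestMessage> requestFactory)
   {
      return Policy
         .Handle<TaskCanceledException>()   // HttpClient surfaces timeouts this way
         .Or<HttpRequestException>()        // DNS failures, connection resets, etc.
         .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(2))
         .ExecuteAsync(() => client.SendAsync(requestFactory()));
   }
}

A fresh HttpRequestMessage is built on each attempt (via the factory) because a request message can't be sent twice.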

Granted, in most test environments, network timeouts are unlikely. The tests often run in the same LAN as the system under test – sometimes they’re even on the same physical machine. However, network issues can never be ruled out completely, and any robust test suite should account for them.

UI Tests

UI tests are similar to API tests in that the test is always communicating with an external component, but rather than talking to an API via some protocol like HTTP over TCP, the test is communicating with a UI via some other protocol, like WebDriver, or perhaps a proprietary communication mechanism. The test is pretty much always talking to a UI on the same physical machine, but the communication is definitely not reliable. UI tests are notoriously flaky, and a big contributor to this flakiness is that UI testing tools have a hard time coping with today's sophisticated UIs. Testing tools frequently experience timeouts that cause tests to fail, even when there is nothing wrong with the test code, or the system under test.

WebDriver timeouts aside – testing tools also suffer from bugs of their own! I once encountered a bug in a UI test framework whereby about 10% of the time the mouse would click on the wrong element. It was pretty random, very flaky, and totally outside of my control – my test code, and my system under test, were both fine and working as expected – the UI test framework was letting me down.

Another interesting aspect of UI testing is that many web applications accept the fact that the web is a flaky place to be. How many times have you seen an error message on a web page along the lines of "Sorry, something went wrong. Try reloading the page"? I've seen error messages like this on YouTube, GitHub, Azure, Visual Studio Online, etc. Web applications like this sometimes have to display such messages because their backend services are highly distributed and the occasional transient failure is unavoidable. If your UI is telling users to just reload the page, shouldn't your UI tests be doing the same thing – just retry?
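
In that spirit, here's a hedged sketch of a retry wrapper for flaky UI interactions – it assumes Selenium WebDriver, and the helper name and timings are made up. Anything that isn't a tooling or transient failure still fails the test straight away:

using System;
using System.Threading;
using OpenQA.Selenium;

public static class FlakyUi
{
   // Retry a UI interaction the same way a user would retry a flaky page.
   // Only WebDriverException (timeouts, stale elements and the like) is
   // retried; assertion failures are not caught and fail the test as normal.
   public static void WithRetries(Action interaction, int maxAttempts = 3)
   {
      for (var attempt = 1; ; attempt++)
      {
         try
         {
            interaction();
            return;
         }
         catch (WebDriverException) when (attempt < maxAttempts)
         {
            Thread.Sleep(TimeSpan.FromSeconds(2));
         }
      }
   }
}

Usage would look something like FlakyUi.WithRetries(() => driver.FindElement(By.Id("save")).Click());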

September 10, 2016 | Automated Testing, Programming, Web

SharePoint Online Recycle Bin Caveats

Information retention – it’s a key concern in any information architecture. Organizations that embrace SharePoint Online as a CMS must have a strategy in place to ensure data is kept for the duration of its retention schedule.

SharePoint Online supports a Recycle Bin feature that provides users with a level of safety when deleting their content. There are a plethora of excellent articles that cover the basics of how the Recycle Bin works, so I won't cover those topics here. Instead, in this post I will explore one of the main pitfalls of the Recycle Bin that any information architect should be aware of.

At first glance the Recycle Bin may appear to cover all deleted content. When an end-user tries to delete content via the SharePoint user interface, they can only send the item to the Recycle Bin – they are not given the option to delete it outright. Indeed, the SharePoint Recycle Bin is sometimes touted as a "catch-all" safety net for deleted content. You would be forgiven for believing that if the Recycle Bin is enabled on a Site Collection it will catch any deleted content, regardless of who deleted it and how.

While this is a comforting position, it is unfortunately not true: there are several common scenarios where content can be permanently deleted without going to the Recycle Bin at all. This caveat has some important implications for the information architecture of any organisation, particularly those whose requirements emphasize data retention.

How?

To explain, let’s take a closer look under the hood by examining the ListItem class in the Client Side Object Model. We can see in the reference documentation that it supports a Recycle() method and a Delete() method. The Recycle() method will send the list item to the first stage of the Recycle Bin. In contrast, the Delete() method deletes the list item permanently – it does not go to the Recycle Bin, not even the second stage Recycle Bin. It’s gone and lost forever.
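
To make the distinction concrete, here's a minimal CSOM sketch – the site URL, list name and item id are placeholders, and note that in the managed CSOM the permanent delete is exposed as DeleteObject():

using Microsoft.SharePoint.Client;

class RecycleVersusDelete
{
   static void Main()
   {
      // Authentication omitted for brevity.
      using (var ctx = new ClientContext("https://contoso.sharepoint.com/sites/records"))
      {
         List list = ctx.Web.Lists.GetByTitle("Documents");
         ListItem item = list.GetItemById(42);

         // Sends the item to the first-stage Recycle Bin - it can be restored.
         item.Recycle();

         // Deletes the item permanently - it never touches the Recycle Bin.
         // item.DeleteObject();

         ctx.ExecuteQuery();
      }
   }
}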

This means that any add-in installed on your Site Collection that has delete permissions could potentially be deleting things permanently, without ever touching the Recycle Bin, by calling the Delete() method instead of the Recycle() method.

The same methods are supported in PowerShell – so a power user or administrator can delete content via the Delete method, again bypassing the Recycle Bin.

Finally, content may also be deleted by a retention policy. A policy can be put in place on a Site Collection that takes some disposal action on content after its retention period has expired. The policy can be configured to send content to the Recycle Bin, or it can just delete it permanently – if the latter option is used, the content never touches the Recycle Bin.

Can Permissions Help?

SharePoint permissions do not distinguish between the Delete and Recycle operations – the "Delete" permission encompasses both. Any user with a delete permission granted on a list item can either Recycle or Delete that list item. There's no way to only allow a user to Recycle and not Delete, or vice versa.

What is the Recycle Bin Really For?

The primary use case for the Recycle Bin is for end-users managing their own content via the user interface. The user interface is the only avenue for deleting content where you aren't given a choice – you can't delete things permanently, you can only send them to the Recycle Bin. It's there so that if a user deletes a document by accident they have an easy way to get it back – there's no need to call an administrator and ask to restore from a backup.

Administrators and power users are given more control over how to delete content. Administrators have legitimate reasons to need this control – they may need to manage the disk usage for a site, for example. Naturally, with great power comes great responsibility, and special care must be taken.

August 28, 2016 | SharePoint

SharePoint Online, Azure AD and Remote Event Receivers

A few years ago, Microsoft added support for Azure AD Application Permissions for SharePoint Online. This allows a headless app – i.e. a background task – to access resources in a SharePoint Online tenant, using Azure AD as an identity provider for the app.

This stands as an alternative to the incumbent approach: using an add-in principal in SharePoint Online, with app-only permissions.

Using Azure AD for app authentication to SharePoint Online is a great option for a few reasons:
– The app can be onboarded using the awesome consent flow provided by Azure AD;
– The app must present a certificate to Azure AD to acquire an access token, providing an extra level of security over the old approach, which just relies on a client secret.
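
For illustration, here's a minimal sketch of that certificate flow using the MSAL library (Microsoft.Identity.Client) – the tenant, client id, certificate and scope are all placeholders:

using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;
using Microsoft.Identity.Client;

public static class AppOnlyToken
{
   // Acquires an app-only access token for SharePoint Online. No user is
   // involved, and no client secret - the app proves its identity with the
   // certificate registered in Azure AD.
   public static async Task<string> AcquireForSharePointAsync()
   {
      var certificate = new X509Certificate2("app-cert.pfx", "pfx-password");

      var app = ConfidentialClientApplicationBuilder
         .Create("00000000-0000-0000-0000-000000000000")   // app (client) id
         .WithCertificate(certificate)
         .WithTenantId("contoso.onmicrosoft.com")
         .Build();

      var result = await app
         .AcquireTokenForClient(new[] { "https://contoso.sharepoint.com/.default" })
         .ExecuteAsync();

      return result.AccessToken;
   }
}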

Now, permit me a brief digression to explain Remote Event Receivers.

Remote Event Receivers (RERs) are a feature of the SharePoint Online add-in model that allows a remote add-in component to receive SOAP messages when certain events happen in SharePoint Online. You would be forgiven for thinking that RERs will be entirely supplanted by webhooks – but this is not the case. RERs support one key feature that webhooks do not.

Webhooks are notifications of events that have already happened; the add-in that is receiving the events can't affect the behavior of the event in any way. RERs on the other hand allow the add-in to prevent events from completing – for example, they can cancel a deleting event to prevent users from deleting content in SharePoint (this happens to be a key compliance requirement at RecordPoint!)
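
As a sketch of what that cancellation looks like in a receiver (assuming the standard IRemoteEventService contract from the add-in model; the class name and message are made up):

using Microsoft.SharePoint.Client.EventReceivers;

public class RecordProtectionReceiver : IRemoteEventService
{
   // Synchronous ("-ing") events let the receiver veto the operation.
   public SPRemoteEventResult ProcessEvent(SPRemoteEventProperties properties)
   {
      var result = new SPRemoteEventResult();

      if (properties.EventType == SPRemoteEventType.ItemDeleting)
      {
         // Cancel the delete before it completes.
         result.Status = SPRemoteEventServiceStatus.CancelWithError;
         result.ErrorMessage = "This item is a managed record and cannot be deleted.";
      }

      return result;
   }

   // Asynchronous ("-ed") events are notification-only, much like webhooks.
   public void ProcessOneWayEvent(SPRemoteEventProperties properties)
   {
   }
}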

The sting in the tail is that RERs don’t seem to work with Azure AD app authentication. While there doesn’t seem to be any official word on whether this scenario is supported, I haven’t been able to get it to work in my tests, and I’m not the only one.

This intuitively makes sense when we consider how RERs authenticate to their receiver.

When an RER is registered using the add-in model, it authenticates using the add-in principal. The client id and client secret defined in the add-in principal were entered directly into SharePoint Online via appregnew.aspx, so SharePoint Online can use them to obtain an app-only token, which is passed to the event receiver. The event receiver can then use the token to authenticate the call.

The Azure AD case is different, though. The client id and client certificate are entered into Azure AD, not SharePoint Online. SharePoint Online captures the client id during the consent flow, but there is no way it can get the certificate – Azure AD knows the certificate is a secret, and it isn't sharing it with anyone. SharePoint Online therefore does not have enough information to authenticate using the Azure AD app identity, and so cannot call an event receiver.

This is my theory, based on hours of research and testing. I’d love for a Microsoft insider to give a definitive answer on whether RERs are supported when using Azure AD app authentication!

August 5, 2016 | SharePoint

High School Bully

I knew him in a very different way to how most people knew him. As a matter of fact, I barely knew him at all. All I knew of him was his name, that he went to the same high school as me, and that he disliked me intensely.

I don’t really remember exactly how it started – all I recall is one day there was this aggressive guy that I was trying to get away from. He kept accusing me of “cheeking” him – but I had no idea what he was talking about, I’d never spoken to him before. It happened once, then twice, then again and again, day after day. He’d follow me as I walked home from school. He hurled abuse, verbal and physical. He’d push me around and rough me up – always in the body, where the bruises wouldn’t show. He’d wave a cricket bat around my head and threaten to put me in hospital. He was ruthless – he didn’t stop even when I was crying, lying face down on the ground, pleading for him to leave me alone.

I never pushed back in any way, I always just tried to get away as quickly as possible. Even though I was older, I was small, scrawny, awkward and scared. I was no match.

I started to avoid him. I learned and memorised his school timetable so that I knew where he was – where was not safe for me to be. I snuck out of 6th period early so I could get a head start on walking home, so he couldn’t catch me. Other times I walked home in the total opposite direction, adding 40 minutes to the trip, just to avoid him. I was on high alert all the time. I couldn’t allow myself to get engrossed in work or conversation – instead I kept watch for him, so I could make a quick getaway if necessary. I was constantly in flight mode. Naturally other aspects of my life deteriorated – schoolwork, relationships and the like. My life revolved around him – knowing where he was and making sure I was somewhere else.

This went on for two years, and ended only when I finished high school. I lived in constant fear for two years. I was ashamed – I believed at the time that it was my fault and that I deserved it, even though I never really understood why it was happening. I didn’t tell a soul, out of fear and shame – not my parents, nor my brother, nor my friends, nor my teachers.

Reflecting on this one day, I decided I wanted closure. I looked him up on Facebook. Maybe we could meet up and talk about it – maybe I could understand why. Who knows, maybe we could have a beer together?

It turns out he passed away in a road accident in 2010. He was 25 years old.

There is a tribute page on Facebook filled with messages of love and mourning from his many friends and family. Photographs show a large number of people at his funeral. He was loved by many and missed by all.

Many people have obviously lost something with his tragic passing.
I have lost something too – I’ll never be able to ask him why. Now my only reasonable attempt at catharsis is to write this…

December 8, 2015 | Uncategorized

SharePoint: Concurrency and CheckOuts

If you keep coding for long enough, eventually you will learn the ways of writing thread-safe code. It’s a journey every programmer takes at some point: you have some single-threaded code that isn’t meeting performance expectations, so you try to run it on multiple threads. You discover that there is some data that is being updated by many threads at the same time, and you’re getting incorrect results as a consequence. To deal with it, you’ll probably end up putting some kind of locking mechanism in place to ensure that your shared data is being updated in a consistent fashion.

If you’re a .NET developer you’ll probably end up using the lock keyword. Using the canonical example of withdrawing from a bank account,¬†your code might look a little bit like this:

// The shared state, and the private object we take the lock on
private readonly object _syncObject = new object();
private double _bankBalance;

void WithdrawMoneyFromBankAccount(double withdrawalAmount)
{
   lock(_syncObject)
   {
      if ( _bankBalance - withdrawalAmount < 0 )
         throw new OverdrawnException();
      _bankBalance -= withdrawalAmount;
   }
}

Given that locking is a well understood idiom for maintaining consistent data, you’d be forgiven for applying the same concept to your SharePoint list data. However, upon closer inspection, it doesn’t make sense. I’ll explain why.

SharePoint provides a checkout feature for documents in a document library: a user may check out a document, which prevents any other user from editing that document. The document remains checked out to the user until it is checked in or the checkout is discarded. Because the checkout prevents other users from modifying the document, it may seem that checkouts are a good mechanism for synchronising access to document data from your code.

Consider some code (using the Server Object Model) that looks like this:

void WithdrawMoneyFromBankAccount(double withdrawalAmount)
{
   SPFile fileWithAccountBalance = GetFile();
   fileWithAccountBalance.CheckOut();
   try 
   {
      // ... get the file contents, do the withdrawal operation on them,
      // ... serialize the data back into a byte array called myNewFileContent
      
      fileWithAccountBalance.SaveBinary( myNewFileContent );
      fileWithAccountBalance.CheckIn();
   }
   catch 
   {
      fileWithAccountBalance.UndoCheckOut();
      throw;
   }

}

What's wrong with this, you ask? The CheckOut, CheckIn and UndoCheckOut calls are totally unnecessary. They're just making a few extra round-trips to the database and slowing you down.

What should we do instead? How do we ensure that the data is updated in a consistent fashion? Simple: SharePoint implements optimistic concurrency under the hood. To explain (very briefly) how optimistic concurrency works: instead of locking to prevent other threads from reading data during a transaction, you check for other updates to the data before writing your version of it. If someone else has updated the data you were working on, you stop what you were doing, get the latest version of the data and start again.

In this example, the call to SaveBinary will throw if the SPFile object is “stale”. When this happens, simply retry the transaction. Here’s an example that does this using Polly for the retry logic.

void WithdrawMoneyFromBankAccount(double withdrawalAmount)
{  
   var policy = Policy
              .Handle<SPException>()
              .Retry(10);

   policy.Execute(() => 
      {
         // Get the most up-to-date version of the file from the database
         SPFile fileWithAccountBalance = GetFile();
   
         // ... get the file contents, do the withdrawal operation on them,
         // ... serialize the data back into a byte array called myNewFileContent

         // Try to save it. If another thread has modified the file
         // since we called GetFile() above, SaveBinary will throw an 
         // SPException, and the Polly retry policy above will retry this
         // transaction for us.
         fileWithAccountBalance.SaveBinary( myNewFileContent );
      });
}

Of course, you might need CheckOuts for a different reason – maybe you really need the check-in comment for something. If that's the case, then by all means use them. However, if you're just using CheckOuts to implement pessimistic locking, I'd strongly recommend not doing that, and embracing the optimistic concurrency model built in to SharePoint instead.

August 11, 2015 | Programming, SharePoint

Schema, Type and NServiceBus

If you’ve ever worked with a Service Oriented Architecture, you will almost certainly have come across the following maxim:

“Services share Schema and Contract, not Type and Class”.

“Schema and Contract” refers to the structure of data. A schema is totally ignorant of platform or runtime. A schema is typically described in some platform-neutral format, like XML or JSON.

“Type and Class” refers to a physical data type. A type implies a certain language or runtime, e.g, a .NET class. A set of types can be an implementation of a schema for a given platform.

“Schema and Contract” is interface. “Type and Class” is implementation.

This brings us back to a basic old-school principle of software maintainability: program to an interface, not an implementation. Consume other components only through a well defined interface. Don't depend on implementation details, because they might need to change. If it's not in the interface, it's none of your business.

Now, I love NServiceBus. I really really do. Its abstractions are drawn at just the right level – it takes care of all the stuff I don't want to think too hard about (guaranteed once-only delivery, for example), and doesn't get in the way of me trying to just get the job done. It has a lot of awesomeness that helps me deliver more business value in a shorter time, with fewer defects – it doesn't get much better than that. Of course, there's a non-trivial learning curve associated with it, and it takes a bit of time to get accustomed to "the NServiceBus way". I must admit this was a long journey for me, and there were a few things that took me a while to come to terms with.

One such thing is that NServiceBus, at first blush, appears to disobey the golden rule of SOA given above, and tells developers that endpoints should share types. The suggested setup is to have an assembly with all message types in it. This assembly is then deployed at every endpoint that needs to send, publish or handle these messages. That's not the only way to do it: there is a way to generate XSDs that describe the schema of an endpoint, but this option appears to be less documented and less popular.
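
A minimal sketch of that setup – the message and handler are made up, and the handler signature assumes a recent NServiceBus version:

using System;
using System.Threading.Tasks;
using NServiceBus;

// Lives in the shared messages assembly, deployed to every endpoint that
// publishes or handles it - the .NET type itself is treated as the contract.
public class OrderPlaced : IEvent
{
   public Guid OrderId { get; set; }
   public decimal Total { get; set; }
}

// Lives only in the endpoint that handles the message.
public class OrderPlacedHandler : IHandleMessages<OrderPlaced>
{
   public Task Handle(OrderPlaced message, IMessageHandlerContext context)
   {
      // ... react to the event ...
      return Task.CompletedTask;
   }
}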

If you read enough of the NServiceBus documentation, and hang around the message boards long enough, it will eventually become clear that the message types are to be treated as a contract. Fair enough, I can deal with that. You're still tied to an implementation though (.NET), and you'll still run into some problems related to that implementation. Trying to deploy different versions of NServiceBus at different endpoints? You need to use unobtrusive mode to get that to work. Trying to interoperate with something that isn't .NET? You need to do extra work to get that going. To the credit of the Particular developers, they have well documented solutions to these problems.

The advantage that I see to the type-sharing setup is that it makes NServiceBus really easy to pick up for developers of any level of experience. You only need to know some basic C# and you’re up and running. I guess it’s easier to use a C# class than it is to use an XSD or something similar. As you go along you’ll eventually hit a problem related to sharing types (like message versioning, interoperation with another platform, or using different NSB versions at different endpoints), all of which have well documented workarounds. That’s the journey that the Particular software people have selected for you, and it makes a lot of sense for them: if it’s easy to adopt, people will adopt it.

However – what if we took the other road by default? What if we provided an XSD from our endpoints that describes the message schema? Consumers can use a T4 template or something similar to generate their own copy of the message classes. The T4 can generate whatever is necessary for the concerns of that endpoint – UI concerns, validation, whatever implementation details are required. The XSD becomes the "Schema and Contract" that is shared between endpoints, and the generated message classes become the "Class and Type" that are not shared, and are instead true implementation details of the endpoint.

This approach gives us the following:

  • “Schema and Contract” is an XSD; “Type and Class” is a .NET type. The concepts map together very cleanly. If I were picking up an SOA framework that used this approach for the first time, I think I would “get” this much more so than if I were trying to understand that a .NET class is actually a service contract. That’s just me, though.
  • Versioning of messages still works. Make non-breaking changes to your XSD (i.e., only add new optional elements) and you’re just fine.
  • Every endpoint is interoperable with any XML-aware platform by default.
  • Using different versions of NServiceBus at different endpoints “just works”. There’s no need for unobtrusive mode.

Food for thought.

April 4, 2014 | Programming

Agile Anti-Pattern: Aiming For the Moon

W. Clement Stone gives the inspiring advice to

“Always aim for the moon. Even if you miss, you’ll land among the stars”.

Set your sights really high, and have lofty, barely achievable goals. There's an off chance that you might meet your goals and enjoy success that was previously unimaginable. But even if you don't achieve what you set out to do, you'll still have achieved something pretty awesome – or so the theory goes.

This seems to make a lot of sense in the business world. It definitely seemed to work out for Mr. Stone.

Unfortunately, I haven’t seen this kind of thinking work out terribly well in the trenches of agile software development, particularly at the beginning of brand new projects.

I've seen projects that start out with ambitious goals – more features than the old system, faster than the old system, better UX than the old system, and zero defects (yes, zero!). The team is really excited – it's the beginning of a new project and there's a feeling that anything is possible! The buzz is palpable as each developer commits to doing the best work of their career. There's the odd voice of concern asking "can we really achieve this?", but those concerns are quickly assuaged with a rousing speech built around Stone's quote above: "It doesn't really matter if we don't hit our goal of perfection, because we'll still have something great". Sounds awesome. However, as projects like this continue, progress is disappointing – in fact, I've seen progress over several months come close to zero. These projects definitely don't hit the moon, but they don't land among the stars either. Unfortunately, they barely get a few centimeters off the ground. What's happening here?

An agile process can suffer a lot of damage if the mentality of setting lofty goals is allowed to seep into the formation of a user story. When goals are set this high, the acceptance criteria and the all-important definition of done start to move into unmanageable territory. The amount of effort required to complete a user story to this standard increases, and the scope of the story increases accordingly. The stories become very large, with a significant amount of uncertainty attached to them. The bigger the stories grow, the lower the chance that they can be completed within a single iteration – and you end up with one or more iterations passing with zero user stories complete. Sure, there is work being done, but it's not tested, there's no visibility into how the project is tracking, and there's no good way to tell how far away you are from being able to deliver.

What should we do instead? For that first iteration, aim really low. Aim for software that only barely functions, but is testable. You can always improve it in future iterations. Instead of aiming for the moon with the intention of landing somewhere in the stars, aim to get your little Wright brothers flyer off the ground for a few seconds at a time with the intention of not crashing and killing yourself. Then and only then should you attempt to go higher.

Remember the INVEST mnemonic, especially the S: Small! If your story is huge, you're doing it wrong. Break it down to a few days' work at most. When your developers complain that a big task can't be broken down, try running through some pseudo-code with them – the boundaries can sometimes present themselves pretty clearly on the whiteboard.

Strictly define the scope of each story with acceptance criteria. It's just as important to define what's out of scope as it is to define what's in scope. Remember, we developers just love to tinker and are sometimes prone to wandering off into the weeds, so keep an eye on them at each stand-up to make sure they're not working on stuff that is out of scope.

In the trenches of agile software development, it’s best to keep your goals small, sensible and achievable. Leave your aspirational goals to a higher level.

March 18, 2014 | Uncategorized

Just (Spend Ages Getting It To Work And Then) Mock

We recently moved away from the awesome RhinoMocks library and started using Telerik's JustMock instead. Why? Because JustMock, according to its sales pitch, allows you to mock absolutely anything – non-virtual methods, sealed classes, etc. While it does sound cool, I can't shake that gut feeling that says "if you're trying to mock something that's not an interface, you're doing it wrong".
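
For context, this is the kind of thing the profiler-based (elevated) mode lets you do – a hedged sketch, since the details depend on your JustMock edition, and the classes here are made up:

using System;
using Telerik.JustMock;

// A sealed class with a non-virtual method - impossible to mock with a
// proxy-based framework like RhinoMocks.
public sealed class ExchangeRateService
{
   public decimal GetRate(string currency)
   {
      throw new NotImplementedException();
   }
}

public class PriceCalculatorTests
{
   public void Converts_using_mocked_rate()
   {
      // Requires the JustMock profiler to be enabled.
      var rates = Mock.Create<ExchangeRateService>();
      Mock.Arrange(() => rates.GetRate("EUR")).Returns(1.1m);

      // ... exercise the code under test with the mocked service ...
   }
}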

I’m not going to mention all the special stuff you have to do to get JustMock to work on your workstation, and then to integrate it with your automated build – lots of other people seem to have covered those bases already. What I will mention here is one little gotcha: the JustMock profiler makes things slow – really slow.

So, if you’re experiencing these symptoms:

  • Just about any operation in Visual Studio 2012 is taking a really really long time, and
  • Task Manager shows the CPU usage of NuGet.exe taking up an entire core

Try disabling the JustMock profiler (from the top menu bar of VS2012, “JustMock” -> Disable Profiler). Oh look at that, they made a shortcut key for it.

August 6, 2013 | Programming

Anti-Pattern: The Race Car With Leather Interior

Consider an Audi A7, or a Mercedes Benz E-Class. These are what some people call “executive cars”, presumably because they carry a certain degree of prestige. They’re nice cars to drive: very comfortable, very smooth. In fact, one of their primary design concerns is comfort. Nobody is going to buy a car that hurts them while they drive it.

These cars are designed for use by the masses. They are designed for use by people who drive regularly, but who have had relatively little in the way of driver training. While the average driver of these cars can accelerate, brake and change gears, they probably do not possess advanced driving skills that would allow them to recover from a spin. You could say that these cars are designed for amateur drivers.

On the other hand, consider a Formula Ford. I once had the great pleasure of driving one of these cars on a closed racing circuit. It was an exhilarating experience, one that I will remember for a very long time. The car was fast, nimble and felt like it was glued to the race track. It was also very stiff, bumpy and somewhat uncomfortable. It required quite a bit of concentration and energy to drive, considerably more than what is required to drive one of the comfy road cars I mentioned earlier. The cockpit was very snug, it was difficult to get in and out of the car. I found it curious that there was absolutely no instrumentation whatsoever – no speedometer, no tachometer, not even a shift light – nothing but a steering wheel. I was also surprised by the gear stick – it was a little stub, the size and shape of a pen, suspended by two rods running through the cockpit. There was no H pattern, nor was there any kind of labeling to show you where to find the gears. You just had to trust what the instructor was telling you – “put it straight forward for first, second is down from there, third is back up and to the right, fourth is down and to the right. Don’t put it to the left, that’s reverse.” Once you were in gear, there was no way to tell what gear you were in, other than making a guess based on your speed and revs – which you also had to guess, because there were no dials to give you this information!

Of course, comfort and style is not a concern in the design of a race car. The only goal of a race car is to go as fast as possible over a complete race distance. If the driver is exerted, it’s a secondary concern. The driver is skilled and experienced enough to be able to drive the car quickly despite the cramped cockpit and total lack of instruments. He’s trained his physical fitness to the point that he is able to deliver the level of exertion necessary to race over a full distance. He’s memorized the positions of the gears and he can shift gears based on feel alone; he doesn’t need an H pattern or gear labels. The Formula Ford is designed for professional drivers – people who have spent countless hours honing advanced driving skills.

As software developers, our comfort is of secondary concern. It’s our job to deliver working software, and if we’re a little bit uncomfortable or exerted in the process, it doesn’t really matter all that much. As we work to deliver software, it’s actually not that important if we don’t have all the right information conveniently delivered to us in exactly the way we would like. It’s not that big a deal if our processes aren’t exhaustively documented. Certainly these things are nice to have, and if you do have them, great. But it is a mistake to allow these things to distract you from the real task at hand: delivering working software to users.

For example: Harry* is an award-winning employee and one of the most respected software developers in the company. Harry is a very thorough and detail oriented programmer, and because of this his work rarely contains bugs. While Harry is an excellent developer, the company’s analysts often have a hard time with him, because Harry always complains that he doesn’t have any defined business rules for the features he’s working on.

Of course, the analysts disagree. “There’s a use case here, a few stories there, a UI mockup over here… all the business rules are in these documents!” they cry.

“No, that’s not what I want”, says Harry. “I want all the business rules in one catalog.”

Harry's original assertion that he doesn't have any business rules was not quite correct – he did have business rules, but they just weren't presented to him in the format that he wanted. Harry has many good reasons why having a single business rules catalog makes sense. However, Harry is making the mistake of letting the poor presentation of information distract him from his primary goal. Instead of focusing primarily on delivering working software, Harry is focusing primarily on changing the way others give him information. Instead of focusing on winning races, Harry is focusing on getting a nice sporty dashboard.

Another example: something I frequently hear professional developers moan about is merging checkins between branches. When I ask why a developer doesn’t like merging, the response is always along the lines of one of these two:

  • “My merge tool gets confused and lines up the diff all wrong!” Merge tools aren’t perfect, that’s why merging is a part of your job. Certainly the merge tool makes things easier, but perhaps it’s a little bit unrealistic to expect it to do the entire job of merging for you. Just like the professional race car driver knows what gear he is in without any labels or instruments, the professional developer should know how to line up a diff by himself, in case the merge tool gets it wrong.
  • “Sometimes there are merge conflicts and they’re hard to resolve!” This is like the professional race car driver complaining that he had a difficult race because there was another car in front of him. Overtaking a competing car is a difficult part of racing, but you simply cannot be a racing driver without the skill to overtake another car. Similarly, resolving a merge conflict is not trivial (but certainly not all that difficult either), but it is a sign of a healthy, collaborative development team. Being able to resolve these conflicts is one of the most crucial parts of maintaining that collaborative atmosphere. It is an essential skill for any professional software developer.

I’ve frequently heard developers suggest that they should not be the ones to have to do merges; or that we should take locks on files in source control, so that two people can’t modify the same thing at the same time and then merge conflicts wouldn’t happen; or that we should get a better merge tool that will somehow fix everything for us. This is the anti-pattern: what these developers are asking for is akin to the professional race car driver asking for his car to have a leather interior, a sporty dash with lots of dials and flashing lights, and a wood finished H pattern for his gear lever. It is an anti-pattern because none of these things are critically important in achieving your goal. They’re nice-to-have, but not what you should be focusing on.

Remember, we are not amateur software developers. We are not just driving the Audi A7 with the family on a Sunday afternoon. We are professional developers, whose job it is to deliver working software. We are driving the Formula Ford; we’re sweating it out in a hot, stuffy, cramped cockpit. But none of that matters, because we’re 100% focused on winning the race.

July 24, 2013 | Programming