Inheritance is Evil

This is something one of my previous mentors used to say all the time. I love the way he put such a complex principle into three simple words. Inheritance. Is. Evil. It is funny, because inheritance is one of the first things most schools teach when they introduce Object Oriented Programming. What is inheritance? What can you do with it? Why should you use it? These are all topics your course will cover, but “why is it evil” is definitely not one of them.

Kinds of Inheritance

Most cases of inheritance in OOP fall into two groups: data (or structural) inheritance and behavior inheritance.

You use data inheritance when you create a set of entities, plain old C/Java/Whatever objects, that inherit from each other. You probably remember the classic library example where you have a base Media class with Book, Movie and Song classes as its descendants. All of those items share part of their data: they all have a unique id within your library, a title, an author and a publication date. All of this data is inherited from the base Media class.

The other kind of inheritance is generally introduced at the same time as polymorphism. Behavior inheritance happens when you share common behavior through a base class. For instance, you could have a Plugin class that provides a set of basic behaviors for a plugin to your application, like getting the version number of the application or injecting its own UI elements into your application. Other developers can inherit from this class to hook in their own code and thus add features to the host software. In this case, all plugins share the same basic behavior and add their own to it through hooks provided by virtual methods.


What’s so bad then? This sounds pretty useful, but there is a catch. Those are perfect, book-worthy examples. In real life, things are rarely black and white. Your application’s structure will change and you will miss things along the way. Because of this, things can get pretty funky when you strongly depend on inheritance. This is why some APIs that seem very well thought out, even in modern systems like the WPF framework in C#, can sometimes be a pain to work with.

When things are not what they seem to be

Let’s take a quick look at data inheritance first. Let’s say you are building a simple drawing application. You might have a set of tools to draw basic shapes: squares, rectangles, circles and lines. You can already see the hierarchy appear in your head. There is a Shape base class with all shapes inheriting from it.

Inheritance is Evil - Basic Shapes

Then, you remember from an old geometry class that squares are just rectangles with both sides sharing the same length. You go ahead and change the structure so that the Square class inherits from the Rectangle class because, after all, a Square IS A Rectangle. You release your application and a lot of people love it.

Then, some users start complaining about files being large and the app being slow. In your investigation, you notice that the Square class wastes a lot of space in memory because it stores the same data twice. After all, since you are reusing the Rectangle data structure, you have to fill both the Width and Height properties with the same value all the time, which effectively doubles the amount of memory required to store a Square.

Inheritance is Evil - Square is a Rectangle

You think about it a little and decide to reverse the hierarchy. By having Rectangle inherit from Square, you can have the square define only the Width property and the Rectangle add the Height.

Inheritance is Evil - Rectangle is a Square

This is definitely a solution that could work, but you would end up with two new and more subtle problems on your hands. First, you will probably find it a little awkward to use the Width name in squares. This is especially true if you are dealing with perimeter calculations or when you need to pass it as a y coordinate. If someone reads your code, they will probably think you made a copy/paste mistake. You will probably want to use a different name like Size, but that would cause a similar issue when dealing with Rectangles. The bigger problem, though, comes from broken semantics. Everyone who learned that a square is a rectangle will go through the same thought process as you originally did. Unless they have hit the same problem before, they will all expect Rectangle to inherit from the Square class.

Dead end.

This is why data inheritance is evil. It is really tempting to match virtual objects to real world objects, but you shouldn’t always choose that path. It is also very tempting to optimize your data structure by going more than two levels deep, but this often introduces semantic issues and adds to the complexity of your code. The best solution in this case is to keep Square and Rectangle as separate entities. This ensures that Rectangle and its related code will never have to change because Square does, which is one of the SOLID principles you should admire and love. It also better matches the semantics of your application, as you will probably always display both objects side by side, grouped as shapes.
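To make this concrete, here is a minimal sketch of the separate-entities approach. The IShape interface and its members are my own illustration, not part of any particular framework:

```csharp
// Square and Rectangle live side by side instead of inheriting from
// each other; the shared concept of "shape" is expressed through a
// small interface rather than a data hierarchy.
public interface IShape
{
    double Area();
}

public class Rectangle : IShape
{
    public double Width { get; set; }
    public double Height { get; set; }
    public double Area() { return Width * Height; }
}

public class Square : IShape
{
    public double Size { get; set; }
    public double Area() { return Size * Size; }
}
```

Neither class can now break the other by changing, and both can still be drawn, listed and grouped together as shapes.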

It is important to note here that the point is not to avoid data inheritance at all costs. Data inheritance is a very useful tool that can, and will, help you accomplish your tasks. The risk in using it is that mapping virtual objects to real world objects is very tempting and often the wrong path to take. Do not over-think your inheritance hierarchy. Keep your objects as close as possible to your problem’s domain and everything will be fine.

Still, this isn’t exactly evil…

Leave it to evolution

Now that this is out of the way, let’s look at what can go wrong when dealing with behavior inheritance. Again, we are dealing with a very powerful tool that is often misused and misunderstood. We do not need to look very far to find issues with it. Imagine that you have a set of classes that can be serialized. All objects will have a Serialize method that gives out a text-based representation of the object. For these classes, you might have decided to output a JSON string. Your objects are simple, so you also decided to do it by hand.

class Root
{
    private int _foo;
    private string _bar;

    public virtual string Serialize()
    {
        return "{ \"foo\" : " + _foo + ", \"bar\" : \"" + _bar + "\" }";
    }
}

That looks good. It could be made simpler, but that is not the point. If you wanted to create a class that inherits from this one, what would you do? You would probably want to add your own properties to it, which means you would have to override the Serialize method. Because it returns a specifically formatted string, you cannot just append the new properties at the end; you have to deal with the closing bracket there. You can’t really rebuild the entire string either, because some of the base class properties are private. Your only solution is to manually find where you need to inject your properties. This can be really tedious, especially if your deserializer imposes an order on the property list*.
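To see how tedious this gets, here is a sketch of what a derived class is forced into, assuming it only wants to add a single _baz field of its own:

```csharp
class Derived : Root
{
    private bool _baz;

    public override string Serialize()
    {
        // The base output is an opaque string, so the only option left
        // is to cut it open and splice the new property in just before
        // the closing bracket.
        var json = base.Serialize();
        var insertAt = json.LastIndexOf('}');
        return json.Substring(0, insertAt)
            + ", \"baz\" : " + _baz.ToString().ToLower() + " "
            + json.Substring(insertAt);
    }
}
```

Every derived class now silently depends on the exact shape of the base class’ string.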

Even if you managed to make it work, how would you test it? You could easily write a mock class to test the base class behavior, but how would you test a specialized class without bloating your tests with the base class behavior all the time? It is simply impossible. For the same reason, it is impossible to make the derived class a true SOLID entity. Depending on how you build your serialization code, you will either end up terribly inefficient or you will assume a lot of things about the base class’ output, which will break your code if it ever changes.

Dead end… Again.

The biggest problem with behavior inheritance is that it nearly never works. Even if your architecture stays stable for a long time and everything seems alright, something is going to happen that will throw it off. A classic example is the async framework introduced in .NET 4.5. How are you supposed to await a method from a virtual method that is not async? You could block until it returns, which really is a bad idea, or you could fire and forget it, which is an even worse idea! In the end, you simply cannot turn that method into an async one because of your base class’ limitations. If we go back to our plugin example from earlier, this means that the host application needs to provide both a synchronous and an asynchronous version of the Plugin class. Even then, if async is only needed for a single method, turning all methods into async ones will be inefficient.

Is there an alternative?

What should you do if you encounter one of those problems? The solution is simple. Do not use inheritance. Or at least, do not use behavior inheritance and keep data inheritance to a minimum. Try other techniques like composition and delegation instead. Make sure that your classes can talk to each other and work together instead of overriding each other’s behaviors. Write cooperative code instead of dictatorial code. Use interfaces and dependency injection. Deal with micro services. These simple tricks will help you solve all the problems we had when using behavior inheritance.

You could easily handle the last example using plain old C# objects and services. No need for inheritance. Yes, you will end up with a couple more classes, but who cares? Your code will be testable both ways, it will be extensible, it will be simple and each class will have a single responsibility. Basically, it will be SOLID.
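As a sketch of what that could look like, with hypothetical names of my own choosing, the data stays in a plain object and the formatting behavior moves into its own service:

```csharp
// The object is now pure data; it does not know how to serialize itself.
class RootData
{
    public int Foo { get; set; }
    public string Bar { get; set; }
}

interface ISerializer
{
    string Serialize(RootData value);
}

// The JSON knowledge lives in exactly one place. A test can target this
// class alone, and a different output format is just another implementation.
class JsonRootSerializer : ISerializer
{
    public string Serialize(RootData value)
    {
        return "{ \"foo\" : " + value.Foo + ", \"bar\" : \"" + value.Bar + "\" }";
    }
}
```

A new type with extra properties gets its own serializer, or composes this one, instead of performing string surgery on an inherited method.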

That’s all I have for you today folks! Read ya later!

* Yes, a very popular one that I will not name will only look for properties in alphabetical order. Trying to deserialize A, C, B, D will simply cause B to be skipped.


An Alternate Take On: ‘Interface’ Considered Harmful

I usually agree with Uncle Bob’s opinions quite easily. I even built this whole blog inspired by his own and his book, Clean Code. You should definitely read them, they are real mind openers. Today, though, he posted an article that looked unusual, even strange, from a clean code advocate’s point of view. Today’s article was about interfaces and why the ‘interface‘ keyword should be considered harmful.

I really liked reading it. It is an interesting way to view this language feature, but I think it is also a little twisted. The article, written as a discussion between two coders, starts with a simple question, “What do you think of interfaces?”, proceeds to explore what they are, what inheritance and multiple inheritance are, and finally demonstrates why the explicit interface language feature is actually a bad thing.

I agree with him. If interfaces were created to “solve” the diamond inheritance ambiguity problem by making it impossible to happen, then yes, it is obviously a bad solution. Multiple inheritance is a wonderful tool when used correctly, and excluding it from a language just because there can be ambiguities is definitely not a reasonable solution. That would be like saying that we should not use cars because fatal accidents can happen when driving them. Not having cars would set us back years, and I believe that the lack of multiple inheritance in many languages can give that same feeling of living in the past.

Something’s not right here…

On a different note, I also think that he has overlooked something very important in this discussion: semantics. In his examples, he mentions that abstract classes like the one below can also act as an interface. In fact, this is how they are done in C++.

public abstract class MyInterface {
    public abstract void f();
}

Here is where I think his reasoning went wrong. Abstract classes are interfaces, but interfaces are not abstract classes. That is, in the same sense as a square is a rectangle, but a rectangle is not necessarily a square. So what is the difference between abstract classes and interfaces then?

The base class describes what it is.
The interface describes what it can do.

A good way to wrap your head around those differences is when thinking about someone in a recruiting process. This candidate is, well, a person, but it can also be hired or rejected.

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string Address { get; set; }
}

public interface Hireable   { bool IsHired { get; }    void Hire();   }
public interface Rejectable { bool IsRejected { get; } void Reject(); }
public class Candidate : Person, Hireable, Rejectable {}

A candidate is definitely not a Rejectable. Rejectable is a behavior that we want to attach to a Candidate; it does not define what a Candidate is. Person, on the other hand, defines very well what a Candidate is. Persons have a name, an age, they live somewhere and can be contacted via various methods. Except for the last one, these all represent data points of a Person. Their ability to be contacted is not a data point. Their phone number is. This is another behavior.

public interface Contactable { string PhoneNumber { get; } void Contact(); }
public class Person : Contactable {}

Now we have a problem. What if we do not want our system to be able to contact a Person who is not a candidate, because we would not know what to tell them? We still want Person to hold the contact information, because a Candidate is about job offers, not personal information like a phone number. We could split the property from the interface, but then how would the Contact method know where to call? The contact information would have to be passed as a parameter every time by the caller, and that sure is ugly because Person would no longer encapsulate it.

The solution to this problem is actually quite simple. In our system, a Person is not a completely functional entity. It cannot do its work all by itself. It needs to be at least a Candidate (or maybe a Recruiter) to function properly. Yet, Person cannot be turned into an interface because it is about data and not behavior. The solution is to make Person an abstract class.

public interface Contactable { string PhoneNumber { get; } void Contact(); }
public abstract class Person : Contactable {
    public string PhoneNumber { get; set; }
    public abstract void Contact();
}

Let’s go back to the semantics of what we just built. A Candidate is a Person that can be contacted, hired and rejected. A Person is the information about a real world person that can be contacted but not without some context. There is a clear semantic difference between the two uses of inheritance in this situation. You are something and you can do things. Basically, in OOP, you are not defined by your actions, you are defined by your data.* Actions are just behaviors that we attach to this data.

Is the interface keyword really evil?

Since interfaces and abstract classes are not equivalent in the semantic world, we cannot simply replace all interfaces with abstract classes, even if they could accomplish the same task in a language that has multiple inheritance capabilities. They mean different things and are not interchangeable.

The interface keyword is not really harmful. It is just that the language designers chose to prioritize proper semantics over language features. You could still have multiple inheritance and explicit interfaces in the same language. The fact that one is there and the other isn’t does not make the interface or abstract keyword harmful in itself.

Hope you enjoyed my take on this one! Read you next time!


*Yes, there are properties in my interfaces. They are only getters to some data, not space to hold this data. In other words, querying for a state like IsRejected is not the state itself.

MapReduce and LINQ

A surprisingly useful yet extremely misunderstood tool for dealing with large amounts of data is the map/reduce pattern. There are plenty of frameworks out there, like MapReduce and Hadoop, that make it look insanely complicated, and it is. Those implementations are very complex because they also deal with the problem of distributed data. At heart though, the pattern behind those implementations is very simple. Let’s explore it and see how we could build a crude implementation of a MapReduce framework for C#.

Understanding MapReduce

The idea behind MapReduce is that most operations on large amounts of data can be done in two steps: one step that needs to be as close as possible to the data and one step that needs as much data as possible. For instance, you might want to count all instances of the word “banana” in a massive database. If this database is distributed across the globe, it would be very expensive, if not prohibitive, to transfer all the data to a single machine for processing. Instead, we first count the instances of this word locally, near the data, on each machine in the cluster. This produces a bunch of very small values, the local counts, which can easily be regrouped on a single computer and then processed.

The Map operation is the one working close to the data. The Reduce operation is the one dealing with the large amount of results.

int Map(IEnumerable<string> allLocalData) {
    return allLocalData.Count(s => s.Equals("banana"));
}

int Reduce(IEnumerable<int> allMappedData) {
    return allMappedData.Sum();
}

This example is too simple to show the use of MapReduce in a non distributed system. A more appropriate example would be compressing a large amount of images. There is one operation that needs to be done close to the data, the compression, and one that needs as much information as possible, the progress report. In this example, we can clearly see that MapReduce is about nothing more than executing a task on a large amount of data and collecting results. It does not matter if the compression runs sequentially or in parallel on multiple threads.

To make this clear, the pattern is only about data locality, not about asynchronous or parallel programming. That said, the goal of this pattern is to break large processing problems down in a way that enables some degree of parallelism.

This pattern is generic enough to be used in a lot of cases. You might already be familiar with the Task.WhenAll() and IEnumerable.Select() methods, and you have probably used them before. These are at the core of a basic, asynchronous and parallel implementation of the map/reduce pattern in C#.

var inputs = new [] { "0.png", "1.png", "2.png" };

// Compress is the CPU-bound operation; it returns metadata about the
// compressed file, including its CompressionRatio.
var map = inputs.Select(input => Task.Factory.StartNew(() => Compress(input)));

var results = await Task.WhenAll(map);

double reduce = results.Select(r => r.CompressionRatio).Average();

We start from a list of items to process. Here we are dealing with a list of files, but this could be anything: a large database, pixels in a bitmap, words in a document, etc. Then, we project each of those items to a Task which executes some process over them. In this case, we compress each file to the jpeg format and return metadata once the process is finished. This is a CPU intensive process and compressing many files simultaneously is a great way to save time. Finally, we execute all of those tasks and wait for all the resulting metadata to arrive. Once we have everything, we calculate the average compression ratio of all the files that we processed. This is an easy task as the expensive work has already been done.

From this prototype, we can extract the specifics of our problem and keep only the generic MapReduce code behind the scenes.

public static TResult MapReduce<TInput, TIntermediate, TResult>(
    this IEnumerable<TInput> inputs,
    Func<TInput, TIntermediate> mapDelegate,
    Func<IEnumerable<TIntermediate>, TResult> reduceDelegate)
{
    var results = new ConcurrentBag<TIntermediate>();
    inputs
        .AsParallel()
        .Select(mapDelegate)
        .ForAll(r => results.Add(r));
    return reduceDelegate(results);
}

This snippet of code contains all that is specific to the map/reduce process we used in the previous example. If you have played with PLINQ, this should look very simple to you. We create an object to store the temporary data between the map and reduce operations. Then, we run the map delegate over all inputs in parallel and put the results in the bag as they arrive. Once the map operation is done, we reduce the values using the reduce delegate and return the result. This extension method lets the user focus on the business logic and keeps the map/reduce implementation hidden.

var average = somePngFiles.MapReduce(
    f => Compress(f),
    rs => rs.Select(r => r.CompressionRatio).Average());

While this implementation supports parallelism, it lacks support for asynchronous programming. You would first need to improve it to support Tasks like in the original case, and to call the reduce delegate not once all the data has been processed, but as soon as the data arrives. The solution here is to use Rx instead of PLINQ. I will leave it as an exercise to the reader to figure out how this would work.

Keep in mind that this only implements the MapReduce pattern, not a full, distributed map/reduce framework. Still, you could easily use the map and reduce delegates to call into a remote API to provide the data and do some processing.


By trying to implement the MapReduce pattern in C#, it quickly becomes evident that it is already deeply embedded in .NET. We can already get pretty close with just a Select and an Aggregate operation, and it does not take much more work to support parallel and asynchronous programming.
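As a quick illustration of that claim, the “banana” count from earlier can be written with nothing but LINQ, where Select plays the map role and Aggregate plays the reduce role:

```csharp
var allLocalData = new [] { "apple", "banana", "pear", "banana" };

var count = allLocalData
    .Select(s => s.Equals("banana") ? 1 : 0)   // map
    .Aggregate(0, (total, x) => total + x);    // reduce: count == 2
```

Swap Select for AsParallel().Select and you are already most of the way to the parallel version above.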

That’s it! Hopefully this will help you understand what is going on deep inside those more complex MapReduce frameworks available out there.

Have fun and read you later!

The idea behind micro services

Micro services are so much more than just a way to build a large application. They are not about having a whole bunch of independent tiny web servers talking to each other. They are a design pattern you can use to structure your code, any code. At heart, micro services are about small independent units working together toward a larger goal. Micro services are probably the absolute best hammer for just about every single job. Here’s why.

Single responsibility

The first thing micro services bring to the table is a strong concept of integrity, that is, strong cohesion and low coupling. The very concept of having a small object taking care of some very specific need goes hand in hand with the problem of cohesion and coupling. To understand this, we must first look at what the service pattern is. In OOP, a service is a piece of code, typically a class, that handles processes on a related model. If you look at it from an MVVM point of view, services are what your view models should call to acquire their data, apply your business processes and push the data back to some permanent storage.

Here is a quick example of an application using the service pattern to display a list of employees whose contact information has not been filled in:

class Employee { public string Name; public string[] ContactInfo; public string[] EmergencyContactInfo; }
interface IUserAccessToken {}
interface IRepository { IEnumerable<Employee> GetEmployees(IUserAccessToken token); }

interface IEmployeeService
{
    IEnumerable<Employee> GetEmployeesWithInvalidContactInfo();
}

class EmployeeListViewModel
{
    private IEmployeeService _employeeService;

    public EmployeeListViewModel(IEmployeeService employeeService)
    {
        _employeeService = employeeService;
    }

    public Employee[] InvalidEmployees { get; private set; }

    public ICommand RefreshEmployees()
    {
        return SimpleCommand.OnExecute(() =>
            InvalidEmployees = GetEmployeesWithInvalidContactInfo());
    }

    private Employee[] GetEmployeesWithInvalidContactInfo()
    {
        return _employeeService
            .GetEmployeesWithInvalidContactInfo()
            .OrderBy(e => e.Name)
            .ToArray();
    }
}

class EmployeeService : IEmployeeService
{
    private IUserAccessToken _token;
    private IRepository _employeeSource;

    public EmployeeService(IUserAccessToken token, IRepository employeeSource)
    {
        _token = token;
        _employeeSource = employeeSource;
    }

    public IEnumerable<Employee> GetEmployeesWithInvalidContactInfo()
    {
        // Invalid means at least one of the two contact lists is empty.
        return _employeeSource
            .GetEmployees(_token)
            .Where(e => !e.ContactInfo.Any() || !e.EmergencyContactInfo.Any());
    }
}

Here, we are using an IEmployeeService to fill a list of invalid employees in the view model when a refresh is triggered. The goal of this service is to hide the complexities of the underlying query, leaving only display logic to be taken care of in the view model. The service takes care of forwarding the security token to the remote repository and gives a meaning to “Invalid Contact Info” by saying that an employee must have both a contact and an emergency contact to be valid.

In those services, the more you focus on a specific task, the more you have to remove the distractions around it. You could put everything into a single class and call it a service. This kind of thinking works very well for extremely small projects because those applications usually end up with one to three simple concerns. The moment you want to deal with real applications though, you have to start breaking things apart. In a JavaScript web application, you might end up with an AJAX service that takes care of sending requests to your server. You might also have a persistence service that lets you save data to various destinations like the browser’s local storage or the AJAX service itself. If you are working on an IDE, you might want to provide a code auto-complete feature to your users; this can be crammed into a service too. In the case of our employee management example, there might be a lot more services, each taking care of a single responsibility like the one from IEmployeeService.

The more you think in terms of services, the more you end up with reusable blocks of independent features. This is because you can clearly see what belongs in a service and what doesn’t. The code that doesn’t ends up being a distraction from the main service’s task and just feels out of place. It might be a method that deals with significantly different data than the others, or a block of code that doesn’t look like it was written in the same style as the others. It might sound counter-intuitive, but the latter is usually easier to spot than the former. Imperative code stands out in a sea of declarative or fluent calls, highly nested loops or conditions are clearly, visually not aligned with more linear algorithms and, finally, injecting a dependency for a single line of code is the epitome of those examples. On the other side, it is hard to tell if the Employee class has anything to do with the Person class, or if you should handle a Person’s emails yourself or hand that over to a different service. In our case, it is safe to say that a CreateEmployee method would make quite a lot of sense in our IEmployeeService, but a CreateManager method would probably deserve its own service as it deals with a different entity and repository.

In other words, well designed services tend toward having a single responsibility, and micro services are exactly that: a single service for a single task.

Extensibility

Another reason why you might be interested in using micro services is that their nature makes them highly extensible. They tends to have a fair amount of injectable dependencies that you can swap to create new behaviors. They also expose almost everything they can do through an interface making it easy to override part of the behaviors. Finally, in most languages, it is trivial to implement the most complicated behavior the application might need in a service and expose this behavior through overloads that offers simpler variations as needed.

Let’s say we update our example to support paging and filtering. The interface will probably be updated to look like this:

interface IEmployeeService
{
    IEnumerable<Employee> GetEmployeesWithInvalidContactInfo(Func<Employee, bool> filter, int count, int page);
}

But you might also want to be able to take some shortcuts when some values can be inferred like in our existing view model:

// The view model keeps its original, parameterless call:
var invalidEmployees = _employeeService.GetEmployeesWithInvalidContactInfo();
This can easily be accomplished using a simple extension method.

public static IEnumerable<Employee> GetEmployeesWithInvalidContactInfo(this IEmployeeService service)
{
    return service.GetEmployeesWithInvalidContactInfo(e => true, int.MaxValue, 0);
}

While those are just simple overloads, you could go a lot further with extension methods. The important thing to remember here is that the service itself does not have to be modified to support those overloads. Everything can be done from the outside, in separate classes. Smaller services make for easier code organization when dealing with those extensions.

Code Reuse

The second you start thinking in terms of services, you have to deal with dependencies. Since services are highly cohesive, they strongly depend on their users to provide them with their dependencies. This means that services end up being shared between each other in your code quite often. Taking a look at our previous example, many view models will probably use an IEmployeeService and many services will use an IUserAccessToken. Their code ends up being reused dozens of times throughout your application.

Services are often highly decoupled from your business domain, which makes them great candidates for external code reuse in other projects. As a matter of fact, AngularJs, a popular JavaScript framework, already provides a ton of services that take care of DOM manipulation and AJAX queries for you. This shows how reusable a service can become, and it greatly helps reduce the amount of repetition in your codebase.


Sometimes, some services might share a common codebase and you might be tempted to leave everything in a single class to simplify the data flow. This is questionable, but in such a case, nothing forces you to expose a single interface for both behaviors. You can easily implement two interfaces in a single service. This enables you to publish your service to other developers while keeping the ability to split it in two later on, when the need arises.

For instance, we could have an IReadEmployeeService, an ICreateEmployeeService and an IUpdateEmployeeService. We can always add an IDeleteEmployeeService later if we need to. In the end, all of that code can live in the same EmployeeService class we already have. The important thing is, we can always change our mind later on.
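Here is a minimal sketch of that idea. Only the interface names come from the text; their members are my own illustration:

```csharp
interface IReadEmployeeService   { IEnumerable<Employee> GetEmployees(); }
interface ICreateEmployeeService { void CreateEmployee(Employee employee); }
interface IUpdateEmployeeService { void UpdateEmployee(Employee employee); }

// One class publishes all three narrow contracts. Each consumer only
// depends on the interface it needs, so the class can be split into
// three separate services later without breaking anyone.
class EmployeeService : IReadEmployeeService, ICreateEmployeeService, IUpdateEmployeeService
{
    public IEnumerable<Employee> GetEmployees() { /* query the repository */ return new Employee[0]; }
    public void CreateEmployee(Employee employee) { /* insert into the repository */ }
    public void UpdateEmployee(Employee employee) { /* update the repository */ }
}
```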

Dependency Injection

As I already mentioned, a side effect of dealing with highly decoupled services is that you end up having to deal with dependencies a lot. Because of this, micro services are a clear use case for constructor dependency injection and inversion of control. Each service should always expose an interface. Other services can then depend on it through constructor injection without imposing a specific implementation. You can then truly let your dependency injection framework shine and take care of resolving those dependencies for you.

In fact, you can even inject new services at run-time through a plugin architecture, which lets you extend your application as it is running. Having all of those reusable blocks of code helps expose points of interaction to plugin developers and simplifies their job. Everyone wins!

Conclusion… or is it?

As you can see, there are a lot of advantages to micro services, the main one being that they not only respect but actually impose SOLID principles onto your code. It is easier to write micro services that end up respecting those principles than to force your way against them. This makes the micro services pattern a really good design candidate for just about any code base. But there are even more advantages…

Aside from the SOLID principles, micro services also come with other goodies that you might like.

Statelessness

When dealing with very small, reusable pieces of code, it gets harder to track and store state. There are so many places where you could store data that you are better off without state at all. This encourages the use of stateless patterns like pure methods, which not only improves code readability but also reduces the chances of bugs. Good micro services design usually leans heavily on immutability to reduce side effects, as those would get amplified very quickly as a service gets reused.

For instance, instead of adding a SetCurrentPage or SetMaxResultCount method on the service, we prefer to pass that state through method parameters. Having to deal with such state would be really troublesome if we started to have multiple instances of the EmployeeService class. In this case, immutability is just the simpler path.
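Here is a sketch of the stateless version, with an illustrative read method: the paging state travels through the parameters, so the service itself keeps nothing and stays safe to share between callers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var service = new EmployeeService();

// The caller owns the paging state; the service stores none of it.
var page0 = service.ReadNames(page: 0, maxResults: 2);
var page1 = service.ReadNames(page: 1, maxResults: 2);

Console.WriteLine(string.Join(",", page0)); // prints "Ada,Bob"
Console.WriteLine(string.Join(",", page1)); // prints "Carol,Dan"

public class EmployeeService
{
    private readonly List<string> _names = new() { "Ada", "Bob", "Carol", "Dan", "Eve" };

    // A pure method: the same inputs always produce the same output,
    // so concurrent callers cannot corrupt each other's paging.
    public IReadOnlyList<string> ReadNames(int page, int maxResults)
        => _names.Skip(page * maxResults).Take(maxResults).ToList();
}
```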

Code Ownership

Usually, in a serious development environment, when you write a piece of code, you end up owning it. People can easily track who wrote a block of code through source control systems or even knowledge sharing meetings. Your coworkers will end up depending on your knowledge of the code when using your services or figuring out issues. As the original author, you will be the first resource everyone goes to when they have questions.

This can be really painful when you author a large piece of code, because by the time you come back to it a few months later, you will have a much more complex puzzle to wrap your mind around before answering anyone's questions. With micro services, there are also fewer chances someone else will change your code in a way that makes it unfamiliar to you. If a large refactor happens, they will have to take ownership of the new services, and your name will usually end up erased from history as new classes get created. Normally though, you will rarely see someone else change a service you own. Instead, most people will be creating new files all the time, which makes it easier when you come back to your code a few months later.

The truth…

You probably noticed that I never attached micro services to any specific communication technology. This is because they are in fact a pattern. They can be implemented inside a single application or across an entire cloud application, using multiple segregated applications that communicate with each other through message queues, REST or protocol buffer endpoints, or even simple piping. In fact, *nix is strongly based on the micro services pattern, as it is built as a collection of small tools like grep, test or chmod. The thing is, the term did not exist at the time, so no one advertised it as such. In my head though, Linux is a perfect example of a very successful micro service architecture.

Next time you design an application or refactor an existing one, consider this: micro services are a pattern that provides a SOLID foundation and helps reduce bugs through immutability and fix them through better code ownership. They truly are a recipe for success.

Read ya later!

Teaching Modern Development – The series – Part 6

Welcome back again to this series about teaching modern development. This session will be very different from the more technical ones and might remind you of the first few, when we covered some of Visual Studio's features. Today, we are taking a look at something that seems trivial but truly isn't: storing code files.

Storing code, the bad way

Imagine that you are working on a small scale, academic project with yourself as the only developer. You will probably use many files to store all the information related to the project. These files could be code, configuration files, program assets, analysis documents and many other things. You will probably store everything in folders and keep it all organized to help you deal with clutter. This is a good start. You should always stay organized, even on small scale projects. It does not cost anything to create a folder, and it only adds readability afterward. If you do not do this already, then you should start thinking about it.

As your project progresses, a day will come when you need to undo some very clever changes you made at 3 am after a big party with your friends. While it looked perfect at the time, everything strangely seems to break the following day. Your first reflex will be to remove the new code you added, until you remember that to make some of those changes, you initially had to delete and modify other things around the project. It is easy to delete lines of code, but figuring out what changed and what was there beforehand, even if you wrote it yourself, can be a challenge. If you do not keep any timely backups, you will be stuck rewriting the whole thing from scratch. This might also introduce new bugs and will require the application to be retested entirely. In the end, you are bound to forget something down the line unless you can restore a previous copy of your work.

A few months later, as the assignment becomes more and more complex, your project gains speed and you end up working as a small team. Most of the time, everyone contributes from home, but you still meet a few times a week to catch up on other people's changes. Fortunately, you are a good architect and you know how to plan features ahead so that each member of the team can work on their own little piece of code without interfering with the others. Everyone can work in their own bubble and just has to sync up with everyone else once in a while. You exchange code via file sharing services like Dropbox and sometimes using a flash drive.

A few weeks in, one of your team members does not complete their tasks in time and you have to push back the meeting. When everyone finally gets together again, you notice that someone decided to improve a few parts of the application to pass the time. They did not tell you about it before now, but their changes seem very useful. The only problem is, they changed a file you were working on and ended up using an older, incompatible version as a base for their changes. You can either keep their changes and redo your bug fix, or keep the bug fix and redo their changes.

Near the end of the project, your teacher asks you for some release notes covering the changes between your final version and a prototype you released 3 months ago, back when you worked alone. No one remembers all of the fixes that have been made since then. No one even remembers all of the changes that happened since yesterday. In the end, you just build something that looks about right based on some old backups. Since you learned so much during the session, it takes a few hours just to separate feature changes from pure code changes in your project.

When the time comes to hand over your work to your teacher, you make a few last changes and email your code to your coworkers so they can integrate their own final changes and submit everything to the school's shared drive. You pat yourself on the back and finally go to sleep at 2 in the morning. The next day is total panic in your mailbox. When you wake up, you notice that no one received the last version of the code. This caused your team to miss the deadline. Everyone is writing in large unfriendly capital letters, complaining that no one had a way to phone you or get to your house in the middle of the night. It turns out that you simply forgot to attach the large zip file you created before sending the email.

All of this sounds pretty painful and trust me, it is. I have worked in a company where this was our day to day routine. The software was over 20 years old and dozens of people had worked on it. Most of them had moved on to other things since then, so trying to understand what they did was close to impossible. The five of us worked off a network share with daily off-site backups. It took 3 days to revert a change someone made 6 weeks beforehand. It was causing a major crash that no one noticed before we entered the final testing stage. We were lucky, as we erased backups every two months because we only had so many tapes to store them on. Had we caught the issue a few days later, it would have been impossible to restore the code and we would have had to rewrite it from memory.

Fortunately, there is a better way to do things…

Introducing source control

Source control systems are tools to manage your files in a way that helps you deal with all of those problems. They handle versioning, reverting, sharing, merging and a lot more depending on the tool you choose. I wish we could explore my two favorite source control systems in this post (git and TFS), but that would take so much time that even the longest post here would feel small. Instead, I will keep it to git and focus on the idea behind it so that moving to other systems like TFS feels natural.

Most SCS work the same way. You have some kind of server which provides you with the source control service and a client which queries the service for file content and metadata. Both can run on the same computer, so you might use it only for versioning, or the server can run on a public-facing machine and provide you with powerful code sharing and backup capabilities. Some SCS can also replicate and synchronize data between various servers, typically a remote and a local one, so that you can work offline.

Git is probably the most widely used SCS today and probably the most powerful. GitHub, a very popular, public-facing Git server, is probably the reason behind its popularity. While it is very easy to set up your own git server, GitHub makes it absolutely painless. It is free for open source projects and very cheap for private ones. Nowadays, most open source projects use GitHub to host their source code.

The place where git stores source code is called a repository. This is the equivalent of a shared Dropbox folder on steroids. There are quite a few operations available on a repository: Clone, Checkout, Commit, Push and Pull are the basic ones you need to understand to get started. Creating a repository on GitHub is just a question of finding a name and clicking “create”.


With git, unless you created a local-only repository, you first need to clone a remote repository to start working. Cloning a repository will essentially download the entire server’s content to a folder on your machine and turn that folder into a repository. It will also create a link between the local repository and the remote one so that you can synchronize them together later.


The next step is usually taken care of automatically by git. It will checkout all of the files from your local repository. Checking out a file has the effect of creating or updating a file on your computer to match the content that git has stored in its database. Effectively, this will make all files from the repository you just cloned available for editing in the repository’s folder.


At this point, you can work on your project as if git did not exist. You make changes to your code, add lines, remove lines, add files, move and rename others, etc. Once you are done with your task, or feel like you have accomplished a change that makes sense as a whole within your project, you can commit your changes to the repository. For instance, you might have fixed a bug, added some documentation or added a new feature. All of those are worthy of a commit, even if you only changed a single line to accomplish them. Committing your changes will ask git to find all the files that changed in your working copy, find all the changes within those files, package them together and store them in the repository. You should always attach a message to your commits. This will provide you with a nice list of changes, all labeled and dated, that you can browse if you ever need to.


The next step is to push your changes to the remote repository and to pull changes other people made on it since you cloned it. When you are confident that your changes are stable enough, you can ask git to push all commits from your local repository back to the remote repository. This is usually a one click process, but sometimes, other people might have already pushed some changes to the remote repository before you. You must then pull those changes back to your local repository, make sure everything still works with your changes, and then push your changes back to the remote server. In case two changes conflict, most SCS will help you merge them together so that both changes stay in the code. Your changes are now available for anyone to use!

This is the basic idea behind source control. All SCS use a similar pattern. Some of them skip the push/pull synchronization part as there is no local repository; you always deal with a remote repository, so your commits go straight to the remote server. At this point, an SCS already fixes all the issues we had in our original story. It provides a history of changes from which you can extract release notes and roll back your changes. It enables quick and easy synchronization and sharing between all the developers, and you can still work while disconnected if you really need to. It even helps you deal with conflicts by merging your changes with other people's changes.
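The whole cycle can be sketched with a handful of commands. To keep the example self-contained, a local folder stands in for the remote server; with GitHub you would clone an https URL instead.

```shell
# 1. Clone: copy the remote repository (checkout happens automatically).
remote=$(mktemp -d)/demo.git
git init --bare "$remote"            # stands in for the server
git clone "$remote" demo
cd demo
git config user.name "Demo"          # identity recorded in your commits
git config user.email "demo@example.com"

# 2. Work as if git did not exist.
echo "first draft" > notes.txt

# 3. Commit: package the changes locally, always with a message.
git add notes.txt
git commit -m "Add a first draft of the notes"

# 4. Push: publish your commits (pull first if others pushed before you).
git push origin HEAD
```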

Advanced source control

In real-world projects, you will often need some extra features around the base we just covered. You probably want some project management abilities or better control over the dependencies your code has on third party projects. Git can also help you deal with these issues.


With source control comes the concept of branches. A branch is a specific lifeline or bubble for your source code. Each operation is done on a specific branch. By default, this will be the master branch and you might never know branches exist. At any point in time, you can decide to create a new branch which will track commits independently from the master branch. This will create a tree of branches in your repository. You can then merge them back together later if you want, by applying commits from the branch back to its parent branch. You can see branches as their own little repositories, all synchronized together through pushes, pulls and merges.

There are many uses for branches. For instance, you and your team might be working on a specific feature of a bigger project. You do not want to get blocked by some other team's changes which could break the entire application, and you do not want to impose the same problem on the other teams either. It is very easy to push some bad code and prevent everyone from working when everyone shares the same repository.

This is a case for what we call a feature branch. You create a branch from a stable commit in the repository, usually the latest one, and name it after the feature you are working on. You will commit all of your changes on this branch. You can be sure that even if someone else breaks the master branch, it will not affect your team, and vice versa. Once in a while, you should check if the parent branch is still stable and bring its changes back into your branch through a merge. That way, you can stay up to date with other people's changes and react to them if needed. When the feature is done, you merge your branch back into its parent. All other teams can now profit from your changes and no one gets hurt in the process.
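In git terms, the feature-branch flow looks like this (again with a throwaway local repository so the commands can run anywhere):

```shell
# Throwaway repository with one stable commit to branch from.
cd "$(mktemp -d)" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Stable starting point"

git checkout -q -b feature/search   # branch off the stable commit
echo "search code" > search.txt     # work happens only on this branch
git add search.txt
git commit -q -m "Add the search feature"

git checkout -q -                   # back to the parent branch
git merge -q feature/search         # the feature lands; nobody was blocked
```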

A cool thing you can also do with feature branches is use them like a todo list. Figure out what needs to be done in your project and build a hierarchy of tasks from there. Save those tasks in git as branches to keep track of what needs to be done. You can see the progress of each individual task by looking at the commits on its branch, and you can mark tasks as done by merging the changes back into their parent branch.

Similar to branches are tags, which are simple labels that you associate with a specific commit. Since a commit's unique id is generally unreadable, tags are very useful for identifying shipped code with a public version number, so that you know the exact commit the user is running on their computer while keeping the semantic version numbers we are used to seeing around.

You can also quickly branch from a tag to fix a bug, publish an update to your users, merge back the changes into the master branch and make sure the next version is not affected.
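Sketched as commands, tagging a release and branching from it for a hotfix looks like this (hypothetical version numbers, throwaway repository):

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "Release build"

git tag v1.2.0                            # readable label on the shipped commit
git checkout -q -b hotfix/v1.2.1 v1.2.0   # branch from exactly what users run
git commit -q --allow-empty -m "Fix the crash"
git tag v1.2.1                            # ship the update

git checkout -q -                         # back to the main branch
git merge -q hotfix/v1.2.1                # the next version gets the fix too
```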


In a completely different situation, you will need forks and pull requests. When managing large projects, you might want to filter which commits get merged back from your developers. For this, you can require them to fork the remote repository before cloning it.

A fork is basically a 1:1 copy of a repository at a given point in time, just like a branch. The difference is that the developer who created the fork becomes its owner and gets full access to it. From there, they can push and pull new commits, adding their own features to the project. If they think a feature is good enough that everyone upstream should be able to use it, they can create a pull request. This tells the original repository that a bunch of commits pushed on a fork are ready to be pulled into the repository. The author or maintainer of the original repository can then inspect the code, comment on it, request changes or accept the request as-is and merge the code into their own repository.

This is very similar to branching, but it creates a wall between the contributors and the managers. Forking and pull requests are a common thing in the open source world where everyone can do pretty much anything they want. By limiting the amount of contributors to the main repository to a handful of people, you can keep a better control over the quality of the project and keep an eye on what gets included.


Sometimes, you do not want to participate in other repositories at all. You just want to depend on someone else's code and make sure to keep it up to date. In this case, you can clone a repository as a submodule inside your own repository, basically turning git into a package manager. As with any clone, you can then choose which branch and commit to use, and update to a newer commit when you find it more stable or useful. If the remote repository contains breaking changes, you can quickly see the code that breaks by looking at its commit history.

This ability to create submodules makes it really easy to build a modular application by keeping the various modules in their own repositories, further helping decentralization.


And finally, there are times when you feel so anti-social that you want to prevent some of your own changes from making it to the remote repository. To do this, you can ignore changes made to specific files or folders so that they do not appear in your commits. This is very useful if you have a configuration file with personal access keys.
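For example, listing a hypothetical secrets.config file in a .gitignore file keeps it out of every commit:

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Demo" && git config user.email "demo@example.com"

echo "secrets.config" > .gitignore     # one ignored pattern per line
echo "api-key=12345" > secrets.config  # personal data; stays on this machine
git add .gitignore
git commit -q -m "Ignore local secret files"

git status --porcelain                 # prints nothing: git no longer sees the file
```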

In case the changes are already committed to your local repository, you can either force an overwrite of your local repository from a specific remote version to undo them or, if you have committed but not yet pushed changes, even rewrite the history to make it look like they never happened using the rebase command.

The good way

There are a lot more things you can do with a good source control setup, which makes an SCS a key tool in proper development teams. Learn to use it early, as it will pay off really fast. On this note, I will let you explore the tool yourself a little bit. You can come back for the assignment related to this post, which will be published some time in the future and will get you to build a small repository on your computer using the various features we have just discussed.

Read you later!

Teaching Modern Development – The series – Part 5

After going through a massive tour of variables, it is now time to push things a little further. Welcome back everyone for this new session in the Teaching Modern Development series. Today, we're going to take a look at an extension of variables introduced by the classic Object Oriented Paradigm. Buckle your seat-belt and get ready for an in-depth look at the heart of OOP.

Working with something

In our previous session, we got a pretty good run at variables. I introduced them as “working on something”. They are the data your app is manipulating, and the concept is key to going from a simple repeatable task all the way to AI and complex user interactions. Sometimes though, working with something is a lot more intuitive and logical than working on something.

You are already aware that in OOP, objects own both actions and properties. Things they do and things they are. We have already looked at what actions are, but we have only brushed the surface of properties during the first few sessions. Properties are the state of an object. They are variables associated with a specific instance of an object. If a Person is named Philip, that means its Name property is storing the string “Philip”. Other instances of the same Person class will have different values in their properties. Properties describe objects and make them what they are.

This is a very important tool that helps keep large code-bases manageable. If you have to store a large number of Person instances in your system, you do not want to have disparate lists of names, birth-dates and whatnot. It makes a lot more sense to have a single list of Person and have each instance handle its own content. This concept of keeping related data close together greatly helps to prevent errors. In this case, a first-name and a last-name are definitely closer than all first-names in the system. If this still feels unclear to you, think about it this way: there is a very large number of cases where you might want to display the first-name together with the last-name, but very few where displaying all first names in the system might be useful.

The Person class might even have a method called Store or Save, or some other class might know how to save a Person. It then becomes trivial to save all of those instances. You can simply loop over all the items in an array of Person and save each instance of the class one by one.
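A quick sketch of that loop, with an illustrative in-memory Database standing in for a real one:

```csharp
using System;
using System.Collections.Generic;

var people = new List<Person>
{
    new Person { Name = "Fred" },
    new Person { Name = "Bob" }
};

var database = new Database();

// The whole object travels as one parameter; adding a property to
// Person later does not change this loop or the Store signature.
foreach (var person in people)
    database.Store(person);

Console.WriteLine(database.SavedCount); // prints 2

public class Person
{
    public string Name { get; set; }
}

public class Database
{
    public int SavedCount { get; private set; }

    public void Store(Person person) => SavedCount++;
}
```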

// passing in the object...
aDatabase.Store(aPerson);
// vs passing in each variable one by one
aDatabase.Store(aName, aBirthdate, ...);

This example isn’t even that bad. What if someone decides to store the person’s name in one database and the person’s birth-date in another? There is nothing to prevent this, as the data is already dismantled. The lists might get out of sync, and you might end up billing the wrong credit card, or the program might crash because you expect some data to exist when it does not. Having everything under a single banner like this helps bring a sense of cohesion to the code. You change the date of birth of a person, you change the credit card number of a person, you display the name of a person.

In this example, the Person instance might have come from very far away. There might be dozens of methods passing this Person around before it gets saved. Having a parameter for each of its properties, one by one, on every call will quickly become painful and dangerous. There are chances you might swap a few properties around, making the first name a last name, or even forget to pass in some data. Moreover, if you ever need to add a property to this Person concept, you will need to update every one of those methods to take the new data into account. With those properties tied to an instance of a class, you can simply pass the whole object as a parameter. This makes it a single parameter all around your code, and if a property is added or removed, you do not have to change the whole program.

This helps reduce coupling in your code; friction points between the various concepts in your application. Instead of having a dozen friction points that have to change if a single concept like a person needs an extra property, you only have one to deal with: the Person class itself. It is only when you need a specific value, or when there is no more meaning in keeping everything together, that you use the dot syntax to dig into an instance and get what you need.

Ultimately, you will still have to deal with specific properties one by one anyway. You do not always want to display all the data related to one person. Maybe you just want to show a nice name tag to identify the current user or simply say “happy birthday” and keep their credit card number out of the way. This is fine. The point of this is to keep the friction between blocks of code to a minimum. When the class itself gets in the way, it is probably time to extract the data from it and deal with it directly.

This works just like when using a method:

Person aPerson;
Person anOtherPerson;

aPerson.Name = "Fred";
aPerson.Birthdate = DateTime.FromString("12/21/1988");
aPerson.Email = "";

anOtherPerson.Name = "Bob";
anOtherPerson.Birthdate = DateTime.FromString("7/14/2001");
anOtherPerson.Email = "";

anEmail
    .To(aPerson)
    .CC(anOtherPerson)
    .Send();


In this example, you can see that we use the dot syntax to interact directly with the properties inside a Person instance. It reads as follows: assign “Fred” to aPerson’s Name, create a new DateTime from the string “12/21/1988” and assign it to aPerson’s Birthdate, and so on. In the end, send anEmail To aPerson and CC anOtherPerson. We try to keep the coupling to a minimum by only accessing the properties we need. The email is sent to a Person because it might use all of this data to build a customized message. Sure, we could have passed it one value at a time, but this is simply so much more readable and cohesive that it just makes sense.

anEmail
    .To("Bob", "7/14/2001", "");

When we pass in each value individually, we can hardly make sense of what the To method is doing. Why is it sending to a date? This is a good example of where using the Person class as a whole keeps the program meaningful.

A new keyword

But again, this example is lacking one keyword for everything to work. We have seen in the previous session that classes are not value types, they are reference types, and because of this we need to give the two Person instances some space in memory to store their data, or else they will have an undefined value. To create a new instance of a class, C#, like most OO languages, uses the keyword new.

New is a special operator that acts on a class directly instead of an instance. To put it simply, it is a static method defined for all objects. It is similar to the DateTime.FromString() method used in the previous example. As its name conveys, it creates a new instance of a class from a set of parameters.

var aPerson = new Person();
var anOtherPerson = new Person();

It works by looking at the type to its right, figuring out how much memory that type needs to store its data, allocating that space in memory, and finally returning a reference to this memory block. Figuring out the size of Person is easy. The compiler simply looks at the types of all the fields this class uses. If they are base types, it already knows their size (ints are 4 bytes, doubles are 8, etc.). If they are complex types like another class, it only stores the reference, which also has a constant size. In the case of our previous example, here is the size of Person.

  • Name : String – Ref type, 8 bytes on x86_64
  • Birthdate : DateTime – Value type, 8 bytes, internally a 64-bit tick count.
  • Email : String – Ref type, 8 bytes on x86_64
  • Total of 24 bytes. More will be used by the instances of string to store their own data.

As you can see, since we now know the type that will be produced from the return type of the new operator, we can use the var keyword again to define our variable.

Fields and Properties

So far, you have seen me use those two words quite a lot. It is important to understand that they are not the same thing. Some languages do not even have a concept of one or the other. The difference between and use of both is quite subtle, and it is wonderful to get to use both in C#.

A field is the direct equivalent of a variable linked to an instance of a class. Fields are declared exactly like variables, but within the scope of a class. Because of this, we need to specify their accessibility qualifier. This makes public fields very dangerous, because they leave the data wide open to anyone who can access the class. It could be accessed from billions of places in the application, and this can cause some problems down the road. This is why you should try to keep all fields private.

Properties are a simple abstraction over fields. They add a form of safety to fields at the expense of a slight performance decrease. Yet, as a golden rule, you should always use properties unless you hit some very specific cases in which fields make more sense, or performance is critical and the compiler cannot be trusted. The latter has never happened to me. Essentially, properties act as a central point of interception to access a field's value. They are designed in a way that makes it impossible or very awkward to use the value directly. This ensures that all changes and accesses are done through this centralized piece of code. The reason for this centralization is the same as why we use classes instead of values directly: it reduces coupling and makes your whole life easier.

Fields being more dangerous than properties, it is good practice to keep their value constant. That is, once they are set, they should keep the same value as long as the instance in which they reside exists. To enforce this rule, you can use the readonly keyword when declaring a field. The only chance you will have to set their value is when the instance is first created. Also, the only time you should omit this keyword is when the field is modified by a property. This practice further emphasizes the importance of keeping fields hidden from the external world.

class Person
{
    private readonly string _name;
    private readonly DateTime _birthdate;
    private readonly string _email;
}

The naming convention for fields in C# is to prefix them with an underscore. This is one of the rare cases where a name prefix actually makes sense in a modern language. Let me explain the reasoning behind this exceptional measure.

Fields have a large enough scope to be used in potentially hundreds of lines of code all around an application. This means that their value can stay valid for a very long time within the life of an application, potentially even longer than the application's life itself. Accessing or modifying a field in the wrong place can not only be extremely hard to debug, but also very dangerous. It is as dangerous as using a program to randomly overwrite data in your computer's memory. It is as dangerous as letting anyone walk around your house and hoping that they will not steal something.

For this reason, fields should always stay private. Seeing aPerson._name should be instantly suspicious. In the same way, they should always be marked as read-only, so seeing the _name = construct or any other assignment technique should also look suspicious.

Seeing underscores everywhere breaks the reading flow and makes some constructs look really ugly. The dot-underscore pattern and lines starting with an underscore are very rare in C# and look incredibly awkward in your code. They make everything unaligned and filled with holes, so that even a quick glance will suffice to say that something is terribly wrong in a piece of code.

Basically, this naming convention exists to make your eyes bleed.

Please use it for your own safety. But which safety? How are you supposed to keep a program’s state with variables that are well… not variables?

The biggest problem with fields, as I mentioned, is that you cannot track when their value changes. Properties centralize access and changes through the use of two special methods: a getter method and a setter method. In fact, properties are simple syntactic sugar around a field and these two methods, making them painless to declare.

These two methods are exactly what makes properties the most powerful tool in an OOP language. Instead of changing the field's value from everywhere, access to the field is always done through a single method. This makes it easy to add validation to prevent an invalid change, track when the value changes or even create a value on the fly. Plus, you get to use the value as if it was a field, which keeps a natural feel to the code.

class Person
{
    // An automatic property
    public string Name { get; set; }

    // A property with its back-end field
    private DateTime _birthdate;

    public DateTime Birthdate
    {
        get { return _birthdate; }
        set { _birthdate = value; }
    }

    // A property that validates its value before changing the back-end field
    private string _email;

    public string Email
    {
        get { return _email; }
        set
        {
            if (IsEmail(value))
                _email = value;
        }
    }

    // What really happens when you declare a property
    private string _phoneNumber;

    public string get_PhoneNumber()
    {
        return _phoneNumber;
    }

    public void set_PhoneNumber(string value)
    {
        _phoneNumber = value;
    }

    // A read-only property that uses other properties to build its own value
    public bool IsMarketable
    {
        get
        {
            // Older than 18 years
            return Birthdate.AddYears(18) <= DateTime.Now &&
                // Have an email or phone number
                (!string.IsNullOrEmpty(Email) || !string.IsNullOrEmpty(get_PhoneNumber()));
        }
    }
}

The first property uses the simplest form. It lets you quickly declare a property which simply wraps around a hidden field of the same type. This is how most of your properties will be declared.

The second property is a more explicit version of the first one. They are perfectly equivalent, which makes this version pretty much useless: the simplest form makes the code more compact and does not convey any less information. Still, in this case you can clearly see what happens when the property is read from or written to. The get block is called when trying to access the property and returns the value of the _birthdate field. The set block is called when trying to assign a new value to the property, and it updates the same field using the implicit value variable provided through the assignment. The value keyword acts as a method parameter for the set block and contains the assigned value. Even though this is a keyword, you can still declare new variables using this name elsewhere in your code, as long as they are not in a property's set block. Only use this variation when you need to trace when the values are being accessed or changed, and revert it to the automatic form when you are done with your tests.
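For instance, if you suspect that a value is being changed from somewhere unexpected, you can temporarily expand the automatic property into the long form and log every change. This is only one way to trace it; a breakpoint in the set block works just as well:

```csharp
using System;

class Person
{
    private string _name;

    public string Name
    {
        get { return _name; }
        set
        {
            // Temporary tracing: log every change, then revert to
            // the automatic form once the investigation is over.
            Console.WriteLine("Name changing from '{0}' to '{1}'", _name, value);
            _name = value;
        }
    }
}
```

Because the rest of the code still reads and writes `Name` as before, nothing else needs to change while you investigate.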

The next sample shows why the long form of property declarations is useful. After all, the set and get blocks are just that, code blocks, just like methods, so they can do anything you want them to do. In this case, we validate the value to make sure it contains a valid email address before storing it. This prevents invalid data from getting into the system. As you can see, you do not even have to update the backing field when the setter is called; you can leave it as is and move on.
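The IsEmail helper is not shown in the sample above; a minimal, deliberately naive version could look like the sketch below. Real email validation is far more involved, so treat this purely as a placeholder:

```csharp
class Person
{
    private string _email;

    public string Email
    {
        get { return _email; }
        set
        {
            // Invalid values are silently ignored;
            // the backing field keeps its old content.
            if (IsEmail(value))
                _email = value;
        }
    }

    // A hypothetical, very rough check: exactly one '@' with text on both sides.
    private static bool IsEmail(string value)
    {
        if (string.IsNullOrEmpty(value))
            return false;

        int at = value.IndexOf('@');
        return at > 0 &&
               at < value.Length - 1 &&
               value.IndexOf('@', at + 1) < 0;
    }
}
```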

The fourth sample shows exactly what happens behind the scenes when you declare a property. In fact, the intermediate language code produced by this sample will be exactly the same as in the first and second samples. Aside from a few names, they produce the exact same structure. This is because properties are just syntactic sugar, that is, they are translated and rewritten into this equivalent pattern automatically by the compiler before reaching IL. The compiler knows which of the two methods to call based on their name prefixes.
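The same rewriting happens on the caller's side. In the sketch below, the two lines you write and the two calls the compiler emits are equivalent; the emitted calls are shown as comments because the special get_ and set_ names cannot be used directly from C# when a real property exists:

```csharp
using System;

class Person
{
    public string PhoneNumber { get; set; }
}

class Program
{
    static void Main()
    {
        var person = new Person();

        // What you write:
        person.PhoneNumber = "555-0199";
        string number = person.PhoneNumber;

        // What the compiler actually emits behind the scenes:
        // person.set_PhoneNumber("555-0199");
        // string number = person.get_PhoneNumber();

        Console.WriteLine(number);
    }
}
```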

Finally, the last sample shows another, less visible use for properties. They do not even have to use a backing field. Their value can be calculated from the ground up, or even come from a constant if you want. This is what makes properties safe. They might look like fields, they might be used exactly like fields, but behind the scenes, they might not even be bound to real data. They completely hide their implementation details. As the implementer of the class, this lets you change the behavior you get when accessing or changing the value of a “field” without touching the code that uses the field. As a user of the class, this gives you a safe base to work on, knowing that if one day a behavior needs to be added, you will not have to replace all of your field accesses with method calls. It also makes code easier to read in some cases, because those calls are hidden, which makes complex expressions lighter through the omission of a large number of parentheses and prefixes. Here is an example taken from a real-life piece of code.

// Before using properties...
Set_abc(a.get_B() + a.getF() * b.C ? r.GetG() : e.P());
// And after...
Abc = a.B + a.F * b.C ? r.G : e.P;

Obviously, the real names have been redacted, but even though they are all one-letter words, you can already see the advantage of properties. All the methods used a different naming convention, and the important information gets lost in a sea of parentheses. The cleaned version is a lot more direct and skips the overhead the first sample imposes on your brain.


And we are done for today. The last session was broad enough to numb even an experienced developer, so I thought I would keep it short this time. Soon, we will complete our first glance at control flow and take a better look at arrays. In the meantime, you can always work on this session's assignment, which will let you play with properties a bit more once it is available. The next post will take a little break from the usual formula as we go back to our development tools and look at how we can store code and what the right way to do it is.

Read you later!
