16.03.2016

Don't Let Cleaning-Up Go Overboard

When a tweet pointed me to Radoslaw Sadowski's "Learn how to make a good code by bad example" I was excited to see how the author would go about showing how to clean-up brownfield code. Although his example looked simplistic, it turned out to be quite interesting. It's about some price discount calculation:

public class Class1
{
  public decimal Calculate(decimal ammount, int type, int years)
  {
    decimal result = 0;
    decimal disc = (years > 5) ? (decimal)5/100 : (decimal)years/100; 
    if (type == 1)
    {
      result = ammount;
    }
    else if (type == 2)
    {
      result = (ammount - (0.1m * ammount)) - disc * (ammount - (0.1m * ammount));
    }
    else if (type == 3)
    {
      result = (0.7m * ammount) - disc * (0.7m * ammount);
    }
    else if (type == 4)
    {
      result = (ammount - (0.5m * ammount)) - disc * (ammount - (0.5m * ammount));
    }
    return result;
  }
}

I liked how the author pointed out deficiencies and started removing them through simple refactorings.

This went well for the first half of his article – but then turned really bad. To my taste he went overboard following the "Thou shalt apply all design patterns thou knowest" commandment. Sad, very sad.

Just compare the number of lines of code:

Original brownfield solution: 25 lines of code (LOC)
Reasonably refactored solution: 55 LOC
Final solution: 118 LOC

By this the author says, that roughly 5 (!) times the size of the original dirty code is the pinnacle of cleanliness. Is he serious? And he even does this in the name of the DRY principle (see STEP VII in the article) and contrary to his insight "We should write as short and simple code as it is possible. Shorter code = less possible bugs, shorter time of understanding the business logic." (see STEP VIII).

What has gone wrong in the Clean Code camp?

A Different Approach

But I don't want to just bemoan this well meant attempt gone overboard. Let me try to improve upon it.

Test First

As much as I liked the first refactorings of the author, I disagree that's where you should start when cleaning up dirty code.

The first thing you always should do: Put the code to refactor under test. If there aren't any automated tests yet, set up at least a couple of integration tests. That's what I did when I took my stab at this refactoring problem. Here's the tests I came up with:

[TestCase(100.0, 1, 1, 100.0)]
[TestCase(100.0, 2, 1, 89.1)]
[TestCase(100.0, 3, 1, 69.3)]
[TestCase(100.0, 4, 1, 49.5)]
[TestCase(100.0, 0, 1, 0)]
[TestCase(100.0, 1, 6, 100.0)]
[TestCase(100.0, 2, 6, 85.5)]
[TestCase(100.0, 3, 6, 66.5)]
[TestCase(100.0, 4, 6, 47.5)]
[TestCase(100.0, 0, 6, 0)]
public void Test_discounting (decimal amount, int type, int years, decimal expected)
{
	var sut = new Class1 ();
	var result = sut.Calculate (amount, type, years);
	Assert.AreEqual (expected, result);
}

I looked at the different paths through the code and tried to find simple test cases for them. They reflect how the code currently works – whether that's ultimately correct or not.

After that all refactoring had to produce code which complied with these formalized requirements.

Interestingly the author's final version does not pass this test. It fails on type 1 accounts ("not registered") to which it incorrectly applies a loyality discount.

How come? The author did not realize the change in the semantics of the code during refactoring, when he moved application of the loyality discount to the end of the function.

He still gets it right in an intermediate version, though:

public class DiscountManager
{
  public decimal AddDiscount(decimal price, AccountStatuses accountStatus, int timeOfHavingAccountInYears)
  {
    decimal priceAfterDiscount = 0;
    decimal discountForLoyaltyInPercentage = (timeOfHavingAccountInYears > Constants.MAXIMUM_DISCOUNT_FOR_LOYALTY) ? (decimal)Constants.MAXIMUM_DISCOUNT_FOR_LOYALTY/100 : (decimal)timeOfHavingAccountInYears/100;
    switch (accountStatus)
    {
      case AccountStatuses.NotRegistered:
        priceAfterDiscount = price;
        break;
      case AccountStatuses.SimpleCustomer:
        priceAfterDiscount = (price - (Constants.DISCOUNT_FOR_SIMPLE_CUSTOMERS * price));
        priceAfterDiscount = priceAfterDiscount - (discountForLoyaltyInPercentage * priceAfterDiscount);
        break;
      case AccountStatuses.ValuableCustomer:
        priceAfterDiscount = (price - (Constants.DISCOUNT_FOR_VALUABLE_CUSTOMERS * price));
        priceAfterDiscount = priceAfterDiscount - (discountForLoyaltyInPercentage * priceAfterDiscount);
        break;
      case AccountStatuses.MostValuableCustomer:
        priceAfterDiscount = (price - (Constants.DISCOUNT_FOR_MOST_VALUABLE_CUSTOMERS * price));
        priceAfterDiscount = priceAfterDiscount - (discountForLoyaltyInPercentage * priceAfterDiscount);
        break;
      default:
        throw new NotImplementedException();
    }
    return priceAfterDiscount;
  }
}

The highlighted lines show: a loyality discount only gets applied for registered customers within the switch statement. But later the discount gets applied in all cases:

public decimal AddDiscount(decimal price, AccountStatuses accountStatus, int timeOfHavingAccountInYears)
{
  decimal priceAfterDiscount = 0;
  priceAfterDiscount = _factory.GetAccountDiscountCalculator(accountStatus).AddDiscount(price);
  priceAfterDiscount = _loyaltyDiscountCalculator.AddDiscount(priceAfterDiscount, timeOfHavingAccountInYears);
  return priceAfterDiscount;
}

Design Before Refactor

A test could have caught this error. But how to avoid it in the first place during clean-up/refactoring? I suggest you do an explicit design first.

When I say "design" you don't need to fear a full blown mainstream OO design with UML diagrams. No, it's much, much simpler. The author unwittingly almost did it, when he wrote:

Yes, an ordered list like this can be a poor man's design to start with.

Design is about responsibilities which get translated into modules (i.e. functions and classes among others). And it's about how those responsibilties should be put together to form the overall responsibility. The requirements state a certain behavior which then is produced through collaboration of many parts.

Despite his "design attempt", though, I don't agree with the author's solution approach.

There is no calculation algorithm to be determined. And the order of processing steps is wrong – especially if it's supposed to mean the loyality discount is always applied.

Let's look at the dirty code and highlight the different responsibilities:

I can spot three aspects of which two are badly intertwined:

The loyality discount is determined.
The status discount is determined. Only registered users are eligible to status discounts.
Status discount and loyality discount are applied only if the customer is eligible for a status discount.

Those aspects need to be retained – but brought in a proper order and cleaned-up. On my way to reach that goal I find it very useful to step back from hard to overlook imperative code. Instead I like to visualize the solution first in a declarative way.

Here's a simple data flow sketching how the desired behavior can be achieved by a small process of focused steps. The colors of the functional units match the aspects found in the original code:

As you clearly can see, a loyality discount now only needs to be determined if the customer if eligible for a (status) discount. Likewise only then a discounted price is calculated.

Each functional unit now has only a single responsibility. And on top there is another functional unit to integrate them all into a whole – which also is a responsibility of its own.

Implement Clean Code Without Frills

Such a straightforward easy to understand design sure is easy to implement. Every functional unit ("bubble") becomes a function of its own:

public enum AccountStatuses
{
	NotRegistered = 1,
	SimpleCustomer = 2,
	ValuableCustomer = 3,
	MostValuableCustomer = 4
}

public class Discounting
{  
	public decimal Adjust(decimal price, AccountStatuses accountStatus, int yearsLoyal)
	{
		decimal discountedPrice = 0.0m;
		Determine_status_discount (accountStatus,
			statusDiscount => {
				var loyalityDiscount = Determine_discount_for_loyality (yearsLoyal);
				discountedPrice = Calc_discounted_price (price, loyalityDiscount, statusDiscount);
			},
			() => discountedPrice = price);
		return discountedPrice;
	}

	void Determine_status_discount(AccountStatuses status, Action<decimal> onEligible, Action onNotEligible) {
		switch (status) {
		case AccountStatuses.NotRegistered:
			onNotEligible ();  break;
		case AccountStatuses.SimpleCustomer:
			onEligible (0.1m); break;
		case AccountStatuses.ValuableCustomer:
			onEligible (0.3m); break;
		case AccountStatuses.MostValuableCustomer:
			onEligible (0.5m); break;
		default:
			throw new NotImplementedException ("Not discount registered for status: " + status.ToString());
		}
	}

	decimal Determine_discount_for_loyality(int yearsLoyal) {
		const decimal MAX_RELEVANT_LOYALITY_YEARS = 5;
		return yearsLoyal > MAX_RELEVANT_LOYALITY_YEARS
					? (decimal)MAX_RELEVANT_LOYALITY_YEARS/100 
					: (decimal)yearsLoyal/100;
	}

	decimal Calc_discounted_price (decimal price, decimal loyalityDiscount, decimal statusDiscount)
	{
		decimal discountedPrice = price;
		discountedPrice -= discountedPrice * statusDiscount; // status is more valuable than loyality
		discountedPrice -= discountedPrice * loyalityDiscount;
		return discountedPrice;
	}
}

That's 52 lines of code, i.e. roughly twice as much code as the original. But these 52 lines of code are, well, easy to read and follow the SRP:

Calc_discounted_price(): Calculating a discounted price is two lines of trivial math. It's crystal clear how the discounts are applied. As you can see, the status discount is applied first, which means status is more important than loyality (because it's calculated based on a larger value). In Radoslaw's final code there is no function where this can be seen this clearly.
Determine_discount_for_loyality(): This differs not much from the original. But giving it its own function adds meaning to the one statement and wraps a contract around the responsibility. In case the calculation changes it's easy to spot where to intervene.
Determine_status_discount(): A very, very simple switch-statement suffices to map a status to a discount percentage. Here's where Radoslaw's solution slips into textbook overengineering. What a waste to set up interfaces and factory and strategy classes for a simple mapping! Even a dictionary data structure would do – were there not the special case of the unregistered customer. So what happens if a status discount changes? Alter the percentage in the method. What happens if a new status is introduced? Add it to the enum and add a case in the switch; that's probably 3 lines of code. Very, very easy and straightforward.
Adjust(): This method's responsibility is just to wire-up the data flow. It deliberately does not contain any logic. That's why it probably looks a bit funny to you: no if-then-else in there. Instead the two branches of the data flow are implemented using continuations. This is to follow the Integration Operation Segregation Principle (IOSP). This is how I "mechanically" translate a design like above; but if you don't want to be that rigorous you could use an if-then-else like so:

public decimal Adjust(decimal price, AccountStatuses accountStatus, int yearsLoyal)
{
	decimal statusDiscount;
	if (Determine_status_discount(accountStatus, out statusDiscount)) {
		var loyalityDiscount = Determine_discount_for_loyality (yearsLoyal);
		return Calc_discounted_price (price, loyalityDiscount, statusDiscount);
	}
	else
		return price;
}

bool Determine_status_discount(AccountStatuses status, out decimal discount) {
	discount = 0.0m;
	switch (status) {
	case AccountStatuses.NotRegistered:
		return false;
	case AccountStatuses.SimpleCustomer:
		discount = 0.1m;
		return true;
	case AccountStatuses.ValuableCustomer:
		discount = 0.3m;
		return true;
	case AccountStatuses.MostValuableCustomer:
		discount = 0.5m;
		return true;
	default:
		throw new NotImplementedException ("Not discount registered for status: " + status.ToString());
	}
}

In the name of Clean Code we could now argue about whether Determine_status_discount() really has only a single responsibility, though. Because currently it not only determines the status discount, but also whether any discount should be applied at all (which is different from a 0% status discount).

But, well... to me that's only a bit of dust left in the code. Negligible dirt ;-) Not extracting this responsibility keeps the code shorter without sacrificing too much readability/evolvability.

If you like, though, extract it. In any case that would be much more reasonable than introducing design patterns to blow up the code 5 times thereby obscuring the purpose of it: to calculate a discount.

Summary

It's great to explain Clean Code development by actually cleaning up dirty code. But as Radoslaw's code shows me it's very important to be careful to not overdo it. Applying all those pure principles and shiny design patterns is not an end in itself. We've to keep the reader in mind and stay humble.

Simplicity in coding is not about writing code quickly, but making it easy to understand.

And Clean Code is not about showing off yet another hyped concept, but staying true to what the code is supposed to do in the first place.

To that I find it useful to progress in a systematic way:

Put the code to refactor under test.
Identify the responsibilities in the code. Use color markers.
Design a solution before any code showing how the responsibilities collaborate towards the overall required behavior. This design should be declarative!
Refactor the code to your design; partially rewrite if needed.

Happy cleaning! :-)