Refactoring: And code smells

This section is about the “Bad Smells” that code can display and things to avoid.

Naming

Names should not leave the reader of your code guessing. Application/Package/Method/Class/Method/Fields should all have obvious names. Code is written once and read a multitude of times.

Tip: methods usually have 3 steps – preprocessing, a process, post-processing, and possibly a return value.

methodName() {
    preprocessingIfAny();
    doMainTask();
    postProcessingIfAny();
    returnOrFinish();
}

This is similar to test code structure of given, when, then. Each part is optional apart from the doMainTest() step. This could be a one line method. Typically, the smaller a method is, the easier it is to name and write tests for.

Duplicate Code – DRY (Don’t Repeat Yourself)

Do not repeat yourself. Try to be efficient with your code. If you write a method once and use it in more than one place, consider extracting the method to its own class and utilising it from each place that requires it. This approach requires less maintenance cost, and ensures the reader has less code to digest.

If the bit of code is duplicated in the same unit, then the refactoring to remove this duplication is simple.

Long Function

Long functions are hard to read. If you use small functions, they’re easier to name, and this often eliminates the necessity to read what the code is doing, beyond the function name. Generally if you need to comment something, that part of the code can be extracted to a named function.

If there are too many temporary variables in the way the extracted method may not be much more readable than the original. To eliminate this problem we can extract the variables to an object, introducing a parameter object.

look for signs of things that can be extracted: conditionals and loops. If the method using a loop is hard to name, thats potentially due to it doing more than one thing. Split the loop and you can iterate over your collection more than once if necessary.

Long Parameter List

You could combine a number of parameters into a “Parameter Object” if several always appear together (Data Clump). Or you could make one parameter the result of another object, and ask another parameter for the one you are relocating.

If a parameter is used as a flag to dispatch different behaviour, it can be removed in favour of a class.

Global Data

Bad because it can be modified from anywhere in the code base and there’s no way to know which bit of code interacted with it. This leads to bugs. Often seen with class variables and singletons. Encapsulate the variable if necessary. Now you will be able to see where it’s modified and start controlling its access, limiting scope as much as possible. Even worse is Global state that is mutable. Global state that never changes, once set, is relatively safe, if you have a language that can enforce that guarantee.

Mutable Data

Unless a variable needs to be mutable (changeable) it should be read only, using the relevant access modifiers. This secures the application state where it is necessary and makes it explicit to the reader that the variables are read only, once set, and ensures only variables that are permitted to change are able to.

If we can avoid mutable state we can run code is safer from side effects causing unexpected behaviour. Functional languages treat registers as constants rather than variables by default and variables only exist if they are explicitly specified.

Divergent change

We want to be able to jump to a single, clear location to make a change.

If a type of update involves making changes to more than one part of the codebase, it’s a pungent smell of divergent change. e.g. Introducing a new financial instrument means a change to the database and some additional modelling.

Shotgun Surgery

This is when a single change results in a number of small changes being necessary, and the changes are all over the place. In this instance we want to move the field and/or function to its own module so that everything is accessed in one way.

Inlining a lot of code/classes can help here as an intermediate step to extracting the relevant functionality into its own module.

Feature Envy

Modularising a program aims to put the code into zones to maximise interaction within the zone and minimise interaction between zones. This is also considered poor cohesion. Move the functions closer to one another to avoid this cross-talking.

If your code spends more time talking to another zone than its own, it’s a bad smell. E.g. a function that calls a bunch of getters to calculate some value, could be refactored to be closer to the data it relies on.

Strategy, Visitor, Self Delegation are methods of dealing with feature envy.

Rule of thumb: put things together that change together.

Data Clump

If certain fields always appear in a group, then they should probably form part of an object. If you can’t delete one of the fields this is a good sign you have a class waiting to be born. We say class rather than struct, because you’ll then have the opportunity to include cohesive methods as described by the “Feature Envy” refactoring suggestion, above.

Primitive Obsession

If you find you’re often performing conversions for a given primitive data type you’re using, you’ll probably find its better to use a typed class instead, with the relevant methods/behaviour you’re after. Too many times developers will use primitive types (strings, longs) as their data type when it would make more sense to use a type to help program in the domain you’re working in. e.g. Money, Time, Coordinates, etc. Moving out of the primitive world and into the meaningfully typed world will speed up your ability to introduce new code that’s less error prone.

Repeated Switches

If you have repeated switches, they can often be replaced with polymorphism by extracting the conditional bit of code out to a type and subclassing the various instances we’re looking for. The code will be cleaner, as well as easier to manage and extend.

Loops

It is preferable to use a functional style today with filter and map, as these are easier to digest for the reader vs for/while loops. If you can use the functional style, this is preferable.

Lazy Element

Sometimes we create code with the expectation it will be expanded on later: specific types that don’t end up being utilised much beyond getters/setters or a single field. These instances can simply be in-lined to remove the excessive structure and simplify code.

Speculative Generality

This is when you code something expecting it to be used in some future state. We may introduce extra hooks and special cases to handle situations that don’t presently exist or aren’t required. The end result is code thats harder to understand and maintain. If methods aren’t in use, remove them. If a class isn’t used, remove it. Abstract/Interface class not doing much, remove it. We also see this speculative style code when its only used by tests. In this instance the test case can be deleted and the dead code removed.

Temporary Field

Fields that are only used some of the time. These should be moved to their own class with their own behaviour so that the reader of your code doesn’t find it hard to read your code. This is also true for some conditional code.

Message Chains

When a client asks one object for another object which the client then asks for another object and so on. Could you move the function down the chain so that the function is closer to the object is needs? If not we end up with a bunch of middle men calling one another to get the object necessary and carry out some operation.

This could be considered a bottom up analysis that reveals there are several places using an object for similar operations. Could these operations be brought closer to the root object?

Middle Man

You’ll sometimes find code where half the methods delegate their calls to other objects. It would be better if the two objects just spoke directly to remove the indirection. Remove the delegation.

Insider Trading

Software developers like strong walls between their modules. Trading data around too much increasing coupling. Of course some trade needs to occur, but we want it to be minimal. If we find two classes that need data from one another or a function that needs a field from another class then it could be better to move them to a new or common location to reduce the chatter.

Large Class

When a class is trying to do too much it often shows up with too many fields and duplicate code. These can often be broken out into super classes/subclasses, external classes, etc. to make the code more modular. Often the clients of the class will make use of several methods that could be extracted to the clients or at least extracted to a class that lives alongside the client.

Classes with different interfaces

If you can guide two classes share the same interface, so they share the same protocol, you may find the two classes share duplicate methods and so we can reduce repetitiveness by extracting code to a superclass.

Data Class

These often have getters and setters but are simply data objects. These should be encapsulated properly by making the fields private (Java) and remove any setter methods for fields that should be immutable (unchangeable).

If we look where these getter/setter methods are used by other classes behaviour from the clients could be moved to this data class. Data classes are often a sign of behaviour in the wrong place.

Refused Bequest

If a parent class contains more than necessary, the non-necessary parts should be pushed down to a new child class, so that the parent only contains what is common/necessary to the two or more subclasses.

A child class that only inherits part of the parent indicates the hierarchy is wrong. It also leads to coders implementing only the parts of the interface they want without actually implementing the interface. Thus we lose the polymorphism offered by OO interfaces.