Duplicated Code

Perhaps this is already covered by Copy And Paste Programming and Code Smells? Deletion Candidate

I don't think so. Copy And Paste Programming is the process, that often leads to Duplicated Code, but

it also leads to Badly Factored Code

there are other sources of Duplicated Code (e.g. Badly Factored Code) which means that these two are often related, but none is a special case of the other.


What constitutes Duplicated Code?

Code is duplicated if more succinct code exists that describes the same functionality (gives same desired(excluding noise) results after running). --Aleksey Pavlichenko


Analogy with Zip data compression:

Refactoring is deduplication of source code done by human. Zip compression is deduplication of data done by machine.

Compression (deduplication) of one unit of information is many times more CPU-expensive than decompression (duplication). Similarly deduplication of one unit of functionality of software code is many times more human-expensive than duplication.

Consequently if a company spent X dollars to develop a product with duplicated code it will have to spend X * K dollars on code refactoring without any single visible improvement of the product, where K is between 3 and 100 depending on skills, complexity, language, etc.

Some companies hire more and more developers to pay interest on that duplication debt or keep changing developers wondering why production emergencies don’t disappear even with best minds on it.


Analogy with financial interest:

Duplicated Code represents Technical Debt. Cost of repayment of that debt equals to cost of human labor that is required to deduplicate the code. That debt has interest in form of decreased productivity of developers.

That interest is many times higher than financial interest for business loans, which is why it’s financially prudent for a company to borrow money or significantly increase R&D budget to repay technical debt. Since technical debt has similar properties as financial debt it is also possible to build Technical Debt Bubble. That bubble will appear by itself when company never allows developers to repay debts after deadline is met.


In Smalltalk Language, I think duplicated code is more apparent than in other languages (and certainly easier to refactor away). In Java, though, I find that duplicated code isn't always obvious and sometimes impossible to refactor out.

For example, suppose you have the following code in one method:

if (this.o.isSomeCondition()) { doSomething1(); }

and in another method you have:

if (this.o.isSomeCondition()) { doSomething2(); }

Does the if test itself constitute duplicated code? If so, it could be refactored away through some form of Double Dispatch. If I were doing Big Design Up Front, I might decide that a Double Dispatch mechanism might be useful some time in the future, but since You Arent Gonna Need It, I don't do that now. Does Refactor Mercilessly dictate that it is, in fact, duplicated code and should be immediately replaced?


Balance the smell against the cost of reducing it in your environment. Each environment (language, linkers, etc) has its own threshold for how much duplication you can remove without it being too painful or too obfuscating. These language features lend themselves to removing duplication without excessive pain (and you get to choose what's too painful):

Objects

Dynamic Typing

Meta-classes

Reflection

Macros

Interpretation

Closures / Lambda expressions

Inheritance

Templates / Parameterized Types

For example, here's some duplicated (pseudo-) code:

def findAManager: for each employee in employees: if employee.isManager: return employee return nil

def findASupervisor: for each employee in employees: if employee.isSupervisor: return employee return nil

def findAWorkerBee: for each employee in employees: if employee.isWorkerBee: return employee return nil

Here the loop is duplicated, but removing the loop will obfuscate the code without making it any clearer what you're doing. If, however, you have something like closures or lambda expressions, you can abstract the loop out:

def findFirstThat (condition): for each employee in employees: if condition (employee): return employee return nil

def findAManager: return findFirstThat (lambda worker: worker.isManager)

def findASupervisor: return findFirstThat (lambda worker: worker.isSupervisor)

def findAWorkerBee: return findFirstThat (lambda worker: worker.isWorkerBee)

There are other ways to abstract out the loop, some easier, and some harder. You just have to balance how hard the abstraction is against how painful the duplicated code is. Without lambda expressions/closures/continuations/whatever, the little bit of duplication in this little example isn't painful enough to do something about.

So: Balance smell against pain.


Is it sheer coincidence that the pseudocode above is almost executable Python Language code? :) Burkhard Kloss

Ah. That's what all the lambdas are doing. When I write pseudocode, I assume the language lets me say what I want. e.g.:

def findAManager: return findFirstThat (.isManager)


As a (kind of) counter-example, you can do something similar in C++:

struct Find''''''Supervisor { bool operator() (const Employee &e) { return e.isSupervisor; } } f;

Employees::iterator i = std::find_first_of(employees.begin(), employees.end(), f); if (i != employees.end()) { Employee e = *i; }

Repeat as necessary for the other types of worker.

My point being that, because C++ doesn't have real closures, you'll have to find some other way to refactor the expression (or leave it alone).

Just reinforcing the point: Balance smell against pain. You might consider this to be more painful than necessary.


Check out Once And Only Once. Also check Pattern Overdose and Pattern Abuse. Your first example on this page has this drift. --Bernd Goetz



Failure to factor out duplicated code is one way to get the dreaded Monster Subroutine.

I absolutely agree. But, when is it time to factor out duplicated code? Particularly because You Arent Gonna Need It. Clearly these forces need to be balanced, but how? Extreme Programming Masters can do it through intuition and experience, but I am but a grasshopper.

[My (outsider's) understanding is that You Arent Gonna Need It doesn't apply to refactorings -- If a particular refactoring brings you closer to The Simplest Code, then you need it. -- An Aspirant]

I think that the modification I describe below should set off the alarm - the code is absolutely identical. If there is a chance that the code will at some time differ for the different invocations of the function being replaced, then duplicating the code may make sense.

Besides the scenario described in Monster Subroutine, another arises from code like:

modify_it(&X); ... modify_it(&Y); ... more code with modify_it() calls

Then a change requires that modify_it()'s functionality be replaced by:

prep_modify_it(&X, some_other_variable); new_modify_it(&X); for ( I = 23; I < 999; ++I) { some_complicated_stuff(&X, I, third_variable + I); }

and then instead of factoring this into a function, it is edited in place for every modify_it() call, leading to code bloat and difficult maintenance. -- Robert Field


Consider instead...

modify_it(&X, some_other_variable, third_variable); ... modify_it(&Y, some_other_variable, third_variable); ... more code with modify_it() calls ''[And change these too, in the obvious way.]''

void modify_it(typeX *pX, type2 some_other_variable, type3 third_variable) { prep_modify_it(pX, some_other_variable); new_modify_it(pX); for ( I = 23; I < 999; ++I) { some_complicated_stuff(pX, I, third_variable + I); } }

Precisely. It is the failure to do something like this refactoring that gives rise to the monsters. -- Robert Field


If we could replace the original with:

if (isSomeOtherCondition()) doSomething1();

so that the expression is at least factored out:

bool isSomeOtherCondition() { return this.o.isSomeCondition(); }

then the repetition is less painful. (I think it helps to omit the unnecessary braces, too.) See also the Law Of Demeter for why having hardwiring long dotted-paths like this.o.isSomeCondition() is intrinsically dubious.


Beg to differ slightly... this.o.isSomeCondition() simply calls a method on an instance variable, although I would likely use an accessor instead of going directly to the var. Perhaps the answer is related to "why can't 'o' decide to doSomething1() or doSomething2() itself?"


Referring back to:

def findAManager: for each employee in employees: if employee.isManager: return employee return nil

def findASupervisor: for each employee in employees: if employee.isSupervisor: return employee return nil

def findAWorkerBee: for each employee in employees: if employee.isWorkerBee: return employee return nil

This could also be written as

def findType(type) for each employee in employees: if employee.isType(type) return employee return nil

It requires a different way of managing types. I personally prefer this way because you don't have a method per way, plus it allows for iteration over a series of types in a list. For those who want the isType() methods, have them call the generic one.


A Relational Weenie version:

getEmployees(criteria) { return(query(stdConn, "select * from employees where %1", criteria)); }

Note that this example is not limited to just Employee Types. However, the downside is that some may consider it a security risk because it is more open-ended (Sql Strings And Security). Then again, so is the original because one can simply get a list of all employees by calling it 3 times.


An interesting paper on redundant code that shows some statistically significant positive correlation between seemingly harmless duplicated code and real errors is at www.stanford.edu . "We expect that redundancies, even when harmless, strongly correlate with hard errors. Our relatively uncontroversial hypothesis is that confused or incompetent programmers tend to make mistakes." --Steven Newton


For automated support finding duplicated code, see Sim Scan, Dup Tective and Same Tool (or Category Duplication Finding Tool)


The Eclipse Ide will have in its release 3 a simple(?) refactoring tool for creating methods out of Duplicated Code. -- Andreas Schweikardt


See original on c2.com