Write Code Once
Do not copy code. Do this by writing reusable, generic code and/or calling existing methods instead.
One well-known principle in software engineering states doesn’t repeat yourself, also known as the DRY principle. A very obvious violation of DRY is the application of copy/paste to create duplicates of large portions of source code within the same code base. These duplicate pieces of code, are also known as code clones.
Copying existing code looks like a quick win, however copied code leads to duplicates, and duplicates are a problem.
Motivation
Duplicated Code Is Harder to Analyze
The fundamental problem of duplication is not knowing whether there is another copy of the code that you are analyzing, how many copies exist, and where they are located.
Duplicated Code Is Harder to Modify
If duplicated code contains a bug, the same bug appears multiple times. Therefore, duplicated code is harder to modify; you may need to repeat bug fixes multiple times.
A fix has to be made in a duplicate in the first place! This is why duplication is a typical source of so-called regression bugs: functionality that has worked normally before suddenly stops working (because a duplicate was overlooked). This, in turn, requires knowing that a fix has to be made in a duplicate in the first place! This is why duplication is a typical source of so-called regression bugs: functionality that has worked normally before suddenly stops working (because a duplicate was overlooked).
The same problem holds for regular changes. When code is duplicated, changes may need to be made in multiple places, and having many duplicates makes changing a codebase unpredictable.
Types of Duplication
Type 1: Exact copy, only differences in white space and comments.
Type 2: Same as type 1, but also variable renaming.
Type 3: Same as type 2, but also changing or adding a few statements.
Type 4: Semantically identical, but not necessarily the same syntax.
Type 1 clones are easier to detect and recognize (both by humans and computers).
Some tools available are somewhere between type 2 and 3, that is allowing more than just identifier differences but not allowing completely arbitrary statement changes. Full type 3 detection is often not desirable, as depending on the number of differences allowed, this would result in many false indications.
Type 4 detection requires full parsing.
How to Apply the Guideline
To avoid the problem of duplicated bugs, never reuse code by copying and pasting existing code fragments. Instead, put it in a method if it is not already in one, so that you can call it the second time that you need it.
Example of code duplicated
Consider a system that manages bank accounts. In this system, money transfers between accounts are represented by objects of the Transfer
class (not shown here). The bank offers checking accounts represented by class CheckingAccount
:
A class is needed to represent this new account type. Suppose the existing class is simply copied, renamed, and adapted. This would be the result:
The Extract Method Refactoring Technique
The clone has disappeared, but the following issues remain:
Even though according to the definition, the clone has disappeared, there is still logically repeated in the two classes.
The extracted fragment had to be put in a third class, just because in Java every method needs to be in a class. The class to which the extracted method was added runs the risk of becoming a mixture of unrelated methods. This leads to a large class smell and tight coupling. Having a large class is a smell because it signals that there are multiple unrelated functionalities within the class. This tends to lead to tight coupling when methods need to know implementation details in order to interact with such a large class.
The Extract Superclass Refactoring Technique
The Extract Superclass refactoring technique uses this feature by extracting a fragment of code lines not just to a method, but to a new class that is the superclass of the original class. So, to apply this technique, you create a new Account
class like so:
All functionality is now exactly where it belongs. The part of making a transfer that is the same for all accounts is in the Account
class, while the parts that are specific to certain types of accounts are in their respective classes. All duplication has been removed.
As the comments indicate, the makeTransfer
method in the Account
superclass has two responsibilities. Let makes the new Account class even more maintainable:
Common Objections to Avoiding Code Duplication
Copying from Another Codebase Should Be Allowed
“Copying and pasting code from another codebase is not a problem because it will not create a duplicate in the codebase of the current system.”
Technically, that is correct: it does not create a duplicate in the codebase of the current system. Copying code from another system may seem beneficial if the code solves the exact same problem in the exact same context. However, in any of the following situations you will run into problems:
The other (original) codebase is still maintained: Your copy will not benefit from the improvements made in the original codebase. Therefore, do not copy, but rather import the functionality needed (that is, add the other codebase to your classpath).
The other codebase is no longer maintained and you are working on rebuilding this codebase: In this case, you definitely should not copy the code. Often, rebuilds are caused by maintainability problems or technology renewals. In the case of maintainability issues, you would be defeating the purpose by copying code. You are introducing code that is determined to be (on average) hard to maintain. In the case of technology renewals, you would be introducing limitations of the old technology into the new codebase, such as an inability to use abstractions that are needed for reusing functionality efficiently.
Slight Variations, and Hence Duplication, Are Unavoidable
“Duplication is unavoidable in our case because we need slight variations of common functionality.”
Indeed, systems often contain slight variations of common functionality. For instance, some functionality is slightly different for different operating systems, for other versions (for reasons of backward compatibility), or for different customer groups. However, this does not imply that duplication is unavoidable. You need to find those parts of the code that are shared by all variants and move them to a common superclass. You should strive to model variations in the code in such a way that they are explicit, isolated, and testable.
This Code Will Never Change
“This code will never, ever change, so there is no harm in duplicating it.”
If it is absolutely, completely certain that code will never, ever change, duplication (and every other aspect of maintainability) is not an issue. For a start, you have to be absolutely, completely certain that the code in question also does not contain any bugs that need fixing. Apart from that, the reality is that systems change for many reasons, each of which may eventually lead to changes in parts deemed to never, ever change:
The functional requirements of the system may change because of changing users, changing behavior, or a change in the way the organization does business.
The organization may change in terms of ownership, responsibilities, development approach, development process, or legislative requirements.
Technology may change, typically in the system’s environment, such as the operating system, libraries, frameworks, or interfaces to other applications.
The code itself may change, because of bugs, refactoring efforts, or even cosmetic improvements.
That is why we argue that most of the time the expectation that code never changes is unfounded. So accepting duplication is really nothing more than accepting the risk that someone else will have to deal with it later if it happens.
Duplicates of Entire Files Should Be Allowed as Backups
“We are keeping copies of entire files in our codebase as backups. Every backup is an unavoidable duplicate of all other versions.”
We recommend keeping backups, but not in the way implied by this objection (inside the codebase). Version control systems such as SVN and Git provide a much better backup mechanism. If those are not available, move backup files to a directory next to the root of the codebase, not inside it. Why? Because sooner or later you will lose track of which variant of a file is the right one.
Unit Tests Are Covering Me
“Unit tests will sort out whether something goes wrong with a duplicate.”
This is true only if the duplicates are in the same method, and the unit test of the method covers both. If the duplicates are in other methods, it can be true only if a code analyzer alerts you if duplicates are changing. Otherwise, unit tests would not necessarily signal that something is wrong if only one duplicate has changed. Hence, you cannot rely only on the tests (identifying symptoms) instead of addressing the root cause of the problem (using duplicate code). You should not assume that eventual problems will be fixed later in the development process when you could avoid them altogether right now.
Duplication in String Literals Is Unavoidable and Harmless
“I need long string literals with a lot of duplication in them. Duplication is unavoidable and does not hurt because it is just in literals.”
This is a variant of one of the objections discussed in Write Short Unit of Code (“This unit is impossible to split”). We often see code that contains long SQL queries or XML or HTML documents appearing as string literals in Java code. Sometimes such literals are complete clones, but more often parts of them are repeated. For instance, we have seen SQL queries of more than a hundred lines of code that differed only in the sorting order (order by asc
versus order by desc
). This type of duplication is not harmless even though technically they are not in the Java logic itself. It is also not unavoidable; in fact this type of duplication can be avoided in a straightforward fashion:
Extract to a method that uses string concatenation and parameters to deal with variants.
Use a templating engine to generate HTML output from smaller, nonduplicated fragments that are kept in separate files.
Last updated