Structural Abstraction

EVANS, William S., FRASER, Christopher W. and MA, Fei, 2009. Clone detection via structural abstraction. Software Quality Journal. Online. 1 December 2009. Vol. 17, no. 4, p. 309–330. [Accessed 10 August 2023]. DOI 10.1007/s11219-009-9074-y.

This paper describes the design, implementation, and application of a new algorithm to detect cloned code. It operates on the abstract syntax trees formed by many compilers as an intermediate representation. It extends prior work by identifying clones even when arbitrary subtrees have been changed. These subtrees may represent structural rather than simply lexical code differences. In several hundred thousand lines of Java and C# code, 20–50% of the clones that we find involve these structural changes, which are not accounted for by previous methods. Our method also identifies cloning in declarations, so it is somewhat more general than conventional procedural abstraction.
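
To make the idea of clones that differ in arbitrary subtrees more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy AST encoding (nested tuples of the form `(label, child, ...)`, with plain strings as leaves) and uses simple anti-unification: two trees are walked in parallel, and wherever they disagree the whole subtree is replaced by a hole. The names `anti_unify`, `HOLE`, and the example trees are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple, Union

# Assumed toy encoding: a leaf is a string, an inner node is (label, child, ...).
Tree = Union[str, tuple]

HOLE = "?"  # hypothetical marker for an abstracted subtree


def anti_unify(a: Tree, b: Tree, holes: List[Tuple[Tree, Tree]]) -> Tree:
    """Return the structure shared by a and b.

    Subtrees that differ are replaced by HOLE, and the differing pair
    is appended to `holes` so a caller can see what was abstracted.
    """
    if a == b:
        return a
    both_nodes = isinstance(a, tuple) and isinstance(b, tuple)
    if both_nodes and a[0] == b[0] and len(a) == len(b):
        # Same label and arity: keep the node and recurse into the children.
        children = tuple(anti_unify(x, y, holes) for x, y in zip(a[1:], b[1:]))
        return (a[0],) + children
    # Labels, arity, or leaf values differ: abstract the whole subtree.
    holes.append((a, b))
    return HOLE


# Two statements that differ structurally (a call versus an array index),
# not merely in an identifier or literal.
first = ("assign", "x", ("add", "y", ("call", "f", "z")))
second = ("assign", "x", ("add", "y", ("index", "arr", "i")))

holes: List[Tuple[Tree, Tree]] = []
pattern = anti_unify(first, second, holes)
print(pattern)  # ('assign', 'x', ('add', 'y', '?'))
print(holes)    # [(('call', 'f', 'z'), ('index', 'arr', 'i'))]
```

In this sketch the two statements share the pattern `x = y + ?`, with one hole standing for a structurally different subexpression. A purely lexical comparison that only tolerates renamed identifiers or changed literals would not treat them as clones. How candidate pairs are found, how many holes are allowed, and how patterns are ranked are all outside the scope of this illustration.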