I consulted on an intellectual property case where I had access to many copies of suspect code from many computers and backup files. Recognizing that the structure of functions as represented in punctuation was less mutable than any variable name, I extracted and studied only the punctuation.
I reduced whole functions to a single line which had every character synchronized by its place in the flow of control.
I further extracted block structure into an even shorter sequence and sorted my catalog of possible duplicates on this first.
The corpus I was to examine was further reduced to fan-fold printout less than an inch thick. This I examined page by page noting signatures that deserved further attention.
.
I was inspired by the fruitful study of conserved DNA.
I applied the same analysis to the ParcPlace Smalltalk image which I new to be carefully edited do abstract out any duplication. I found one case. The floating point and double precision floating point methods were duplicates with small modifications only.