Copied and pasted code is usually bad. But it can be hard to find, especially in a large project. So we wrote a utility, CPD, to find it for us.

The first version used a variant of Michael Wise's Greedy String Tiling algorithm (our variant is described here). It was then completely rewritten by Brian Ewins using the Burrows-Wheeler transform (or, at least, the first part of it).

Here's a screenshot of CPD after running on the JDK java.lang package. Note that CPD works with Java, C, C++, and PHP code. If you have Java Web Start, you can run CPD by clicking here.

Here are the duplicates CPD found in the JDK 1.4 source code. Here are the duplicates CPD found in the APACHE_2_0_BRANCH branch of Apache (just the

There's also a JavaSpaces version available for splitting the CPD effort across a farm of machines. I usually post news on that here and the releases are here.

Andy Glover wrote an Ant task for CPD; here's how to use it:

    <target name="cpd">
      <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask" />
      <cpd minimumTokenCount="100" outputFile="/home/tom/cpd.txt" verbose="true">
        <fileset dir="/home/tom/tmp/ant">
          <include name="**/*.java"/>
        </fileset>
      </cpd>
    </target>

Suggestions? Comments? Post them here. Thanks!
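
To make the minimumTokenCount setting in the Ant task a bit more concrete, here is a minimal sketch of what "a duplicate of at least N tokens" can mean. This is not CPD's implementation (it uses neither Greedy String Tiling nor the Burrows-Wheeler transform); it is a naive sliding-window comparison, and the class name DuplicateSketch, the tokenizer regex, and both method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a naive token-window duplicate finder, not CPD's algorithm.
class DuplicateSketch {

    // Hypothetical tokenizer: a run of word characters, or any single
    // non-whitespace character (operators, punctuation).
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Returns {earlierStart, laterStart} token offsets for every window of
    // minimumTokenCount tokens that repeats an earlier, non-overlapping window.
    static List<int[]> findDuplicates(List<String> tokens, int minimumTokenCount) {
        Map<String, Integer> firstSeen = new HashMap<>();
        List<int[]> matches = new ArrayList<>();
        for (int start = 0; start + minimumTokenCount <= tokens.size(); start++) {
            // Tokens never contain whitespace, so a space is a safe separator.
            String window = String.join(" ", tokens.subList(start, start + minimumTokenCount));
            Integer earlier = firstSeen.putIfAbsent(window, start);
            if (earlier != null && start - earlier >= minimumTokenCount) {
                matches.add(new int[] {earlier, start});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("int a = x + y; int b = x + y;");
        for (int[] match : findDuplicates(tokens, 5)) {
            System.out.println("duplicate run at tokens " + match[0] + " and " + match[1]);
        }
    }
}
```

With a window of 5 tokens, the sample input reports the repeated run "= x + y ;". Raising the window size, as with minimumTokenCount="100" in the Ant task, filters out short coincidental matches like this one and leaves only larger pasted blocks.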