= Duplicate code analysis =

Large software systems often contain duplicate code, a computer programming
term for a sequence of source code that occurs more than once. Two code
sequences can be duplicates of each other in two ways: syntactically or
functionally. This new feature detects syntactically similar duplicates, that
is, fragments that differ only in atoms, constants (integer, float and string)
and variable names. Copy-and-paste programming, for example, can produce many
identical or very similar code fragments.
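
As an illustration of syntactic similarity, the hypothetical module below (the module and function names are invented for this page) contains two functions with the same token structure. They differ only in variable names, atoms and constants. Whether such a pair is actually reported depends on the parameters described in the next section, for example the minimum clone length.

{{{#!erlang
-module(dup_example).
-export([area_report/1, volume_report/1]).

%% Hypothetical example: both functions share one token structure and
%% differ only in variable names, atoms and constants.
area_report(Shapes) ->
    Total = lists:sum([S * 2 || S <- Shapes]),
    {area, Total, "square metres"}.

%% Same shape as area_report/1 with renamed variables (Boxes, Sum, B),
%% a different atom (volume) and different constants (3, the string).
volume_report(Boxes) ->
    Sum = lists:sum([B * 3 || B <- Boxes]),
    {volume, Sum, "cubic metres"}.
}}}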

== Parameters ==

Parameters are given as a proplist. The table below summarises the available
properties. All of them are optional.

||=Property=||=Description=||=Type=||=Default value=||=Example=||
||files||files in which the search is carried out||[Module::atom() [[BR]] |Filepath::string() [[BR]] |RegExp::string() [[BR]] |File::string()]||all files from the database||{files,["/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_log.erl", module]}||
||minlen||minimum length of a clone (in tokens)||integer()||10||{minlen,50}||
||minnum||minimum number of clones in one clone group||integer()||2||{minnum,5}||
||overlap||maximum length by which duplicates may overlap each other (in tokens)||integer()||0||{overlap,1}||
||output||name of the file in which to save the result of the analysis||string()||-||{output,"result.txt"}||
||name||name of the result in the table||atom()||the string "temp" concatenated with the timestamp||{name,referl}||

== Interfaces ==

=== Web interface ===
See [wiki:WebInterface/CodeDuplicates Duplicate code analysis] on the web interface.

=== Console interface ===

The ri module currently provides two interface functions: search_duplicates/0
and search_duplicates/1. The first uses the default values of the parameters.
The second takes a proplist of the properties described above. Both interface
functions report the progress of the analysis. The result is the list of clone
groups. Each clone is described by the path of its file and by its start and
end positions (line and column number).
Examples can be found at the bottom of the page.
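
The result structure can be processed with ordinary list and proplist functions. The following is a minimal sketch, not part of the tool: the module clone_report and its functions are hypothetical helpers. It assumes that every clone is a proplist with the keys filepath, startpos and endpos, as in the sample return value shown on this page.

{{{#!erlang
-module(clone_report).
-export([format_groups/1, format_clone/1]).

%% Groups is assumed to be a list of clone groups, where each group is
%% a list of clones and each clone is a proplist with the keys
%% filepath, startpos and endpos.
%% Returns a list of {GroupNumber, [ClonePositionString]} tuples.
format_groups(Groups) ->
    Numbered = lists:zip(lists:seq(1, length(Groups)), Groups),
    [{N, [format_clone(C) || C <- Group]} || {N, Group} <- Numbered].

%% Renders one clone as "File:StartLine:StartCol-EndLine:EndCol".
format_clone(Clone) ->
    File = proplists:get_value(filepath, Clone),
    {StartLine, StartCol} = proplists:get_value(startpos, Clone),
    {EndLine, EndCol} = proplists:get_value(endpos, Clone),
    lists:flatten(io_lib:format("~s:~p:~p-~p:~p",
                                [File, StartLine, StartCol,
                                 EndLine, EndCol])).
}}}

Assuming the value printed by the shell is the return value of the call, such a helper could be applied as clone_report:format_groups(ri:search_duplicates()).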

== Examples ==

{{{#!erlang
ri:search_duplicates().
}}}

{{{#!erlang
ri:search_duplicates([
	{files, ["/home/user/dups/dup1.erl",
	         "/home/[0-9a-zA-Z/_.\-]+/src"]},
	{minlen, 50},
	{minnum, 3},
	{overlap, 1},
	{output, "result.txt"},
	{name, filtered}]).
}}}

Small example of the return value:

{{{#!erlang
ri:search_duplicates([{files,[dup1, dup2]}]).

Initial clone detection started.
Initial clone detection finished.
Trimming clones started.
Trimming clones finished.
Filter clones finished.
Calculating positions started.
Calculating positions finished.
[[[{filepath,"/home/user/dups/dup1.erl"},
   {startpos,{15,1}},
   {endpos,{17,24}}],
  [{filepath,"/home/user/dups/dup2.erl"},
   {startpos,{5,1}},
   {endpos,{7,24}}]],
 [[{filepath,"/home/user/dups/dup1.erl"},
   {startpos,{8,1}},
   {endpos,{14,11}}],
  [{filepath,"/home/user/dups/dup2.erl"},
   {startpos,{12,1}},
   {endpos,{18,11}}]]]
}}}