Suffix tree based duplicate code analysis

Large software systems often contain duplicate code: a sequence of source code that occurs more than once. Two code sequences can be duplicates of each other in two ways: syntactically and functionally. This feature detects syntactically similar duplicates, that is, fragments that differ only in atoms, constants (integer, float and string) and variable names. Many identical or very similar code fragments are the result of copy and paste, for example.
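
To illustrate what "differ only in atoms, constants and variable names" can mean, here is a minimal sketch that reduces each token of a fragment to its category and compares the resulting sequences. It uses the standard erl_scan module; the module name clone_sketch and both functions are hypothetical and are not part of RefactorErl, and the sketch says nothing about how the tool itself is implemented. Note that this sketch keeps integers, floats and strings as distinct categories.

```erlang
-module(clone_sketch).
-export([normalize/1, same_shape/2]).

%% Tokenize an Erlang source fragment and keep only the category of each
%% token (atom, var, integer, '(', '->', ...), so that atoms, variables and
%% literals lose their concrete values.
normalize(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    [element(1, Token) || Token <- Tokens].

%% Two fragments have the same shape if their category sequences are equal.
same_shape(SourceA, SourceB) ->
    normalize(SourceA) =:= normalize(SourceB).
```

With this sketch, clone_sketch:same_shape("area(W, H) -> {ok, W * H + 1}.", "cost(N, P) -> {total, N * P + 100}.") evaluates to true, while replacing `*` with `+` in only one of the two fragments makes it false.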

Parameters

Parameters can be defined by a proplist. The possible properties are summed up in the table below. All the properties are optional.

||Property||Description||Type||Default value||Example||
||files||specify files to search||[Module::atom() | Filepath::string() | RegExp::string() | File::string()]||all files from the database||{files,["/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_log.erl", module]}||
||minlen||minimal length of a clone (length is in tokens)||integer()||10||{minlen,50}||
||minnum||minimal number of clones in one clone group||integer()||2||{minnum,5}||
||overlap||maximum length that duplicates can overlap each other (length is in tokens)||integer()||0||{overlap,1}||
||output||name of the file in which the results are saved||string()||-||{output,"result.txt"}||
||name||name of the result in the table||atom()||the atom 'temp' concatenated with the timestamp||{name,referl}||
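
Because every property is optional, a partial proplist is effectively completed with the documented defaults. The following sketch shows that idea for the three numeric properties only; the module dupcode_opts_sketch and the function with_defaults/1 are hypothetical helpers written for illustration, not RefactorErl functions.

```erlang
-module(dupcode_opts_sketch).
-export([with_defaults/1]).

%% Defaults copied from the table above. 'files', 'output' and 'name' are
%% left out because their defaults are not plain constants.
defaults() ->
    [{minlen, 10}, {minnum, 2}, {overlap, 0}].

%% Return the effective value of every numeric property: the caller's value
%% if it is present in Opts, otherwise the documented default.
with_defaults(Opts) ->
    [{Key, proplists:get_value(Key, Opts, Default)}
     || {Key, Default} <- defaults()].
```

For example, dupcode_opts_sketch:with_defaults([{minlen, 50}]) yields [{minlen,50},{minnum,2},{overlap,0}].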

Interfaces

Web interface

See Duplicate code analysis on the web interface.

Console interface

Analyser functions

  • search_duplicates/0: uses the default values of the properties.
  • search_duplicates/1: takes a proplist of the properties described above.

Both interface functions report the progress of the analysis. The result is a list of clone groups. Every clone is defined by the path of its file and by its start and end positions (line and column number). Results are saved under a name, so they can be queried later. An example can be found at the bottom of the page.
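
A term shaped like the clone group list printed in the examples at the bottom of the page can be processed with ordinary list and proplist functions. The sketch below is only an illustration under that assumption: the module dupcode_result_sketch and its functions are hypothetical, and the page does not state whether the ri functions return this term as a value or only print it, so the sketch simply takes such a term as its argument.

```erlang
-module(dupcode_result_sketch).
-export([summarize/1]).

%% Turn every clone into a "File:StartLine-EndLine" string, keeping the
%% clone group numbers. Groups is expected to look like
%% [{GroupNo, [[{filepath, Path}, {startpos, {Line, Col}}, {endpos, {Line, Col}}]]}].
summarize(Groups) ->
    [{GroupNo, [describe(Clone) || Clone <- Clones]}
     || {GroupNo, Clones} <- Groups].

%% Describe a single clone; the column numbers are ignored here.
describe(Clone) ->
    Path = proplists:get_value(filepath, Clone),
    {StartLine, _StartCol} = proplists:get_value(startpos, Clone),
    {EndLine, _EndCol} = proplists:get_value(endpos, Clone),
    lists:flatten(io_lib:format("~s:~p-~p", [Path, StartLine, EndLine])).
```

Applied to the second clone group shown in the examples, summarize/1 gives [{2, ["/home/user/dups/dup1.erl:6-6", "/home/user/dups/dup2.erl:18-18"]}].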

Result query functions

||Function||Parameters||Description||Example||
||stored_dupcode_results/0||-||queries all saved results (name and information about the parameters of the analysis)||ri:stored_dupcode_results()||
||save_dupcode_result/2||Name::atom() - name associated with the result, Filename::string() - name of the file||saves the given result in the file||ri:save_dupcode_result(temp20120527205412,"result.txt")||
||show_dupcode/1||Name::atom() - name associated with the result||lists all clone groups of the result||ri:show_dupcode(temp20120527205412)||
||show_dupcode_group/2||Name::atom() - name associated with the result, GroupNumber::integer() - number of the required clone group||lists the given clone group of the result||ri:show_dupcode_group(temp20120527205412, 2)||

Examples

Analysis functions

Allowed parameterizations:

ri:search_duplicates().
ri:search_duplicates([
        {files, [dup2,
                 "/home/user/dups/dup1.erl",
                 "/home/[0-9a-zA-Z/_.\-]+/src"]},
        {minlen, 50},
        {minnum, 3},
        {overlap, 1},
        {output, "result.txt"},
        {name, filtered}]).

Return value and progress report (the progress report is printed only if the analysis has not been run yet):

(refactorerl@localhost)3> ri:search_duplicates([{files,[dup1,dup2]}]).

Initial clone detection started.
Initial clone detection finished.
Trimming clones started.
Trimming clones finished.
Filter clones finished.
Calculating positions started.
Calculating positions finished.
2 clone groups found.
Result saved to temp20120527234307...
run ri:show_dupcode(temp20120527234307) to see the result.
ok

(refactorerl@localhost)4> ri:search_duplicates([{files,[dup1,dup2]}]).
2 clone groups found.
Result saved to temp20120527234307...
run ri:show_dupcode(temp20120527234307) to see the result.
ok

Result query functions

(refactorerl@localhost)5> ri:stored_dupcode_results().
[[temp20120527234307,
  [{files,["/home/user/dups/dup1.erl",
           "/home/user/dups/dup2.erl"]},
   {minlen,10},
   {minnum,2},
   {overlap,0}]],
 [temp20120527235556,
  [{files,["/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_schema.erl"]},
   {minlen,20},
   {minnum,3},
   {overlap,0}]]]
ok
(refactorerl@localhost)17> ri:show_dupcode(temp20120527234307).        
[{1,
  [[{filepath,"/home/user/dups/dup1.erl"},
    {startpos,{8,1}},
    {endpos,{18,11}}],
   [{filepath,"/home/user/dups/dup2.erl"},
    {startpos,{5,1}},
    {endpos,{15,11}}]]},
 {2,
  [[{filepath,"/home/user/dups/dup1.erl"},
    {startpos,{6,5}},
    {endpos,{6,62}}],
   [{filepath,"/home/user/dups/dup2.erl"},
    {startpos,{18,5}},
    {endpos,{18,58}}]]}]
Contains 2 clone groups...
run ri:show_dupcode_group(temp20120527234307, GroupNumber::integer()) to see one of the clone groups.
ok
(refactorerl@localhost)16> ri:show_dupcode_group(temp20120527234307,2).
[[{filepath,"/home/user/dups/dup1.erl"},
  {startpos,{6,5}},
  {endpos,{6,62}}],
 [{filepath,"/home/user/dups/dup2.erl"},
  {startpos,{18,5}},
  {endpos,{18,58}}]]
Contains 2 members.
ok
Last modified on May 9, 2014, 2:40:08 PM