Changes between Version 2 and Version 3 of SuffixTreeBasedDuplicateCodeAnalysis

Timestamp:: May 26, 2012, 3:11:55 PM (13 years ago)
Author:: manualwiki
Comment:: --

SuffixTreeBasedDuplicateCodeAnalysis

-                      v2
+                      v3
 = Duplicate code analysis =
+Large software systems often contain duplicate code, a programming term for a
+sequence of source code that occurs more than once. Two code sequences can be
+duplicates of each other in two ways: syntactically and functionally. This
+feature detects syntactically similar duplicates, which differ only in atoms,
+constants (integer, float and string) and variable names. Many identical or
+very similar code fragments result from copy and paste, for example.
 == Parameters ==
-||Parameter||Description||Default value||Example||
-||files||files in which the search is carried out||all files from the database||{files,["/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_log.erl","/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_lib.erl"]}||
-||minlen||minimal length of a clone(in tokens)||10||{minlen,50}||
-||minnum||minimal number of clones in one clone group||2||{minnum,5}||
-||overlap||scale of the overlap||0||{overlap,1}||
+Parameters are passed as a proplist. The table below summarizes the possible
+properties, all of which are optional.
+* {files, Files}
+  Files = [ Module::atom() | Filepath::string() | File::string() | RegExp::string() ]
+* {minlen, integer()}
+* {minnum, integer()}
+* {overlap, integer()}
+* {output, Filename::string()}
+* {name, atom()}
+||=Parameter=||=Description=||=Type=||=Default value=||=Example=||
+||files||files in which the search is carried out||[Module::atom() [[BR]] |Filepath::string() [[BR]] |RegExp::string() [[BR]] |File::string()]||all files from the database||{files,["/usr/local/lib/erlang/lib/mnesia-4.5/src/mnesia_log.erl", module]}||
+||minlen||minimal length of a clone (in tokens)||integer()||10||{minlen,50}||
+||minnum||minimal number of clones in one clone group||integer()||2||{minnum,5}||
+||overlap||maximum length by which duplicates may overlap each other (in tokens)||integer()||0||{overlap,1}||
+||output||name of the file in which to save the result of the analysis||string()||-||{output,"result.txt"}||
+||name||name of the result in the table||atom()||the string "temp" concatenated with the timestamp||{name,referl}||
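A caller may want to sanity-check such a proplist before starting a long analysis. The following is a minimal sketch under that assumption: the module name `dup_opts` and the function `check/1` are hypothetical and not part of the tool, and the type checks only mirror the Type column of the table above.

{{{#!erlang
-module(dup_opts).
-export([check/1]).

%% Hypothetical helper (not part of the tool): returns the {Key, Value}
%% pairs that use an undocumented key or a value of the wrong type.
%% Elements that are not two-element tuples are skipped.
check(Opts) ->
    [Bad || {K, V} = Bad <- Opts, not valid(K, V)].

%% Strings are lists in Erlang, so string() is checked with is_list/1.
valid(files, L)   -> is_list(L);
valid(minlen, N)  -> is_integer(N);
valid(minnum, N)  -> is_integer(N);
valid(overlap, N) -> is_integer(N);
valid(output, S)  -> is_list(S);
valid(name, A)    -> is_atom(A);
valid(_, _)       -> false.
}}}

With this sketch, `dup_opts:check([{minlen, 50}, {name, filtered}])` returns an empty list, while a misspelled key or a non-integer `minlen` is returned for inspection.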
+== Interfaces ==
+=== Web interface ===
+See [wiki:WebInterface/CodeDuplicates Duplicate code analysis] on the web interface.
+=== Console interface ===
+The ri module currently provides two interface functions: search_duplicates/0
+and search_duplicates/1. The first uses the default parameter values. The
+second takes the proplist described above as its argument. Both functions
+report the progress of the analysis. The result is a list of clone groups.
+Each clone is described by the path of its file and by its start and end
+positions (line and column number).
+An example can be found at the bottom of the page.
 == Examples ==
 {{{#!erlang
-ri:search_duplicates([
-        {files, ["/home/csibe/Downloads/examples/one.erl",
-                         "^(/home)[0-9a-zA-Z/_]+(/src)$"]}]).
+ri:search_duplicates().
 }}}
 {{{#!erlang
 ri:search_duplicates([
-        {files, ["/home/csibe/Downloads/examples/one.erl",
-                         "^(/home)[0-9a-zA-Z/_]+(/src)$"]},
+        {files, ["/home/user/dups/dup1.erl",
+                 "/home/[0-9a-zA-Z/_.\-]+/src"]},
         {minlen, 50},
-        {overlap, 1}]).
+        {minnum, 3},
+        {overlap, 1},
+        {output, "result.txt"},
+        {name, filtered}]).
 }}}
+Small example of the return value:
+{{{#!erlang
+ri:search_duplicates([{files,[dup1, dup2]}]).
+Initial clone detection started.
+Initial clone detection finished.
+Trimming clones started.
+Trimming clones finished.
+Filter clones finished.
+Calculating positions started.
+Calculating positions finished.
+[[[{filepath,"/home/user/dups/dup1.erl"},
+   {startpos,{15,1}},
+   {endpos,{17,24}}],
+  [{filepath,"/home/user/dups/dup2.erl"},
+   {startpos,{5,1}},
+   {endpos,{7,24}}]],
+ [[{filepath,"/home/user/dups/dup1.erl"},
+   {startpos,{8,1}},
+   {endpos,{14,11}}],
+  [{filepath,"/home/user/dups/dup2.erl"},
+   {startpos,{12,1}},
+   {endpos,{18,11}}]]]
+}}}
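The return value shown above is a plain list of clone groups, so it can be post-processed with ordinary list and proplist functions. The following is a minimal sketch: the module name `dup_report` and the function `summarize/1` are hypothetical and not part of the tool, and the code assumes only the result shape shown in the example (proplists with `filepath`, `startpos` and `endpos` keys).

{{{#!erlang
-module(dup_report).
-export([summarize/1]).

%% Hypothetical helper (not part of the tool): numbers the clone groups
%% and reduces every clone to {File, StartLine, EndLine}.
summarize(Groups) ->
    Indexed = lists:zip(lists:seq(1, length(Groups)), Groups),
    [{I, [location(Clone) || Clone <- Group]} || {I, Group} <- Indexed].

%% Positions are {Line, Column} tuples; only the line numbers are kept.
location(Clone) ->
    File = proplists:get_value(filepath, Clone),
    {StartLine, _} = proplists:get_value(startpos, Clone),
    {EndLine, _} = proplists:get_value(endpos, Clone),
    {File, StartLine, EndLine}.
}}}

Applied to the result above, the first element of the returned list would be `{1, [{"/home/user/dups/dup1.erl", 15, 17}, {"/home/user/dups/dup2.erl", 5, 7}]}`.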