| 56 | |
| 57 | {{{#!comment |
| 58 | == Parameters for decomposition == |
| 59 | |
| 60 | * '''''Decomposition:''''' Specifies whether the tool should propose a decomposition |
| 61 | of the modules. Only available with module clustering. |
| 62 | * '''''Library limit:''''' The threshold for library modules. If a module is called by at least this many other modules, it is considered a library module. |
| 63 | * '''''Headers:''''' The suffix that identifies header files. It is a string that is matched against the end of each file name. |
| 64 | |
| 65 | == Parameters for clustering == |
| 66 | |
| 67 | * '''''Algorithm:''''' The clustering algorithm to use. |
| 68 | * '''''Entities:''''' The entities to cluster. Modules and functions are available for both algorithms. |
| 69 | * '''''Show Goodness:''''' Yes/No question. If enabled, the tool shows the goodness values for each of the clusterings. |
| 70 | * '''''Only best:''''' Yes/No question. If enabled, the tool shows the best clustering result only. |
| 71 | * '''''Store results:''''' Yes/No question. If enabled, the tool stores the results in the three formats described below. |
| 72 | |
| 73 | == Output formats == |
| 74 | |
| 75 | As mentioned above, the tool can store clustering results in 3 different formats: |
| 76 | * '''''Dets table:''''' It is used by the tool itself. |
| 77 | * '''''Scriptable file:''''' A format that other programs can easily process. It is a list of pairs, each containing a keyword and a result, for example `[{clusterings, [[cluster1],[cluster2],...,[clusterN]]},{goodnesses,[goodness1,...,goodnessN]},...]`. |
| 78 | * '''''Readable file:''''' A human-readable report. It shows the resulting clusterings and the decomposition proposed by the tool. |
| 79 | |
| 80 | == Important notes about the module == |
| 81 | |
| 82 | There are a few things that are important to know when using the clustering module of the tool. |
| 83 | |
| 84 | === Distance functions for agglomerative algorithm === |
| 85 | |
| 86 | Currently, two distance functions are available for the agglomerative clustering module: Weight and Call Sum. |
| 87 | The weight distance function considers two entities similar if they use the same functions. The idea is that a module's behavior depends largely on the functions it calls, so two modules that call the same functions are assumed to behave similarly. |
| 88 | The call sum distance function considers two modules similar if they call each other's functions. The idea here is that such modules are assumed to work similarly because they use each other more. |
| 89 | It is very important to choose the distance function carefully, because the two distance functions can produce very different results; the goal of the clustering must be considered before running the algorithm. The call sum distance function can be the better choice if the user wants to discover connections between modules and merge them, while the weight distance function can show which modules have the same behavior. |
| 90 | |
| 91 | === Choosing the parameters of the genetic algorithm === |
| 92 | |
| 93 | As mentioned above, the genetic algorithm starts from a randomly generated population. This makes the algorithm highly non-deterministic, so choosing the parameters is very important. |
| 94 | We do not have a proven method for choosing the right parameters, but we have some results that can make this decision easier. We examined the effect of population size and iteration count on the precision and run time of the algorithm. |
| 95 | The results are the following: |
| 96 | |
| 97 | || Iterations / Population size || 10 || 20 || 30 || 40 || 50 || |
| 98 | || 10 || {,} || {,} || {,} || {,} || {,} || |
| 99 | || 20 || {,} || {,} || {,} || {,} || {,} || |
| 100 | || 30 || {,} || {,} || {,} || {,} || {,} || |
| 101 | || 40 || {,} || {,} || {,} || {,} || {,} || |
| 102 | || 50 || {,} || {,} || {,} || {,} || {,} || |
| 103 | |
| 104 | Each cell shows the average weighted fitness of the results and the run time of the algorithm, with parameters {iterations,X}, {population_size,Y} for cell {X,Y}. The fitness of the best clustering of this database is 2.404, so this is the maximum achievable fitness. The algorithm was run 500 times with every pair of parameters on this database. Run times are given in seconds. |
| 105 | }}} |
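
The intuition behind the two distance functions described under "Distance functions for agglomerative algorithm" can be illustrated with a small Python sketch. This is not the tool's implementation: the actual Weight and Call Sum formulas are not given here, so the sketch substitutes a Jaccard-style overlap of called functions for the weight idea and a simple count of cross-module calls for the call sum idea. The names `calls`, `module_of`, `shared_callee_similarity` and `mutual_call_count`, and the sample data, are all hypothetical.

```python
# Hypothetical call data: module name -> set of "module:function" names it calls.
calls = {
    "a": {"lists:map", "io:format", "b:run"},
    "b": {"lists:map", "io:format", "a:init"},
    "c": {"lists:map", "io:format"},
}


def module_of(function_name):
    """Return the module part of a 'module:function' name."""
    return function_name.split(":", 1)[0]


def shared_callee_similarity(calls, a, b):
    """Weight-like idea (assumed formula): overlap of the called function sets.

    Returns the Jaccard index of the two sets, between 0.0 and 1.0.
    """
    called_a, called_b = calls[a], calls[b]
    union = called_a | called_b
    if not union:
        return 0.0
    return len(called_a & called_b) / len(union)


def mutual_call_count(calls, a, b):
    """Call-sum-like idea (assumed formula): how many functions of the
    other module each of the two modules calls, added together."""
    a_calls_b = sum(1 for f in calls[a] if module_of(f) == b)
    b_calls_a = sum(1 for f in calls[b] if module_of(f) == a)
    return a_calls_b + b_calls_a


for pair in [("a", "b"), ("a", "c")]:
    print(pair,
          shared_callee_similarity(calls, *pair),
          mutual_call_count(calls, *pair))
```

In this sample, modules `a` and `c` share most of their callees but never call each other, while `a` and `b` do call each other. The two measures therefore rank the pairs differently, which is why the choice of distance function should follow the goal of the clustering.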