= Querying syntactic and semantic information =

A semantic query language was designed to query syntactic and semantic information about Erlang programs. The language concepts are defined according to the semantic units and relationships of the Erlang language, e.g. functions and function calls, records and their usage, etc.

The main elements of the language are the '''entities''': module, function, variable, etc. Each entity has a set of selectors and properties. A '''selector''' selects a set of entities that meet the given requirements. A '''property''' describes a characteristic of an entity. It is also possible to filter entities based on their properties. A '''filter''' is a boolean expression that selects a subset of entities. We can build filters by using properties with boolean values, valid Erlang comparisons, logical operators or embedded queries.

== Formal syntax of the language ==
{{{
semantic_query    ::= initial_selection ['.' query_sequence]
query_sequence    ::= query ['.' query_sequence]
query             ::= selection | iteration | closure | property_query
initial_selection ::= initial_selector ['[' filter ']']
selection         ::= selector ['[' filter ']']
iteration         ::= '{' query_sequence '}' int ['[' filter ']']
closure           ::= '(' query_sequence ')' int ['[' filter ']'] |
                      '(' query_sequence ')+' ['[' filter ']']
property_query    ::= property ['[' filter ']']
}}}

A semantic query is a sequence of queries starting with an initial selector and an optional filter. Queries are separated with dots. A query is one of the following:
 * selection (calculates the relationship with other entities based on selectors)
 * iteration (iterates a query n times)
 * closure (calculates the transitive closure of a query sequence)
 * property query (selects a property of an entity)

== Language elements ==
=== Entities ===
Entities correspond to the semantic units of Erlang. The result of a query written in the language is a set of entities. All elements of a result set have the same type.
We have the following entity types: file, function, variable, macro, record, record field and expression. Each entity type has its own set of selectors and properties. For details, see the [wiki:SemanticQuery#Detailsofthelanguage Details of the language] section.

=== Selectors ===
Selectors are binary relations between entities. The entities belong to one of the seven entity types. For each entity, a selector selects the set of entities that meet the given requirements.

''Example:'' You can select the functions defined in a given module. In that case the selection is a relation between modules and functions.
{{{
@mod.functions
}}}

==== Initial selectors ====
Initial selectors get the current file and position as their parameters and return a set of entities as result. The entities of the result belong to the same type, but the type cannot always be determined in advance because it depends on the parameters. Almost all of them begin with the character {{{@}}} to indicate that they depend on a position. For example, the initial selector {{{@variable}}} will look for a variable at the given position. If no variable can be found, the result will be empty.

Besides the position-based initial selectors there is another initial selector: {{{mods}}}. This selector returns all of the modules that are loaded into the semantic program graph.

=== Properties ===
Properties are functions that give the value of the property for an entity. The main purpose of properties is to filter sets of entities, but their values can be queried too. To query the value of a property, use the name of the property at the end of a semantic query.

''Example:'' To query the value of the property {{{exported}}} for the functions of the given module:
{{{
@module.functions.exported
}}}

==== Statistics ====
Statistics are also available for properties with numeric values. Applying them to the results of metric queries can give more information than a simple list of values.
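As an illustration only, the following Python sketch (not part of the query language) shows what such a summary adds to a raw list. The sample values are invented, and the choice of population variance rather than sample variance is an assumption; the source does not say which one the tool computes.

```python
import statistics

# Hypothetical line_of_code values for the functions of one module.
line_of_code = [3, 5, 8, 13, 41]

summary = {
    "minimum": min(line_of_code),
    "maximum": max(line_of_code),
    "sum": sum(line_of_code),
    "mean": statistics.mean(line_of_code),
    "median": statistics.median(line_of_code),
    # Assumption: population variance and deviation; the tool may differ.
    "variance": statistics.pvariance(line_of_code),
    "standard_deviation": statistics.pstdev(line_of_code),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean is well above the median here, which hints at a single unusually long function; the bare list of values makes that harder to spot.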
''Example:'' To query the average length of the functions of the given module:
{{{
@file.functions.line_of_code:average
}}}

=== Filters ===
A filter is a boolean expression that selects a subset of entities. After applying a filter, the result contains the elements of the original set for which this boolean expression is true.

Filters can be built from atoms, strings, integers, properties and embedded queries. The use of strings and integers is unambiguous, but the names of properties are atoms, so each atom is checked to see whether it names a property.

Atoms, strings, integers and properties can be used in comparisons. The language uses {{{/=, ==, >=, =<, <, >, =:=}}} and {{{=/=}}}. The results of comparisons are the same as in Erlang. The resulting expressions can be combined with the {{{and}}}, {{{or}}} and {{{not}}} operators, and parentheses can be used, too. The operator precedence for the filters is as follows:

|||| '''Operator precedence (decreasing)''' ||
||{{{not}}} ||unary ||
||{{{/=, ==, >=, =<, <, >, =:=, =/=}}} ||left associative ||
||{{{and}}} ||left associative ||
||{{{or}}} ||left associative ||

''Example:'' You may be interested in all the exported functions of a given module, or the functions with arity {{{0}}}, or a combination of these: the exported functions with arity {{{0}}}. Here {{{exported}}} and {{{arity}}} are both properties of functions, and they can be used to build a filter that selects the required subset of functions.
{{{
@module.functions[arity==0 and exported]
}}}

==== Embedded queries ====
Embedded queries can be used to query information about entities that cannot be expressed with properties alone. For example, we may need the functions with variables named {{{File}}}. This information cannot be expressed with the help of properties.
Without embedded queries it is only possible to query the variables named {{{File}}} first and then query the functions containing these variables, with the following query:
{{{
mods.functions.variables[name=="File"].fundef
}}}

Embedded queries make it possible to express this kind of query effectively, without having to continue the query directly. The continuation of the query is placed in the filter, where it is used like a property with a boolean value. The value is considered true if the result of the embedded query is not empty. For the previous example, the following query gives the desired result:
{{{
mods.functions[.variables[name=="File"]]
}}}

=== Iteration ===
Iteration in the language means the repeated application of a query sequence. The queries are relations, and a sequence of queries is the composition of these relations. Iteration is possible if the domain and codomain of the query sequence are the same. The application is repeated exactly {{{int}}} times. The result shown in this case contains not only the result of the iteration but also the partial results, in the form of chains.

''Example:''
{{{
@function.{calls}3
}}}

The result is the same set of entities as that of {{{@function.calls.calls.calls}}}. The result shown in the first case gives more information: it lists the call chains of maximum length 3 starting from the given function.

=== Transitive closure ===
Transitive closure in the language means the closure of a query sequence. The query sequence here is the same as in iteration: a binary relation with the same domain and codomain.

''Example:''
{{{
@function.(calls)+
}}}

The result shown after this semantic query is the list of all possible call chains starting from the given function.

== Details of the language ==
In this section, we list the names of initial selectors, selectors and properties and their possible abbreviations and synonyms.

|||| '''Initial selectors''' ||
||''Name'' ||''Synonyms'' ||
||{{{files}}} || {{{-}}} ||
||{{{@file}}} || {{{-}}} ||
||{{{modules}}} || {{{mods}}} ||
||{{{@module}}} || {{{@mod}}} ||
||{{{@function}}} || {{{@fun}}} ||
||{{{@definition}}} || {{{@def}}} ||
||{{{@expression}}} || {{{@expr}}} ||
||{{{@variable}}} || {{{@var}}} ||
||{{{@record}}} || {{{@rec}}} ||
||{{{@recfield}}} || {{{@field}}} ||
||{{{@macro}}} || {{{-}}} ||
\\

|||||||| [wiki:SemanticQuery/FileEntity File entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' ||''Synonyms'' ||''Name'' ||''Synonyms'' ||
||{{{function}}} || {{{functions, fun, funs}}} || {{{module}}} || {{{is_module, mod, is_mod}}} ||
||{{{record}}} || {{{records, rec, recs}}} || {{{header}}} || {{{is_header}}} ||
||{{{macro}}} || {{{macros}}} || {{{name}}} || {{{-}}} ||
||{{{includes}}} || {{{-}}} || {{{directory}}} || {{{dir}}} ||
||{{{included_by}}} || {{{-}}} || {{{path}}} || {{{-}}} ||
||{{{imports}}} || {{{-}}} || || ||
||{{{exports}}} || {{{-}}} || || ||
\\

|||||||| [wiki:SemanticQuery/FunctionEntity Function entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' || ''Synonyms'' || ''Name'' || ''Synonyms'' ||
||{{{references}}} || {{{refs, ref, reference}}} || {{{exported}}} || {{{-}}} ||
||{{{calls}}} || {{{-}}} || {{{name}}} || {{{-}}} ||
||{{{called_by}}} || {{{-}}} || {{{arity}}} || {{{-}}} ||
||{{{arguments}}} || {{{args}}} || {{{bif}}} || {{{-}}} ||
||{{{body}}} || {{{-}}} || {{{pure}}} || {{{-}}} ||
||{{{expressions}}} || {{{exprs, expr, expression}}} || {{{defined}}} || {{{-}}} ||
||{{{variables}}} || {{{vars, var, variable}}} || {{{module}}} || {{{mod}}} ||
||{{{file}}} || {{{-}}} || {{{dirty}}} || {{{-}}} ||
||{{{dynamic_references}}} || {{{dynref, dynrefs}}} || {{{spec}}} || {{{-}}} ||
||{{{dynamic_calls}}} || {{{dyncall, dyncalls}}} || || ||
||{{{dynamic_called_by}}} || {{{dyncalled_by}}} || || ||
\\

|||||||| [wiki:SemanticQuery/ExpressionEntity Expression entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' ||''Synonyms'' ||''Name'' ||''Synonyms'' ||
||{{{fundef}}} || {{{-}}} || {{{type}}} || {{{-}}} ||
||{{{functions}}} || {{{function, fun, funs}}} || {{{value}}} || {{{val}}} ||
||{{{variables}}} || {{{vars, var, variable}}} || {{{class}}} || {{{-}}} ||
||{{{records}}} || {{{record, rec, recs}}} || {{{last}}} || {{{is_last}}} ||
||{{{macro}}} || {{{macros}}} || {{{index}}} || {{{-}}} ||
||{{{subexpression}}} || {{{sub, esub, subexpr}}} || {{{tailcall}}} || {{{is_tailcall}}} ||
||{{{parameter}}} || {{{param}}} || {{{has_side_effect}}} || {{{dirty}}} ||
||{{{top_expression}}} || {{{top, top_expr}}} || || ||
||{{{file}}} || {{{-}}} || || ||
||{{{dynamic_functions}}} || {{{dynfun, dynfuns}}} || || ||
\\

|||||||| [wiki:SemanticQuery/VariableEntity Variable entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' || ''Synonyms'' || ''Name'' || ''Synonyms'' ||
||{{{references}}} || {{{refs, ref, reference}}} || {{{name}}} || {{{-}}} ||
||{{{bindings}}} || {{{-}}} || || ||
||{{{fundef}}} || {{{-}}} || || ||
\\

|||||||| [wiki:SemanticQuery/RecordEntity Record entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' ||''Synonyms'' ||''Name'' ||''Synonyms'' ||
||{{{references}}} || {{{refs, ref, reference}}} || {{{name}}} || {{{-}}} ||
||{{{fields}}} || {{{-}}} || || ||
||{{{file}}} || {{{-}}} || || ||
\\

|||||||| [wiki:SemanticQuery/RecordFieldEntity Record field entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' ||''Synonyms'' ||''Name'' ||''Synonyms'' ||
||{{{references}}} || {{{refs, ref, reference}}} || {{{name}}} || {{{-}}} ||
||{{{record}}} || {{{rec}}} || || ||
||{{{file}}} || {{{-}}} || || ||
\\

|||||||| [wiki:SemanticQuery/MacroEntity Macro entity] ||
|||| '''Selectors''' |||| '''Properties''' ||
||''Name'' ||''Synonyms'' ||''Name'' ||''Synonyms'' ||
||{{{references}}} || {{{refs, ref, reference}}} || {{{name}}} || {{{-}}} ||
||{{{file}}} || {{{-}}} || {{{arity}}} || {{{-}}} ||
|| || || {{{const}}} || {{{-}}} ||
\\

|||||||| '''Metrics for files and functions (as properties)''' ||
||''Name'' || ''Synonyms'' || ''File'' || ''Function'' ||
||[wiki:MetricQuery#module_sum module_sum] || {{{mod_sum}}} || ok || - ||
||[wiki:MetricQuery#function_sum function_sum] || {{{fun_sum}}} || - || ok ||
||[wiki:MetricQuery#line_of_code line_of_code] || {{{loc}}} || ok || ok ||
||[wiki:MetricQuery#char_of_code char_of_code] || {{{choc}}} || ok || ok ||
||[wiki:MetricQuery#number_of_fun number_of_fun] || {{{num_of_fun, num_of_functions, number_of_functions}}} || ok || - ||
||[wiki:MetricQuery#number_of_macros number_of_macros] || {{{num_of_macros, num_of_macr}}} || ok || - ||
||[wiki:MetricQuery#number_of_records number_of_records] || {{{num_of_records, num_of_rec}}} || ok || - ||
||[wiki:MetricQuery#included_files included_files] || {{{inc_files}}} || ok || - ||
||[wiki:MetricQuery#imported_modules imported_modules] || {{{imp_modules, imported_mod, imp_mod}}} || ok || - ||
||[wiki:MetricQuery#number_of_funpath number_of_funpath] || {{{number_of_funpaths, num_of_funpath, num_of_funpaths}}} || ok || - ||
||[wiki:MetricQuery#function_calls_in function_calls_in] || {{{fun_calls_in}}} || ok || - ||
||[wiki:MetricQuery#function_calls_out function_calls_out] || {{{fun_calls_out}}} || ok || - ||
||[wiki:MetricQuery#calls_for_function calls_for_function] || {{{calls_for_fun, call_for_function, call_for_fun}}} || - || ok ||
||[wiki:MetricQuery#calls_from_function calls_from_function] || {{{calls_from_fun, call_from_function, call_from_fun}}} || - || ok ||
||[wiki:MetricQuery#cohesion cohesion] || {{{coh}}} || ok || - ||
||[wiki:MetricQuery#otp_used otp_used] || {{{otp}}} || ok || - ||
||{{{max_application_depth}}} || {{{max_app_depth}}} || ok || ok ||
||[wiki:MetricQuery#max_depth_of_calling max_depth_of_calling] || {{{max_depth_calling, max_depth_of_call, max_depth_call}}} || ok || ok ||
||{{{min_depth_of_calling}}} || {{{min_depth_calling, min_depth_of_call, min_depth_call}}} || ok || - ||
||[wiki:MetricQuery#max_depth_of_cases max_depth_of_cases] || {{{max_depth_cases}}} || ok || ok ||
||[wiki:MetricQuery#max_depth_of_structs max_depth_of_structs] || {{{-}}} || ok || ok ||
||[wiki:MetricQuery#number_of_funclauses number_of_funclauses] || {{{num_of_funclauses, number_of_funclaus, num_of_funclaus}}} || ok || ok ||
||[wiki:MetricQuery#branches_of_recursion branches_of_recursion] || {{{branches_of_rec, branch_of_recursion, branch_of_rec}}} || ok || ok ||
||[wiki:MetricQuery#mcCabe mcCabe] || {{{mccabe}}} || ok || ok ||
||[wiki:MetricQuery#number_of_funexpr number_of_funexpr] || {{{num_of_funexpr}}} || ok || ok ||
||[wiki:MetricQuery#number_of_messpass number_of_messpass] || {{{num_of_messpass}}} || ok || ok ||
||[wiki:MetricQuery#fun_return_points fun_return_points] || {{{fun_return_point, function_return_points, function_return_point}}} || ok || ok ||
||[wiki:MetricQuery#max_length_of_line max_length_of_line] || {{{-}}} || ok || ok ||
||[wiki:MetricQuery#average_length_of_line average_length_of_line] || {{{avg_length_of_line}}} || ok || ok ||
||[wiki:MetricQuery#no_space_after_comma no_space_after_comma] || {{{-}}} || ok || ok ||
||[wiki:MetricQuery#is_tail_recursive is_tail_recursive] || {{{-}}} || - || ok ||
\\

|||| '''Statistics''' ||
|| ''Name'' || ''Synonyms'' ||
||{{{minimum}}} || {{{min}}} ||
||{{{maximum}}} || {{{max}}} ||
||{{{sum}}} || {{{-}}} ||
||{{{mean}}} || {{{average, avg}}} ||
||{{{median}}} || {{{med}}} ||
||{{{variance}}} || {{{var}}} ||
||{{{standard_deviation}}} || {{{sd}}} ||

== Examples ==
=== Basic queries ===
As described in the introduction, complex queries in this language are built from many very simple ones. Here are some examples of simple queries:
{{{
@fun.refs
}}}
Returns the expressions that refer to (call) the function at the given position.
{{{
@file.funs.calls
}}}
Returns all functions called from the current module, grouped by the module's own functions.
{{{
@file.funs[arity==3]
}}}
Returns all functions of the current module that have 3 arguments.
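
To make the idea of selectors as relations and filters as predicates concrete, here is a minimal Python sketch. It is a toy model, not how the tool evaluates queries: the `Fun` tuple, the `FUNS` and `CALLS` tables and the `select` helper are all hypothetical names invented for this illustration, and the result is flattened rather than grouped by source entity.

```python
from collections import namedtuple

# Toy entity type; the field names mirror some function properties.
Fun = namedtuple("Fun", "module name arity exported")

f_start = Fun("demo", "start", 0, True)
f_init = Fun("demo", "init", 3, False)
f_loop = Fun("demo", "loop", 3, False)

# Hypothetical semantic data for a single module named "demo".
FUNS = {"demo": [f_start, f_init, f_loop]}
CALLS = {f_start: [f_init], f_init: [f_loop], f_loop: [f_loop]}

def funs(module):
    """Selector: module -> functions defined in it."""
    return FUNS.get(module, [])

def calls(fun):
    """Selector: function -> functions it calls."""
    return CALLS.get(fun, [])

def select(entities, selector, keep=lambda entity: True):
    """Apply a selector to every entity, keeping results that pass the filter."""
    result = []
    for entity in entities:
        for target in selector(entity):
            if keep(target) and target not in result:
                result.append(target)
    return result

# Roughly @file.funs[arity==3]
arity_three = select(["demo"], funs, lambda f: f.arity == 3)

# Roughly @file.funs.calls (two selections composed with a dot)
called = select(select(["demo"], funs), calls)

print([f.name for f in arity_three])
print([f.name for f in called])
```

Each dot in a query corresponds to one more call to `select` on the previous result, which is why long queries can be read left to right as a pipeline.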
=== Advanced queries ===
Let's see some useful queries:
{{{
@file.funs.vars[name=="Expl"]
}}}
Returns the variables named "Expl" in the current module, grouped by the functions that contain them. It is useful when we want to know which functions use variables with the same name.
{{{
mods[name==io].funs[name==format].refs
}}}
Returns all {{{io:format}}} calls. This query is very useful when you have finished your software and want to find all debug messages.
{{{
@expr.origin
}}}
For example, if the cursor is on a variable when this query is run, the result shows where the variable gets its value from. This functionality uses [wiki:DataFlow data-flow analysis].
{{{
@fun.refs.origin
}}}
Returns information about where the return value of the function comes from and how it is calculated.

=== Checking coding conventions ===
In !RefactorErl, [wiki:MetricQuery metrics] can be applied to modules or to functions. Modules are equivalent to {{{file}}} entities in the semantic query language, and functions are equivalent to {{{function}}} entities. A metric can be seen as a kind of property that belongs to a {{{file}}} or {{{function}}} entity, so the proper metrics are simply added to the properties of these entities.

Usually we have some coding conventions applied to our modules or functions. With the extended semantic query language we can check these conventions and filter out improper modules or functions. Below we present some design rules and the metrics that can be used to check them.

'''Rule 1. A module should not contain more than 400 lines.'''

To filter modules containing more than 400 effective lines of code, we have to load our modules into the !RefactorErl system and enter the following query:
{{{
modules[line_of_code > 400]
}}}
The result lists the modules that are too long.

'''Rule 2. A function should not contain more than 15 to 20 lines.'''

To check which functions of the modules loaded into the !RefactorErl database do not fulfil this convention, we use the following query:
{{{
modules.funs[line_of_code > 20]
}}}

'''Rule 3. Use at most two levels of nesting, do not write deeply nested code.'''

This is achieved by dividing the code into shorter functions. One of the metrics counts the nesting level of case expressions, so we can filter functions whose maximum depth of cases is more than two. In this example, we would like to get the result only for the current module.
{{{
@file.funs[max_depth_of_cases > 2]
}}}
If we only want to know whether all of the functions fulfil this convention, we can simply query the maximum nesting level of cases in the whole module. If this value is more than two, there is at least one function containing deeply nested cases.
{{{
@file.max_depth_of_cases
}}}
Finally, let's filter the modules containing functions with too deeply nested cases.
{{{
mods[max_depth_of_cases > 2]
}}}

'''Rule 4. Use no more than 80 characters on a line.'''

We can filter all of the functions that contain lines with more than 80 characters with the following query:
{{{
mods.funs[max_length_of_line > 80]
}}}

'''Rule 5. Use space after commas.'''

There is a metric that returns the number of places where this convention is not fulfilled. When a module or a function breaks this rule, the result of the metric is greater than 0. To filter functions containing at least one place where whitespace is missing after a comma:
{{{
mods.funs[no_space_after_comma > 0]
}}}

'''Rule 6. Every recursive function should be tail recursive.'''

Tail recursion means that the function has no recursive call (either direct or indirect) except in its last expression. To filter functions that are recursive but not tail recursive:
{{{
mods.funs[is_tail_recursive == non_tail_rec]
}}}
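
Rule 6 speaks of direct and indirect recursive calls. A function is recursive in that sense when it is reachable from itself through one or more calls, which is the same idea as the `(calls)+` closure described earlier. The Python sketch below illustrates only that reachability check on an invented call graph; it does not decide whether a call is in tail position, and it is not a description of how the `is_tail_recursive` metric is computed. The names `transitive_closure`, `is_recursive` and the `CALLS` table are hypothetical.

```python
def transitive_closure(start, calls):
    """Return every function reachable from start through one or more calls."""
    reached = set()
    frontier = list(calls.get(start, ()))
    while frontier:
        fun = frontier.pop()
        if fun not in reached:
            reached.add(fun)
            frontier.extend(calls.get(fun, ()))
    return reached

def is_recursive(fun, calls):
    """A function is recursive if it can reach itself, directly or indirectly."""
    return fun in transitive_closure(fun, calls)

# Hypothetical call graph: even/1 and odd/1 call each other (indirect
# recursion), len/1 calls itself (direct recursion), main/0 calls both.
CALLS = {
    "even/1": ["odd/1"],
    "odd/1": ["even/1"],
    "len/1": ["len/1"],
    "main/0": ["even/1", "len/1"],
}

recursive = sorted(fun for fun in CALLS if is_recursive(fun, CALLS))
print(recursive)
```

Tracking the set of already reached functions keeps the traversal finite even though the call graph contains cycles.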