[[PageOutline]]

= Related pages =

This page describes how semantic queries are constructed.
You may be looking for the following related pages.

* [SemanticQuery/Components Available components]: lists the names of currently available initial selectors, selectors and properties and their possible abbreviations and synonyms.
* [SemanticQuery/Examples Examples]: basic and advanced query examples, and queries for checking coding conventions.


= Querying syntactic and semantic information =

A semantic query language was designed to query syntactic and semantic information about Erlang programs. The language concepts are defined according to the semantic units and relationships of the Erlang language, e.g. functions and function calls, records and their usage, etc. 

The main elements of the language are the '''entities''': module, function, variable, etc. Each entity type has a set of selectors and properties. A '''selector''' selects a set of entities that meet the given requirements. A '''property''' describes an attribute of an entity, such as the name or arity of a function. Entities can also be filtered by their properties: a '''filter''' is a boolean expression that selects a subset of entities. Filters can be built from properties with boolean values, valid Erlang comparisons, logical operators and embedded queries. For usage examples, see SemanticQuery/Examples.

== Formal syntax of the language ==

{{{
semantic_query    ::= initial_selection ['.' query_sequence]
query_sequence    ::= query ['.' query_sequence]
query             ::= selection | iteration | closure | property_query | set_op_query
initial_selection ::= initial_selector ['[' filter ']']
selection         ::= selector ['[' filter ']']
iteration         ::= '{' query_sequence '}' int ['[' filter ']']
closure           ::= '(' query_sequence ')' int ['[' filter ']'] |
                      '(' query_sequence ')+' ['[' filter ']']
property_query    ::= property ['[' filter ']']
set_op_query      ::= '(' (query_sequence | semantic_query) set_op (query_sequence | semantic_query) ')'
set_op            ::= union | minus | intersect
}}}

A semantic query is a sequence of queries that starts with an initial selector and an optional filter. Queries are
separated by dots.
A query is one of the following:
* selection (calculates the relationship with other entities based on selectors)
* iteration (iterates a query n times)
* closure (calculates the transitive closure of a query sequence)
* property query (selects a property of an entity)
* set_op query (combines the results of two queries by the given set operator)

== Language elements ==

=== Entities ===
Entities correspond to the semantic units of Erlang. The result of a query written in the language is a set of entities, and all elements of the set belong to the same type. The following entity types are available: file, function, clause, variable, macro, record, record field and expression. Each entity type has its own set of selectors and properties. For details, see SemanticQuery/Components.

=== Selectors ===
Selectors are binary relations between entities of the previously mentioned types. For each entity in its input set, a selector selects the entities that meet the given requirements.

''Example:'' You can select the functions defined in a given module. In that case the selection is a relation between modules and functions.
{{{
@mod.functions
}}}

==== Initial selectors ====
Initial selectors take the current file and position as their parameters and return a set of entities. The entities of
the result belong to the same type, but this type cannot always be determined in advance, because it depends on the parameters. Almost all initial selectors begin with the character {{{@}}} to indicate that they depend on a position.

For example, the initial selector {{{@variable}}} looks for a variable at the given position. If no variable can be found, the result is empty. Besides the position-based initial selectors there is another initial selector, {{{mods}}}, which returns all of the modules loaded into the semantic program graph.

=== Properties ===
Properties are functions that give the value of the property for an entity. Their main purpose is filtering sets of entities, but their values can be queried too. To query the value of a property, use the name of the property at the end of a semantic query.

''Example:'' To query the value of the property {{{exported}}} for the functions of the given module:
{{{
@module.functions.exported
}}}

==== Statistics ====

For properties with numeric values, statistics are also available. Applied to the results of metric queries, they can give more
information than a simple list of values.

''Example:'' To query the average length (in lines of code) of the functions in the given file:
{{{
@file.functions.line_of_code:average
}}}
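As an illustration of what a statistic such as {{{average}}} computes, here is a minimal Erlang sketch that reduces a list of numeric property values (for example {{{line_of_code}}} values) to their mean. The module {{{sq_stats}}} and its function are hypothetical and not part of the tool, and returning {{{undefined}}} for an empty result set is an assumption.

{{{
%% Hypothetical sketch: how an average statistic could be computed
%% from the numeric property values of a set of entities.
-module(sq_stats).
-export([average/1]).

%% Returns the arithmetic mean, or undefined for an empty result set
%% (the empty-set behaviour is an assumption).
average([]) ->
    undefined;
average(Values) ->
    lists:sum(Values) / length(Values).
}}}

For example, {{{sq_stats:average([12, 4, 8])}}} returns {{{8.0}}}.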

=== Filters ===
A filter is a boolean expression that selects a subset of entities.
After a filter is applied, the result contains the elements of the original
set for which this boolean expression is true. Filters can be built from atoms,
strings, integers, properties and embedded queries. Strings and integers are
unambiguous, but the names of properties are atoms, so each atom is checked to decide whether it names a property.

Atoms, strings, integers and properties can be used in comparisons.
The language supports {{{/=, ==, >=, =<, <, >}}}, {{{=:=, =/=}}} and {{{~}}}.
Comparisons give the same results as in Erlang.
The resulting expressions can be combined with the {{{and}}}, {{{or}}} and {{{not}}} operators, and parentheses can be used, too.
The {{{~}}} operator matches a value against a regular expression, and it can be used anywhere
a binary comparison operator can be used. It accepts the same regular expressions as the {{{re}}} module.

The operator precedence for the filters is as follows:

||||   '''Operator precedence (decreasing)'''   ||
||{{{not}}} ||unary || 
||{{{/=, ==, >=, =<, <, >, =:=, =/=, ~}}} ||left associative || 
||{{{and}}} ||left associative ||
||{{{or}}} ||left associative ||

''Example:'' you may be interested in all the exported functions of a given module,
or the functions with {{{0}}} arity, or a combination of these:
the exported functions with {{{0}}} arity. In the example, {{{exported}}} and {{{arity}}} are
both properties of functions, and combining them builds a filter that selects the required subset of functions.
{{{
@module.functions[arity==0 and exported]
}}}
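To make the filter semantics concrete, the following minimal sketch models each function entity as an Erlang map of its properties and keeps the ones for which {{{arity==0 and exported}}} holds. The map representation and the module {{{sq_filter}}} are assumptions made for illustration only; they do not describe how the tool stores entities.

{{{
%% Hypothetical model of the filter [arity==0 and exported]:
%% each function entity is represented as a map of its properties.
-module(sq_filter).
-export([zero_arity_exported/1]).

zero_arity_exported(Funs) ->
    lists:filter(
      fun(#{arity := Arity, exported := Exported}) ->
              %% 'and' in the query language corresponds to a boolean
              %% conjunction of the two property tests.
              Arity == 0 andalso Exported
      end,
      Funs).
}}}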

''Example:'' you may be interested in all the modules whose names do not start with {{{test}}}. The parentheses are needed because {{{not}}} binds more tightly than the comparison operators.
{{{
mods[not (name ~ "^test.*")]
}}}
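The {{{~}}} comparison can be pictured with the following minimal sketch, which assumes that it behaves like an unanchored {{{re:run/3}}} call on the name of the entity; the module {{{sq_regex}}} is hypothetical. Note that a character class such as {{{[^test]}}} matches a single character other than t, e and s, so it cannot express "does not start with test"; an anchored pattern such as {{{^test.*}}} combined with {{{not}}} does.

{{{
%% Hypothetical sketch of a `~` comparison built on the re module.
%% Assumption: the match is unanchored, as with re:run/3.
-module(sq_regex).
-export([matches/2]).

%% Atom names (e.g. module names) are converted to strings first.
matches(Name, Regex) when is_atom(Name) ->
    matches(atom_to_list(Name), Regex);
matches(Name, Regex) ->
    re:run(Name, Regex, [{capture, none}]) =:= match.
}}}

For example, {{{sq_regex:matches(test_server, "[^test].*")}}} is true, because the underscore is not in the class, while {{{sq_regex:matches(my_test, "^test.*")}}} is false.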

==== Variables ====
It is possible to bind a property value to a variable for later use. Variable names begin with a capital letter, just like Erlang variables.

Usage of variables is only allowed in filters. Variables may only bind to properties. Once bound, a variable can be compared to other properties, variables or literals of the same type using {{{/=, ==, >, <}}} etc. operators.

''Example:'' to obtain all functions that have the same name as their containing module, use the following query:
{{{
mods[name=A].funs[name==A]
}}}
Note that the semantics of variables differ from Erlang: in the example above, the value of variable {{{A}}} changes from module to module, whereas an Erlang variable cannot be rebound once it has a value.

A more detailed description can be found at [SemanticQuery/Variables Variables].
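The per-module rebinding of {{{A}}} can be modelled with an Erlang list comprehension, in which the generator variable is bound afresh for each module. This is a hypothetical sketch: the module {{{sq_vars}}} and the representation of the input as a list of pairs of a module name and its function names are assumptions, not how queries are evaluated internally.

{{{
%% Hypothetical model of mods[name=A].funs[name==A]: for every module
%% the variable A is bound again to that module's name.
-module(sq_vars).
-export([same_name_funs/1]).

%% Mods is a list of {ModuleName, [FunctionName]} pairs.
%% Returns {Module, Function} pairs where both names are equal.
same_name_funs(Mods) ->
    [{A, F} || {A, Funs} <- Mods, F <- Funs, F =:= A].
}}}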

==== Embedded queries ====
Embedded queries can be used to query information about entities that is
otherwise unavailable, that is, information that cannot be expressed with properties.

For example, we may need the functions that have variables named {{{File}}}.
This information cannot be expressed with properties.
Without embedded queries, we can only query the variables named {{{File}}}
and then query the functions containing these variables, with the following query:
{{{
mods.functions.variables[name=="File"].function_definition
}}}
Embedded queries make such queries possible without continuing the main query
directly. The continuation of the query is written in the filter and
used like a property with a boolean value: the value is true if
the result of the embedded query is not empty. For the previous example, the following query gives the desired result.
{{{
mods.functions[.variables[name=="File"]]
}}}
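An embedded query acts as a boolean property that is true when its result is non-empty. The minimal sketch below models this in Erlang, assuming each function is given as a pair of its name and the names of its variables; the module {{{sq_embedded}}} is hypothetical.

{{{
%% Hypothetical model of functions[.variables[name=="File"]].
-module(sq_embedded).
-export([funs_with_var/2]).

%% Funs is a list of {FunctionName, [VariableName]} pairs.
%% A function is kept when the embedded query (its variables named
%% VarName) gives a non-empty result.
funs_with_var(Funs, VarName) ->
    [F || {F, Vars} <- Funs,
          [V || V <- Vars, V =:= VarName] =/= []].
}}}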

===== In, any_in, all_in filters =====
Entities can also be filtered by {{{in}}}, {{{any_in}}} and {{{all_in}}} conditions. Their operands can be
lists, embedded queries or semantic queries:
{{{
filter ::= (list | semantic_query | query_sequence) (all_in|any_in) query_sequence
filter ::= query_sequence (all_in|any_in) (list | semantic_query)
filter ::= (property | query_sequence) in (list | semantic_query)
filter ::= list in (property | query_sequence)
}}}
{{{any_in}}} evaluates its operands for every entity selected before the filter, and returns true
if any value selected by the first operand can also be found among the values of the second operand.
[[BR]][[BR]]
''any_in:''
{{{
mods.funs[.calls any_in mods[name=m1].funs]
}}}
This query selects every function that calls any function in module {{{m1}}}.
[[BR]][[BR]]
{{{all_in}}} returns true only if all values of the first operand are also selected by the second operand.
[[BR]][[BR]]
''all_in:''
{{{
mods.funs[.exprs.sub.val all_in (mods.funs--.file.funs).exprs.sub.val]
}}}
The query returns the functions whose subexpression values can all be found among the subexpression values of functions in other files.
[[BR]][[BR]]
''all_in:''
{{{
mods[.funs.calls all_in mods[name=m1].funs] 
}}}
The query yields all modules whose functions call only functions of module {{{m1}}}.
[[BR]][[BR]]
''all_in:''
{{{
mods[.funs.name all_in @mod.funs.name] 
}}}
Lists all modules whose function names can all be found among the function names of the given module.
[[BR]][[BR]]
''in:''
{{{in}}} is mainly used to check whether a property value can be found in a list or in the result of a semantic query.
For properties and lists this operator is symmetric: {{{name in |e1, e2, ...|}}} means the same as {{{|e1, e2, ...| in name}}}.
{{{
mods.funs[name in mods.funs.exprs.esub.macro_value] 
}}}
The query returns every function whose name has been defined as a macro value.
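The semantics of {{{any_in}}}, {{{all_in}}} and {{{in}}} described in this section can be sketched with the standard functions {{{lists:any/2}}}, {{{lists:all/2}}} and {{{lists:member/2}}}. The module {{{sq_in}}} is hypothetical, and operands are modelled as plain lists of values rather than query results.

{{{
%% Hypothetical model of the in, any_in and all_in filter conditions.
-module(sq_in).
-export([any_in/2, all_in/2, value_in/2]).

%% true if at least one element of Left also occurs in Right.
any_in(Left, Right) ->
    lists:any(fun(X) -> lists:member(X, Right) end, Left).

%% true if every element of Left also occurs in Right.
%% Note: lists:all/2 returns true for an empty Left; whether all_in
%% behaves the same way is not stated on this page.
all_in(Left, Right) ->
    lists:all(fun(X) -> lists:member(X, Right) end, Left).

%% "Value in List"; symmetric for properties and lists.
value_in(Value, List) ->
    lists:member(Value, List).
}}}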

=== Iteration ===

Iteration in the language means the repeated application of a query sequence.
The queries are relations, and a sequence of queries is the composition of these relations.
Iteration is possible only if the domain and codomain of the query sequence are the same.
The application is repeated exactly {{{int}}} times.

In this case, the result shows not only the final result of the
iteration but also the partial results, in the form of chains.

''Example:''
{{{
@function.{calls}3
}}}
The result is the same set of entities as that of {{{@function.calls.calls.calls}}},
but the iteration shows more information: it gives the call
chains of length at most 3 starting from the given function.
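The following minimal sketch models the iteration example above on a call graph given as an Erlang map from each function to the functions it calls. {{{chains/3}}} returns the call chains shown by the iteration, and {{{result/3}}} returns the entity set of {{{@function.calls.calls.calls}}}. The module {{{sq_iter}}}, the graph representation, and the choice to end a chain early when a function calls nothing are assumptions.

{{{
%% Hypothetical model of iteration over the calls relation.
-module(sq_iter).
-export([chains/3, result/3]).

%% Graph maps a function to the list of functions it calls.
callees(Graph, F) -> maps:get(F, Graph, []).

%% All call chains from Start that follow at most N call edges.
%% A chain ends early when its last function calls nothing.
chains(_Graph, Start, 0) ->
    [[Start]];
chains(Graph, Start, N) ->
    case callees(Graph, Start) of
        [] -> [[Start]];
        Cs -> [[Start | Rest] || C <- Cs, Rest <- chains(Graph, C, N - 1)]
    end.

%% The entity set reached by applying the calls step exactly N times.
result(Graph, Start, N) ->
    Step = fun(_, Fs) -> lists:append([callees(Graph, F) || F <- Fs]) end,
    lists:usort(lists:foldl(Step, [Start], lists:seq(1, N))).
}}}

With the graph {{{#{f => [g, h], g => [k], h => [], k => [f]} }}}, {{{chains(G, f, 3)}}} gives the chains f, g, k, f and f, h, while {{{result(G, f, 3)}}} only gives the final set containing f.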

=== Transitive closure ===

Transitive closure in the language means the closure of a query sequence.
As with iteration, the query sequence has to be a
binary relation with the same domain and codomain.

''Example:''
{{{
@function.(calls)+
}}}
The result of this semantic query is the list of all possible call
chains starting from the given function.
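Unlike iteration, closure has no step count, so cyclic call graphs (recursion) have to be handled. The minimal sketch below models the call graph as an Erlang map from each function to the functions it calls. {{{chains/2}}} ends a chain when its last function calls nothing or when it reaches a function already on the chain; this cut-off rule is an assumption, not documented behaviour. {{{reachable/2}}} computes the plain transitive closure as a sorted list. The module {{{sq_closure}}} is hypothetical.

{{{
%% Hypothetical model of the transitive closure (calls)+.
-module(sq_closure).
-export([chains/2, reachable/2]).

callees(Graph, F) -> maps:get(F, Graph, []).

%% All call chains starting from Start.
chains(Graph, Start) ->
    walk(Graph, [Start]).

%% Rev is the current chain in reverse order.
walk(Graph, [Last | _] = Rev) ->
    case callees(Graph, Last) of
        [] -> [lists:reverse(Rev)];
        Cs -> lists:append([step(Graph, C, Rev) || C <- Cs])
    end.

step(Graph, C, Rev) ->
    case lists:member(C, Rev) of
        true -> [lists:reverse([C | Rev])];   % cycle: close the chain
        false -> walk(Graph, [C | Rev])
    end.

%% Every function reachable from Start by one or more calls.
reachable(Graph, Start) ->
    reach(Graph, callees(Graph, Start), []).

reach(_Graph, [], Seen) ->
    lists:sort(Seen);
reach(Graph, [F | Rest], Seen) ->
    case lists:member(F, Seen) of
        true -> reach(Graph, Rest, Seen);
        false -> reach(Graph, callees(Graph, F) ++ Rest, [F | Seen])
    end.
}}}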

=== Set operators ===

Set operators (together with their operands) can take the place of query sequences
or initial selectors. First the operands are evaluated (each can be a semantic query,
a query sequence or a set operation itself), then the result is calculated with the given operator.
When the operands are not property queries, their results must be of the same entity type;
otherwise the operator throws a type-mismatch error. If the operands return property values
of different types, the values are converted to strings. Duplicates are removed from the result.
When the operator is nested or used as a semantic query, the enclosing parentheses can be omitted.
In the former (nested) case, the evaluation order is right to left.

''Example:''
{{{
mods[name in |m1, m2|].funs(.calls union .called_by).vars
}}}
The query above lists all variables of the functions that call, or are
called by, any function in the modules named {{{m1}}} or {{{m2}}}.

''Example:''
{{{
mods.funs(.calls minus @mod.funs)(.name union .exprs.macro_value) 
}}}
This query lists the names and macro values of the functions that are called from any module
in the database but are not defined in the given module.
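A minimal sketch of the entity-level set semantics described in this section: entities are modelled as two-element tuples of an entity type and an identifier, operands of different types raise an error, and duplicates are removed with {{{lists:usort/1}}} before the {{{ordsets}}} operations are applied. The module {{{sq_setop}}} is hypothetical, and the conversion of property values of different types to strings is not modelled.

{{{
%% Hypothetical model of union, intersect and minus on entity sets.
-module(sq_setop).
-export([eval/3]).

%% Left and Right are lists of {Type, Id} entities.
eval(Op, Left, Right) ->
    case lists:usort([T || {T, _} <- Left ++ Right]) of
        [_, _ | _] -> erlang:error(type_mismatch);
        _ -> apply_op(Op, lists:usort(Left), lists:usort(Right))
    end.

%% lists:usort/1 yields sorted, duplicate-free lists, which are
%% valid ordsets.
apply_op(union, L, R) -> ordsets:union(L, R);
apply_op(intersect, L, R) -> ordsets:intersection(L, R);
apply_op(minus, L, R) -> ordsets:subtract(L, R).
}}}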

==== Limitations: ====
 - When an operand is a closure or an iteration, its result is converted to the normal data structure, as if it had been reached using selections.
 - The operands cannot be statistics.

== Specifying grouping of results ==

By default the results of queries are grouped by the second to last selector
(unless the result is a statistic or a list of chains). This behavior can be changed
by passing a groupby option to the semantic query (if the interface in use supports the options list).
The option {groupby, n} groups the results by the n-th query element (filters are not counted).

''Example:''
{{{
ri:q("mods.funs.vars", [{groupby, 1}]).
}}}
The results of the above example are grouped by the first selector, i.e. by modules.
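The effect of {groupby, n} can be pictured with the following sketch. It assumes that each result is the path of entities visited by the selectors (for example module, function and variable for {{{mods.funs.vars}}}) and groups the last element of each path by its n-th element. The module {{{sq_group}}} and this path representation are assumptions for illustration; the default grouping corresponds to n being the position of the second to last selector.

{{{
%% Hypothetical model of the groupby option.
-module(sq_group).
-export([group_by/2]).

%% Each path is a list such as [Module, Function, Variable].
%% Returns a map from the N-th element to the last elements,
%% in the order in which they were found.
group_by(N, Paths) ->
    lists:foldl(
      fun(Path, Acc) ->
              Key = lists:nth(N, Path),
              Last = lists:last(Path),
              maps:update_with(Key, fun(Vs) -> Vs ++ [Last] end, [Last], Acc)
      end,
      #{}, Paths).
}}}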

When the groupby option is used for queries with set operators, the interpretation depends on how the set operators are used.

If the operands of the outermost set operation in the query are semantic queries themselves, the groupby option is passed to the execution of the operands. So in the following query
{{{
ri:q("(mods.funs.vars.name U mods.funs.loc)", [{groupby, 2}]).
}}}
the results are grouped by functions, the second selector of each sub-query.

When the groupings of a set operator's sub-queries do not match (different selector types, or at least one of them is not grouped), the results are grouped by files, as in the following query:
{{{
ri:q("(mods.funs.vars.name U mods.funs.clause.loc)", [{groupby, 3}]).
}}}
As far as grouping is concerned, any other set operator in a query (e.g. a set operator between series of selectors) is regarded as one unit. After the set operator, the element numbering used by the groupby option is incremented by one, as if the set operator were a single selector of the type returned by its operands.

''Example:''
{{{
ri:q("mods(.funs[name ~ \"filter\"].calls U .exports[name ~ \"drop\"]).params.vars", [{groupby, 2}]).
}}}
So the query above is grouped by functions, the type of the results of the set operator.

== Analysing "-type" and "-spec" related information ==

Erlang is dynamically typed, but its syntax provides notations for types. Although these do not affect how the compiled code runs, it is generally good practice to use them for type checking and other purposes, such as documentation generation. Our tool analyzes and stores "-type", "-opaque", "-export_type" and "-spec" attributes and the types of record fields. This information can be conveniently accessed and explored with semantic queries.

=== Returned values for new internal types in sq: ===
As can be seen in the description of [SemanticQuery/Components Available components], new internal types have been introduced for semantic queries to make the analyzed type information practically usable. When a query is not continued with other selectors that reach conventional types, it might not be obvious what the results are. For our text-based and graphical interfaces, the format of the result shown for each of these entities is described on the corresponding pages: [SemanticQuery/SpecEntity Spec], [SemanticQuery/SpecParamEntity Spec param], [SemanticQuery/TypeEntity Type].

Besides the textual representations, our scriptable interface also returns the unique entities of the stored type information, which can be used to tell results apart even when their formatted values are the same.

{{{
ris:q("mods.specs.params.type.params").
{rich,"a:prp_user/2\n    Key\n     Val\nb:prp_user/2\n    Key\n     Val\nc:prp_user/2\n    Key\n     Val\n",
      [{entity,{'$gn',typexp,151}},
       {entity,{'$gn',typexp,152}},
       {entity,{'$gn',typexp,296}},
       {entity,{'$gn',typexp,297}},
       {entity,{'$gn',typexp,439}},
       {entity,{'$gn',typexp,440}}]}
}}}
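When processing such results from a script, the entity list can be separated from the text. The minimal sketch below pattern matches a result of the shape shown above and collects the node numbers. It assumes that every entity has the form {'$gn', Kind, Id} seen in this output; that form is inferred from the printed result, not from a documented API, and the module {{{sq_rich}}} is hypothetical.

{{{
%% Hypothetical helper for {rich, Text, Entities} query results.
-module(sq_rich).
-export([entity_ids/1]).

%% Collects the node numbers of the returned entities. Results with
%% identical text lines can still belong to different nodes.
entity_ids({rich, _Text, Entities}) ->
    [Id || {entity, {'$gn', _Kind, Id}} <- Entities].
}}}

Applied to the result above, it returns six distinct numbers, even though the lines for {{{Key}}} and {{{Val}}} are repeated for each module.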

=== Detailed review of non-trivial selectors: ===

When designing the specification, we tried our best to keep things simple from a usability perspective. For example, this meant that we do not differentiate between named types (user-defined or built-in) and type expressions. In some cases this has resulted in rather complex specifications. Under normal circumstances, the different behaviors should be intuitively evident and should not cause any surprise. For the newly introduced entities, information about these selectors can be found on their respective pages: [SemanticQuery/SpecEntity Spec], [SemanticQuery/SpecParamEntity Spec param], [SemanticQuery/TypeEntity Type]. For other entities, they are listed here:

==== File.typerefs: ====
This selector returns all types referenced in the file. It does not return type expressions, but it returns all other types (including built-in types and types defined in another file) that are referenced in "-spec", "-type" or "-opaque" attributes or in record definitions.
{{{
ri:q("mods.typerefs").
a.erl
    erlang:atom/0
    a:prp_user/2
}}}

==== Function.returntype (and Spec.returntype): ====
Returns the types or type expressions that appear as return types in the function's specs. Union types are not split. Note that specs can be overloaded, so even a function with only one clause can have multiple return types. (Type expressions consisting of a single atom are filtered out.)
{{{
ri:q("mods.funs.returntypes").
a:g/2
    erlang:atom/0
}}}

=== Limitations: ===
 - For performance and readability reasons, type expressions consisting of a single atom (e.g., bool, int, 'error') are filtered from the results. This means that, without re-parsing the textual representations, the results cannot be used to perform reliable type checking or comparison of functions. (We believe that including such simple expressions in the results for an ordinarily large code base would undermine the main goal of these features, which is to provide an easy way to understand type information and its relations.)
 - As of now, "@type" and "@spec" declarations are not analyzed.
 - "-spec" attributes are not mapped to function clauses, but only to their functions.