Query API
This document was generated from 'src/documentation/print-query-wiki.ts' on 2025-02-07, 08:46:36 UTC presenting an overview of flowR's query API (v2.2.1, using R v4.4.0). Please do not edit this file/wiki page directly.
This page briefly summarizes flowR's query API, represented by the executeQueries function in ./src/queries/query.ts.
Please see the Interface wiki page for more information on how to access this API.
Note
There are many ways to query a dataflow graph created by flowR.
For example, you can use the request-query message with a running flowR server, or the :query command in the flowR REPL.
Queries are JSON arrays of query objects, each of which uses a type property to specify the query type.
In general, we separate two types of queries:
- Active Queries: Are exactly what you would expect from a query (e.g., the Call-Context Query). They fetch information from the dataflow graph.
- Virtual Queries: Are used to structure your queries (e.g., the Compound Query).
We separate these from a concept perspective.
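As a rough illustration of the format described above, the following TypeScript sketch models queries as plain objects distinguished by their type property. The interface names and the isVirtualQuery helper are hypothetical and not part of flowR's API; only the property names (type, query, commonArguments, arguments) are taken from this page.

```typescript
// Hypothetical, simplified shapes; flowR's real types carry more detail.
interface ActiveQuerySketch {
    type: string;                // e.g. "call-context", "dataflow", "static-slice"
    [argument: string]: unknown; // query-specific arguments
}

interface CompoundQuerySketch {
    type: "compound";
    query: string;                             // type of the queries being combined
    commonArguments: Record<string, unknown>;  // shared by all combined queries
    arguments: Record<string, unknown>[];      // one entry per combined query
}

type QuerySketch = ActiveQuerySketch | CompoundQuerySketch;

// In this sketch, the only virtual query is the compound query.
function isVirtualQuery(q: QuerySketch): q is CompoundQuerySketch {
    return q.type === "compound";
}

// A request is a JSON array of such objects.
const sketchQueries: QuerySketch[] = [
    { type: "call-context", callName: "^read_csv$" },
    { type: "dataflow" },
];
const requestBody: string = JSON.stringify(sketchQueries);
```

The serialized requestBody is the kind of array you would pass to the :query REPL command or embed in a request-query message.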
For now, we support the following active queries (which we will refer to simply as a query):
- Call-Context Query (call-context): Finds all calls in a set of files that match specified criteria.
- Config Query (config): Returns the current configuration of flowR.
- Dataflow Cluster Query (dataflow-cluster): Calculates and returns all the clusters present in the dataflow graph.
- Dataflow Query (dataflow): Returns the dataflow graph of the given code.
- Dependencies Query (dependencies): Returns all direct dependencies (in- and outputs) of a given R script.
- Happens-Before Query (happens-before): Checks whether one normalized AST node happens before another in the CFG.
- Id-Map Query (id-map): Returns the id-map of the normalized AST of the given code.
- Lineage Query (lineage): Returns the lineage of a criterion.
- Location Map Query (location-map): Returns a simple mapping of ids to their location in the source file.
- Normalized AST Query (normalized-ast): Returns the normalized AST of the given code.
- Resolve Value Query (resolve-value): Provides access to flowR's value tracking (which is configurable).
- Search Query (search): Provides access to flowR's search API.
- Static Slice Query (static-slice): Slices the dataflow graph, reducing the code to just the parts relevant for the given criteria.
Similarly, we support the following virtual queries:
- Compound Query (compound): Combines multiple queries of the same type into one, specifying common arguments.
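To make the idea of "common arguments" concrete, here is a minimal TypeScript sketch of how a compound query could be unfolded into individual queries. The expandCompound helper and the merge order (specific arguments override common ones) are assumptions made for illustration and are not taken from flowR's implementation.

```typescript
// Hypothetical shape of a compound query, following the property names on this page.
interface CompoundSketch {
    type: "compound";
    query: string;
    commonArguments: Record<string, unknown>;
    arguments: Record<string, unknown>[];
}

// Assumption: every entry in `arguments` becomes one query of type `query`,
// with the common arguments filled in first and specific arguments on top.
function expandCompound(c: CompoundSketch): Record<string, unknown>[] {
    return c.arguments.map(specific => ({ type: c.query, ...c.commonArguments, ...specific }));
}

const compoundExample: CompoundSketch = {
    type: "compound",
    query: "call-context",
    commonArguments: { kind: "visualize", subkind: "text" },
    arguments: [{ callName: "^mean$" }, { callName: "^print$" }],
};

const expandedQueries = expandCompound(compoundExample);
```

Under these assumptions, expandedQueries contains two call-context queries that share kind and subkind and differ only in callName.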
Detailed Query Format (Automatically Generated)
Although it is probably better to consult the detailed explanations below, if you want to have a look at the schema, here is its description:
- . array
  Queries to run on the file analysis information (in the form of an array)
  Valid item types:
  - . alternatives
    Any query
    - . alternatives
      Supported queries
      - . object
        Call context query used to find calls in the dataflow graph
        - type string [required] The type of the query. Allows only the values: 'call-context'
        - callName string [required] Regex regarding the function name!
        - callNameExact boolean [optional] Should we automatically add the ^ and $ anchors to the regex to make it an exact match?
        - kind string [optional] The kind of the call, this can be used to group calls together (e.g., linking plot to visualize). Defaults to '.'.
        - subkind string [optional] The subkind of the call, this can be used to uniquely identify the respective call type when grouping the output (e.g., the normalized name, linking ggplot to plot). Defaults to '.'.
        - callTargets string [optional] Call targets the function may have. This defaults to any. Request this specifically to gain all call targets we can resolve. Allows only the values: 'global', 'must-include-global', 'local', 'must-include-local', 'any'
        - includeAliases boolean [optional] Consider a case like f <- function_of_interest, do you want uses of f to be included in the results?
        - fileFilter object [optional] Filter that, when set, a node's file attribute must match to be considered
          - fileFilter string [required] Regex that a node's file attribute must match to be considered
          - includeUndefinedFiles boolean [optional] If fileFilter is set, but a node's file attribute is undefined, should we include it in the results? Defaults to true.
        - linkTo object [optional] Links the current call to the last call of the given kind. This way, you can link a call like points to the latest graphics plot etc.
          - type string [required] The type of the linkTo sub-query. Allows only the values: 'link-to-last-call'
          - callName string [required] Regex regarding the function name of the last call. Similar to callName, strings are interpreted as a regular expression.
          - ignoreIf function [optional] Should we ignore this (source) call? Currently, there is no well working serialization for this.
          - cascadeIf function [optional] Should we continue searching after the link was created? Currently, there is no well working serialization for this.
      - . object
        The config query retrieves the current configuration of the flowR instance.
        - type string [required] The type of the query. Allows only the values: 'config'
      - . object
        The dataflow query simply returns the dataflow graph, there is no need to pass it multiple times!
        - type string [required] The type of the query. Allows only the values: 'dataflow'
      - . object
        The id map query retrieves the id map from the normalized AST.
        - type string [required] The type of the query. Allows only the values: 'id-map'
      - . object
        The normalized AST query simply returns the normalized AST, there is no need to pass it multiple times!
        - type string [required] The type of the query. Allows only the values: 'normalized-ast'
      - . object
        The cluster query calculates and returns all clusters in the dataflow graph.
        - type string [required] The type of the query. Allows only the values: 'dataflow-cluster'
      - . object
        Slice query used to slice the dataflow graph
        - type string [required] The type of the query. Allows only the values: 'static-slice'
        - criteria array [required] The slicing criteria to use.
          Valid item types:
          - . string
        - noReconstruction boolean [optional] Do not reconstruct the slice into readable code.
        - noMagicComments boolean [optional] Should the magic comments (force-including lines within the slice) be ignored?
      - . object
        Lineage query used to find the lineage of a node in the dataflow graph
        - type string [required] The type of the query. Allows only the values: 'lineage'
        - criterion string [required] The slicing criterion of the node to get the lineage of.
      - . object
        The dependencies query retrieves and returns the set of all dependencies in the dataflow graph, which includes libraries, sourced files, read data, and written data.
        - type string [required] The type of the query. Allows only the values: 'dependencies'
        - ignoreDefaultFunctions boolean [optional] Should the set of functions that are detected by default be ignored/skipped?
        - libraryFunctions array [optional] The set of library functions to search for.
          Valid item types:
          - . object
            - name string [required] The name of the library function.
            - argIdx number [optional] The index of the argument that contains the library name.
            - argName string [optional] The name of the argument that contains the library name.
          - . object
        - sourceFunctions array [optional] The set of source functions to search for.
          Valid item types:
          - . object
            - name string [required] The name of the library function.
            - argIdx number [optional] The index of the argument that contains the library name.
            - argName string [optional] The name of the argument that contains the library name.
          - . object
        - readFunctions array [optional] The set of data reading functions to search for.
          Valid item types:
          - . object
            - name string [required] The name of the library function.
            - argIdx number [optional] The index of the argument that contains the library name.
            - argName string [optional] The name of the argument that contains the library name.
          - . object
        - writeFunctions array [optional] The set of data writing functions to search for.
          Valid item types:
          - . object
            - name string [required] The name of the library function.
            - argIdx number [optional] The index of the argument that contains the library name.
            - argName string [optional] The name of the argument that contains the library name.
          - . object
      - . object
        The location map query retrieves the location of every id in the ast.
        - type string [required] The type of the query. Allows only the values: 'location-map'
      - . object
        The search query searches the normalized AST and dataflow graph for nodes that match the given search query.
        - type string [required] The type of the query. Allows only the values: 'search'
        - search object [required] The search query to execute.
      - . object
        Happens-Before tracks whether a always happens before b.
        - type string [required] The type of the query. Allows only the values: 'happens-before'
        - a string [required] The first slicing criterion.
        - b string [required] The second slicing criterion.
      - . object
        The resolve value query used to get definitions of an identifier
        - type string [required] The type of the query. Allows only the values: 'resolve-value'
        - criteria array [required] The slicing criteria to use.
          Valid item types:
          - . string
      - . object
        Call context query used to find calls in the dataflow graph
    - . alternatives
      Virtual queries (used for structure)
      - . object
        Compound query used to combine queries of the same type
        - type string [required] The type of the query. Allows only the values: 'compound'
        - query string [required] The query to run on the file analysis information.
        - commonArguments object [required] Common arguments for all queries.
        - arguments array [required] Arguments for each query.
          Valid item types:
          - . object
      - . object
        Compound query used to combine queries of the same type
    - . alternatives
      Supported queries
  - . alternatives
    Any query
First, consider that you have a file like the following (of course, this is just a simple and artificial example):
library(ggplot)
library(dplyr)
library(readr)

# read data with read_csv
data <- read_csv('data.csv')
data2 <- read_csv('data2.csv')

m <- mean(data$x)
print(m)

data %>%
        ggplot(aes(x = x, y = y)) +
        geom_point()

plot(data2$x, data2$y)
points(data2$x, data2$y)

print(mean(data2$k))
Dataflow Graph of the Example
flowchart LR
1{{"`#91;RSymbol#93; ggplot
(1)
*1.9-14*`"}}
3[["`#91;RFunctionCall#93; library
(3)
*1.1-15*
(1)`"]]
style 3 stroke:red,stroke-width:5px;
5{{"`#91;RSymbol#93; dplyr
(5)
*2.9-13*`"}}
7[["`#91;RFunctionCall#93; library
(7)
*2.1-14*
(5)`"]]
style 7 stroke:red,stroke-width:5px;
9{{"`#91;RSymbol#93; readr
(9)
*3.9-13*`"}}
11[["`#91;RFunctionCall#93; library
(11)
*3.1-14*
(9)`"]]
style 11 stroke:red,stroke-width:5px;
14{{"`#91;RString#93; #39;data.csv#39;
(14)
*6.18-27*`"}}
16[["`#91;RFunctionCall#93; read#95;csv
(16)
*6.9-28*
(14)`"]]
12["`#91;RSymbol#93; data
(12)
*6.1-4*`"]
17[["`#91;RBinaryOp#93; #60;#45;
(17)
*6.1-28*
(12, 16)`"]]
20{{"`#91;RString#93; #39;data2.csv#39;
(20)
*7.19-29*`"}}
%% Environment of 22 [level: 0]:
%% Built-in
%% 24----------------------------------------
%% data: {**data** (id: 12, type: Unknown, def. @17)}
22[["`#91;RFunctionCall#93; read#95;csv
(22)
*7.10-30*
(20)`"]]
18["`#91;RSymbol#93; data2
(18)
*7.1-5*`"]
23[["`#91;RBinaryOp#93; #60;#45;
(23)
*7.1-30*
(18, 22)`"]]
26(["`#91;RSymbol#93; data
(26)
*9.11-14*`"])
27{{"`#91;RSymbol#93; x
(27)
*9.11-16*`"}}
29[["`#91;RAccess#93; $
(29)
*9.11-16*
(26, 27)`"]]
31[["`#91;RFunctionCall#93; mean
(31)
*9.6-17*
(29)`"]]
24["`#91;RSymbol#93; m
(24)
*9.1*`"]
32[["`#91;RBinaryOp#93; #60;#45;
(32)
*9.1-17*
(24, 31)`"]]
34(["`#91;RSymbol#93; m
(34)
*10.7*`"])
36[["`#91;RFunctionCall#93; print
(36)
*10.1-8*
(34)`"]]
38(["`#91;RSymbol#93; data
(38)
*12.1-4*`"])
43(["`#91;RSymbol#93; x
(43)
*13.24*`"])
44(["`#91;RArgument#93; x
(44)
*13.20*`"])
46(["`#91;RSymbol#93; y
(46)
*13.31*`"])
47(["`#91;RArgument#93; y
(47)
*13.27*`"])
%% Environment of 48 [level: 0]:
%% Built-in
%% 56----------------------------------------
%% data: {**data** (id: 12, type: Unknown, def. @17)}
%% data2: {**data2** (id: 18, type: Unknown, def. @23)}
%% m: {**m** (id: 24, type: Unknown, def. @32)}
48[["`#91;RFunctionCall#93; aes
(48)
*13.16-32*
(x (44), y (47))`"]]
%% Environment of 50 [level: 0]:
%% Built-in
%% 59----------------------------------------
%% data: {**data** (id: 12, type: Unknown, def. @17)}
%% data2: {**data2** (id: 18, type: Unknown, def. @23)}
%% m: {**m** (id: 24, type: Unknown, def. @32)}
50[["`#91;RFunctionCall#93; ggplot
(50)
*13.9-33*
(38, 48)`"]]
52[["`#91;RFunctionCall#93; data %#62;%
ggplot(aes(x = x, y = y))
(52)
*12.6-8*
(38, 50)`"]]
%% Environment of 54 [level: 0]:
%% Built-in
%% 65----------------------------------------
%% data: {**data** (id: 12, type: Unknown, def. @17)}
%% data2: {**data2** (id: 18, type: Unknown, def. @23)}
%% m: {**m** (id: 24, type: Unknown, def. @32)}
54[["`#91;RFunctionCall#93; geom#95;point
(54)
*14.9-20*`"]]
55[["`#91;RBinaryOp#93; #43;
(55)
*12.1-14.20*
(52, 54)`"]]
57(["`#91;RSymbol#93; data2
(57)
*16.6-10*`"])
58{{"`#91;RSymbol#93; x
(58)
*16.6-12*`"}}
60[["`#91;RAccess#93; $
(60)
*16.6-12*
(57, 58)`"]]
62(["`#91;RSymbol#93; data2
(62)
*16.15-19*`"])
63{{"`#91;RSymbol#93; y
(63)
*16.15-21*`"}}
65[["`#91;RAccess#93; $
(65)
*16.15-21*
(62, 63)`"]]
67[["`#91;RFunctionCall#93; plot
(67)
*16.1-22*
(60, 65)`"]]
69(["`#91;RSymbol#93; data2
(69)
*17.8-12*`"])
70{{"`#91;RSymbol#93; x
(70)
*17.8-14*`"}}
72[["`#91;RAccess#93; $
(72)
*17.8-14*
(69, 70)`"]]
74(["`#91;RSymbol#93; data2
(74)
*17.17-21*`"])
75{{"`#91;RSymbol#93; y
(75)
*17.17-23*`"}}
77[["`#91;RAccess#93; $
(77)
*17.17-23*
(74, 75)`"]]
79[["`#91;RFunctionCall#93; points
(79)
*17.1-24*
(72, 77)`"]]
82(["`#91;RSymbol#93; data2
(82)
*19.12-16*`"])
83{{"`#91;RSymbol#93; k
(83)
*19.12-18*`"}}
85[["`#91;RAccess#93; $
(85)
*19.12-18*
(82, 83)`"]]
87[["`#91;RFunctionCall#93; mean
(87)
*19.7-19*
(85)`"]]
89[["`#91;RFunctionCall#93; print
(89)
*19.1-20*
(87)`"]]
3 -->|"argument"| 1
7 -->|"argument"| 5
11 -->|"argument"| 9
16 -->|"argument"| 14
12 -->|"defined-by"| 16
12 -->|"defined-by"| 17
17 -->|"argument"| 16
17 -->|"returns, argument"| 12
22 -->|"argument"| 20
18 -->|"defined-by"| 22
18 -->|"defined-by"| 23
23 -->|"argument"| 22
23 -->|"returns, argument"| 18
26 -->|"reads"| 12
29 -->|"reads, returns, argument"| 26
29 -->|"reads, argument"| 27
31 -->|"reads, argument"| 29
24 -->|"defined-by"| 31
24 -->|"defined-by"| 32
32 -->|"argument"| 31
32 -->|"returns, argument"| 24
34 -->|"reads"| 24
36 -->|"reads, returns, argument"| 34
38 -->|"reads"| 12
44 -->|"reads"| 43
47 -->|"reads"| 46
48 -->|"reads"| 43
48 -->|"argument"| 44
48 -->|"reads"| 46
48 -->|"argument"| 47
50 -->|"reads, argument"| 48
50 -->|"argument"| 38
52 -->|"argument"| 38
52 -->|"argument"| 50
55 -->|"reads, argument"| 52
55 -->|"reads, argument"| 54
57 -->|"reads"| 18
60 -->|"reads, returns, argument"| 57
60 -->|"reads, argument"| 58
62 -->|"reads"| 18
65 -->|"reads, returns, argument"| 62
65 -->|"reads, argument"| 63
67 -->|"reads, argument"| 60
67 -->|"reads, argument"| 65
69 -->|"reads"| 18
72 -->|"reads, returns, argument"| 69
72 -->|"reads, argument"| 70
74 -->|"reads"| 18
77 -->|"reads, returns, argument"| 74
77 -->|"reads, argument"| 75
79 -->|"reads, argument"| 72
79 -->|"reads, argument"| 77
79 -->|"reads"| 67
82 -->|"reads"| 18
85 -->|"reads, returns, argument"| 82
85 -->|"reads, argument"| 83
87 -->|"reads, argument"| 85
89 -->|"reads, returns, argument"| 87
(The analysis required 24.85 ms (including parse and normalize, using the r-shell engine) within the generation environment.)
Additionally, consider that you are interested in all function calls which load data with read_csv.
A simple regex-based query could look like this: ^read_csv$.
However, this fails to incorporate
- Syntax-based information (comments, strings, used as a variable, called as a higher-order function, ...)
- Semantic information (e.g., read_csv is overwritten by a function with the same name)
- Context information (e.g., calls like points may link to the current plot)
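The first limitation is easy to reproduce. The TypeScript sketch below runs a purely textual search over the first lines of the example script; the naiveMatches helper is hypothetical and only stands in for "a simple regex-based query", it is not how flowR works.

```typescript
// First seven lines of the example script from above (line 4 is blank).
const exampleScript: string[] = [
    "library(ggplot)",
    "library(dplyr)",
    "library(readr)",
    "",
    "# read data with read_csv",
    "data <- read_csv('data.csv')",
    "data2 <- read_csv('data2.csv')",
];

// Returns the 1-based numbers of all lines on which the pattern occurs.
function naiveMatches(lines: string[], pattern: RegExp): number[] {
    const hits: number[] = [];
    lines.forEach((line, index) => {
        if (pattern.test(line)) {
            hits.push(index + 1);
        }
    });
    return hits;
}

// Yields [5, 6, 7]: the comment on line 5 is reported next to the two real calls.
const naiveHits = naiveMatches(exampleScript, /\bread_csv\b/);
```

A text search has no notion of comments, strings, or redefinitions, which is exactly the information the dataflow graph provides.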
To solve this, flowR provides a query API which allows you to specify queries on the dataflow graph.
For the specific use-case stated, you could use the Call-Context Query to find all calls to read_csv which refer to functions that are not overwritten.
Just as an example, the following Call-Context Query finds all calls to read_csv that are not overwritten:
[
{
"type": "call-context",
"callName": "^read_csv$",
"callTargets": "global",
"kind": "input",
"subkind": "csv-file"
}
]
Results (prettified and summarized):
Query: call-context (1 ms)
╰ input
╰ csv-file: read_csv (L.6), read_csv (L.7)
All queries together required ≈1 ms (1ms accuracy, total 10 ms)
Show Detailed Results as Json
The analysis required 10.20 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"call-context": {
".meta": {
"timing": 1
},
"kinds": {
"input": {
"subkinds": {
"csv-file": [
{
"id": 16,
"name": "read_csv",
"calls": []
},
{
"id": 22,
"name": "read_csv",
"calls": []
}
]
}
}
}
},
".meta": {
"timing": 1
}
}
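If you want to post-process such a result, the nested kinds/subkinds structure can be flattened with a few lines of code. The following TypeScript sketch assumes only the fields visible in the JSON above (kinds, subkinds, id, name); the interface and the collectCallIds helper are hypothetical.

```typescript
// Hypothetical, reduced view of the "call-context" part of the result.
interface CallContextResultSketch {
    kinds: Record<string, {
        subkinds: Record<string, { id: number | string; name: string }[]>;
    }>;
}

// Collects the node ids of all found calls, keyed by "kind/subkind".
function collectCallIds(res: CallContextResultSketch): Record<string, (number | string)[]> {
    const byGroup: Record<string, (number | string)[]> = {};
    for (const kind of Object.keys(res.kinds)) {
        const subkinds = res.kinds[kind].subkinds;
        for (const subkind of Object.keys(subkinds)) {
            byGroup[kind + "/" + subkind] = subkinds[subkind].map(call => call.id);
        }
    }
    return byGroup;
}

// The relevant part of the result shown above.
const sampleResult: CallContextResultSketch = {
    kinds: {
        input: {
            subkinds: {
                "csv-file": [
                    { id: 16, name: "read_csv" },
                    { id: 22, name: "read_csv" },
                ],
            },
        },
    },
};

const idsByGroup = collectCallIds(sampleResult);
```

For the result above, idsByGroup maps "input/csv-file" to the ids 16 and 22, which are the two read_csv calls in the dataflow graph.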
Call context queries can be used to identify calls to specific functions that match criteria of your interest. For now, we support two criteria:
- Function Name (callName): The function name is specified by a regular expression. This allows you to find all calls to functions that match a specific pattern. Please note that if you do not use regex anchors, the query will match any function name that contains the given pattern (you can set the callNameExact property to true to automatically add the ^...$ anchors).
- Call Targets (callTargets): This specifies what the function call targets. For example, you may want to find all calls to a function that is not defined locally.
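The effect of missing anchors can be seen in a small TypeScript sketch. The toCallNameRegex helper is hypothetical; it merely mimics what callNameExact is described to do, and the way flowR adds the anchors internally may differ.

```typescript
// Hypothetical helper: optionally wraps the pattern in ^...$ anchors.
function toCallNameRegex(callName: string, exact: boolean): RegExp {
    return new RegExp(exact ? "^(?:" + callName + ")$" : callName);
}

const candidateNames: string[] = ["read_csv", "read_tsv", "spread_data", "read"];

// Without anchors, every name that contains "read_" matches, including "spread_data".
const looseMatches = candidateNames.filter(n => toCallNameRegex("read_", false).test(n));

// With anchors, only the exact name matches.
const exactMatches = candidateNames.filter(n => toCallNameRegex("read_csv", true).test(n));
```

Here looseMatches contains read_csv, read_tsv, and spread_data, while exactMatches contains only read_csv.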
Besides this, we provide the following ways to automatically categorize and link identified invocations:
- Kind (kind): This is a general category that can be used to group calls together. For example, you may want to link all calls to plot to visualize.
- Subkind (subkind): This is used to uniquely identify the respective call type when grouping the output. For example, you may want to link all calls to ggplot to plot.
- Linked Calls (linkTo): This links the current call to the last call of the given kind. This way, you can link a call like points to the latest graphics plot etc. For now, we only offer support for linking to the last call, as the current flow dependency over-approximation is not stable.
- Aliases (includeAliases): Consider a case like f <- function_of_interest, do you want calls to f to be included in the results? There is probably no need to combine this with a global call target!
It's also possible to filter the results based on the following properties:
- File (fileFilter): This allows you to filter the results based on the file in which the call is located. This can be useful if you are only interested in calls in, e.g., specific folders. The fileFilter property is an object made up of two properties:
  - Filter (filter): A regular expression that a node's file attribute must match to be considered.
  - Include Undefined Files (includeUndefinedFiles): If fileFilter is set, but a node's file attribute is not present, should we include it in the results? Defaults to true.
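The filter semantics described above can be written down as a small predicate. This TypeScript sketch is an assumption-based reading of the description, not flowR's implementation; the FileFilterSketch interface and the passesFileFilter helper are hypothetical, and the property name filter follows the list above.

```typescript
// Hypothetical shape of the file filter, following the description above.
interface FileFilterSketch {
    filter: RegExp;
    includeUndefinedFiles?: boolean; // treated as true when omitted
}

// Decides whether a node with the given file attribute is kept.
function passesFileFilter(file: string | undefined, f: FileFilterSketch | undefined): boolean {
    if (f === undefined) {
        return true; // no filter set: keep everything
    }
    if (file === undefined) {
        return f.includeUndefinedFiles !== false; // default is to keep such nodes
    }
    return f.filter.test(file);
}

const onlySrc: FileFilterSketch = { filter: /^src\// };
const keptInSrc = passesFileFilter("src/analysis.R", onlySrc);       // true
const keptInTest = passesFileFilter("test/analysis.R", onlySrc);     // false
const keptWithoutFile = passesFileFilter(undefined, onlySrc);        // true (default)
```

Setting includeUndefinedFiles to false flips the last case, so nodes without a file attribute are dropped as well.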
Re-using the example code from above, the following query attaches all calls to mean to the kind visualize and the subkind text, all calls that start with read_ to the kind input and the subkind csv-file (but only if they are not locally overwritten), and links all calls to points to the last call to plot:
[
{
"type": "call-context",
"callName": "^mean$",
"kind": "visualize",
"subkind": "text"
},
{
"type": "call-context",
"callName": "^read_",
"kind": "input",
"subkind": "csv-file",
"callTargets": "global"
},
{
"type": "call-context",
"callName": "^points$",
"kind": "visualize",
"subkind": "plot",
"linkTo": {
"type": "link-to-last-call",
"callName": "^plot$"
}
}
]
Results (prettified and summarized):
Query: call-context (1 ms)
╰ input
╰ csv-file: read_csv (L.6), read_csv (L.7)
╰ visualize
╰ text: mean (L.9), mean (L.19)
╰ plot: points (L.17) with 1 link (plot (L.16))
All queries together required ≈1 ms (1ms accuracy, total 13 ms)
Show Detailed Results as Json
The analysis required 13.29 ms (including parsing and normalization and the query) within the generation environment.
{
"call-context": {
".meta": {
"timing": 1
},
"kinds": {
"input": {
"subkinds": {
"csv-file": [
{
"id": 16,
"name": "read_csv",
"calls": []
},
{
"id": 22,
"name": "read_csv",
"calls": []
}
]
}
},
"visualize": {
"subkinds": {
"text": [
{
"id": 31,
"name": "mean"
},
{
"id": 87,
"name": "mean"
}
],
"plot": [
{
"id": 79,
"name": "points",
"linkedIds": [
67
]
}
]
}
}
}
},
".meta": {
"timing": 1
}
}
As you can see, all kinds and subkinds with the same name are grouped together. Yet, re-stating common arguments and kinds may be cumbersome (although you can already use clever regex patterns). See the Compound Query for a way to structure your queries more compactly if you think it gets too verbose.
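To illustrate the linkTo idea from the example above, the following TypeScript sketch links each matching call to the most recent preceding call of another name. It walks the calls in plain source order, which is a simplifying assumption; flowR determines the "last call" on its own analysis results, and the CallSketch type and linkToLastCall helper are hypothetical.

```typescript
interface CallSketch {
    id: number;
    name: string;
}

// For every call matching `source`, records the id of the latest earlier call matching `target`.
function linkToLastCall(callsInOrder: CallSketch[], source: RegExp, target: RegExp): Record<number, number[]> {
    const links: Record<number, number[]> = {};
    let lastTarget: number | undefined = undefined;
    for (const call of callsInOrder) {
        if (source.test(call.name)) {
            links[call.id] = lastTarget === undefined ? [] : [lastTarget];
        }
        if (target.test(call.name)) {
            lastTarget = call.id;
        }
    }
    return links;
}

// The last three top-level calls of the example script, with the ids from the results above.
const orderedCalls: CallSketch[] = [
    { id: 67, name: "plot" },
    { id: 79, name: "points" },
    { id: 89, name: "print" },
];

const linkedCalls = linkToLastCall(orderedCalls, /^points$/, /^plot$/);
```

As in the result above, the points call (79) ends up linked to the plot call (67).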
Alias Example
Consider the following code:
foo <- my_test_function
foo()
if(u) bar <- foo
bar()
my_test_function()
Now let's say we want to query all uses of my_test_function:
[
{
"type": "call-context",
"callName": "^my_test_function",
"includeAliases": true
}
]
Results (prettified and summarized):
Query: call-context (1 ms)
╰ .
╰ .: foo (L.2) with 1 alias root (my_test_function (L.1)), bar (L.4) with 1 alias root (my_test_function (L.1)), my_test_function (L.5)
All queries together required ≈1 ms (1ms accuracy, total 5 ms)
Show Detailed Results as Json
The analysis required 5.23 ms (including parsing and normalization and the query) within the generation environment.
{
"call-context": {
".meta": {
"timing": 1
},
"kinds": {
".": {
"subkinds": {
".": [
{
"id": 4,
"name": "foo",
"aliasRoots": [
1
]
},
{
"id": 12,
"name": "bar",
"aliasRoots": [
1
]
},
{
"id": 14,
"name": "my_test_function"
}
]
}
}
}
},
".meta": {
"timing": 1
}
}
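The notion of an alias root can be sketched as following assignments back to the name they originate from. The TypeScript snippet below works on a hand-written alias table and is purely illustrative; flowR derives this information from the dataflow graph and reports node ids rather than names, and the aliasRoots helper is hypothetical.

```typescript
// Follows alias assignments transitively; a name without known sources is its own root.
function aliasRoots(name: string, assignedFrom: Record<string, string[]>, seen: string[] = []): string[] {
    if (seen.indexOf(name) >= 0) {
        return []; // guard against cyclic alias chains
    }
    const sources = assignedFrom[name];
    if (sources === undefined || sources.length === 0) {
        return [name];
    }
    const roots: string[] = [];
    for (const source of sources) {
        for (const root of aliasRoots(source, assignedFrom, seen.concat([name]))) {
            if (roots.indexOf(root) < 0) {
                roots.push(root);
            }
        }
    }
    return roots;
}

// Hand-written table for the code above: foo <- my_test_function; bar <- foo
const aliasTable: Record<string, string[]> = {
    foo: ["my_test_function"],
    bar: ["foo"],
};

const rootsOfBar = aliasRoots("bar", aliasTable);
```

Both foo and bar resolve to my_test_function, which is why their calls show up in the result once includeAliases is set.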
Implementation Details
The Call-Context Query is executed by executeCallContextQueries in ./src/queries/catalog/call-context-query/call-context-query-executor.ts.
This query provides access to the current configuration of the flowR instance. See the Interface wiki page for more information on what the configuration represents.
Implementation Details
The Config Query is executed by executeConfigQuery in ./src/queries/catalog/config-query/config-query-format.ts.
This query automatically calculates clusters in flowR's dataflow graph and returns a list of all clusters found.
Clusters are to be interpreted as literal clusters on the graph, traversing edges in both directions. From this perspective, the code x <- 1; x has one cluster (given that all code is related), while the code x <- 1; y has two clusters (given that y has no relation to the previous definition).
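Read this way, a cluster is a connected component of the graph when edge direction is ignored. The following TypeScript sketch computes such components for a small graph; the findClusters helper and the edge lists are illustrative assumptions and do not reproduce flowR's implementation or the exact edges of the examples.

```typescript
// Computes connected components, treating every edge as undirected.
// Assumes that all edge endpoints are contained in `vertices`.
function findClusters(vertices: number[], edges: [number, number][]): number[][] {
    const neighbours: Record<number, number[]> = {};
    for (const v of vertices) {
        neighbours[v] = [];
    }
    for (const [from, to] of edges) {
        neighbours[from].push(to);
        neighbours[to].push(from); // direction is ignored
    }
    const visited: Record<number, boolean> = {};
    const clusters: number[][] = [];
    for (const start of vertices) {
        if (visited[start]) {
            continue;
        }
        const members: number[] = [];
        const stack: number[] = [start];
        visited[start] = true;
        while (stack.length > 0) {
            const current = stack.pop() as number;
            members.push(current);
            for (const next of neighbours[current]) {
                if (!visited[next]) {
                    visited[next] = true;
                    stack.push(next);
                }
            }
        }
        clusters.push(members);
    }
    return clusters;
}

// Illustrative edges only: in the first graph vertex 3 is connected, in the second it is isolated.
const connected = findClusters([0, 1, 2, 3], [[2, 0], [2, 1], [3, 0]]);
const separated = findClusters([0, 1, 2, 3], [[2, 0], [2, 1]]);
```

The first call yields a single cluster containing all four vertices, the second yields two clusters, mirroring the difference between x <- 1; x and x <- 1; y.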
Example x <- 1; x
[
{
"type": "dataflow-cluster"
}
]
Results (prettified and summarized):
Query: dataflow-cluster (1ms)
╰ Found 1 cluster
╰ {3, 0, 1, 2} (marked)
All queries together required ≈1 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 2.37 ms (including parsing and normalization and the query) within the generation environment.
{
"dataflow-cluster": {
".meta": {
"timing": 1
},
"clusters": [
{
"startNode": 3,
"members": [
3,
0,
1,
2
],
"hasUnknownSideEffects": false
}
]
},
".meta": {
"timing": 1
}
}
Example x <- 1; y
[
{
"type": "dataflow-cluster"
}
]
Results (prettified and summarized):
Query: dataflow-cluster (0ms)
╰ Found 2 clusters
╰ {3} (marked)
╰ {2, 1, 0} (marked)
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 1.62 ms (including parsing and normalization and the query) within the generation environment.
{
"dataflow-cluster": {
".meta": {
"timing": 0
},
"clusters": [
{
"startNode": 3,
"members": [
3
],
"hasUnknownSideEffects": false
},
{
"startNode": 2,
"members": [
2,
1,
0
],
"hasUnknownSideEffects": false
}
]
},
".meta": {
"timing": 0
}
}
Using the example code from above, the following query returns all clusters:
[ { "type": "dataflow-cluster" } ]
Results (prettified and summarized):
Query: dataflow-cluster (0ms)
╰ Found 5 clusters
╰ {89, 87, 85, 82, 18, 22, ... (see JSON)} (marked)
╰ {55, 52, 38, 12, 16, 14, ... (see JSON)} (marked)
╰ (has unknown side effect) {11, 9} (marked)
╰ (has unknown side effect) {7, 5} (marked)
╰ (has unknown side effect) {3, 1} (marked)
All queries together required ≈0 ms (1ms accuracy, total 7 ms)
Show Detailed Results as Json
The analysis required 7.19 ms (including parsing and normalization and the query) within the generation environment.
{
"dataflow-cluster": {
".meta": {
"timing": 0
},
"clusters": [
{
"startNode": 89,
"members": [
89,
87,
85,
82,
18,
22,
20,
23,
57,
60,
58,
67,
65,
62,
63,
79,
72,
69,
70,
77,
74,
75,
83
],
"hasUnknownSideEffects": false
},
{
"startNode": 55,
"members": [
55,
52,
38,
12,
16,
14,
17,
26,
29,
27,
31,
32,
24,
34,
36,
50,
48,
43,
44,
46,
47,
54
],
"hasUnknownSideEffects": false
},
{
"startNode": 11,
"members": [
11,
9
],
"hasUnknownSideEffects": true
},
{
"startNode": 7,
"members": [
7,
5
],
"hasUnknownSideEffects": true
},
{
"startNode": 3,
"members": [
3,
1
],
"hasUnknownSideEffects": true
}
]
},
".meta": {
"timing": 0
}
}
Implementation Details
The Dataflow Cluster Query is executed by executeDataflowClusterQuery in ./src/queries/catalog/cluster-query/cluster-query-executor.ts.
Maybe you want to handle only the result of the query execution, or you just need the dataflow graph again. This query type does exactly that!
Using the example code x + 1, the following query returns the dataflow graph of the code:
[ { "type": "dataflow" } ]
Results (prettified and summarized):
Query: dataflow (0 ms)
╰ Dataflow Graph
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 2.04 ms (including parsing and normalization and the query) within the generation environment.
As the code is pretty long, we inhibit pretty printing and syntax highlighting (JSON):
{"dataflow":{".meta":{"timing":0},"graph":{"_idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"p
arent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}},"_unknownSideEffects":[],"rootVertices":[0,1,2],"vertexInformation":[[0,{"tag":"use","id":0}],[1,{"tag":"value","id":1}],[2,{"tag":"function-call","id":2,"name":"+","onlyBuiltin":true,"args":[{"nodeId":0,"type":32},{"nodeId":1,"type":32}]}]],"edgeInformation":[[2,[[0,{"types":65}],[1,{"types":65}]]]]}},".meta":{"timing":0}}
Original Code
x + 1
Dataflow Graph of the R Code
The analysis required 1.73 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.
flowchart LR
0(["`#91;RSymbol#93; x
(0)
*1.1*`"])
1{{"`#91;RNumber#93; 1
(1)
*1.5*`"}}
2[["`#91;RBinaryOp#93; #43;
(2)
*1.1-5*
(0, 1)`"]]
2 -->|"reads, argument"| 0
2 -->|"reads, argument"| 1
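In the raw JSON, each edge carries a numeric types field (65 in this example), while the rendered graph labels the same edges "reads, argument". This suggests a bitmask. The TypeScript sketch below decodes it under the assumption that bit 1 stands for reads and bit 64 for argument; these values are inferred from this one example and should be checked against flowR's edge type definition before relying on them.

```typescript
// Assumed bit values, inferred only from the example above (65 is rendered as "reads, argument").
const assumedEdgeBits: { bit: number; label: string }[] = [
    { bit: 1, label: "reads" },
    { bit: 64, label: "argument" },
];

// Returns the labels of all assumed bits that are set in `types`.
function decodeEdgeTypes(types: number): string[] {
    return assumedEdgeBits.filter(e => (types & e.bit) !== 0).map(e => e.label);
}

const decodedLabels = decodeEdgeTypes(65);
```

Under these assumptions, decodedLabels contains reads and argument, matching the edge labels in the graph above.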
Implementation Details
The Dataflow Query is executed by executeDataflowQuery in ./src/queries/catalog/dataflow-query/dataflow-query-executor.ts.
This query extracts all dependencies from an R script, using a combination of a Call-Context Query and more advanced tracking in the Dataflow Graph.
In other words, if you have a script simply reading: library(x), the following query returns the loaded library:
[ { "type": "dependencies" } ]
Results (prettified and summarized):
Query: dependencies (1 ms)
╰ Libraries
╰ library
╰ Node Id: 3, x
All queries together required ≈1 ms (1ms accuracy, total 3 ms)
Show Detailed Results as Json
The analysis required 2.79 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"dependencies": {
".meta": {
"timing": 1
},
"libraries": [
{
"nodeId": 3,
"functionName": "library",
"libraryName": "x"
}
],
"sourcedFiles": [],
"readData": [],
"writtenData": []
},
".meta": {
"timing": 1
}
}
Of course, this works for more complicated scripts too. The query offers information on the loaded libraries, sourced files, data which is read and data which is written. For example, consider the following script:
source("sample.R")
foo <- loadNamespace("bar")
data <- read.csv("data.csv")
#' @importFrom ggplot2 ggplot geom_point aes
ggplot(data, aes(x=x, y=y)) + geom_point()
better::write.csv(data, "data2.csv")
print("hello world!")
The following query returns the dependencies of the script.
[ { "type": "dependencies" } ]
Results (prettified and summarized):
Query: dependencies (1 ms)
╰ Libraries
╰ loadNamespace
╰ Node Id: 8, bar
╰ ::
╰ Node Id: 32, better
╰ Sourced Files
╰ source
╰ Node Id: 3, sample.R
╰ Read Data
╰ read.csv
╰ Node Id: 14, data.csv
╰ Written Data
╰ write.csv
╰ Node Id: 37, data2.csv
╰ print
╰ Node Id: 41, stdout
All queries together required ≈1 ms (1ms accuracy, total 6 ms)
Show Detailed Results as Json
The analysis required 5.93 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"dependencies": {
".meta": {
"timing": 1
},
"libraries": [
{
"nodeId": 8,
"functionName": "loadNamespace",
"libraryName": "bar"
},
{
"nodeId": 32,
"functionName": "::",
"libraryName": "better"
}
],
"sourcedFiles": [
{
"nodeId": 3,
"functionName": "source",
"file": "sample.R"
}
],
"readData": [
{
"nodeId": 14,
"functionName": "read.csv",
"source": "data.csv"
}
],
"writtenData": [
{
"nodeId": 37,
"functionName": "write.csv",
"destination": "data2.csv"
},
{
"nodeId": 41,
"functionName": "print",
"destination": "stdout"
}
]
},
".meta": {
"timing": 1
}
}
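To work with such a result programmatically, it can help to give the JSON a type. The following sketch models the four result categories shown above and collects the external names per category. The interface and function names are hypothetical (they are modeled on the JSON output, not taken from flowR's sources).

```typescript
// Shapes modeled on the JSON output shown above; all names are hypothetical.
interface LibraryInfo { nodeId: number; functionName: string; libraryName: string }
interface SourceInfo { nodeId: number; functionName: string; file: string }
interface ReadInfo { nodeId: number; functionName: string; source: string }
interface WriteInfo { nodeId: number; functionName: string; destination: string }
interface DependenciesResult {
    libraries: LibraryInfo[];
    sourcedFiles: SourceInfo[];
    readData: ReadInfo[];
    writtenData: WriteInfo[];
}

/** Collects the external names a script depends on, grouped by category. */
function summarizeDependencies(result: DependenciesResult): Record<string, string[]> {
    return {
        libraries: result.libraries.map(l => l.libraryName),
        sourced: result.sourcedFiles.map(s => s.file),
        read: result.readData.map(r => r.source),
        // print is reported with the destination "stdout"; keep only file targets
        written: result.writtenData.map(w => w.destination).filter(d => d !== 'stdout')
    };
}

// The result from above, without the ".meta" entries
const example: DependenciesResult = {
    libraries: [
        { nodeId: 8, functionName: 'loadNamespace', libraryName: 'bar' },
        { nodeId: 32, functionName: '::', libraryName: 'better' }
    ],
    sourcedFiles: [{ nodeId: 3, functionName: 'source', file: 'sample.R' }],
    readData: [{ nodeId: 14, functionName: 'read.csv', source: 'data.csv' }],
    writtenData: [
        { nodeId: 37, functionName: 'write.csv', destination: 'data2.csv' },
        { nodeId: 41, functionName: 'print', destination: 'stdout' }
    ]
};
const summary = summarizeDependencies(example);
console.log(summary);
```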
Currently, the dependency extraction may fail, as it is essentially a set of heuristics that guess the dependencies. We welcome any feedback on this (consider opening a new issue).
In the meantime, we offer several properties to override the default behavior (e.g., the function names that should be collected):
[
{
"type": "dependencies",
"ignoreDefaultFunctions": true,
"libraryFunctions": [
{
"name": "print",
"argIdx": 0,
"argName": "library"
}
],
"sourceFunctions": [],
"readFunctions": [],
"writeFunctions": []
}
]
Results (prettified and summarized):
Query: dependencies (0 ms)
╰ Libraries
╰ print
╰ Node Id: 41, hello world!
All queries together required ≈0 ms (1ms accuracy, total 5 ms)
Show Detailed Results as Json
The analysis required 5.13 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"dependencies": {
".meta": {
"timing": 0
},
"libraries": [
{
"nodeId": 41,
"functionName": "print",
"libraryName": "hello world!"
}
],
"sourcedFiles": [],
"readData": [],
"writtenData": []
},
".meta": {
"timing": 0
}
}
Implementation Details
Responsible for the execution of the Dependencies Query is executeDependenciesQuery in ./src/queries/catalog/dependencies-query/dependencies-query-executor.ts.
With this query you can analyze the control flow graph.
Using the example code:
x <- 1
y <- 2
the following query returns that the first assignment always happens before the other:
[
{
"type": "happens-before",
"a": "1@x",
"b": "2@y"
}
]
Results (prettified and summarized):
Query: happens-before (1 ms)
╰ 1@x<2@y: always
All queries together required ≈1 ms (1ms accuracy, total 3 ms)
Show Detailed Results as Json
The analysis required 2.85 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"happens-before": {
".meta": {
"timing": 1
},
"results": {
"1@x<2@y": "always"
}
},
".meta": {
"timing": 1
}
}
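As the JSON above shows, the verdict is stored under a key that combines both criteria with a `<` (here `1@x<2@y`). A minimal sketch of looking up a verdict from such a result follows. The type and function names are hypothetical, and the key format is inferred from the output above rather than from flowR's sources.

```typescript
// Result shape modeled on the "results" object above; the name is hypothetical.
type HappensBeforeResults = Record<string, string>;

/** Builds the result key for a pair of criteria, e.g. "1@x<2@y". */
function happensBeforeKey(a: string, b: string): string {
    return `${a}<${b}`;
}

/** Looks up the verdict for `a` happening before `b`, if that pair was queried. */
function lookupHappensBefore(results: HappensBeforeResults, a: string, b: string): string | undefined {
    return results[happensBeforeKey(a, b)];
}

const hbResults: HappensBeforeResults = { '1@x<2@y': 'always' };
console.log(lookupHappensBefore(hbResults, '1@x', '2@y'));
```

Note that the lookup is directional: asking for `2@y` before `1@x` yields no entry unless that pair was part of the query as well.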
Original Code
x <- 1
y <- 2
Dataflow Graph of the R Code
The analysis required 1.48 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.
flowchart LR
1{{"`#91;RNumber#93; 1
(1)
*1.6*`"}}
0["`#91;RSymbol#93; x
(0)
*1.1*`"]
2[["`#91;RBinaryOp#93; #60;#45;
(2)
*1.1-6*
(0, 1)`"]]
4{{"`#91;RNumber#93; 2
(4)
*2.6*`"}}
3["`#91;RSymbol#93; y
(3)
*2.1*`"]
5[["`#91;RBinaryOp#93; #60;#45;
(5)
*2.1-6*
(3, 4)`"]]
0 -->|"defined-by"| 1
0 -->|"defined-by"| 2
2 -->|"argument"| 1
2 -->|"returns, argument"| 0
3 -->|"defined-by"| 4
3 -->|"defined-by"| 5
5 -->|"argument"| 4
5 -->|"returns, argument"| 3
Implementation Details
Responsible for the execution of the Happens-Before Query is executeSearch in ./src/queries/catalog/happens-before-query/happens-before-query-executor.ts.
This query provides access to all nodes in the normalized AST as a mapping from their id to the node itself.
Using the example code x + 1, the following query returns all nodes from the code:
[ { "type": "id-map" } ]
Results (prettified and summarized):
Query: id-map (0 ms)
╰ Id List: {0, 1, 2, 3, 2-arg, 0-arg, ... (see JSON)}
All queries together required ≈0 ms (1ms accuracy, total 1 ms)
Show Detailed Results as Json
The analysis required 1.23 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
As the code is pretty long, we inhibit pretty printing and syntax highlighting (JSON):
{"id-map":{".meta":{"timing":0},"idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"ro
le":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}}},".meta":{"timing":0}}
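In the JSON above, the id map is serialized as an object with a `k2v` array of `[id, node]` pairs, where ids are either numbers or strings such as `0-arg`. The following sketch turns such an array back into a lookup table. The interface names are hypothetical and the node shape is reduced to two fields for brevity.

```typescript
// Reduced node shape; the real nodes carry more fields (see the JSON above).
interface SerializedNode { type: string; lexeme?: string }
// Serialized form as shown above: "k2v" holds [id, node] pairs.
interface SerializedIdMap { size: number; k2v: Array<[number | string, SerializedNode]> }

/** Rebuilds a lookup table from the serialized [id, node] pairs. */
function toLookup(idMap: SerializedIdMap): Map<number | string, SerializedNode> {
    return new Map(idMap.k2v);
}

// Hand-written excerpt of the map for `x + 1`
const serialized: SerializedIdMap = {
    size: 3,
    k2v: [
        [0, { type: 'RSymbol', lexeme: 'x' }],
        [1, { type: 'RNumber', lexeme: '1' }],
        ['0-arg', { type: 'RSymbol', lexeme: 'x' }]
    ]
};
const lookup = toLookup(serialized);
console.log(lookup.get(0)?.type, lookup.get('0-arg')?.lexeme);
```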
Original Code
x + 1
Dataflow Graph of the R Code
The analysis required 1.23 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.
flowchart LR
0(["`#91;RSymbol#93; x
(0)
*1.1*`"])
1{{"`#91;RNumber#93; 1
(1)
*1.5*`"}}
2[["`#91;RBinaryOp#93; #43;
(2)
*1.1-5*
(0, 1)`"]]
2 -->|"reads, argument"| 0
2 -->|"reads, argument"| 1
Implementation Details
Responsible for the execution of the Id-Map Query is executeIdMapQuery in ./src/queries/catalog/id-map-query/id-map-query-executor.ts.
This query calculates the lineage of a given slicing criterion. The lineage traces back all parts that the respective variable stems from, given the reads, definitions, and returns in the dataflow graph.
To understand this, let's start with a simple example query that gets the lineage of the second use of x in the following code:
x <- 1
x
For this, we use the criterion 2@x (which is the first use of x in the second line).
[
{
"type": "lineage",
"criterion": "2@x"
}
]
Results (prettified and summarized):
Query: lineage (0 ms)
╰ 2@x: {3, 0, 1, 2}
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 1.57 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"lineage": {
".meta": {
"timing": 0
},
"lineages": {
"2@x": [
3,
0,
1,
2
]
}
},
".meta": {
"timing": 0
}
}
In this simple scenario, the lineage is equivalent to the slice (and, in fact, to the complete code). In general, the lineage is smaller and makes no executability guarantees. It is just a quick and neither complete nor sound way to get information on where the variable originates from.
This query replaces the old request-lineage message.
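To illustrate the idea of tracing back along reads, definitions, and returns, the following sketch performs a breadth-first traversal over a hand-written edge list for the two-line example. It is not flowR's implementation: the edge list is an assumption written in the style of the flowcharts on this page, and all names (`Edge`, `FOLLOWED`, `lineageOf`) are hypothetical.

```typescript
type EdgeKind = 'reads' | 'defined-by' | 'returns' | 'argument';
interface Edge { from: number; to: number; kinds: EdgeKind[] }

// Edges for `x <- 1` followed by `x`, written by hand in the style of the
// flowcharts on this page. This list is an assumption, not flowR output.
const lineageEdges: Edge[] = [
    { from: 0, to: 1, kinds: ['defined-by'] },
    { from: 0, to: 2, kinds: ['defined-by'] },
    { from: 2, to: 1, kinds: ['argument'] },
    { from: 2, to: 0, kinds: ['returns', 'argument'] },
    { from: 3, to: 0, kinds: ['reads'] }
];

// Only these edge kinds are followed when tracing back
const FOLLOWED: ReadonlySet<EdgeKind> = new Set<EdgeKind>(['reads', 'defined-by', 'returns']);

/** Collects all nodes reachable from `start` via the followed edge kinds. */
function lineageOf(start: number, graph: Edge[]): number[] {
    const seen = new Set<number>([start]);
    const queue: number[] = [start];
    while (queue.length > 0) {
        const current = queue.shift() as number;
        for (const edge of graph) {
            if (edge.from !== current || seen.has(edge.to)) {
                continue;
            }
            if (edge.kinds.some(k => FOLLOWED.has(k))) {
                seen.add(edge.to);
                queue.push(edge.to);
            }
        }
    }
    return Array.from(seen);
}

console.log(lineageOf(3, lineageEdges)); // [ 3, 0, 1, 2 ]
```

For the criterion 2@x (node 3 in this edge list), the traversal visits the same ids as the query result above.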
Implementation Details
Responsible for the execution of the Lineage Query is executeLineageQuery in ./src/queries/catalog/lineage-query/lineage-query-executor.ts.
A query like the Id-Map Query can return a really big result, especially for larger scripts. If you are not interested in all of the information contained within the full map, you can use the location map query to get a simple mapping of ids to their location in the source file.
Consider you have the following code:
x + 1
x * 2
The following query then gives you the aforementioned mapping:
[ { "type": "location-map" } ]
Results (prettified and summarized):
Query: location-map (0 ms)
╰ Id List: {0, 1, 2, 3, 4, 5, 6, ... (see JSON)}
All queries together required ≈0 ms (1ms accuracy, total 6 ms)
Show Detailed Results as Json
The analysis required 6.04 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"location-map": {
".meta": {
"timing": 0
},
"map": {
"0": [
1,
1,
1,
1
],
"1": [
1,
5,
1,
5
],
"2": [
1,
3,
1,
3
],
"3": [
2,
1,
2,
1
],
"4": [
2,
5,
2,
5
],
"5": [
2,
3,
2,
3
],
"2-arg": [
1,
3,
1,
3
],
"5-arg": [
2,
3,
2,
3
],
"0-arg": [
1,
1,
1,
1
],
"1-arg": [
1,
5,
1,
5
],
"3-arg": [
2,
1,
2,
1
],
"4-arg": [
2,
5,
2,
5
]
}
},
".meta": {
"timing": 0
}
}
All locations are given as a SourceRange in the format [start-line, start-column, end-line, end-column].
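A typical use of such a mapping is to find the ids located at a given source position. The following sketch does this for an excerpt of the map above. It assumes that both ends of a range are inclusive (which the single-character ranges such as [1, 1, 1, 1] suggest); the helper names `contains` and `idsAt` are hypothetical.

```typescript
// [start-line, start-column, end-line, end-column]
type SourceRange = [number, number, number, number];

/** True if (line, column) lies within the range; both ends are assumed inclusive. */
function contains(range: SourceRange, line: number, column: number): boolean {
    const [startLine, startColumn, endLine, endColumn] = range;
    if (line < startLine || line > endLine) {
        return false;
    }
    if (line === startLine && column < startColumn) {
        return false;
    }
    if (line === endLine && column > endColumn) {
        return false;
    }
    return true;
}

/** Returns all ids whose range covers the given position. */
function idsAt(map: Record<string, SourceRange>, line: number, column: number): string[] {
    return Object.keys(map).filter(id => contains(map[id], line, column));
}

// Excerpt of the map shown above
const locationMap: Record<string, SourceRange> = {
    '0': [1, 1, 1, 1],
    '1': [1, 5, 1, 5],
    '2': [1, 3, 1, 3],
    '3': [2, 1, 2, 1]
};
console.log(idsAt(locationMap, 1, 5)); // [ '1' ]
```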
Implementation Details
Responsible for the execution of the Location Map Query is executeLocationMapQuery in ./src/queries/catalog/location-map-query/location-map-query-executor.ts.
Maybe you want to handle only the result of the query execution, or you just need the normalized AST again. This query type does exactly that!
Using the example code x + 1, the following query returns the normalized AST of the code:
[ { "type": "normalized-ast" } ]
Results (prettified and summarized):
Query: normalized-ast (0 ms)
╰ Normalized AST
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 1.71 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
As the code is pretty long, we inhibit pretty printing and syntax highlighting (JSON):
{"normalized-ast":{".meta":{"timing":0},"normalized":{"ast":{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}},"idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRang
e":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}},".meta":{"timing":1}}},".meta":{"timing":0}}
Original Code
x + 1
Dataflow Graph of the R Code
The analysis required 1.28 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.
flowchart LR
0(["`#91;RSymbol#93; x
(0)
*1.1*`"])
1{{"`#91;RNumber#93; 1
(1)
*1.5*`"}}
2[["`#91;RBinaryOp#93; #43;
(2)
*1.1-5*
(0, 1)`"]]
2 -->|"reads, argument"| 0
2 -->|"reads, argument"| 1
Implementation Details
Responsible for the execution of the Normalized AST Query is executeNormalizedAstQuery in ./src/queries/catalog/normalized-ast-query/normalized-ast-query-executor.ts.
With this query you can use flowR's value-tracking capabilities to resolve identifiers to all potential values they may have at runtime (if possible). The extent to which flowR traces values (e.g., built-ins vs. constants) can be configured in flowR's configuration file (see the Interface wiki page for more information).
Using the two-line example code x <- 1 followed by print(x), the following query returns all values of 'x' in the code:
[
{
"type": "resolve-value",
"criteria": [
"2@x"
]
}
]
Results (prettified and summarized):
Query: resolve-value (0 ms)
╰ Values for {2@x}
╰ 1
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 1.70 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"resolve-value": {
".meta": {
"timing": 0
},
"results": {
"{\"type\":\"resolve-value\",\"criteria\":[\"2@x\"]}": {
"values": [
{
"num": 1,
"complexNumber": false,
"markedAsInt": false
}
]
}
}
},
".meta": {
"timing": 0
}
}
Original Code
x <- 1
print(x)
Dataflow Graph of the R Code
The analysis required 1.49 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered unknown side effects (with ids: 6 (linked)) during the analysis.
flowchart LR
1{{"`#91;RNumber#93; 1
(1)
*1.6*`"}}
0["`#91;RSymbol#93; x
(0)
*1.1*`"]
2[["`#91;RBinaryOp#93; #60;#45;
(2)
*1.1-6*
(0, 1)`"]]
4(["`#91;RSymbol#93; x
(4)
*2.7*`"])
6[["`#91;RFunctionCall#93; print
(6)
*2.1-8*
(4)`"]]
0 -->|"defined-by"| 1
0 -->|"defined-by"| 2
2 -->|"argument"| 1
2 -->|"returns, argument"| 0
4 -->|"reads"| 0
6 -->|"reads, returns, argument"| 4
Implementation Details
Responsible for the execution of the Resolve Value Query is executeSearch in ./src/queries/catalog/resolve-value-query/resolve-value-query-executor.ts.
With this query you can use the Search API to conduct searches on the flowR analysis result.
Using the example code x + 1, the following query returns all uses of 'x' in the code:
[
{
"type": "search",
"search": {
"generator": {
"type": "generator",
"name": "get",
"args": {
"filter": {
"name": "x"
}
}
},
"search": [
{
"type": "transformer",
"name": "filter",
"args": {
"filter": "use"
}
}
]
}
}
]
Results (prettified and summarized):
Query: search (0 ms)
╰ query: {0}
All queries together required ≈0 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 1.71 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"search": {
".meta": {
"timing": 0
},
"results": [
{
"ids": [
0
],
"search": {
"generator": {
"type": "generator",
"name": "get",
"args": {
"filter": {
"name": "x"
}
}
},
"search": [
{
"type": "transformer",
"name": "filter",
"args": {
"filter": "use"
}
}
]
}
}
]
},
".meta": {
"timing": 0
}
}
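Writing the nested search object by hand is error-prone, so a small helper can build it. The following sketch produces exactly the JSON query shown above for a given symbol name. The interfaces are modeled on that JSON, and the helper name `usesOf` is hypothetical; it is not part of flowR's Search API.

```typescript
// JSON shape copied from the query above; the type and helper names are hypothetical.
interface Generator { type: 'generator'; name: string; args: Record<string, unknown> }
interface Transformer { type: 'transformer'; name: string; args: Record<string, unknown> }
interface SearchQuery {
    type: 'search';
    search: { generator: Generator; search: Transformer[] };
}

/** Builds a query that gets all nodes named `symbolName` and keeps only the given vertex type. */
function usesOf(symbolName: string, vertexType: string = 'use'): SearchQuery {
    return {
        type: 'search',
        search: {
            generator: { type: 'generator', name: 'get', args: { filter: { name: symbolName } } },
            search: [{ type: 'transformer', name: 'filter', args: { filter: vertexType } }]
        }
    };
}

// Queries are sent as a JSON array of query objects
console.log(JSON.stringify([usesOf('x')], null, 2));
```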
Original Code
x + 1
Dataflow Graph of the R Code
The analysis required 1.28 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.
flowchart LR
0(["`#91;RSymbol#93; x
(0)
*1.1*`"])
1{{"`#91;RNumber#93; 1
(1)
*1.5*`"}}
2[["`#91;RBinaryOp#93; #43;
(2)
*1.1-5*
(0, 1)`"]]
2 -->|"reads, argument"| 0
2 -->|"reads, argument"| 1
Implementation Details
Responsible for the execution of the Search Query is executeSearch in ./src/queries/catalog/search-query/search-query-executor.ts.
To slice, flowR needs one thing from you: a variable or a list of variables (function calls are supported too, referring to the anonymous return of the call) that you want to slice the dataflow graph for. Given this, the slice is essentially the subpart of the program that may influence the value of the variables you are interested in. To specify a variable of interest, you have to present flowR with a slicing criterion (or, respectively, an array of them).
To exemplify the capabilities, consider the following code:
x <- 1
y <- 2
x
If you are interested in the parts required for the use of x in the last line, you can use the following query:
[
{
"type": "static-slice",
"criteria": [
"3@x"
]
}
]
Results (prettified and summarized):
Query: static-slice (2 ms)
╰ Slice for {3@x}
╰ Code (newline as \n): x <- 1\nx
All queries together required ≈2 ms (1ms accuracy, total 4 ms)
Show Detailed Results as Json
The analysis required 3.67 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"static-slice": {
".meta": {
"timing": 2
},
"results": {
"{\"type\":\"static-slice\",\"criteria\":[\"3@x\"]}": {
"slice": {
"timesHitThreshold": 0,
"result": [
6,
0,
1,
2
],
"decodedCriteria": [
{
"criterion": "3@x",
"id": 6
}
],
".meta": {
"timing": 1
}
},
"reconstruct": {
"code": "x <- 1\nx",
"linesWithAutoSelected": 0,
".meta": {
"timing": 1
}
}
}
}
},
".meta": {
"timing": 2
}
}
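In the JSON above, each slice is stored under a key that looks like the JSON serialization of the query object itself. The following sketch uses this to retrieve the reconstructed code for a query. It assumes that the key equals `JSON.stringify(query)` with the properties in the order they were written, which matches the output shown but is not confirmed by this page; the type and function names are hypothetical.

```typescript
// Shapes reduced to the fields used here; all names are hypothetical.
interface StaticSliceQuery { type: 'static-slice'; criteria: string[]; noReconstruction?: boolean }
interface SliceEntry {
    slice: { result: number[] };
    reconstruct?: { code: string };
}

/** Finds the result entry that belongs to the given query object. */
function entryFor(results: Record<string, SliceEntry>, query: StaticSliceQuery): SliceEntry | undefined {
    // Assumption: the key is the serialized query, as in the output above
    return results[JSON.stringify(query)];
}

const sliceQuery: StaticSliceQuery = { type: 'static-slice', criteria: ['3@x'] };
// Excerpt of the "results" object shown above
const sliceResults: Record<string, SliceEntry> = {
    '{"type":"static-slice","criteria":["3@x"]}': {
        slice: { result: [6, 0, 1, 2] },
        reconstruct: { code: 'x <- 1\nx' }
    }
};
const sliceEntry = entryFor(sliceResults, sliceQuery);
console.log(sliceEntry?.reconstruct?.code);
```

Because the key depends on the property order, building the query object in a different order than the one sent would miss the entry; iterating over `Object.values(results)` is a more defensive alternative when only one slice was requested.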
In general, you may not be interested in the reconstructed code and want to save some computation time. For this, you can use the noReconstruction flag.
No Reconstruction Example
[
{
"type": "static-slice",
"criteria": [
"3@x"
],
"noReconstruction": true
}
]
Results (prettified and summarized):
Query: static-slice (1 ms)
╰ Slice for {3@x} no reconstruction
╰ Id List: {6, 0, 1, 2}
All queries together required ≈1 ms (1ms accuracy, total 2 ms)
Show Detailed Results as Json
The analysis required 2.24 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"static-slice": {
".meta": {
"timing": 1
},
"results": {
"{\"type\":\"static-slice\",\"criteria\":[\"3@x\"],\"noReconstruction\":true}": {
"slice": {
"timesHitThreshold": 0,
"result": [
6,
0,
1,
2
],
"decodedCriteria": [
{
"criterion": "3@x",
"id": 6
}
],
".meta": {
"timing": 1
}
}
}
}
},
".meta": {
"timing": 1
}
}
You can disable magic comments using the noMagicComments flag.
This query replaces the old request-slice message.
Implementation Details
Responsible for the execution of the Static Slice Query is executeStaticSliceQuery in ./src/queries/catalog/static-slice-query/static-slice-query-executor.ts.
A compound query is useful whenever we want to state multiple queries of the same type with a set of common arguments. It offers the following properties of interest:
- Query (query): the type of the query that is to be combined.
- Common Arguments (commonArguments): the arguments that are to be used as defaults for all queries (i.e., any argument the query may have).
- Arguments (arguments): the other arguments for the individual queries that are to be combined.
For example, consider the following compound query that combines two call-context queries for mean and print, both of which are to be assigned to the kind visualize and the subkind text (using the example code from above):
[
{
"type": "compound",
"query": "call-context",
"commonArguments": {
"kind": "visualize",
"subkind": "text"
},
"arguments": [
{
"callName": "^mean$"
},
{
"callName": "^print$"
}
]
}
]
Results (prettified and summarized):
Query: call-context (0 ms)
╰ visualize
╰ text: mean (L.9), print (L.10), mean (L.19), print (L.19)
All queries together required ≈1 ms (1ms accuracy, total 6 ms)
Show Detailed Results as Json
The analysis required 6.16 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"call-context": {
".meta": {
"timing": 0
},
"kinds": {
"visualize": {
"subkinds": {
"text": [
{
"id": 31,
"name": "mean"
},
{
"id": 36,
"name": "print"
},
{
"id": 87,
"name": "mean"
},
{
"id": 89,
"name": "print"
}
]
}
}
}
},
".meta": {
"timing": 1
}
}
Of course, in this specific scenario, the following query would be equivalent:
[
{
"type": "call-context",
"callName": "^(mean|print)$",
"kind": "visualize",
"subkind": "text"
}
]
Results (prettified and summarized):
Query: call-context (0 ms)
╰ visualize
╰ text: mean (L.9), print (L.10), mean (L.19), print (L.19)
All queries together required ≈0 ms (1ms accuracy, total 6 ms)
Show Detailed Results as Json
The analysis required 5.80 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"call-context": {
".meta": {
"timing": 0
},
"kinds": {
"visualize": {
"subkinds": {
"text": [
{
"id": 31,
"name": "mean"
},
{
"id": 36,
"name": "print"
},
{
"id": 87,
"name": "mean"
},
{
"id": 89,
"name": "print"
}
]
}
}
}
},
".meta": {
"timing": 0
}
}
However, compound queries become more useful whenever common arguments cannot be expressed as a union in one of their properties.
Additionally, you can still overwrite default arguments.
In the following, we (by default) want all calls to not resolve to a local definition, except for those to print, for which we explicitly want to resolve to a local definition:
[
{
"type": "compound",
"query": "call-context",
"commonArguments": {
"kind": "visualize",
"subkind": "text",
"callTargets": "global"
},
"arguments": [
{
"callName": "^mean$"
},
{
"callName": "^print$",
"callTargets": "local"
}
]
}
]
Results (prettified and summarized):
Query: call-context (0 ms)
╰ visualize
╰ text: mean (L.9) with 1 call (built-in), mean (L.19) with 1 call (built-in)
All queries together required ≈0 ms (1ms accuracy, total 10 ms)
Show Detailed Results as Json
The analysis required 9.96 ms (including parsing and normalization and the query) within the generation environment.
In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.
{
"call-context": {
".meta": {
"timing": 0
},
"kinds": {
"visualize": {
"subkinds": {
"text": [
{
"id": 31,
"name": "mean",
"calls": [
"built-in"
]
},
{
"id": 87,
"name": "mean",
"calls": [
"built-in"
]
}
]
}
}
}
},
".meta": {
"timing": 0
}
}
Now, the results no longer contain calls to print, as these do not resolve to a local definition.
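The merge semantics described above (common arguments act as defaults, individual arguments may overwrite them) can be sketched as a simple expansion into plain queries. This is an illustration of the idea, not the code of executeCompoundQueries; the names `CompoundQuery` and `expandCompound` are hypothetical.

```typescript
type Args = Record<string, unknown>;
interface CompoundQuery {
    type: 'compound';
    query: string;
    commonArguments: Args;
    arguments: Args[];
}

/** Expands a compound query into one plain query per entry of `arguments`. */
function expandCompound(compound: CompoundQuery): Args[] {
    return compound.arguments.map(args => ({
        type: compound.query,
        ...compound.commonArguments,
        ...args // individual arguments win over the common defaults
    }));
}

// The compound query from above
const compound: CompoundQuery = {
    type: 'compound',
    query: 'call-context',
    commonArguments: { kind: 'visualize', subkind: 'text', callTargets: 'global' },
    arguments: [
        { callName: '^mean$' },
        { callName: '^print$', callTargets: 'local' }
    ]
};
const expanded = expandCompound(compound);
console.log(JSON.stringify(expanded, null, 2));
```

In this sketch, the first expanded query keeps the default `callTargets` of `global`, while the second one carries the overwritten value `local`.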
Implementation Details
Responsible for the execution of the Compound Query is executeCompoundQueries in ./src/queries/virtual-query/compound-query.ts.