Query API

This document was generated from 'src/documentation/print-query-wiki.ts' on 2025-02-07, 08:46:36 UTC presenting an overview of flowR's query API (v2.2.1, using R v4.4.0). Please do not edit this file/wiki page directly.

This page briefly summarizes flowR's query API, represented by the executeQueries function in ./src/queries/query.ts. Please see the Interface wiki page for more information on how to access this API.

Note

There are many ways to query a dataflow graph created by flowR. For example, you can use the request-query message with a running flowR server, or the :query command in the flowR REPL.

The Query Format

Queries are JSON arrays of query objects, each of which uses a type property to specify the query type. In general, we distinguish two types of queries:

Active Queries: These are exactly what you would expect from a query (e.g., the Call-Context Query). They fetch information from the dataflow graph.
Virtual Queries: These are used to structure your queries (e.g., the Compound Query).

This distinction is purely conceptual. For now, we support the following active queries (which we will simply refer to as queries):

Call-Context Query (call-context):
Finds all calls in a set of files that match specified criteria.
Config Query (config):
Returns the current configuration of flowR.
Dataflow Cluster Query (dataflow-cluster):
Calculates and returns all the clusters present in the dataflow graph.
Dataflow Query (dataflow):
Returns the dataflow graph of the given code.
Dependencies Query (dependencies):
Returns all direct dependencies (in- and outputs) of a given R script.
Happens-Before Query (happens-before):
Check whether one normalized AST node happens before another in the CFG.
Id-Map Query (id-map):
Returns the id-map of the normalized AST of the given code.
Lineage Query (lineage):
Returns the lineage of a criterion.
Location Map Query (location-map):
Returns a simple mapping of ids to their location in the source file.
Normalized AST Query (normalized-ast):
Returns the normalized AST of the given code.
Resolve Value Query (resolve-value):
Provides access to flowR's value tracking (which is configurable).
Search Query (search):
Provides access to flowR's search API.
Static Slice Query (static-slice):
Slice the dataflow graph reducing the code to just the parts relevant for the given criteria.

Similarly, we support the following virtual queries:

Compound Query (compound):
Combines multiple queries of the same type into one, specifying common arguments.
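
As a rough illustration of the format described above, the following TypeScript sketch models a query request as an array of objects that are discriminated by their type property. The names ACTIVE_QUERY_TYPES, VIRTUAL_QUERY_TYPES, QueryLike, and isVirtual are made up for this sketch and are not flowR's actual type definitions.

```typescript
// Query type names as listed above; the TypeScript names are illustrative only.
const ACTIVE_QUERY_TYPES = [
  "call-context", "config", "dataflow-cluster", "dataflow", "dependencies",
  "happens-before", "id-map", "lineage", "location-map", "normalized-ast",
  "resolve-value", "search", "static-slice",
] as const;
const VIRTUAL_QUERY_TYPES = ["compound"] as const;

type ActiveQueryType = (typeof ACTIVE_QUERY_TYPES)[number];
type VirtualQueryType = (typeof VIRTUAL_QUERY_TYPES)[number];

// Every query object carries a `type`; all other properties depend on that type.
interface QueryLike {
  type: ActiveQueryType | VirtualQueryType;
  [key: string]: unknown;
}

// Hypothetical helper: virtual queries only structure other queries.
function isVirtual(query: QueryLike): boolean {
  return (VIRTUAL_QUERY_TYPES as readonly string[]).indexOf(query.type) !== -1;
}

// A query request is simply a JSON array of such objects.
const exampleRequest: QueryLike[] = JSON.parse(
  '[{"type":"dataflow"},{"type":"compound","query":"call-context","commonArguments":{},"arguments":[]}]'
);
const virtualCount = exampleRequest.filter(isVirtual).length;
```

Here, only the compound entry counts as virtual; the dataflow entry is an active query.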

Detailed Query Format (Automatically Generated)

Although it is probably better to consult the detailed explanations below, if you want to have a look at the schema, here is its description:

. array Queries to run on the file analysis information (in the form of an array) Valid item types:
- . alternatives Any query
  - . alternatives Supported queries
    - . object Call context query used to find calls in the dataflow graph
      - type string [required] The type of the query. Allows only the values: 'call-context'
      - callName string [required] Regex regarding the function name!
      - callNameExact boolean [optional] Should we automatically add the ^ and $ anchors to the regex to make it an exact match?
      - kind string [optional] The kind of the call, this can be used to group calls together (e.g., linking plot to visualize). Defaults to "." (a single dot).
      - subkind string [optional] The subkind of the call, this can be used to uniquely identify the respective call type when grouping the output (e.g., the normalized name, linking ggplot to plot). Defaults to "." (a single dot).
      - callTargets string [optional] Call targets the function may have. This defaults to any. Request this specifically to gain all call targets we can resolve. Allows only the values: 'global', 'must-include-global', 'local', 'must-include-local', 'any'
      - includeAliases boolean [optional] Consider a case like f <- function_of_interest, do you want uses of f to be included in the results?
      - fileFilter object [optional] Filter that, when set, a node's file attribute must match to be considered
        
        fileFilter string [required] Regex that a node's file attribute must match to be considered
        
        includeUndefinedFiles boolean [optional] If fileFilter is set, but a node's file attribute is undefined, should we include it in the results? Defaults to true.
      - linkTo object [optional] Links the current call to the last call of the given kind. This way, you can link a call like points to the latest graphics plot etc.
        
        type string [required] The type of the linkTo sub-query. Allows only the values: 'link-to-last-call'
        
        callName string [required] Regex regarding the function name of the last call. Similar to callName, strings are interpreted as a regular expression.
        
        ignoreIf function [optional] Should we ignore this (source) call? Currently, there is no well working serialization for this.
        
        cascadeIf function [optional] Should we continue searching after the link was created? Currently, there is no well working serialization for this.
    - . object The config query retrieves the current configuration of the flowR instance.
      - type string [required] The type of the query. Allows only the values: 'config'
    - . object The dataflow query simply returns the dataflow graph, there is no need to pass it multiple times!
      - type string [required] The type of the query. Allows only the values: 'dataflow'
    - . object The id map query retrieves the id map from the normalized AST.
      - type string [required] The type of the query. Allows only the values: 'id-map'
    - . object The normalized AST query simply returns the normalized AST, there is no need to pass it multiple times!
      - type string [required] The type of the query. Allows only the values: 'normalized-ast'
    - . object The cluster query calculates and returns all clusters in the dataflow graph.
      - type string [required] The type of the query. Allows only the values: 'dataflow-cluster'
    - . object Slice query used to slice the dataflow graph
      - type string [required] The type of the query. Allows only the values: 'static-slice'
      - criteria array [required] The slicing criteria to use. Valid item types:
        
        . string
      - noReconstruction boolean [optional] Do not reconstruct the slice into readable code.
      - noMagicComments boolean [optional] Should the magic comments (force-including lines within the slice) be ignored?
    - . object Lineage query used to find the lineage of a node in the dataflow graph
      - type string [required] The type of the query. Allows only the values: 'lineage'
      - criterion string [required] The slicing criterion of the node to get the lineage of.
    - . object The dependencies query retrieves and returns the set of all dependencies in the dataflow graph, which includes libraries, sourced files, read data, and written data.
      - type string [required] The type of the query. Allows only the values: 'dependencies'
      - ignoreDefaultFunctions boolean [optional] Should the set of functions that are detected by default be ignored/skipped?
      - libraryFunctions array [optional] The set of library functions to search for. Valid item types:
        
        . object
        
        name string [required] The name of the library function.
        
        argIdx number [optional] The index of the argument that contains the library name.
        
        argName string [optional] The name of the argument that contains the library name.
      - sourceFunctions array [optional] The set of source functions to search for. Valid item types:
        
        . object
        
        name string [required] The name of the library function.
        
        argIdx number [optional] The index of the argument that contains the library name.
        
        argName string [optional] The name of the argument that contains the library name.
      - readFunctions array [optional] The set of data reading functions to search for. Valid item types:
        
        . object
        
        name string [required] The name of the library function.
        
        argIdx number [optional] The index of the argument that contains the library name.
        
        argName string [optional] The name of the argument that contains the library name.
      - writeFunctions array [optional] The set of data writing functions to search for. Valid item types:
        
        . object
        
        name string [required] The name of the library function.
        
        argIdx number [optional] The index of the argument that contains the library name.
        
        argName string [optional] The name of the argument that contains the library name.
    - . object The location map query retrieves the location of every id in the ast.
      - type string [required] The type of the query. Allows only the values: 'location-map'
    - . object The search query searches the normalized AST and dataflow graph for nodes that match the given search query.
      - type string [required] The type of the query. Allows only the values: 'search'
      - search object [required] The search query to execute.
    - . object Happens-Before tracks whether a always happens before b.
      - type string [required] The type of the query. Allows only the values: 'happens-before'
      - a string [required] The first slicing criterion.
      - b string [required] The second slicing criterion.
    - . object The resolve value query used to get definitions of an identifier
      - type string [required] The type of the query. Allows only the values: 'resolve-value'
      - criteria array [required] The slicing criteria to use. Valid item types:
        
        . string
  - . alternatives Virtual queries (used for structure)
    - . object Compound query used to combine queries of the same type
      - type string [required] The type of the query. Allows only the values: 'compound'
      - query string [required] The query to run on the file analysis information.
      - commonArguments object [required] Common arguments for all queries.
      - arguments array [required] Arguments for each query. Valid item types:
        
        . object
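
To give a feeling for how the schema constrains a single query object, here is a small hand-written check for the call-context entry. It is only a stand-in covering a few of the properties listed above; the helper checkCallContextQuery is hypothetical and is not how flowR validates queries.

```typescript
// Allowed values for callTargets, copied from the schema description above.
const CALL_TARGETS = ["global", "must-include-global", "local", "must-include-local", "any"];

// Hypothetical helper: returns a list of problems, which is empty if the object looks valid.
function checkCallContextQuery(q: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (q["type"] !== "call-context") {
    problems.push("type must be 'call-context'");
  }
  if (typeof q["callName"] !== "string") {
    problems.push("callName is required and must be a string");
  }
  const targets = q["callTargets"];
  if (targets !== undefined && (typeof targets !== "string" || CALL_TARGETS.indexOf(targets) === -1)) {
    problems.push("callTargets must be one of: " + CALL_TARGETS.join(", "));
  }
  for (const key of ["callNameExact", "includeAliases"]) {
    if (q[key] !== undefined && typeof q[key] !== "boolean") {
      problems.push(key + " must be a boolean");
    }
  }
  return problems;
}

const okProblems = checkCallContextQuery({ type: "call-context", callName: "^read_csv$", callTargets: "global" });
const badProblems = checkCallContextQuery({ type: "call-context", callTargets: "everywhere" });
```

The first object passes all checks, while the second one is reported for its missing callName and its unsupported callTargets value.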

Why Queries?

First, consider that you have a file like the following (of course, this is just a simple and artificial example):

library(ggplot)
library(dplyr)
library(readr)

# read data with read_csv
data <- read_csv('data.csv')
data2 <- read_csv('data2.csv')

m <- mean(data$x) 
print(m)

data %>%
	ggplot(aes(x = x, y = y)) +
	geom_point()
	
plot(data2$x, data2$y)
points(data2$x, data2$y)
	
print(mean(data2$k))

Dataflow Graph of the Example

flowchart LR
    1{{"`#91;RSymbol#93; ggplot
      (1)
      *1.9-14*`"}}
    3[["`#91;RFunctionCall#93; library
      (3)
      *1.1-15*
    (1)`"]]
    style 3 stroke:red,stroke-width:5px; 
    5{{"`#91;RSymbol#93; dplyr
      (5)
      *2.9-13*`"}}
    7[["`#91;RFunctionCall#93; library
      (7)
      *2.1-14*
    (5)`"]]
    style 7 stroke:red,stroke-width:5px; 
    9{{"`#91;RSymbol#93; readr
      (9)
      *3.9-13*`"}}
    11[["`#91;RFunctionCall#93; library
      (11)
      *3.1-14*
    (9)`"]]
    style 11 stroke:red,stroke-width:5px; 
    14{{"`#91;RString#93; #39;data.csv#39;
      (14)
      *6.18-27*`"}}
    16[["`#91;RFunctionCall#93; read#95;csv
      (16)
      *6.9-28*
    (14)`"]]
    12["`#91;RSymbol#93; data
      (12)
      *6.1-4*`"]
    17[["`#91;RBinaryOp#93; #60;#45;
      (17)
      *6.1-28*
    (12, 16)`"]]
    20{{"`#91;RString#93; #39;data2.csv#39;
      (20)
      *7.19-29*`"}}
    %% Environment of 22 [level: 0]:
    %% Built-in
    %% 24----------------------------------------
    %%   data: {**data** (id: 12, type: Unknown, def. @17)}
    22[["`#91;RFunctionCall#93; read#95;csv
      (22)
      *7.10-30*
    (20)`"]]
    18["`#91;RSymbol#93; data2
      (18)
      *7.1-5*`"]
    23[["`#91;RBinaryOp#93; #60;#45;
      (23)
      *7.1-30*
    (18, 22)`"]]
    26(["`#91;RSymbol#93; data
      (26)
      *9.11-14*`"])
    27{{"`#91;RSymbol#93; x
      (27)
      *9.11-16*`"}}
    29[["`#91;RAccess#93; $
      (29)
      *9.11-16*
    (26, 27)`"]]
    31[["`#91;RFunctionCall#93; mean
      (31)
      *9.6-17*
    (29)`"]]
    24["`#91;RSymbol#93; m
      (24)
      *9.1*`"]
    32[["`#91;RBinaryOp#93; #60;#45;
      (32)
      *9.1-17*
    (24, 31)`"]]
    34(["`#91;RSymbol#93; m
      (34)
      *10.7*`"])
    36[["`#91;RFunctionCall#93; print
      (36)
      *10.1-8*
    (34)`"]]
    38(["`#91;RSymbol#93; data
      (38)
      *12.1-4*`"])
    43(["`#91;RSymbol#93; x
      (43)
      *13.24*`"])
    44(["`#91;RArgument#93; x
      (44)
      *13.20*`"])
    46(["`#91;RSymbol#93; y
      (46)
      *13.31*`"])
    47(["`#91;RArgument#93; y
      (47)
      *13.27*`"])
    %% Environment of 48 [level: 0]:
    %% Built-in
    %% 56----------------------------------------
    %%   data:  {**data** (id: 12, type: Unknown, def. @17)}
    %%   data2: {**data2** (id: 18, type: Unknown, def. @23)}
    %%   m:     {**m** (id: 24, type: Unknown, def. @32)}
    48[["`#91;RFunctionCall#93; aes
      (48)
      *13.16-32*
    (x (44), y (47))`"]]
    %% Environment of 50 [level: 0]:
    %% Built-in
    %% 59----------------------------------------
    %%   data:  {**data** (id: 12, type: Unknown, def. @17)}
    %%   data2: {**data2** (id: 18, type: Unknown, def. @23)}
    %%   m:     {**m** (id: 24, type: Unknown, def. @32)}
    50[["`#91;RFunctionCall#93; ggplot
      (50)
      *13.9-33*
    (38, 48)`"]]
    52[["`#91;RFunctionCall#93; data %#62;%
	ggplot(aes(x = x, y = y))
      (52)
      *12.6-8*
    (38, 50)`"]]
    %% Environment of 54 [level: 0]:
    %% Built-in
    %% 65----------------------------------------
    %%   data:  {**data** (id: 12, type: Unknown, def. @17)}
    %%   data2: {**data2** (id: 18, type: Unknown, def. @23)}
    %%   m:     {**m** (id: 24, type: Unknown, def. @32)}
    54[["`#91;RFunctionCall#93; geom#95;point
      (54)
      *14.9-20*`"]]
    55[["`#91;RBinaryOp#93; #43;
      (55)
      *12.1-14.20*
    (52, 54)`"]]
    57(["`#91;RSymbol#93; data2
      (57)
      *16.6-10*`"])
    58{{"`#91;RSymbol#93; x
      (58)
      *16.6-12*`"}}
    60[["`#91;RAccess#93; $
      (60)
      *16.6-12*
    (57, 58)`"]]
    62(["`#91;RSymbol#93; data2
      (62)
      *16.15-19*`"])
    63{{"`#91;RSymbol#93; y
      (63)
      *16.15-21*`"}}
    65[["`#91;RAccess#93; $
      (65)
      *16.15-21*
    (62, 63)`"]]
    67[["`#91;RFunctionCall#93; plot
      (67)
      *16.1-22*
    (60, 65)`"]]
    69(["`#91;RSymbol#93; data2
      (69)
      *17.8-12*`"])
    70{{"`#91;RSymbol#93; x
      (70)
      *17.8-14*`"}}
    72[["`#91;RAccess#93; $
      (72)
      *17.8-14*
    (69, 70)`"]]
    74(["`#91;RSymbol#93; data2
      (74)
      *17.17-21*`"])
    75{{"`#91;RSymbol#93; y
      (75)
      *17.17-23*`"}}
    77[["`#91;RAccess#93; $
      (77)
      *17.17-23*
    (74, 75)`"]]
    79[["`#91;RFunctionCall#93; points
      (79)
      *17.1-24*
    (72, 77)`"]]
    82(["`#91;RSymbol#93; data2
      (82)
      *19.12-16*`"])
    83{{"`#91;RSymbol#93; k
      (83)
      *19.12-18*`"}}
    85[["`#91;RAccess#93; $
      (85)
      *19.12-18*
    (82, 83)`"]]
    87[["`#91;RFunctionCall#93; mean
      (87)
      *19.7-19*
    (85)`"]]
    89[["`#91;RFunctionCall#93; print
      (89)
      *19.1-20*
    (87)`"]]
    3 -->|"argument"| 1
    7 -->|"argument"| 5
    11 -->|"argument"| 9
    16 -->|"argument"| 14
    12 -->|"defined-by"| 16
    12 -->|"defined-by"| 17
    17 -->|"argument"| 16
    17 -->|"returns, argument"| 12
    22 -->|"argument"| 20
    18 -->|"defined-by"| 22
    18 -->|"defined-by"| 23
    23 -->|"argument"| 22
    23 -->|"returns, argument"| 18
    26 -->|"reads"| 12
    29 -->|"reads, returns, argument"| 26
    29 -->|"reads, argument"| 27
    31 -->|"reads, argument"| 29
    24 -->|"defined-by"| 31
    24 -->|"defined-by"| 32
    32 -->|"argument"| 31
    32 -->|"returns, argument"| 24
    34 -->|"reads"| 24
    36 -->|"reads, returns, argument"| 34
    38 -->|"reads"| 12
    44 -->|"reads"| 43
    47 -->|"reads"| 46
    48 -->|"reads"| 43
    48 -->|"argument"| 44
    48 -->|"reads"| 46
    48 -->|"argument"| 47
    50 -->|"reads, argument"| 48
    50 -->|"argument"| 38
    52 -->|"argument"| 38
    52 -->|"argument"| 50
    55 -->|"reads, argument"| 52
    55 -->|"reads, argument"| 54
    57 -->|"reads"| 18
    60 -->|"reads, returns, argument"| 57
    60 -->|"reads, argument"| 58
    62 -->|"reads"| 18
    65 -->|"reads, returns, argument"| 62
    65 -->|"reads, argument"| 63
    67 -->|"reads, argument"| 60
    67 -->|"reads, argument"| 65
    69 -->|"reads"| 18
    72 -->|"reads, returns, argument"| 69
    72 -->|"reads, argument"| 70
    74 -->|"reads"| 18
    77 -->|"reads, returns, argument"| 74
    77 -->|"reads, argument"| 75
    79 -->|"reads, argument"| 72
    79 -->|"reads, argument"| 77
    79 -->|"reads"| 67
    82 -->|"reads"| 18
    85 -->|"reads, returns, argument"| 82
    85 -->|"reads, argument"| 83
    87 -->|"reads, argument"| 85
    89 -->|"reads, returns, argument"| 87

(The analysis required 24.85 ms (including parse and normalize, using the r-shell engine) within the generation environment.)

Additionally, consider that you are interested in all function calls that load data with read_csv. A simple regex-based query could look like this: ^read_csv$. However, this fails to incorporate:

Syntax-based information (comments, strings, used as a variable, called as a higher-order function, ...)
Semantic information (e.g., read_csv is overwritten by a function with the same name)
Context information (e.g., calls like points may link to the current plot)
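
The first point can be made concrete with a sketch of a purely textual search. It runs over lines 5 to 7 of the example file and, as an assumption for this illustration, treats every line that mentions read_csv as a hit; the variable names are made up.

```typescript
// Excerpt of the example file from above (lines 5 to 7).
const exampleSource = [
  "# read data with read_csv",
  "data <- read_csv('data.csv')",
  "data2 <- read_csv('data2.csv')",
].join("\n");

// Naive approach: collect the line number of every line whose text mentions read_csv.
const textualHits = exampleSource
  .split("\n")
  .map((text, index) => ({ line: index + 5, text }))
  .filter(entry => /\bread_csv\b/.test(entry.text))
  .map(entry => entry.line);
```

The textual search also reports line 5, which is only a comment, while the two actual calls are on lines 6 and 7.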

To solve this, flowR provides a query API which allows you to specify queries on the dataflow graph. For the specific use-case stated, you could use the Call-Context Query to find all calls to read_csv that refer to functions that are not overwritten.

Just as an example, the following Call-Context Query finds all calls to read_csv that are not overwritten:

[
  {
    "type": "call-context",
    "callName": "^read_csv$",
    "callTargets": "global",
    "kind": "input",
    "subkind": "csv-file"
  }
]

Results (prettified and summarized):

Query: call-context (1 ms)
   ╰ input
     ╰ csv-file: read_csv (L.6), read_csv (L.7)
All queries together required ≈1 ms (1ms accuracy, total 10 ms)

Show Detailed Results as Json

The analysis required 10.20 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "call-context": {
    ".meta": {
      "timing": 1
    },
    "kinds": {
      "input": {
        "subkinds": {
          "csv-file": [
            {
              "id": 16,
              "name": "read_csv",
              "calls": []
            },
            {
              "id": 22,
              "name": "read_csv",
              "calls": []
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 1
  }
}
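
If you want to post-process such a result, the nested kinds and subkinds can be flattened into a simple list. The following sketch assumes only the fields visible in the JSON above; the interface and function names are invented for this example.

```typescript
// Shape of a call-context result, reduced to the fields visible in the JSON above.
interface CallEntrySketch { id: number; name: string; calls?: unknown[]; }
interface CallContextResultSketch {
  kinds: Record<string, { subkinds: Record<string, CallEntrySketch[]> }>;
}
interface CallRow { kind: string; subkind: string; id: number; name: string; }

// Flatten the kind/subkind nesting into one row per identified call.
function listCalls(result: CallContextResultSketch): CallRow[] {
  const rows: CallRow[] = [];
  for (const kind of Object.keys(result.kinds)) {
    const group = result.kinds[kind];
    if (group === undefined) continue;
    for (const subkind of Object.keys(group.subkinds)) {
      for (const entry of group.subkinds[subkind] ?? []) {
        rows.push({ kind, subkind, id: entry.id, name: entry.name });
      }
    }
  }
  return rows;
}

// The result from above, without the .meta timing information.
const exampleResult: CallContextResultSketch = {
  kinds: {
    input: {
      subkinds: {
        "csv-file": [
          { id: 16, name: "read_csv", calls: [] },
          { id: 22, name: "read_csv", calls: [] },
        ],
      },
    },
  },
};
const callRows = listCalls(exampleResult);
```

For the result above, this yields two rows with the ids 16 and 22, both of kind input and subkind csv-file.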

Call-Context Query

Call-context queries can be used to identify calls to specific functions that match criteria of interest. For now, we support two criteria:

Function Name (callName): The function name is specified by a regular expression. This allows you to find all calls to functions that match a specific pattern. Please note that if you do not use regex anchors, the query will match any function name that contains the given pattern (you can set the callNameExact property to true to automatically add the ^...$ anchors).
Call Targets (callTargets): This specifies what the function call may target. For example, you may want to find all calls to a function that is not defined locally.
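
The effect of the anchors can be sketched as follows. The helper toCallNameRegex is hypothetical; wrapping the pattern in a non-capturing group is a choice made for this sketch so that alternations such as a|b stay anchored as a whole, and it is not necessarily what flowR does internally.

```typescript
// Hypothetical helper mirroring the described effect of callNameExact.
function toCallNameRegex(callName: string, callNameExact: boolean = false): RegExp {
  return new RegExp(callNameExact ? "^(?:" + callName + ")$" : callName);
}

const candidateNames = ["read_csv", "read_csv2", "my_read_csv"];
// Without anchors, every name that contains the pattern matches.
const looseMatches = candidateNames.filter(n => toCallNameRegex("read_csv").test(n));
// With anchors, only the exact name matches.
const exactMatches = candidateNames.filter(n => toCallNameRegex("read_csv", true).test(n));
```

In this sketch, the unanchored pattern matches all three names, while the anchored variant only matches read_csv.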

Besides this, we provide the following ways to automatically categorize and link identified invocations:

Kind (kind): This is a general category that can be used to group calls together. For example, you may want to link all calls to plot to visualize.
Subkind (subkind): This is used to uniquely identify the respective call type when grouping the output. For example, you may want to link all calls to ggplot to plot.
Linked Calls (linkTo): This links the current call to the last call of the given kind. This way, you can link a call like points to the latest graphics plot etc. For now, we only offer support for linking to the last call, as the current flow dependency over-approximation is not stable.
Aliases (includeAliases): Consider a case like f <- function_of_interest, do you want calls to f to be included in the results? There is probably no need to combine this with a global call target!
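
To illustrate the idea behind linking to the last call, here is a minimal sketch that assumes straight-line code, so that the last call is simply the nearest preceding call in source order. This is a simplification for illustration and not flowR's implementation; the names CallSite and linkToLastCall are invented, and the ids are the ones of plot and points in the dataflow graph of the example file.

```typescript
interface CallSite { id: number; name: string; line: number; }

// The plotting calls of the example file, in source order.
const plotCalls: CallSite[] = [
  { id: 67, name: "plot", line: 16 },
  { id: 79, name: "points", line: 17 },
];

// For every call matching `source`, remember the most recent preceding call matching `target`.
function linkToLastCall(all: CallSite[], source: RegExp, target: RegExp): Map<number, number[]> {
  const links = new Map<number, number[]>();
  let last: CallSite | undefined = undefined;
  for (const call of all) {
    if (source.test(call.name)) {
      links.set(call.id, last === undefined ? [] : [last.id]);
    }
    if (target.test(call.name)) {
      last = call;
    }
  }
  return links;
}

const pointLinks = linkToLastCall(plotCalls, /^points$/, /^plot$/);
```

Under these assumptions, the call to points (id 79) is linked to the preceding call to plot (id 67).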

It's also possible to filter the results based on the following properties:

File (fileFilter): This allows you to filter the results based on the file in which the call is located. This can be useful if you are only interested in calls in, e.g., specific folders. The fileFilter property is an object made up of two properties:

Filter (filter): A regular expression that a node's file attribute must match to be considered.
Include Undefined Files (includeUndefinedFiles): If fileFilter is set, but a node's file attribute is not present, should we include it in the results? Defaults to true.
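
The described filter semantics can be sketched like this. The property names follow the list above; the interface FileFilterSketch, the helper passesFileFilter, and the file paths are hypothetical.

```typescript
interface FileFilterSketch { filter: string; includeUndefinedFiles?: boolean; }

// Decide whether a node with the given (possibly missing) file attribute is kept.
function passesFileFilter(file: string | undefined, fileFilter?: FileFilterSketch): boolean {
  if (fileFilter === undefined) {
    return true; // no filter set: keep everything
  }
  if (file === undefined) {
    return fileFilter.includeUndefinedFiles ?? true; // documented default: true
  }
  return new RegExp(fileFilter.filter).test(file);
}

const keptInFolder = passesFileFilter("R/plot.R", { filter: "^R/" });
const droppedOutside = passesFileFilter("tests/a.R", { filter: "^R/" });
const keptWithoutFile = passesFileFilter(undefined, { filter: "^R/" });
const droppedWithoutFile = passesFileFilter(undefined, { filter: "^R/", includeUndefinedFiles: false });
```

Nodes without a file attribute are kept unless includeUndefinedFiles is explicitly set to false.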

Re-using the example code from above, the following query attaches all calls to mean to the kind visualize and the subkind text. It attaches all calls whose name starts with read_ to the kind input and the subkind csv-file, but only if they are not locally overwritten. Finally, it links all calls to points to the last call to plot:

[
  {
    "type": "call-context",
    "callName": "^mean$",
    "kind": "visualize",
    "subkind": "text"
  },
  {
    "type": "call-context",
    "callName": "^read_",
    "kind": "input",
    "subkind": "csv-file",
    "callTargets": "global"
  },
  {
    "type": "call-context",
    "callName": "^points$",
    "kind": "visualize",
    "subkind": "plot",
    "linkTo": {
      "type": "link-to-last-call",
      "callName": "^plot$"
    }
  }
]

Results (prettified and summarized):

Query: call-context (1 ms)
   ╰ input
     ╰ csv-file: read_csv (L.6), read_csv (L.7)
   ╰ visualize
     ╰ text: mean (L.9), mean (L.19)
     ╰ plot: points (L.17) with 1 link (plot (L.16))
All queries together required ≈1 ms (1ms accuracy, total 13 ms)

Show Detailed Results as Json

The analysis required 13.29 ms (including parsing and normalization and the query) within the generation environment.

{
  "call-context": {
    ".meta": {
      "timing": 1
    },
    "kinds": {
      "input": {
        "subkinds": {
          "csv-file": [
            {
              "id": 16,
              "name": "read_csv",
              "calls": []
            },
            {
              "id": 22,
              "name": "read_csv",
              "calls": []
            }
          ]
        }
      },
      "visualize": {
        "subkinds": {
          "text": [
            {
              "id": 31,
              "name": "mean"
            },
            {
              "id": 87,
              "name": "mean"
            }
          ],
          "plot": [
            {
              "id": 79,
              "name": "points",
              "linkedIds": [
                67
              ]
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 1
  }
}

As you can see, all kinds and subkinds with the same name are grouped together. Yet, re-stating common arguments and kinds may be cumbersome (although you can already use clever regex patterns). See the Compound Query for a way to structure your queries more compactly if you think it gets too verbose.
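
Conceptually, a compound query stands for several queries of the same type that share some arguments. The following sketch expands one by merging the common arguments with each per-query entry. The property names are taken from the schema above; the merge order (per-query arguments win over common ones) and the names CompoundQuerySketch and expandCompound are assumptions of this sketch.

```typescript
interface CompoundQuerySketch {
  type: "compound";
  query: string;                            // type of the queries to generate
  commonArguments: Record<string, unknown>; // shared by all generated queries
  arguments: Record<string, unknown>[];     // one entry per generated query
}

// Expand a compound query into the individual queries it stands for.
function expandCompound(c: CompoundQuerySketch): Record<string, unknown>[] {
  return c.arguments.map(args => ({ type: c.query, ...c.commonArguments, ...args }));
}

const compound: CompoundQuerySketch = {
  type: "compound",
  query: "call-context",
  commonArguments: { kind: "visualize", subkind: "text" },
  arguments: [{ callName: "^mean$" }, { callName: "^print$" }],
};
const expanded = expandCompound(compound);
```

In this sketch, the compound query expands to two call-context queries that share the kind visualize and the subkind text and differ only in their callName.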

Alias Example

Consider the following code:

foo <- my_test_function
foo()
if(u) bar <- foo
bar()
my_test_function()

Now let's say we want to query all uses of my_test_function:

[
  {
    "type": "call-context",
    "callName": "^my_test_function",
    "includeAliases": true
  }
]

Results (prettified and summarized):

Query: call-context (1 ms)
   ╰ .
     ╰ .: foo (L.2) with 1 alias root (my_test_function (L.1)), bar (L.4) with 1 alias root (my_test_function (L.1)), my_test_function (L.5)
All queries together required ≈1 ms (1ms accuracy, total 5 ms)

Show Detailed Results as Json

The analysis required 5.23 ms (including parsing and normalization and the query) within the generation environment.

{
  "call-context": {
    ".meta": {
      "timing": 1
    },
    "kinds": {
      ".": {
        "subkinds": {
          ".": [
            {
              "id": 4,
              "name": "foo",
              "aliasRoots": [
                1
              ]
            },
            {
              "id": 12,
              "name": "bar",
              "aliasRoots": [
                1
              ]
            },
            {
              "id": 14,
              "name": "my_test_function"
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 1
  }
}
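
The idea of an alias root can be sketched on the level of names. The sketch below ignores control flow (so the conditional assignment to bar is treated like an unconditional one) and follows each alias back to the name it ultimately refers to. This is a strong simplification for illustration and does not reflect how flowR tracks aliases in the dataflow graph; aliasOf and aliasRoot are made-up names.

```typescript
// Name-level alias assignments from the code above; the `if(u)` guard is ignored.
const aliasOf = new Map<string, string>([
  ["foo", "my_test_function"],
  ["bar", "foo"],
]);

// Follow the alias chain until a name without a recorded alias is reached.
function aliasRoot(start: string): string {
  const seen = new Set<string>();
  let current = start;
  for (;;) {
    const next = aliasOf.get(current);
    if (next === undefined || seen.has(current)) {
      return current; // no further alias, or a cycle was detected
    }
    seen.add(current);
    current = next;
  }
}

const calledNames = ["foo", "bar", "my_test_function"];
const matchingCalls = calledNames.filter(n => /^my_test_function/.test(aliasRoot(n)));
```

All three called names resolve to my_test_function, in line with the three entries reported above.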

Implementation Details

The Call-Context Query is executed by executeCallContextQueries in ./src/queries/catalog/call-context-query/call-context-query-executor.ts.

Config Query

This query provides access to the current configuration of the flowR instance. See the Interface wiki page for more information on what the configuration represents.

Implementation Details

The Config Query is executed by executeConfigQuery in ./src/queries/catalog/config-query/config-query-format.ts.

Dataflow Cluster Query

This query automatically calculates clusters in flowR's dataflow graph and returns a list of all clusters found. A cluster is a set of vertices that are connected to each other when edges are traversed in both directions (i.e., a connected component of the graph if the edge direction is ignored). From this perspective, the code x <- 1; x has one cluster (given that all code is related), while the code x <- 1; y has two clusters (given that y has no relation to the previous definition).
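
Read this way, clusters can be computed with a plain traversal that ignores the edge direction. The sketch below is not flowR's implementation; the function findClusters is hypothetical, and the edge lists are assumed by analogy to the edges of assignments and reads in the dataflow graph of the example file.

```typescript
// Group vertices into connected components, treating every edge as undirected.
function findClusters(vertices: number[], edges: [number, number][]): number[][] {
  const neighbours = new Map<number, number[]>();
  for (const v of vertices) {
    neighbours.set(v, []);
  }
  for (const [from, to] of edges) {
    neighbours.get(from)?.push(to);
    neighbours.get(to)?.push(from);
  }
  const seen = new Set<number>();
  const result: number[][] = [];
  for (const start of vertices) {
    if (seen.has(start)) continue;
    const members: number[] = [];
    const stack: number[] = [start];
    seen.add(start);
    while (stack.length > 0) {
      const v = stack.pop() as number;
      members.push(v);
      for (const n of neighbours.get(v) ?? []) {
        if (!seen.has(n)) {
          seen.add(n);
          stack.push(n);
        }
      }
    }
    result.push(members.sort((a, b) => a - b));
  }
  return result;
}

// Assumed edges for `x <- 1; x`: definition edges of the assignment plus a read of x (3 -> 0).
const oneCluster = findClusters([0, 1, 2, 3], [[0, 1], [0, 2], [2, 1], [2, 0], [3, 0]]);
// Assumed edges for `x <- 1; y`: the use of y (id 3) has no edge at all.
const twoClusters = findClusters([0, 1, 2, 3], [[0, 1], [0, 2], [2, 1], [2, 0]]);
```

With these assumed edges, the first call yields a single cluster with all four vertices, and the second one yields the clusters {0, 1, 2} and {3}.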

Example x <- 1; x

[
  {
    "type": "dataflow-cluster"
  }
]

Results (prettified and summarized):

Query: dataflow-cluster (1ms)
   ╰ Found 1 cluster
      ╰ {3, 0, 1, 2} (marked)
All queries together required ≈1 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 2.37 ms (including parsing and normalization and the query) within the generation environment.

{
  "dataflow-cluster": {
    ".meta": {
      "timing": 1
    },
    "clusters": [
      {
        "startNode": 3,
        "members": [
          3,
          0,
          1,
          2
        ],
        "hasUnknownSideEffects": false
      }
    ]
  },
  ".meta": {
    "timing": 1
  }
}

Example x <- 1; y

[
  {
    "type": "dataflow-cluster"
  }
]

Results (prettified and summarized):

Query: dataflow-cluster (0ms)
   ╰ Found 2 clusters
      ╰ {3} (marked)
      ╰ {2, 1, 0} (marked)
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 1.62 ms (including parsing and normalization and the query) within the generation environment.

{
  "dataflow-cluster": {
    ".meta": {
      "timing": 0
    },
    "clusters": [
      {
        "startNode": 3,
        "members": [
          3
        ],
        "hasUnknownSideEffects": false
      },
      {
        "startNode": 2,
        "members": [
          2,
          1,
          0
        ],
        "hasUnknownSideEffects": false
      }
    ]
  },
  ".meta": {
    "timing": 0
  }
}

Using the example code from above, the following query returns all clusters:

[ { "type": "dataflow-cluster" } ]

Results (prettified and summarized):

Query: dataflow-cluster (0ms)
   ╰ Found 5 clusters
      ╰ {89, 87, 85, 82, 18, 22, ... (see JSON)} (marked)
      ╰ {55, 52, 38, 12, 16, 14, ... (see JSON)} (marked)
      ╰ (has unknown side effect) {11, 9} (marked)
      ╰ (has unknown side effect) {7, 5} (marked)
      ╰ (has unknown side effect) {3, 1} (marked)
All queries together required ≈0 ms (1ms accuracy, total 7 ms)

Show Detailed Results as Json

The analysis required 7.19 ms (including parsing and normalization and the query) within the generation environment.

{
  "dataflow-cluster": {
    ".meta": {
      "timing": 0
    },
    "clusters": [
      {
        "startNode": 89,
        "members": [
          89,
          87,
          85,
          82,
          18,
          22,
          20,
          23,
          57,
          60,
          58,
          67,
          65,
          62,
          63,
          79,
          72,
          69,
          70,
          77,
          74,
          75,
          83
        ],
        "hasUnknownSideEffects": false
      },
      {
        "startNode": 55,
        "members": [
          55,
          52,
          38,
          12,
          16,
          14,
          17,
          26,
          29,
          27,
          31,
          32,
          24,
          34,
          36,
          50,
          48,
          43,
          44,
          46,
          47,
          54
        ],
        "hasUnknownSideEffects": false
      },
      {
        "startNode": 11,
        "members": [
          11,
          9
        ],
        "hasUnknownSideEffects": true
      },
      {
        "startNode": 7,
        "members": [
          7,
          5
        ],
        "hasUnknownSideEffects": true
      },
      {
        "startNode": 3,
        "members": [
          3,
          1
        ],
        "hasUnknownSideEffects": true
      }
    ]
  },
  ".meta": {
    "timing": 0
  }
}

Implementation Details

The Dataflow Cluster Query is executed by executeDataflowClusterQuery in ./src/queries/catalog/cluster-query/cluster-query-executor.ts.

Dataflow Query

This query simply returns the dataflow graph. It is useful if you want to handle only the results of the query execution, or if you just need the dataflow graph again.

Using the example code x + 1, the following query returns the dataflow graph of the code:

[ { "type": "dataflow" } ]

Results (prettified and summarized):

Query: dataflow (0 ms)
   ╰ Dataflow Graph
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 2.04 ms (including parsing and normalization and the query) within the generation environment.

As the output is pretty long, we disable pretty printing and syntax highlighting (JSON):

{"dataflow":{".meta":{"timing":0},"graph":{"_idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}},"_unknownSideEffects":[],"rootVertices":[0,1,2],"vertexInformation":[[0,{"tag":"use","id":0}],[1,{"tag":"value","id":1}],[2,{"tag":"function-call","id":2,"name":"+","onlyBuiltin":true,"args":[{"nodeId":0,"type":32},{"nodeId":1,"type":32}]}]],"edgeInformation":[[2,[[0,{"types":65}],[1,{"types":65}]]]]}},".meta":{"timing":0}}

Original Code

x + 1

Dataflow Graph of the R Code

The analysis required 1.73 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.

flowchart LR
    0(["`#91;RSymbol#93; x
      (0)
      *1.1*`"])
    1{{"`#91;RNumber#93; 1
      (1)
      *1.5*`"}}
    2[["`#91;RBinaryOp#93; #43;
      (2)
      *1.1-5*
    (0, 1)`"]]
    2 -->|"reads, argument"| 0
    2 -->|"reads, argument"| 1

Implementation Details

The Dataflow Query is executed by executeDataflowQuery in ./src/queries/catalog/dataflow-query/dataflow-query-executor.ts.

Dependencies Query

This query extracts all dependencies from an R script, using a combination of a Call-Context Query and more advanced tracking in the Dataflow Graph.

In other words, if you have a script simply reading: library(x), the following query returns the loaded library:

[ { "type": "dependencies" } ]

Results (prettified and summarized):

Query: dependencies (1 ms)
   ╰ Libraries
       ╰ library
           ╰ Node Id: 3, x
All queries together required ≈1 ms (1ms accuracy, total 3 ms)

Show Detailed Results as Json

The analysis required 2.79 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "dependencies": {
    ".meta": {
      "timing": 1
    },
    "libraries": [
      {
        "nodeId": 3,
        "functionName": "library",
        "libraryName": "x"
      }
    ],
    "sourcedFiles": [],
    "readData": [],
    "writtenData": []
  },
  ".meta": {
    "timing": 1
  }
}

Of course, this works for more complicated scripts too. The query offers information on the loaded libraries, sourced files, data which is read and data which is written. For example, consider the following script:

source("sample.R")
foo <- loadNamespace("bar")

data <- read.csv("data.csv")

#' @importFrom ggplot2 ggplot geom_point aes
ggplot(data, aes(x=x, y=y)) + geom_point()

better::write.csv(data, "data2.csv")
print("hello world!")

The following query returns the dependencies of the script.

[ { "type": "dependencies" } ]

Show Results

Results (prettified and summarized):

Query: dependencies (1 ms)
   ╰ Libraries
       ╰ loadNamespace
           ╰ Node Id: 8, bar
       ╰ ::
           ╰ Node Id: 32, better
   ╰ Sourced Files
       ╰ source
           ╰ Node Id: 3, sample.R
   ╰ Read Data
       ╰ read.csv
           ╰ Node Id: 14, data.csv
   ╰ Written Data
       ╰ write.csv
           ╰ Node Id: 37, data2.csv
       ╰ print
           ╰ Node Id: 41, stdout
All queries together required ≈1 ms (1ms accuracy, total 6 ms)

Show Detailed Results as Json

The analysis required 5.93 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "dependencies": {
    ".meta": {
      "timing": 1
    },
    "libraries": [
      {
        "nodeId": 8,
        "functionName": "loadNamespace",
        "libraryName": "bar"
      },
      {
        "nodeId": 32,
        "functionName": "::",
        "libraryName": "better"
      }
    ],
    "sourcedFiles": [
      {
        "nodeId": 3,
        "functionName": "source",
        "file": "sample.R"
      }
    ],
    "readData": [
      {
        "nodeId": 14,
        "functionName": "read.csv",
        "source": "data.csv"
      }
    ],
    "writtenData": [
      {
        "nodeId": 37,
        "functionName": "write.csv",
        "destination": "data2.csv"
      },
      {
        "nodeId": 41,
        "functionName": "print",
        "destination": "stdout"
      }
    ]
  },
  ".meta": {
    "timing": 1
  }
}
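To work with this result programmatically, the following TypeScript sketch models the shape of the detailed JSON above and collects every file the script touches. The interface names and the touchedFiles helper are assumptions made for illustration (they are not taken from flowR's sources); only the field names come from the output shown.

```typescript
// Shapes inferred from the JSON output above; the interface names are illustrative only.
interface DependencyInfo {
    nodeId: number;
    functionName: string;
}
interface LibraryInfo extends DependencyInfo { libraryName: string }
interface SourceInfo extends DependencyInfo { file: string }
interface ReadInfo extends DependencyInfo { source: string }
interface WriteInfo extends DependencyInfo { destination: string }

interface DependenciesResult {
    libraries: LibraryInfo[];
    sourcedFiles: SourceInfo[];
    readData: ReadInfo[];
    writtenData: WriteInfo[];
}

// Collects the files a script sources, reads, or writes,
// skipping the pseudo-destination "stdout" reported for print.
function touchedFiles(deps: DependenciesResult): string[] {
    const files = [
        ...deps.sourcedFiles.map(s => s.file),
        ...deps.readData.map(r => r.source),
        ...deps.writtenData.map(w => w.destination)
    ];
    return files.filter(f => f !== 'stdout');
}

// The "dependencies" part of the detailed result shown above.
const sampleDependencies: DependenciesResult = {
    libraries: [
        { nodeId: 8, functionName: 'loadNamespace', libraryName: 'bar' },
        { nodeId: 32, functionName: '::', libraryName: 'better' }
    ],
    sourcedFiles: [{ nodeId: 3, functionName: 'source', file: 'sample.R' }],
    readData: [{ nodeId: 14, functionName: 'read.csv', source: 'data.csv' }],
    writtenData: [
        { nodeId: 37, functionName: 'write.csv', destination: 'data2.csv' },
        { nodeId: 41, functionName: 'print', destination: 'stdout' }
    ]
};

console.log(touchedFiles(sampleDependencies));
```

Keeping one list per category, as the result does, makes it easy to treat libraries differently from file accesses.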

Currently, the dependency extraction may fail, as it is essentially a set of heuristics that guess the dependencies. We welcome any feedback on this (consider opening a new issue).

In the meantime, we offer several properties to override the default behavior (e.g., the function names that should be collected):

[
  {
    "type": "dependencies",
    "ignoreDefaultFunctions": true,
    "libraryFunctions": [
      {
        "name": "print",
        "argIdx": 0,
        "argName": "library"
      }
    ],
    "sourceFunctions": [],
    "readFunctions": [],
    "writeFunctions": []
  }
]

Show Results

Results (prettified and summarized):

Query: dependencies (0 ms)
   ╰ Libraries
       ╰ print
           ╰ Node Id: 41, hello world!
All queries together required ≈0 ms (1ms accuracy, total 5 ms)

Show Detailed Results as Json

The analysis required 5.13 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "dependencies": {
    ".meta": {
      "timing": 0
    },
    "libraries": [
      {
        "nodeId": 41,
        "functionName": "print",
        "libraryName": "hello world!"
      }
    ],
    "sourcedFiles": [],
    "readData": [],
    "writtenData": []
  },
  ".meta": {
    "timing": 0
  }
}
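Such a customized query can also be assembled in code. The sketch below is a minimal TypeScript illustration; the FunctionInfo and DependenciesQuery types and the onlyLibraryFunctions helper are hypothetical names, and only the property names are taken from the query above. Whether argIdx and argName are optional cannot be told from the example, so the sketch simply requires both.

```typescript
// Property names follow the query shown above; the type names are hypothetical.
interface FunctionInfo {
    name: string;    // function to track, e.g. "print"
    argIdx: number;  // argument index used in the example (0 selects "hello world!")
    argName: string; // argument name as given in the example
}

interface DependenciesQuery {
    type: 'dependencies';
    ignoreDefaultFunctions?: boolean;
    libraryFunctions?: FunctionInfo[];
    sourceFunctions?: FunctionInfo[];
    readFunctions?: FunctionInfo[];
    writeFunctions?: FunctionInfo[];
}

// Builds a query that ignores the default function lists and
// reports only the given functions in the "libraries" category.
function onlyLibraryFunctions(functions: FunctionInfo[]): DependenciesQuery {
    return {
        type: 'dependencies',
        ignoreDefaultFunctions: true,
        libraryFunctions: functions,
        sourceFunctions: [],
        readFunctions: [],
        writeFunctions: []
    };
}

const customQuery = [onlyLibraryFunctions([{ name: 'print', argIdx: 0, argName: 'library' }])];
console.log(JSON.stringify(customQuery, null, 2));
```

Serializing customQuery yields the same JSON array as the query shown above.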

Implementation Details

The Dependencies Query is executed by executeDependenciesQuery in ./src/queries/catalog/dependencies-query/dependencies-query-executor.ts.

Happens-Before Query

With this query you can analyze the control flow graph and check whether one node happens before another.

Using the example code:

x <- 1
y <- 2

the following query reports that the first assignment always happens before the second:

[
  {
    "type": "happens-before",
    "a": "1@x",
    "b": "2@y"
  }
]

Results (prettified and summarized):

Query: happens-before (1 ms)
╰ 1@x<2@y: always
All queries together required ≈1 ms (1ms accuracy, total 3 ms)

Show Detailed Results as Json

The analysis required 2.85 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "happens-before": {
    ".meta": {
      "timing": 1
    },
    "results": {
      "1@x<2@y": "always"
    }
  },
  ".meta": {
    "timing": 1
  }
}
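In the detailed result, each entry of results is keyed by the two criteria joined with "<". A minimal TypeScript sketch of looking up the verdict for a query could look as follows; the type names and the verdictFor helper are assumptions for illustration, and since only the verdict "always" appears in this example, verdicts are modeled as plain strings.

```typescript
interface HappensBeforeQuery {
    type: 'happens-before';
    a: string; // criterion of the first node, e.g. "1@x"
    b: string; // criterion of the second node, e.g. "2@y"
}

// Keys have the form "a<b", as in the result shown above.
type HappensBeforeResults = Record<string, string>;

// Returns the verdict for the query, or undefined if the result has no such entry.
function verdictFor(q: HappensBeforeQuery, results: HappensBeforeResults): string | undefined {
    return results[`${q.a}<${q.b}`];
}

const hbQuery: HappensBeforeQuery = { type: 'happens-before', a: '1@x', b: '2@y' };
const hbResults: HappensBeforeResults = { '1@x<2@y': 'always' };

console.log(verdictFor(hbQuery, hbResults));
```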

Original Code

x <- 1
y <- 2

Dataflow Graph of the R Code

The analysis required 1.48 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.

flowchart LR
    1{{"`#91;RNumber#93; 1
      (1)
      *1.6*`"}}
    0["`#91;RSymbol#93; x
      (0)
      *1.1*`"]
    2[["`#91;RBinaryOp#93; #60;#45;
      (2)
      *1.1-6*
    (0, 1)`"]]
    4{{"`#91;RNumber#93; 2
      (4)
      *2.6*`"}}
    3["`#91;RSymbol#93; y
      (3)
      *2.1*`"]
    5[["`#91;RBinaryOp#93; #60;#45;
      (5)
      *2.1-6*
    (3, 4)`"]]
    0 -->|"defined-by"| 1
    0 -->|"defined-by"| 2
    2 -->|"argument"| 1
    2 -->|"returns, argument"| 0
    3 -->|"defined-by"| 4
    3 -->|"defined-by"| 5
    5 -->|"argument"| 4
    5 -->|"returns, argument"| 3

Implementation Details

The Happens-Before Query is executed by executeSearch in ./src/queries/catalog/happens-before-query/happens-before-query-executor.ts.

Id-Map Query

This query provides access to all nodes in the normalized AST as a mapping from their id to the node itself.

Using the example code x + 1, the following query returns all nodes from the code:

[ { "type": "id-map" } ]

Results (prettified and summarized):

Query: id-map (0 ms)
╰ Id List: {0, 1, 2, 3, 2-arg, 0-arg, ... (see JSON)}
All queries together required ≈0 ms (1ms accuracy, total 1 ms)

Show Detailed Results as Json

The analysis required 1.23 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

As the code is pretty long, we inhibit pretty printing and syntax highlighting (JSON):

{"id-map":{".meta":{"timing":0},"idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"ro
le":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}}},".meta":{"timing":0}}

Original Code

x + 1

Dataflow Graph of the R Code

The analysis required 1.23 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.

flowchart LR
    0(["`#91;RSymbol#93; x
      (0)
      *1.1*`"])
    1{{"`#91;RNumber#93; 1
      (1)
      *1.5*`"}}
    2[["`#91;RBinaryOp#93; #43;
      (2)
      *1.1-5*
    (0, 1)`"]]
    2 -->|"reads, argument"| 0
    2 -->|"reads, argument"| 1

Implementation Details

The Id-Map Query is executed by executeIdMapQuery in ./src/queries/catalog/id-map-query/id-map-query-executor.ts.
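In the detailed JSON result of this query, the k2v field is a list of [id, node] pairs, including string keys such as "0-arg". The following TypeScript sketch turns such a list into a Map for direct lookups. The type names and the toLookup helper are hypothetical, only a few node fields are modeled, and the sample is a heavily trimmed excerpt of the result.

```typescript
type NodeId = number | string;

// Only the fields used below are modeled; real nodes carry more information.
interface AstNode {
    type: string;
    lexeme?: string;
    info: { id: NodeId; parent?: NodeId; role?: string };
}

interface SerializedIdMap {
    size: number;
    k2v: [NodeId, AstNode][];
}

// Converts the serialized pair list into a Map keyed by node id.
function toLookup(idMap: SerializedIdMap): Map<NodeId, AstNode> {
    return new Map(idMap.k2v);
}

// Trimmed excerpt of the result for "x + 1".
const sampleIdMap: SerializedIdMap = {
    size: 3,
    k2v: [
        [0, { type: 'RSymbol', lexeme: 'x', info: { id: 0, parent: 2, role: 'binop-lhs' } }],
        [1, { type: 'RNumber', lexeme: '1', info: { id: 1, parent: 2, role: 'binop-rhs' } }],
        ['0-arg', { type: 'RSymbol', lexeme: 'x', info: { id: 0, parent: 2, role: 'binop-lhs' } }]
    ]
};

const lookup = toLookup(sampleIdMap);
console.log(lookup.get(1)?.lexeme);
```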

Lineage Query

This query calculates the lineage of a given slicing criterion. The lineage traces back all parts that the respective variable stems from, given the reads, definitions, and returns in the dataflow graph.

To understand this, let's start with a simple example query that gets the lineage of the second occurrence of x in the following code:

x <- 1
x

For this, we use the criterion 2@x (which selects the first occurrence of x in the second line).

[
  {
    "type": "lineage",
    "criterion": "2@x"
  }
]

Results (prettified and summarized):

Query: lineage (0 ms)
╰ 2@x: {3, 0, 1, 2}
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 1.57 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "lineage": {
    ".meta": {
      "timing": 0
    },
    "lineages": {
      "2@x": [
        3,
        0,
        1,
        2
      ]
    }
  },
  ".meta": {
    "timing": 0
  }
}

In this simple scenario, the lineage is equivalent to the slice (and in fact the complete code). In general, the lineage is smaller and makes no executability guarantees. It is just a quick way, neither complete nor sound, to get information on where the variable originates from.
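To give an intuition for the tracing, the following TypeScript sketch runs a breadth-first traversal over a toy edge list, following only reads, defined-by, and returns edges. This is a simplified model and not flowR's implementation: the Edge type and the toyLineage function are hypothetical, and the edges for the example code are assumed by analogy with the dataflow graphs shown elsewhere on this page.

```typescript
type EdgeKind = 'reads' | 'defined-by' | 'returns' | 'argument';

interface Edge {
    from: number;
    to: number;
    kinds: EdgeKind[];
}

// Edge kinds followed when tracing back; "defined-by" stands in for definitions.
const followedKinds: EdgeKind[] = ['reads', 'defined-by', 'returns'];

// Breadth-first traversal from the start node along the followed edge kinds.
function toyLineage(start: number, edges: Edge[]): number[] {
    const visited = new Set<number>([start]);
    const queue: number[] = [start];
    while (queue.length > 0) {
        const current = queue.shift() as number;
        for (const edge of edges) {
            const relevant = edge.kinds.some(kind => followedKinds.indexOf(kind) >= 0);
            if (edge.from === current && relevant && !visited.has(edge.to)) {
                visited.add(edge.to);
                queue.push(edge.to);
            }
        }
    }
    return Array.from(visited);
}

// Assumed edges for "x <- 1" followed by "x" (node 3 is the use in line 2).
const toyEdges: Edge[] = [
    { from: 0, to: 1, kinds: ['defined-by'] },
    { from: 0, to: 2, kinds: ['defined-by'] },
    { from: 2, to: 1, kinds: ['argument'] },
    { from: 2, to: 0, kinds: ['returns', 'argument'] },
    { from: 3, to: 0, kinds: ['reads'] }
];

console.log(toyLineage(3, toyEdges));
```

Under these assumptions, the traversal starting at node 3 visits the same ids as the query result above.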

This query replaces the old request-lineage message.

Implementation Details

The Lineage Query is executed by executeLineageQuery in ./src/queries/catalog/lineage-query/lineage-query-executor.ts.

Location Map Query

A query like the Id-Map Query can return a very large result, especially for larger scripts. If you are not interested in all of the information contained within the full map, you can use the location map query to get a simple mapping of ids to their location in the source file.

Consider the following code:

x + 1
x * 2

The following query then gives you the aforementioned mapping:

[ { "type": "location-map" } ]

Results (prettified and summarized):

Query: location-map (0 ms)
╰ Id List: {0, 1, 2, 3, 4, 5, 6, ... (see JSON)}
All queries together required ≈0 ms (1ms accuracy, total 6 ms)

Show Detailed Results as Json

The analysis required 6.04 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "location-map": {
    ".meta": {
      "timing": 0
    },
    "map": {
      "0": [
        1,
        1,
        1,
        1
      ],
      "1": [
        1,
        5,
        1,
        5
      ],
      "2": [
        1,
        3,
        1,
        3
      ],
      "3": [
        2,
        1,
        2,
        1
      ],
      "4": [
        2,
        5,
        2,
        5
      ],
      "5": [
        2,
        3,
        2,
        3
      ],
      "2-arg": [
        1,
        3,
        1,
        3
      ],
      "5-arg": [
        2,
        3,
        2,
        3
      ],
      "0-arg": [
        1,
        1,
        1,
        1
      ],
      "1-arg": [
        1,
        5,
        1,
        5
      ],
      "3-arg": [
        2,
        1,
        2,
        1
      ],
      "4-arg": [
        2,
        5,
        2,
        5
      ]
    }
  },
  ".meta": {
    "timing": 0
  }
}

All locations are given as a SourceRange in the format [start-line, start-column, end-line, end-column].
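As a small illustration of this format, the TypeScript sketch below models a SourceRange as a four-element tuple and lists all ids whose range covers a given line. The LocationMap type and the idsOnLine helper are assumptions made for this example, and the sample data is an excerpt of the result above.

```typescript
// [start-line, start-column, end-line, end-column], as stated above.
type SourceRange = [number, number, number, number];
type LocationMap = Record<string, SourceRange>;

// Returns all ids whose range spans the given (1-based) line.
function idsOnLine(map: LocationMap, line: number): string[] {
    return Object.keys(map).filter(id => {
        const [startLine, , endLine] = map[id];
        return startLine <= line && line <= endLine;
    });
}

// Excerpt of the "map" entry from the result above.
const sampleLocations: LocationMap = {
    '0': [1, 1, 1, 1],
    '1': [1, 5, 1, 5],
    '2': [1, 3, 1, 3],
    '3': [2, 1, 2, 1],
    '4': [2, 5, 2, 5],
    '5': [2, 3, 2, 3]
};

console.log(idsOnLine(sampleLocations, 2));
```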

Implementation Details

The Location Map Query is executed by executeLocationMapQuery in ./src/queries/catalog/location-map-query/location-map-query-executor.ts.

Normalized AST Query

Maybe you want to handle only the result of the query execution, or you just need the normalized AST again. This query type does exactly that!

Using the example code x + 1, the following query returns the normalized AST of the code:

[ { "type": "normalized-ast" } ]

Results (prettified and summarized):

Query: normalized-ast (0 ms)
╰ Normalized AST
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 1.71 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

As the code is pretty long, we inhibit pretty printing and syntax highlighting (JSON):

{"normalized-ast":{".meta":{"timing":0},"normalized":{"ast":{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}},"idMap":{"size":7,"k2v":[[0,{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],[1,{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}],[2,{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],[3,{"type":"RExpressionList","children":[{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRang
e":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],"info":{"additionalTokens":[],"id":3,"nesting":0,"role":"root","index":0}}],["2-arg",{"type":"RBinaryOp","location":[1,3,1,3],"lhs":{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}},"rhs":{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}},"operator":"+","lexeme":"+","info":{"fullRange":[1,1,1,5],"additionalTokens":[],"id":2,"parent":3,"nesting":0,"index":0,"role":"expr-list-child"}}],["0-arg",{"type":"RSymbol","location":[1,1,1,1],"content":"x","lexeme":"x","info":{"fullRange":[1,1,1,1],"additionalTokens":[],"id":0,"parent":2,"role":"binop-lhs","index":0,"nesting":0}}],["1-arg",{"location":[1,5,1,5],"lexeme":"1","info":{"fullRange":[1,5,1,5],"additionalTokens":[],"id":1,"parent":2,"role":"binop-rhs","index":1,"nesting":0},"type":"RNumber","content":{"num":1,"complexNumber":false,"markedAsInt":false}}]],"v2k":{}},".meta":{"timing":1}}},".meta":{"timing":0}}

Original Code

x + 1

Dataflow Graph of the R Code

The analysis required 1.28 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.

flowchart LR
    0(["`#91;RSymbol#93; x
      (0)
      *1.1*`"])
    1{{"`#91;RNumber#93; 1
      (1)
      *1.5*`"}}
    2[["`#91;RBinaryOp#93; #43;
      (2)
      *1.1-5*
    (0, 1)`"]]
    2 -->|"reads, argument"| 0
    2 -->|"reads, argument"| 1

Implementation Details

The Normalized AST Query is executed by executeNormalizedAstQuery in ./src/queries/catalog/normalized-ast-query/normalized-ast-query-executor.ts.

Resolve Value Query

With this query you can use flowR's value-tracking capabilities to resolve identifiers to all potential values they may have at runtime (if possible). The extent to which flowR traces values (e.g., built-ins vs. constants) can be configured in flowR's configuration file (see the Interface wiki page for more information).

Using the example code x <- 1 followed by print(x) on the next line, the following query returns all values of 'x' in the code:

[
  {
    "type": "resolve-value",
    "criteria": [
      "2@x"
    ]
  }
]

Results (prettified and summarized):

Query: resolve-value (0 ms)
╰ Values for {2@x}
╰ 1
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 1.70 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "resolve-value": {
    ".meta": {
      "timing": 0
    },
    "results": {
      "{\"type\":\"resolve-value\",\"criteria\":[\"2@x\"]}": {
        "values": [
          {
            "num": 1,
            "complexNumber": false,
            "markedAsInt": false
          }
        ]
      }
    }
  },
  ".meta": {
    "timing": 0
  }
}
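In the result above, the key of each entry in results is the query itself serialized as JSON. A minimal TypeScript sketch of reading the values back for a given query is shown below. The type names and the numbersFor helper are hypothetical, only numeric values as in this example are modeled, and the lookup assumes that the query object lists its properties in the same order as the serialized key.

```typescript
interface ResolveValueQuery {
    type: 'resolve-value';
    criteria: string[];
}

// Only numeric values, as in the example above, are modeled here.
interface NumberValue {
    num: number;
    complexNumber: boolean;
    markedAsInt: boolean;
}

type ResolveValueResults = Record<string, { values: NumberValue[] }>;

// Looks up the entry keyed by the serialized query and extracts the plain numbers.
function numbersFor(q: ResolveValueQuery, results: ResolveValueResults): number[] {
    const entry = results[JSON.stringify(q)];
    return entry === undefined ? [] : entry.values.map(value => value.num);
}

const rvQuery: ResolveValueQuery = { type: 'resolve-value', criteria: ['2@x'] };
const rvResults: ResolveValueResults = {
    '{"type":"resolve-value","criteria":["2@x"]}': {
        values: [{ num: 1, complexNumber: false, markedAsInt: false }]
    }
};

console.log(numbersFor(rvQuery, rvResults));
```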

Original Code

x <- 1
print(x)

Dataflow Graph of the R Code

The analysis required 1.49 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered unknown side effects (with ids: 6 (linked)) during the analysis.

flowchart LR
    1{{"`#91;RNumber#93; 1
      (1)
      *1.6*`"}}
    0["`#91;RSymbol#93; x
      (0)
      *1.1*`"]
    2[["`#91;RBinaryOp#93; #60;#45;
      (2)
      *1.1-6*
    (0, 1)`"]]
    4(["`#91;RSymbol#93; x
      (4)
      *2.7*`"])
    6[["`#91;RFunctionCall#93; print
      (6)
      *2.1-8*
    (4)`"]]
    0 -->|"defined-by"| 1
    0 -->|"defined-by"| 2
    2 -->|"argument"| 1
    2 -->|"returns, argument"| 0
    4 -->|"reads"| 0
    6 -->|"reads, returns, argument"| 4

Implementation Details

The Resolve Value Query is executed by executeSearch in ./src/queries/catalog/resolve-value-query/resolve-value-query-executor.ts.

Search Query

With this query you can use the Search API to conduct searches on the flowR analysis result.

Using the example code x + 1, the following query returns all uses of 'x' in the code:

[
  {
    "type": "search",
    "search": {
      "generator": {
        "type": "generator",
        "name": "get",
        "args": {
          "filter": {
            "name": "x"
          }
        }
      },
      "search": [
        {
          "type": "transformer",
          "name": "filter",
          "args": {
            "filter": "use"
          }
        }
      ]
    }
  }
]

Results (prettified and summarized):

Query: search (0 ms)
╰ query: {0}
All queries together required ≈0 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 1.71 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "search": {
    ".meta": {
      "timing": 0
    },
    "results": [
      {
        "ids": [
          0
        ],
        "search": {
          "generator": {
            "type": "generator",
            "name": "get",
            "args": {
              "filter": {
                "name": "x"
              }
            }
          },
          "search": [
            {
              "type": "transformer",
              "name": "filter",
              "args": {
                "filter": "use"
              }
            }
          ]
        }
      }
    ]
  },
  ".meta": {
    "timing": 0
  }
}
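If you issue this kind of search for several symbols, the query object can be produced by a small helper. The following TypeScript sketch mirrors the JSON structure shown above; the SearchQuery type and the usesOf function are hypothetical names for illustration and are not part of the Search API.

```typescript
// Mirrors the structure of the query above; arguments are kept generic.
interface SearchQuery {
    type: 'search';
    search: {
        generator: { type: 'generator'; name: string; args: Record<string, unknown> };
        search: { type: 'transformer'; name: string; args: Record<string, unknown> }[];
    };
}

// Builds a search for all uses of the given symbol name.
function usesOf(symbol: string): SearchQuery {
    return {
        type: 'search',
        search: {
            generator: { type: 'generator', name: 'get', args: { filter: { name: symbol } } },
            search: [{ type: 'transformer', name: 'filter', args: { filter: 'use' } }]
        }
    };
}

console.log(JSON.stringify([usesOf('x')], null, 2));
```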

Original Code

x + 1

Dataflow Graph of the R Code

The analysis required 1.28 ms (including parse and normalize, using the r-shell engine) within the generation environment. We encountered no unknown side effects during the analysis.

flowchart LR
    0(["`#91;RSymbol#93; x
      (0)
      *1.1*`"])
    1{{"`#91;RNumber#93; 1
      (1)
      *1.5*`"}}
    2[["`#91;RBinaryOp#93; #43;
      (2)
      *1.1-5*
    (0, 1)`"]]
    2 -->|"reads, argument"| 0
    2 -->|"reads, argument"| 1

Implementation Details

The Search Query is executed by executeSearch in ./src/queries/catalog/search-query/search-query-executor.ts.

Static Slice Query

To slice, flowR needs one thing from you: a variable or a list of variables (function calls are supported too, referring to the anonymous return of the call) that you want to slice the dataflow graph for. Given this, the slice is essentially the part of the program that may influence the value of the variables you are interested in. To specify a variable of interest, you have to present flowR with a slicing criterion (or, respectively, an array of them).

To exemplify the capabilities, consider the following code:

x <- 1
y <- 2
x

If you are interested in the parts required for the use of x in the last line, you can use the following query:

[
  {
    "type": "static-slice",
    "criteria": [
      "3@x"
    ]
  }
]

Results (prettified and summarized):

Query: static-slice (2 ms)
╰ Slice for {3@x}
╰ Code (newline as \n): x <- 1\nx
All queries together required ≈2 ms (1ms accuracy, total 4 ms)

Show Detailed Results as Json

The analysis required 3.67 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "static-slice": {
    ".meta": {
      "timing": 2
    },
    "results": {
      "{\"type\":\"static-slice\",\"criteria\":[\"3@x\"]}": {
        "slice": {
          "timesHitThreshold": 0,
          "result": [
            6,
            0,
            1,
            2
          ],
          "decodedCriteria": [
            {
              "criterion": "3@x",
              "id": 6
            }
          ],
          ".meta": {
            "timing": 1
          }
        },
        "reconstruct": {
          "code": "x <- 1\nx",
          "linesWithAutoSelected": 0,
          ".meta": {
            "timing": 1
          }
        }
      }
    }
  },
  ".meta": {
    "timing": 2
  }
}

In general, you may not be interested in the reconstructed code and may want to save some computation time. For this, you can use the noReconstruction flag.

No Reconstruction Example

[
  {
    "type": "static-slice",
    "criteria": [
      "3@x"
    ],
    "noReconstruction": true
  }
]

Results (prettified and summarized):

Query: static-slice (1 ms)
╰ Slice for {3@x} no reconstruction
╰ Id List: {6, 0, 1, 2}
All queries together required ≈1 ms (1ms accuracy, total 2 ms)

Show Detailed Results as Json

The analysis required 2.24 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "static-slice": {
    ".meta": {
      "timing": 1
    },
    "results": {
      "{\"type\":\"static-slice\",\"criteria\":[\"3@x\"],\"noReconstruction\":true}": {
        "slice": {
          "timesHitThreshold": 0,
          "result": [
            6,
            0,
            1,
            2
          ],
          "decodedCriteria": [
            {
              "criterion": "3@x",
              "id": 6
            }
          ],
          ".meta": {
            "timing": 1
          }
        }
      }
    }
  },
  ".meta": {
    "timing": 1
  }
}
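Code that consumes slice results therefore has to handle both variants. The TypeScript sketch below models one entry of results with an optional reconstruct part, as seen in the two outputs above; the SliceEntry type and the summarizeSlice helper are assumptions for illustration, and node ids are modeled as numbers only.

```typescript
interface SliceEntry {
    slice: {
        timesHitThreshold: number;
        result: number[];
        decodedCriteria: { criterion: string; id: number }[];
    };
    // Absent when the query sets noReconstruction.
    reconstruct?: { code: string; linesWithAutoSelected: number };
}

// Prefers the reconstructed code and falls back to the list of node ids.
function summarizeSlice(entry: SliceEntry): string {
    if (entry.reconstruct !== undefined) {
        return entry.reconstruct.code;
    }
    return `ids: ${entry.slice.result.join(', ')}`;
}

const slicePart = {
    timesHitThreshold: 0,
    result: [6, 0, 1, 2],
    decodedCriteria: [{ criterion: '3@x', id: 6 }]
};
const withCode: SliceEntry = {
    slice: slicePart,
    reconstruct: { code: 'x <- 1\nx', linesWithAutoSelected: 0 }
};
const withoutCode: SliceEntry = { slice: slicePart };

console.log(summarizeSlice(withCode));
console.log(summarizeSlice(withoutCode));
```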

You can disable magic comments using the noMagicComments flag. This query replaces the old request-slice message.

Implementation Details

The Static Slice Query is executed by executeStaticSliceQuery in ./src/queries/catalog/static-slice-query/static-slice-query-executor.ts.

Compound Query

A compound query is useful whenever we want to state multiple queries of the same type with a set of common arguments. It offers the following properties of interest:

Query (query): The type of the query that is to be combined.
Common Arguments (commonArguments): The arguments that are to be used as defaults for all queries (i.e., any argument the query may have).
Arguments (arguments): The other arguments for the individual queries that are to be combined.

For example, consider the following compound query that combines two call-context queries for mean and print, both of which are to be assigned to the kind visualize and the subkind text (using the example code from above):

[
  {
    "type": "compound",
    "query": "call-context",
    "commonArguments": {
      "kind": "visualize",
      "subkind": "text"
    },
    "arguments": [
      {
        "callName": "^mean$"
      },
      {
        "callName": "^print$"
      }
    ]
  }
]

Results (prettified and summarized):

Query: call-context (0 ms)
╰ visualize
╰ text: mean (L.9), print (L.10), mean (L.19), print (L.19)
All queries together required ≈1 ms (1ms accuracy, total 6 ms)

Show Detailed Results as Json

The analysis required 6.16 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "call-context": {
    ".meta": {
      "timing": 0
    },
    "kinds": {
      "visualize": {
        "subkinds": {
          "text": [
            {
              "id": 31,
              "name": "mean"
            },
            {
              "id": 36,
              "name": "print"
            },
            {
              "id": 87,
              "name": "mean"
            },
            {
              "id": 89,
              "name": "print"
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 1
  }
}

Of course, in this specific scenario, the following query would be equivalent:

[
  {
    "type": "call-context",
    "callName": "^(mean|print)$",
    "kind": "visualize",
    "subkind": "text"
  }
]

Results (prettified and summarized):

Query: call-context (0 ms)
   ╰ visualize
     ╰ text: mean (L.9), print (L.10), mean (L.19), print (L.19)
All queries together required ≈0 ms (1ms accuracy, total 6 ms)

Show Detailed Results as Json

The analysis required 5.80 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "call-context": {
    ".meta": {
      "timing": 0
    },
    "kinds": {
      "visualize": {
        "subkinds": {
          "text": [
            {
              "id": 31,
              "name": "mean"
            },
            {
              "id": 36,
              "name": "print"
            },
            {
              "id": 87,
              "name": "mean"
            },
            {
              "id": 89,
              "name": "print"
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 0
  }
}
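The equivalence of the two formulations rests on the union in the regular expression. As a small sanity check, and assuming that callName is interpreted as a regular expression (as the anchors ^ and $ suggest), the following sketch compares the two separate patterns with the combined one using plain JavaScript regular expressions. The candidate names are made up for illustration.

```typescript
// The two patterns from the compound query and the combined pattern.
const separate: RegExp[] = [/^mean$/, /^print$/];
const union = /^(mean|print)$/;

// Hypothetical function names to classify.
const names = ['mean', 'print', 'printf', 'mean2', 'plot'];

const comparison = names.map(name => ({
  name,
  viaSeparate: separate.some(re => re.test(name)),
  viaUnion: union.test(name)
}));

for (const { name, viaSeparate, viaUnion } of comparison) {
  console.log(`${name}: separate=${viaSeparate}, union=${viaUnion}`);
}
```

Both forms classify every candidate identically: only mean and print match.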

However, compound queries become more useful whenever the differences between the queries cannot be expressed as a union within a single property. Additionally, you can still override the default arguments for individual queries. In the following example, calls should (by default) not resolve to a local definition, except for calls to print, which we explicitly require to resolve to a local definition:

[
  {
    "type": "compound",
    "query": "call-context",
    "commonArguments": {
      "kind": "visualize",
      "subkind": "text",
      "callTargets": "global"
    },
    "arguments": [
      {
        "callName": "^mean$"
      },
      {
        "callName": "^print$",
        "callTargets": "local"
      }
    ]
  }
]

Results (prettified and summarized):

Query: call-context (0 ms)
   ╰ visualize
     ╰ text: mean (L.9) with 1 call (built-in), mean (L.19) with 1 call (built-in)
All queries together required ≈0 ms (1ms accuracy, total 10 ms)

Show Detailed Results as Json

The analysis required 9.96 ms (including parsing and normalization and the query) within the generation environment.

In general, the JSON contains the Ids of the nodes in question as they are present in the normalized AST or the dataflow graph of flowR. Please consult the Interface wiki page for more information on how to get those.

{
  "call-context": {
    ".meta": {
      "timing": 0
    },
    "kinds": {
      "visualize": {
        "subkinds": {
          "text": [
            {
              "id": 31,
              "name": "mean",
              "calls": [
                "built-in"
              ]
            },
            {
              "id": 87,
              "name": "mean",
              "calls": [
                "built-in"
              ]
            }
          ]
        }
      }
    }
  },
  ".meta": {
    "timing": 0
  }
}

Now, the results no longer contain the calls to print, as they do not resolve to a local definition.
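When processing such a result programmatically, it can help to flatten the nested kinds and subkinds into a single list. The following is a minimal sketch; the interfaces are inferred from the JSON shown above and are assumptions, not flowR's exported types, and flattenCalls is a hypothetical helper.

```typescript
// Shapes inferred from the detailed JSON results; assumptions for illustration only.
interface CallEntry { id: number; name: string; calls?: string[] }
interface CallContextResult {
  kinds: Record<string, { subkinds: Record<string, CallEntry[]> }>;
}
interface FlatCall extends CallEntry { kind: string; subkind: string }

// Walk kinds -> subkinds -> entries and attach the kind and subkind to every entry.
function flattenCalls(result: CallContextResult): FlatCall[] {
  const flat: FlatCall[] = [];
  for (const [kind, { subkinds }] of Object.entries(result.kinds)) {
    for (const [subkind, entries] of Object.entries(subkinds)) {
      for (const entry of entries) {
        flat.push({ ...entry, kind, subkind });
      }
    }
  }
  return flat;
}

// The call-context part of the result above, without the .meta information.
const sample: CallContextResult = {
  kinds: {
    visualize: {
      subkinds: {
        text: [
          { id: 31, name: 'mean', calls: ['built-in'] },
          { id: 87, name: 'mean', calls: ['built-in'] }
        ]
      }
    }
  }
};

const flatCalls = flattenCalls(sample);
console.log(flatCalls.map(c => `${c.kind}/${c.subkind}: ${c.name} (id ${c.id})`));
```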

Implementation Details

The Compound Query is executed by executeCompoundQueries in ./src/queries/virtual-query/compound-query.ts.

Currently maintained by Florian Sihler