Track evaluation dependencies and cache results #1517

Merged
merged 5 commits into from
Nov 9, 2022

@jonatanklosko jonatanklosko commented Nov 8, 2022

This adds a mechanism for tracking how cells depend on each other in terms of variables, imports, aliases, modules, process dictionary, etc. Based on this information cells are marked as "stale" and reevaluated only if necessary.

Note: this requires Elixir v1.14.2 to work as expected for variables.

Motivation

Currently, whenever a cell is evaluated, all subsequent cells are marked as stale and require reevaluation. This happens regardless of whether those cells depend on the evaluated cell. This simple approach ensures reproducibility by always evaluating cells sequentially.

The main issue with this "greedy" approach is that a cell may do a long computation, and changing anything above it requires running the long computation again.

Idea

We now track which identifiers each cell references and defines (or redefines). When a cell is reevaluated, we know which cells it affects and mark only those as "stale".

At the evaluation level, instead of storing the full evaluation context (all variables/aliases after an evaluation), we store diffs (new variables/aliases defined during an evaluation). Then, when evaluating a cell, we combine all the diffs from previous cells into the full evaluation context. For example:

# Cell 1
x = 1

# Cell 2
y = 1

# Cell 3
x + y

The diffs for cells 1 and 2 are [x: 1] and [y: 1] respectively. Now, when we change the first cell to x = 2, the diff becomes [x: 2]. Then to evaluate cell 3 we merge [x: 2] with [y: 1] and have [x: 2, y: 1] as the context, without reevaluating cell 2.
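
The merging step can be sketched as follows. This is only an illustration under simplifying assumptions: diffs are plain keyword lists of variable bindings, and `DiffContextSketch` with its `merge/1` and `eval/2` functions are hypothetical names, not the evaluator's actual API.

```elixir
defmodule DiffContextSketch do
  # Combine per-cell diffs, in cell order, into one evaluation context.
  # Later diffs override earlier ones for the same variable.
  def merge(diffs) do
    Enum.reduce(diffs, [], fn diff, context -> Keyword.merge(context, diff) end)
  end

  # Evaluate code using the merged context as variable bindings.
  def eval(code, diffs) do
    {result, _binding} = Code.eval_string(code, merge(diffs))
    result
  end
end

# Diffs produced by cells 1 and 2.
diffs = [[x: 1], [y: 1]]
DiffContextSketch.eval("x + y", diffs)
# => 2

# Cell 1 changes to `x = 2`: only its diff is replaced, cell 2 is not rerun.
updated = List.replace_at(diffs, 0, [x: 2])
DiffContextSketch.eval("x + y", updated)
# => 3
```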

Implementation details

Session data

On the Livebook side, each cell has additional information:

%{
  ...,
  identifiers_used: list(identifier :: term()) | :unknown,
  identifiers_defined: %{(identifier :: term()) => version :: term()},
}

An identifier can be anything: a variable name, a module name, or a fixed term such as :pdict. Each defined identifier has a version, which again can be anything: a hash digest, a random id, or a fixed value.

This information is used when computing which cells are stale. To determine cell validity we already compute snapshots, but now a cell snapshot looks only at the parent cells that define identifiers used by that cell, and the identifier versions.
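
A minimal sketch of how such a snapshot could be derived from the two fields above. The module `SnapshotSketch` and its functions are hypothetical, and the snapshot here is simply the map of used identifiers to their versions, whereas the real snapshot also takes the defining parent cells into account.

```elixir
defmodule SnapshotSketch do
  # `parents` are the cells above `cell`, in notebook order.
  def snapshot(cell, parents) do
    # Later definitions override earlier ones, so each identifier maps
    # to the version from the closest parent that defines it.
    defined =
      Enum.reduce(parents, %{}, fn parent, acc ->
        Map.merge(acc, parent.identifiers_defined)
      end)

    case cell.identifiers_used do
      # Unknown usage: conservatively depend on everything defined above.
      :unknown -> defined
      used -> Map.take(defined, used)
    end
  end

  # A cell is stale when its current snapshot differs from the one
  # recorded at its last evaluation.
  def stale?(cell, parents, evaluated_snapshot) do
    snapshot(cell, parents) != evaluated_snapshot
  end
end
```

With this shape, changing the version of `{:variable, :x}` in a parent only invalidates cells that list `{:variable, :x}` in `identifiers_used` (or whose usage is `:unknown`).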

Evaluator

On the Runtime side (specifically in the evaluator), after an evaluation we determine the identifiers it depends on, mostly by using a compilation tracer. The identifiers are reported/tracked with varying granularity. For example, we have {:variable, name} and {:module, name} to track individual variables/modules, but we also have a single identifier :pdict to atomically track the process dictionary.

Depending on the identifier type, we approach the "version" differently:

  • for variables it's a random id (reevaluating a cell like x = 1 changes the snapshots anyway)
  • for modules we compute MD5
  • for pdict and imports we compute phash2
  • for aliases we use the alias expanded value
  • for requires we use a fixed :ok version

@jonatanklosko jonatanklosko merged commit 484e471 into main Nov 9, 2022
@jonatanklosko jonatanklosko deleted the jk-evaluation-caching branch November 9, 2022 13:42