Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: add parent symbol tables #680

Merged
merged 2 commits into from
Feb 17, 2025

Conversation

niklasdewally
Copy link
Collaborator

Change SymbolTable to hold a pointer to a parent symbol table.

  • This required a large-scale refactor of the symbol-table API, and its
    calling code due to the use of Rc and RefCell. See SYMBOL TABLE.

  • These changes broke the de/serialisation of the model using serde;
    therefore, this commit also adds de/serialisation helpers for
    Rc<RefCell<>> types. See DE/SERIALISATION.

This commit is the first in a series adding lexical scoping to
Conjure-Oxide. Planned commits are as follows:

  • Add Submodel. This will represent a particular lexical scope,
    containing the local symbol table for that scope and its
    constraints. Model will become a wrapper over Submodel, allowing
    it to act as a one while also containing global model information
    (such as dominance information).

  • Add the Scope expression type to represent sub-models inside a
    model.

  • Make the rewriter keep track of the sub-model it is in and the
    symbol table for that model. This will allow rules to run inside a
    scope, acting on the local variables defined in its symbol table.

SYMBOL TABLE

Using the parent pointer, a symbol table for a scope can lookup all
declarations in its enclosing scopes.

A symbol table first looks at its declarations to see if the given name
is defined in itself. If not, As the most-local declaration for a name
is returned, variable shadowing works as expected.

As parent symbol tables must be pointed to by all of their children,
symbol-tables are now contained in Rc<RefCell<SymbolTable>> pointers.

Due to the borrow and RAII semantics of Rc and RefCell, this made an
API that returns values as borrows no longer possible. Things that used
to return borrows now return Refs or Rcs. The new intermediate type,
Declaration was added primarily to help with borrow checking and Rc
RefCell related issues.

DE/SERIALISATION

Serde does not support Rc types by default. There is a feature flag,
rc to enable support, however this does not retain shared pointers:

Serializing a data structure containing reference-counted pointers
will serialize a copy of the inner value of the pointer each time a
pointer is referenced within the data structure. Serialization will
not attempt to deduplicate these repeated data.

Deserializing a data structure containing reference-counted pointers
will not attempt to deduplicate references to the same data. Every
deserialized pointer will end up with a strong count of 1.

-- https://serde.rs/feature-flags.html

Having parent pointers point to the same symbol table is important in
our case, so this is inadequate.

This commit adds two de/serialisation helpers, RcRefCellAsId and
RcRefCellAsInner. These can be used for Rc<RefCell<>> fields using
serde_as.

Types that use these helpers must implement the HasId and
DefaultFromId traits.

  • HasId requires that a type has a function id() that returns a unique
    id for every distinct instance of that type in memory.

  • DefaultFromId allows creating an instance of a type by just it's
    id.

For symbol tables, we generate these using a static atomic. We also add
a custom implementation of clone so that cloning a symbol table changes
its id. Thus, two symbol tables have the same ids iff they have the same
memory address.

The helpers do the following:

  • RcRefCellAsInner works similarly to serde's rc flag, storing a copy
    of the data inside the RcRefCell<<>>, and deserialising it without
    deduplication.

  • RcRefCellAsId stores just the id on serialisation, and on
    de-serialisation creates a dummy object the id using DefaultFromId.

For symbol tables, we plan to store a full version of each in its
submodel using RcRefCellAsInner, and store the parent pointers as ids
using RcRefCellAsId.

Post-deserialisation, the shared pointers can be restored by
deduplicating pointers to objects with a given id with one to a shared
object. Note that it is up to the caller code to do this
post-deserialization step, otherwise this works identically to serdes
rc flag.

To enforce this post-deserialisation step in Conjure Oxide, this commit
adds a new type SerdeModel. Only a SerdeModel is de/serialisable,
not a Model. To use a SerdeModel, it must be converted to a Model
by calling SerdeModel::initialize(), which will do this
post-deserialisation step. This will be implemented fully in a later
commit, once multiple submodels are representable inside the model.

This initialisation step also fixes the setting context pointer in
deserialised models.

Change `SymbolTable` to hold a pointer to a parent symbol table.

+ This required a large-scale refactor of the symbol-table API, and its
  calling code due to the use of `Rc` and `RefCell`. See SYMBOL TABLE.

+ These changes broke the de/serialisation of the model using `serde`;
  therefore, this commit also adds de/serialisation helpers for
  `Rc<RefCell<>>` types. See DE/SERIALISATION.

This commit is the first in a series adding lexical scoping to
Conjure-Oxide. Planned commits are as follows:

  + Add `Submodel`. This will represent a particular lexical scope,
    containing the local symbol table for that scope and its
    constraints. `Model` will become a wrapper over `Submodel`, allowing
    it to act as a one while also containing global model information
    (such as dominance information).

  + Add the `Scope` expression type to represent sub-models inside a
    model.

  + Make the rewriter keep track of the sub-model it is in and the
    symbol table for that model. This will allow rules to run inside a
    scope, acting on the local variables defined in its symbol table.

SYMBOL TABLE

Using the parent pointer, a symbol table for a scope can lookup all
declarations in its enclosing scopes.

A symbol table first looks at its declarations to see if the given name
is defined in itself. If not, As the most-local declaration for a name
is returned, variable shadowing works as expected.

As parent symbol tables must be pointed to by all of their children,
symbol-tables are now contained in `Rc<RefCell<SymbolTable>>` pointers.

Due to the borrow and RAII semantics of `Rc` and `RefCell`, this made an
API that returns values as borrows no longer possible. Things that used
to return borrows now return `Ref`s or `Rc`s. The new intermediate type,
`Declaration` was added primarily to help with borrow checking and `Rc`
`RefCell` related issues.

DE/SERIALISATION

Serde does not support `Rc` types by default. There is a feature flag,
`rc` to enable support, however this does not retain shared pointers:

   > Serializing a data structure containing reference-counted pointers
   > will serialize a copy of the inner value of the pointer each time a
   > pointer is referenced within the data structure. Serialization will
   > not attempt to deduplicate these repeated data.
   >
   > Deserializing a data structure containing reference-counted pointers
   > will not attempt to deduplicate references to the same data. Every
   > deserialized pointer will end up with a strong count of 1.

   -- https://serde.rs/feature-flags.html

Having parent pointers point to the same symbol table is important in
our case, so this is inadequate.

This commit adds two de/serialisation helpers, `RcRefCellAsId` and
`RcRefCellAsInner`. These can be used for `Rc<RefCell<>>` fields using
`serde_as`.

Types that use these helpers must implement the `HasId` and
`DefaultFromId` traits.

  + `HasId` requires that a type has a function `id()` that returns a unique
     id for every distinct instance of that type in memory.

  + `DefaultFromId` allows creating an instance of a type by just it's
     id.

For symbol tables, we generate these using a static atomic. We also add
a custom implementation of clone so that cloning a symbol table changes
its id. Thus, two symbol tables have the same ids iff they have the same
memory address.

The helpers do the following:

  + `RcRefCellAsInner` works similarly to serde's `rc` flag, storing a copy
    of the data inside the `RcRefCell<<>>`, and deserialising it without
    deduplication.

  + `RcRefCellAsId` stores just the id on serialisation, and on
    de-serialisation creates a dummy object the id using `DefaultFromId`.

For symbol tables, we plan to store a full version of each in its
submodel using `RcRefCellAsInner`, and store the parent pointers as ids
using `RcRefCellAsId`.

Post-deserialisation, the shared pointers can be restored by
deduplicating pointers to objects with a given id with one to a shared
object. Note that it is up to the caller code to do this
post-deserialization step, otherwise this works identically to serdes
`rc` flag.

To enforce this post-deserialisation step in Conjure Oxide, this commit
adds a new type `SerdeModel`. Only a `SerdeModel` is de/serialisable,
not a `Model`. To use a `SerdeModel`, it must be converted to a `Model`
by calling `SerdeModel::initialize()`, which will do this
post-deserialisation step. This will be implemented fully in a later
commit, once multiple submodels are representable inside the model.

This initialisation step also fixes the setting context pointer in
deserialised models.
@niklasdewally niklasdewally self-assigned this Feb 17, 2025
Copy link
Contributor

Code and Documentation Coverage Report

Documentation Coverage

This PR:

conjure_core: 3% with examples, 55% documented -- 8/144/271
conjure_oxide: 0% with examples, 19% documented -- 0/4/36
minion_rs: 30% with examples, 82% documented -- 2/23/112
tree-morph: 42% with examples, 100% documented -- 8/22/22

Main:

conjure_core: 4% with examples, 51% documented -- 8/119/239
conjure_oxide: 0% with examples, 19% documented -- 0/4/36
minion_rs: 30% with examples, 82% documented -- 2/23/112
tree-morph: 42% with examples, 100% documented -- 8/22/22

View full documentation coverage for main, this PR

Code Coverage Summary

This PR: Detailed Report

  functions..: 65.8% (644 of 979 functions)
  branches...: no data found

Main: Detailed Report

  functions..: 66.9% (628 of 939 functions)
  branches...: no data found

Coverage Main & PR Coverage Change

Functions coverage changed by -1.10% and covered lines changed by 16
Branches... coverage: No comparison data available

@ozgurakgun
Copy link
Contributor

LGTM modulo the hygiene check.

cc @lixitrixi @gskorokhod @YehorBoiar @Soph1514 for info

@niklasdewally niklasdewally merged commit 8636432 into main Feb 17, 2025
13 of 14 checks passed
@niklasdewally niklasdewally deleted the nik/scoping/refactor-symbol-table branch February 17, 2025 11:53
niklasdewally added a commit that referenced this pull request Feb 17, 2025
Add `Submodel`, and refactor `Model` to be a wrapper over `Submodel`.

This commit is part of on-going work to add lexical scope to Conjure
Oxide and is a follow up to 8636432 (feat!: add parent symbol tables
(#680), 2025-02-17).

DETAILS

A `Submodel` represents a particular scope and holds the symbol-table
and constraints tree for that scope.

This commit refactors `Model` to be a wrapper over `Submodel`, and
removes methods operating on constraints and the symbol table from
`Model`, placing them in `Submodel` instead. A `Model` can be borrowed as
a `Submodel` using `as_submodel() and `as_submodel_mut()`.

The language semantics of a top level model and a sub-model are
identical, so treating it as `Submodel` in most cases is valid.

`Model` is a separate type than `Submodel` for the following reasons:

  + It will hold global-only information in the future, such as dominance
    constraints.

  + It holds a pointer to the context.

  + We need special initialisation and de-serialisation logic for the
    top level model that we do not want for `Submodel`. See
    `SerdeModel`.
niklasdewally added a commit that referenced this pull request Feb 17, 2025
Add `Submodel`, and refactor `Model` to be a wrapper over `Submodel`.

This commit is part of on-going work to add lexical scope to Conjure
Oxide and is a follow up to 8636432 (feat!: add parent symbol tables
(#680), 2025-02-17).

DETAILS

A `Submodel` represents a particular scope and holds the symbol-table
and constraints tree for that scope.

This commit refactors `Model` to be a wrapper over `Submodel`, and
removes methods operating on constraints and the symbol table from
`Model`, placing them in `Submodel` instead. A `Model` can be borrowed as
a `Submodel` using `as_submodel()` and `as_submodel_mut()`.

The language semantics of a top level model and a sub-model are
identical, so treating it as `Submodel` in most cases is valid.

`Model` is a separate type than `Submodel` for the following reasons:

  + It will hold global-only information in the future, such as dominance
    constraints.

  + It holds a pointer to the context.

  + We need special initialisation and de-serialisation logic for the
    top level model that we do not want for `Submodel`. See
    `SerdeModel`.
niklasdewally added a commit that referenced this pull request Feb 17, 2025
Add `Submodel`, and refactor `Model` to be a wrapper over `Submodel`.

This commit is part of on-going work to add lexical scope to Conjure
Oxide and is a follow up to 8636432 (feat!: add parent symbol tables
(#680), 2025-02-17).

DETAILS

A `Submodel` represents a particular scope and holds the symbol-table
and constraints tree for that scope.

This commit refactors `Model` to be a wrapper over `Submodel`, and
removes methods operating on constraints and the symbol table from
`Model`, placing them in `Submodel` instead. A `Model` can be borrowed as
a `Submodel` using `as_submodel()` and `as_submodel_mut()`.

The language semantics of a top level model and a sub-model are
identical, so treating it as `Submodel` in most cases is valid.

`Model` is a separate type than `Submodel` for the following reasons:

  + It will hold global-only information in the future, such as dominance
    constraints.

  + It holds a pointer to the context.

  + We need special initialisation and de-serialisation logic for the
    top level model that we do not want for `Submodel`. See
    `SerdeModel`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants