Enable `spago install` to work on advanced Dhall expressions #849

JordanMartinez · 2022-02-05T00:53:06Z

Description of the change

Fixes #813. This is the same work as #815 but with a very simplified commit history (largely to see how much is being added when tests are not included) that has been updated to work on master. Unlike #815, this also includes an explanation for this PR.

Checklist:

Added the change to the "Unreleased" section of the changelog
Added some example of the new feature to the README
Added a test for the contribution (if applicable)

Overview of PR

Currently, spago install works by finding the first dependencies key in the Dhall expression. If the value stored at the key is an empty list (i.e. [] : List Text) or a list literal (i.e. [ "effect", "prelude"]), spago adds the dependencies to the list and succeeds. Spago will crash when one uses a list append (i.e. config.dependencies # someOtherList) or any expression more complicated than that.

The current spago install implementation takes a very simplistic approach to installing dependencies because of how complex it can be to correctly update the raw Dhall expression's source code (i.e. this PR) without doing something wrong. Enabling spago install to work (and perhaps in the future spago uninstall) requires dealing with more complex possible Dhall expressions.

The present PR enables spago install to work on these more advanced features by taking the below approach:

Fully normalize the spago.dhall file before editing it (i.e. resolve imports like https://github.com/purescript/package-sets/releases/download/psc-0.14.1-20210419/packages.dhall)
Of the dependencies the user wants to install, determine which actually need to be installed (i.e. difference requestedPkgs currentPkgs)
Attempt to produce a new raw Dhall expression by traversing through the expression and applying the update, which should succeed if the Dhall expression matches Spago's expected config schema.
Verify that the newly produced Dhall expression would produce the current unmodified Config value plus the new packages to install
If the verification passes, overwrite the spago.dhall file with the new Dhall expression

Possible Concerns

Obviously, there's no way to guarantee that this command will succeed 100% of the time, but I don't think that's a major concern due to widespread usage of source control. There could be a bug in this implementation that causes it to fail. The verification step above will guarantee that the spago.dhall file isn't edited unless it's a valid change. However, if this check ever produces a false positive, then one's invalid spago.dhall file could cause future spago commands to fail. Moreover, the user would need to revert these changes done by the CLI. Since most if not all users are using source control, this isn't a huge concern to me. In 80% of the cases, this command should work. In the 20% that it doesn't succeed or produces an invalid file that breaks future runs, source control can revert the changes and the user can edit the file manually.

One potential issue is if spago install is run with the offline flag (see #841). If the Internet cannot be accessed, then we can no longer do Step 1, which means there isn't a way to do Step 4. Rather, we'd have to assume that the package exists in the config folder and that the user is ok with creating a spago.dhall file that may not work. Since most people are using source control, I don't personally see this as an issue because discarding these changes is easy.

Another issue with this PR is the maintenance burden it imposes. This PR adds 1.8KLOC to this codebase, most of which are test files, verifying that the implementation is correct. The heart of this PR is the Spago/Config/AST.hs file (680 LOC, a lot of which is documentation). If we ever need to change the config format, at the very least, many of these test files would need to be updated. One possible way to implement this feature without imposing its corresponding maintenance burden is to port the updating code to its own library and have spago depend on it. I'm not sure how that would affect the test files (i.e. which would stay and which would be removed), but it relieves the burden of having to be directly responsible for any issues that may arise.

Implementation Details

There are a few things that make this implementation tricky. A raw Dhall expression...

might have unresolved imports (e.g. the HTTP url above or a reference to a local file).
might have let bindings where editing the binding's definition may introduce a side-effect.
must be traversed correctly

Dealing with Unresolved Imports

Regarding the first... If we are trying to update something in the expression and we hit this case, the implementation doesn't know what the value represented by that import is. In that case, it must make a "best guess" effort. For example, the spago.dhall file always has at least one unresolved import when it references the packages.dhall file. Another example is taking an existing spago.dhall file and adding or overwriting some part of it.
let config = ./spago.dhall in config with sources = config.sources # [ "example/**/*.purs" ]
However, our "best guess" isn't entirely uninformed. When the original raw Dhall expression is fully normalized (i.e. Step 1), we can verify that the expression produces a valid Config value that has a dependencies key whose contents are a list of text (i.e. packages). As long as we traverse through the expression correctly, we know what changes need to be done if we hit such a case.

For example, each of the following expressions below (where some parts of the spago.dhall file has been omitted for brevity) has an unresolved import. The commented out code indicate what the correct update would be.

-- here we know that the import produces a Spago config value
-- because the full normalization part (i.e. Step 1) produces a valid Config value
{- let config = -} ./spago.dhall -- in config with dependencies = config.dependencies # [ "new" ]

-- here we know that the import produces a `List Text` value
-- because the full normalization part (i.e. Step 1) produces a valid Config value
{ dependencies = ./someList.dhall {- # [ "new" ] -}, ... }

These updates are safe because the full normalization (Step 1) has already told us that the expression eventually normalizes to a valid Spago schema. If we cannot do Step 1 (e.g. offline flag is enabled), we can still attempt to make these updates, but all risk is on the user at that point.

Updating let bindings correctly

Regarding the second... This is the issue I ran into here. Let's assume we have the following Dhall expression:

let someValue = [ "a", "b", "c" ]
in { dependencies = someValue, ... }

Ideally, running spago install new would update the above expression to

let someValue = [ "a", "b", "c", "new" ]
in { dependencies = someValue, ... }

rather than

let someValue = [ "a", "b", "c" ]
in { dependencies = someValue # [ "new" ], ... }

It's more idiomatic in style. However, this can (but doesn't necessarily) introduce a side-effect. Let's change the initial Dhall expression to the following (this might be incorrect Dhall, but it gets the point across):

let someValue = [ "a", "b", "c" ]
in { dependencies = someValue, name = List.fold someValue, ... }

Applying the same "stylistic" logic above to this new expression, we unintentionally modify the name field. If we had used the "unsylistic" logic, no side-effect would have occurred.

This might not seem like an issue, but consider the idea of targets. In #681, targets were originally proposed using let bindings, so that one target's dependencies and source globs could be easily depended upon by another target:

let packages = ./packages.dhall
let main = { packages = packages
           , dependencies = [ "dep1" ,"dep2" ]
           , sources = [ "src/**/*.purs" ]
           }
let test = { packages = packages
           , dependencies = main.dependencies # [ "spec" ]
           , sources = main.sources # [ "test/**/*.purs" ]
           }
in { name = "my-project"
   , target = { main, test }
   }

If we run spago install "new" on the main target, we want to update the main binding's dependencies field, so that the change propagates to the test target's dependencies. In other words, we want to use a "stylistic" logic update.

However, this is not as ambiguous as it may appear. So long as we verify that the only change made to the expression was what we intended to do, the "stylistic" logic update is valid. If it fails, we could fallback to the "unsylistic" logic. This PR's current state always uses the "unstylistic" logic. It should be updated to use this "fallback" idea.

Traversing the Expression

Updating a raw Dhall expression entails traversing through it correctly and then making an change in the correct location, so that future parses produces the intended value. We can think of the expression as a Zipper. As long as we know the "shape" of the Zipper, we can traverse through it to the correct location, apply our change, and then recurse back to the root, rebuilding the surrounding structure along the way.

The spago.dhall file format is just an eventual Record expression that eventually refers to a "dependencies" field that eventually refers to a list of text values. By "eventual"/"eventually", I mean there may be other expressions in-between one part (e.g. the root expression) and the next part (e.g. the "dependencies" field).

Fortunately, the traversal logic is pretty straight forward.

Store a stack of keys that represent the fields we need to "go down" before we get to the expression we wish to update. Initially, this is just ["dependencies"].
Start at the root expression
Follow the expression's structure
- If we come across a field projection (e.g. { config = ./spago.dhall }.config, we add the new field to the stack as we have to "go down" that key before we can "go down" the key currently at the top of the key stack.
- If we come across a field that matches the key on the stack, we "go down" it and then pop that key off the stack.
- If the key stack becomes empty, we are in the "dependencies" value's expression. We never want to recurse back out of this place as it may produce the "let binding side-effect" mentioned above.
- If we come across a List-like value, we apply the update

Finally, for some other context, see this summary I wrote for the Contributors working group call

f-f · 2022-02-06T14:24:05Z

Thanks for putting this together @JordanMartinez!
Since that replaces #815 I'll go ahead and close that one. As I mentioned there I'd like to take a stance on this once we start using the new configuration format. I'm putting together a proof of concept of that based on #842, so I'll get back to this soon

JordanMartinez added 3 commits February 4, 2022 16:55

Enable spago install to work on more advanced dhall expressions

7fac78b

Add tests

f91fa37

Add changelog entry

31007c7

f-f mentioned this pull request Feb 6, 2022

Make spago install work when more advanced Dhall expressions are used #815

Closed

3 tasks

JordanMartinez closed this Oct 31, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable `spago install` to work on advanced Dhall expressions #849

Enable `spago install` to work on advanced Dhall expressions #849

JordanMartinez commented Feb 5, 2022

f-f commented Feb 6, 2022

Enable spago install to work on advanced Dhall expressions #849

Enable spago install to work on advanced Dhall expressions #849

Conversation

JordanMartinez commented Feb 5, 2022

Description of the change

Checklist:

Overview of PR

Possible Concerns

Implementation Details

Dealing with Unresolved Imports

Updating let bindings correctly

Traversing the Expression

f-f commented Feb 6, 2022

Enable `spago install` to work on advanced Dhall expressions #849

Enable `spago install` to work on advanced Dhall expressions #849