Project dependencies #1477

Open
edvald opened this issue Jan 5, 2020 · 3 comments

@edvald
Collaborator

edvald commented Jan 5, 2020

Background

Our current mechanism for importing other repositories as remote sources is handy in some cases but also fairly crude. Essentially it allows importing another repository as a new directory of sources, much like a submodule, which is then scanned for modules like any other directory.

The motivation for that feature was primarily
A) to remove barriers for adopting Garden when projects are already spread across repositories, and
B) to (at least half way) support having related projects that can function individually, but also be combined into a single stack.

The main issue with the current implementation, as I see it, is that it completely ignores any project configuration contained in the imported repository, and is by extension not much more useful than simply using git submodules or subtrees. In fact, the latter (despite their own UX flaws) are probably preferable, since those are known and standard ways of nesting repositories.

Another issue is that different projects will need to be aware of each other to some degree, because we currently only have a single namespace for module names, and another for services and tasks combined. This is kinda workable by applying strict conventions, but hardly ideal. And, because all modules are treated equally, they all end up deployed to that same environment, namespace etc. So they must also make sure any included resources don't conflict.

The upside is that our work on that feature can serve as a solid building block for something more useful and Garden-native.

Also, the related feature of remote module sources still holds up nicely, and can remain complementary to the reformulated project dependencies feature.

Motivation and use cases

The motivation for project dependencies remains largely the same as for remote sources, but goes a step further and makes the relationship more structured, which should improve UX and open the door for interesting new applications. Here are some use cases we want to enable:

  1. Allow composing a Garden stack out of code from multiple repositories, in a deterministic fashion.
  2. Enable working with portions of a larger stack, without having to build, deploy or even be aware of other parts of the larger stack.
  3. Enable sharing an instance of a Garden stack between multiple projects. That is, support cases where a project referenced by another project should not be deployed separately for every instance of the parent project.
  4. Enable sharing a Garden stack between multiple different projects. This can be within an organization, or even across organizations (e.g. delivering a stack of services as an open source project, or even a commercial product). Think "Helm chart, except for any supported platform, cloud services, Terraform stacks, whatever... And with structured parameters!"

The last two are new, and the last one is potentially very interesting by itself.

Design considerations

We obviously want the design to be easy to understand, and easy to implement, both when sharing projects and consuming them.

Some of the more specific questions to consider are:

  1. How do we handle the project configuration of a referenced (dependency) project? Do we merge it with that of the parent project? If so, how?
  2. How do we handle namespace issues across the projects? How do we avoid or prevent accidental conflicts between resources in the projects?
  3. Should we support circular or mutual dependencies between projects? If so, how?
  4. When sharing instances of Garden stacks, how do we prevent conflicting use between users and downstream projects? How might we handle a case where a downstream project (and user) only has read access to the shared instance?
  5. Can we improve our UX around linking to local instances of referenced repos?

Proposal

I suggest we go for simplicity, and allow ourselves to sidestep some of the above considerations.

We leverage existing mechanisms for passing variables to projects, as well as our ability to run nested Garden instances (which we already do for the kubernetes provider).

Here's a suggestion for how a project might declare a dependency on another project:

```yaml
kind: Project
name: my-project
# We could make the name field optional and default to the referenced project
# name, but we do need to allow setting a custom name in case of conflicts.
dependencies:
  - name: some-project
    # We could also allow relative paths here, to reference within the repo.
    # See below for more on that.
    source: "https://github.com/org/some-project#stable"
    # The environment to use in some-project. Defaults to the current my-project
    # environment name.
    environment: dev
    # These are passed as input variables to the nested project.
    variables:
      username: "${local.username}" # <- can be templated like any other value
...
```

And the linked some-project repo's garden.yml (which will notably be required to be present) could look something like this:

```yaml
kind: Project
name: some-project
environments:
  - name: dev
providers:
  - name: kubernetes
    namespace: "some-project-${var.username}"
variables:
  username: "default"
...
```

Notice that we simply use our existing variables mechanism to pass values into the nested project. This means that some-project works fine on its own, but overriding variables can be passed to it when referencing from another project.

Now, when I deploy or test my-project, the referenced some-project is deployed first, via a nested Garden instance. The variables specified in dependencies[].variables are passed to the nested instance.
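
To make the variable flow concrete, here is a minimal TypeScript sketch of how the nested instance's variables and environment could be derived. The `ProjectDependency` type and both helper functions are hypothetical, and the sketch assumes template strings such as `${local.username}` have already been resolved in the parent project.

```typescript
// Hypothetical shape of a dependencies[] entry, mirroring the YAML above.
interface ProjectDependency {
  name: string;
  source: string;
  environment?: string;
  variables?: Record<string, unknown>;
}

type VariableMap = Record<string, unknown>;

// Variables for the nested instance: the nested project's own defaults,
// overridden by whatever the parent passes in dependencies[].variables.
function resolveNestedVariables(
  nestedDefaults: VariableMap,
  dependency: ProjectDependency,
): VariableMap {
  return { ...nestedDefaults, ...(dependency.variables ?? {}) };
}

// The nested environment defaults to the parent project's current environment.
function resolveNestedEnvironment(
  parentEnvironment: string,
  dependency: ProjectDependency,
): string {
  return dependency.environment ?? parentEnvironment;
}
```

With `some-project` declaring `username: "default"` and the parent passing `username: "alice"`, the nested instance would see `username: "alice"`, while any variable the parent does not mention keeps its default.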

This will apply to any invocation of garden dev, garden deploy, or garden test for my-project. Tests for some-project will not be run, however, since we assume those already passed before getting merged.
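
The ordering described above could be sketched as a plan builder like the following. `planRun` and `PlanStep` are hypothetical names, the dependency list is assumed to already be in dependency order, and the sketch assumes `garden dev` and `garden test` run the parent project's tests while `garden deploy` does not.

```typescript
type Command = "dev" | "deploy" | "test";

// A hypothetical step in the execution plan for a `garden <command>` run.
interface PlanStep {
  project: string;
  action: "deploy" | "test";
}

// Dependencies are deployed first (via nested instances), then the parent.
// Tests are only ever scheduled for the parent project, never for dependencies.
function planRun(command: Command, parent: string, dependencies: string[]): PlanStep[] {
  const steps: PlanStep[] = dependencies.map((project) => ({
    project,
    action: "deploy" as const,
  }));
  steps.push({ project: parent, action: "deploy" });
  if (command === "test" || command === "dev") {
    steps.push({ project: parent, action: "test" });
  }
  return steps;
}
```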

That's fairly simple to start, and the above setup is probably just fine already for a lot of cases. But there's more to consider, so let's iterate and explore some further cases and usability features.

Declaring expected input variables

In some cases, a stack can't (or shouldn't) have default values for variables. Perhaps some authentication keys are expected, or there are values that simply don't have sensible defaults.

We can add an inputs field to explicitly declare and describe these:

```yaml
kind: Project
name: some-project
environments:
  - name: dev
providers:
  - name: kubernetes
    namespace: "some-project-${var.username}"
inputs:
  - name: username
    type: string
    description: The username to scope the Kubernetes namespace to
...
```

Notice that the input variable is referenced the same way as before (${var.username}) in the namespace key. The type key provides a degree of type-safety and validation (we could maybe support some subset of JSON schemas as additional safety, as future work). The description field provides helpful information.

These combined can be used to generate project documentation, and to provide helpful error messages when values are missing or invalid.

Note: Even when the inputs field is not declared, we should probably throw errors when variables provided don't match the type of the default value.
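
A minimal sketch of the validation described above, covering both declared `inputs` and the fallback of inferring the expected type from a default value. The type names and error messages are illustrative assumptions, and only primitive `string`, `number` and `boolean` types are handled.

```typescript
type InputType = "string" | "number" | "boolean";

interface InputDeclaration {
  name: string;
  type: InputType;
  description?: string;
}

// Check provided variables against declared inputs.
// Returns human-readable errors; an empty array means everything is valid.
function validateInputs(inputs: InputDeclaration[], provided: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const input of inputs) {
    const value = provided[input.name];
    if (value === undefined) {
      // Include the description so the error message explains what is expected.
      const hint = input.description ? ` (${input.description})` : "";
      errors.push(`Missing required input '${input.name}'${hint}`);
    } else if (typeof value !== input.type) {
      errors.push(`Input '${input.name}' must be a ${input.type}, got ${typeof value}`);
    }
  }
  return errors;
}

// Without an inputs field: infer the expected type from each default value.
function checkAgainstDefaults(defaults: Record<string, unknown>, provided: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const name of Object.keys(provided)) {
    const fallback = defaults[name];
    const value = provided[name];
    if (fallback !== undefined && typeof fallback !== typeof value) {
      errors.push(`Variable '${name}' should be a ${typeof fallback}, got ${typeof value}`);
    }
  }
  return errors;
}
```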

Referencing the linked project

If you're using another project, chances are the services/modules/etc in your project need some way to communicate with services in that stack, or otherwise receive outputs (e.g. from tasks, builds etc.).

Most obviously, we can expose all of the nested project's template context via a ${projects.some-project.*} template key. This could be used all over the parent project, and would include all the available keys (i.e. the ModuleConfigContext of the nested project, with the projects.<project-name> prefix).
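
As a rough illustration, the nested project's context could be mounted under `projects.<name>` and looked up by dotted key path. The helpers below are hypothetical and leave out the template parser itself; note that project names like `some-project` contain dashes, so the sketch splits on `.` only.

```typescript
type Context = { [key: string]: unknown };

// Build the parent's template context, mounting each nested project's
// full context under projects.<name>.
function withProjects(parent: Context, nested: Record<string, Context>): Context {
  return { ...parent, projects: { ...nested } };
}

// Resolve a dotted key path such as "projects.some-project.variables.username".
function resolvePath(context: Context, path: string): unknown {
  let current: unknown = context;
  for (const segment of path.split(".")) {
    if (current === null || typeof current !== "object" || !(segment in current)) {
      throw new Error(`Could not resolve key '${path}' (missing '${segment}')`);
    }
    current = (current as Context)[segment];
  }
  return current;
}
```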

This does the job, but we might also want to allow the nested project to more explicitly export variables, similar to Terraform stack outputs. This would be much easier to document and reason about, and could be implemented something like this:

```yaml
kind: Project
name: some-project
...
outputs:
  - name: auth-service-url
    description: The URL of the auth service
    value: "${services.auth.outputs.ingress-url}"
...
```

These could then be referenced as ${projects.some-project.outputs.<key>}. Using these would be less error-prone and easier to grok for the downstream user. And as an easy bonus, we could add a garden get outputs command, to use in other contexts, programmatically or otherwise.
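
The declared `outputs` list could be flattened into the map exposed under `projects.<name>.outputs`, with a lookup that lists the available keys when a reference is wrong. This is a hypothetical sketch that assumes output values have already been resolved in the nested project.

```typescript
interface OutputDeclaration {
  name: string;
  description?: string;
  value: unknown; // already-resolved value, e.g. the auth service's ingress URL
}

// Turn the declared outputs list into the map exposed as projects.<name>.outputs.
function collectOutputs(declared: OutputDeclaration[]): Record<string, unknown> {
  const outputs: Record<string, unknown> = {};
  for (const output of declared) {
    if (output.name in outputs) {
      throw new Error(`Duplicate output '${output.name}'`);
    }
    outputs[output.name] = output.value;
  }
  return outputs;
}

// Look up ${projects.<project>.outputs.<key>}, listing what *is* exported on error.
function getOutput(
  all: Record<string, Record<string, unknown>>,
  project: string,
  key: string,
): unknown {
  const outputs = all[project];
  if (!outputs) throw new Error(`Unknown project '${project}'`);
  if (!(key in outputs)) {
    const available = Object.keys(outputs).join(", ");
    throw new Error(`Project '${project}' has no output '${key}'. Available: ${available}`);
  }
  return outputs[key];
}
```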

Shared instances of projects

In many cases, you don't actually want a whole new instance of a referenced project, but rather to share one with your team or organization. In some cases it's simply too heavy to run a separate instance, and in some cases you want to share some state.

In many cases, it could be as simple as referencing a specific environment, where you can assume the nested project has all the required configuration:

```yaml
kind: Project
name: my-project
dependencies:
  - name: some-project
    source: "https://github.com/org/some-project#stable"
    environment: shared-dev
...
```

For this scenario, there's nothing else to add or implement. When some-project is deployed, it will simply be a no-op if it's already deployed, much like if you run garden deploy for a second time for any project that's already deployed.
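
The no-op behaviour can be thought of as a version check before deploying. The sketch below is hypothetical (Garden's actual status and versioning logic is more involved) and is kept synchronous for brevity.

```typescript
// Hypothetical deploy status for a project environment.
interface DeployStatus {
  deployedVersion: string | null; // null when nothing has been deployed yet
}

// Deploy only when the deployed version differs from the wanted one, so a
// second deploy of an unchanged shared environment does nothing.
function ensureDeployed(
  wantedVersion: string,
  getStatus: () => DeployStatus,
  deploy: () => void,
): "deployed" | "unchanged" {
  const status = getStatus();
  if (status.deployedVersion === wantedVersion) {
    return "unchanged";
  }
  deploy();
  return "deployed";
}
```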

Where this may fall short is where the downstream user doesn't have privileges (or required input variables etc.) to deploy or otherwise manage the nested project. I'd suggest filing that concern as future work, and giving the upcoming Garden Cloud service (and Garden Enterprise, by extension) a role here. For example, Garden Cloud might be able to hold the outputs from that project, and we could further make that explicit when declaring the dependency:

```yaml
kind: Project
name: my-project
dependencies:
  - name: some-project
    source: "https://github.com/org/some-project#stable"
    environment: shared-dev
    readOnly: true # <---
...
```

Here, the readOnly flag indicates we should not try to deploy this, but expect the outputs to be available in Garden Cloud/Enterprise. If they aren't, or Garden Cloud indicates the referenced project and environment isn't available, we error and inform the user. If they are, we simply get those outputs and can reference them as described above.
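
The `readOnly` branch could look roughly like this. `OutputStore` is a hypothetical stand-in for whatever Garden Cloud/Enterprise would expose (no such API is specified here), and `deployAndGetOutputs` represents the regular nested-instance path.

```typescript
interface DependencyRef {
  name: string;
  environment: string;
  readOnly?: boolean;
}

// Hypothetical store of outputs published by shared environments.
interface OutputStore {
  getOutputs(project: string, environment: string): Record<string, unknown> | undefined;
}

function resolveDependencyOutputs(
  dep: DependencyRef,
  store: OutputStore,
  deployAndGetOutputs: (dep: DependencyRef) => Record<string, unknown>,
): Record<string, unknown> {
  if (!dep.readOnly) {
    // Regular dependency: deploy via a nested instance, then read its outputs.
    return deployAndGetOutputs(dep);
  }
  // Read-only: never deploy, only fetch previously published outputs.
  const outputs = store.getOutputs(dep.name, dep.environment);
  if (outputs === undefined) {
    throw new Error(
      `Outputs for '${dep.name}' (environment '${dep.environment}') are not available, ` +
        `and the dependency is read-only so it will not be deployed.`,
    );
  }
  return outputs;
}
```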

Referencing projects within the same repo as the parent project

You may have multiple projects in the same repository. We should allow any relative file path for the source path as well:

```yaml
kind: Project
name: my-project
dependencies:
  - name: some-project
    source: some/subdirectory
    environment: shared-dev
...
```

Additionally, we should not scan for modules in that directory for the parent project.
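
A sketch of telling the two kinds of `source` apart and of excluding nested project directories from the parent's module scan. The heuristic is an assumption: anything with a URL scheme is treated as a remote repository with an optional `#ref`, everything else as a normalized, forward-slash path relative to the parent project root. scp-style URLs such as `git@github.com:org/repo.git` are not handled here.

```typescript
type DependencySource =
  | { kind: "git"; url: string; ref?: string }
  | { kind: "local"; path: string };

function parseSource(source: string): DependencySource {
  // A URL scheme ("https://", "git+ssh://", ...) marks a remote repository.
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(source)) {
    const hashIndex = source.indexOf("#");
    return hashIndex === -1
      ? { kind: "git", url: source }
      : { kind: "git", url: source.slice(0, hashIndex), ref: source.slice(hashIndex + 1) };
  }
  // Otherwise: a path within the parent repo, with trailing slashes stripped.
  return { kind: "local", path: source.replace(/\/+$/, "") };
}

// True if a module path lies inside a nested project's directory, in which
// case the parent project's module scan should skip it.
function isInNestedProject(modulePath: string, nestedPaths: string[]): boolean {
  return nestedPaths.some((p) => modulePath === p || modulePath.startsWith(p + "/"));
}
```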

We should handle this specifically, and not treat it as an external repository. This would include watching the directory for changes and handling that. We could later optimize this, but I'd suggest handling watching in the parent project, and responding to watch events there by doing a full project deploy on the nested project, and then checking if the outputs have changed. Any module referencing changed outputs should then be triggered.
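
The watch flow described above (redeploy, compare outputs, trigger affected modules) could be sketched as follows. Both helpers are hypothetical; module configs are represented as raw strings for simplicity, and outputs are compared via `JSON.stringify`, which is adequate for primitive values but sensitive to key order in objects.

```typescript
// Keys whose value differs between two sets of outputs (added, removed or changed).
function changedOutputKeys(
  before: Record<string, unknown>,
  after: Record<string, unknown>,
): Set<string> {
  const keys = Array.from(new Set([...Object.keys(before), ...Object.keys(after)]));
  return new Set(keys.filter((key) => JSON.stringify(before[key]) !== JSON.stringify(after[key])));
}

// Names of modules whose raw config references ${projects.<project>.outputs.<key>}
// for a key in `changed`; these are the modules to re-trigger.
function modulesToTrigger(
  project: string,
  changed: Set<string>,
  moduleConfigs: Record<string, string>,
): string[] {
  return Object.keys(moduleConfigs).filter((name) => {
    const pattern = /\$\{projects\.([\w-]+)\.outputs\.([\w-]+)\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(moduleConfigs[name])) !== null) {
      if (match[1] === project && changed.has(match[2])) return true;
    }
    return false;
  });
}
```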

Referencing projects outside of your organization

This really has no additional requirements in terms of design or implementation, but is a potentially very useful implicit feature. Essentially, a well-formed Garden project can become a portable blueprint for any type of stack, which will only become more and more useful as more providers become available.

Nested dependencies

We must handle recursive project dependencies, and detect circular dependencies, similar to how we handle providers, modules etc. Circular or mutual dependencies will not be supported, and we throw errors when we detect those (and we may need to take special care to detect it as we traverse across repositories).
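
Cycle detection across project dependencies can use a standard depth-first search that tracks the current path. In this sketch the graph is keyed by project name; across repositories, nodes would likely need a more stable key (for example the source URL plus ref), since dependency names can be overridden.

```typescript
// Map from project name to the names of the projects it depends on.
type DependencyGraph = Record<string, string[]>;

// Returns the first cycle found (e.g. ["a", "b", "a"]), or null if acyclic.
function findCycle(graph: DependencyGraph): string[] | null {
  const visiting = new Set<string>(); // nodes on the current DFS path
  const done = new Set<string>(); // nodes whose dependencies are fully explored
  const path: string[] = [];

  function visit(node: string): string[] | null {
    if (done.has(node)) return null;
    if (visiting.has(node)) {
      // Back edge: the cycle is the path from the first occurrence of node.
      return [...path.slice(path.indexOf(node)), node];
    }
    visiting.add(node);
    path.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    path.pop();
    visiting.delete(node);
    done.add(node);
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}
```

Returning the cycle path, rather than just a boolean, makes it possible to tell the user exactly which chain of projects (for example `a > b > c > a`) needs to be broken.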

What if I just want the same capability as the current remote sources?

In that case, you can create a "dummy" project in the target repo, that just echoes the parameters from the parent and deploys to the same namespace.

I'm unsure if there will be much demand for this, but if so we could consider either keeping the existing remote sources feature around, or adding a flag to the dependency declaration indicating that we want to re-use the same project configuration. Let's see what our users say about this.

Review

Let's look back at our use cases, listed above:

  1. We can compose a Garden stack from multiple repositories. Instead of simply importing those as additional directories to be scanned, we now require the linked repo to be a project of its own.
  2. By design, each project in a stack works in isolation, provided that you have the necessary privileges and inputs.
  3. We can share instances of nested projects by sharing environment configs. As further work, we can implement Garden Cloud/Enterprise features to share output state across an organization.
  4. We can trivially define projects that are meant for consumption by anyone.

And, to review our design consideration questions:

  1. We completely decouple the nested project's configuration from the parent's. No complex merging that would be tricky to reason about. Just inputs and outputs.
  2. We're pretty much hands-off when it comes to namespacing etc. Each project is responsible for itself. In some cases you may want to share namespaces, in other cases not. We don't enforce anything there. We could potentially use our overarching information about the structure of each project to warn the user about potential conflicts. In particular, once we have planning mode implemented, we could imagine hooks that perform cross-project validation ahead of deployment.
  3. No support for circular or mutual dependencies to start. We could technically at a later stage allow it if outputs are referenced in a way that doesn't cause issues, but in any case we'd likely consider that bad practice. If anyone has reasonable use cases for that, we can explore it further.
  4. This can be addressed using a combination of "read only" dependencies and Garden Cloud, as described above. Other ideas welcome.
  5. The UX we currently lean on for linking local clones of referenced repos is an open concern. This is not really an issue for shared projects that the downstream developer is not working on, and overall is arguably less of an issue with this new structure. I think this still warrants further exploration, but is not imo a blocker for the implementation of the feature, and rather a separate UX improvement opportunity.

Implementation

We can (roughly) decompose the development of this feature as follows:

  1. Implement project outputs. This can be individually tested, and potentially useful for some users out of the box.
  2. Implement project inputs. Same as above, perhaps less useful individually but fairly simple to test and implement in isolation.
  3. Implement project dependencies. This'll be the most complex part, but we have a head start with the existing remote sources feature.

The first two can be implemented in parallel, and the third once both are done.

Future work

  1. Implement shared read-only environments by integrating the feature with Garden Cloud/Enterprise.
  2. Explore ways to improve UX around local clones of referenced project repos.

Feedback is very much appreciated! We'd love to know if anyone has use cases not covered by the above, or if you see any flaws in the approach.

@vvagaytsev
Collaborator

@edvald @eysi09 is this still something we're going to implement?

@eysi09
Collaborator

eysi09 commented Jul 5, 2023

Yes, I believe we will. Or some version thereof, although there's no timeline at the moment.

In any case this is a good reference doc.

@eli-hasson

This feature is critical for our workflows, as we rely on deploying and testing multiple interconnected Garden projects in an integrated yet modular environment. Support for shared variables, namespace isolation, and scalable modular testing would greatly enhance our ability to manage dependencies and conduct seamless integration tests.

Could you share if there’s a planned timeline or any updates on prioritizing this feature? It would be helpful to understand how we can align our processes accordingly.
