Polykey secrets env integration #289

Open
wants to merge 1 commit into
base: staging

Conversation

brynblack
Member

@brynblack brynblack commented Sep 27, 2024

Description

This PR integrates the polykey secrets env command into the development shell hook, to securely load development secrets into the development environment.

Tasks

  • 1. Replace . ./.env with pk secrets env ...
  • 2. Update CI to include Polykey and pull from seed node
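As a rough illustration of what replacing `. ./.env` involves, the sketch below parses dotenv-style `KEY=VALUE` text into a dictionary that a hook could then export. It assumes the secrets command can emit such lines; the output format and the function name `parse_env_lines` are assumptions for illustration, not something confirmed in this PR.

```python
def parse_env_lines(text: str) -> dict[str, str]:
    """Parse dotenv-style KEY=VALUE lines into a dictionary (illustrative only)."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the first "=" only, so values may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {raw!r}")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


# Hypothetical sample input; real secret names and values will differ.
sample = "# dev secrets\nAPI_TOKEN=abc123\nexport DB_URL='postgres://localhost/dev'\n"
parsed = parse_env_lines(sample)
```

Keeping the parsed values in memory and passing them to a child process avoids writing the secrets to a file on disk, which is the main point of dropping `.env`.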

Final checklist

  • Domain specific tests
  • Full tests
  • Updated inline-comment documentation
  • Lint fixed
  • Squash and rebased
  • Sanity check the final build

@brynblack brynblack self-assigned this Sep 27, 2024
@tegefaulkes
Contributor

@brynblack does this not link back to any issues?

@CMCDragonkai
Member

Where's the issue?

@aryanjassal
Member

We need to get this issue completed. The main blocker is the usage of Polykey in the CI environment.

So far, we have discussed that creating an agent the normal way isn't predictable, as it randomly generates a node id. As such, we cannot delegate authority to a randomly-created agent.

We can use the recovery passphrase to re-generate the node with the same node ID. However, how will the password and the passphrase be stored safely? If they are hard-coded in the repo or in the CI file, they can be leaked. Now that we are no longer using a runner image in newer CIs, we cannot pre-load the information onto the runner file system either. Perhaps the passphrase and password could be repo-specific secrets.
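To make the two ideas above concrete, here is a minimal sketch: credentials are read from CI-provided environment variables instead of the repo, and an identifier derived deterministically from the passphrase comes out the same on every run. The variable names `CI_PK_PASSWORD` and `CI_PK_RECOVERY_CODE` are hypothetical, and the derivation is a stand-in, not Polykey's actual key derivation.

```python
import hashlib
from typing import Mapping

# Hypothetical variable names, set as repo-specific CI secrets.
PASSWORD_VAR = "CI_PK_PASSWORD"
RECOVERY_VAR = "CI_PK_RECOVERY_CODE"


def load_ci_credentials(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (password, recovery passphrase), failing fast if either is unset."""
    missing = [name for name in (PASSWORD_VAR, RECOVERY_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("missing CI secrets: " + ", ".join(missing))
    return env[PASSWORD_VAR], env[RECOVERY_VAR]


def derive_node_id(recovery_passphrase: str) -> str:
    """Stand-in derivation: the same passphrase always yields the same ID.

    This is NOT how Polykey derives node IDs; it only shows determinism.
    """
    seed = hashlib.pbkdf2_hmac(
        "sha512", recovery_passphrase.encode("utf-8"), b"illustrative-salt", 2048
    )
    return hashlib.sha256(seed).hexdigest()[:32]
```

In a real job the mapping passed to `load_ci_credentials` would be `os.environ`. Because the derived ID is stable across runs, authority can be delegated to it once, ahead of time.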

This is not the end of all issues, however. We also need to figure out a distribution method for a Polykey agent.

Should we integrate the agent in the runner image itself? That can no longer be done, as we are transitioning to the usage of a custom action which sets up nix on the stock runners. No images are being used anymore, so this idea will not work after the CI upgrade.

We currently don't have an action that sets up Polykey and runs the agent in the background. Brynley has created a custom action which prepares the nix environment, so she might be able to set up a Polykey action. The simplest way I can think of doing this would be to download the package for the appropriate platform directly from the releases page, then use the aforementioned methods to launch an agent and delegate authority to it.

Alternatively, we can use the runner's inbuilt package manager to install Polykey (https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/customizing-github-hosted-runners#installing-software-on-ubuntu-runners). This method works on all platforms, so custom packages can be installed. However, the examples use apt for Ubuntu, brew for macOS, and Chocolatey for Windows, and we currently don't have Polykey published to these package managers. There might be ways to 'sideload' packages and get Polykey that way, but that likely won't be supported, and an API change might introduce failures in all our CIs.

This is partially related to MatrixAI/Polykey#222. Implementing egress schema would help with smooth CI by restricting the exported secrets for each workflow.

This leads to another point - how to delegate secrets. Should each repo get a vault? Or each branch? Technically, each runtime is unique, so do we need a unique vault per runtime? (of course not!)

Once support for egress schema has been implemented, a vault can be dedicated for a single repo, with egress schema controlling the secrets that are exported. This idea seems the best to me, but perhaps I could be overlooking something.
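A rough sketch of the one-vault-per-repo idea, under the assumption that an egress schema boils down to a per-workflow allow-list of secret names. The schema shape, the workflow names, and `export_for_workflow` are all hypothetical, since egress schema is not implemented yet.

```python
from fnmatch import fnmatchcase
from typing import Mapping

# Hypothetical schema: workflow name -> allowed secret name patterns.
EGRESS_SCHEMA: dict[str, list[str]] = {
    "test": ["NPM_TOKEN"],
    "deploy": ["NPM_TOKEN", "AWS_*"],
}


def export_for_workflow(
    secrets: Mapping[str, str],
    workflow: str,
    schema: Mapping[str, list[str]],
) -> dict[str, str]:
    """Return only the secrets the given workflow is allowed to receive."""
    # A workflow that is not listed gets nothing (deny by default).
    patterns = schema.get(workflow, [])
    return {
        name: value
        for name, value in secrets.items()
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    }


# Hypothetical vault contents for a single repo.
vault = {"NPM_TOKEN": "n", "AWS_ACCESS_KEY_ID": "a", "SIGNING_KEY": "s"}
```

With this shape, `export_for_workflow(vault, "test", EGRESS_SCHEMA)` yields only `NPM_TOKEN`, so a single vault can serve every workflow in the repo without exposing everything to each of them.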

Eventually, we would use a tool like Orchestrator or even Polykey Enterprise to manage secret delegation, but initially, we would need someone to delegate and manage these secrets, which can get cumbersome very quickly.

However, there is another issue with dogfooding Polykey to this extent.

What if a commit deploys a change which breaks Polykey on the CI, but isn't caught by the tests? The newer Polykey version would be published but break in CI, which would prevent any other commits from triggering the CI, leaving it deadlocked in a broken state. We would need to manually downgrade Polykey to allow the CI to trigger, then upgrade it back to the default version.

That issue is specific to the scenario where Polykey is always fetched at the latest version. What about the case where Polykey is fetched at a fixed version instead? In that case, when a new major update is released, the pin would need to be updated manually in every repo.
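One way to reduce the manual cost of pinning is to have a scheduled job report how far each repo's pin has drifted. The sketch below assumes plain `MAJOR.MINOR.PATCH` version strings; the function names and status labels are made up for illustration.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def pin_status(pinned: str, latest: str) -> str:
    """Classify how far a pinned version is behind the latest release."""
    pinned_v = parse_version(pinned)
    latest_v = parse_version(latest)
    if latest_v <= pinned_v:
        return "current"
    # A major bump may be breaking, so it is flagged for manual review.
    if latest_v[0] > pinned_v[0]:
        return "major-behind"
    return "minor-behind"
```

This does not remove the trade-off: a pin still has to be bumped by someone, but a report like this makes stale pins visible instead of silently accumulating.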

I haven't thought of a solution to this issue yet. This potential failure case might need more discussion.

This is my understanding of the current state of this issue, that is, getting Polykey into the CI environments. I might be missing some key details, though, so some discussion might be needed. Thoughts, @CMCDragonkai @brynblack?

@CMCDragonkai
Member

CMCDragonkai commented Feb 3, 2025

As we discussed during our earlier meeting, I believe it makes sense to have a Polykey agent running separately, orchestrated by the Orchestrator. This agent will run in our cloud. Our CI workers will then pull secrets down according to the schema.

You want to incorporate:

There's the issue of secret-zero. How do you "give the initial secret" to the polykey client to call the agent?

I recently had a discussion with ChatGPT about "workload identities", and I believe there's a short-lived token delegation process that should be investigated: ChatGPT-IAM and Service Accounts.pdf
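For reference, short-lived delegation can be sketched generically as an HMAC-signed token with an expiry. This is not Polykey's mechanism and not what the linked PDF describes; the token layout and the names `mint_token` and `verify_token` are assumptions for illustration. The signing key is itself a secret-zero, so this narrows the problem rather than removing it.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def mint_token(key: bytes, workload: str, ttl_seconds: int,
               now: Optional[float] = None) -> str:
    """Issue a signed token naming a workload, valid for ttl_seconds."""
    issued = time.time() if now is None else now
    payload = json.dumps(
        {"sub": workload, "exp": issued + ttl_seconds}, sort_keys=True
    ).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))


def verify_token(key: bytes, token: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the workload name if the token is authentic and unexpired."""
    current = time.time() if now is None else now
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        # Malformed token: wrong number of parts or invalid base64.
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature, expected):
        return None
    claims = json.loads(payload)
    if current >= claims["exp"]:
        return None
    return claims["sub"]
```

A CI job would receive such a token at start-up and present it to the agent, so a leaked token is only useful until it expires.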

But I think atm the easiest way is to just pass the root password to each job at an Organisation level.

But @brynblack you have not been keeping up with closing issues, so there's way too much entropy here. I'm going to try to get @Abby010 to help (along with some debug/tracing problems for PK in production). It's time to think operationally now.

@CMCDragonkai
Member

@brynblack comment when consolidated.


4 participants