Polykey secrets env integration #289
base: staging
@brynblack does this not link back to any issues? |
Where's the issue? |
We need to get this issue completed. The main blocker is using Polykey in the CI environment.
So far, we have discussed that creating an agent the normal way isn't predictable, because it generates a random node ID, and we cannot delegate authority to a randomly created agent. We can use the recovery passphrase to regenerate the node with the same node ID. However, how will the password and the passphrase be stored safely? If they are hard-coded in the repo or in the CI file, they can be leaked. Now that we are no longer using a runner image in newer CIs, we cannot pre-load the information onto the runner file system either. Perhaps the passphrase and password could be repo-specific secrets.
This is not the end of the issues, however. We also need to figure out a distribution method for a Polykey agent. Should we integrate the agent into the runner image itself? That can no longer be done, as we are transitioning to a custom action which sets up nix on the stock runners. No images are being used anymore, so this idea will not work after the CI upgrade. We currently don't have an action that sets up Polykey and runs the agent in the background. Brynley has created a custom action which prepares the nix environment, so she might be able to set up a Polykey action.
The simplest way I can think of doing this would be downloading the package directly from the releases page for the appropriate platform, then using the aforementioned methods to launch an agent and delegate it authority. Alternatively, we can use the runner's built-in package manager to download Polykey (https://docs.github.com/en/actions/using-github-hosted-runners/using-github-hosted-runners/customizing-github-hosted-runners#installing-software-on-ubuntu-runners). This method works on all platforms, so custom packages can be imported. However, the examples have used
This is partially related to MatrixAI/Polykey#222.
Implementing egress schema would help CI run smoothly by restricting the secrets exported to each workflow. This leads to another point: how to delegate secrets. Should each repo get a vault? Or each branch? Technically, each runtime is unique, so do we need a unique vault per runtime? (Of course not!) Once support for egress schema has been implemented, a vault can be dedicated to a single repo, with egress schema controlling the secrets that are exported. This idea seems the best to me, but I could be overlooking something. Eventually, we would use a tool like Orchestrator or even Polykey Enterprise to manage secret delegation, but initially we would need someone to delegate and manage these secrets manually, which can get cumbersome very quickly.
However, there is another issue with dogfooding Polykey to this extent. What if a commit deploys a change which breaks Polykey on the CI, but isn't caught by the tests? The newer Polykey version would be published but break in CI, which would prevent any later commit from triggering the CI, leaving it deadlocked in a broken state. We would need to manually downgrade Polykey to allow the CI to trigger, then upgrade it back to the default version.
That issue is specific to the scenario where Polykey is always fetched at the latest version. What about the case where Polykey is fetched at a fixed version instead? In that case, when a new major update is released, the pin would need to be updated manually in every repo. I haven't thought of a solution to this yet. This potential failure case might need more discussion.
This is what I could understand about the current state of this issue, that is, getting Polykey into the CI environments. I might be missing some key details, though, so some discussion might be needed. Thoughts, @CMCDragonkai @brynblack? |
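To make the egress idea above concrete, here is a minimal sketch of allowlist-style filtering. It is not the Polykey egress schema (which is not specified in this thread); `filter_env` and the secret names in the usage note are hypothetical, and the sketch only assumes that secrets arrive as `KEY=VALUE` lines on stdin.

```shell
# Hypothetical helper, not part of Polykey: keep only the KEY=VALUE lines
# whose key appears in the allowlist passed as arguments.
filter_env() {
  # `|| [ -n "$line" ]` also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    key="${line%%=*}"            # everything before the first '='
    for allowed in "$@"; do
      if [ "$key" = "$allowed" ]; then
        printf '%s\n' "$line"
        break
      fi
    done
  done
}
```

A workflow would then see only what its allowlist names, for example `printf 'API_TOKEN=abc\nDB_PASSWORD=hunter2\n' | filter_env API_TOKEN` passes through the `API_TOKEN` line and drops `DB_PASSWORD`. With one vault per repo, the allowlist is the only per-workflow configuration that needs managing.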
As we discussed during our earlier meeting, I believe it makes sense to have a Polykey agent running separately, orchestrated by the Orchestrator. This agent will run in our cloud. Our CI workers will then pull secrets down according to the schema. You want to incorporate:
There's the issue of secret-zero. How do you "give the initial secret" to the Polykey client so it can call the agent? I recently had a discussion with ChatGPT about "workload identities", and I believe there is a short-lived token delegation process that should be investigated: ChatGPT-IAM and Service Accounts.pdf
But I think at the moment the easiest way is to just pass the root password to each job at an Organisation level.
But @brynblack, you have not been keeping up with closing issues, so there's way too much entropy here. I'm going to try to get @Abby010 to help (along with some debug/tracing problems for PK in production). It's time to think operationally now. |
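If the root password is passed to each job as an organisation-level secret, a job can at least fail fast when the secret is absent. The sketch below is one way to do that; `require_env` is a hypothetical helper, and the variable names `PK_PASSWORD` and `PK_RECOVERY_CODE` in the usage note are assumptions about how the CI would expose the secrets, not something confirmed in this thread.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or
# empty, printing the missing names (never the values) to stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name; names come from the
    # script itself, not from untrusted input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required secret: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A job step could begin with `require_env PK_PASSWORD PK_RECOVERY_CODE || exit 1`, so a misconfigured secret surfaces as a clear error before any agent or client call is attempted.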
@brynblack comment when consolidated. |
Description
This PR integrates the `polykey secrets env` command into the development shell hook, to securely load development secrets into the development environment.
Tasks
Replace `. ./.env` with `pk secrets env ...`
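As a rough sketch of how a shell hook could make this switch while the rollout is in progress: the function below tries a secrets command first and falls back to `. ./.env` if that command fails. `load_dev_secrets` and `SECRETS_CMD` are hypothetical names, and the sketch assumes the configured command (for example a `pk secrets env ...` invocation, whose exact arguments should be checked against the CLI help) prints shell assignments on stdout.

```shell
# Hypothetical shell-hook fragment. SECRETS_CMD is an assumed variable
# holding a command that prints shell assignments (e.g. `export KEY=value`).
load_dev_secrets() {
  if [ -n "${SECRETS_CMD:-}" ] && output="$(sh -c "$SECRETS_CMD")"; then
    # Apply the assignments printed by the secrets command.
    eval "$output"
  elif [ -f ./.env ]; then
    # Transitional fallback: the previous behaviour of the hook.
    . ./.env
  else
    echo "no development secrets loaded" >&2
    return 1
  fi
}
```

Keeping the fallback is a design choice for the transition only: it stops a broken or missing Polykey install from locking developers out of the shell, and it should be removed once the `.env` file is retired.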
Final checklist