Random failure to decrypt when we have multiple(>=7) files to decrypt in .tf #126

Open
mrdntgrn opened this issue Nov 13, 2024 · 3 comments

Comments

@mrdntgrn

When the Terraform code contains more than six (>= 7) sops_file data blocks, decryption (for example during terraform plan) fails randomly. The files were encrypted with sops -e -i <file-name> using a passphrase-protected GPG key. One might suspect a mistyped passphrase, but the failures are random and decryption sometimes succeeds. Running sops -d <file-name> directly always succeeds.
The probability of failure grows with the number of sops_file data blocks: with 15 blocks, only about 1 run in 10 succeeds.
The threshold of 7 may be fixed or may depend on system performance. I tested on an Apple MacBook M3 Pro, and also on Ubuntu Linux, both locally and in a remote Docker-based Terraform CI/CD pipeline. With only a few (fewer than 7) sops_file blocks in the .tf code, no failures occur.

Tested with the latest Terraform and sops provider versions (the same issue also occurs with older versions of both).

Here is the tool version information:

# output of `terraform version`
Terraform v1.9.3
on darwin_arm64
+ provider registry.terraform.io/carlpett/sops v1.1.1

Here is sample Terraform code that reproduces the issue with 10 sops_file data blocks:

# main.tf file content
terraform {
  required_providers {
    sops = {
      source  = "carlpett/sops"
      version = "1.1.1"
    }
  }
}

provider "sops" {}

data "sops_file" "this1" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this2" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this3" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this4" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this5" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this6" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this7" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this8" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this9" {
  source_file = "my-gpg-encrypted-data.yaml"
}
data "sops_file" "this10" {
  source_file = "my-gpg-encrypted-data.yaml"
}

Please let me know if you need more info.

@carlpett
Owner

Hey @mrdntgrn,
Apologies for the late response - I wrote a test for this and prepared an answer but left it in this tab for some weeks 😰

I cannot reproduce this, I'm afraid. Do you have a gpg-agent running that might interfere?

@mrdntgrn
Author

mrdntgrn commented Jan 7, 2025

Hey @carlpett

Thank you for your response and for looking into the issue. I have prepared a complete sample using Docker Compose that runs everything inside a container, so that platform, OS, and configuration differences should have no impact.

You will need Docker installed and running on your machine. Run the following commands to reproduce the issue:

docker compose up -d             # start the container
docker compose exec tf /bin/sh   # open a shell inside the container
gpg --import testing-key.pgp     # import the PGP key
terraform init                   # initialize Terraform
terraform plan                   # run this several times: most runs fail, some may succeed
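
Since the failure is intermittent, counting failed runs gives a clearer picture than re-running by hand. The following is a minimal sketch, not part of the attached repro: the helper name count_failures is hypothetical, and it only assumes that a failed terraform plan exits with a non-zero status.

```shell
#!/bin/sh
# count_failures RUNS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND the given number of times and
# prints how many of those runs exited with a non-zero status.
count_failures() {
  runs=$1
  shift
  failures=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # Discard the command's output; only the exit status matters here.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (inside the container, after terraform init):
# echo "failed runs out of 10: $(count_failures 10 terraform plan -input=false)"
```

Running it with different numbers of sops_file blocks in main.tf would show whether the failure rate really grows with the block count.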

Here is a screenshot of the files I used to reproduce the issue:
[image]

Here are the files zipped:
docker-sops-terraform.zip

The content of my-gpg-encrypted-data.yaml was encrypted using the /test/testing-key.pgp key from this repo.
It looks like a passphrase on the GPG key is not needed to reproduce the issue, since your test key has none.

Please let me know if you need more help.

@carlpett
Owner

Thanks for taking the time to create a self-contained repro case! It took quite a few more re-runs than I expected, but when I did get the error, this is the report:

│ Error: Failed to get the data key required to decrypt the SOPS file.
│
│ Group 0: FAILED
│   3CE5CC7219D6597CE6488BF1BF36CD3D0749A11A: FAILED
│     - | could not decrypt data key with PGP key:
│       | github.com/ProtonMail/go-crypto/openpgp error: could not
│       | load secring: open /root/.gnupg/pubring.gpg: no such file or
│       | directory; GnuPG binary error: failed to decrypt sops data
│       | key with pgp: gpg: encrypted with rsa2048 key, ID
│       | F15D3C50575B206E, created 2019-01-23
│       |       "Terraform Sops Provider (Testing)
│       | <terraform-sops@local>"
│       | gpg: public key decryption failed: Out of memory
│       | gpg: decryption failed: Out of memory

Is this the same error message you got? It is somewhat peculiar that it says out of memory, since the system isn't actually out of memory. I suspect it is instead a race somewhere in code that lacks a mutex.

I still cannot reproduce outside of the container environment even with the same .tf file, so there's something environment specific happening.
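
One way to probe the race hypothesis would be to limit Terraform's concurrency with the -parallelism flag of terraform plan (which defaults to 10) and compare failure counts. This is only a sketch and has not been tried in this thread: TF_BIN and plan_failures are hypothetical names, and it assumes that the sops_file data sources are read concurrently during the plan.

```shell
#!/bin/sh
# TF_BIN is a hypothetical override for the terraform executable.
TF_BIN=${TF_BIN:-terraform}

# plan_failures PARALLELISM RUNS
# Runs "terraform plan" RUNS times with the given -parallelism value
# and prints the number of runs that exited with a non-zero status.
plan_failures() {
  parallelism=$1
  runs=$2
  failures=0
  i=1
  while [ "$i" -le "$runs" ]; do
    if ! "$TF_BIN" plan -input=false -parallelism="$parallelism" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Example (inside the container, after terraform init):
# echo "default (10):   $(plan_failures 10 20) failures out of 20"
# echo "serialised (1): $(plan_failures 1 20) failures out of 20"
```

If the failures disappear with -parallelism=1, that would be consistent with concurrent decryption being the trigger, although it would not show where the contention actually occurs.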
