Commit 4ea2018

Add details about configuring an IAM user; Fix config keys; General refactoring.

kirkrodrigues committed Jan 21, 2025
1 parent f1a40e8

Showing 1 changed file with 101 additions and 20 deletions:
`docs/src/user-guide/guides-using-object-storage.md`

# Using object storage

CLP can:

* [compress logs from object storage](#compressing-logs-from-object-storage) (e.g., S3);
* [store archives on object storage](#storing-archives-on-object-storage); and
* [view the compressed logs from object storage](#viewing-compressed-logs-from-object-storage).

This guide explains how to configure CLP for all three use cases.

:::{note}
Currently, only the [clp-json][release-choices] release supports object storage. Support for
`clp-text` will be added in a future release.
:::

:::{note}
…will be added in a future release.
:::

## Compressing logs from object storage

To compress logs from S3, you'll need to:

1. Set up an AWS IAM user that CLP can use to access the bucket containing your logs.
2. Use the `s3` subcommand of `sbin/compress.sh` to compress your logs.

### Setting up an AWS IAM user

To set up a user:

1. Create a user by following [this guide][aws-create-iam-user].
   * If you already have a user to use for ingesting logs, you can skip this step.
2. Attach the following policy to the user by following [this guide][add-iam-policy] (a CLI sketch
   of these steps appears at the end of this section).

   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": [
           "arn:aws:s3:::<bucket-name>/<key-prefix>/*"
         ]
       },
       {
         "Effect": "Allow",
         "Action": [
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::<bucket-name>"
         ],
         "Condition": {
           "StringLike": {
             "s3:prefix": "<key-prefix>/*"
           }
         }
       }
     ]
   }
   ```

   Replace the fields in angle brackets (`<>`) with the appropriate values:

   * `<bucket-name>` should be the name of the S3 bucket containing your logs.
   * `<key-prefix>` should be the key prefix of all logs you wish to compress.

:::{warning}
To follow the [principle of least privilege][least-privilege-principle], ensure the user doesn't
have any other permission policies attached. If they do, consider creating a new user with only the
permission policy above.
:::
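
For reference, here's a rough sketch of the same setup using the AWS CLI. The user name
(`clp-ingest`), policy name (`clp-read-logs`), and policy file path below are illustrative
placeholders, not names CLP requires:

```bash
# 1. Create the IAM user (skip if you're reusing an existing user).
aws iam create-user --user-name clp-ingest

# 2. Attach the policy above (saved locally as policy.json) as an inline
#    policy on the user.
aws iam put-user-policy \
    --user-name clp-ingest \
    --policy-name clp-read-logs \
    --policy-document file://policy.json

# 3. Generate an access key pair for the user; note the AccessKeyId and
#    SecretAccessKey in the output for use in the next section.
aws iam create-access-key --user-name clp-ingest
```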

### Using `sbin/compress.sh s3`

You can use the `s3` subcommand as follows:

```bash
sbin/compress.sh s3 --aws-credentials-file <credentials-file> s3://<bucket-name>/<key-prefix>
```

* `<credentials-file>` is the path to an AWS credentials file like the following:

  ```ini
  [default]
  aws_access_key_id = <aws-access-key-id>
  aws_secret_access_key = <aws-secret-access-key>
  ```

  * CLP expects the credentials to be in the `default` section.
  * `<aws-access-key-id>` and `<aws-secret-access-key>` are the access key ID and secret access
    key of the IAM user you set up in the previous section.

* `<bucket-name>` is the name of the S3 bucket containing your logs.
* `<key-prefix>` is the key prefix of all logs you wish to compress.
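
For example, if your logs lived in a hypothetical bucket named `my-log-bucket` under the key prefix
`prod/app1/`, and your credentials file were saved at `~/clp-credentials.ini`, you'd run:

```bash
sbin/compress.sh s3 \
    --aws-credentials-file ~/clp-credentials.ini \
    s3://my-log-bucket/prod/app1/
```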

:::{note}
The `s3` subcommand only supports a single URL but will compress any logs that have the given key
prefix.

If you wish to compress a single log file, specify the entire path to the log file. However, if that
path is a prefix of another log file's path, the other log file will be compressed as well.
:::

## Storing archives on object storage

To configure CLP to store archives on S3, update the `archive_output.storage` key in
`<package>/etc/clp-config.yml`:

```yaml
archive_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-archives"  # Or a path of your choosing
    s3_config:
      region_code: "<region-code>"
      bucket: "<bucket-name>"
      key_prefix: "<key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # archive_output's other config keys
```

* `s3_config` configures both the S3 bucket where archives should be stored and the credentials
  for accessing it.
  * `<region-code>` is the AWS region [code][aws-region-codes] for the bucket.
  * `<bucket-name>` is the bucket's name.
  * `<key-prefix>` is the "directory" where all archives will be stored within the bucket and
    must end with `/`.
  * `credentials` contains the S3 credentials necessary for accessing the bucket.

:::{note}
These credentials can be for a different IAM user than the one set up in the previous section,
as long as they can access the bucket.
:::
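
Once a compression job completes, one way to sanity-check this configuration is to list the
configured key prefix and confirm archives are appearing. The bucket name and prefix below are
placeholders:

```bash
# List all objects CLP has uploaded under the archives prefix.
aws s3 ls s3://my-archive-bucket/archives/ --recursive
```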

## Viewing compressed logs from object storage

To view compressed log files from S3, you'll need to configure a bucket where CLP can store
intermediate files that the log viewer can open. To do so, update the `stream_output.storage` key
in `<package>/etc/clp-config.yml`:

```yaml
stream_output:
  storage:
    type: "s3"
    staging_directory: "var/data/staged-streams"  # Or a path of your choosing
    s3_config:
      region_code: "<region-code>"
      bucket: "<bucket-name>"
      key_prefix: "<key-prefix>"
      credentials:
        access_key_id: "<aws-access-key-id>"
        secret_access_key: "<aws-secret-access-key>"

  # stream_output's other config keys
```

The `s3_config` keys function the same as those in `archive_output.storage` above, and you may
store streams in the same bucket as your archives (or a different bucket entirely).

:::{note}
To view compressed log files, clp-text currently converts them into IR streams that the log viewer
can open, while clp-json converts them into JSONL streams. These streams only need to be stored for
as long as the streams are being viewed, but CLP currently doesn't explicitly delete the streams.
This limitation will be addressed in a future release.
:::
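
Until CLP deletes streams itself, you can clean up cached streams manually, e.g., with the AWS CLI.
The bucket name and prefix below are placeholders; double-check the path before deleting:

```bash
# Remove all cached stream files under the streams prefix.
aws s3 rm s3://my-stream-bucket/streams/ --recursive
```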

[add-iam-policy]: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#embed-inline-policy-console
[aws-create-iam-user]: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html
[aws-region-codes]: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html#Concepts.RegionsAndAvailabilityZones.Availability
[least-privilege-principle]: https://en.wikipedia.org/wiki/Principle_of_least_privilege
[release-choices]: quick-start-cluster-setup/index.md#choosing-a-release
