docs(storage): adds a description of tiered storage #9866

Merged
merged 5 commits into from
Mar 28, 2024
@@ -1,9 +1,14 @@
To configure the tiered storage feature, set the `type` property to `custom`.
Enables custom tiered storage for Kafka.

`RemoteStorageManager` is a Kafka interface for managing the interaction between Kafka and remote tiered storage solutions.
Custom tiered storage enables the use of a custom `RemoteStorageManager` configuration.
If custom tiered storage is enabled, Strimzi uses the link:https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java[`TopicBasedRemoteLogMetadataManager`] for Remote Log Metadata Management (RLMM) configuration.
If you want to use custom tiered storage, you must first add the tiered storage plugin to the Strimzi image by building a custom container image.
If you want to use custom tiered storage, you must first add a tiered storage for Kafka plugin to the Strimzi image by building a custom container image.

Custom tiered storage configuration enables the use of a custom `RemoteStorageManager` configuration.
`RemoteStorageManager` is a Kafka interface for managing the interaction between Kafka and remote tiered storage.

If custom tiered storage is enabled, Strimzi uses the https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java[`TopicBasedRemoteLogMetadataManager`^] for Remote Log Metadata Management (RLMM).

WARNING: Tiered storage is an early access Kafka feature, which is also available in Strimzi.
Due to its https://kafka.apache.org/documentation/#tiered_storage_limitation[current limitations^], it is not recommended for production environments.

.Example custom tiered storage configuration
[source,yaml,subs="attributes+"]
@@ -16,10 +21,12 @@ kafka:
      classPath: /opt/kafka/plugins/tiered-storage-s3/*
      config:
        # A map with String keys and String values.
        # Key properties are automatically prefixed with `rsm.config.` and appended to Kafka broker config.
        # Key properties are automatically prefixed with `rsm.config.`
        # and appended to Kafka broker config.
        storage.bucket.name: my-bucket
  config:
    ...
    # Additional RLMM configuration can be added through the Kafka config under `spec.kafka.config` using the `rlmm.config.` prefix.
    # Additional RLMM configuration can be added through the Kafka config
    # under `spec.kafka.config` using the `rlmm.config.` prefix.
    rlmm.config.remote.log.metadata.topic.replication.factor: 1
----
17 changes: 15 additions & 2 deletions documentation/assemblies/configuring/assembly-storage.adoc
@@ -13,18 +13,29 @@ The supported storage types are:
* Ephemeral (Recommended for development only)
* Persistent
* JBOD (Kafka only; not available for ZooKeeper)
* Tiered storage (Early access)

To configure storage, you specify `storage` properties in the custom resource of the component.
The storage type is set using the `storage.type` property.

When using node pools, you can specify storage configuration unique to each node pool used in the cluster.
When using node pools, you can specify storage configuration unique to each node pool used in a Kafka cluster.
The same storage properties available to the `Kafka` resource are also available to the `KafkaNodePool` pool resource.

Tiered storage provides more flexibility for data management by leveraging the parallel use of storage types with different characteristics.
For example, tiered storage might include the following:

* Higher performance and higher cost block storage
* Lower performance and lower cost object storage

Tiered storage is an early access feature in Kafka.
To configure tiered storage, you specify `tieredStorage` properties.
Tiered storage is configured only at the cluster level using the `Kafka` custom resource.

The storage-related schema references provide more information on the storage configuration properties:

* link:{BookURLConfiguring}#type-EphemeralStorage-reference[`EphemeralStorage` schema reference^]
* link:{BookURLConfiguring}#type-PersistentClaimStorage-reference[`PersistentClaimStorage` schema reference^]
* link:{BookURLConfiguring}#type-JbodStorage-reference[`JbodStorage` schema reference^]
* link:{BookURLConfiguring}#type-TieredStorageCustom-reference[`TieredStorageCustom` schema reference^]

WARNING: The storage type cannot be changed after a Kafka cluster is deployed.

@@ -41,3 +52,5 @@ include::../../modules/configuring/ref-storage-jbod.adoc[leveloffset=+1]
include::../../modules/configuring/proc-adding-volumes-to-jbod-storage.adoc[leveloffset=+1]

include::../../modules/configuring/proc-removing-volumes-from-jbod-storage.adoc[leveloffset=+1]

include::../../modules/configuring/ref-storage-tiered.adoc[leveloffset=+1]
47 changes: 47 additions & 0 deletions documentation/modules/configuring/ref-storage-tiered.adoc
@@ -0,0 +1,47 @@
[id='ref-tiered-storage-{context}']
= Tiered storage (early access)

[role="_abstract"]
Tiered storage introduces a flexible approach to managing Kafka data whereby log segments are moved to a separate storage system.
For example, you can combine the use of block storage on brokers for frequently accessed data and offload older or less frequently accessed data from the block storage to more cost-effective, scalable remote storage solutions, such as Amazon S3, without compromising data accessibility and durability.

WARNING: Tiered storage is an early access Kafka feature, which is also available in Strimzi.
Due to its https://kafka.apache.org/documentation/#tiered_storage_limitation[current limitations^], it is not recommended for production environments.

Tiered storage requires an implementation of Kafka's `RemoteStorageManager` interface to handle communication between Kafka and the remote storage system, which is enabled through configuration of the `Kafka` resource.
Strimzi uses Kafka's https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/TopicBasedRemoteLogMetadataManager.java[`TopicBasedRemoteLogMetadataManager`^] for Remote Log Metadata Management (RLMM) when custom tiered storage is enabled.
The RLMM manages the metadata related to remote storage.

To use custom tiered storage, do the following:

* Include a tiered storage plugin for Kafka in the Strimzi image by building a custom container image.
The plugin must provide the necessary functionality for a Kafka cluster managed by Strimzi to interact with the tiered storage solution.
* Configure Kafka for tiered storage using `tieredStorage` properties in the `Kafka` resource.
Specify the class name and path for the custom `RemoteStorageManager` implementation, as well as any additional configuration.
* If required, specify RLMM-specific tiered storage configuration.

.Example custom tiered storage configuration for Kafka
[source,yaml,subs="attributes+"]
----
apiVersion: {KafkaApiVersion}
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    tieredStorage:
      type: custom # <1>
      remoteStorageManager: # <2>
        className: com.example.kafka.tiered.storage.s3.S3RemoteStorageManager
        classPath: /opt/kafka/plugins/tiered-storage-s3/*
        config:
          storage.bucket.name: my-bucket # <3>
          # ...
    config:
      rlmm.config.remote.log.metadata.topic.replication.factor: 1 # <4>
  # ...
----
<1> The `type` must be set to `custom`.
<2> The configuration for the custom `RemoteStorageManager` implementation, including class name and path.
<3> Configuration to pass to the custom `RemoteStorageManager` implementation, which Strimzi automatically prefixes with `rsm.config.`.
<4> Tiered storage configuration to pass to the RLMM, which requires an `rlmm.config.` prefix. For more information on tiered storage configuration, see the {kafkaDoc}.
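
As a rough illustration of callouts <3> and <4>, the example configuration above might surface in the generated Kafka broker configuration along the following lines.
This is a sketch under stated assumptions, not captured operator output: the `remote.log.*` names are standard Kafka broker properties, but the exact set and order of properties that Strimzi renders is assumed here and may differ between versions.

.Sketch of the resulting broker properties (illustrative only)
[source,properties]
----
# Assumption: tiered storage is switched on at the broker level.
remote.log.storage.system.enable=true

# Taken from `remoteStorageManager.className` and `remoteStorageManager.classPath`.
remote.log.storage.manager.class.name=com.example.kafka.tiered.storage.s3.S3RemoteStorageManager
remote.log.storage.manager.class.path=/opt/kafka/plugins/tiered-storage-s3/*

# Keys under `remoteStorageManager.config` gain the `rsm.config.` prefix.
remote.log.storage.manager.impl.prefix=rsm.config.
rsm.config.storage.bucket.name=my-bucket

# RLMM settings from `spec.kafka.config` already carry the `rlmm.config.` prefix.
remote.log.metadata.manager.impl.prefix=rlmm.config.
rlmm.config.remote.log.metadata.topic.replication.factor=1
----

The asymmetry is the point to note: `storage.bucket.name` is written without a prefix in the `Kafka` resource because the prefix is added automatically, whereas RLMM properties must be written with the `rlmm.config.` prefix in full.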