Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated TSDB documentation with additional details #5706

Merged
merged 12 commits into from
Aug 23, 2023
Merged
68 changes: 44 additions & 24 deletions docs/developer_tsdb_migration_guidelines.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,20 +20,26 @@ Integration is one of the biggest sources of input data to elasticsearch. Enabli


1. **Datastream having type `logs` can be excluded from TSDB migration.**
agithomas marked this conversation as resolved.
Show resolved Hide resolved
2. **Modify the `kibana.version` to 8.8.2 within the manifest.yml file of the package.**
```
conditions:
kibana.version: "^8.8.0"
agithomas marked this conversation as resolved.
Show resolved Hide resolved
```
2. **Add the changes to the manifest.yml file of the datastream as below to enable the timeseries index mode**
```
elasticsearch:
index_mode: "time_series"
```
If your datastream has more number of dimension fields, you can modify this limit by modifying index.mapping.dimension_fields.limit value as below
Should your datastream contain an increased count of dimension fields, you have the option to adjust this restriction by altering the index.mapping.dimension_fields.limit value as indicated below. The default maximum limit stands at 21.
```
elasticsearch:
index_mode: "time_series"
index_template:
settings:
# Defaults to 16
# Defaults to 21
index.mapping.dimension_fields.limit: 32
```

3. **Identifying the dimensions in the datastream.**

Read about dimension fields [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#time-series-dimension). It is important that dimensions or a set of dimensions that are part of a datastream uniquely identify a timeseries. Dimensions are used to form _tsid which then is used for routing and index sorting. Read about the ways to add field a dimension [here](https://github.com/elastic/integrations/blob/main/docs/generic_guidelines.md#specify-dimensions])
Expand All @@ -46,40 +52,51 @@ Integration is one of the biggest sources of input data to elasticsearch. Enabli

From the context of integrations that are related to products that are deployed on-premise, there exist certain fields that are part of every package and they are potential candidates of becoming dimension fields

* host.ip
* service.address
* agent.id
* `host.name`
* `service.address`
* `agent.id`
* `container.id`
agithomas marked this conversation as resolved.
Show resolved Hide resolved

For products that are capable of running both on-premise and in a public cloud environment (by being deployed on public cloud virtual machines), it is recommended to annotate the ECS fields listed below as dimension fields.
* `host.name`
* `service.address`
* `container.id`
* `cloud.account.id`
* `cloud.provider`
* `cloud.region`
* `cloud.availability_zone`
* `agent.id`
* `cloud.instance.id`

For products operating as managed services within cloud providers like AWS, Azure, and GCP, it is advised to label the fields listed below as dimension fields.
* `cloud.account.id`
* `cloud.region`
* `cloud.availability_zone`
* `cloud.provider`
* `agent.id `

When metrics are collected from a resource running in the cloud or in a container, certain fields are potential candidates of becoming dimension fields

* host.ip
* service.address
* agent.id
* cloud.project.id
* cloud.instance.id
* cloud.provider
* container.id

*Warning: Choosing an insufficient number of dimension fields may lead to data loss*

*Hint: Fields having type [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#keyword-field-type) in your datastream are very good candidates of becoming dimension fields*


4. **Annotating the integration specific fields as dimension**

`files.yml` file has the field mappings specific to a datastream of an integration. This step is needed when the dimension fields in ECS is not sufficient enough to create a unique [_tsid](https://www.elastic.co/guide/en/elasticsearch/reference/current/tsds.html#tsid) value for the documents stored in elasticsearch. Annotate the field with `dimension: true` to tag the field as dimension field.

Adding an inline comment prior to the dimension annotation is advised, detailing the rationale behind the choice of a particular field as a dimension field.

```
- name: wait_class
type: keyword
description: Every wait event belongs to a class of wait events.
# Multiple events are generated based on the values of wait_class. Hence, it is a dimension
dimension: true
description: Every wait event belongs to a class of wait events.
```
*Notes:*
* *There exists a limit on how many dimension fields can have. By default this value is 16. Out of this, 8 are reserved for ecs fields.*
* *There exists a limit on how many dimension fields can have. By default this value is 21.*
agithomas marked this conversation as resolved.
Show resolved Hide resolved
* *Dimension keys have a hard limit of 512b. Documents are rejected if this limit is reached.*
* *Dimension values have a hard limit of 1024b. Documents are rejected if this limit is reached*
* *Dimension values have a hard limit of 1024b. Documents are rejected if this limit is reached*

**Warning:** Choosing an insufficient number of dimension fields may lead to data loss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refer the reader to the testing section and link the TSDB migration test kit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@constanca-m , can we take that as a separate update to this document? May be you can add that part?


**Hint:** Fields having type [keyword](https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html#keyword-field-type) in your datastream are very good candidates of becoming dimension fields

5. **Annotating Metric Types values for all applicable fields**

Expand All @@ -104,7 +121,7 @@ Integration is one of the biggest sources of input data to elasticsearch. Enabli

- After migration, verify if the dashboard is rendering the data properly. If certain visualisation do not work, consider migrating to [Lens](https://www.elastic.co/guide/en/kibana/current/lens.html)

Certain aggregation functions are not supported when a field is having a metric_type counter. Example avg(). Replace such aggregation functions with a supported aggregation type such as max().
Certain aggregation functions are not supported when a field is having a metric_type `counter`. Example `avg()`. Replace such aggregation functions with a supported aggregation type such as `max()` or `min()`.
agithomas marked this conversation as resolved.
Show resolved Hide resolved

- It is recommended to compare the number of documents within a certain time frame before enabling the TSDB and after enabling TSDB index mode. If the count differs, please check if there exists a field that is not annotated as dimension field.

Expand Down Expand Up @@ -142,6 +159,9 @@ Reference : https://github.com/elastic/elasticsearch/issues/93539
- Currently, there are several limits around the number of dimensions.
Reference : https://github.com/elastic/elasticsearch/issues/93564

- Other known issues: https://github.com/elastic/integrations/issues/5233. Refer the section - New Issues Identified, TSDB Issues reported earlier.

# <a id="existing-migrated-packages"></a> Reference to existing package already migrated

Oracle integration TSDB enablement: [PR Link](https://github.com/elastic/integrations/pull/5307)
- [Oracle integration](https://github.com/elastic/integrations/tree/main/packages/oracle)
- [Redis integrations](https://github.com/elastic/integrations/tree/main/packages/redis)