[Prototype] Convert Oracle Performance Datastream to TSDB #4966
Conversation
index_template:
  settings:
    # Defaults to 16
    index.mapping.dimension_fields.limit: 16
Do you need this one?
Not needed for this datastream. Just testing to ensure that package-spec supports these configurations.
Did you have a chance to also play around with the metric types (gauge / counter) and use it in Lens? https://www.elastic.co/guide/en/elasticsearch//reference/master/tsds.html#time-series-metric
- external: ecs
  name: host.ip
- external: ecs
  name: ecs.version
- external: ecs
  name: service.address
  dimension: true
It seems this is the only field set as a dimension. You had host.name below but commented out. Can you share a bit of background on why? What happens if this runs under k8s? Are there additional dimensions needed?
I wonder whether we should always add a dimension that uniquely identifies the shipper, something like an ephemeral_id, to prevent two senders from sending data to the same time series.
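If we did that, a minimal fields.yml sketch of what it could look like (purely illustrative; agent.ephemeral_id is just one candidate and not part of this PR):

```yaml
# Illustrative only: declare the agent's ephemeral_id as an extra dimension
# so each shipper instance gets its own time series.
- external: ecs
  name: agent.ephemeral_id
  dimension: true
```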
A service.address will have the DSN value that uniquely identifies a database connection. An Oracle DSN comprises three parts: hostname / cluster URL, port, and database name.
Since the Oracle integration does not capture host-level metrics, service.address, which identifies a unique DB connection, is sufficient, I believe.
I am also thinking about additionally including username as a dimension. This is because a database can have multiple users created under it. So, if a user configures two users to collect data from the same database identified by the DSN, not including a username may lead to missing series (data). This is not a practical use case but an exception-handling mechanism. Currently the username field does not hold a value; to have it contain the username, additional changes may be needed in the Metricbeat code. I added username as a dimension to handle this exception in the future.
Thanks for the details. I suggest adding parts of this to fields.yml as a comment as well. The reason is that if someone touches this integration in the future and modifies the dimensions, they will fully understand why it was set this way.
For the username: my understanding is that this only applies if the exact same query is run and ingested at the exact same time. For monitoring purposes, I wonder what the use case is for running the same queries with 2 different users. For other data, I could see the case where different data is returned for the same query depending on the user's permissions, but I would expect that in the monitoring use case the user that runs the query has access to all the monitoring data?
I see the theoretical reason behind adding username, but I don't fully understand yet how it would happen in production.
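A rough sketch of the kind of comment I mean, placed directly next to the dimension in fields.yml (the wording is illustrative):

```yaml
- external: ecs
  name: service.address
  # Dimension: service.address carries the Oracle DSN (host/cluster URL, port and
  # database name), which uniquely identifies a database connection, so it is enough
  # to distinguish time series for this integration.
  dimension: true
```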
I think it would be good to display the fields that are dimensions in the documentation showing "Exported Fields".
@ruflin , do you agree?
Adding one more column would further limit the space available for showing descriptions. Having the "Type" column carry values such as "keyword, dimension" would help.
There is one more reason why username is a dimension field. There is a query that returns the session count grouped by username and machine (machine name).
If an N-tier application has multiple web servers / application servers connecting to the database layer, having the machine name in addition to the username helps.
I would need to add the machine name (machine) as an additional dimension field.
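A minimal fields.yml sketch for that (the field name and description are assumptions, to be confirmed against the query output):

```yaml
- name: machine
  type: keyword
  dimension: true
  description: Client machine name reported for the Oracle session.
```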
@felixbarny , if I add the agent's ephemeral_id as a dimension field, wouldn't that create duplicate time series data (for the case when two or more agents receive the same policy)?
Please correct me if I misinterpreted your comment.
@ruflin , all metric types under the Oracle datastreams are of type gauge.
I tested the visualisation with:
- time_series_metric = gauge
- time_series_metric value not set
I didn't notice any issues.
There is a plan to test against every time_series_metric type, but it may not be possible with the Oracle integration. A different integration must be picked, probably one based on Prometheus (e.g. InfluxDB). Will share the details soon.
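For reference, this is roughly how the gauge type is declared per field in fields.yml, using one of the fields from this datastream (a sketch; the exact layout depends on the package-spec version in use):

```yaml
- name: oracle.performance.session_count.active
  type: double
  metric_type: gauge
  description: Total count of sessions.
```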
Good to hear everything just kept working.
One interesting bit to test later would be what happens if TSDB is enabled with gauge and later turned off again. I expect everything still keeps working.
For the test you did above, I assume you first had it not set and then set it? Did you mix both kinds of data, or did you only have either/or?
I didn't have the time_series_metric value set at first, and then I set it to gauge.
When I did the visualisation testing, I had some time series without time_series_metric set and a few time series records with time_series_metric values set. It didn't show any problems in the Metric, Area chart, and Line chart based views.
I will test the scenario you mentioned above and share the outcome.
@@ -12,6 +18,7 @@
      Name of the buffer pool in the instance.
  - name: username
    type: keyword
    dimension: true
Why is the username a dimension? If the username changes, should it be a different time series?
The reason why username is added as a dimension is mentioned above.
- name: oracle.performance
  type: group
  release: beta
  fields:
    - name: query
      type: keyword
      dimension: true
Note that dimension values are limited to 1024. IIRC, documents that exceed that value are rejected. It seems like the raw query can easily get over the limit.
Yes, true. I plan to convert the query to a hash value and use the hash value as the dimension field.
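A minimal sketch of how that could be done with the fingerprint processor in the ingest pipeline (the target field name is an assumption):

```yaml
# Hash the raw query text and store the digest in a separate field that can
# then be declared as the dimension instead of the full query.
- fingerprint:
    fields: ["oracle.performance.query"]
    target_field: "oracle.performance.query_hash"
    method: "SHA-256"
    ignore_missing: true
```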
Does this data stream contain event based data, similar to a slow log which has an entry for each individual slow execution of a query? Or is it a summary of the statistics for each query?
The queries return summary statistics. Reporting slow-running queries is beyond the scope of the Oracle integration.
I see very useful discussions in this PR. I am hoping we will take the summary and key highlights and capture them in the issue and the guideline document?
packages/oracle/docs/README.md
| oracle.performance.session_count.active | Total count of sessions. | double | | gauge |
| oracle.performance.session_count.inactive | Total count of Inactive sessions. | double | | gauge |
| oracle.performance.session_count.inactive_morethan_onehr | Total inactive sessions more than one hour. | double | | gauge |
| oracle.performance.username | Oracle username | keyword | | |
| oracle.performance.wait.pct_time | Percentage of time waits that are not Idle wait class. | double | percent | gauge |
| oracle.performance.wait.pct_waits | Percentage of number of pct time waits that are not of Idle wait class. | double | percent | gauge |
| oracle.performance.wait.time_waited_secs | Amount of time spent in the wait class by the session. | double | s | gauge |
| oracle.performance.wait.total_waits | Number of times waits of the class occurred for the session. | double | | counter |
| oracle.performance.wait.total_waits | | integer | | counter |
Seems the description disappeared?
@@ -1,7 +1,7 @@
 format_version: 1.0.0
 name: oracle
 title: "Oracle"
-version: "1.10.0"
+version: "1.10.1"
As we increase the compatibility version, it seems to me this is more than just a bugfix bump. 1.11.0?
@@ -11,7 +11,7 @@ categories:
   - infrastructure
 release: ga
 conditions:
-  kibana.version: "^8.3.0"
+  kibana.version: "^8.7.0"
Why is the requirement bumped to 8.7.0? I'm aware the dimension part will not take effect before that, but I assume it works in the way that it is just ignored? Are there features in here that don't work in 8.3.0?
I was specifically looking for the TSDB enable/disable flag, and hence 8.7.0.
Should TSDB be enabled by default or not? If yes, then the enable/disable flag is important to give the user the option to use non-TSDB. If non-TSDB is the default, 8.3.0 is sufficient.
Thoughts?
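For context, TSDB itself is switched on per data stream in its manifest; a minimal sketch (whether this should sit behind a user-facing enable/disable toggle is the open question above):

```yaml
elasticsearch:
  index_mode: "time_series"
```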
Does it throw an error if you have it in there and you try to install it in 8.3?
"type": "visualization", | ||
"version": "8.3.0" | ||
"type": "lens", | ||
"version": "8.7.0" |
Did you change this manually to 8.7.0? I'm surprised there is no -SNAPSHOT suffix :-)
Not required, but I think there is a chance to split this PR up into multiple smaller PRs to get it in quickly. For example, Lens can be added separately, the fingerprint processor change, etc.
The fingerprint processor change is an essential part of the TSDB changes that are going in.
This PR will be split because the Lens migration will be taken up separately for all packages. The new PR with limited scope is here: #5307
Hi! We just realized that we haven't looked into this PR in a while. We're sorry! We're labeling this issue as
Hi! This PR has been stale for a while and we're going to close it as part of our cleanup procedure. We appreciate your contribution and would like to apologize if we have not been able to review it, due to the current heavy load of the team. Feel free to re-open this PR if you think it should stay open and is worth rebasing. Thank you for your contribution!
Type of change
Please label this PR with one of the following labels, depending on the scope of your change:
What does this PR do?
Convert Oracle Performance Datastream to TSDB
Checklist
changelog.yml file.
How to test this PR locally
Related issues
Screenshots