Export benchmark information as line protocol #6107
Comments
I am currently focused on getting to the point where we can run the benchmarks repeatedly -- once I can easily run the benchmarks, I will start working on running them and collecting data over time. I view the line protocol conversion as part of the story of tracking data over time.
I added details of a proposed design in the
@alamb would you mind clarifying a bit? Are you planning to keep a collection kinda If so, then it's also possible to backfill the history with archived benchmarks/versions
Apache Arrow uses Conbench for a similar purpose: https://github.com/conbench/conbench
Yes -- that is basically what I have in mind. In my mind we would store it as lineprotocol and check it into a repo somewhere and then visualize it with existing tools (e.g. grafana and influxdb, which is what I know, but I am happy to use some other open source stack)
Agree
Yes, I looked briefly into conbench (in fact there is some vestigial code in datafusion -- see https://github.com/apache/arrow-datafusion/tree/main/conbench and #5504 for details). TLDR: I could not get it to work, and it seems as if the dev team went dormant(ish), so I didn't pursue it further. If someone else can get it to work, that would be great.
Currently working on this. I have extended the existing
At the moment I am trying to set up ingestion into influx using docker, but cannot quite seem to make the data available to visualize.
@Smurphy000 could you potentially push up what you have as a draft PR? Maybe I can help with the "how to get this ingested / visualized" part, as I have more experience with that.
Is your feature request related to a problem or challenge?
We want to have information about DataFusion's performance over time -- #5504. This is becoming more important as we work on more performance items / optimizations such as #5904
Currently the datafusion benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks#datafusion-benchmarks can output the run results as a JSON file.
I would like to use existing visualization systems (like timeseries databases).
Describe the solution you'd like
I would like to output the benchmark data optionally as LineProtocol https://docs.influxdata.com/influxdb/cloud-iox/reference/syntax/line-protocol/ so that it can be visualized by grafana / other systems that can handle line protocol
See https://grafana.com/docs/grafana-cloud/data-configuration/metrics/metrics-influxdb/push-from-telegraf/
Proposed Design
Write a Python script, modeled after compare.py, that takes a performance JSON file and produces line protocol as output.
Desired output
measurement: benchmark
tags: details from run
fields: query, iteration, row_count, elapsed_ms
timestamp: ns since epoch (I think that means multiplying the recorded time by 1,000,000,000 if it is in seconds, or by 1,000,000 if it is in milliseconds)
Example output
A line like this for each element in the queries array:
Example input:
Here is a zip file with a bunch of example benchmark json files: results.zip
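To make the proposed design above more concrete, here is a minimal sketch of what such a conversion script might look like. It is not the actual implementation. The JSON layout it reads (a `context` object of run details, plus a `queries` array whose elements have `query`, `start_time` in seconds, and an `iterations` list with `elapsed` in milliseconds and `row_count`) is an assumption for illustration and would need to be checked against the real benchmark files. The function names are hypothetical too.

```python
import json

NS_PER_SECOND = 1_000_000_000  # assumes start_time is recorded in seconds


def escape_tag(value):
    """Backslash-escape commas, equals signs, and spaces in a tag key or value."""
    text = str(value)
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def escape_string_field(value):
    """Quote a string field value, escaping backslashes and double quotes."""
    return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(results):
    """Return one line protocol line per iteration of each query.

    `results` is the parsed benchmark JSON (layout assumed, see above).
    """
    context = results.get("context", {})
    # Tags: scalar, non-empty details from the run context.
    tags = ",".join(
        f"{escape_tag(key)}={escape_tag(value)}"
        for key, value in sorted(context.items())
        if key != "start_time"
        and isinstance(value, (str, int, float))
        and str(value) != ""
    )
    head = f"benchmark,{tags}" if tags else "benchmark"

    lines = []
    for query in results.get("queries", []):
        timestamp_ns = int(query["start_time"]) * NS_PER_SECOND
        for index, iteration in enumerate(query["iterations"]):
            fields = ",".join([
                f"query={escape_string_field(query['query'])}",
                f"iteration={index}i",  # trailing "i" marks an integer field
                f"row_count={int(iteration['row_count'])}i",
                f"elapsed_ms={float(iteration['elapsed'])}",
            ])
            lines.append(f"{head} {fields} {timestamp_ns}")
    return lines


def convert_file(path):
    """Load a benchmark JSON file and return its line protocol lines."""
    with open(path) as f:
        return to_line_protocol(json.load(f))
```

Calling `print("\n".join(convert_file("some_results.json")))` (file name hypothetical) would emit the lines for ingestion. One thing to watch: InfluxDB identifies a point by measurement, tag set, and timestamp, so iterations that share all three would overwrite each other. Moving `query` and `iteration` into the tag set may be worth considering.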
Describe alternatives you've considered
No response
Additional context
Related to #5504 tracking data over time