
Use PyAirbyte for SQL source and Blob destination performance testing? #561

Open
ramandatascientist opened this issue Dec 16, 2024 · 2 comments

Comments

@ramandatascientist

Hello

We are using SQL as a source and Blob as a destination. We have a limited number of records in our dev SQL source, and we want to run performance testing to understand how Airbyte will perform with 1M, 10M, or 100M records in a sync.

Is there a way to run performance tests on Airbyte using PyAirbyte with a fake-data library, or some other way?

@aaronsteers
Contributor

aaronsteers commented Dec 17, 2024

@ramandatascientist - Thanks for raising this. The benchmark CLI command may be of help to you. I'm about to merge this PR to our auto-generated docs, adding a new docs page for the cli module.

Other things which should be helpful:

  1. After each sync, a performance log path is printed, and that file will have a JSONL line for every sync run, including records/second, bytes/second, and many other helpful performance stats.
  2. We also have the convenience functions airbyte.sources.get_benchmark_source() and airbyte.destinations.get_noop_destination(), which you can use in your scripts if you want more control than the pyab benchmark CLI command offers, while still leveraging existing patterns.
  3. However you run the sync operations, stats will be appended to the same performance log file.

Does this help? Let us know how it goes!
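As a small illustration of point 1, the JSONL performance log can be parsed with just the standard library. The field names below (records_per_second, bytes_per_second) are illustrative assumptions; check the actual log file PyAirbyte prints the path to after a sync:

```python
import json
from pathlib import Path

# Write two hypothetical performance-log lines; in practice this file
# is created by PyAirbyte and each sync appends one JSON object per line.
log_path = Path("perf-log.jsonl")
log_path.write_text(
    '{"records_per_second": 1250.0, "bytes_per_second": 98304}\n'
    '{"records_per_second": 1410.5, "bytes_per_second": 110592}\n'
)

# JSONL means one JSON document per line, so parse line by line.
rates = []
for line in log_path.read_text().splitlines():
    stats = json.loads(line)
    rates.append(stats["records_per_second"])

print(f"best throughput: {max(rates)} records/second")
```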

@ramandatascientist
Author

ramandatascientist commented Dec 17, 2024

Hi @aaronsteers, thank you for your response. Is there a way to mock the datasets against Azure SQL? We are running Airbyte against dev Azure SQL databases that have a limited number of records, so I am wondering if there is a way to fake/mock the datasets and run performance tests against them.
