
Turn on sample data generation with flag #1635

Open
ruflin opened this issue Jan 22, 2024 · 5 comments

Comments

@ruflin
Contributor

ruflin commented Jan 22, 2024

There is now a list of integration packages that can generate sample data; see #1634 for more details. It would be nice if elastic-package could be started with a flag that automatically installs some of these integrations and starts the sample data generation, so that when testing features, data is already available and ingested in real time.

@ruflin ruflin assigned ruflin and unassigned ruflin Jan 22, 2024
@aspacca
Contributor

aspacca commented Jan 23, 2024

> It would be nice if elastic-package could be started with a flag that automatically installs some of these integrations and starts the sample data generation, so that when testing features, data is already available and ingested in real time.

@ruflin

We are in the scope of the benchmark stream command, correct?

The flag would instruct the command to install all or some of the integrations that can generate sample data, and then to generate and ingest sample data for every installed integration.

A few considerations that affect how the feature might be designed and when it is best to add it:

Currently the command must be run from a package root: it installs the package and, by default, starts sample data generation for all the benchmarks in the package. This is because there are no source packages yet, so we must rely on the assets already being on the filesystem.
Still without source packages, installing several integrations at once and starting sample data generation for all of them requires running the command from the packages folder instead (or, more generally, from a folder containing multiple package root subfolders).

In this case the flag could basically be used to make the command aware of where it is run from, which changes its behaviour.

Once source packages are available, the command will accept a flag specifying which source package to download, and we can drop the constraint on where the command is run from.

In that case we still rely on a flag, the one defining the source package, but it is its absence that triggers the behaviour you've described: if no specific source package is defined, install all or some of them.

@ruflin
Contributor Author

ruflin commented Jan 24, 2024

> we are in the scope of the benchmark stream command, correct?

Yes

It sounds like this could become a dependency on #1577, as you describe. I like the idea that at first, to avoid making #1577 a requirement, we support running it from a package directory.

@ruflin
Contributor Author

ruflin commented Jan 24, 2024

In #1640, option 2 is about checking out the integrations repo, which could also help with this effort.

@jsoriano
Member

jsoriano commented Jan 24, 2024

Btw, we also have elastic/package-spec#446, which wouldn't be so hard to implement and which I think we want in any case. It would also allow us to get the source of the package indirectly through the built package.

@ruflin
Contributor Author

ruflin commented Jan 25, 2024

This might be an interesting combination: we rely on the registry to tell us where a package comes from, then check out that package's repository (assuming it is public). For many packages this means checking out the integrations repo, so if you use 3 packages and all are from the integrations repo, it is checked out only once, and in the future only the diff to main is fetched, so things will be fast. It would also mean we don't have to hardcode the repository.
