Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jaeger-v2] Rethink components configuration to align with OTEL #5229

Closed
10 of 16 tasks
yurishkuro opened this issue Feb 24, 2024 · 33 comments
Closed
10 of 16 tasks

[jaeger-v2] Rethink components configuration to align with OTEL #5229

yurishkuro opened this issue Feb 24, 2024 · 33 comments
Labels

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Feb 24, 2024

Right now several PRs that are trying to add support for standard storage backends to v2 are reusing the same configuration for storage as we had in v1. Now is a good time to rethink that configuration, in particular:

  • better align with standard components of OTEL Collector. For example, the gRPC storage ([jaeger-v2] Add support for GRPC storarge #5228) is conceptually very similar to OTLP Exporter - they both send data over gRPC to a remote service. Our configuration is pretty slim, while OTLP Exporter's configuration has a lot more features, there is no reason why we cannot support them (would be even better if we could simply reuse OTEL's configuration utilities for gRPC)
  • our storage configs have been accumulating over time, we can see if there are better logical groupings. For example, ES config is a flat list of everything, we can group many flags into sub-types so that the YAML config looks a lot more structured than using long field names like priority_dependencies_template
  • ensure that we have sensible defaults. For example, in [jaeger-v2] add cassandra e2e integration tests #5398 basic options had to be added to Cassandra configuration, even though in v1 many of them are already defaulted to these values, so we're clearly missing defaulting mechanism. It's somewhat complicated by the fact that the storage extension cannot actually define the defaults in CreateDefaultConfig because at that point we don't known which storage will be used.

Scope

Tasks / outcomes

For each of the above components, we need to ensure

  • All config fields are tagged with mapstructure such that they can be addressable from YAML via "good" names
  • Configs implement Validate() -- example
  • Configs via YAML support the same defaults as in jaeger-v1
  • Configs reuse standard elements of OTEL configs whenever possible, e.g. TLS, clientcfg, etc.
  • Configuration looks "logical" and properly grouped, not flattened with long weird names (good example [Jaeger-V2] Add configurable index data layout and rollover frequency #5790)
  • We create a migration guide document with a table that lists v1 CLI flags and the corresponding v2 config fields (example for Kafka)

How to work on this meta-issue

  • pick a component / area
  • create a dedicated ticket for it
  • copy tasks / outcomes from above
  • use the ticket for design discussions, updates, links to PRs
@yurishkuro yurishkuro converted this from a draft issue Feb 24, 2024
@pavolloffay pavolloffay added the v2 label Mar 13, 2024
yurishkuro added a commit that referenced this issue May 20, 2024
## Which problem is this PR solving?
- part of #5229 

## Description of the changes
- added more grpc storage client configuration to align with otel
 ```
	configgrpc.ClientConfig        `mapstructure:",squash"`
	exporterhelper.TimeoutSettings `mapstructure:",squash"`
 ```
These are not all the configs, but i'll add more based on feedback on
this initial approach.

## How was this change tested?
- not tested yet 

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Harshvir Potpose <[email protected]>
Signed-off-by: Harshvir Potpose <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
@mahadzaryab1
Copy link
Collaborator

@yurishkuro What is the status of this issue? Is there anything I can help with?

@yurishkuro yurishkuro changed the title [jaeger-v2] Rethink storage configuration to align with OTEL [jaeger-v2] Rethink components configuration to align with OTEL Aug 31, 2024
@yurishkuro
Copy link
Member Author

yes, plenty. I just updated the description.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro are the configs for v1 supposed to be in plugin/storage/*/options.go and the configs for v2 supposed to be in pkg/*/config.go?

@yurishkuro
Copy link
Member Author

yes, that's accurate

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Sep 1, 2024

@yurishkuro how come some stores still have the v2 configuration living in plugin/storage/*? do we want to move those over?

@yurishkuro
Copy link
Member Author

We are in the transition period. Moving them doesn't solve anything, having configs side by side is actually convenient. The long term goal is to only have v2 configurations.

Wwe started v2 by just wrapping v1 configs. For some backends we did preliminary work of cleaning up and making v2 the "primary" config, i.e. the Factory uses v2 config directly, but v1 config still drives the v1 cli flags and can be transformed into v2. We could continue using this pattern. The reason we want v2 configs to be primary is to ensure that all their settings are actually used by the storage implementation - there's less guarantee of that if v1 is primary and v2 is translated into v1.

But overall this ticket is about what v2 config look like, rather than where they are located.

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Sep 1, 2024

@yurishkuro sounds good! and where should the migration doc be created? is any public google doc fine? Or do we want to have them in a README in this repo?

@yurishkuro
Copy link
Member Author

yes a google doc is fine

@mahadzaryab1
Copy link
Collaborator

@yurishkuro PR for memory storage component at #5925

@mahadzaryab1
Copy link
Collaborator

@yurishkuro PR for badger storage component at #5927

@mahadzaryab1
Copy link
Collaborator

@yurishkuro PR for Cassandra component ready for review at #5949.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro PR for Query Service ready for review at #5998

@mahadzaryab1
Copy link
Collaborator

@yurishkuro we can link the following PRs to the issue description for badger:

@mahadzaryab1
Copy link
Collaborator

@yurishkuro what is the Collector-specific config checklist item referring to?

@yurishkuro
Copy link
Member Author

what is the Collector-specific config checklist item referring to?

if you run SPAN_STORAGE_TYPE=memory go run ./cmd/collector help there are a bunch of collector-specific flags printed that are not related to storage but to the collector itself. Most of them are related to http/grpc endpoints, but some could control the overall collector's behavior. Ideally we need to create a gdoc mapping all these flags to the corresponding OTEL config locations and if any are not supported decide what to do about them.

@mahadzaryab1
Copy link
Collaborator

what is the Collector-specific config checklist item referring to?

if you run SPAN_STORAGE_TYPE=memory go run ./cmd/collector help there are a bunch of collector-specific flags printed that are not related to storage but to the collector itself. Most of them are related to http/grpc endpoints, but some could control the overall collector's behavior. Ideally we need to create a gdoc mapping all these flags to the corresponding OTEL config locations and if any are not supported decide what to do about them.

@yurishkuro Got it! So for this part we just need to create a migration guide? No code changes?

@yurishkuro
Copy link
Member Author

Perhaps, depends on what the comparison would show.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro Thanks! Would you also be able to help me understand how to solve Utilize env var defaults one OTEL supports them in 0.110 ? How do we determine which things need to be defaulted? As well, where do these defaults needs to be utilized? Is it just in the docker compose setups?

@yurishkuro
Copy link
Member Author

We had a number of nasty workarounds because we couldn't use defaults. For instance, Kafka e2e test needs to physically rewrite the config because we can't just have a var for encoding without default (it makes the config invalid by default). I think it's sufficient to just fix that to close the item, since we can add more vars/defaults incrementally as the needs arise.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro we can link the following issues to the issue description as they are complete / in-review:

@yurishkuro
Copy link
Member Author

For Kafka, we have this document.

@mahadzaryab1
Copy link
Collaborator

For Kafka, we have this document.

@yurishkuro Did you want me to link this somewhere?

@yurishkuro
Copy link
Member Author

no - I keep all links in the v2 RFC doc.

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 4, 2024

@yurishkuro can link #6041 for grpc storage in the issue description

@mahadzaryab1
Copy link
Collaborator

@yurishkuro looks like we're pretty close to finishing this issue - do you know what the status is of #5790? any help needed there?

@yurishkuro
Copy link
Member Author

do you know what the status is of #5790? any help needed there?

I had some comments that the author did not respond to. We can ping the author on the issue - if they don't have time to continue, we could take over that PR, it was nearly done iirc.

@JaredTan95
Copy link
Contributor

do you know what the status is of #5790? any help needed there?

I had some comments that the author did not respond to. We can ping the author on the issue - if they don't have time to continue, we could take over that PR, it was nearly done iirc.

Sorry for the late reply. After @yurishkuro reivew and adjustment, I feel that the logic here is rather complicated for me. I can no longer make the changes at present, and hope to get more help from the maintainer or take over directly.

@mahadzaryab1
Copy link
Collaborator

do you know what the status is of #5790? any help needed there?

I had some comments that the author did not respond to. We can ping the author on the issue - if they don't have time to continue, we could take over that PR, it was nearly done iirc.

Sorry for the late reply. After @yurishkuro reivew and adjustment, I feel that the logic here is rather complicated for me. I can no longer make the changes at present, and hope to get more help from the maintainer or take over directly.

@JaredTan95 no worries! I can try picking up your PR and try addressing the feedback.

@mahadzaryab1
Copy link
Collaborator

do you know what the status is of #5790? any help needed there?

I had some comments that the author did not respond to. We can ping the author on the issue - if they don't have time to continue, we could take over that PR, it was nearly done iirc.

Sorry for the late reply. After @yurishkuro reivew and adjustment, I feel that the logic here is rather complicated for me. I can no longer make the changes at present, and hope to get more help from the maintainer or take over directly.

@JaredTan95 @yurishkuro I've opened a new PR at #6060 since I didn't have permission to modify the existing ones. I've addressed most of the feedback that was left on the original PR. Please take a look and let me know if you have any other feedback.

@mahadzaryab1
Copy link
Collaborator

@yurishkuro The elasticsearch configurations are now complete as well. Any other items we want to address here? Otherwise, I think we can close this issue out.

@yurishkuro
Copy link
Member Author

Did you already update the google doc with the mapping?

@mahadzaryab1
Copy link
Collaborator

@yurishkuro Yes - the migration guide can be viewed here.

@yurishkuro
Copy link
Member Author

I think this is completed - huge thanks to @mahadzaryab1 🚀 , this is a critical milestone for v2.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Jaeger V2 Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Done
Development

No branches or pull requests

4 participants