Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remote read clusters option to elasticsearch to enable cross-cluster querying #1892

Closed
wants to merge 1 commit into from

Conversation

allen13
Copy link
Contributor

@allen13 allen13 commented Nov 1, 2019

Which problem is this PR solving?

Enables elasticsearch cross-cluster querying

Resolves #1883

Short description of the changes

The cross cluster query api functions by appending a pre-configured remote cluster to an index e.g. cluster_one:index_name. This change simply applies a prefix for each remote cluster and for each index. You will end up with total_indicies = indicies * remote_clusters. Also added a special case local which will not append a prefix and query the local cluster.

This works with both standard and rollover indices since it mostly comes down to a simple string manipulation.

@codecov
Copy link

codecov bot commented Nov 1, 2019

Codecov Report

Merging #1892 into master will decrease coverage by 0.01%.
The diff coverage is 94.11%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1892      +/-   ##
==========================================
- Coverage   98.44%   98.42%   -0.02%     
==========================================
  Files         197      197              
  Lines        9648     9675      +27     
==========================================
+ Hits         9498     9523      +25     
- Misses        114      115       +1     
- Partials       36       37       +1     
Impacted Files Coverage Δ
plugin/storage/es/options.go 99.02% <88.23%> (-0.98%) ⬇️
plugin/storage/es/factory.go 100.00% <100.00%> (ø)
plugin/storage/es/spanstore/reader.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1dd338d...6de969d. Read the comment docs.

nsCfg.remoteReadClusters = opt.primary.remoteReadClusters
}

if nsCfg.remoteReadClusters == "" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be omitted and inner block added to line 314

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return func(indices string, startTime time.Time, endTime time.Time) []string {
jaegerIndices := fn(indices, startTime, endTime)
if len(remoteReadClusters) > 0 {
remoteJaegerIndices := []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know the size at creation time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want the array to be statically sized?
remoteJaegerIndices := [size]string{}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After removing local just appended to the jaegerIndices to simplify things.

for _, remoteCluster := range remoteReadClusters {
for _, jaegerIndex := range jaegerIndices {

if remoteCluster != "local" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid this? We can require not adding it via flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some options:
--es.remote-read-clusters-remote-only
--es.remote-read-clusters-no-local

What do you think?

Copy link
Member

@pavolloffay pavolloffay Nov 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but why two flags? The names on the flag will be added to indices. If local is added it will be added. if it is not added it won't be added. I just want to avoid unnecessary checks in the reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I misunderstood. Are you suggesting we always keep the local un-prefixed index by default? I really struggled to figure out how we should toggle on and off the local cluster reference. I've seen use cases where you have an elastic query node that only has remote clusters and no data local.

Current:
Input: --es.remote-read-clusters=local,cluster_one
Output: /twitter,cluster_one:twitter
Alternate?:
Input: --es.remote-read-clusters=cluster_one
Output: /twitter,cluster_one:twitter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen use cases where you have an elastic query node that only has remote clusters and no data local.

can you provide any references to this use-case?

Then I guess we cannot go with the following as it will always add local data.

Alternate?:
Input: --es.remote-read-clusters=cluster_one
Output: /twitter,cluster_one:twitter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the input! This does make things much simpler! The no local use case can be addressed in the future if anyone ever needs it.

index: "foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix},
indices: []string{"foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix}},
{params: SpanReaderParams{Client: client, Logger: logger, MetricsFactory: metricsFactory,
IndexPrefix: "", Archive: false, RemoteReadClusters: []string{"local", "cluster_one"}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add a test with two remote clusters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@pavolloffay
Copy link
Member

@allen13 does this switch work for ES 5, 6 and 7?

@allen13
Copy link
Contributor Author

allen13 commented Nov 4, 2019

@pavolloffay Cross-cluster query support was added in Elasticserch 5.4. https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-cross-cluster-search.html

@allen13 allen13 force-pushed the remote-read-clusters branch from d4a5fc4 to 5d40fa0 Compare November 6, 2019 13:47
@allen13 allen13 force-pushed the remote-read-clusters branch from 5d40fa0 to 6de969d Compare November 6, 2019 13:50
@@ -278,6 +287,13 @@ func initFromViper(cfg *namespaceConfig, v *viper.Viper) {
// GetPrimary returns primary configuration.
func (opt *Options) GetPrimary() *config.Configuration {
opt.primary.Servers = strings.Split(opt.primary.servers, ",")

if opt.primary.remoteReadClusters == "" {
opt.primary.RemoteReadClusters = []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this branch is not necessary

@@ -293,6 +309,13 @@ func (opt *Options) Get(namespace string) *config.Configuration {
nsCfg.servers = opt.primary.servers
}
nsCfg.Servers = strings.Split(nsCfg.servers, ",")

if nsCfg.remoteReadClusters == "" {
nsCfg.RemoteReadClusters = []string{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

branch not necessary

// Elasticsearch cross cluster query api example: GET /twitter,cluster_one:twitter,cluster_two:twitter/_search
// Add a remote cluster prefix for each cluster and for each index and add it to the list of original indices
func addRemoteReadClusters(fn timeRangeIndexFn, remoteReadClusters []string) timeRangeIndexFn {
return func(indices string, startTime time.Time, endTime time.Time) []string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be short-circuited to return fn if len(remoteReadClusters) == 0?

indices: []string{"foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix}},
{params: SpanReaderParams{Client: client, Logger: logger, MetricsFactory: metricsFactory,
IndexPrefix: "", Archive: false, RemoteReadClusters: []string{"cluster_one", "cluster_two"}},
indices: []string{spanIndex + dateFormat, "cluster_one:" + spanIndex + dateFormat, "cluster_two:" + spanIndex + dateFormat}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make this stmt multi-line, one cluster per line

@dgrizzanti
Copy link
Contributor

@allen13 @pavolloffay I'd be really interested in seeing this PR merged. Do you know if anyone is actively looking into this? I'd be happy to try and get this ready otherwise if there is still willingness for the feature. Thanks!

@jpkrohling
Copy link
Contributor

Closing, as I think the linked PR replaced this one.

@jpkrohling jpkrohling closed this Aug 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elasticsearch Cross Cluster Query
5 participants