-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add remote read clusters option to elasticsearch to enable cross-cluster querying #1892
Conversation
1e4750e
to
d4a5fc4
Compare
Codecov Report
@@ Coverage Diff @@
## master #1892 +/- ##
==========================================
- Coverage 98.44% 98.42% -0.02%
==========================================
Files 197 197
Lines 9648 9675 +27
==========================================
+ Hits 9498 9523 +25
- Misses 114 115 +1
- Partials 36 37 +1
Continue to review full report at Codecov.
|
plugin/storage/es/options.go
Outdated
nsCfg.remoteReadClusters = opt.primary.remoteReadClusters | ||
} | ||
|
||
if nsCfg.remoteReadClusters == "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be omitted and inner block added to line 314
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return func(indices string, startTime time.Time, endTime time.Time) []string { | ||
jaegerIndices := fn(indices, startTime, endTime) | ||
if len(remoteReadClusters) > 0 { | ||
remoteJaegerIndices := []string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the size at creation time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want the array to be statically sized?
remoteJaegerIndices := [size]string{}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After removing local just appended to the jaegerIndices to simplify things.
for _, remoteCluster := range remoteReadClusters { | ||
for _, jaegerIndex := range jaegerIndices { | ||
|
||
if remoteCluster != "local" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we avoid this? We can require not adding it via flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some options:
--es.remote-read-clusters-remote-only
--es.remote-read-clusters-no-local
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something, but why two flags? The names on the flag will be added to indices. If local is added it will be added. if it is not added it won't be added. I just want to avoid unnecessary checks in the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I misunderstood. Are you suggesting we always keep the local un-prefixed index by default? I really struggled to figure out how we should toggle on and off the local cluster reference. I've seen use cases where you have an elastic query node that only has remote clusters and no data local.
Current:
Input: --es.remote-read-clusters=local,cluster_one
Output: /twitter,cluster_one:twitter
Alternate?:
Input: --es.remote-read-clusters=cluster_one
Output: /twitter,cluster_one:twitter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen use cases where you have an elastic query node that only has remote clusters and no data local.
can you provide any references to this use-case?
Then I guess we cannot go with the following as it will always add local data.
Alternate?:
Input: --es.remote-read-clusters=cluster_one
Output: /twitter,cluster_one:twitter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input! This does make things much simpler! The no local use case can be addressed in the future if anyone ever needs it.
index: "foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix}, | ||
indices: []string{"foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix}}, | ||
{params: SpanReaderParams{Client: client, Logger: logger, MetricsFactory: metricsFactory, | ||
IndexPrefix: "", Archive: false, RemoteReadClusters: []string{"local", "cluster_one"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a test with two remote clusters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@allen13 does this switch work for ES 5, 6 and 7? |
@pavolloffay Cross-cluster query support was added in Elasticserch 5.4. https://www.elastic.co/guide/en/elasticsearch/reference/5.5/modules-cross-cluster-search.html |
d4a5fc4
to
5d40fa0
Compare
…ter querying Signed-off-by: allen13 <[email protected]>
5d40fa0
to
6de969d
Compare
@@ -278,6 +287,13 @@ func initFromViper(cfg *namespaceConfig, v *viper.Viper) { | |||
// GetPrimary returns primary configuration. | |||
func (opt *Options) GetPrimary() *config.Configuration { | |||
opt.primary.Servers = strings.Split(opt.primary.servers, ",") | |||
|
|||
if opt.primary.remoteReadClusters == "" { | |||
opt.primary.RemoteReadClusters = []string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this branch is not necessary
@@ -293,6 +309,13 @@ func (opt *Options) Get(namespace string) *config.Configuration { | |||
nsCfg.servers = opt.primary.servers | |||
} | |||
nsCfg.Servers = strings.Split(nsCfg.servers, ",") | |||
|
|||
if nsCfg.remoteReadClusters == "" { | |||
nsCfg.RemoteReadClusters = []string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
branch not necessary
// Elasticsearch cross cluster query api example: GET /twitter,cluster_one:twitter,cluster_two:twitter/_search | ||
// Add a remote cluster prefix for each cluster and for each index and add it to the list of original indices | ||
func addRemoteReadClusters(fn timeRangeIndexFn, remoteReadClusters []string) timeRangeIndexFn { | ||
return func(indices string, startTime time.Time, endTime time.Time) []string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be short-circuited to return fn
if len(remoteReadClusters) == 0
?
indices: []string{"foo:" + indexPrefixSeparator + spanIndex + archiveReadIndexSuffix}}, | ||
{params: SpanReaderParams{Client: client, Logger: logger, MetricsFactory: metricsFactory, | ||
IndexPrefix: "", Archive: false, RemoteReadClusters: []string{"cluster_one", "cluster_two"}}, | ||
indices: []string{spanIndex + dateFormat, "cluster_one:" + spanIndex + dateFormat, "cluster_two:" + spanIndex + dateFormat}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make this stmt multi-line, one cluster per line
@allen13 @pavolloffay I'd be really interested in seeing this PR merged. Do you know if anyone is actively looking into this? I'd be happy to try and get this ready otherwise if there is still willingness for the feature. Thanks! |
Closing, as I think the linked PR replaced this one. |
Which problem is this PR solving?
Enables elasticsearch cross-cluster querying
Resolves #1883
Short description of the changes
The cross cluster query api functions by appending a pre-configured remote cluster to an index e.g. cluster_one:index_name. This change simply applies a prefix for each remote cluster and for each index. You will end up with
total_indicies = indicies * remote_clusters
. Also added a special caselocal
which will not append a prefix and query the local cluster.This works with both standard and rollover indices since it mostly comes down to a simple string manipulation.