Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[abandoned][es] Add index rollover mode that can choose day and hour #2935

Closed
wants to merge 12 commits into from
Closed

Conversation

WalkerWang731
Copy link
Contributor

@WalkerWang731 WalkerWang731 commented Apr 13, 2021

Which problem is this PR solving?

Short description of the changes

  • Add a parameter index-rollover-frequency, default is day mode, will not affect existing usage
  • Minimize changes to ensure compatibility with previous code

Parameter config of collector ingester query (SPAN_STORAGE_TYPE=elasticsearch)

--es-archive.index-rollover-frequency string                     Rotates Jaeger indices over the given period. For example "day" creates "jaeger-span-yyyy-HH-dd" every day after UTC 12AM. Valid options: [hour, day]. Jaeger also support Elasticsearch ILM to manage indices, reference(https://www.jaegertracing.io/docs/deployment/#elasticsearch-ilm-support) (default "day")
--es.index-rollover-frequency string                             Rotates Jaeger indices over the given period. For example "day" creates "jaeger-span-yyyy-HH-dd" every day after UTC 12AM. Valid options: [hour, day]. Jaeger also support Elasticsearch ILM to manage indices, reference(https://www.jaegertracing.io/docs/deployment/#elasticsearch-ilm-support) (default "day")

@WalkerWang731 WalkerWang731 requested a review from a team as a code owner April 13, 2021 11:11
@codecov
Copy link

codecov bot commented Apr 13, 2021

Codecov Report

Merging #2935 (5d63e5a) into master (312f83b) will increase coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 5d63e5a differs from pull request most recent head 278203c. Consider uploading reports for the commit 278203c to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2935      +/-   ##
==========================================
+ Coverage   95.95%   95.97%   +0.02%     
==========================================
  Files         223      223              
  Lines        9712     9726      +14     
==========================================
+ Hits         9319     9335      +16     
  Misses        324      324              
+ Partials       69       67       -2     
Impacted Files Coverage Δ
plugin/storage/es/options.go 100.00% <100.00%> (ø)
plugin/storage/es/spanstore/reader.go 100.00% <100.00%> (ø)
pkg/config/tlscfg/cert_watcher.go 92.20% <0.00%> (-2.60%) ⬇️
plugin/storage/badger/spanstore/reader.go 96.08% <0.00%> (+0.71%) ⬆️
cmd/query/app/static_handler.go 96.77% <0.00%> (+1.61%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 312f83b...278203c. Read the comment docs.

@albertteoh
Copy link
Contributor

Sorry for the slow response, I can take a look at this tomorrow.

@WalkerWang731
Copy link
Contributor Author

WalkerWang731 commented Apr 16, 2021

Sorry for the slow response, I can take a look at this tomorrow.

Actually, just only jaeger-span index need rotation hourly, jaeger-service seems do not need rotation, because in our environments, jaeger-span can reach more than 100TB, but jaeger-service just only 500MB around.(maybe just occur our scenario)
I have not thought of good that how to just rotate jaeger-span and look for the most graceful solution
so just commit this version code, if we want separate jaeger-span and jaeger-service rotation logic, we have to change since the factory function.

This is our temporary solution, I know is not graceful solution
File path: plugin/storage/es/spanstore/index_utils.go

func indexWithDate(indexPrefix, indexDateLayout string, date time.Time) string {
	spanDate := date.UTC().Format(indexDateLayout)
	match := strings.Index(indexPrefix, "jaeger-service")
	if match != -1{
		spanDate = date.UTC().Format("2006-01-02")
	}
	return indexPrefix + spanDate
}

btw, although, we can not separate jaeger-span and jaeger-service rotation logic, but for the normal user that small traffic system user, I think daily rotation is enough. and even if jaeger-service have to rorate hourly(if jaeger-span choice hourly rotation), I think is not affected, in addition to more indexes and shards.

so, above is my thought, for your information.

@albertteoh
Copy link
Contributor

Thanks @WalkerWang731 , have you considered the esRollover functionality available in Jaeger?

There is also support for Elasticsearch's ILM functionality that is a bit more sophisticated, providing automated rollover without the need for a cron job to execute the rollover as is required with esRollover: https://www.jaegertracing.io/docs/1.22/deployment/#elasticsearch-ilm-support.

I believe both approaches support rollover by time units (days, hours, minutes), number of documents or by size.

@WalkerWang731
Copy link
Contributor Author

Thanks @WalkerWang731 , have you considered the esRollover functionality available in Jaeger?

There is also support for Elasticsearch's ILM functionality that is a bit more sophisticated, providing automated rollover without the need for a cron job to execute the rollover as is required with esRollover: https://www.jaegertracing.io/docs/1.22/deployment/#elasticsearch-ilm-support.

I believe both approaches support rollover by time units (days, hours, minutes), number of documents or by size.

Thanks @albertteoh , For the Elasticsearch's ILM functionality and esRollover functionality available in Jaeger they looks good.
Currently, these 3 methods each have advantages and disadvantages:

method advantages disadvantages
esRollover functionality available in Jaeger support age docs size need deploy cron job
Elasticsearch's ILM functionality support age docs size and automatically just only support 7 or later
This commit don't need other operation just support age of day and hour, the service index also will be rotated

So, although the solution of this commit is not the most perfect, but the operation of the rotation is also simplified as much as possible to provide users with a simpler way. And maybe we can add the other two ways in the help text.

Actually, this commit part will not have any impact on the code structure, which is a minor change,

But I have to say I'm really not sure whether need to merge, need your review and assessment, I just want to give a simple way to rotate the index.

Thanks, above are my suggestions, for your information.

@albertteoh
Copy link
Contributor

How do you plan to delete old indices? It seems having too many indices can impact shard performance.

@WalkerWang731
Copy link
Contributor Author

How do you plan to delete old indices? It seems having too many indices can impact shard performance.

Yes, more shards will impact ES performance, delete old indices can use Elasticsearch's ILM and have not limited the version. For example, we use ES 6 version. Because just only set Cold phase and Delete phase, so it don't impact jaeger write and query

Copy link
Contributor

@albertteoh albertteoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bhiravabhatla, please remind us why Jaeger's ILM support was limited to ES 7 and does not supported for ES 6's ILM feature?

I would prefer users adopt a single index management solution for rollover and cleanup (either via scripts+cronjob or ILM) rather than require setting a flag + ILM policy for cleanup.

However, I appreciate using esRollover/Cleaner scripts needs a bit more effort on setup and maintenance, and ILM is limited to version 7.

If we determine that ILM isn't feasible in ES 6, this proposal is okay with me so long as we document usage and the recommended alternatives clearly both here and in the official documentation.

flagSet.String(
nsConfig.namespace+suffixIndexRotate,
defaultIndexRotate,
"Optional rotation opportunity of Jaeger indices. For example \"day\" creates \"jaeger-span-yyyy-HH-dd\" every day after UTC 12AM. Valid opportunity: [hour, day]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Optional rotation opportunity of Jaeger indices/Rotates Jaeger indices over the given period/
s/Valid opportunity/Valid options/

As you suggested, I would add a reference to the alternative esRollover and ILM solutions.


indexRotate := strings.ToLower(v.GetString(cfg.namespace + suffixIndexRotate))

switch indexRotate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can DRY and simplify to something like:

	separator := v.GetString(cfg.namespace + suffixIndexDateSeparator)
	rotateBy := strings.ToLower(v.GetString(cfg.namespace + suffixIndexRotate))
	cfg.IndexDateLayout = initDateLayout(rotateBy, separator)
...
func initDateLayout(rotateBy, separator string) string {
	seps := []interface{}{separator, separator}
	indexLayout := "2006%s01%s02"
	if rotateBy == "hour" {
		indexLayout = "2006%s01%s02%s15"
		seps = append(seps, separator)
	}
	return fmt.Sprintf(indexLayout, seps...)
}

@@ -63,6 +63,7 @@ type Configuration struct {
BulkFlushInterval time.Duration `mapstructure:"-"`
IndexPrefix string `mapstructure:"index_prefix"`
IndexDateLayout string `mapstructure:"index_date_layout"`
IndexRotate string `mapstructure:"-"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ES uses the term Rollover, I think we should stay consistent with terminology.

@@ -200,8 +200,12 @@ func timeRangeIndices(indexName, indexDateLayout string, startTime time.Time, en
firstIndex := indexWithDate(indexName, indexDateLayout, startTime)
currentIndex := indexWithDate(indexName, indexDateLayout, endTime)
for currentIndex != firstIndex {
indices = append(indices, currentIndex)
endTime = endTime.Add(-24 * time.Hour)
if len(indices) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified:

if len(indices) == 0 || indices[len(indices)-1] != currentIndex {
    indices = append(indices, currentIndex)
}

@bhiravabhatla
Copy link
Contributor

bhiravabhatla commented Apr 20, 2021

@bhiravabhatla, please remind us why Jaeger's ILM support was limited to ES 7 and does not supported for ES 6's ILM feature?

I would prefer users adopt a single index management solution for rollover and cleanup (either via scripts+cronjob or ILM) rather than require setting a flag + ILM policy for cleanup.

However, I appreciate using esRollover/Cleaner scripts needs a bit more effort on setup and maintenance, and ILM is limited to version 7.

If we determine that ILM isn't feasible in ES 6, this proposal is okay with me so long as we document usage and the recommended alternatives clearly both here and in the official documentation.

@albertteoh
As I understand, ILM was not OSS in 6.x - correct me if I am wrong, was one of the reasons - #2935 (review).
Not entirely sure about this.

Another reason is is_write_index is only supported for ES Versions > 6.4. We use is_write_index for ILM implementation.

@WalkerWang731 WalkerWang731 changed the title [es] Add index rotation mode that can choice day and hour [es] Add index rollover mode that can choose day and hour Apr 20, 2021
@WalkerWang731
Copy link
Contributor Author

Hi, @albertteoh
I have adjusted it, please review continue

  • adjust terminology from rotate to rollover, include option parameters
  • add reference into .index-rollover help
  • simplified code

flagSet.String(
nsConfig.namespace+suffixIndexRollover,
defaultIndexRollover,
"Rotates Jaeger indices over the given period. For example \"day\" creates \"jaeger-span-yyyy-HH-dd\" every day after UTC 12AM. Valid options: [hour, day]. "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rollover Jaeger indices for the given period.

s/jaeger-span-yyyy-HH-dd/jaeger-span-yyyy-MM-dd/

nsConfig.namespace+suffixIndexRollover,
defaultIndexRollover,
"Rotates Jaeger indices over the given period. For example \"day\" creates \"jaeger-span-yyyy-HH-dd\" every day after UTC 12AM. Valid options: [hour, day]. "+
"Jaeger also support Elasticsearch ILM to manage indices, reference(https://www.jaegertracing.io/docs/deployment/#elasticsearch-ilm-support)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jaeger additionally supports manual and automated (via ILM) index management. Reference: https://www.jaegertracing.io/docs/deployment/#elasticsearch-rollover.

@@ -295,7 +303,7 @@ func initFromViper(cfg *namespaceConfig, v *viper.Viper) {
cfg.BulkFlushInterval = v.GetDuration(cfg.namespace + suffixBulkFlushInterval)
cfg.Timeout = v.GetDuration(cfg.namespace + suffixTimeout)
cfg.IndexPrefix = v.GetString(cfg.namespace + suffixIndexPrefix)
cfg.IndexDateLayout = initDateLayout(v.GetString(cfg.namespace + suffixIndexDateSeparator))
cfg.IndexDateLayout = initDateLayout(strings.ToLower(v.GetString(cfg.namespace+suffixIndexRollover)), v.GetString(cfg.namespace+suffixIndexDateSeparator))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a bit more readable to separate these into two separate variables as it starts to get a bit unwieldy with the additional string getter and manipulation functions:

rolloverBy := strings.ToLower(v.GetString(cfg.namespace + suffixIndexRollover))
separator := v.GetString(cfg.namespace + suffixIndexDateSeparator)
cfg.IndexDateLayout = initDateLayout(rolloverBy, separator)

@@ -343,6 +351,12 @@ func stripWhiteSpace(str string) string {
return strings.Replace(str, " ", "", -1)
}

func initDateLayout(separator string) string {
return fmt.Sprintf("2006%s01%s02", separator, separator)
func initDateLayout(RolloverBy, separator string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/RolloverBy/rolloverBy/

plugin/storage/es/options.go Outdated Show resolved Hide resolved
@albertteoh
Copy link
Contributor

Thanks @WalkerWang731, just a few minor comments.

By the way, it's good practice to create PRs off a branch in your fork, instead of master, and keeping your fork's master clean and in sync with remote master. I recall having some problems in the past when submitting PRs directly from master, but can't recall what they were. :P

Anybody understand what causes the following build errors?

+ docker login docker.io --username jaegertracingbot --password-stdin
Error: Cannot perform an interactive login from a non TTY device
Error: Process completed with exit code 1.

@WalkerWang731
Copy link
Contributor Author

Thanks @WalkerWang731, just a few minor comments.

By the way, it's good practice to create PRs off a branch in your fork, instead of master, and keeping your fork's master clean and in sync with remote master. I recall having some problems in the past when submitting PRs directly from master, but can't recall what they were. :P

Anybody understand what causes the following build errors?

+ docker login docker.io --username jaegertracingbot --password-stdin
Error: Cannot perform an interactive login from a non TTY device
Error: Process completed with exit code 1.

Hi @albertteoh,
Thanks for your suggestions, I have updated follow your suggestions.
About Error: Cannot perform an interactive login from a non TTY device issue, maybe need @Ashmita152 help to take a look? I'm not sure too

albertteoh
albertteoh previously approved these changes Apr 22, 2021
Copy link
Contributor

@albertteoh albertteoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks @WalkerWang731.

@@ -63,6 +63,7 @@ type Configuration struct {
BulkFlushInterval time.Duration `mapstructure:"-"`
IndexPrefix string `mapstructure:"index_prefix"`
IndexDateLayout string `mapstructure:"index_date_layout"`
IndexRollover string `mapstructure:"-"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a poor name, what does "index rollover" mean? From the flag description it seems to mean "frequency", so let's name it IndexRolloverFrequency (and all other vars/flags respectively)

nsConfig.namespace+suffixIndexRollover,
defaultIndexRollover,
"Rotates Jaeger indices over the given period. For example \"day\" creates \"jaeger-span-yyyy-MM-dd\" every day after UTC 12AM. Valid options: [hour, day]. "+
"Jaeger additionally supports manual and automated (via ILM) index management. Reference: https://www.jaegertracing.io/docs/deployment/#elasticsearch-rollover.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jaeger additionally supports manual and automated (via ILM) index management.

This is confusing. Doesn't this day/hour enhancement classifies as "manual"? If not, what other "manual" rollover are we referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"manual" here refers to the esRollover/Cleaner python scripts that are used to manually rollover/delete an index (automated via cronjobs).

This day/hour enhancement is a simple and convenient way to rollover an index by duration only and lacks the ability to:

  • Rollover on other dimensions such as by size or number of ES documents.
  • Clean older indices.

Both the esRollover/Cleaner python scripts and ILM (supported in Jaeger) provide the above, which is why I suggested the term "index management" here because I consider these as complete index management solutions and want users to be aware of these more comprehensive but "higher-effort" alternatives.

Agree this is confusing, perhaps we could rephrase to: "This does not delete old indices. For details on complete index management solutions supported by Jaeger, refer to: https://www.jaegertracing.io/docs/deployment/#elasticsearch-rollover".

Open to suggestions.

if len(indices) == 0 || indices[len(indices)-1] != currentIndex {
indices = append(indices, currentIndex)
}
endTime = endTime.Add(-1 * time.Hour)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will do 24 iterations on every invocation, even though the default is daily rotation? Could we make this more efficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let me take a look

flagSet.String(
nsConfig.namespace+suffixIndexRollover,
defaultIndexRollover,
"Rotates Jaeger indices over the given period. For example \"day\" creates \"jaeger-span-yyyy-MM-dd\" every day after UTC 12AM. Valid options: [hour, day]. "+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding more detailed explanation to the website (for next release) and linking there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

plugin/storage/es/options.go Outdated Show resolved Hide resolved
@mergify mergify bot dismissed albertteoh’s stale review April 26, 2021 06:57

Pull request has been modified.

@WalkerWang731
Copy link
Contributor Author

WalkerWang731 commented Apr 26, 2021

Hi @albertteoh @yurishkuro,
I have followed @yurishkuro suggestions and updated

By the way, I also want to limit jaeger-service indices to daily rollover, how about? Because the jaeger-service only has a small amount of data every day. By doing this users can avoid more shards for the jaeger-service index, but I'm not sure whether to need

Here is my plan

// plugin/storage/es/spanstore/index_utils.go
func indexWithDate(indexPrefix, indexDateLayout string, date time.Time) string {
	if strings.Contains(indexPrefix, "jaeger-service") && strings.HasSuffix(indexDateLayout, "15"){
		separatorLength := (len(indexDateLayout) - 10) / 3
		indexDateLayout = indexDateLayout[0:separatorLength * 2 + 8]
	}
	spanDate := date.UTC().Format(indexDateLayout)
	return indexPrefix + spanDate
}

Because I can't get the index-separator in the indexWithDate() func, so just can realize by calculation indexDateLayout length.
Do you have any better suggestions, please let me know.

Thank you!

@albertteoh
Copy link
Contributor

I also want to limit jaeger-service indices to daily rollover

My two cents... I think it's better to maintain index format consistency between spans and services to avoid surprises and more work to manage these indices. I would also think that service index sizes would be fairly small and so not have as much an impact on ES performance?

Either that or introduce --index-rollover-span-frequency and --index-rollover-service-frequency.

@WalkerWang731
Copy link
Contributor Author

I also want to limit jaeger-service indices to daily rollover

My two cents... I think it's better to maintain index format consistency between spans and services to avoid surprises and more work to manage these indices. I would also think that service index sizes would be fairly small and so not have as much an impact on ES performance?

Either that or introduce --index-rollover-span-frequency and --index-rollover-service-frequency.

Hi @albertteoh
From our experience, one node less than 600 shards is safe, not impact ES performance, 600 shards are the total number, include replica shards, and also depends on how long will keep it.

The formula: (SHARD_COUNT * INDEX_COUNT * (REPLICA_COUNT+ 1) ) * KEEP_DAY / NODE_COUNT

  • KEEP_DAY is the number of indices that actually exist during that time, daily is one day * 1, hourly is one day * 24
  • If the jaeger-service index also is hourly rollover then will expand shard count.
  • Actually, just keep shard count the total as same previous. For example, the previous index shards count is 600(daily), so the current index shards count is 25(hourly, 600 / 24) will be ok.

I also agree that keep the index format consistency between spans and services, but provide like as --index-rollover-span-frequency and --index-rollover-service-frequency will be complex, we must change the factory func and more. I always thought this was a small commit and didn't want to impact too much code.

So, according to the above, limiting jaeger-service shards just for the more optimization. We also keep the current code, and temporarily without regard for the jaeger-service shards, how about?

Do you have any better suggestions, please let me know.

Thank you!

@albertteoh
Copy link
Contributor

Hi @WalkerWang731, sorry I don't completely follow; are you saying it should be okay to leave this PR as is, or is it quite critical to ensure the service index must always be on a daily rollover?

Please also make sure:

  • all your commits are signed.
  • test coverage is adequate to pass the build.

@WalkerWang731
Copy link
Contributor Author

WalkerWang731 commented Apr 27, 2021

Hi @WalkerWang731, sorry I don't completely follow; are you saying it should be okay to leave this PR as is, or is it quite critical to ensure the service index must always be on a daily rollover?

Hi @albertteoh
Ah sorry, I didn't say clearly, I think it's okay for the current PR. service index rollover of daily is not a must and urgent

Please also make sure:

  • all your commits are signed.
  • test coverage is adequate to pass the build.

Yeah, I think it's should be ok now, unit test has passed, except docker relevant issues. I'm not sure whether need @Ashmita152 help to take a look?

+ '[' '!' -f .docker.io.login ']'
+ '[' -v DOCKERHUB_TOKEN ']'
+ printenv DOCKERHUB_TOKEN
+ docker login docker.io --username jaegertracingbot --password-stdin
Error: Cannot perform an interactive login from a non TTY device

@WalkerWang731
Copy link
Contributor Author

close this PR because can't fix the signoff issue and let me recreate a new PR.

@WalkerWang731 WalkerWang731 changed the title [es] Add index rollover mode that can choose day and hour [abandoned][es] Add index rollover mode that can choose day and hour Apr 27, 2021
@jpkrohling jpkrohling added this to the Release 1.23.0 milestone Jun 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES index rotation mode can choice with day and hour
5 participants