[BUG] Eventgen backfill with perdayvolume does not produce the proper volume #166
Comments
Cannot reproduce with the latest code. |
@GordonWang
|
Nice catch! |
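I suppose there are two ways to fix this:
* block on generatorQueue.put instead of catching Full and ignoring it
* automatically adjust the interval to a suitable value (total backfill time / size of queue) while the backfill is not done, and roll it back after the backfill is done
I'd suggest the first one. |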
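To make the difference between dropping and blocking concrete, here is a minimal Python sketch using the standard-library `queue` module. It is not Eventgen's code: `enqueue_dropping`, `enqueue_blocking` and `drain` are hypothetical helpers, and the queue size used in the usage note is arbitrary. The first function mimics catching `Full` and ignoring it; the second blocks on `put` until a consumer frees a slot.

```python
import queue
import threading


def enqueue_dropping(q, items):
    """Try to enqueue every item; silently drop the ones that do not fit."""
    dropped = 0
    for item in items:
        try:
            q.put_nowait(item)
        except queue.Full:
            dropped += 1  # the work for this interval is lost
    return dropped


def enqueue_blocking(q, items):
    """Enqueue every item, waiting whenever the queue is full."""
    for item in items:
        q.put(item)  # blocks until a consumer calls get()


def drain(q, count, out):
    """Consumer: take `count` items off the queue and record them in order."""
    for _ in range(count):
        out.append(q.get())
        q.task_done()
```

With a queue of size 2 and no consumer running, `enqueue_dropping(q, range(5))` loses three of the five items. With a consumer thread running `drain`, `enqueue_blocking` delivers all five in order, at the cost of stalling the producer whenever the queue is full.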
Or we can just set backfillearliest=et when the queue is full.
From: Yifeng <[email protected]>
Sent: Tuesday, April 30, 2019 9:05:13 PM
To: splunk/eventgen
Cc: Ryan Yeung; State change
Subject: Re: [splunk/eventgen] [BUG] Eventgen backfill with perdayvolume does not produce the proper volume (#166)
I suppose there are two ways to fix this:
* block on generatorQueue.put instead of catching Full and ignoring it
* automatically adjust the interval to a suitable value (total backfill time / size of queue) while the backfill is not done, and roll it back after the backfill is done
I'd suggest the first one.
|
Not sure I understand: why would you block if it's full? We're overflowing our own queue, and it should drop and log/notify like it does now. Currently the backfill kicks off one generator for every interval and shoves it into the queue without any regard for the max queue size. The proper fix would be to lock the main timer thread and do a "while room in queue" loop to stick in the events, or to spawn another timer for the backfill with an end (and then block that thread with "while room in queue"). If there isn't enough room in the queue, just sit there until there is, and don't start the backfill interval. But this becomes blocking on the timer, and you're causing a different block. It could also cause other timers to not kick off, which is crazy dangerous.

As for backfillearliest=et, isn't this the same as just not doing the backfill? Are you saying we should just reverse fill instead? Or are you saying: try the backfill and, if it fails, just skip to the "now" interval? But isn't the queue still full? What happens if that maxes out the queue and then the next interval for the "live" data hits and can't be added? Right now, it just skips the first interval, which causes the sleep for the interval timer, giving the queue time to empty a little before shoving in more events.

Maybe the solution is simply to allow the queue to be larger, or to make the queue length manually configurable? I just picked a pretty small number for the queue size to start; you could easily up it to 50k or something, it just might take longer for EG to stabilize. |
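A rough back-of-the-envelope model of the overflow described in this comment, assuming one generator task per interval and no draining while the backfill is being enqueued. The 60-second interval and the queue size of 500 are hypothetical values chosen for illustration, not Eventgen defaults, and `backfill_overflow` is not an Eventgen function.

```python
import math


def backfill_overflow(backfill_seconds, interval_seconds, queue_size):
    """Return (tasks needed, tasks accepted, tasks dropped) for one backfill.

    Simplified model: every interval in the backfill window becomes one task,
    and all tasks are offered to a bounded queue before any are consumed.
    """
    tasks = math.ceil(backfill_seconds / interval_seconds)
    accepted = min(tasks, queue_size)
    return tasks, accepted, tasks - accepted


# Seven days of backfill at a 60-second interval against a 500-slot queue.
tasks, accepted, dropped = backfill_overflow(7 * 86400, 60, 500)
print(tasks, accepted, dropped)
```

Under these assumptions a seven-day backfill needs 10080 tasks, so a queue of a few hundred slots can only hold a small fraction of them. In practice workers drain the queue concurrently, so the real loss depends on how fast they keep up.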
If we block when full, we'll encounter difficulties with locks on the timer, as Brian mentioned. I also don't think that skipping to now would be smart because it gives the user an inconsistent stream of events and could cause a lot of confusion over the meaning of backfill. |
Making the queue larger does not make any sense to me. What is the right "large" queue size? How does a customer know how large a queue to set for their workload? What if I want to generate backfill data for one month, or even one year? |
I believe the discussion comes down to what philosophy you have towards user experience. My personal opinion is that there will always be a tipping point at which Eventgen cannot meet expectations. In this case, as Gordon mentioned, no particular "large" queue size will solve the problem. I am fine with providing queue size as a parameter, because this allows some flexibility for the user and the user's environment. Then we should drop the upcoming backfill interval (without blocking the timer) and log it for the user, because it is simply more than the user's Eventgen can handle. This is more of an "it is OK to fail, but fail safely" approach. The user will notice that certain periods of backfill data are missing due to the configuration of Eventgen and will have to take action. Eventgen's job should be to try its best to meet the rest of the demand without breaking. I like the idea of waiting for room in the queue, but I'm afraid that this might cause threads to be blocked. If we can find a way to safely log backfill transactions that failed due to queue overflow, it would be a nice solution, something similar to how DBs recover using transactions. How about writing those transactions to a file or to memory (if there are any resources left) and redoing the failed transactions later, such as when the queue is empty or when we get to the current time? Once we get to the current time, the backfill supposedly stops, which in many cases will free up some queue space. |
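One way to picture the "record the failed transactions and redo them later" idea is the in-memory sketch below. It is only an illustration under stated assumptions: `DeferredBackfill` and its methods are hypothetical names, nothing here is taken from Eventgen, and a real version would also need to bound or persist the deferred list.

```python
import queue
from collections import deque


class DeferredBackfill:
    """Offer backfill intervals to a bounded queue without ever blocking."""

    def __init__(self, work_queue):
        self.work_queue = work_queue
        self.deferred = deque()  # intervals rejected because the queue was full

    def submit(self, interval):
        """Enqueue an interval, or remember it for later if there is no room."""
        try:
            self.work_queue.put_nowait(interval)
            return True
        except queue.Full:
            self.deferred.append(interval)
            return False

    def retry(self):
        """Re-offer deferred intervals in order; stop at the first rejection."""
        requeued = 0
        while self.deferred:
            try:
                self.work_queue.put_nowait(self.deferred[0])
            except queue.Full:
                break
            self.deferred.popleft()
            requeued += 1
        return requeued
```

The timer thread would call `submit` for each backfill interval and call `retry` whenever it sees the queue has drained, for example once generation has caught up to the current time. Neither call blocks, so other timers are not held up.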
I think it's easy to sum up like so: "you asked me for 10 weeks of data, and "now"", so how do you expect me to handle the fact that you only have the resources to generate 1 week of data? Right now eventgen drops all the remaining backfill, and drops the current interval to give the queue time to empty a bit. Increasing the queue size will let you get 2 weeks of data instead of 1. Blocking the queue means you'll never get to "now" if the request is too large, and I don't think that is a viable solution. If we really want to support an unlimited backfill, I think the backfill generators do need their own queue, and since it's always bounded, that queue should be unlimited in size. It would also allow a user to set a number of processes to "backfill" that won't impact "now". That being said, I'm not sure backfilling in this style is even remotely viable, nor does Splunk handle backfill right. It's great for dashboards, but anything using summary indexing will miss events anyway, and it impacts datamodel searches negatively without letting the user know why. To put it bluntly, I don't mind having the feature if we have CPUs to spare, but once they're full, I think we should eject that backfill and focus on meeting current requests. |
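The separate-queue idea could look roughly like the sketch below: live tasks keep a small bounded queue, while backfill tasks go to their own unbounded one. This is an assumption-laden illustration, not Eventgen's design; `live_queue`, `backfill_queue`, `route` and the size of 2 are all hypothetical.

```python
import queue

LIVE_QUEUE_SIZE = 2  # hypothetical; kept tiny so the example is easy to follow

live_queue = queue.Queue(maxsize=LIVE_QUEUE_SIZE)
backfill_queue = queue.Queue()  # maxsize=0 (the default) means unbounded


def route(task, is_backfill):
    """Put a task on the matching queue; return False if it had to be dropped."""
    target = backfill_queue if is_backfill else live_queue
    try:
        target.put_nowait(task)
        return True
    except queue.Full:
        return False  # only the bounded live queue can reject a task
```

Because the backfill window is finite, the unbounded queue still has a finite worst-case size, and a dedicated pool of backfill workers could drain it without competing with the workers serving "now".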
Let's book a meeting on this; we need to discuss this in a forum better than text :P |
* Fix jinja template bug under SA-Eventgen app * Feature timeMultiple (#141) * Adjusting interval with timeMultiple * Update issue templates add the bug report and feature request issue templates * changing the stanzas to produce data * Windbag generator/count + end=0 edge cases (#145) * Generator handling for count = -1, end = 0 working properly * Update docs (#146) * Updated docs, added release notes to changelog * Bump version * add python code style lint and format * [Docs] update contribute docs (#148) * [Docs] update contribute docs * [Docs] update the contribute * Fix make docs bug and summary anchor link error * init unittest * [Build] add ut for timeparser functions * Add more UT for config module * Add more unit tests for config module * Pep8 (#151) * Format to code standards, addressed linter errors/warnings * Update docs * Post-PEP8 Fixes (#157) * skip sort, gitignore 3rd party libs * fixed yapf ignore, ran format across repo * Issue 160 (#163) * Fixed timer and token * Added a conditional for end=-1 * Update eventgentimer.py * Fixed timer and token (#162) * add extendIndexes feature (#154) * add extendIndexes feature * set extendIndexes as a list value * correct log level * upate doc, change num to weight * Test fix (#168) * Add sample functional test for replay mode * Add token replacement functional tests * skip failed case * Added a timeout * created a results dir * Update version.json * Fix previous pep8 format regression bug (#171) * Fix merge conflict bug * Delete tool orca related part (#178) * add test for jinja template test (#177) * clean up the logic about sampleDir and jinja_template_dir setting Add functional test for jinja template generator Fixes #174 Fixes #167 * revert the change about token, sample mode test cases * merge the change from develop branch * merge test conf from develop branch * use urllib 1.24.2 * format with pep8 * fix bug in template dir resolution * update test case to address multiline event * fix a issue that when 
setup eventgen with 200+ indexers, any exceptions will terminate setup and re-writing default configure files * Fix bug #179 urlparams is not set (#181) * fix issue-183 * add modinput ft (#185) * Issue 159 (#186) * Fixed timer and token * Added a conditional for end=-1 * Added timeMultiple logic * Time Multiple Fix * Add default pull request reviewers (#190) * Default to -1 (#189) * Changed verbose -> verbosity (#191) * Update README.md (#195) Like other Splunk products - Splunk Enterprise Security, Splunk Business Workflow ... - Splunk Event Generator does not need a definitive article "The" before the product name. * update doc for friendly reading and add backward capability section. (#193) * Update index.md (#194) Update index.md * change the sampleDir setting in test cases (#196) * return status=2 when all tasks finished. (#182) * return status=2 when all tasks finished. * Change release verion to 6.3.6 (#200) * Change release verion to 6.3.6 * Fix flaky tests (#204) * Fix flaky tests * Rename modinput to mod_input to avoid functional test fail * when hit an exception, make sure egx logs it to files * Fix circleci status badges (#208) * Clean up and interval error (#211) * Fix generatorWorks not work issue (#207) * Fix navigation error with installed with splunk stream (#214) * add metric_httpevent plugin * update log content in metric_httpevent * add doc for metric_httpevent * Add 3rd lib in app (#210) * Add 3rd lib in app * Bugfix/197 multiprocess not working (#218) * Fix issue 197 multiprocess not working * fix issue 219 (#220) * define httpevent_core and add fix eventgen-httpeventout log handler * restore samples file * restore unit test file * add metric_httpevent plugin * update log content in metric_httpevent * add doc for metric_httpevent * define httpevent_core and add fix eventgen-httpeventout log handler * restore samples file * restore unit test file * Add license credits (#222) * Feature/multi indexes (#224) * Fix jinja template bug under 
SA-Eventgen app * Update docs (#146) * Updated docs, added release notes to changelog * Bump version * add python code style lint and format * Add more unit tests for config module * Pep8 (#151) * Format to code standards, addressed linter errors/warnings * add extendIndexes feature * set extendIndexes as a list value * upate doc, change num to weight * calculate generate rate and use sequential index replacement * update dockerfile * randomize index replacement in each batch * Fix jinja template bug under SA-Eventgen app * Update docs (#146) * Updated docs, added release notes to changelog * add python code style lint and format * Add more unit tests for config module * calculate generate rate and use sequential index replacement * update dockerfile * randomize index replacement in each batch * fix no len dict out of range * clean duplicate code * Revert "Metrics output plugin" (#226) * [issue 217]disable logging queue in multiprocess mode (#223) * disable logging queue in multiprocess mode * Fixed fileName (#229) * Issue 201 (#221) * Removing unavailable servers * Removed nonresponding httpserver * Added a counter * fixed a bug * Added docs and spec * Addressing unused var * fixing url reference * fixed currentreadsize * Httpevent str formatting * fix #166 (#192) * Change release version to 6.4.0 and add release notes (#232) * fix modinput init error (#234)
Describe the bug
Backfill with perdayvolume does not generate the proper amount of data. For example, if the backfill covers the past 7 days and perdayvolume is 30 GB, it generates 49 GB instead of 30 × 7 = 210 GB.
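The shortfall in this example can be checked with a few lines of Python. `expected_backfill_gb` is a hypothetical helper for illustration, not an Eventgen function; the inputs are the figures quoted in the report.

```python
def expected_backfill_gb(per_day_gb, days):
    """Total volume a backfill should produce at a fixed per-day volume."""
    return per_day_gb * days


expected = expected_backfill_gb(30, 7)  # 30 GB per day over 7 days
observed = 49                           # GB actually generated, per the report
shortfall = expected - observed
percent_delivered = round(observed / expected * 100, 1)
print(expected, shortfall, percent_delivered)
```

In other words, the run delivered roughly 23% of the expected 210 GB, leaving 161 GB missing.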
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Should output 30 GB for each day.
Actual behavior
For the past 7 days, the first 6 days don't even come close to the expected output. The last day peaks to produce more output.
Screenshots
Sample files and eventgen.conf file
Please attach your sample files and eventgen conf file
Do you run eventgen with SA-eventgen?
No
If you are using SA-Eventgen with Splunk (please complete the following information):
6.3.4
If you are using eventgen with pip module mode (please complete the following information):
eventgenx container through Orca
Additional context
Add any other context about the problem here.