-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(reports): Set a minimum interval for each report's execution #28176
Conversation
superset/commands/report/base.py
Outdated
if report_type == ReportScheduleType.ALERT | ||
else "REPORT_MINIMUM_INTERVAL_MINUTES" | ||
) | ||
minimum_interval = current_app.config.get(config_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you may want to also perform some validation to make sure that this value is a number, or it will fail below at line 119.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch! Just fixed that
superset/commands/report/base.py
Outdated
# Check if cron is set to every minute (*) | ||
# or if it has an every-minute block (1-5) | ||
if "-" in minutes_interval or minutes_interval == "*": | ||
raise ReportScheduleFrequencyNotAllowed(report_type, minimum_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically the first arg when raising will be the error message, so this feels a bit out of pattern. I'd suggest that we can either keep the error message very generic, and let the UI get more specific, or pass these in as optional kwargs (but not required).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the new implementation is better -- happy to further change it as well.
Test the ``validate_report_frequency`` method when there's | ||
only a configuration for alerts and user is creating report. | ||
""" | ||
for schedule in TEST_SCHEDULES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @pytest.mark.parametrize would work well for you on these for loop tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh, wasn't familiar with it! just added it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried about manipulating the crontab as a string to figure out the shortest interval — it can miss edge cases, and it forces constraints to the user due to the implementation details (in this case, the interval has to be in minutes, and has a maximum value of 60, which are artificial requirements).
An alternative solution would be to allow any minimum interval between reports:
ALERT_MINIMUM_INTERVAL_MINUTES = timedelta(seconds=90)
Or, as it seems to be more common for intervals in config.py
:
ALERT_MINIMUM_INTERVAL_MINUTES = int(timedelta(seconds=90).total_seconds())
Then you can compute a few execution times programmatically and see if it violates the config (untested):
def is_valid_schedule(cron_schedule: str, config_value: int) -> bool:
it = croniter(cron_schedule)
previous = next(it)
for _ in range(1000): # how many? I have no idea...
current = next(it)
interval, previous = current - previous, current
if interval.total_seconds() < config_value:
return False
return True
hey @betodealmeida that was the route I was initially going for too, but "how many repetitions?" (in the I can see benefits and concerns in both routes:
Since I believe most Orgs would like to limit this in the minutes range (like at least 5/10 minutes interval), I thought it made more sense to go with the first route. But I'm happy to migrate to the second approach -- we could start with a loop with |
thanks for the feedback, @eschutho 🙏 I'll put this on hold until we align on best route to move forward, before working on these. |
@Vitor-Avila how do these setting reflected in the UI given one can specifying whatever cron schedule they desire? |
@john-bodley I haven't implemented any UI validation, so the user would face the toast error message when trying to save: |
36c72c5
to
c8b30f3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #28176 +/- ##
==========================================
+ Coverage 60.48% 64.10% +3.61%
==========================================
Files 1931 521 -1410
Lines 76236 37214 -39022
Branches 8568 0 -8568
==========================================
- Hits 46114 23855 -22259
+ Misses 28017 13359 -14658
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
logger.error( | ||
"Invalid value for %s: %s", config_key, minimum_interval, exc_info=True | ||
) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to avoid raising a ValidationError
here to prevent blocking users from creating alerts/reports until the configuration is fixed by an admin. Open to feedback, tho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I just left a couple suggestions.
superset/config.py
Outdated
ALERT_MINIMUM_INTERVAL = 0 | ||
REPORT_MINIMUM_INTERVAL = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would be clearer about the expected units of ALERT_MINIMUM_INTERVAL
and REPORT_MINIMUM_INTERVAL
.
ALERT_MINIMUM_INTERVAL = 0 | |
REPORT_MINIMUM_INTERVAL = 0 | |
ALERT_MINIMUM_INTERVAL = int(timedelta(seconds=0).total_seconds()) | |
REPORT_MINIMUM_INTERVAL = int(timedelta(seconds=0).total_seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh that's a great suggestion! I think I'll use int(timedelta(minutes=0).total_seconds())
since seconds are not really exposed in the report configuration, but it's def better than 0
. 🙌
SUMMARY
It's currently possible to create alerts and reports configured to be executed every minute. While it might be a suitable implementation for some use-cases, some users might create assets with a more recurrent frequency needed, leading to additional load to the DB which can cause slowness, timeouts, etc.
This PR introduces two configs that can be specified in
superset_config.py
:This minimum interval is validated for both creation of new reports and also the modification of existing ones.
Note that this PR doesn't include a migration, so changing this config won't automatically change existing reports.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
No UI changes.
TESTING INSTRUCTIONS
Both integration and unit tests added. For manual testing:
ADDITIONAL INFORMATION