-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add minionInstanceTag config to minion-tasks for resource isolation #12459
Add minionInstanceTag config to minion-tasks for resource isolation #12459
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #12459 +/- ##
============================================
- Coverage 61.75% 61.57% -0.19%
+ Complexity 207 198 -9
============================================
Files 2436 2463 +27
Lines 133233 134773 +1540
Branches 20636 20827 +191
============================================
+ Hits 82274 82982 +708
- Misses 44911 45588 +677
- Partials 6048 6203 +155
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
bbd9a7d
to
992145b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snleee @swaminathanmanish Please also take a look
...roller/src/main/java/org/apache/pinot/controller/api/resources/PinotTaskRestletResource.java
Outdated
Show resolved
Hide resolved
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Xiaotian (Jackie) Jiang <[email protected]>
@snleee can you please review this too? |
@snleee @Jackie-Jiang : we'll be working on getting this merged this week, so would be great if you folks can share your feedback. @tibrewalpratik17 : can you also share what's the test plan for this PR in the description? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this capability. Looks good overall !
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/pinot/controller/helix/core/minion/generator/BaseTaskGenerator.java
Outdated
Show resolved
Hide resolved
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
if (tableTaskConfig != null) { | ||
Map<String, String> configs = tableTaskConfig.getConfigsForTaskType(getTaskType()); | ||
if (!configs.isEmpty()) { | ||
return configs.getOrDefault(PinotTaskManager.MINION_INSTANCE_TAG_CONFIG, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the user has to manually fill this in, can we add table config validation to ensure that the minion tag configured here has actually been created?
If there are typos, it'll be some work to troubleshoot this, if not validated during config time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's difficult to add in TableConfigUtils as we don't have HelixManager present in that class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It would be ideal to enforce some validation especially given that user has to enter this for every taskConfig. Im curious if we have any validation for table tenant.
Can we keep track of this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we do in PinotHelixResourceManager class -
Lines 1781 to 1785 in af0e5b8
for (String tag : tagsToCheck) { | |
if (getInstancesWithTag(tag).isEmpty()) { | |
throw new InvalidTableConfigException( | |
"Failed to find instances with tag: " + tag + " for table: " + tableNameWithType); | |
} |
But I think we should refactor to have it in the TableConfigUtils class. Tracking it here: #12698
...rc/test/java/org/apache/pinot/integration/tests/MergeRollupMinionClusterIntegrationTest.java
Show resolved
Hide resolved
To get more clarity - |
@swaminathanmanish +1 on your point. I think that this feature naturally fits better to the |
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Outdated
Show resolved
Hide resolved
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Show resolved
Hide resolved
Yes agreed to have a Also, will address comments shortly and update the patch. |
hey @snleee @swaminathanmanish addressed comments. Please review! |
Thanks a lot @tibrewalpratik17 . Would you be able to create a github issue to track this effort? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could you please create github issues to track the additional work?
Tracking all tasks in #12698 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you fix the conflicts? once the test all passes, i can merge this.
@@ -615,12 +615,15 @@ public Map<String, String> scheduleTasks(@ApiParam(value = "Task type") @QueryPa | |||
tableName = DatabaseUtils.translateTableName(tableName, headers); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it also make sense to extend this API to take in a minionInstanceTag
?
I understand that tasks/execute
API already provides that capability.
We plan to use this feature as follows:
- Say all minion periodic tasks are usually executed in the minion_untagged hosts.
- If our minion purge tasks are running behind for a table, we plan to deploy new set of hosts with a special tag e.g.
minion_emergency
, and use the schedule/execute API for one-off scheduling of these jobs on these hosts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this is a good suggestion. Let me do it in a follow-up as this patch has been open from quite some time. Tracking improvements in #12698. Will add this one there.
...controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotTaskManager.java
Show resolved
Hide resolved
hey @snleee @vvivekiyer can one of you help me merge this? |
5732a04
to
f5762d9
Compare
Unrelated test failure in |
Has documentation been added @tibrewalpratik17 ? |
Iabel:
feature
release-notes
(**)Currently, all minion-scheduled jobs run on
minion_untagged
node instances. This doesn't allow isolation among tables / tasks at minion level.Adhoc-minion-tasks already acknowledge a task-config
minionInstanceTag
and schedule those adhoc-runs only on those set of instances. Extending the same for periodic-scheduled tasks.Resolving some TODOs / bugs in this PR itself where if
generateTasks
schedules tasks for some tables we still end up emitting a failed metric for all the tables. This PR makes the change of iterating through all tables and emitting metrics for each one of them.Now, instead of generating a single parent task at taskType level, the change generates a single parent task at taskType, minionInstanceTag level.
Example config:
Here
"minionInstanceTag": "dummy_MINION",
is pertaining to the change in this patch.