Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minionInstanceTag config to minion-tasks for resource isolation #12459

Merged

Conversation

tibrewalpratik17
Copy link
Contributor

@tibrewalpratik17 tibrewalpratik17 commented Feb 21, 2024

Iabel:

  • feature
  • release-notes (**)

Currently, all minion-scheduled jobs run on minion_untagged node instances. This doesn't allow isolation among tables / tasks at minion level.
Adhoc-minion-tasks already acknowledge a task-config minionInstanceTag and schedule those adhoc-runs only on those set of instances. Extending the same for periodic-scheduled tasks.

Resolving some TODOs / bugs in this PR itself where if generateTasks schedules tasks for some tables we still end up emitting a failed metric for all the tables. This PR makes the change of iterating through all tables and emitting metrics for each one of them.

Now, instead of generating a single parent task at taskType level, the change generates a single parent task at taskType, minionInstanceTag level.

Example config:

"task": {
      "taskTypeConfigsMap": {
        "UpsertCompactionTask": {
          "invalidRecordsThresholdCount": "10000",
          "bufferTimePeriod": "0d",
          "schedule": "0 * * * * ?",
          "tableMaxNumTasks": "15",
          "minionInstanceTag": "dummy_MINION",
        }
      }
    },

Here "minionInstanceTag": "dummy_MINION", is pertaining to the change in this patch.

@codecov-commenter
Copy link

codecov-commenter commented Feb 21, 2024

Codecov Report

Attention: Patch coverage is 4.91803% with 58 lines in your changes are missing coverage. Please review.

Project coverage is 61.57%. Comparing base (59551e4) to head (3479383).
Report is 190 commits behind head on master.

Files Patch % Lines
...controller/helix/core/minion/PinotTaskManager.java 0.00% 46 Missing ⚠️
...roller/api/resources/PinotTaskRestletResource.java 0.00% 5 Missing ⚠️
...helix/core/minion/generator/BaseTaskGenerator.java 0.00% 3 Missing ⚠️
...elix/core/minion/generator/TaskGeneratorUtils.java 50.00% 1 Missing and 2 partials ⚠️
...elix/core/minion/generator/PinotTaskGenerator.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12459      +/-   ##
============================================
- Coverage     61.75%   61.57%   -0.19%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2463      +27     
  Lines        133233   134773    +1540     
  Branches      20636    20827     +191     
============================================
+ Hits          82274    82982     +708     
- Misses        44911    45588     +677     
- Partials       6048     6203     +155     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 61.54% <4.91%> (-0.17%) ⬇️
java-21 61.45% <4.91%> (-0.18%) ⬇️
skip-bytebuffers-false 61.55% <4.91%> (-0.19%) ⬇️
skip-bytebuffers-true 61.43% <4.91%> (+33.70%) ⬆️
temurin 61.57% <4.91%> (-0.19%) ⬇️
unittests 61.56% <4.91%> (-0.19%) ⬇️
unittests1 46.16% <ø> (-0.73%) ⬇️
unittests2 27.95% <4.91%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Feb 23, 2024
@Jackie-Jiang Jackie-Jiang requested a review from snleee February 23, 2024 00:50
@tibrewalpratik17 tibrewalpratik17 force-pushed the minion_instance_tag_feature branch from bbd9a7d to 992145b Compare February 29, 2024 10:37
@tibrewalpratik17 tibrewalpratik17 changed the title [WIP] Add minionInstanceTag config to minion-tasks for resource isolation Add minionInstanceTag config to minion-tasks for resource isolation Feb 29, 2024
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review February 29, 2024 20:09
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snleee @swaminathanmanish Please also take a look

@tibrewalpratik17
Copy link
Contributor Author

@snleee can you please review this too?

@ankitsultana
Copy link
Contributor

@snleee @Jackie-Jiang : we'll be working on getting this merged this week, so would be great if you folks can share your feedback.

@tibrewalpratik17 : can you also share what's the test plan for this PR in the description?

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this capability. Looks good overall !

if (tableTaskConfig != null) {
Map<String, String> configs = tableTaskConfig.getConfigsForTaskType(getTaskType());
if (!configs.isEmpty()) {
return configs.getOrDefault(PinotTaskManager.MINION_INSTANCE_TAG_CONFIG,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the user has to manually fill this in, can we add table config validation to ensure that the minion tag configured here has actually been created?

If there are typos, it'll be some work to troubleshoot this, if not validated during config time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to add in TableConfigUtils as we don't have HelixManager present in that class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It would be ideal to enforce some validation especially given that user has to enter this for every taskConfig. Im curious if we have any validation for table tenant.
Can we keep track of this work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do in PinotHelixResourceManager class -

for (String tag : tagsToCheck) {
if (getInstancesWithTag(tag).isEmpty()) {
throw new InvalidTableConfigException(
"Failed to find instances with tag: " + tag + " for table: " + tableNameWithType);
}

But I think we should refactor to have it in the TableConfigUtils class. Tracking it here: #12698

@swaminathanmanish
Copy link
Contributor

swaminathanmanish commented Mar 18, 2024

minionInstanceTag

To get more clarity -
@Jackie-Jiang , @snleee, @tibrewalpratik17 - Why should minionInstanceTag be at taskTypeConfigsMap instead of being a table level config ? Do we want isolation at the table level or task level?

@snleee
Copy link
Contributor

snleee commented Mar 19, 2024

@swaminathanmanish +1 on your point. I think that this feature naturally fits better to the tenant config?

@tibrewalpratik17
Copy link
Contributor Author

To get more clarity -
@Jackie-Jiang , @snleee, @tibrewalpratik17 - Why should minionInstanceTag be at taskTypeConfigsMap instead of being a table level config ? Do we want isolation at the table level or task level?
@swaminathanmanish +1 on your point. I think that this feature naturally fits better to the tenant config?

Yes agreed to have a minionTenant concept as a table level config. Based on a discussion here - #11240 (comment) we would also like to give the flexibility to override it as task-level too just in case there are lot of tasks for a table and we want to isolate those too or group similar tasks of different tables in dedicated minion-nodes (may be different SKUs based on tasks).
I raised this patch to quickly give the option of isolating tasks first. Post this, can work on a patch to introduce minionTenant concept in Pinot. Anyways, since this task-level config will have higher precedence over table's tenant config, this change will be backward compatible even then. What do you all think about this?

Also, will address comments shortly and update the patch.

@tibrewalpratik17
Copy link
Contributor Author

hey @snleee @swaminathanmanish addressed comments. Please review!

@swaminathanmanish
Copy link
Contributor

To get more clarity -
@Jackie-Jiang , @snleee, @tibrewalpratik17 - Why should minionInstanceTag be at taskTypeConfigsMap instead of being a table level config ? Do we want isolation at the table level or task level?
@swaminathanmanish +1 on your point. I think that this feature naturally fits better to the tenant config?

Yes agreed to have a minionTenant concept as a table level config. Based on a discussion here - #11240 (comment) we would also like to give the flexibility to override it as task-level too just in case there are lot of tasks for a table and we want to isolate those too or group similar tasks of different tables in dedicated minion-nodes (may be different SKUs based on tasks). I raised this patch to quickly give the option of isolating tasks first. Post this, can work on a patch to introduce minionTenant concept in Pinot. Anyways, since this task-level config will have higher precedence over table's tenant config, this change will be backward compatible even then. What do you all think about this?

Also, will address comments shortly and update the patch.

Thanks a lot @tibrewalpratik17 . Would you be able to create a github issue to track this effort?

Copy link
Contributor

@swaminathanmanish swaminathanmanish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you please create github issues to track the additional work?

@tibrewalpratik17
Copy link
Contributor Author

Thanks a lot @tibrewalpratik17 . Would you be able to create a github issue to track this effort?
LGTM. Could you please create github issues to track the additional work?

Tracking all tasks in #12698

Copy link
Contributor

@snleee snleee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you fix the conflicts? once the test all passes, i can merge this.

@@ -615,12 +615,15 @@ public Map<String, String> scheduleTasks(@ApiParam(value = "Task type") @QueryPa
tableName = DatabaseUtils.translateTableName(tableName, headers);
Copy link
Contributor

@vvivekiyer vvivekiyer Mar 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it also make sense to extend this API to take in a minionInstanceTag?
I understand that tasks/execute API already provides that capability.

We plan to use this feature as follows:

  • Say all minion periodic tasks are usually executed in the minion_untagged hosts.
  • If our minion purge tasks are running behind for a table, we plan to deploy new set of hosts with a special tag e.g. minion_emergency, and use the schedule/execute API for one-off scheduling of these jobs on these hosts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is a good suggestion. Let me do it in a follow-up as this patch has been open from quite some time. Tracking improvements in #12698. Will add this one there.

@tibrewalpratik17
Copy link
Contributor Author

hey @snleee @vvivekiyer can one of you help me merge this?

@tibrewalpratik17 tibrewalpratik17 force-pushed the minion_instance_tag_feature branch from 5732a04 to f5762d9 Compare April 3, 2024 14:30
@vvivekiyer
Copy link
Contributor

Unrelated test failure in PulsarConsumerTest. Merging this PR.

@vvivekiyer vvivekiyer merged commit b825bb4 into apache:master Apr 3, 2024
18 of 19 checks passed
@npawar
Copy link
Contributor

npawar commented May 22, 2024

Has documentation been added @tibrewalpratik17 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature release-notes Referenced by PRs that need attention when compiling the next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants