Add lifecycle hooks for cloudmap (de)registration operations on vtgate pods #3934
Conversation
Force-pushed from 665c15f to 93adf70
are there any notes about y'all's discussion with CIC about potentially integrating vitess with our mesh? using hooks for service discovery registration/deregistration makes me a little nervous (e.g., what happens in cloudmap if a k8s node goes offline in a way where the prestop hooks for vitess pods don't run?)
def get_aws_region(self) -> str:
    region = self.get_region()
    return f"{region[:2]}-{region[2:6]}-{region[6:7]}"
this seems a tad brittle as this slicing won't work for all AWS regions - it's probably fine for where vitess will be deployed initially, but perhaps we can have the region identification happen inside the script itself?
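For illustration, here is one way the fixed-offset slicing breaks: it happens to work for `uswest1` → `us-west-1`, but not for longer compact names like `apsoutheast1`. A hedged sketch of an offset-free alternative, matching the known direction tokens instead (the helper name and the compact-region format are assumptions, not from this PR):

```python
import re

# Hypothetical helper: expand a compact region token like "uswest1"
# into the dashed AWS region name, without fixed string offsets.
_REGION_RE = re.compile(
    r"^([a-z]{2})(gov)?"
    r"(north|south|east|west|central|"
    r"northeast|northwest|southeast|southwest)"
    r"(\d)$"
)

def parse_compact_region(compact: str) -> str:
    m = _REGION_RE.match(compact)
    if not m:
        raise ValueError(f"unrecognized compact region: {compact!r}")
    # Drop the optional "gov" group when absent, then join with dashes.
    return "-".join(p for p in m.groups() if p)
```

With this, `parse_compact_region("apsoutheast1")` yields `ap-southeast-1`, where the slicing approach would produce `ap-sout-h`.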
> we can have the region identification happen inside the script itself?

Can you explain this? Or do you mean pass it from yelpsoa?
@VinaySagarGonabavi my bad, i meant that the register/deregister script could read /nail/etc/habitat and figure out the region from there - in general, it's probably better to have the script be somewhat independent of how paasta is configuring pods since it's easier/faster to roll out an image change for your service than to do anything else (e.g., you can avoid having to go through a paasta release if the script is called with no or minimal arguments)
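A minimal sketch of what "read /nail/etc/habitat inside the script" could look like. The habitat file layout (a single token such as `uswest1devc`, with a compact region prefix ending at the first digit) is an assumption for illustration, not something confirmed in this thread:

```python
from pathlib import Path

HABITAT_PATH = Path("/nail/etc/habitat")  # hypothetical default location

def region_from_habitat(habitat: str) -> str:
    # Assumed format: the token starts with a compact region like
    # "uswest1"; everything after the trailing digit is the ecosystem.
    for i, ch in enumerate(habitat):
        if ch.isdigit():
            return habitat[: i + 1]
    raise ValueError(f"no region prefix found in habitat {habitat!r}")

def get_region(path: Path = HABITAT_PATH) -> str:
    return region_from_habitat(path.read_text().strip())
```

Reading it in the script keeps the register/deregister hooks callable with no or minimal arguments, so an image rebuild is enough to change the behavior, no paasta release needed.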
Reopening this as we moved back to a single VitessCluster setup in #3947
Notes from a few rounds of discussion with CIC/Service Mesh on this are in https://jira.yelpcorp.com/browse/DREIMP-10901, and the last discussion with CIPX is in https://yelp.slack.com/archives/C060C8L80LE/p1723597193236439?thread_ts=1723511315.755149&cid=C060C8L80LE

For the case where a pod goes away before it can deregister itself from cloudmap, we're planning to handle it with an external monitoring check that queries the exposed /debug/health endpoint of the vtgate service on each pod IP registered in a cloudmap service. If the pod is unresponsive or the health check fails, an auto-remediation step deregisters the IP. It's not ideal, but as Krall pointed out, if the external monitoring job runs frequently enough this case can be handled.
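The remediation check above could be sketched as a pure decision step plus a probe. Everything here is illustrative: the port, the function names, and the (instance_id, ip) pairing are assumptions, and the actual Cloud Map list/deregister wiring (e.g. via boto3 `servicediscovery`) is left to the caller:

```python
import urllib.request
from typing import Callable, Iterable

def vtgate_healthy(ip: str, port: int = 15001, timeout: float = 2.0) -> bool:
    """Probe the pod's /debug/health endpoint; unreachable pods count
    as unhealthy. Port 15001 is an assumed vtgate HTTP port."""
    url = f"http://{ip}:{port}/debug/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def find_stale_instances(
    instances: Iterable[tuple[str, str]],
    probe: Callable[[str], bool] = vtgate_healthy,
) -> list[str]:
    """Given (instance_id, ip) pairs listed from the cloudmap service,
    return the instance ids whose probe fails; the monitoring job then
    deregisters those ids."""
    return [iid for iid, ip in instances if not probe(ip)]
```

Separating the decision from the probe makes the job easy to test with a fake probe, and a frequent-enough schedule bounds how long a dead pod's IP can linger in cloudmap.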
Force-pushed from 93adf70 to 43758bc
Added the scripts to the docker image in https://github.yelpcorp.com/docker-images/vitess_base/pull/21
AWS role and policy added for namespace pods in https://github.yelpcorp.com/misc/terraform-code/pull/24481