[core] add method to kill aws instance to simulate chaos #45546
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -24,6 +24,7 @@ | |
from dataclasses import dataclass | ||
|
||
import requests | ||
import paramiko | ||
from ray._raylet import Config | ||
|
||
import psutil # We must import psutil after ray because we bundle it with ray. | ||
|
@@ -1533,6 +1534,34 @@ def _kill_resource(self, node_id, node_to_kill_ip, node_to_kill_port): | |
) | ||
self.killed.add(node_id) | ||
|
||
def _kill_node(self, ip): | ||
# This command uses IMDSv2 to get the host instance id and region. | ||
# After that it terminates itself using aws cli. | ||
command = """ | ||
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") | ||
|
||
instanceId=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id/) | ||
region=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region) | ||
|
||
aws ec2 terminate-instances --region $region --instance-ids $instanceId | ||
""" # noqa: E501 | ||
|
||
ssh = paramiko.SSHClient() | ||
Why do we need paramiko? Maybe I'm wrong, but can't we just use subprocess with command = "ssh ..."? If we do add it, I think we also need to add it somewhere, for example python/requirements/test-requirements.txt. Apart from ssh, you can also curl the IMDSv2 HTTP endpoint. Here is some code (not tested)
Sure. I will write it in pure ssh. We can't write it as Python code because that would not be wrapped in the ssh command.
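The subprocess-based alternative suggested in this thread could look roughly like the sketch below. It is not the code that landed in the PR. The helper names `build_ssh_command` and `kill_node_over_ssh` are hypothetical, the `StrictHostKeyChecking=no` option is an assumption standing in for paramiko's `AutoAddPolicy`, and the user and port are copied from the diff. The remote script mirrors the one in the diff, with the shell variables quoted.

```python
import subprocess
from typing import List

# Shell script mirroring the one in the diff. It only works on an EC2 host
# that has the AWS CLI installed and permission to terminate itself.
TERMINATE_SELF = """
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
instanceId=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id/)
region=$(curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region)
aws ec2 terminate-instances --region "$region" --instance-ids "$instanceId"
"""


def build_ssh_command(
    ip: str, remote_command: str, user: str = "ray", port: int = 2222
) -> List[str]:
    """Build the argv for the system ssh client (hypothetical helper)."""
    return [
        "ssh",
        # Assumption: skip host key prompts, similar in spirit to AutoAddPolicy.
        "-o", "StrictHostKeyChecking=no",
        "-p", str(port),
        f"{user}@{ip}",
        remote_command,
    ]


def kill_node_over_ssh(ip: str, timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run the terminate script on the remote node and return the result."""
    result = subprocess.run(
        build_ssh_command(ip, TERMINATE_SELF),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    print(f"STDOUT:\n{result.stdout}")
    print(f"STDERR:\n{result.stderr}")
    return result
```

Passing the command as an argument list avoids local shell quoting issues, and using the system `ssh` client means no new Python dependency has to be added to the test requirements.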
||
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()) | ||
|
||
# This is a feature on Anyscale platform that enables | ||
Maybe we can detect whether we are on Anyscale and otherwise skip the test, so the test does not fail on a local desktop.
That's something to add to the test. This PR doesn't change that; it only adds the utility method. That's also why I separated this from #45364. I can't make the call and need to talk to other decision makers to follow up on the config change.
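A skip guard along the lines of this suggestion might be sketched as follows. The thread does not say how to detect the platform, so the environment variable name `ANYSCALE_CLUSTER_ID` is purely an assumption used for illustration, and `running_on_anyscale` and `KillNodeChaosTest` are hypothetical names, not part of this PR.

```python
import os
import unittest
from typing import Mapping, Optional

# ASSUMPTION: hypothetical marker variable. The review thread does not
# specify how to tell whether the code is running on Anyscale.
ANYSCALE_MARKER_ENV = "ANYSCALE_CLUSTER_ID"


def running_on_anyscale(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the (assumed) Anyscale marker variable is set."""
    env = os.environ if env is None else env
    return bool(env.get(ANYSCALE_MARKER_ENV))


class KillNodeChaosTest(unittest.TestCase):
    @unittest.skipUnless(
        running_on_anyscale(), "requires SSH access to Anyscale worker nodes"
    )
    def test_kill_node(self):
        # A real test would call the node-killing utility here and then
        # check that the cluster recovers.
        pass
```

With this guard the test is reported as skipped, rather than failed, on machines where the marker variable is absent.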
||
# easy ssh access to worker nodes. | ||
ssh.connect(ip, username="ray", port=2222) | ||
|
||
stdin, stdout, stderr = ssh.exec_command(command) | ||
output = stdout.read().decode() | ||
error = stderr.read().decode() | ||
|
||
stdin.close() | ||
|
||
print(f"STDOUT:\n{output}") | ||
print(f"STDERR:\n{error}") | ||
|
||
def _kill_raylet(self, ip, port, graceful=False): | ||
import grpc | ||
from grpc._channel import _InactiveRpcError | ||
|