cluster creation with existing FSx throws traceback #935

Closed
wleepang opened this issue Mar 15, 2019 · 2 comments

Comments

@wleepang

Environment:

  • AWS ParallelCluster: aws-parallelcluster-2.2.1
  • OS: alinux
  • Scheduler: SGE
  • Master instance type: t2.micro
  • Compute instance type: t2.micro

Bug description and how to reproduce:
Creating a cluster with a pre-existing FSx file system fails with a traceback:

[ec2-user@ip-172-31-41-28 ~]$ pcluster create cromwell
Beginning cluster creation for cluster: cromwell
Traceback (most recent call last):
  File "/usr/bin/pcluster", line 11, in <module>
    load_entry_point('aws-parallelcluster==2.2.1', 'console_scripts', 'pcluster')()
  File "/usr/lib/python2.7/site-packages/pcluster/cli.py", line 354, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/pcluster/cli.py", line 27, in create
    pcluster.create(args)
  File "/usr/lib/python2.7/site-packages/pcluster/pcluster.py", line 76, in create
    config = cfnconfig.ParallelClusterConfig(args)
  File "/usr/lib/python2.7/site-packages/pcluster/cfnconfig.py", line 99, in __init__
    self.__init_fsx_parameters()
  File "/usr/lib/python2.7/site-packages/pcluster/cfnconfig.py", line 827, in __init_fsx_parameters
    self.__validate_resource("fsx_fs_id", (value, self.__master_subnet))
  File "/usr/lib/python2.7/site-packages/pcluster/cfnconfig.py", line 281, in __validate_resource
    self.__resource_validator.validate(resource_type, resource_value)
  File "/usr/lib/python2.7/site-packages/pcluster/config_sanity.py", line 517, in validate
    self.__validate_fsx_parameters(resource_type, resource_value)
  File "/usr/lib/python2.7/site-packages/pcluster/config_sanity.py", line 195, in __validate_fsx_parameters
    self.__check_fsx_fs_id(ec2, fsx, resource_value)
  File "/usr/lib/python2.7/site-packages/pcluster/config_sanity.py", line 173, in __check_fsx_fs_id
    "inbound and outbound TCP traffic from 0.0.0.0/0 through port 988." % resource_value[0]
TypeError: __fail() takes exactly 2 arguments (1 given)
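
For readers unfamiliar with this kind of TypeError, here is a minimal sketch of one reading consistent with the last frames of the traceback: a two-parameter failure helper called with only the message string. The class, method names, and message text below are hypothetical and are not copied from config_sanity.py. The traceback above is from Python 2.7; Python 3 words the error differently but raises the same exception type.

```python
class ResourceValidator(object):
    """Hypothetical stand-in for a config validator, for illustration only."""

    @staticmethod
    def __fail(resource_type, message):
        # Helper that expects BOTH the resource type and a message.
        raise ValueError("Config sanity error on resource %s: %s" % (resource_type, message))

    def check_buggy(self, fs_id):
        # Only the message is passed, so Python raises TypeError before
        # the helper can report the real validation problem.
        self.__fail("The file system %s is not reachable." % fs_id)

    def check_fixed(self, fs_id):
        # Passing the resource type as well lets the intended error surface.
        self.__fail("fsx_fs_id", "The file system %s is not reachable." % fs_id)
```

In this sketch the TypeError hides the validation message the helper was meant to print, which is why the traceback shows only a fragment of that message.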

Additional context:

Config file is:

[aws]

[cluster cromwell]
vpc_settings = public
key_name = ********
initial_queue_size = 2
maintain_initial_size = true
fsx_settings = fs

[fsx fs]
shared_dir = /cromwell_root
fsx_fs_id = fs-0f0ddbaf5d3781422

[vpc public]
master_subnet_id = subnet-c264e1a5
vpc_id = vpc-461dcc3c

[global]
update_check = true
sanity_check = true
cluster_template = cromwell

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

The FSx Lustre file system is in us-east-1 where the cluster is created.

@sean-smith
Contributor

@wleepang This is because base_os defaults to alinux, which isn't currently supported with FSx. Fix it by switching to centos7:

[cluster cromwell]
...
base_os = centos7

I know this is confusing, so I wrote validation that checks for this before cluster creation in #904, but it hasn't been released yet.
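
Until that validation is released, a check like this can be run by hand against the config file. This is only a rough sketch and is not the code from #904: the function name, the SUPPORTED_FSX_OS set (which contains only centos7 because that is the one value named in this thread), and the warning text are all assumptions.

```python
import configparser

# Assumption: only centos7 is treated as FSx-capable, per this thread.
SUPPORTED_FSX_OS = {"centos7"}
# The default applied when a cluster section omits base_os, per this thread.
DEFAULT_BASE_OS = "alinux"


def check_fsx_base_os(config_text):
    """Return a list of warnings for cluster sections that use FSx
    with a base_os outside SUPPORTED_FSX_OS."""
    # Interpolation is disabled so values such as "{CFN_USER}" are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    problems = []
    for section in parser.sections():
        if not section.startswith("cluster "):
            continue
        if not parser.has_option(section, "fsx_settings"):
            continue
        base_os = parser.get(section, "base_os", fallback=DEFAULT_BASE_OS)
        if base_os not in SUPPORTED_FSX_OS:
            problems.append(
                "[%s] uses fsx_settings with base_os = %s" % (section, base_os)
            )
    return problems
```

Passing the text of the config file from the original report should yield one warning for the cromwell cluster section, since it sets fsx_settings but leaves base_os at its default.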

Also, stay tuned for Amazon Linux support in the very near future ;-).

@wleepang
Author

@sean-smith - Thanks! I did eventually figure this out by reading a blog by Jiawei Zhuang. Looking forward to the update!
