DAOSGCP-83 and DAOSGCP-84 Automatic format storage with pool and container creation that supports ACLs #40
I believe you still need to provide a means by which clients that do not have a DAOS image can mount DAOS.
In a classic NFS world, you would need two outputs: install-nfs (installing the nfs-common drivers) and mount.
In DAOS, you would need to:
- install the DAOS RPMs
- copy all the YAML files
- launch the DAOS services
Then, later, mount DAOS using dfuse.
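As a rough illustration of the client-side steps listed above, here is a dry-run sketch that only prints the commands a non-DAOS image might need. The package name `daos-client`, the `/etc/daos/` config paths, and the pool, container, and mount point names are assumptions for illustration, not something this PR defines; the dfuse long options should be checked against the installed DAOS version.

```shell
# Dry-run sketch: prints (does not execute) a hypothetical client setup.
# Assumed, not taken from this PR: package name, config file paths,
# and the dfuse long options.
print_client_mount_plan() {
  pool="$1"        # pool label or UUID (hypothetical value supplied by caller)
  cont="$2"        # container label or UUID (hypothetical)
  mountpoint="$3"  # local directory to mount on (hypothetical)
  cat <<EOF
yum install -y daos-client
cp daos_agent.yml daos_control.yml /etc/daos/
systemctl enable --now daos_agent
mkdir -p ${mountpoint}
dfuse --mountpoint=${mountpoint} --pool=${pool} --container=${cont}
EOF
}
```

Calling `print_client_mount_plan tank mycont /mnt/daos` prints the five steps in order, which could back the two module outputs (install and mount) suggested in the comment.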
@lsitkiew There are a few extra changes in this ticket for 84.
- Please put the changes to pool types into the 83 workstream. 83 will bring in a full, complete data type with all the things. Better to get this landed and build the 83 work on top.
- I am not sure about moving the daos_agent start into the client startup script. I think there is logic today that calls it as is in some use cases. Can we revert this change for now, or is there another option?
- The io500 autopool seems like a good idea. Can we wait for 83 to fully roll out all the syntax? Default pools need to go in the examples and io500; let's keep this pull request just to the functionality. There are default policies to discuss.
- Please put the io500 server instance changes in their own ticket so they can be reviewed in context.
This patch looks close. Thanks for the quick rebase.
We decided to combine changes from DAOSGCP-83 and DAOSGCP-84 into this single PR.
…ainer creation that supports ACLs Signed-off-by: Łukasz Sitkiewicz <[email protected]>
I rebased the changes on top of the current develop branch.
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
Major issues:
- Showing "reclaim:disabled" to any user is a no-go for me. This is a DEV-only feature of DAOS.
- The new pool type is defined in 3(?) spots. Let's find a way to have a single pool type.
- There is a new script called pool_cont_create, but it also formats DAOS.
Minor issue:
After reading the patch I don't know how to use the ACL feature. I think I add my user name in some spots, but I will have to read the DAOS manual.
There are also several other things outside of 84/83 in this patch, but let's just leave them here for now...
# tier_ratio = 3
# acls = [
# "A::OWNER@:rwdtTaAo",
# "A:G:GROUP@:rwtT"
It would be better to show an actual user name in this place. It is not clear to me where to add my user name.
Maybe a comment with an example would be better?
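To make the question about user names concrete, here is a small sketch of how entries for a named user or group could be built. It assumes the DAOS ACE layout `TYPE:FLAGS:PRINCIPAL:PERMISSIONS`, where a named principal is written as the name followed by `@` and the `G` flag marks a group; the names `alice` and `hpc_users` are hypothetical, and the DAOS manual remains the authority on which permission letters are valid.

```shell
# Sketch: build DAOS-style ACE strings for named principals.
# Assumes the ACE layout TYPE:FLAGS:PRINCIPAL:PERMISSIONS.
# "alice" and "hpc_users" in the usage note are hypothetical names.

# make_user_ace USER PERMS -> Allow entry for a named user
make_user_ace() {
  printf 'A::%s@:%s\n' "$1" "$2"
}

# make_group_ace GROUP PERMS -> Allow entry for a named group (G flag)
make_group_ace() {
  printf 'A:G:%s@:%s\n' "$1" "$2"
}
```

For example, `make_user_ace alice rw` yields `A::alice@:rw`, and `make_group_ace hpc_users r` yields `A:G:hpc_users@:r`; strings of that shape would sit alongside the `OWNER@` and `GROUP@` entries in the `acls` list.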
@@ -38,4 +38,6 @@ systemctl start daos_server
systemctl enable daos_agent
Fine, I will retest. My testing needs a few seconds of sleep on a single server for the dmg commands to not error out.
Please put the 5-second sleep back. dmg cannot talk to the server for a few seconds after it starts, and the startup logic gets messed up.
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Created symlink from /etc/systemd/system/multi-user.target.wants/daos_agent.service to /u
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + systemctl start daos_agent
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + set -x
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'BEGIN: DAOS server format'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: BEGIN: DAOS server format
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + [[ daos-server-0001 == \d\a\o\s-\s\e\r\v\e\r-\0\0\0\1 ]]
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg network scan
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + grep --fixed-strings daos-server-0001
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ERROR: dmg: 1 host had errors
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: daos-server-0001 the server at daos-server-0001:10001 refused the connection
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'All DAOS Servers started'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: All DAOS Servers started
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'Formatting storage on servers: daos-server-0001'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Formatting storage on servers: daos-server-0001
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg storage format
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Format Summary:
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Hosts SCM Devices NVMe Devices
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ----- ----------- ------------
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: daos-server-0001 1 2
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg system query -v
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Rank UUID Control Address Fault Domain State Reas
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ---- ---- --------------- ------------ ----- ----
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: 0 d1745607-575d-4295-b4e8-37d60e60b11e 10.128.0.54:10001 /daos-server-0001 Joined
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script:
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'Done formating DAOS server'
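The log above shows `dmg network scan` being refused in the same second the services were started. The review asks for the fixed 5-second sleep to be restored; purely as an alternative sketch, and not what this PR does, a bounded retry loop could wait until the command succeeds. The helper name `retry_until_ok` and the attempt and delay values are hypothetical.

```shell
# Sketch: retry a command until it succeeds or attempts run out.
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# The helper name and the numbers used below are illustrative only.
retry_until_ok() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  # Run the command; on failure either give up or sleep and try again.
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}

# Hypothetical use in a startup script, before formatting storage:
#   retry_until_ok 30 2 dmg network scan
```

A fixed sleep is simpler and matches what was tested here; a retry loop trades that simplicity for not depending on a guessed delay.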
Signed-off-by: Mark A. Olson <[email protected]>
Signed-off-by: Mark A. Olson <[email protected]>
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
Signed-off-by: Łukasz Sitkiewicz <[email protected]>
After testing with and without the HPC Toolkit this PR looks good to me.
Signed-off-by: Mark A. Olson <[email protected]>
Please land this. It was testing well for me yesterday.
This PR has been tested with and without the HPC Toolkit. It's working as intended. Excellent work @lsitkiew
No description provided.