package: add rdma-core #223

Merged 2 commits on Nov 6, 2024

@ytsssun commented Oct 26, 2024

Issue number:

Related to bottlerocket-os/bottlerocket#1031

Description of changes:
Add the rdma-core package to core-kit. This is part of the effort to fully support EFA in the Bottlerocket first-party AMI. We will use the helper programs provided by rdma-core for a better troubleshooting and logging experience (to be added in a follow-up PR).

  • Add the helper programs ibv_devices and ibv_devinfo.
  • Add the minimum required libraries and drivers in libibverbs.
  • Add logdog entries for rdma-core.
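
For orientation, the logdog entries can be sketched from the "Running:" lines in the test output further down. This is a reconstruction under that assumption, not the merged packages/rdma-core/logdog.rdma.conf, which may differ in order or content:

```
exec ibv_devinfo.log ibv_devinfo
glob /sys/class/infiniband/*/device/p2p
glob /sys/class/infiniband/*/ports/1/hw_counters/*
```

The first entry would capture the output of ibv_devinfo into ibv_devinfo.log inside the support archive; the two glob entries would copy the matching sysfs files.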

Testing done:

  • Built the aws-k8s-1.28-nvidia variant and tested the helper binaries.
bash-5.1# ibv_devinfo
hca_id: efa_0
        transport:                      unspecified (4)
        fw_ver:                         0.0.0.0
        node_guid:                      0000:0000:0000:0000
        sys_image_guid:                 0000:0000:0000:0000
        vendor_id:                      0x1d0f
        vendor_part_id:                 61344
        hw_ver:                         0xEFA0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x01
                        link_layer:             Unspecified


bash-5.1# ibv_devices
    device                 node GUID
    ------              ----------------
    efa_0               0000000000000000
  • Tested the logdog entries.
    The logdog output shows that the rdma-core related logs are produced:
[root@admin]# sheltie logdog
....
Checking: /usr/share/logdog.d/logdog.rdma.conf
....
Running: exec ibv_devinfo.log ibv_devinfo
Running: glob /sys/class/infiniband/*/device/p2p
Running: glob /sys/class/infiniband/*/ports/1/hw_counters/*
logs are at: /var/log/support/bottlerocket-logs.tar.gz

Extracted the log archive and verified the contents:

[root@admin]# tar xzf /.bottlerocket/rootfs/var/log/support/bottlerocket-logs.tar.gz
[root@admin]# cd bottlerocket-logs/

[root@admin]# cat ibv_devinfo.log
hca_id: efa_0
        transport:                      unspecified (4)
        fw_ver:                         0.0.0.0
        node_guid:                      0000:0000:0000:0000
        sys_image_guid:                 0000:0000:0000:0000
        vendor_id:                      0x1d0f
        vendor_part_id:                 61344
        hw_ver:                         0xEFA0
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x01
                        link_layer:             Unspecified

[root@admin]# cat sys/class/infiniband/efa_0/device/p2p

[root@admin]# cat sys/class/infiniband/efa_0/ports/1/hw_counters/
lifespan               rdma_read_resp_bytes   rdma_read_wrs          rdma_write_recv_bytes  rdma_write_wrs         recv_wrs               rx_drops               send_bytes             tx_bytes
rdma_read_bytes        rdma_read_wr_err       rdma_write_bytes       rdma_write_wr_err      recv_bytes             rx_bytes               rx_pkts                send_wrs               tx_pkts
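
The manual checks above could be scripted. The following is a minimal sketch, not part of this PR: the function names port_states and all_ports_active are hypothetical, and it assumes the ibv_devinfo output keeps the "hca_id:", "port:" and "state:" layout shown in the logs above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of rdma-core or this PR) that read
# ibv_devinfo-style text from stdin.

# Print "<hca_id> <port> <state>" for every port found in the input.
port_states() {
  awk '
    $1 == "hca_id:" { hca = $2 }
    $1 == "port:"   { port = $2 }
    $1 == "state:"  { print hca, port, $2 }
  '
}

# Succeed only if at least one port is listed and every port is PORT_ACTIVE.
all_ports_active() {
  awk '
    $1 == "state:" { seen++; if ($2 != "PORT_ACTIVE") bad++ }
    END { exit (seen > 0 && bad == 0) ? 0 : 1 }
  '
}

# Usage on a host where the helper binary is installed:
#   ibv_devinfo | port_states
#   ibv_devinfo | all_ports_active && echo "all ports active"
# Usage against a log extracted from the logdog archive:
#   all_ports_active < bottlerocket-logs/ibv_devinfo.log
```

Reading from stdin keeps the helpers usable both on a live host and on the ibv_devinfo.log file collected by logdog.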

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@ytsssun force-pushed the br-efa-support branch 2 times, most recently from d288737 to 4c6d72c, on November 2, 2024 00:56
@ytsssun marked this pull request as ready for review on November 2, 2024 07:12
- Add minimum required libraries and drivers in libibverbs.
- Add helper program ibv_devices and ibv_devinfo

Signed-off-by: Yutong Sun <[email protected]>
%{cross_cmake} . \
-DNO_PYVERBS=1 \
-DNO_MAN_PAGES=1 \
-DCMAKE_BUILD_TYPE=Release \

nit: you don't need to set CMAKE_BUILD_TYPE=Release; it is already set by cross_cmake

@arnaldo2792 merged commit 1196e2f into bottlerocket-os:develop on Nov 6, 2024
2 checks passed