fix rocm get_device name #359

Merged
merged 12 commits into main on Feb 7, 2025

@divakar-amd divakar-amd commented Jan 14, 2025

Update:

  • Not using the target name (gfx*) plus the number of compute units for the device name. That combination distinguishes MI308 from MI300, but it would not distinguish MI325.
  • Using the market name instead.
  • Renamed MI308 from AMD_Radeon_graphics to MI308X.
  • Removed the (duplicate) MI300X_OAM.
  • Renamed MI325_OAM to MI325X.
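To illustrate why the first approach was dropped, here is a minimal sketch that groups GPU models by the (target_graphics_version, num_compute_units) pair. The MI300X values match the asic info posted later in this conversation; the MI308X and MI325X compute-unit counts are assumptions taken from public spec figures, not from this PR, and `ASIC_KEYS` and `group_by_key` are hypothetical names.

```python
from collections import defaultdict

# (target_graphics_version, num_compute_units) per GPU model.
# Only the MI300X row is backed by output in this thread; the rest are assumed.
ASIC_KEYS = {
    "MI300X": ("gfx942", 304),
    "MI308X": ("gfx942", 80),   # assumed, not from this PR
    "MI325X": ("gfx942", 304),  # assumed, not from this PR
}


def group_by_key(keys):
    """Group model names that share the same (target, CU count) key."""
    groups = defaultdict(list)
    for model, key in keys.items():
        groups[key].append(model)
    return dict(groups)


groups = group_by_key(ASIC_KEYS)
for key, models in groups.items():
    status = "ambiguous" if len(models) > 1 else "unique"
    print(key, models, status)
```

Under these assumptions MI308X gets its own key, while MI300X and MI325X collide, which is the limitation the first bullet describes.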

Problem: Different machines report different device names for the same GPU. This is particularly confusing for the MoE config files.

Using the amdsmi API:
product_name for MI308 can come back as "MI300X" (see issue-1 below).
market_name looks more reliable, though not perfect (see issue-2 below).

Proposing: use 'market_name' and hard-code the names for MI308 and MI300.

from amdsmi import *

amdsmi_init()
h = amdsmi_get_processor_handles()[0]  # handle of the first GPU
print(amdsmi_get_gpu_asic_info(h))     # includes 'market_name'
print(amdsmi_get_gpu_board_info(h))    # includes 'product_name'
amdsmi_shut_down()

MI308:
    hjbog2:
        - 'market_name': 'MI308X'
        - 'product_name': 'AMD Instinct MI308X OAM'

    smc300x:
        - 'market_name': 'MI308X'
        - 'product_name': 'AMD Instinct MI300X OAM'  <--- issue-1

    banff-s74:
        - 'market_name': 'AMD Instinct MI308X OAM'  <--- issue-2 | instead of just 'MI308X'
        - 'product_name': 'AMD Instinct MI308X OAM'

MI300:
    s65:
        - 'market_name': 'AMD Instinct MI300X'
        - 'product_name': 'AMD Instinct MI300X OAM'

    s73:
        - 'market_name': 'AMD Instinct MI300X'
        - 'product_name': 'AMD Instinct MI300X OAM'

  • Use 'market_name'.
  • Hard-code the names for MI308 and MI300.
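The proposal can be sketched roughly as follows. This is not the code merged in this PR: `normalize_device_name` is a hypothetical helper, and the canonical names "MI308X" and "MI300X" are assumptions based on the renames listed in the update. It takes the 'market_name' string reported by amdsmi and collapses the MI308 and MI300 variants observed above to one name each.

```python
# Hypothetical mapping: substring of market_name -> hard-coded device name.
# The canonical names are assumptions, not confirmed by the merged code.
HARD_CODED_NAMES = (
    ("MI308", "MI308X"),
    ("MI300", "MI300X"),
)


def normalize_device_name(market_name: str) -> str:
    """Return a stable device name for an amdsmi 'market_name' string."""
    for key, canonical in HARD_CODED_NAMES:
        if key in market_name:
            return canonical
    # Any other GPU keeps whatever market_name amdsmi reported.
    return market_name


# market_name values reported by the machines listed in this thread.
observed = [
    "MI308X",                   # hjbog2, smc300x
    "AMD Instinct MI308X OAM",  # banff-s74
    "AMD Instinct MI300X",      # s65, s73
]
for name in observed:
    print(f"{name!r} -> {normalize_device_name(name)!r}")
```

Substring matching is one possible design choice: it absorbs both the short and long MI308 spellings, but it would also fold a variant string that contains "MI300" into "MI300X", which may or may not be desired.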
@divakar-amd divakar-amd marked this pull request as ready for review January 20, 2025 18:18
@divakar-amd

s79:

{'market_name': 'AMD Instinct MI300X HF', 'vendor_id': '0x1002', 'vendor_name': 'Advanced Micro Devices Inc. [AMD/ATI]', 'subvendor_id': '0x1002', 'device_id': '0x74a9', 'rev_id': '0x00', 'asic_serial': '0x12A9A09B704AD06E', 'oam_id': 2, 'num_compute_units': 304, 'target_graphics_version': 'gfx942'}
 
{'model_number': '102-G30233-00', 'product_serial': '000000000000', 'fru_id': '113-AMDG302330004-100-300000183', 'product_name': 'AMD Instinct MI300XHF OAM', 'manufacturer_name': 'AMD'}

@divakar-amd divakar-amd merged commit 3f610f0 into main Feb 7, 2025
2 of 4 checks passed