Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API flag for whether an AQL queue should be allocated in device memory. #269

Open
benvanik opened this issue Dec 16, 2024 · 6 comments
Assignees

Comments

@benvanik
Copy link
Contributor

In 3baaa6e @saleelk added the ability to allocate an AQL queue in device memory via the HSA_ALLOCATE_QUEUE_DEV_MEM environment variable. I'd like to be able to use this programmatically with a queue creation flag as some of my queues are better in system memory and others in device memory (particularly those where I'm producing for the queue on the same device consuming it). In addition, as a middleware layer I don't control the environment and options that are set via them are generally not accessible.

hsa_queue_create doesn't seem to have a flags arg and I'm not sure of the best route for adding one such that creation could be controlled. Maybe an hsa_amd_queue_create?

@saleelk
Copy link
Contributor

saleelk commented Dec 16, 2024

yeah, I got dragged elsewhere and sort of paused on this end. Yes, we would need a new API that takes flags. I had a conversation with ROCr owner about this, but need to start that over again

@dayatsin-amd dayatsin-amd self-assigned this Dec 16, 2024
@benvanik
Copy link
Contributor Author

benvanik commented Dec 16, 2024

That'd be wonderful! I haven't yet gotten my code working with it as I'm getting GPU faults when HSA_ALLOCATE_QUEUE_DEV_MEM=1, but that's likely my issue (or an issue with some of my queues needing to be in host memory and some in device memory). Thanks for pushing on this: I'm round-tripping through system memory for queue operations and it's taking an eternity :)

@saleelk
Copy link
Contributor

saleelk commented Dec 17, 2024

i've looped you in on an internal chat

@atgutier
Copy link
Contributor

atgutier commented Jan 6, 2025

@saleelk or @dayatsin-amd are either of you taking a look at this? If not, I can create a PR for this.

@benvanik
Copy link
Contributor Author

benvanik commented Jan 6, 2025

Related to this, currently the amd_queue_t ends up in host memory and only the ringbuffer is going into device memory with the flag enabled. The queue struct is hot (write_dispatch_id/read_dispatch_id, etc) and we want that colocated with the ringbuffer in device memory as well.

@atgutier
Copy link
Contributor

@saleelk I'm taking a look at this and it seems fairly straightforward to add an API-based queue allocation mechanism for getting the queue buf in dev mem, additionally, using the same finegrain allocation method seems to work for the queue struct itself.

If I create a PR to support this that removes the ENV variable would that be ok? Or are you already relying on the ENV variable? If so I can just mark it deprecated so future users rely on the API and not the ENV var.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants