Issue with CUDA 12.x Inference, Performance completely obliterated #71

Open
gilljon opened this issue Dec 13, 2024 · 4 comments
Comments

@gilljon

gilljon commented Dec 13, 2024

Steps to reproduce:

pip install span-marker
>>> from span_marker import SpanMarkerModel
>>> m_cuda = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super").cuda()
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75k/6.75k [00:00<00:00, 50.7MB/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.42G/1.42G [00:21<00:00, 65.9MB/s]
>>> m_cuda.device
device(type='cuda', index=0)
>>> m_cuda.predict("John Smith works at Amazon.")
[]
>>> m_cpu = m_cuda.to("cpu")
>>> m_cpu.predict("John Smith works at Amazon.")
SpanMarker model predictions are being computed on the CPU while CUDA is available. Moving the model to CUDA using `model.cuda()` before performing predictions is heavily recommended to significantly boost prediction speeds.
[{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197737574577332, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]

CPU inference yields the expected results, but CUDA inference returns an empty list for short texts. Longer texts are not affected. Why is this?
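One way to pin down a discrepancy like this is to diff the two prediction lists rather than compare them by eye. The sketch below is not part of SpanMarker: `span_key` and `diff_predictions` are hypothetical helper names, and the only assumption is the prediction dict layout shown in the output above. The sample data is copied from that output.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, str, int, int]


def span_key(pred: dict) -> Span:
    """Identify a prediction by its text, label and character offsets."""
    return (
        pred["span"],
        pred["label"],
        pred["char_start_index"],
        pred["char_end_index"],
    )


def diff_predictions(
    cpu_preds: List[dict], cuda_preds: List[dict], score_tol: float = 1e-3
) -> Dict[str, List[Span]]:
    """Report spans found on only one device, or whose scores differ by more than score_tol."""
    cpu = {span_key(p): p["score"] for p in cpu_preds}
    cuda = {span_key(p): p["score"] for p in cuda_preds}
    return {
        "missing_on_cuda": sorted(k for k in cpu if k not in cuda),
        "missing_on_cpu": sorted(k for k in cuda if k not in cpu),
        "score_mismatch": sorted(
            k for k in cpu if k in cuda and abs(cpu[k] - cuda[k]) > score_tol
        ),
    }


# Predictions copied from the reproduction above.
cpu_preds = [
    {"span": "John Smith", "label": "person-other", "score": 0.9197737574577332,
     "char_start_index": 0, "char_end_index": 10},
    {"span": "Amazon", "label": "organization-company", "score": 0.9607704877853394,
     "char_start_index": 20, "char_end_index": 26},
]
cuda_preds = []  # what the CUDA model returned in the report

result = diff_predictions(cpu_preds, cuda_preds)
for category, spans in result.items():
    print(category, spans)
```

Running the same helper over a batch of short and long inputs would show whether the empty results really track text length.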

Output from nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

This is running on an A10, but we observe the same results on a T4. Is SpanMarker not compatible with CUDA 12.x?
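Note that `nvcc --version` reports the system CUDA toolkit, which may differ from the CUDA runtime that the installed torch build was compiled against. A minimal sketch for collecting what torch itself reports is below; `torch_cuda_report` is a hypothetical helper name, and the function falls back to empty values if torch is not importable.

```python
import platform


def torch_cuda_report() -> dict:
    """Collect the versions and devices that torch itself sees."""
    report = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "devices": [],
    }
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; return the empty report.
        return report

    report["torch"] = torch.__version__
    # CUDA version the torch binary was built against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


report = torch_cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Posting this output alongside the `nvcc` output makes it easier to tell a toolkit mismatch from a torch build problem.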

@gilljon changed the title from "Issue with CUDA Inference, Performance completely obliterated" to "Issue with CUDA 12.x Inference, Performance completely obliterated" on Dec 13, 2024
@tomaarsen
Owner

Hello!

I've never experienced this discrepancy between CPU and CUDA before, wow. I have no idea what could cause this - I don't touch anything as low level as CUDA, it's all just abstracted by torch.
Here's some sample code of mine:

from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super").cuda()
print(model.device)
# cuda:0
preds = model.predict("John Smith works at Amazon.")
print(preds)
# [{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197737574577332, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]
model = model.to("cpu")
print(model.device)
# cpu
preds = model.predict("John Smith works at Amazon.")
print(preds)
# [{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197738766670227, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]

with my nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:56:38_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Perhaps this is an issue with your CUDA or torch installation?

  • Tom Aarsen
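To test the torch installation independently of SpanMarker, one option is to run the same small computation on both devices and compare the outputs. This is only a sketch under assumptions: `cpu_cuda_max_abs_diff` is a hypothetical helper, the layer size is arbitrary, and a small difference here would not rule out a problem in other kernels. It returns `None` when torch or a CUDA device is unavailable.

```python
from typing import Optional


def cpu_cuda_max_abs_diff(seed: int = 0) -> Optional[float]:
    """Run one linear layer on CPU and CUDA and return the largest absolute output difference."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    torch.manual_seed(seed)
    layer = torch.nn.Linear(64, 64)
    x = torch.randn(8, 64)
    with torch.no_grad():
        cpu_out = layer(x)
        # Module.to() moves the parameters in place, so compute the CPU output first.
        cuda_out = layer.to("cuda")(x.to("cuda")).cpu()
    return (cpu_out - cuda_out).abs().max().item()


diff = cpu_cuda_max_abs_diff()
if diff is None:
    print("torch or CUDA not available; nothing to compare")
else:
    print(f"max abs difference between CPU and CUDA outputs: {diff:.3e}")
```

A difference far above float32 rounding noise would point at the torch/CUDA installation rather than at the model code.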

@gilljon
Author

gilljon commented Dec 14, 2024

Thanks for the reply. And yes, this is quite perplexing for my team...

What version of transformers and torch are you running? We are running:

torch==2.4.0
transformers==4.45.2
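When comparing environments like this, the installed versions can be read programmatically rather than from a requirements file, which may not match what is actually installed. A minimal sketch using the standard library follows; `installed_versions` is a hypothetical helper name, and the package list is just the three packages discussed in this thread.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "transformers", "span-marker"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```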

@tomaarsen
Owner

I'm on:

torch==2.5.1+cu124
transformers==4.46.3

although I've used other combinations in the past before.

Wow, I just bumped to torch==2.5.1 and I am now getting expected results...

Oh, excellent! Still very strange that 2.4.0 didn't work, that's still a very recent version.

  • Tom Aarsen

@gilljon
Author

gilljon commented Dec 14, 2024

Wow, I just bumped to torch==2.5.1 and am now getting the expected results. It was probably something with the nvidia-cuda Python libraries I had installed, but I'm not sure. It might be worth checking whether you can reproduce this with torch==2.4.0.
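To check the suspicion about the NVIDIA Python libraries, the installed `nvidia-*` distributions can be listed before and after the torch upgrade and the two listings compared. The sketch below uses only the standard library; `installed_with_prefix` is a hypothetical helper name, and the assumption is that the CUDA runtime libraries were installed as pip packages whose names start with `nvidia-`.

```python
from importlib import metadata
from typing import Dict


def installed_with_prefix(prefix: str = "nvidia-") -> Dict[str, str]:
    """Return installed distributions whose normalized name starts with the prefix."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if not name:
            # Skip distributions with broken or missing metadata.
            continue
        if name.lower().replace("_", "-").startswith(prefix):
            found[name] = dist.version
    return dict(sorted(found.items()))


nvidia_packages = installed_with_prefix()
if not nvidia_packages:
    print("no nvidia-* distributions found")
for name, version in nvidia_packages.items():
    print(f"{name}=={version}")
```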
