Issue with CUDA 12.x Inference, Performance completely obliterated #71

gilljon · 2024-12-13T20:40:56Z

Steps to reproduce:

pip install span-marker

>>> from span_marker import SpanMarkerModel
>>> m_cuda = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super").cuda()
config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.75k/6.75k [00:00<00:00, 50.7MB/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.42G/1.42G [00:21<00:00, 65.9MB/s]
>>> m_cuda.device
device(type='cuda', index=0)
>>> m_cuda.predict("John Smith works at Amazon.")
[]
>>> m_cpu = m_cuda.to("cpu")
>>> m_cpu.predict("John Smith works at Amazon.")
SpanMarker model predictions are being computed on the CPU while CUDA is available. Moving the model to CUDA using `model.cuda()` before performing predictions is heavily recommended to significantly boost prediction speeds.
[{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197737574577332, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]

CPU inference yields the expected results, but CUDA returns an empty list for short texts. Longer texts are not affected. Why is this?

Output from nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

This is running on an A10, but we observe the same results on a T4. Is SpanMarker not compatible with CUDA 12.x?

tomaarsen · 2024-12-14T08:22:20Z

Hello!

I've never experienced this discrepancy between CPU and CUDA before, wow. I have no idea what could cause this - I don't touch anything as low level as CUDA, it's all just abstracted by torch.
Here's some sample code of mine:

from span_marker import SpanMarkerModel

model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-roberta-large-fewnerd-fine-super").cuda()
print(model.device)
# cuda:0
preds = model.predict("John Smith works at Amazon.")
print(preds)
# [{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197737574577332, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]
model = model.to("cpu")
print(model.device)
# cpu
preds = model.predict("John Smith works at Amazon.")
print(preds)
# [{'span': 'John Smith', 'label': 'person-other', 'score': 0.9197738766670227, 'char_start_index': 0, 'char_end_index': 10}, {'span': 'Amazon', 'label': 'organization-company', 'score': 0.9607704877853394, 'char_start_index': 20, 'char_end_index': 26}]

with my nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:56:38_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Perhaps this is an issue with your CUDA or torch installation?

Tom Aarsen

gilljon · 2024-12-14T08:33:07Z

Thanks for the reply. And yes, this is quite perplexing for my team...

What version of transformers and torch are you running? We are running:

torch==2.4.0
transformers==4.45.2
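
When comparing environments like this, it may help to collect the versions programmatically. Note that `nvcc --version` reports the system-wide CUDA toolkit, whereas pip wheels of torch ship their own CUDA runtime, which is what `torch.version.cuda` reports. The following is a minimal sketch, not part of SpanMarker; the helper names `package_version` and `environment_report` are made up for illustration.

```python
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Collect the package and CUDA details relevant to this issue as strings."""
    report = {
        name: package_version(name)
        for name in ("torch", "transformers", "span-marker")
    }
    try:
        import torch
    except ImportError:
        # Without torch there is nothing CUDA-related to report.
        return report
    # CUDA version torch was built against (None for CPU-only builds).
    report["torch CUDA build"] = str(torch.version.cuda)
    report["CUDA available"] = str(torch.cuda.is_available())
    if torch.cuda.is_available():
        report["GPU"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting this output from both machines makes it easier to spot a mismatch between the torch build and the installed CUDA libraries.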

tomaarsen · 2024-12-14T08:39:14Z

I'm on:

torch==2.5.1+cu124
transformers==4.46.3

although I've used other combinations in the past.

> Wow, I just bumped to torch==2.5.1 and I am now getting expected results...

Oh, excellent! Still very strange that 2.4.0 didn't work, that's still a very recent version.

Tom Aarsen

gilljon · 2024-12-14T08:40:00Z

Wow, I just bumped to torch==2.5.1 and am now getting the expected results. It was probably something with the nvidia-cuda Python libraries I had installed, but I'm not sure. It might be worth seeing if you can reproduce this with 2.4.0.
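
For anyone trying to reproduce this on a given torch version, a check along these lines could flag the discrepancy automatically. It is only a sketch: `predictions_match` and `compare_devices` are hypothetical helper names, the score tolerance is an arbitrary choice, and the comparison assumes both devices return entities in the same order. The only SpanMarker calls used are the ones already shown in this thread (`from_pretrained`, `.cuda()`, `.to("cpu")`, `.predict`).

```python
def predictions_match(cpu_preds, cuda_preds, score_tol=1e-4):
    """Return True if both prediction lists contain the same entities.

    Spans, labels and character offsets must be identical; scores may
    differ by up to score_tol to allow for floating-point noise.
    """
    if len(cpu_preds) != len(cuda_preds):
        return False
    keys = ("span", "label", "char_start_index", "char_end_index")
    for cpu, cuda in zip(cpu_preds, cuda_preds):
        if any(cpu[key] != cuda[key] for key in keys):
            return False
        if abs(cpu["score"] - cuda["score"]) > score_tol:
            return False
    return True


def compare_devices(text, model_name):
    """Predict on CUDA and then on CPU, and report whether the results agree."""
    # Imported lazily so predictions_match can be used without span_marker.
    from span_marker import SpanMarkerModel

    model = SpanMarkerModel.from_pretrained(model_name).cuda()
    cuda_preds = model.predict(text)
    cpu_preds = model.to("cpu").predict(text)
    return predictions_match(cpu_preds, cuda_preds)
```

With the outputs from the first post, an empty CUDA result against the two-entity CPU result would be reported as a mismatch, while the tiny score difference in the maintainer's run stays within the tolerance.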

gilljon changed the title ~~Issue with CUDA Inference, Performance completely obliterated~~ Issue with CUDA 12.x Inference, Performance completely obliterated Dec 13, 2024
