
Implement -nvidia=all #1205

Closed
iameli opened this issue Nov 19, 2019 · 14 comments · Fixed by #1840
iameli commented Nov 19, 2019

"Please run this transcoder on all nvidia GPUs available" I think is going to be a pretty common case — it'd be nice if Kubernetes manifests and whatnot could contain a line like `-nvidia=all` so that I can have the same command work across a variety of hardware.

iameli commented Nov 21, 2019

(I kind of think that doing just -nvidia=0,1,2,3,4,5,6,7,8,9,10,11,12 up to some arbitrarily high limit works just fine though)

EDIT: It does not. It tries to transcode on the card, then comes back with an "invalid ordinal" error.

@AbAb1l AbAb1l self-assigned this Feb 3, 2020
iameli commented Feb 5, 2020

Thinking about this more, I think I'm more in favor of -nvidia=all rather than -nvidia=* to avoid folks accidentally globbing with the *.

@iameli iameli changed the title Implement -nvidia='*' Implement -nvidia=all Feb 5, 2020
iameli commented Feb 18, 2020

@j0sh @AbAb1l Any update on this? It's not impossible to work around, but currently we're maintaining three different sorts of deployments with different command line parameters to account for different sorts of boxes; this would make things cleaner.

@iameli iameli assigned ya7ya and unassigned AbAb1l Feb 9, 2021
iameli commented Feb 11, 2021

@ya7ya came up with this, which works quite well; perhaps we could do something like this:

nvidia-smi --query-gpu=index --format=csv,noheader | sed -z 's/\\n/,/g;s/,$/\\n/'

yondonfu commented Apr 1, 2021

nvidia-smi --query-gpu=index --format=csv,noheader | sed -z 's/\\n/,/g;s/,$/\\n/' seems to return a new line delimited string:

0
1

nvidia-smi --query-gpu=index --format=csv,noheader | tr "\n" "," | sed 's/,$//' seems to work though:

0,1

That being said, device enumeration seems to be implemented directly using the CUDA API in other programs like ethminer and ffmpeg. I think nvidia-smi should be available on any machine that has a driver installed, but it could be nice to have the device enumeration be baked directly into LPMS to avoid an explicit dependency on an external binary.

I've also noticed that the default behavior for other programs that use Nvidia GPUs like ethminer and t-rex is to enumerate all devices by default if no device IDs are specified. Instead of requiring -nvidia=all to enumerate all device IDs, the behavior could be:

  • If -nvidia is not specified, Nvidia transcoding is disabled
  • If -nvidia is specified without any arguments, Nvidia transcoding is enabled and all devices are enumerated
  • If -nvidia is specified with a comma delimited string, Nvidia transcoding is enabled and the devices specified are used

jailuthra commented:

Directly using the CUDA API like ethminer and ffmpeg will be tricky for us. We do not have a direct dependency on Nvidia's libs and headers. FFmpeg loads them internally, but does not provide a way to access them externally. We could try loading them directly in LPMS, similar to what FFmpeg does via ffnvcodec/dynlink_loader.h - but it will be a pain to set up and test.

it could be nice to have the device enumeration be baked directly into LPMS to avoid an explicit dependency on an external binary.

Good news: the Nvidia Management Library (NVML), the underlying lib for nvidia-smi, is relatively straightforward to set up. It might also come in handy later to query utilization or other metrics.

A Cgo wrapper for NVML can be used directly to iterate over available devices. The wrapper links against libnvidia-ml.so.1 - which is shipped with the Nvidia drivers on Ubuntu and Arch Linux. This wrapper worked out-of-the-box during my tests on Linux; not sure about Windows yet.

If -nvidia is specified without any arguments, Nvidia transcoding is enabled and all devices are enumerated

👍

yondonfu commented Apr 12, 2021

There also seems to be official NVML Go bindings from Nvidia, but that is Linux only and Windows support has not been added yet (sounds like it was supported in an older version of the bindings based on the comments).

darkdarkdragon commented:

I think that Windows machines with more than one GPU will be rare and certainly Windows will not be used on mining farms - so I think we can support -nvidia=all only on Linux.

jailuthra commented:

Summarizing the above discussion around NVML. There are 3 Go wrappers:
(1) https://github.com/mindprince/gonvml - Community run; directly loads libnvidia-ml.so, so I presume no Windows support
(2) https://github.com/NVIDIA/go-nvml - Official NVIDIA bindings, dedicated to NVML, but lacks Windows support right now (it seems to be possible to add it later, from the open issue's discussion)
(3) https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/bindings/go/nvml - Official NVIDIA monitoring toolkit for docker/k8s environments - it supports NVML on Windows too, but it isn't usable as a standalone binding

For now I'm implementing this with (2), sticking to Linux support only. If we see demand for this on Windows (or if I have spare time) we can probably add it ourselves and send a PR on the upstream issue.

iameli commented Apr 14, 2021

@jailuthra What about this? Looks like it can list GPUs in a cross-platform kind of way. https://github.com/jaypipes/ghw#gpu

Edit: Tested it on Windows (mingw64) and this example script seemed to work:

package main

import (
	"fmt"

	"github.com/jaypipes/ghw"
)

func main() {
	gpu, err := ghw.GPU()
	if err != nil {
		fmt.Printf("Error getting GPU info: %v\n", err)
		return
	}

	fmt.Printf("%v\n", gpu)

	for _, card := range gpu.GraphicsCards {
		fmt.Printf(" %v\n", card)
	}
}

Produced this output:

$ ./gpu-detection.exe
gpu (1 graphics card)
 card #0 @PCI\\VEN_10DE&DEV_1B06&SUBSYS_374C1458&REV_A1\\4&1FC990D7&0&0019 -> class: 'unknown' vendor: 'NVIDIA' product: 'NVIDIA GeForce GTX 1080 Ti'

Testing on the same machine booted into Linux now...

jailuthra commented Apr 14, 2021

@jailuthra What about this? Looks like it can list GPUs in a cross-platform kind of way. https://github.com/jaypipes/ghw#gpu

Neat find! I had already implemented a fix using go-nvml and it works on my Linux machine - but importing the library is making our Windows CI build fail 😞

If an easy fix for that isn't possible I'll switch to ghw just to keep the build process sane - although it won't really work on Windows for multiple chipsets, as it hardcodes the GPU device ID as 0

edit: ahh but we could use the number of chipsets returned on Windows ^ and create our own array of IDs like we're already doing with go-nvml.

iameli commented Apr 14, 2021

Right on, that'd probably work. My Ubuntu installation on that machine seems to be broken, but here's the same script on an 8-GPU rig in BER:

./gpu-detection
gpu (8 graphics cards)
 card #0 @0000:01:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660 SUPER]'
 card #1 @0000:02:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660]'
 card #2 @0000:03:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660]'
 card #3 @0000:04:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660 SUPER]'
 card #4 @0000:05:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660 SUPER]'
 card #5 @0000:06:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660]'
 card #6 @0000:07:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660]'
 card #7 @0000:08:00.0 -> class: 'Display controller' vendor: 'NVIDIA Corporation' product: 'TU116 [GeForce GTX 1660]'

Windows says vendor: 'NVIDIA' and Linux says vendor: 'NVIDIA Corporation' - I suppose we just allow any that contain the word NVIDIA?

jailuthra commented Apr 14, 2021

@iameli Perfect, I've switched to ghw and it's working great on my linux machine too!

Windows says vendor: 'NVIDIA' and Linux says vendor: 'NVIDIA Corporation' - I suppose we just allow any that contain the word NVIDIA?

Yeah somehow I did exactly that without having that info :P

if strings.EqualFold(card.DeviceInfo.Vendor.Name[:6], "nvidia") {

jailuthra commented:

cc @yondonfu

If -nvidia is specified without any arguments, Nvidia transcoding is enabled and all devices are enumerated

Golang's flag library does not support empty values for string flags (it only works for boolean flags like -testTranscoder)

Something like -nvidia= would have worked, but IMO that would have been confusing, and it differs from ethminer/t-rex anyway. For now I've stuck with -nvidia=all unless a cleaner solution is feasible.
