What about the GPU? #273
Comments
We already have tensions due to bikeshed differences between arm and x86. I think that adding another set of hardware targets would create more tension: more operations would either have to be slow due to emulation costs to get uniform semantics on all targets, or more operations would have to have undefined behavior to allow everyone to run fast. I think that makes it unprofitable to consider the GPU at this time (or ever). -Fil
Hi Fil -- thanks, fair answer -- except the "(or ever)" bit :-P. There's nothing particularly magical about Khronos that gives them a perpetual lock on the GPU. Of course GPU vendors feel tension between standardization/interoperation, and new special and divergent features, but the extension model handles most of this. Radically different GPU architectures will require new thinking, for sure -- out of scope for WebAssembly and WebGL (and even Vulkan). But CPU (and memory model, with such things as store order), including floating point, and SIMD, with all the speciation we still see, have converged enough to be included in WebAssembly. Up to at least very useful OpenGL/ES3.1 levels of interop, I claim that GPUs have converged as much. More comments welcome. Not looking for more work, believe me. I'm looking for the equivalent of Occam's razor, or even an empirical law, that lets us defer GPU for now, or even forever. You may have a good case for "not now" -- community group is a consensus thing at best, so the group should decide. I don't think you've made a case for "not ever". /be |
It's a (mostly silent) goal of mine for WebAssembly to be able to eventually target GPUs, but I don't think it's an MVP feature at all. I think it's doable but difficult, as discussed in #41: the C++ standard committee has been looking at standardizing fixed-width and variable-width SIMD for a while, and I think WebAssembly will probably want to adopt a similar approach. We can engage with the same vendors (NVIDIA and Intel), but I suspect that we'll get to the same discussions as those the C++ standards committee is having (and which I attend). I'm hoping that we can define a "fast" subset of WebAssembly that'll work well for such targets, and that other operations Just Work™ but may be slow (heavy divergence, exceptions, ...). |
SIMD isn't particularly important for targeting GPUs, at least at a basic level. IIRC AMD and NVIDIA's GPU architectures have been scalar (not SIMD) for a long while now, and PowerVR is VLIW. Not sure about the other Android architectures or Intel, but I wouldn't be shocked if 4-element/etc SIMD is falling out of fashion entirely on GPUs. Scalar GPU compute provides better scalability and parallelism, at least for workloads that can parallelize. See ftp://download.nvidia.com/developer/cuda/seminar/TDCI_Arch.pdf for some (outdated, to be fair) context. |
@jfbastien: right, way post-MVP, which is why I mentioned FutureFeatures.md only. @kg: GPU should give 40x or better parallelism, agree SIMD is not the right model. /be |
If the problem here is just the pace of WebGL standardization, then it's out of scope here because it's an independent API concern. Other than that, what we'd need more than anything else to make progress in this space is people stepping up. The first step is for someone to step up with a pull request for what we might add to FutureFeatures.md to attract the kinds of ideas that would be useful to consider :-). |
My main question (for including a mention of GPUs in future features) is what problems we'd be trying to address. Since the overall memory/processing models of CPUs/GPUs are still intentionally quite distinct (even if the low-level instruction sets are converging), it seems like we wouldn't get any sorts of magic "run it on either a CPU or GPU" portability. Rather, it seems like the wins would be more around a unification of tooling/code at the different levels of the pipeline. Anything else? |
Another question is whether GPU ISA convergence (including mobile GPUs) is really approaching CPU levels or not. For example, floating point division by zero produces an undefined result in SPIR-V. We won't be willing to take such things lightly in WebAssembly. |
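To make the contrast concrete: WebAssembly's `f32.div` follows IEEE 754, so division by zero has a fully determined result (a signed infinity or NaN) rather than an undefined one. The sketch below models that behavior in Python; it is an illustration only, not code from any WebAssembly implementation, and the helper names `to_f32` and `f32_div` are hypothetical.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Finite value too large for binary32: rounds to a signed infinity.
        return math.copysign(math.inf, x)


def f32_div(a: float, b: float) -> float:
    """Hypothetical model of IEEE 754 single-precision division."""
    a, b = to_f32(a), to_f32(b)
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0.0:
        if a == 0.0:
            return math.nan  # 0/0 is NaN, not a trap and not undefined
        # x/0 is an infinity whose sign is the product of the operand signs.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return to_f32(a / b)


print(f32_div(1.0, 0.0), f32_div(-1.0, 0.0), f32_div(0.0, 0.0))
```

Python's own `/` raises `ZeroDivisionError` for floats, which is why the zero-divisor cases are spelled out by hand here. A target where the same operation yields an unspecified value would need extra checks like these to match the defined semantics.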
The computational model is probably always going to have differences, since the architectures have to be different to solve their specific concerns. Intel had to face this reality with Larrabee. It's quite reasonable to consider the target of 'basic compute/logic algorithms written in WebAssembly can be cross-compiled to OpenCL or GL shaders' and make that work. For some use cases that will be much better than nothing, because it provides a sensible fallback for both JS-only and wasm-capable browser runtimes. Incremental performance improvements like that provide good results in many cases. I'd argue that nobody will ever be writing all their shaders/GPU compute exclusively in WebAssembly (nor would they want to if they care about performance), but we can provide a nice medium. |
I agree with @lukewagner that the wins would mostly be around unifying tooling, making WebAssembly a useful static IR for interchange and storage, but not necessarily settling on a computational model for GPU programs. |
I'm going to close this issue since we appear to have answered the questions and don't have anything actionable remaining here. If someone wants us to do something more on this topic, they're still welcome to re-open or file new issues or pull requests. |
I just want to point out that CPU+GPU unification is already happening (AMD APU, Intel Iris HD, mobile SoCs) and is more and more tightly integrated at the system/hardware level (cache coherency, MMU and address space sharing, heterogeneous IR (the RISC-V project is potentially also a heterogeneous ISA), etc.), so the question is not IF but HOW to embrace that, because it has already been widely adopted. |
The most helpful way to get things started here would be to file issues pointing out specific ideas, features, or concepts in HSA, SPIR-V, RISC-V, or others, that WebAssembly should consider. |
As pointed out by @keryell, some common issues that affect both the LLVM RFC for SPIR-V and WebAssembly have already been discussed on the LLVM mailing list. |
@keryell Do you know if there is someone at AMD already active in WebAssembly design? |
@bhack I do not know. Actually I no longer work at AMD, so I cannot even talk about it anyway. But I am still working on some similar subjects at Xilinx. So s/GPU/FPGA/g for now on my side. :-) |
@keryell Nice. Are you still working on SPIR-V in LLVM? Do you see any overlap with the WebAssembly effort? |
It would be useful to put Vulkan into WebAssembly, or at least make it use Vulkan, but without the need for all the usual initialization stuff. While I think Vulkan is great, I also think that in some places they just added too much complication (though I totally agree with preparing a "Pipeline" object). I don't see the utility in the fuss of managing video cards, extensions, and a few other things directly. Note that GPU vendors may have no incentive to design a slightly simpler version of Vulkan for WebAssembly (for example, how the heck would all the intermediate gaming software layers from NVidia and Intel integrate with WebAssembly? I can see only AMD benefiting from bringing low-level graphics into WebAssembly). My opinion is that first there should be some kind of API that simplifies interfacing with Vulkan; then such an interface can become a de facto standard for WebAssembly. While this would not allow using the maximum possible GPU power, it would already be way better than the current status (too high CPU overhead). That would still leave a market for desktop and console games while allowing stunning graphics even in the browser (one should note that increasing GPU usage requires very heavy assets, which unfortunately need to be downloaded). |
People still interested in this topic could follow gpuweb/admin#1 (comment) |
Please, make GPU for WebAssembly, you promised around 2 years.... Khronos... |
I would like some standardized extensions; WASI might be a good place to gather information on similar community initiatives. https://github.com/bytecodealliance/wasmtime/blob/master/docs/WASI-api.md Alternatively, a SharedMemoryBuffer could be used with a JavaScript rendering library in the interim as a PoC. |
https://github.com/WebAssembly/design/FutureFeatures.md has long-SIMD; you can see other uses of SIMD by searching the design repository. Searching for GPU finds nothing; if you search closed issues, the GPU is invoked only to broaden thinking about subnormals and similar edge cases.
My question for everyone: should we consider lifting WebGL primitives -- or really OpenGL/ES3.1 and beyond -- into WebAssembly? Treating WebGL as a black box API has the following drawbacks:
See some work by Khronos Group that started from LLVM IR and diverged into SPIR-V:
https://www.khronos.org/spir
Of course the WebAssembly community group can say "not in scope, use WebGL or SPIR-V or whatever the embedding provides." That's a possible answer, but I suspect we should not "default" into it for want of asking the question.
I know folks at OTOY would be interested in this approach. It does not need to slow anything down in WebAssembly as scoped so far. I'm really asking whether the GPU is in-bounds as a hardware unit to program via WebAssembly in the same way, as directly and with full optimization wins, as the SIMD units and the CPU are. Thanks,
/be