
Mac/Windows Support #3

Closed
pierotofy opened this issue Feb 16, 2024 · 33 comments · Fixed by #50
Labels
enhancement New feature or request

Comments

@pierotofy
Owner

pierotofy commented Feb 16, 2024

Haven't tested there, but should work with minimal/no changes.

@pierotofy pierotofy added the enhancement New feature or request label Feb 16, 2024
@pierotofy pierotofy changed the title from Mac/Windows Builds to Mac/Windows Support Feb 16, 2024
@DiHubKi
Contributor

DiHubKi commented Feb 18, 2024

It's working on Windows


Changes

  • model.cpp line 189 (see the note after this list)

std::vector<long int> repeats;
to
std::vector<int64_t> repeats;

  • opensplat.cpp line 125

model.savePlySplat(p.replace_filename(fs::path(p.stem().string() + "_" + std::to_string(step) + p.extension().string()).string()));
to
model.savePlySplat((p.replace_filename(fs::path(p.stem().string() + "_" + std::to_string(step) + p.extension().string())).string()));

  • CMakeLists.txt line 48
if (MSVC)
   file(GLOB TORCH_DLLS "${TORCH_INSTALL_PREFIX}/lib/*.dll")
   file(GLOB OPENCV_DLL "${OPENCV_DIR}/x64/vc16/bin/opencv_world490.dll")
   set(DLLS_TO_COPY ${TORCH_DLLS} ${OPENCV_DLL})
   add_custom_command(TARGET opensplat
       POST_BUILD
       COMMAND ${CMAKE_COMMAND} -E copy_if_different
       ${DLLS_TO_COPY}
       $<TARGET_FILE_DIR:opensplat>)
endif (MSVC)
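A note on why the first change is likely needed (an inference; the comment above doesn't spell it out): MSVC uses the LLP64 data model, where long is only 32 bits, while libtorch's tensor sizes and indices are int64_t. A minimal sketch:

#include <cstdint>
#include <vector>

int main() {
    // int64_t is 64-bit on every platform.
    static_assert(sizeof(int64_t) == 8, "int64_t is always 64-bit");
    // sizeof(long) is 4 with MSVC on Windows (LLP64) but 8 on typical
    // Linux/macOS builds (LP64), so std::vector<long int> does not line up
    // with libtorch APIs that expect 64-bit values. Hence the portable type:
    std::vector<int64_t> repeats;
    return 0;
}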

@pierotofy
Owner Author

That's awesome! Thanks for testing and confirming it works.

Would you be interested in opening a pull request with these changes? 🙏

@DiHubKi
Contributor

DiHubKi commented Feb 19, 2024

Okay

@DiHubKi DiHubKi mentioned this issue Feb 19, 2024
@pierotofy
Owner Author

Mac support will likely require porting gsplat to CPU (maybe via HIP).

@BarnabasTakacs

Hi, this is great stuff, thank you for doing it.

I am trying to recompile it on Windows too, but I get an error when it reaches the make -j$(nproc) command.
How did you compile it? Did you use the same steps as on the GitHub page, or did you need to modify them a bit?

@dm-de

dm-de commented Feb 23, 2024

run
cmake --build .

@dm-de

dm-de commented Feb 23, 2024

@Disa-Kizonda
Which versions do you use?
MSVC?
CUDA?
OpenCV?
libtorch?

My versions:
msvc 19.39.33520.0
cuda 11.8 (https://developer.download.nvidia.com/compute/cuda/11.8.0/network_installers/cuda_11.8.0_windows_network.exe)
opencv 4.9 (https://github.com/opencv/opencv/releases/download/4.9.0/opencv-4.9.0-windows.exe)

I had no success building with:
https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.2.1%2Bcu118.zip

I had success building with:
https://download.pytorch.org/libtorch/libtorch-win-shared-with-deps-1.13.1%2Bcu116.zip

I got many warnings - but the exe seems to start.
It's a pity that we had no build information for Windows so far.

@pierotofy
Owner Author

Instructions for Windows will be a bit different (e.g. using cmake --build ., as has been pointed out). We need to update the README. I'd love it if somebody could document the process once they have it running on Windows.

@dm-de

dm-de commented Feb 24, 2024

Now, I used libtorch 2.1.0 for CUDA 11.8:
https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.1.0%2Bcu118.zip

edit: not listed, but this seems to be a more up-to-date version:
https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.1.2%2Bcu118.zip

According to this, it's the best selection:
#17 (comment)

I had no more serious warnings.

Only while compiling the CUDA code:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cuda\std\detail\libcxx\include\support\atomic\atomic_msvc.h(15): warning C4005: "_Compiler_barrier": macro redefinition

C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xatomic.h(55): note: see previous definition of "_Compiler_barrier"

(the same warning and note are printed a second time)

I don't know if this is normal...

@BarnabasTakacs

BarnabasTakacs commented Feb 24, 2024

Confirming Windows compilation; this is what you should see when it all works:

-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.29.30147.0
-- The CXX compiler identification is MSVC 19.29.30147.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at CMakeLists.txt:5 (set):
implicitly converting 'OPENCV_DIR' to 'STRING' type.
This warning is for project developers. Use -Wno-dev to suppress it.

-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8 (found version "11.8")
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/include (found version "11.8.89")
-- Caffe2: CUDA detected: 11.8
-- Caffe2: CUDA nvcc is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin/nvcc.exe
-- Caffe2: CUDA toolkit directory: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8
-- Caffe2: Header version is: 11.8
-- C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/lib/x64/nvrtc.lib shorthash is dd482e34
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s): 7.5
-- Added CUDA NVCC flags for: -gencode;arch=compute_75,code=sm_75
-- Found Torch: E:/BarnaDesktop/OpenSplat/Installs/libtorch-win-shared-with-deps-2.1.2+cu118/lib/torch.lib
-- OpenCV ARCH: x64
-- OpenCV RUNTIME: vc16
-- OpenCV STATIC: OFF
-- Found OpenCV: C:/opencv/build (found version "4.9.0")
-- Found OpenCV 4.9.0 in C:/opencv/build/x64/vc16/lib
-- You might need to add C:\opencv\build\x64\vc16\bin to your PATH to be able to run your applications.
-- Configuring done
-- Generating done
-- Build files have been written to: E:/OpenSplat/build

It shows a few warnings (in yellow), but the exe is generated: opensplat.vcxproj -> E:\OpenSplat\build\Debug\opensplat.exe

@BarnabasTakacs

If the above installations are correct, you can open build/opensplat.sln directly in Visual Studio and compile (or debug) there.

@dm-de

dm-de commented Feb 24, 2024

Installed software:
Visual Studio 2022 C++
https://github.com/Kitware/CMake/releases/download/v3.28.3/cmake-3.28.3-windows-x86_64.msi
https://developer.download.nvidia.com/compute/cuda/11.8.0/network_installers/cuda_11.8.0_windows_network.exe
https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.1.2%2Bcu118.zip
https://github.com/opencv/opencv/releases/download/4.9.0/opencv-4.9.0-windows.exe

Build:
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat"
git clone https://github.com/pierotofy/OpenSplat OpenSplat
cd OpenSplat
mkdir build && cd build
cmake -DCMAKE_PREFIX_PATH=C:/path_to/libtorch_2.1.2_cu11.8 -DOPENCV_DIR=C:/path_to/OpenCV_4.9.0/build -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release

Optional: edit the CUDA target (only if required) before running cmake --build .:
C:/path_to/OpenSplat/build/gsplat.vcxproj
for example: arch=compute_75,code=sm_75

Run:
cd Release
opensplat /path/to/banana -n 2000

@TimBoostGraphics

TimBoostGraphics commented Feb 27, 2024

@dm-de did you manage to get it working? I'm trying, but after building and running "opensplat /path/to/banana -n 2000" it just loads the images and stops.

@pfxuan
Collaborator

pfxuan commented Feb 27, 2024

@pierotofy Any ideas on porting the gsplat CUDA code to an MPS, MLX, or CPU-based backend? I can easily compile gsplat on a Mac M2 chip in NO_CUDA mode, but the most important part (the csrc CUDA extension) is skipped: https://github.com/nerfstudio-project/gsplat/blob/main/setup.py#L132

@DiHubKi
Contributor

DiHubKi commented Feb 28, 2024

@TimBoostGraphics

Not enough VRAM; try -d 8.

@TimBoostGraphics

@Disa-Kizonda Your solution works, but I have an A6000 with 48 GB of VRAM, and the opensplat command is barely using any VRAM while running. It's crashing in opensplat.cpp on this line:
torch::Tensor gt = cam.getImage(model.getDownscaleFactor(step));

@dm-de

dm-de commented Feb 28, 2024

@dm-de did you manage to get it working? I'm trying, but after building and running "opensplat /path/to/banana -n 2000" it just loads the images and stops.

I have got it working...
I was able to run it with 16 banana images scaled down to 1000px with 8 GB of VRAM.
A downscale factor lower than 3 does not work for me.
Memory consumption seems huge...
I only have 8 GB, and I don't think I'll use OpenSplat as it is today.

I also used:
https://github.com/MrNeRF/gaussian-splatting-cuda
I was able to run it with 250 images of around the same size.
It's harder to compile; follow the instructions here:
MrNeRF/gaussian-splatting-cuda#4 (comment)

@pierotofy
Owner Author

I only have 8 GB, and I don't think I'll use OpenSplat as it is today.

That's strange; I've run the banana dataset with the defaults on a card that has 2GB of memory. There might be something going on.

@pierotofy
Owner Author

pierotofy commented Feb 28, 2024

@pfxuan gsplat currently requires CUDA. (The BUILD_NO_CUDA option does not make it work without CUDA; it just doesn't build the CUDA parts for you, which is helpful during development.)

Adding CPU support will require a rewrite of gsplat for the CPU using multithreading, or porting the CUDA code to HIP (https://github.com/ROCm/HIP), which should have support for compiling to CPU. The latter might be easier to do.

@dm-de

dm-de commented Feb 28, 2024

That's strange; I've run the banana dataset with the defaults on a card that has 2GB of memory. There might be something going on.

banana with -d 3:
total system VRAM usage: 6.5 GB (starting from ~2 GB)
VRAM grows quickly at first and then stabilizes

banana with -d 2:
stops immediately after image loading; zero steps run
Crash without an error message (a 330 MB crash dump is saved at %appdata%\local\CrashDumps)

edit: the crash typically happens when the graphics memory runs out

GFX card: Quadro RTX 4000

@pierotofy
Owner Author

I wonder if it's Windows related (I've run the software on Linux).

@salovision
Contributor

I've tested this on Windows with an RTX 2080 and 8 GB of memory. I can run opensplat /path/to/banana -n 2000 successfully and it takes about 2.7 GB of VRAM (peak) according to nvidia-smi. However, I had to fix a couple of issues first:

  1. cv_utils.cpp, in function tensorToImage(), replace:
    uint8_t *dataPtr = static_cast<uint8_t *>((t * 255.0).toType(torch::kU8).data_ptr());
    with
    torch::Tensor scaledTensor = (t * 255.0).toType( torch::kU8 );
    uint8_t* dataPtr = static_cast<uint8_t*>(scaledTensor.data_ptr());
    Reason: This sometimes crashes because data_ptr() points into a temporary Tensor that has already been destructed by the time the pointer is used; binding the tensor to a named variable keeps it alive.

  2. opensplat.cpp, in function main(), replace:
    InfiniteRandomIterator<Camera> camsIter(cams);
    with
    std::vector< size_t > indices( cams.size() );
    std::iota( indices.begin(), indices.end(), 0 );
    InfiniteRandomIterator<size_t> camsIter(indices);
    and
    Camera cam = camsIter.next();
    with
    Camera& cam = cams[ camsIter.next() ];
    Reason: The original camsIter.next() returned a new copy-constructed Camera, so Camera::getImage() could not cache the imagePyramids properly and ended up resizing the image every time, slowing things down.

  3. model.cpp, in function Model::forward(), I removed the calls to cam.scaleOutputResolution() and calculated the scaled cam.fx, cam.fy, cam.cx, cam.cy, cam.height and cam.width locally in that function (see the sketch after this list). The problem with scaleOutputResolution() is that it quantizes the camera width/height, since it operates with floats and ints, and the rescaling back does not really work: it does not return the original image size, which later causes crashes where the rgb and gt tensor sizes do not match.
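A minimal sketch of what fix 3 could look like (an assumption based on the description above; the actual Model::forward() code and Camera fields may differ):

#include <cmath>

// Compute downscaled pinhole intrinsics locally rather than mutating the
// Camera. Names mirror the fields mentioned above; this is an illustration,
// not the actual OpenSplat code.
struct ScaledIntrinsics {
    float fx, fy, cx, cy;
    int width, height;
};

ScaledIntrinsics scaleIntrinsics(float fx, float fy, float cx, float cy,
                                 int width, int height, float downscale) {
    ScaledIntrinsics s;
    s.fx = fx / downscale;
    s.fy = fy / downscale;
    s.cx = cx / downscale;
    s.cy = cy / downscale;
    // Round once, locally; the original width/height stay untouched, so there
    // is no lossy scale-down/scale-up round trip on the Camera itself, and the
    // rendered rgb tensor can always match the gt image size.
    s.width = static_cast<int>(std::round(width / downscale));
    s.height = static_cast<int>(std::round(height / downscale));
    return s;
}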

It also currently seems to be CPU-bound: mainLoss.backward() appears to take the majority of the time, so there are probably still a lot of opportunities for further optimization. Thanks for taking the initiative to rewrite this in C++/CUDA instead of Python!

@pierotofy
Owner Author

pierotofy commented Mar 11, 2024

Thanks for testing and the detailed explanation of changes!

Would you be interested in opening a pull request with the aforementioned changes? It might help other users on Windows.

@salovision
Contributor

Thanks for testing and the detailed explanation of changes!

Would you be interested in opening a pull request with the aforementioned changes? It might help other users on Windows.

I can do that. There are obviously multiple ways of fixing some of these things; in the case of scaleOutputResolution(), I would personally remove the function altogether and do the calculations inside Model::forward(), as it's safer not to modify the original camera parameters. But I can make a pull request and you can decide if those fixes sound good to you :)

@pierotofy
Owner Author

pierotofy commented Mar 12, 2024

@dm-de try with the latest main branch, as the changes from #37 might have fixed the memory issue you were experiencing? 🙏

@ichsan2895
Contributor

Something is going on. I tried it on Ubuntu 22.04 LTS, and it consumes >10 GB with -d 1 on 251 images at 960 x 540 resolution, as reported by nvidia-smi. But as @pierotofy has said, memory allocated by libtorch is not automatically released; it is kept in an available state. So don't trust the number from nvidia-smi; the actual usage is much lower.
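For anyone who wants to see the actual usage instead of the number from nvidia-smi, a sketch using libtorch's caching-allocator statistics (the exact API location can vary between libtorch versions):

#include <c10/cuda/CUDACachingAllocator.h>
#include <iostream>

void printCudaMemoryStats() {
    // Index 0 of each stat array is the aggregate across allocation pools.
    const auto stats = c10::cuda::CUDACachingAllocator::getDeviceStats(0);
    std::cout << "allocated (actually in use): "
              << stats.allocated_bytes[0].current << " bytes\n"
              << "reserved (cached by libtorch): "
              << stats.reserved_bytes[0].current << " bytes\n";
}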

@dm-de

dm-de commented Mar 13, 2024

My results Windows x64 / Quadro RTX 4000 8GB

iter (x1000) | time  | splats (x1000) | VRAM (GB) | note
start        | 12:33 | 0              | 0.8       |
1            | 12:34 | 82             | 2.0       |
2            | 12:34 | 273            | 3.7       |
2.9          | 12:35 | 407            | 5.5       | slow after 3k (?)
4            | 12:37 | 423            | 6.1       |
5            | 12:38 | 454            | 6.8       |
5.9          | 12:39 | 471            | 7.7       | very slow after 6k
7            | 12:44 | 465            | 4.8       |
8            | 12:48 | 468            | 4.9       |
8.9          | 12:52 | 470            | 4.9       |
10           | 12:57 | 465            | 6.6       |
11           | 13:01 | 465            | 6.6       |
11.9         | 13:04 | 464            | 6.6       |
stop         |       |                | 0.6       |

@salovision
Contributor

Indeed, there is a problem with the memory management: the refining steps (densify/culling) essentially recreate all tensors (by using Tensor::index), and the current CUDA memory manager likes to keep the old tensors in memory for caching. While this may work in some projects, here it just exhausts all memory, after which everything slows down dramatically.

Fortunately, there is an easy fix: empty the CUDA memory cache after every refine step.

In the Model::afterTrain() function, add c10::cuda::CUDACachingAllocator::emptyCache(); at the end of the function, after the line max2DSize = torch::Tensor();. You also need to #include <c10/cuda/CUDACachingAllocator.h> at the beginning of the file. (A sketch follows below.)
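For reference, a sketch of where the call lands (assuming the function layout described above; the real Model::afterTrain() contains more logic):

#include <c10/cuda/CUDACachingAllocator.h>

void Model::afterTrain(){
    // ... densify / culling refinement logic ...
    max2DSize = torch::Tensor();
    // Return cached blocks to the driver so stale tensors recreated during
    // refinement don't pile up in the allocator's cache:
    c10::cuda::CUDACachingAllocator::emptyCache();
}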

Here's a comparison before and after the change:

Windows 10 / RTX 2080 / 8GB VRAM

Step | Splats (before) | Time (before) | VRAM (before) | Splats (after) | Time (after) | VRAM (after)
200  | 33951   | 00:15 | 1364MB | 33951   | 00:15 | 1364MB
400  | 33951   | 00:30 | 1364MB | 33951   | 00:30 | 1364MB
600  | 33951   | 00:45 | 1364MB | 33951   | 00:45 | 1364MB
800  | 46498   | 01:00 | 1372MB | 46423   | 00:59 | 1372MB
1000 | 82050   | 01:15 | 1416MB | 82165   | 01:14 | 1414MB
1200 | 134395  | 01:31 | 1610MB | 134517  | 01:30 | 1488MB
1400 | 200933  | 01:47 | 2044MB | 201939  | 01:46 | 1598MB
1600 | 274549  | 02:03 | 2602MB | 276601  | 02:02 | 1650MB
1800 | 353564  | 02:20 | 3322MB | 357344  | 02:19 | 1802MB
2000 | 431318  | 02:38 | 4358MB | 437070  | 02:36 | 1972MB
2200 | 508949  | 02:57 | 5496MB | 516026  | 02:55 | 2134MB
2400 | 585358  | 03:16 | 6948MB | 593383  | 03:14 | 2232MB
2600 | 664535  | 03:37 | 8191MB | 673355  | 03:34 | 2420MB
2800 | 740199  | 09:29 | 8191MB | 750424  | 03:55 | 2530MB
3000 | 816083  | 16:05 | 8191MB | 827623  | 04:16 | 2766MB
3200 |         |       |        | 827623  | 04:40 | 2472MB
3400 |         |       |        | 833699  | 05:03 | 2474MB
3600 |         |       |        | 812580  | 05:27 | 2752MB
3800 |         |       |        | 839227  | 05:50 | 2774MB
4000 |         |       |        | 871056  | 06:14 | 2812MB
4200 |         |       |        | 906333  | 06:38 | 2836MB
4400 |         |       |        | 947372  | 07:02 | 3028MB
4600 |         |       |        | 991100  | 07:27 | 3090MB
4800 |         |       |        | 1035020 | 07:52 | 3140MB
5000 |         |       |        | 1085761 | 08:18 | 3222MB

Previously, learning stalled after 2600 steps once VRAM was full: reaching 3000 steps took 16 minutes, where it now takes about 4 minutes when the cache is cleared after refining. Now I can finally use this project with bigger datasets without running out of VRAM, even on an 8 GB card.

@pfxuan
Collaborator

pfxuan commented Mar 17, 2024

Thanks for bringing up the cache issue. It appears emptyCache() also helps control VRAM consumption on AMD GPUs.

GPU: AMD RX 6700 XT

Before
VRAM: 12,200 MiB
Time: 02:35

After
VRAM: 2,544 MiB
Time: 02:35

I'll add this fix to both the CUDA and ROCm code shortly.

@ichsan2895
Contributor

ichsan2895 commented Mar 18, 2024

It's an important fix (it also fixes #43); I think @pierotofy should tag OpenSplat v1.0.3.

@josephldobson

Is rewriting the CUDA kernels in MSL possible? This is something I'd be interested in learning how to do over the summer. Has anyone started on it?

@pierotofy
Owner Author

pierotofy commented Mar 18, 2024

That would be a cool project! I don't see why it wouldn't be possible; I'm very close to having a CPU-only implementation ready (https://github.com/pierotofy/OpenSplat/tree/cpudiff), so an MSL version would just be another port.

This was referenced Mar 20, 2024
@andrewkchan
Contributor

@josephldobson fyi! #76
