CIS565: Project 3: CUDA Simulation and GLSL Visualization

For this project, I implemented an N-body simulation in CUDA and visualized it using GLSL. The simulator is a gravity simulation that resembles a solar system, with planets orbiting a star. The assignment was an exercise in using shared memory and measuring how much it improves performance, since the previous two projects (the Raytracer and the Pathtracer) did not explicitly focus on performance and efficiency. As always, the starter framework was provided by our TA, Liam Boone.

The code could be optimized a lot further: it still contains many uncoalesced global memory accesses and shared memory bank conflicts (where the threads loop through each element in shared/global memory). One possible improvement is to launch a block of threads for every object, with each thread calculating the force/acceleration on that object due to a single other object in the scene. A parallel reduction could then be performed to find the total force/acceleration on that object. However, the effect on performance cannot be predicted without profiling in NSight, which was not available to me.
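
The block-per-object idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's code: positions are assumed to be packed as float4 with the mass in w, G is taken as 1, a small softening term is added, and the names accelPerBody, runAccelPerBody, BLOCK and SOFTENING are all hypothetical. Each thread strides over several other objects rather than exactly one, so that a fixed block size works for any object count.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 128        // assumed threads per block; must be a power of two
#define SOFTENING 1e-3f  // assumed softening term; keeps r = 0 finite

// One block per body i. Each thread accumulates the pull of a strided subset
// of bodies j; a shared-memory tree reduction then sums the partial results.
__global__ void accelPerBody(int n, const float4* pos, float3* acc)
{
    __shared__ float3 partial[BLOCK];
    const int i = blockIdx.x;
    const float4 pi = pos[i];
    float3 a = make_float3(0.f, 0.f, 0.f);

    // Consecutive threads read consecutive j, so the global loads coalesce.
    for (int j = threadIdx.x; j < n; j += blockDim.x) {
        const float4 pj = pos[j];
        const float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
        const float inv = rsqrtf(dx * dx + dy * dy + dz * dz + SOFTENING);
        const float s = pj.w * inv * inv * inv;  // m_j / r^3, with G = 1
        a.x += dx * s; a.y += dy * s; a.z += dz * s;
    }
    partial[threadIdx.x] = a;
    __syncthreads();

    // Tree reduction: halve the number of active threads on each pass.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x].x += partial[threadIdx.x + stride].x;
            partial[threadIdx.x].y += partial[threadIdx.x + stride].y;
            partial[threadIdx.x].z += partial[threadIdx.x + stride].z;
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) acc[i] = partial[0];
}

// Host helper (hypothetical): copies the bodies to the GPU, launches one
// block per body, and returns the resulting accelerations.
std::vector<float3> runAccelPerBody(const std::vector<float4>& pos)
{
    const int n = static_cast<int>(pos.size());
    float4* dPos = nullptr;
    float3* dAcc = nullptr;
    cudaMalloc(&dPos, n * sizeof(float4));
    cudaMalloc(&dAcc, n * sizeof(float3));
    cudaMemcpy(dPos, pos.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    accelPerBody<<<n, BLOCK>>>(n, dPos, dAcc);
    std::vector<float3> acc(n);
    cudaMemcpy(acc.data(), dAcc, n * sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dAcc);
    return acc;
}
```

Whether this layout actually beats one thread per object would still need profiling, since it launches one block per object and pays for a reduction in each of them.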

Nevertheless, with this code I saw a HUGE speedup from using shared memory instead of global memory (243 fps vs. 53 fps).

SCREENSHOTS

Flocking:

Gravity Simulation:

DETAILS

In this project, the positions, velocities and accelerations of all objects are stored in the global memory arrays dev_pos, dev_vel and dev_acc respectively. I was required to write device functions that calculate the acceleration of every object using global memory and using shared memory. As an added bonus, I implemented prefetching for shared memory: instead of loading a value from global memory directly into shared memory, the value is loaded into a register one iteration ahead and then copied from the register into shared memory during the current iteration. I also implemented two other integration schemes, Verlet and Leapfrog (a symplectic Euler integrator was provided by default).
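
To illustrate the kind of integrator mentioned above, here is a minimal sketch of a leapfrog step in its common kick-drift-kick form. It is not the project's implementation: the kernel names, the float4/float3 layout and the launch configuration are assumptions, and springAccel is a stand-in force (a unit spring, a = -x) used only so the sketch can be run on its own. In the simulation, the gravity kernel would take its place.

```cuda
#include <cuda_runtime.h>

// Kick: advance velocities by acc * h (called with h = dt / 2).
__global__ void kick(int n, float3* vel, const float3* acc, float h)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    vel[i].x += acc[i].x * h;
    vel[i].y += acc[i].y * h;
    vel[i].z += acc[i].z * h;
}

// Drift: advance positions by vel * dt (pos.w is left untouched).
__global__ void drift(int n, float4* pos, const float3* vel, float dt)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    pos[i].x += vel[i].x * dt;
    pos[i].y += vel[i].y * dt;
    pos[i].z += vel[i].z * dt;
}

// Stand-in force for the demo only: a unit spring toward the origin, a = -x.
__global__ void springAccel(int n, const float4* pos, float3* acc)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    acc[i] = make_float3(-pos[i].x, -pos[i].y, -pos[i].z);
}

// One kick-drift-kick step. dAcc must hold the accelerations for the current
// positions on entry; on exit it holds them for the new positions.
void leapfrogStep(int n, float4* dPos, float3* dVel, float3* dAcc, float dt)
{
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    kick<<<blocks, threads>>>(n, dVel, dAcc, 0.5f * dt);
    drift<<<blocks, threads>>>(n, dPos, dVel, dt);
    springAccel<<<blocks, threads>>>(n, dPos, dAcc);
    kick<<<blocks, threads>>>(n, dVel, dAcc, 0.5f * dt);
}

// Demo helper (hypothetical): one particle on the spring, started at x = 1
// with zero velocity. Returns its energy 0.5 * (|v|^2 + |x|^2) after `steps`.
float springEnergyAfter(int steps, float dt)
{
    float4 pos = make_float4(1.f, 0.f, 0.f, 1.f);
    float3 vel = make_float3(0.f, 0.f, 0.f);
    float3 acc = make_float3(-1.f, 0.f, 0.f);  // a = -x at the start
    float4* dPos = nullptr;
    float3* dVel = nullptr;
    float3* dAcc = nullptr;
    cudaMalloc(&dPos, sizeof(float4));
    cudaMalloc(&dVel, sizeof(float3));
    cudaMalloc(&dAcc, sizeof(float3));
    cudaMemcpy(dPos, &pos, sizeof(float4), cudaMemcpyHostToDevice);
    cudaMemcpy(dVel, &vel, sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(dAcc, &acc, sizeof(float3), cudaMemcpyHostToDevice);
    for (int s = 0; s < steps; ++s) leapfrogStep(1, dPos, dVel, dAcc, dt);
    cudaMemcpy(&pos, dPos, sizeof(float4), cudaMemcpyDeviceToHost);
    cudaMemcpy(&vel, dVel, sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dVel);
    cudaFree(dAcc);
    return 0.5f * (vel.x * vel.x + vel.y * vel.y + vel.z * vel.z
                 + pos.x * pos.x + pos.y * pos.y + pos.z * pos.z);
}
```

The appeal of this form over symplectic Euler is that it is second-order accurate in the time step while still needing only one force evaluation per step, because the accelerations computed at the end of one step are reused by the first kick of the next.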

In addition to the above, I was required to write my own simulation. I implemented dynamic flocking, where planets drop in and out of flocks that are created on the fly as the planets move around. Dynamic flocking runs when the command line parameter is set to 'true'.

PERFORMANCE EVALUATION

Performance was compared across different numbers of simulated planets/objects, using global memory, shared memory and the prefetched version of shared memory. Here are the results:

With visualization on:

Memory type	Framerate (fps), 2500 objects	Framerate (fps), 5000 objects
Global	1.75	not measured
Shared	12	6.77
Shared (Prefetched)	12	6.77

5000 objects were not simulated in global memory since the framerate was close to 0.

With visualization off:

Average framerate (fps) by number of objects:

Memory type	1,500,000	3,000,000	5,000,000	10,000,000	20,000,000	50,000,000
Global	53	53	53	53	53	53
Shared	615	620	630	615	620	620
Shared (Prefetched)	630	630	630	615	615	630

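These results show that shared memory is WAY better than global memory: about 7x faster with visualization on (12 fps vs. 1.75 fps at 2500 objects) and about 12x faster with visualization off (615 to 630 fps vs. 53 fps). As I mentioned above, if the bank conflicts caused by threads looping over shared memory locations were corrected, the program would likely run faster still.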

The results also show no significant advantage from prefetching. I believe this is because there are not many independent instructions available to hide the latency of accessing global memory.
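
For reference, the prefetching pattern described in DETAILS might look like the sketch below. It is a sketch under stated assumptions rather than the project's kernel: positions are assumed to be float4 with the mass in w, G is taken as 1, a softening term is added, the block size is assumed to equal TILE, and the names accelTiledPrefetch, addPull and runAccelTiled are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 128         // assumed tile size; the block size must match it
#define SOFTENING 1e-3f  // assumed softening term; keeps r = 0 finite

// Adds to `a` the pull of body pj on a body located at pi (G = 1, mass in w).
__device__ float3 addPull(float4 pi, float4 pj, float3 a)
{
    const float dx = pj.x - pi.x, dy = pj.y - pi.y, dz = pj.z - pi.z;
    const float inv = rsqrtf(dx * dx + dy * dy + dz * dz + SOFTENING);
    const float s = pj.w * inv * inv * inv;
    a.x += dx * s; a.y += dy * s; a.z += dz * s;
    return a;
}

// One thread per body. Bodies are staged through shared memory one tile at a
// time; the load for tile t+1 is issued into a register before the math on
// tile t, so the global-memory latency can overlap with that math.
__global__ void accelTiledPrefetch(int n, const float4* pos, float3* acc)
{
    __shared__ float4 tile[TILE];
    const float4 zero = make_float4(0.f, 0.f, 0.f, 0.f);  // massless padding
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const float4 pi = (i < n) ? pos[i] : zero;
    float3 a = make_float3(0.f, 0.f, 0.f);

    const int numTiles = (n + TILE - 1) / TILE;
    int j = threadIdx.x;
    float4 next = (j < n) ? pos[j] : zero;  // prefetch tile 0

    for (int t = 0; t < numTiles; ++t) {
        tile[threadIdx.x] = next;  // register -> shared
        __syncthreads();

        j = (t + 1) * TILE + threadIdx.x;
        next = (j < n) ? pos[j] : zero;  // prefetch tile t+1 (zero past the end)

        for (int k = 0; k < TILE; ++k) a = addPull(pi, tile[k], a);
        __syncthreads();  // the tile must not be overwritten while in use
    }
    if (i < n) acc[i] = a;
}

// Host helper (hypothetical): runs the kernel on `pos` and returns the result.
std::vector<float3> runAccelTiled(const std::vector<float4>& pos)
{
    const int n = static_cast<int>(pos.size());
    float4* dPos = nullptr;
    float3* dAcc = nullptr;
    cudaMalloc(&dPos, n * sizeof(float4));
    cudaMalloc(&dAcc, n * sizeof(float3));
    cudaMemcpy(dPos, pos.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    accelTiledPrefetch<<<(n + TILE - 1) / TILE, TILE>>>(n, dPos, dAcc);
    std::vector<float3> acc(n);
    cudaMemcpy(acc.data(), dAcc, n * sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(dPos);
    cudaFree(dAcc);
    return acc;
}
```

Dropping the `next` register and writing `pos[j]` straight into `tile` at the top of the loop gives the plain shared memory version, which makes the two variants easy to compare under a profiler.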
