Flocking with CUDA

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

Jonathan Lee
Tested on: Windows 7, i7-7700 @ 4.2GHz 16GB, GTX 1070 (Personal Machine)

Overview

Flocking

In this simulation, three types of steering were used to simulate flocking.

Cohesion - boids move towards the perceived center of mass of their neighbors
Separation - boids avoid getting too close to their neighbors
Alignment - boids generally try to move with the same direction and speed as their neighbors
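
The three rules above can be sketched on the CPU for a single boid. This is a minimal sketch and not the project's kernel: the `Vec3` type, the `Rules` distances and scales, and the function name `velocityChange` are all assumptions chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal 3D vector; a stand-in for the vector type a CUDA kernel would use.
struct Vec3 {
    float x, y, z;
};

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Hypothetical tuning constants (not taken from the project).
struct Rules {
    float cohesionDist = 5.0f, separationDist = 3.0f, alignmentDist = 5.0f;
    float cohesionScale = 0.01f, separationScale = 0.1f, alignmentScale = 0.1f;
};

// Velocity change for boid `self`, checking every other boid (naive search).
Vec3 velocityChange(std::size_t self, const std::vector<Vec3>& pos,
                    const std::vector<Vec3>& vel, const Rules& r = Rules{}) {
    Vec3 center{0.0f, 0.0f, 0.0f}, away{0.0f, 0.0f, 0.0f}, avgVel{0.0f, 0.0f, 0.0f};
    int cohesionCount = 0, alignmentCount = 0;

    for (std::size_t i = 0; i < pos.size(); ++i) {
        if (i == self) continue;
        float d = length(pos[i] - pos[self]);
        if (d < r.cohesionDist)   { center = center + pos[i]; ++cohesionCount; }
        if (d < r.separationDist) { away = away - (pos[i] - pos[self]); }
        if (d < r.alignmentDist)  { avgVel = avgVel + vel[i]; ++alignmentCount; }
    }

    Vec3 change{0.0f, 0.0f, 0.0f};
    if (cohesionCount > 0) {
        // Cohesion: steer toward the neighbors' center of mass.
        center = center * (1.0f / cohesionCount);
        change = change + (center - pos[self]) * r.cohesionScale;
    }
    // Separation: steer away from neighbors that are too close.
    change = change + away * r.separationScale;
    if (alignmentCount > 0) {
        // Alignment: nudge this boid's velocity toward the neighbors' average.
        avgVel = avgVel * (1.0f / alignmentCount);
        change = change + (avgVel - vel[self]) * r.alignmentScale;
    }
    return change;
}
```

The returned change would be added to the boid's current velocity each step. In a GPU version, each thread would run this computation for one boid.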

Searching

Naive Search - For each boid, loops through all N boids to update its position and velocity.
Naive Uniform Grid - Rather than looping through all N boids in the simulation, we only loop through the boids that are in the 8 neighboring cells of the current boid being processed.
Coherent Uniform Grid - Similar to the Naive Uniform Grid, but it completely eliminates the extra fetch needed to obtain each boid's position and velocity. Because of this, there is a drastic improvement in performance.
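
To illustrate how a grid search picks its cells, here is a hedged CPU sketch. It assumes a cubic grid whose cell width is twice the neighborhood radius, so a boid's search sphere overlaps at most 2 cells per axis (8 in total). The `Grid` struct and the function names are hypothetical and not taken from the project.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical grid: a cube of resolution^3 cells that starts at `gridMin`
// on every axis, with each cell `cellWidth` wide.
struct Grid {
    float gridMin;
    float cellWidth;
    int resolution;
};

// Flatten 3D cell coordinates into a single index (x varies fastest).
int cellIndex1D(int x, int y, int z, int resolution) {
    return x + y * resolution + z * resolution * resolution;
}

// Cell coordinate along one axis, clamped to the grid bounds.
int cellCoord(float p, const Grid& g) {
    int c = static_cast<int>(std::floor((p - g.gridMin) / g.cellWidth));
    return std::max(0, std::min(c, g.resolution - 1));
}

// Indices of every cell that a search sphere of `radius` around
// (px, py, pz) can overlap.
std::vector<int> cellsToSearch(float px, float py, float pz, float radius,
                               const Grid& g) {
    int x0 = cellCoord(px - radius, g), x1 = cellCoord(px + radius, g);
    int y0 = cellCoord(py - radius, g), y1 = cellCoord(py + radius, g);
    int z0 = cellCoord(pz - radius, g), z1 = cellCoord(pz + radius, g);

    std::vector<int> cells;
    for (int z = z0; z <= z1; ++z)
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x)
                cells.push_back(cellIndex1D(x, y, z, g.resolution));
    return cells;
}
```

Only the boids stored in the returned cells need to be tested against the current boid. Near the edge of the grid the list is shorter because out-of-range cells are clamped away.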

Results

10,000 Boids

100,000 Boids

500,000 Boids

Performance Analysis

Number of Boids

With each of the three methods, the framerate drops toward 0 as the number of boids increases. I believe this is because each cell contains more and more boids, so the loop over the boids in each neighboring cell gets longer.
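
A rough back-of-the-envelope estimate shows why the grid searches also slow down. The sketch below assumes boids are spread evenly over a fixed grid, which understates real densities because flocking makes boids cluster. The function names and the grid resolution used in the example are hypothetical.

```cpp
// Average boids per cell, assuming boids are spread evenly over the grid.
double boidsPerCell(double boidCount, int resolution) {
    double cells = static_cast<double>(resolution) * resolution * resolution;
    return boidCount / cells;
}

// Rough neighbor checks per frame for a grid search that visits
// `cellsSearched` cells per boid.
double gridChecksPerFrame(double boidCount, int resolution, int cellsSearched) {
    return boidCount * cellsSearched * boidsPerCell(boidCount, resolution);
}

// Neighbor checks per frame for the naive search: every boid against
// every other boid.
double naiveChecksPerFrame(double boidCount) {
    return boidCount * (boidCount - 1.0);
}
```

With a hypothetical resolution of 10 cells per axis, `gridChecksPerFrame(10000, 10, 8)` is 800,000 while `gridChecksPerFrame(100000, 10, 8)` is 80,000,000. Ten times as many boids means roughly a hundred times as many checks, because the grid reduces the constant factor but the cost still grows quadratically when the cell count is fixed.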

Block Size

Decreasing the block size caused a dip in performance across all three searches. I didn't notice any significant performance differences between block sizes of 32, 64, and 256. I believe this is because the warp size is fixed at 32 threads, so each of these block sizes fills whole warps.

One thing I noticed was that once I changed the block size to anything greater than 512, the simulation crashed with the error "too many resources requested for launch". After researching the error, I found out that it meant there weren't enough registers available on the multiprocessor to handle the block size.
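
The register limit can be checked with simple arithmetic. The sketch below assumes the per-block budget of 65,536 32-bit registers and the 1,024-thread block limit of a compute capability 6.1 GPU such as the GTX 1070. The registers-per-thread figure for this project's kernels is not known, so the value of 80 used in the example is purely hypothetical. The real figure can be read from the compiler output of `nvcc --ptxas-options=-v`.

```cpp
#include <algorithm>

// Assumed limits for a compute capability 6.1 GPU such as the GTX 1070.
constexpr int kRegistersPerBlock = 65536;
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kWarpSize = 32;

// True if a block of `blockSize` threads fits in the per-block register
// budget. Ignores register allocation granularity, so it is slightly
// optimistic.
bool blockFits(int registersPerThread, int blockSize) {
    return blockSize <= kMaxThreadsPerBlock &&
           registersPerThread * blockSize <= kRegistersPerBlock;
}

// Largest block size that fits, rounded down to a whole number of warps.
int largestBlockSize(int registersPerThread) {
    int threads = kRegistersPerBlock / registersPerThread;
    threads = std::min(threads, kMaxThreadsPerBlock);
    return (threads / kWarpSize) * kWarpSize;
}
```

For a hypothetical kernel that uses 80 registers per thread, `blockFits(80, 512)` is true but `blockFits(80, 1024)` is false, which matches the pattern of a launch that works at 512 threads and fails above it.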

Uniform vs. Coherent Grids

With fewer boids, the difference between the Uniform and Coherent Grids seems pretty insignificant. However, as the number of boids increases, the difference becomes extremely noticeable. Searching using the Uniform Grid requires an extra lookup through dev_particlesArrayIndices, whereas searching using the Coherent Grid does not. With the Coherent Grid, a single index provides the boid's position, velocity, and grid cell.
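
The difference between the two lookups can be sketched on the CPU. This is an illustration under assumptions and not the project's code: a single float stands in for each boid's position or velocity, `order` plays the role of dev_particlesArrayIndices, and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sort boid indices by grid cell. order[i] is the original index of the
// i-th boid in cell order.
std::vector<int> sortIndicesByCell(const std::vector<int>& cellOfBoid) {
    std::vector<int> order(cellOfBoid.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return cellOfBoid[a] < cellOfBoid[b]; });
    return order;
}

// Coherent step: copy the boid data into cell order once per frame.
std::vector<float> makeCoherent(const std::vector<float>& data,
                                const std::vector<int>& order) {
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < order.size(); ++i) out[i] = data[order[i]];
    return out;
}

// Scattered lookup: every neighbor costs an index fetch plus a data fetch
// at an unrelated memory location.
float sumScattered(const std::vector<float>& data,
                   const std::vector<int>& order, int start, int end) {
    float sum = 0.0f;
    for (int i = start; i <= end; ++i) sum += data[order[i]];
    return sum;
}

// Coherent lookup: boids in the same cell are contiguous, so the loop
// index addresses the data directly.
float sumCoherent(const std::vector<float>& coherent, int start, int end) {
    float sum = 0.0f;
    for (int i = start; i <= end; ++i) sum += coherent[i];
    return sum;
}
```

Both functions return the same result for the same cell range. The coherent version pays for one reordering pass per frame and in return removes one indirection from every neighbor access, which matters more as the number of neighbors per boid grows.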

Neighbor Checking

Increasing the neighbor check from 8 to 27 neighboring cells caused a decrease in performance. In the chart below, we can see that the FPS decreases as more boids are added. Not only that, but the simulation crashed sooner for both Uniform and Coherent Grid Searches.

