This repository contains implementations of vector addition on both the CPU (in C) and the GPU (in CUDA) to demonstrate and compare their performance characteristics.
- `cpu_vector_add.c`: CPU implementation of vector addition
- `vector_add.cu`: GPU (CUDA) implementation of vector addition
- `toy.cu`: A toy CUDA program for testing and learning purposes
- `Makefile`: Compilation instructions for all programs
To build the programs you will need:
- GCC compiler for the CPU code
- NVIDIA CUDA Toolkit (nvcc) for the GPU code
- Make utility
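If you are unsure whether these tools are installed, the standard version-check commands for each should print version information:

```sh
gcc --version    # GCC C compiler
nvcc --version   # CUDA compiler from the NVIDIA CUDA Toolkit
make --version   # GNU Make
```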
Use the provided Makefile to compile the programs:
- To compile all programs: `make` or `make all`
- To compile only the CPU version: `make cpu`
- To compile only the GPU version: `make gpu`
- To compile only the toy CUDA program: `make toy`
- To clean up compiled executables: `make clean`
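For orientation, a minimal Makefile supporting these targets could look roughly like the sketch below. The executable names (`cpu`, `gpu`, `toy`) match the run commands in the next section, and `-O3` matches the optimization flag noted at the end of this README; the actual Makefile in the repository may differ in its details.

```makefile
# Hypothetical sketch; the repository's actual Makefile may differ.
CC    = gcc
NVCC  = nvcc
FLAGS = -O3

all: cpu gpu toy

cpu: cpu_vector_add.c
	$(CC) $(FLAGS) -o cpu cpu_vector_add.c

gpu: vector_add.cu
	$(NVCC) $(FLAGS) -o gpu vector_add.cu

toy: toy.cu
	$(NVCC) $(FLAGS) -o toy toy.cu

clean:
	rm -f cpu gpu toy

.PHONY: all clean
```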
After compilation, you can run the programs as follows:
- CPU version: `./cpu`
- GPU version: `./gpu`
- Toy CUDA program: `./toy`
`cpu_vector_add.c` performs vector addition on the CPU. It includes timing measurements for:
- Memory allocation
- Array initialization
- Vector addition operation
- Total execution time
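As a rough illustration of that structure (a minimal sketch, not the exact contents of `cpu_vector_add.c`, which may differ), a CPU vector addition timed in these phases could look like this:

```c
// Illustrative sketch only; the actual cpu_vector_add.c may differ.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double elapsed_ms(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void) {
    const size_t n = 1 << 24;           /* example problem size */
    struct timespec t0, t1, t2, t3;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    float *a = malloc(n * sizeof *a);   /* memory allocation */
    float *b = malloc(n * sizeof *b);
    float *c = malloc(n * sizeof *c);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    for (size_t i = 0; i < n; i++) {    /* array initialization */
        a[i] = 1.0f;
        b[i] = 2.0f;
    }
    clock_gettime(CLOCK_MONOTONIC, &t2);

    for (size_t i = 0; i < n; i++)      /* vector addition */
        c[i] = a[i] + b[i];
    clock_gettime(CLOCK_MONOTONIC, &t3);

    printf("check: c[0] = %.1f\n", c[0]);
    printf("alloc: %.2f ms, init: %.2f ms, add: %.2f ms, total: %.2f ms\n",
           elapsed_ms(t0, t1), elapsed_ms(t1, t2),
           elapsed_ms(t2, t3), elapsed_ms(t0, t3));

    free(a); free(b); free(c);
    return 0;
}
```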
`vector_add.cu` is the CUDA program that performs vector addition on the GPU. It includes timing measurements for:
- Memory allocation (on GPU)
- Data transfer (Host to Device and Device to Host)
- Kernel execution (actual vector addition)
- Total execution time
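A simplified CUDA version that times those phases with CUDA events might look like the sketch below; the actual `vector_add.cu` may be organized differently.

```cuda
// Illustrative sketch only; the actual vector_add.cu may differ.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 24;
    const size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    float ms = 0.0f;

    // GPU memory allocation
    cudaEventRecord(start);
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaEventRecord(stop); cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("alloc:  %.2f ms\n", ms);

    // Host-to-device transfer
    cudaEventRecord(start);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop); cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("H2D:    %.2f ms\n", ms);

    // Kernel execution (the actual vector addition)
    int threads = 256, blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaEventRecord(stop); cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel: %.2f ms\n", ms);

    // Device-to-host transfer
    cudaEventRecord(start);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(stop); cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("D2H:    %.2f ms\n", ms);

    printf("check: h_c[0] = %.1f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    return 0;
}
```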
`toy.cu` is a simple CUDA program for learning and testing purposes. It may demonstrate basic CUDA concepts or serve as a template for further CUDA development.
Run both the CPU and GPU versions and compare their execution times. Note that:
- The GPU version includes data transfer overhead, which may impact performance for smaller datasets.
- The GPU version is expected to perform better for larger datasets or more complex operations.
- Both the CPU and GPU versions are compiled with the `-O3` optimization flag.