The aim of this project is to investigate whether int8 inference can provide a speedup over fp16/fp32 inference (in particular, the hardware must provide efficient INT8 compute units for this to be profitable).
You can find more info in the project presentation.
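To make that trade-off concrete, here is a minimal sketch (illustrative only, not code from this repo) contrasting a scalar fp32 GEMM with an int8 GEMM that accumulates in int32. Written as plain scalar loops, the int8 version is no faster than the fp32 one; the speedup only materializes when a backend such as FBGEMM maps the inner loop onto dedicated int8 SIMD instructions (e.g. AVX-512 VNNI).

```cpp
// Illustrative sketch: scalar fp32 vs int8 GEMM kernels.
#include <cstdint>

// C[m][n] += A[m][k] * B[k][n], fp32 baseline.
void gemm_fp32(int M, int N, int K,
               const float* A, const float* B, float* C) {
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * B[k * N + n];
}

// Same loop nest on int8 inputs; accumulation is done in int32
// to avoid overflow, which is also what real int8 backends do.
void gemm_int8(int M, int N, int K,
               const int8_t* A, const int8_t* B, int32_t* C) {
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < K; ++k)
      for (int n = 0; n < N; ++n)
        C[m * N + n] += static_cast<int32_t>(A[m * K + k]) *
                        static_cast<int32_t>(B[k * N + n]);
}
```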
In this branch we use a libtorch built directly from source, as described in Building libtorch using CMake.
We built it in Debug mode; to do this, run the following commands in the /cpp folder.
Warning
The overall build requires a little less than 23 GB of disk space and about 14 GB of CPU RAM
git clone -b main --recurse-submodules https://github.com/pytorch/pytorch.git
mkdir pytorch-build
cd pytorch-build
cmake -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_BUILD_TYPE:STRING=Debug -DPYTHON_EXECUTABLE:PATH=`which python3` -DCMAKE_INSTALL_PREFIX:PATH=../pytorch-install ../pytorch
cmake --build . --target install
Then, also in the /cpp folder, run
mkdir build
cd build
cmake ..
make
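For reference, a minimal CMakeLists.txt for the /cpp folder could look like the sketch below (the target name and source file are placeholder assumptions; the repo's actual files may differ). It assumes CMake can locate the libtorch installed above, e.g. by passing -DCMAKE_PREFIX_PATH=../pytorch-install to the cmake .. call if it is not found automatically.

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)
project(int8-benchmark)

# Locates the libtorch built in the previous step; pass
# -DCMAKE_PREFIX_PATH=<path-to>/pytorch-install if it is not found.
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")

add_executable(int8-benchmark main.cpp)  # main.cpp is a placeholder name
target_link_libraries(int8-benchmark "${TORCH_LIBRARIES}")
set_property(TARGET int8-benchmark PROPERTY CXX_STANDARD 17)
```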
- FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference
- Intra-operator parallelism settings in PyTorch
- PyTorch Benchmark
- PyTorch Performance Tuning Guide
- PyTorch Numeric Suite and a guide to it
- PyTorch Numeric Suite FX
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
- Introducing Quantized Tensor (see the sketch after this list)
- How to optimize GEMM
- Eigen library
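As a small taste of the quantized-tensor API from "Introducing Quantized Tensor" above, the following sketch (libtorch C++ API; the scale and zero-point values are arbitrary illustrative choices) quantizes an fp32 tensor to int8 and dequantizes it back, which is the basic building block behind both the int8 kernels and the Numeric Suite comparisons.

```cpp
// Sketch of the quantized-tensor round trip in the libtorch C++ API.
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::randn({4, 4});

  // Affine per-tensor quantization: q = round(x / scale) + zero_point,
  // clamped to the int8 range. scale/zero_point here are illustrative.
  auto qx = torch::quantize_per_tensor(x, /*scale=*/0.05, /*zero_point=*/0,
                                       torch::kQInt8);
  std::cout << "int8 storage:\n" << qx.int_repr() << "\n";

  // Dequantizing recovers x only up to quantization error.
  auto x_hat = qx.dequantize();
  std::cout << "max abs error: "
            << (x - x_hat).abs().max().item<float>() << "\n";
}
```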