
YSDA-CPU-inference

Quantized inference on CPU (int8 / int4 / mixed precision)

The aim of this project is to investigate whether int8 inference can deliver a speedup over fp16/fp32 inference on CPU (in particular, the hardware must offer efficient INT8 compute units for quantization to be profitable).
You can find more details in the project presentation.
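
To make the idea concrete, below is a minimal sketch of per-tensor int8 quantization with libtorch's C++ frontend. The scale and zero point values are illustrative assumptions for the sketch, not taken from this project:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // An fp32 tensor standing in for a layer's weights or activations.
    torch::Tensor x = torch::randn({4, 4});

    // Quantize to int8 with an (illustrative) affine scheme:
    // q = round(x / scale) + zero_point, clamped to the int8 range.
    double scale = 0.05;     // assumed value for the sketch
    int64_t zero_point = 0;  // symmetric quantization around zero
    torch::Tensor q = torch::quantize_per_tensor(x, scale, zero_point, torch::kQInt8);

    // int_repr() exposes the raw int8 storage; dequantize() maps back to fp32.
    std::cout << "int8 values:\n" << q.int_repr() << "\n";
    std::cout << "max abs round-trip error: "
              << (q.dequantize() - x).abs().max().item<float>() << "\n";
    return 0;
}
```

The round-trip error printed at the end is exactly the quantization noise that the speedup from int8 arithmetic has to justify.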

C++ config

In this branch we use a libtorch built directly from source, as described in Building libtorch using CMake.
We built it in Debug mode; to reproduce this, run the following commands in the /cpp folder.

Warning

The full build requires a little under 23 GB of disk space and about 14 GB of RAM.

```bash
git clone -b main --recurse-submodules https://github.com/pytorch/pytorch.git
mkdir pytorch-build
cd pytorch-build
cmake -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_BUILD_TYPE:STRING=Debug -DPYTHON_EXECUTABLE:PATH=`which python3` -DCMAKE_INSTALL_PREFIX:PATH=../pytorch-install ../pytorch
cmake --build . --target install
```

Then, also in the /cpp folder, run

```bash
mkdir build
cd build
cmake ..
make
```
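
Once the build succeeds, the binary can be driven in the usual libtorch way. As a hedged sketch (the actual entry point, model path, and input shape in this repository may differ), loading and running a serialized TorchScript model looks like this:

```cpp
#include <torch/script.h>
#include <iostream>
#include <vector>

int main(int argc, const char* argv[]) {
    if (argc != 2) {
        std::cerr << "usage: ./app <path-to-exported-model.pt>\n";
        return 1;
    }

    // Load a model previously exported from Python with torch.jit.script/trace.
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(argv[1]);
    } catch (const c10::Error& e) {
        std::cerr << "error loading the model: " << e.what() << "\n";
        return 1;
    }

    module.eval();
    torch::NoGradGuard no_grad;  // inference only, no autograd bookkeeping

    // The input shape is an assumption for the sketch
    // (e.g. one 224x224 RGB image).
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::randn({1, 3, 224, 224}));

    torch::Tensor output = module.forward(inputs).toTensor();
    std::cout << "output shape: " << output.sizes() << "\n";
    return 0;
}
```

Timing this forward pass for an fp32 model against its quantized counterpart is the basic measurement the project's question rests on.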

Useful links

Additional topic-related papers from experienced researchers
