
fix: full gpu hybrid model #963

Merged
16 commits merged into main on Jan 6, 2025

Conversation

andrei-stoian-zama
Collaborator

@andrei-stoian-zama andrei-stoian-zama commented Dec 18, 2024

  • Adds support for GPU (and other devices) for the local computation of the hybrid model.
    • use torch.Tensors for the GLWE backend
    • keep the tensor device consistent
    • add a Torch quantizer
  • Memory optimization:
    • Removes the ONNX model for quantized models built for fully linear layers in the hybrid model
    • Remove calibration data once hybrid model calibration is done
  • Upgrades the use-case example CI to Ubuntu 22 (python 3.10)
  • Improves the LLAMA lora fine tuning use case (adds FHE execution)

Closes https://github.com/zama-ai/concrete-ml-internal/issues/4682
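The device-consistency point in the description above can be sketched with a minimal Torch-based quantizer. This is an illustrative stub, not Concrete ML's actual API: the idea is that quantizing with torch ops (rather than numpy) means the result automatically inherits the input tensor's device, so GPU inputs never round-trip through the CPU during local hybrid-model computation.

```python
import torch

# Hypothetical sketch (names are illustrative, not Concrete ML's API):
# quantize on whatever device the input tensor lives on, so GPU inputs
# stay on the GPU throughout the local computation.
def torch_quantize(values: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # round(values / scale) + zero_point, computed with torch ops so the
    # result inherits values.device (CPU or CUDA) automatically
    q = torch.round(values / scale) + zero_point
    return q.to(torch.int64)

x = torch.linspace(-1.0, 1.0, steps=5)  # CPU tensor; the same code runs on CUDA
q = torch_quantize(x, scale=0.25, zero_point=0)
print(q.tolist())            # -> [-4, -2, 0, 2, 4]
print(q.device == x.device)  # -> True
```

The same pattern would apply to a GLWE backend that consumes torch.Tensors: as long as every step is a torch op, no explicit `.to(device)` bookkeeping is needed.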

@cla-bot cla-bot bot added the cla-signed label Dec 18, 2024
Base automatically changed from llama_fine_tuning to main December 19, 2024 15:33
@andrei-stoian-zama andrei-stoian-zama force-pushed the chore/optimize_mem_and_runtime_fhe_disable branch from 43601e1 to 2a326f3 Compare December 31, 2024 09:42
@andrei-stoian-zama andrei-stoian-zama marked this pull request as ready for review January 3, 2025 20:06
@andrei-stoian-zama andrei-stoian-zama requested a review from a team as a code owner January 3, 2025 20:06

github-actions bot commented Jan 4, 2025

⚠️ Known flaky tests have been rerun ⚠️

One or several tests initially failed but were identified as known flaky tests. They were therefore rerun and passed. See below for more details.

Failed tests details

Known flaky tests that initially failed:

  • tests/torch/test_compile_torch.py::test_compile_torch_or_onnx_conv_networks[False-True-CNN_conv1d-relu]


github-actions bot commented Jan 4, 2025

Coverage passed ✅

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL    8543      0   100%

63 files skipped due to complete coverage.

@@ -730,14 +730,18 @@ def _quantize_layers(self, *input_calibration_data: numpy.ndarray):
node_results[output_name] = node_output[0]
constants.add(output_name)

def quantize_module(self, *calibration_data: numpy.ndarray) -> QuantizedModule:
def quantize_module(
self, *calibration_data: numpy.ndarray, keep_onnx: Optional[bool] = True
Contributor

@kcelia kcelia Jan 6, 2025


Do you mean keep_onnx: Optional[bool] = None, or keep_onnx: bool = True? As written, Optional[bool] = True mixes the two conventions.
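The two consistent alternatives raised in this review comment can be illustrated with stub signatures (hypothetical stand-ins, not the real quantize_module): either a plain bool with a True default, or Optional[bool] defaulting to None with the real default resolved inside the function.

```python
from typing import Optional

import numpy

# Illustrative stubs, not the actual Concrete ML signatures.

# 1) Plain bool with a True default: callers always pass/receive a bool.
def quantize_module_v1(*calibration_data: numpy.ndarray, keep_onnx: bool = True) -> bool:
    return keep_onnx

# 2) Optional[bool] with a None default: None means "use the default",
#    which is resolved inside the function body.
def quantize_module_v2(
    *calibration_data: numpy.ndarray, keep_onnx: Optional[bool] = None
) -> bool:
    return True if keep_onnx is None else keep_onnx

print(quantize_module_v1())                  # -> True
print(quantize_module_v2())                  # -> True
print(quantize_module_v2(keep_onnx=False))   # -> False
```

Variant 2 is useful when a caller needs to distinguish "explicitly set" from "left at the default"; otherwise variant 1 is simpler.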

@@ -686,6 +691,8 @@ def quant(self, values: numpy.ndarray) -> numpy.ndarray:
assert self.offset is not None
assert self.scale is not None

assert dtype in (numpy.int64, numpy.int32, numpy.float32, numpy.float64)
Contributor

The reason will be displayed to describe this comment to others. Learn more.

maybe:

valid_dtypes = (numpy.int64, numpy.int32, numpy.float32, numpy.float64)
assert dtype in valid_dtypes, f"Invalid dtype: `{dtype}`. Expected one of {valid_dtypes}."
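The suggested assertion can be exercised standalone as follows (check_dtype is a hypothetical wrapper added here for illustration, not a Concrete ML function):

```python
import numpy

def check_dtype(dtype) -> None:
    # Reviewer-suggested form: name the valid dtypes once and include
    # them in the failure message for easier debugging.
    valid_dtypes = (numpy.int64, numpy.int32, numpy.float32, numpy.float64)
    assert dtype in valid_dtypes, f"Invalid dtype: `{dtype}`. Expected one of {valid_dtypes}."

check_dtype(numpy.int64)  # passes silently
try:
    check_dtype(numpy.float16)
except AssertionError as exc:
    print("rejected:", "float16" in str(exc))  # -> rejected: True
```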

NotImplementedError: GLWE backend deployment is not yet supported
Contributor

@kcelia kcelia Jan 6, 2025


~~There are 2 errors in this notebook:~~
NotImplementedError: GLWE backend deployment is not yet supported
AssertionError: assert self.private_key is not None

Collaborator Author


I fixed and re-ran this notebook in #969

@andrei-stoian-zama andrei-stoian-zama merged commit b446cb2 into main Jan 6, 2025
20 of 21 checks passed
@andrei-stoian-zama andrei-stoian-zama deleted the chore/optimize_mem_and_runtime_fhe_disable branch January 6, 2025 15:15
3 participants