Inference Checkpoints #4620
Conversation
Co-authored-by: Michael Wyatt <[email protected]>
Co-authored-by: Ammar Ahmad Awan <[email protected]>
Co-authored-by: Connor Holmes <[email protected]>
Co-authored-by: Masahiro Tanaka <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Create the inference parameter.
"""
param = InferenceParameter(core_param)
param._aux_attrs = kwargs
@cmikeh2, can you please clarify what the aux_attr is used for?
Are you maybe thinking about the scales that are required when adding quantization?
That's exactly right: scales for quantization, or any other metadata we create when transforming the parameter, can be stored as an auxiliary attribute. The declaration for something like that would be:
p = InferenceParameter.initialize(param, scales=scales)
assert torch.equal(p, param)
assert torch.equal(p.scales, scales)
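As a rough sketch (not the actual DeepSpeed implementation), the wrapper can be thought of as a torch.Tensor subclass that stashes those keyword arguments and re-exposes them as attributes; the class name and helper below are illustrative only:

import torch

class InferenceParameterSketch(torch.Tensor):
    # Illustrative tensor wrapper that carries auxiliary metadata such as scales.

    @classmethod
    def initialize(cls, core_param: torch.Tensor, **kwargs) -> "InferenceParameterSketch":
        param = core_param.as_subclass(cls)   # share storage, change only the Python type
        param._aux_attrs = kwargs             # keep the metadata alongside the tensor
        for name, value in kwargs.items():
            setattr(param, name, value)       # expose e.g. p.scales directly
        return param

weight = torch.randn(4, 4)
scales = torch.ones(4)
p = InferenceParameterSketch.initialize(weight, scales=scales)
assert torch.equal(p, weight)
assert torch.equal(p.scales, scales)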
def finalize(self) -> torch.Tensor:
    return self.params
    # return self.inference_model.transform_embed_param(self.params)
    print("EmbeddingParameter.finalize")
Do we want to remove the debugging code here?
""" | ||
dtype: str | ||
shape: Tuple[int, ...] | ||
strides: Tuple[int, ...] |
What is the stride used for here?
If we ever have some kind of non-contiguous parameter, this will make sure we return it correctly as-is rather than silently changing the underlying storage. It shouldn't be the common case, but it's low overhead to support and might prevent some weird bugs in the future.
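For a concrete picture of why the recorded strides matter, here is an illustrative example (the buffer and layout values below are made up): the metadata lets a non-contiguous parameter be rebuilt as a view over the flat checkpoint buffer instead of being silently made contiguous.

import torch

# One large flat allocation, standing in for the checkpoint buffer.
flat = torch.arange(12, dtype=torch.float32)

# Recorded metadata for a 3x4 parameter stored column-major, i.e. non-contiguous.
shape, strides = (3, 4), (1, 3)

# Rebuild a view with the recorded layout: no copy, no change of storage order.
param = torch.as_strided(flat, size=shape, stride=strides)
assert not param.is_contiguous()
assert param.shape == torch.Size(shape) and param.stride() == strides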
@cmikeh2, I see that you are flattening the parameters into one big allocated buffer, but I don't see where the saving of the data happens so that we can later load it! Is it as simple as ...
setattr(layer_container, p_name, None)
continue

dummy_tensor = torch.empty([], dtype=STR_TO_DTYPE[p_metadata.core_param.dtype])
Can we not just pass the string to figure out the dtype in the alloc_fn, rather than creating a dummy tensor with it?
Strings would also work. The dummy tensor has no storage, and I'd really prefer to just pass the torch.dtype, but I couldn't figure out the correct argument for that. The whole binding around blob is kind of ugly, but when I tried using UntypedStorage, which is supposedly the correct way, there were phantom allocations I was never able to track down.
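To make the trade-off concrete, here is a hedged sketch of the pattern being discussed; STR_TO_DTYPE and alloc_fn below are illustrative stand-ins, not the exact DeepSpeed definitions:

import torch

# Illustrative string-to-dtype table.
STR_TO_DTYPE = {
    "torch.float16": torch.float16,
    "torch.bfloat16": torch.bfloat16,
    "torch.float32": torch.float32,
}

def alloc_fn(num_elements: int, like: torch.Tensor) -> torch.Tensor:
    # Hypothetical allocation hook: the dummy tensor only communicates the dtype.
    return torch.empty(num_elements, dtype=like.dtype)

dummy = torch.empty([], dtype=STR_TO_DTYPE["torch.float16"])  # 0-dim, carries just the dtype
buffer = alloc_fn(1024, dummy)
assert buffer.dtype == torch.float16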
The save is exposed in deepspeed/inference/v2/engine_v2.py (line 222 at commit 0faea35).
The load codepath is here:
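Since both of those links point into the engine itself, here is a hedged sketch of the overall flatten-then-save / load-then-reslice flow they implement; the file names (params.pt, metadata.pkl) and metadata keys below are invented for illustration and are not the engine's actual on-disk format.

import os
import pickle
import torch

def save_checkpoint(path, flat_buffer, metadata):
    # Persist the single flattened parameter buffer plus per-parameter layout metadata.
    os.makedirs(path, exist_ok=True)
    torch.save(flat_buffer, os.path.join(path, "params.pt"))
    with open(os.path.join(path, "metadata.pkl"), "wb") as f:
        pickle.dump(metadata, f)

def load_checkpoint(path):
    # Reload the flat buffer, then rebuild each parameter as a strided view into it.
    flat_buffer = torch.load(os.path.join(path, "params.pt"))
    with open(os.path.join(path, "metadata.pkl"), "rb") as f:
        metadata = pickle.load(f)
    return {
        name: torch.as_strided(flat_buffer[m["offset"]:], m["shape"], m["strides"])
        for name, m in metadata.items()
    }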
policy = MistralPolicy(checkpoint_engine, model_config)
if os.path.exists(os.path.join(path, "ds_model_config.pkl")):

# Load metadata, for grabbing the policy name we'll have all ranks just check for
I think this part needs to be supported at a higher abstraction layer, as it is not related only to HF models and we should be able to use it with different checkpoint formats.
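One hedged way to read that suggestion (the class and method names below are illustrative, not existing DeepSpeed classes): hide the format behind a small checkpoint-engine interface so the policy only ever sees named tensors, regardless of whether they came from an HF directory or another checkpoint format.

from abc import ABC, abstractmethod
from typing import Iterable, Tuple
import torch

class CheckpointEngineSketch(ABC):
    # Illustrative interface: each on-disk format provides its own implementation.
    @abstractmethod
    def parameters(self) -> Iterable[Tuple[str, torch.Tensor]]:
        """Yield (name, tensor) pairs regardless of the underlying checkpoint format."""

class HFCheckpointEngineSketch(CheckpointEngineSketch):
    def __init__(self, model_dir: str) -> None:
        self.model_dir = model_dir

    def parameters(self) -> Iterable[Tuple[str, torch.Tensor]]:
        # Placeholder: read HF-format weights from self.model_dir and yield them.
        yield from ()

# A policy (e.g. MistralPolicy) would then accept any CheckpointEngineSketch, so
# restoring from a DeepSpeed-native snapshot or an HF directory looks the same.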
Merged 19b2587 into cholmes/checkpoints-inference-v2-2