[VM][DMLC] Lower memory usage when loading and dumping weights #13877
Conversation
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.
Generated by tvm-bot
The approach of having an overload file support util is fine; one thing is that it would need to be part of the runtime folder, as it is simple enough. Given most of the cases are on GPU, having the ability to load one array into CPU, copy it into GPU, then immediately free that CPU array can also be effective.
@tqchen thanks for the comments. PTAL, ready for review.
Thanks @AndrewZhaoLuo, one minor comment.
* initial commit * update additional use cases * typo * asf header, summary * clean up * lint * move code to src/runtime/file_utils.h * file utils is cool
Right now there is a bad pattern in the VM executable: when loading weights, we load the entire serialized representation into memory and then deserialize from that in-memory store without progressively freeing memory.
This is bad because if our weights take up ~5 GB, the serialized representation in memory takes up 5 GB and the deserialized representation takes ~5 GB too. This means peak memory use when running the VM is roughly 2x the size of the model's weights.
This is especially painful with some of the larger models out there today.
This PR fixes that by streaming from disk, relying on the standard C file interface to buffer reads for good performance.
Some before and after graphs from loading and benchmarking a model with ~5 GB of weights:
Before:
After:
This is a draft since: