
[VM][DMLC] Lower memory usage when loading and dumping weights #13877

Merged 8 commits into apache:main on Feb 2, 2023

Conversation

AndrewZhaoLuo
Contributor

Right now there is a bad pattern in the VM executable: when loading weights, we read the entire serialized representation into memory and then deserialize from that in-memory store without progressively freeing it.

This is wasteful: if the weights take up ~5 GB, the serialized representation in memory occupies ~5 GB and the deserialized representation occupies another ~5 GB, so peak memory use when running the VM is roughly twice the size of the model weights.

This is especially problematic with some of the larger models in use today.

This PR fixes the issue by streaming directly from disk, relying on the standard C file interface to buffer reads and writes for good performance.

Some before-and-after graphs from loading and benchmarking a model with ~5 GB of weights:

Before: (memory usage graph)

After: (memory usage graph)

This is a draft since:

  • I've only tested loading weights, but we should see similar savings in the other, similar streaming paths.
  • We need to make a decision on the DMLC stream interface. The main issue is that a lot of existing code depends on it, but DMLC is included as a header-only library, so in the current state we only have access to in-memory streams. I have worked around this by implementing a simple file-backed stream class (a rough sketch is included after this list).
  • We need to decide on the best way forward. The approach in this PR is simple, though it technically duplicates some code from the DMLC core library.
  • Alternatives are including DMLC core as a full dependency, adding the functionality to DMLC and pulling in those changes, or getting rid of the DMLC stream interface entirely.
  • The approach here is the simplest, which is why I use it for this draft.
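
For context, here is a minimal sketch of what such a file-backed wrapper can look like. This is not the exact code merged in this PR; the class name and error handling are illustrative. It implements the two pure-virtual methods of dmlc::Stream (Read/Write) on top of C stdio so serialization can go straight to or from disk:

```cpp
#include <dmlc/io.h>

#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical file-backed stream: lets (de)serialization run directly against
// the file, relying on stdio's buffering instead of an in-memory blob.
class FileBackedStream : public dmlc::Stream {
 public:
  FileBackedStream(const std::string& path, const char* mode) {
    fp_ = std::fopen(path.c_str(), mode);
    if (fp_ == nullptr) {
      throw std::runtime_error("Unable to open " + path);
    }
  }
  ~FileBackedStream() override { Close(); }

  // Read up to `size` bytes from the file; stdio decides how much to buffer.
  size_t Read(void* ptr, size_t size) override {
    return std::fread(ptr, 1, size, fp_);
  }
  // Write `size` bytes straight through to the file.
  void Write(const void* ptr, size_t size) override {
    if (std::fwrite(ptr, 1, size, fp_) != size) {
      throw std::runtime_error("Write failed");
    }
  }
  void Close() {
    if (fp_ != nullptr) {
      std::fclose(fp_);
      fp_ = nullptr;
    }
  }

 private:
  std::FILE* fp_ = nullptr;
};
```

Save and load paths that accept a dmlc::Stream* can then consume the file incrementally instead of materializing the whole serialized blob in memory first.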

@tvm-bot
Collaborator

tvm-bot commented Jan 31, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: vm, dmlc. See #10317 for details.

Generated by tvm-bot

@AndrewZhaoLuo AndrewZhaoLuo changed the title [VM] Lower memory usage when loading and dumping weights [VM][DMLC] Lower memory usage when loading and dumping weights Jan 31, 2023
@AndrewZhaoLuo
Contributor Author

cc @tqchen @jwfromm

@tqchen
Member

tqchen commented Jan 31, 2023

The approach of having an overloaded file-support util is fine; one thing is that it would need to be part of the runtime folder, as it is simple enough.

Given that most of the use cases are on GPU, having the ability to load one array onto the CPU, copy it to the GPU, and then immediately free that CPU array could also be effective.
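
A rough sketch of that staging pattern (assumed helper name; this is not code from this PR), using the TVM runtime NDArray API:

```cpp
#include <tvm/runtime/ndarray.h>

// Hypothetical staging helper: deserialize one array into host memory, copy it
// onto the GPU, and let the caller drop the host copy before loading the next
// array, so peak host memory stays around the size of a single tensor.
tvm::runtime::NDArray StageToGPU(const tvm::runtime::NDArray& cpu_arr) {
  DLDevice gpu_dev{kDLCUDA, 0};
  tvm::runtime::NDArray gpu_arr =
      tvm::runtime::NDArray::Empty(cpu_arr.Shape(), cpu_arr.DataType(), gpu_dev);
  gpu_arr.CopyFrom(cpu_arr);  // host-to-device copy
  return gpu_arr;  // once cpu_arr goes out of scope, the host buffer is freed
}
```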

@AndrewZhaoLuo AndrewZhaoLuo marked this pull request as ready for review February 1, 2023 18:41
@AndrewZhaoLuo
Contributor Author

@tqchen thanks for the comments.

PTAL, ready for review.

Member

@tqchen tqchen left a comment


Thanks @AndrewZhaoLuo, one minor comment.

include/tvm/runtime/dmlc_file_stream.h (review thread outdated and resolved)
@AndrewZhaoLuo AndrewZhaoLuo merged commit 9008ec2 into apache:main Feb 2, 2023
AndrewZhaoLuo added a commit that referenced this pull request Feb 10, 2023
* initial commit

* update additional use cases

* typo

* asf header, summary

* clean up

* lint

* move code to src/runtime/file_utils.h

* file utils is cool