[OpenCLML] Reactor and introduce on chip memory and memory planner #14922

Merged on Jun 5, 2023 (5 commits)


srkreddy1238
Contributor

Introduced a thread context with CLMLWorkspace.
Organized the code into runtime, utils, and memory planners. Introduced recording queue support and on-chip memory support. Added an on-chip memory allocation planner that accommodates multiple tensors at a time. Introduced a DDR memory planner that reuses the underlying memory across multiple tensor descriptors.
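
To illustrate the planning idea described above, here is a minimal sketch. It is not the planner added in this PR. It assumes each tensor has a byte size and a live range in layer execution order; `TensorReq`, `Placement`, and `PlanOnChip` are hypothetical names. Tensors are placed first-fit into a fixed-capacity region, tensors with disjoint live ranges may share offsets, and anything that does not fit is marked as falling back to DDR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical request: tensor size and the inclusive range of layer
// indices during which the tensor must stay resident.
struct TensorReq {
  std::size_t size;
  int first_use;
  int last_use;
};

// Per-tensor result; on_chip == false means "fall back to DDR".
struct Placement {
  bool on_chip;
  std::size_t offset;
};

inline bool LiveRangesOverlap(const TensorReq& a, const TensorReq& b) {
  return a.first_use <= b.last_use && b.first_use <= a.last_use;
}

// Greedy first-fit placement into a region of `capacity` bytes.
std::vector<Placement> PlanOnChip(const std::vector<TensorReq>& reqs,
                                  std::size_t capacity) {
  std::vector<Placement> plan(reqs.size(), Placement{false, 0});
  for (std::size_t i = 0; i < reqs.size(); ++i) {
    assert(reqs[i].first_use <= reqs[i].last_use);
    // Byte ranges [begin, end) held by already placed tensors that are
    // live at the same time as tensor i.
    std::vector<std::pair<std::size_t, std::size_t>> busy;
    for (std::size_t j = 0; j < i; ++j) {
      if (plan[j].on_chip && LiveRangesOverlap(reqs[i], reqs[j])) {
        busy.emplace_back(plan[j].offset, plan[j].offset + reqs[j].size);
      }
    }
    std::sort(busy.begin(), busy.end());
    // Walk the gaps between busy ranges, starting at offset 0.
    std::size_t cursor = 0;
    for (const auto& range : busy) {
      if (range.first >= cursor && range.first - cursor >= reqs[i].size) break;
      cursor = std::max(cursor, range.second);
    }
    if (cursor + reqs[i].size <= capacity) {
      plan[i] = Placement{true, cursor};
    }
  }
  return plan;
}
```

A DDR planner can follow the same live-range reasoning, with the difference that it can always allocate a new backing buffer when no existing one is free.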

Dense layer support refactored to use GEMM.
CLML binary operators don't support broadcasting, so an explicit broadcast op is introduced as a workaround.
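
The workaround can be pictured as follows. This is a sketch under assumptions, not CLML API code: `BroadcastTo` and `AddSameShape` are hypothetical helpers operating on row-major host buffers, and shapes are aligned from the trailing dimension in the NumPy style. The smaller operand is first materialized at the full output shape, so the binary operator only ever sees operands of identical shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Materialize `src` (row-major, shape `src_shape`) at shape `dst_shape`.
// Each source dimension must equal the target dimension or be 1.
std::vector<float> BroadcastTo(const std::vector<float>& src,
                               const std::vector<std::size_t>& src_shape,
                               const std::vector<std::size_t>& dst_shape) {
  assert(src_shape.size() <= dst_shape.size());
  const std::size_t rank = dst_shape.size();
  const std::size_t pad = rank - src_shape.size();
  // Source stride per target dimension; 0 for broadcast or missing dims.
  std::vector<std::size_t> stride(rank, 0);
  std::size_t running = 1;
  for (std::size_t d = rank; d-- > pad;) {
    const std::size_t extent = src_shape[d - pad];
    assert(extent == dst_shape[d] || extent == 1);
    stride[d] = (extent == 1) ? 0 : running;
    running *= extent;
  }
  assert(src.size() == running);

  std::size_t total = 1;
  for (std::size_t e : dst_shape) total *= e;
  std::vector<float> dst(total);
  std::vector<std::size_t> idx(rank, 0);
  for (std::size_t flat = 0; flat < total; ++flat) {
    std::size_t src_off = 0;
    for (std::size_t d = 0; d < rank; ++d) src_off += idx[d] * stride[d];
    dst[flat] = src[src_off];
    // Advance the multi-index, last dimension fastest.
    for (std::size_t d = rank; d-- > 0;) {
      if (++idx[d] < dst_shape[d]) break;
      idx[d] = 0;
    }
  }
  return dst;
}

// Stand-in for a binary operator that requires identical shapes.
std::vector<float> AddSameShape(const std::vector<float>& a,
                                const std::vector<float>& b) {
  assert(a.size() == b.size());
  std::vector<float> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}
```

The cost of this approach is an extra full-size intermediate tensor per broadcast operand.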

The CLML SDK codegen is enhanced accordingly.

@tvm-bot
Collaborator

tvm-bot commented May 23, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

  • No users to tag found in teams: openclml. See #10317 for details.

Generated by tvm-bot

@srkreddy1238 changed the title from "Reactor and introduce on chip memory and memory planner" to "[OpenCLML] Reactor and introduce on chip memory and memory planner" on May 23, 2023
@srkreddy1238 force-pushed the clml_rqueues_on_chip branch 4 times, most recently from 9da9bd2 to 676d4e6, on May 24, 2023 10:05
@srkreddy1238 force-pushed the clml_rqueues_on_chip branch from 676d4e6 to 3bf7e63 on May 24, 2023 14:22
@srkreddy1238 requested a review from echuraev on May 25, 2023 03:49
Contributor

@echuraev echuraev left a comment

Is it necessary to add new tests for the memory planner?

python/tvm/relay/op/contrib/clml.py (outdated, resolved)
python/tvm/relay/op/contrib/clml.py (resolved)
src/runtime/contrib/clml/clml_memory_planner.cc (outdated, resolved)
src/runtime/contrib/clml/clml_memory_planner.cc (outdated, resolved)
src/runtime/contrib/clml/clml_utils.cc (outdated, resolved)
src/runtime/contrib/clml/clml_runtime.cc (outdated, resolved)
@srkreddy1238
Contributor Author

Is it necessary to add new tests for the memory planner?

We definitely need a few test cases. Let me find a way to expose the plan so that it can be verified externally.

@echuraev
Contributor

You can probably take a look at the OpenCL tests: https://github.com/apache/tvm/blob/main/tests/cpp-runtime/opencl/opencl_texture_pool_test.cc

@srkreddy1238
Contributor Author

@echuraev have you ever built gtests (opencl-cpptest bin) for Android?

@srkreddy1238 force-pushed the clml_rqueues_on_chip branch from 3e02f6e to ffb3f82 on May 31, 2023 07:09
@echuraev
Contributor

echuraev commented May 31, 2023

@echuraev have you ever built gtests (opencl-cpptest bin) for Android?

Yes, I have. You can build opencl-cpptest with the following commands:

$ cmake -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK}/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_NATIVE_API_LEVEL=android-23 \
    -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON \
    -DANDROID_STL=c++_static \
    -DUSE_OPENCL=ON \
    -DUSE_CPP_RPC=ON \
    -DUSE_HEXAGON_SDK=OFF \
    -DUSE_OPENCL_GTEST=/path/to/googletests \
    ..

$ make -j8 opencl-cpptest

@srkreddy1238 requested a review from echuraev on June 1, 2023 02:12
cmake/modules/OpenCL.cmake (resolved)
tests/cpp-runtime/opencl/clml_memory_planner.cc (outdated, resolved)
tests/cpp-runtime/opencl/clml_memory_planner.cc (outdated, resolved)
using namespace tvm::runtime;
using namespace tvm::runtime::cl;

void InitMemoryPlan(tvm::runtime::contrib::CachedLayer& layer) {
Contributor

I see that the CLMLRuntime class has almost the same methods. Would it be better to test those directly? You could change private to protected and have your test class inherit from CLMLRuntime, so that it acts as a test wrapper around the CLMLRuntime class.
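
The suggested pattern could look roughly like the sketch below. The names are hypothetical stand-ins: `PlannerRuntime` plays the role of the runtime class and is not CLMLRuntime, and its toy planning logic exists only to give the wrapper something to call. The point is the access change from private to protected plus a derived class that re-exports those members for the test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in runtime class. The planning helpers would originally be
// `private`; making them `protected` lets a test subclass reach them.
class PlannerRuntime {
 public:
  virtual ~PlannerRuntime() = default;

 protected:
  void InitMemoryPlan(std::size_t on_chip_bytes) {
    free_bytes_ = on_chip_bytes;
    placed_.clear();
  }

  // Toy logic: reserve `bytes` if they still fit, otherwise refuse.
  bool PlanTensor(std::size_t bytes) {
    assert(bytes > 0);
    if (bytes > free_bytes_) return false;
    free_bytes_ -= bytes;
    placed_.push_back(bytes);
    return true;
  }

  std::size_t free_bytes_ = 0;
  std::vector<std::size_t> placed_;
};

// Test wrapper: inherits from the runtime and re-exports the protected
// members as public so a test body can call and inspect them.
class PlannerRuntimeTestWrapper : public PlannerRuntime {
 public:
  using PlannerRuntime::InitMemoryPlan;
  using PlannerRuntime::PlanTensor;
  std::size_t FreeBytes() const { return free_bytes_; }
  std::size_t PlacedCount() const { return placed_.size(); }
};
```

In a googletest case, the test body would construct the wrapper, call the re-exported methods, and check the exposed state with `EXPECT_*` macros. This only works when the runtime class can be constructed without its full set of dependencies.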

Contributor Author

Only InitMemoryPlan can be reused; PlanMemory depends on JSONRuntime nodes. Initializing CLMLRuntime here requires the JSON graph and its dependencies. Hence I tried to isolate the test environment within the `CachedLayer` object. Let me see how much I can reuse from CLMLRuntime.

Contributor

@echuraev echuraev left a comment

LGTM. Thanks!

@echuraev merged commit 1366f2e into apache:main on Jun 5, 2023
junrushao pushed a commit to junrushao/tvm that referenced this pull request on Jun 22, 2023
…pache#14922)

* Reactor and introduce in chip memory and memory planner

* * review comments

* * Memory planner cpp_runtime tests.

* * gtest build rules while in android environments.

* * review comments

---------

Co-authored-by: Siva Rama Krishna Reddy B <[email protected]>

3 participants