Add Relay option to link parameters into runtime Modules #6917

areusch · 2020-11-13T21:55:40Z

Implements RFC: https://discuss.tvm.apache.org/t/rfc-linked-parameters-for-cpu-targets/8452

This PR adds support for generating Modules that contain pre-linked parameters. For CPU contexts, these modules don't require any copying before launching inference via GraphRuntime. Both the C++ and C runtimes are supported.

* Some platforms need to use an alternate printf() to support basic things like %zu. Since %zu is platform-specific, we prefer to use a printf() that supports it or allow the platform to fix it up as needed.

areusch · 2020-11-23T23:56:04Z

@kparzysz-quic addressed those comments.
@manupa-arm @tqchen @ZihengJiang please take another look when you have a minute and explicitly approve if you're ok with this change.

tqchen · 2020-11-24T00:51:49Z

src/target/source/codegen_params.cc

+      os.fill(' ');
+      os.setf(std::ios::left, std::ios::adjustfield);
+      if (arr_type.bits() == 32) {
+        PrintArray<float>(tensor->dl_tensor.data, num_elements, one_element_size_bytes,


I still think it is easier to print floats as uint in most of our cases. It also helps us get around the problem of printing the exponent and mantissa.

I mean, the printing problem is just an alignment thing; it doesn't affect the functionality. I don't see why adding an implicit cast to generated code that the compiler will never even know about is easier. Remember, we are not going to add the cast into the generated function that accesses these floats: it is just going to get a float*, and by generating uint32_t here, that float* will magically access uint32_t data. I agree with you that it doesn't seem like anything would break too badly given they are the same size, but it's kind of a long limb to go out on just to support bfloat16. We would also need to handle byte ordering.

To put it another way: we will need support for bfloat16 and fp16 now, given that they are already part of the type system. Additionally, there might also be a need to support customized data types.

The main reasons to go for the uint repr are its bit-accurate precision (we don't need to worry about printing out INF, NaN, or loss of accuracy due to printing and parsing of floating point; this is partly addressed by hex printing, but still somewhat complicated) and its simplicity of implementation.

Additionally, for big-endian and little-endian machines we are already using the bits as the only information in our runtime https://github.com/apache/tvm/blob/main/include/tvm/runtime/ndarray.h#L429, which means our runtime simply won't work beyond such cases. Right now we also have not seen an example that goes beyond these two cases.

If we do think it is important to keep float and double printing as their own types, I think we could do that, but we need to make sure the rest of the codepath uses uint so that it works for bfloat16, fp16 and other potential extension data types now.

okay, as discussed i've added support for float16 and bfloat16. left the rest of the float printing as it was before.
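To make the "print via uint" argument above concrete, here is a minimal sketch of bit-exact float printing. The helper names (`FloatToBits`, `BitsToFloat`, `FloatBitsHex`) are hypothetical and not part of TVM, and the sketch assumes a 32-bit IEEE 754 `float`. It only illustrates why the uint representation sidesteps INF/NaN formatting and decimal round-trip loss; it is not the code that was merged.

```cpp
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not a TVM function): copy the 32 bits of a float into
// a uint32_t. memcpy avoids the undefined behavior of pointer type punning.
inline uint32_t FloatToBits(float value) {
  static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Inverse of FloatToBits: rebuild the float from its bit pattern.
inline float BitsToFloat(uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Format the bit pattern as a fixed-width hex literal, suitable for a
// generated uint32_t array initializer.
inline std::string FloatBitsHex(float value) {
  std::ostringstream os;
  os << "0x" << std::hex << std::setfill('0') << std::setw(8) << FloatToBits(value);
  return os.str();
}
```

Because the value is never converted to decimal text, special values such as infinity need no separate handling. The trade-off raised in the reply above still applies: the consumer reads the data through a `float*`, so producer and target must agree on byte order.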

tqchen · 2020-11-24T00:53:46Z

I still think we can print all values by their bits and print via uint, unless a very strange endianness is being used. It would also help us support types that are not natively supported by LLVM, like bfloat16.

areusch · 2020-11-24T01:10:28Z

@tqchen it seems like the main benefit is supporting bfloat16...i'd like to argue that's out of scope for this PR. further, we don't really have a big-endian board handy to test against anyway. I don't think that change would take a long time to implement, but we don't have a diverse enough test fleet now to have good confidence in it.

manupak

I did a second pass. Thanks for breaking out some of it as a separate PR.
Some comments and some of it might end up being clarifications.

manupak · 2020-11-24T11:52:22Z

src/target/source/codegen_c_host.cc

+           << "        ((uint64_t*)out_ret_value)[0] = (uint64_t) (uintptr_t) "
+           << ::tvm::runtime::symbol::tvm_param_prefix << kv.first << ";\n"
+           << "        out_ret_tcode[0] = " << kTVMOpaqueHandle << ";\n"
+           << "        return 0;\n";


Don't we need a default case here, probably to give a meaningful error?

it would be nice to return a specific error code here (this was my original implementation), but unfortunately it's not easy to catch on the other side. we need to do a bit more legwork to be able to catch specific function return values in c++/python. as it stands, returning nullptr from this function is specific enough, and has the benefit that an exception flow isn't triggered by default for non-parameters.
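The shape of the lookup being discussed can be sketched as follows. Everything here is an assumption for illustration: the function name `LookupLinkedParam`, the type-code constants, the array `kParamP0`, and in particular the body of the `default` branch, which is only one reading of "returning nullptr" in the reply above and is not taken from the generated code.

```cpp
#include <cstdint>

// Hypothetical type codes; the real values live in the TVM runtime headers.
constexpr int kHandleCode = 3;
constexpr int kNullCode = 4;

// Hypothetical linked parameter living in global const memory.
static const float kParamP0[4] = {0.5f, 1.5f, 2.5f, 3.5f};

// Sketch of a generated lookup: known ids yield a pointer to the linked
// array, unknown ids yield a null value instead of an error code, so
// callers can probe for parameters without triggering an exception path.
inline int LookupLinkedParam(int64_t id, uint64_t* out_ret_value, int* out_ret_tcode) {
  switch (id) {
    case 0:
      out_ret_value[0] = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(kParamP0));
      out_ret_tcode[0] = kHandleCode;
      return 0;
    default:
      // Assumed behavior: report "no such parameter" as a null handle.
      out_ret_value[0] = 0;
      out_ret_tcode[0] = kNullCode;
      return 0;
  }
}
```

A caller would check the returned type code (or a null pointer) to distinguish a missing parameter from a present one, which keeps the return value free to signal genuine failures.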

manupak · 2020-11-24T11:54:20Z

src/target/source/codegen_c_host.cc

+                << "}  // extern \"C\"\n"
+                << "#endif\n";
+    stream << "    case " << kv.second->id << ":\n"
+           << "        ((uint64_t*)out_ret_value)[0] = (uint64_t) (uintptr_t) "


Looks like a copy is introduced (correct me if I'm wrong), which is something we would want to avoid if possible on memory-constrained devices.

it's just copying the pointer here. I believe that should be ok?

Oh ok. So kv.first is the array's name, right? Any reason why it is cast to uint64_t?

correct. ordinarily we would cast to void*, but since void* may be less than 64 bits, we cast to uint64_t which is the largest type stored in TVMValue.

I still think void* should be enough here. void* would be less than 64 bits only if the target machine is less than 64 bits, right? Otherwise this stores the pointer in a 64-bit space even where the machine's address space is narrower than 64 bits, unless I'm missing something here. In this specific case, the array defined for the constant will have an address that depends on the target machine it is compiled for, hence my confusion.

Do we need to perform pointer arithmetic on this? I'm asking because the uintptr_t cast is used. If so, using uintptr_t should be sufficient instead of void*.

I guess this argument breaks down if TVMValue already uses 64-bit slots for addresses irrespective of what we do here.

TVMValue is always 64 bits wide, though, because it's a union including int64_t. so when addressing it as an array, need to cast appropriately. i'm not sure if the uintptr_t cast is excessive--quite possibly it is.

Alright then. I will take your word for it. :) See if you can drop uintptr_t if it is not required, given that it is going to be written to a 64-bit-wide location anyway.
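The width question in this thread can be illustrated with a small sketch. `Value64` below is a stand-in union invented for this example, not the real TVMValue definition, and `StorePointer`/`LoadPointer` are hypothetical helpers. The point is only that a pointer of any native width can be widened through `uintptr_t` into a 64-bit slot and recovered unchanged.

```cpp
#include <cstdint>

// Stand-in for a 64-bit-wide value union (an assumption for illustration,
// not the real TVMValue). The int64_t member forces the size to 8 bytes
// even when pointers are 32 bits wide.
union Value64 {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
};
static_assert(sizeof(Value64) == 8, "the slot is 64 bits wide");

// Store a pointer into the 64-bit slot. uintptr_t holds the address at its
// native width; widening to uint64_t zero-extends it on 32-bit targets.
inline void StorePointer(Value64* out, const void* ptr) {
  const uint64_t wide = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(ptr));
  out->v_int64 = static_cast<int64_t>(wide);
}

// Recover the pointer by narrowing back to the native pointer width.
inline const void* LoadPointer(const Value64& in) {
  const uint64_t wide = static_cast<uint64_t>(in.v_int64);
  return reinterpret_cast<const void*>(static_cast<uintptr_t>(wide));
}
```

Going through `uintptr_t` first is what keeps the conversion portable: casting a 32-bit pointer directly to a 64-bit integer type commonly draws compiler warnings about differing sizes, whereas integer-to-integer widening is always well defined.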

manupak · 2020-11-24T12:09:12Z

src/target/source/codegen_params.cc

+  }
+}
+
+void NDArrayDataToC(::tvm::runtime::NDArray arr, int indent_chars, std::ostream& os) {


It would be great if a brief example could be added to show how a simple NDArray gets converted and how it appears in the generated source.

documented this function in codegen_params.h and added an example there.

manupak · 2020-11-24T12:30:41Z

src/target/source/codegen_params.cc

+  }
+}
+
+void NDArrayDataToC(::tvm::runtime::NDArray arr, int indent_chars, std::ostream& os) {


[Clarification] Can we use a raw binary string (e.g. "\xeb\x2a") to represent the content here? Do we need to show the element-wise breakdown? I would reckon DLTensors would take it as a void* anyway, right?

we could. the main thing this drags in is whether we should be determining things like the byte order and alignment of the target machine, or leave that to the compiler (LLVM or in this case, external C). i've been so far arguing not to do any of this--I don't really see a tangible benefit and it adds considerable complexity at this level that's hard to test. also, it makes it quite easy to inspect the generated parameters, which will often differ from the parameters originally supplied to relay.build().
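As a rough illustration of the element-wise form argued for above, the sketch below emits the body of a C array initializer from typed values. `EmitInt32Elements` is a hypothetical function written for this example; it is not NDArrayDataToC and does not reproduce its exact output format.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical emitter (not the actual NDArrayDataToC): writes `count` int32
// values as a comma-separated initializer body, `per_line` values per row,
// each row indented by `indent_chars` spaces.
inline std::string EmitInt32Elements(const int32_t* data, size_t count, int indent_chars,
                                     size_t per_line = 4) {
  if (per_line == 0) per_line = 1;  // avoid dividing by zero below
  std::ostringstream os;
  const std::string indent(static_cast<size_t>(indent_chars), ' ');
  for (size_t i = 0; i < count; ++i) {
    if (i % per_line == 0) os << indent;      // start of a new row
    os << data[i];
    if (i + 1 < count) os << ",";             // no trailing comma after the last value
    const bool end_of_row = ((i + 1) % per_line == 0) || (i + 1 == count);
    os << (end_of_row ? "\n" : " ");
  }
  return os.str();
}
```

Because the output consists of typed literals rather than raw bytes, the downstream C compiler decides byte order and alignment for the target, and the generated values stay readable when inspecting the source.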

tqchen

The NDArray deletion part LGTM; see the further discussion on the printing repr and whether or not we should use uint.

tqchen · 2020-11-24T16:12:14Z

src/target/source/codegen_params.cc

+    num_elements *= shape_elem;
+  }
+
+  std::unique_ptr<DLManagedTensor, DLManagedTensorDeleter> tensor(arr.ToDLPack());


We don't need ToDLPack here; we can get the DLTensor* directly from arr.operator->

tqchen · 2020-11-25T14:07:22Z

src/runtime/graph/graph_runtime.cc

+void GraphRuntime::LinkedNDArrayDeleter(Object* container) {
+  // container is the NDArray::Container which needs to get deleted.
+  // The data member points to global const memory, so it does not need deleting.
+  delete reinterpret_cast<NDArray::Container*>(container);


This should be a static_cast instead. Since NDArray::Container is a subclass of Object, reinterpret_cast can cause a bug when the Container and Object classes have different beginning offsets due to subclassing and layout.

ah, thanks. done.
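The offset issue behind this comment can be shown with a small standalone sketch. The types below (`Header`, `Base`, `Derived`) are invented for illustration and use multiple inheritance purely to make the offset visible; nothing here is claimed about the actual layout of Object or NDArray::Container.

```cpp
#include <cstddef>
#include <cstdint>

struct Header { int64_t tag = 0; };    // first base, placed at the start on common ABIs
struct Base { int ref_count = 0; };    // plays the role of the type the deleter receives
struct Derived : Header, Base { int payload = 0; };

// static_cast knows the class layout, so it undoes the pointer adjustment
// that the implicit Derived* -> Base* conversion applied.
inline bool StaticCastRecoversDerived() {
  Derived d;
  Base* base = &d;
  return static_cast<Derived*>(base) == &d;
}

// Byte distance between the start of the object and its Base subobject.
// reinterpret_cast<Derived*>(base) would be off by exactly this amount,
// because it reuses the address without any adjustment.
inline std::ptrdiff_t BaseSubobjectOffset() {
  Derived d;
  Base* base = &d;
  return reinterpret_cast<const char*>(base) - reinterpret_cast<const char*>(&d);
}
```

When the offset is zero the two casts happen to produce the same address, which is why a `reinterpret_cast` can appear to work until a class layout changes; `static_cast` stays correct in both situations.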

manupak

Mostly LGTM bar others' pending comments.

tqchen · 2020-11-25T21:45:00Z

LGTM now, @FrozenGene please take another look

tqchen · 2020-11-26T14:09:59Z

Thanks @areusch @ZihengJiang @FrozenGene @kparzysz-quic @manupa-arm

areusch added 22 commits November 13, 2020 10:38

refactor RPCSessionContext utils

40e5672

Make TVMLogf platform-independent.

99ef7e4

* Some platforms need to use an alternate printf() to support basic things like %zu. Since %zu is platform-specific, we prefer to use a printf() that supports it or allow the platform to fix it up as needed.

test pass, make runtime part work (wip)

b9db147

llvm and c backends work!

8d62592

switch to floating point hex

e0259b0

c backend works works

cb7c001

crt tests work

bbdfd3d

CRT works!

1afa10e

make stm repo work (half done)

b85d90f

works-ish on micro

bbb6e80

final changes for link-params

6e19b25

missed stuff

22a587c

git-clang-format

f7b15b7

black format

ef6e14f

git-clang-format again

6d6aa66

address c++ lint

261eda7

git-clang-format

c0d2c0d

rm extra comments

601616a

git-clang-format

cf22894

pylint

ad5837e

Merge remote-tracking branch 'origin/main' into linked-params

69f127b

pylint again

154bf5f

tqchen added the status: need review label Nov 14, 2020

areusch added 7 commits November 13, 2020 17:16

rm debugging breaking build

4d9fc2e

fix incorrect parameter passing in GraphRuntimeModule

891ccf5

fixes for LLVM 4.0 and i386

df132fa

set default for --link-params

8bb51e4

switch link order for proper library symbol resolution

03432d2

git-clang-format

b13472a

black format + pylint

02d9744

areusch added 4 commits November 23, 2020 15:10

address kparzysz comments

e4296ef

git-clang-format

a05871f

cpplint

dd862fc

fix compile bugs on linux

b08e24f

tqchen requested changes Nov 24, 2020

FrozenGene requested changes Nov 24, 2020

areusch added 2 commits November 23, 2020 20:45

revert pyproject, address tqchen, kparzysz comments

bcbeda4

git-clang-format

bf42077

manupak reviewed Nov 24, 2020

tqchen requested changes Nov 24, 2020

areusch added 3 commits November 24, 2020 13:48

address tqchen, others' comments

883c878

git-clang-format

4400a34

remove fls, which isn't widely available

f53c2e2

tqchen requested changes Nov 25, 2020

address tqchen comments

754cf35

manupak approved these changes Nov 25, 2020

tqchen approved these changes Nov 25, 2020

FrozenGene approved these changes Nov 26, 2020

tqchen added status: accepted and removed status: need review labels Nov 26, 2020

tqchen merged commit 81d9f11 into apache:main Nov 26, 2020
