[Hexagon] Run single RPC server on Android in each testing session #11547

mehrdadh · 2022-06-02T17:45:51Z

This PR changes the Hexagon launcher in the pytest configuration to reuse the existing RPC server on the Android side, so that a new RPC server is created only for the first test.
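The reuse described here amounts to starting the server lazily and handing the same instance to every later test. Below is a minimal, dependency-free sketch of that pattern. `FakeRPCServer` and `get_server` are hypothetical names used only for illustration and are not part of TVM; in pytest the same effect is usually obtained with a fixture declared with `scope="session"`.

```python
from typing import Optional


class FakeRPCServer:
    """Hypothetical stand-in for the RPC server running on the Android side."""

    started_count = 0  # counts how many servers were ever started

    def __init__(self, serial_number: str):
        self.serial_number = serial_number
        FakeRPCServer.started_count += 1


_server: Optional[FakeRPCServer] = None


def get_server(serial_number: str) -> FakeRPCServer:
    """Start the server on first use and reuse it afterwards."""
    global _server
    if _server is None:
        _server = FakeRPCServer(serial_number)
    return _server


# Three "tests" ask for a server; only the first request starts one.
servers = [get_server("device-0") for _ in range(3)]
```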

cc @Lunderberg @kparzysz-quic

kparzysz-quic

There seem to be several things going on at the same time. Ideally, each self-contained change would go into its own PR, but if you want to combine them into one, could you list them in more detail in the PR description?
My two questions here are (1) whether we can do the same thing for hw and simulator, and (2) whether del is a reliable way to close an RPC session.

kparzysz-quic · 2022-06-03T20:49:38Z

python/tvm/contrib/hexagon/pytest_plugin.py

+    android_serial_number,
+) -> HexagonLauncherRPC:
+    """Initials and returns hexagon launcher which reuses RPC info and Android serial number."""
+    if hexagon_server_process._serial_number != "simulator":


Why is there a difference here between hw and simulator?

I think the simulator depends more heavily on having all the execution files in the same workspace, and since it was already working fine, I tried not to change it. We could also change that in this PR or in a follow-up PR.

kparzysz-quic · 2022-06-03T20:52:31Z

python/tvm/contrib/hexagon/session.py

@@ -93,7 +93,8 @@ def __enter__(self):
            raise exception

    def __exit__(self, exc_type, exc_value, exc_traceback):
-        pass
+        # close session to the tracker
+        del self._rpc


I'm not sure how well this works in Python for actually destroying objects. It will unbind the name, but I don't think there is any guarantee that the underlying object will actually be destroyed. @Lunderberg: do you have any opinions about this?

Assuming that this is the only reference to self._rpc, this would destroy the object, because CPython uses a reference count in addition to generational garbage collection. That said, that's more of an implementation detail, and is pretty fragile to rely upon.
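To illustrate the point about reference counting, here is a small sketch of what `del` on an attribute does and does not do. `Connection` and `Holder` are hypothetical stand-ins, not TVM classes, and the immediate cleanup in the first case is CPython behavior rather than a language guarantee.

```python
import gc
import weakref


class Connection:
    """Hypothetical stand-in for an RPC session object."""


class Holder:
    def __init__(self):
        self._rpc = Connection()


# Case 1: the attribute is the only reference.
holder = Holder()
probe = weakref.ref(holder._rpc)
del holder._rpc
gc.collect()  # not needed on CPython, where refcounting frees it at once
only_reference_freed = probe() is None

# Case 2: something else still refers to the same object.
holder = Holder()
extra = holder._rpc
probe = weakref.ref(extra)
del holder._rpc
gc.collect()
still_alive_with_extra_reference = probe() is extra
```

In the second case `del` only unbinds the attribute; the object survives for as long as `extra` refers to it, which is why relying on `del` to close a connection is fragile.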

It looks like the key function we want to call is RPCEndpoint::Shutdown, which is usually called when the RPCEndpoint object is destructed. Unfortunately, I don't see any method that exposes it in either RPCClientSession or RPCSession. I think we could add a close method to the Python RPCSession object, which would then be called from __exit__.

    def close(self):
        self._sess.__del__()
        self._sess = None
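A rough sketch of how an explicit close could be wired into a context manager follows. `FakeEndpoint` and `ManagedSession` are hypothetical; as noted above, the real RPCSession has no such method, so this only illustrates the shape of the proposal.

```python
class FakeEndpoint:
    """Hypothetical stand-in for the underlying RPC endpoint."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


class ManagedSession:
    """Hypothetical session wrapper with an explicit, idempotent close()."""

    def __init__(self):
        self._endpoint = None

    def __enter__(self):
        self._endpoint = FakeEndpoint()
        return self

    def close(self):
        # Shut down explicitly instead of waiting for garbage collection.
        if self._endpoint is not None:
            self._endpoint.shutdown()
            self._endpoint = None

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.close()


with ManagedSession() as session:
    endpoint = session._endpoint  # an extra reference, as in the fragile case
```

Because the shutdown is triggered by `close` and not by object destruction, the endpoint is closed on exit even though `endpoint` still refers to it.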

kparzysz-quic · 2022-06-03T20:57:04Z

python/tvm/contrib/hexagon/session.py

@@ -109,7 +110,7 @@ def device(self):

        return self._device

-    def upload(self, local_path: Union[str, pathlib.Path], remote_filename: str):
+    def upload(self, local_path: Union[str, pathlib.Path], remote_filename: str) -> pathlib.Path:


This seems like an unrelated change.

I need this change to make this approach work. It is used in load_module function.

Is this related to the change from allowing just filenames to requiring a full remote path, so that the caller would have somewhere from which they could learn the full remote path?

Yeah, previously some of the paths only worked if we were using the same working directory for HexagonLauncher and Session.
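The idea discussed in this thread, that `upload` returns the full remote path so the caller can pass it straight to the loader, can be sketched as follows. `FakeSession` is a hypothetical class whose "remote" workspace is a local temporary directory; it mirrors only the `upload` signature shown in the diff, and its `load_module` is a simplified assumption.

```python
import pathlib
import shutil
import tempfile
from typing import Union


class FakeSession:
    """Hypothetical session whose "remote" workspace is a local temp directory."""

    def __init__(self, workspace: pathlib.Path):
        self._workspace = workspace

    def upload(self, local_path: Union[str, pathlib.Path], remote_filename: str) -> pathlib.Path:
        """Copy the file into the workspace and return its full remote path."""
        remote_path = self._workspace / remote_filename
        shutil.copyfile(local_path, remote_path)
        return remote_path

    def load_module(self, remote_path: pathlib.Path) -> bytes:
        """Load by full path; no knowledge of the workspace is required."""
        return pathlib.Path(remote_path).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = pathlib.Path(tmp)
    workspace = tmp_dir / "workspace"
    workspace.mkdir()
    local_file = tmp_dir / "model.so"
    local_file.write_bytes(b"binary")

    session = FakeSession(workspace)
    remote_path = session.upload(local_file, "model.so")
    loaded = session.load_module(remote_path)
```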

kparzysz-quic · 2022-06-03T20:57:18Z

python/tvm/contrib/hexagon/session.py

@@ -194,33 +196,36 @@ def get_graph_executor(
        self._set_device_type(graph_mod)
        return tvm.contrib.graph_executor.create(graph_json, graph_mod, self.device)

-    def get_aot_executor(
+    def get_graph_debug_executor(


This seems like another unrelated change.

Unfortunately, GitHub didn't present this diff correctly. I removed the get_aot_executor function because it didn't work correctly and wasn't used anywhere in the codebase. If we keep it, it will remain broken, and others might try to use it.

I also changed get_graph_debug_executor to work with the current approach. I just realized we are missing a test for that as well. I suggest adding a test for it, since this API is useful for debugging.

I think we should add a test for the get_aot_executor function, rather than removing it entirely. The current tests use get_executor_from_factory, which calls into _aot_executor_from_factory. This is exactly what we need for CI, where each model is uploaded once and run once, but isn't as useful when a model is uploaded once and run many times, potentially across multiple executions.

The get_aot_executor path is intended to support that use case, where an AOT module has been compiled and uploaded, and in each future occurrence only needs to be loaded.

We could either add a test for get_aot_executor or rework _aot_executor_from_factory to use the get_aot_executor function. I prefer the latter.
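The second option, where the factory path delegates to the load-only path, might look roughly like the sketch below. `FakeAOTSession`, its in-memory `remote_files` store, and the `/workspace/` prefix are all hypothetical; this is not TVM's implementation, only an illustration of why delegating gives both paths shared test coverage.

```python
class FakeAOTSession:
    """Hypothetical session illustrating the delegation; not TVM's implementation."""

    def __init__(self):
        self.remote_files = {}  # remote path -> uploaded payload
        self.load_calls = []    # records every load through get_aot_executor

    def upload(self, payload: bytes, remote_filename: str) -> str:
        remote_path = f"/workspace/{remote_filename}"
        self.remote_files[remote_path] = payload
        return remote_path

    def get_aot_executor(self, remote_path: str) -> bytes:
        """Load an already-uploaded module; reusable across many runs."""
        self.load_calls.append(remote_path)
        return self.remote_files[remote_path]

    def _aot_executor_from_factory(self, payload: bytes) -> bytes:
        """Upload once, then go through the same loading path as everyone else."""
        remote_path = self.upload(payload, "aot_module.so")
        return self.get_aot_executor(remote_path)


session = FakeAOTSession()
first = session._aot_executor_from_factory(b"model")          # upload + load
again = session.get_aot_executor("/workspace/aot_module.so")  # load only
```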

Lunderberg · 2022-06-06T14:28:11Z

python/tvm/contrib/hexagon/build.py

@@ -58,7 +62,9 @@ def _get_hexagon_rpc_lib_dir() -> pathlib.Path:

 def _get_test_directory_name() -> str:
    """Generate a time-stamped name for use as a test directory name."""
-    return datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
+    date_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")


While I think this is a good change to add to ensure unique folders even if a test concludes in less than a second, I think we should separate it out into an independent PR.

sure, thanks!

Thank you for spinning it out, and #11593 is reviewed and approved!
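For context on the uniqueness concern raised in this thread: a name built only from a second-resolution timestamp repeats whenever two tests start within the same second. One common remedy is to append a random suffix. The diff above shows only the first line of the actual change, so the suffix scheme and the `suffix_length` parameter below are assumptions for illustration.

```python
import datetime
import random
import string


def get_test_directory_name(suffix_length: int = 10) -> str:
    """Return a time-stamped name with a random suffix (hypothetical helper)."""
    date_str = datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(random.choice(alphabet) for _ in range(suffix_length))
    return f"{date_str}-{suffix}"


# Names generated within the same second still differ, barring a suffix collision.
names = {get_test_directory_name() for _ in range(100)}
```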

Lunderberg · 2022-06-06T14:34:04Z

python/tvm/contrib/hexagon/build.py

-            or a full path in the remote system. If it is a file name,
-            the file must already have been uploaded to the remote,
-            and be placed in the remote workspace.
+            be a full path in the remote system.


I like the symmetry of loading a file using the same path as returned from upload, but are there cases where we'd want to work within the existing workspace, without requiring the full path? The use cases coming to mind would be uploading a model once, then running several independent tests on it. Within a single process, the model to run should be passed to each independent target, so this change wouldn't affect it. If the upload happens in one process, then the independent tests happen in subsequent processes, it might cause an issue, but then the workspace itself would need to be passed through.

So I don't think I see any major concerns, but reducing previously functional behavior does give me pause.

Lunderberg · 2022-06-06T14:41:02Z

python/tvm/contrib/hexagon/session.py

@@ -194,33 +196,36 @@ def get_graph_executor(
        self._set_device_type(graph_mod)
        return tvm.contrib.graph_executor.create(graph_json, graph_mod, self.device)

-    def get_aot_executor(
+    def get_graph_debug_executor(


I think we should add a test for the get_aot_executor function, rather than removing it entirely. The current tests use get_executor_from_factory, which calls into _aot_executor_from_factory. This is exactly what we need for CI, where each model is uploaded once and run once, but isn't as useful when a model is uploaded once and run many times, potentially across multiple executions.

The get_aot_executor path is intended to support that use case, where an AOT module has been compiled and uploaded, and in each future occurrence only needs to be loaded.

Lunderberg · 2022-06-06T14:48:02Z

python/tvm/contrib/hexagon/session.py

@@ -109,7 +110,7 @@ def device(self):

        return self._device

-    def upload(self, local_path: Union[str, pathlib.Path], remote_filename: str):
+    def upload(self, local_path: Union[str, pathlib.Path], remote_filename: str) -> pathlib.Path:


Is this related to the change from allowing just filenames to requiring a full remote path, so that the caller would have somewhere from which they could learn the full remote path?

mehrdadh · 2022-06-07T19:34:49Z

@Lunderberg I changed _aot_executor_from_factory to use get_aot_executor. PTAL, thanks!

Lunderberg

With the updates to the AOT executor handling, I think this looks good to me. As discussed, we should follow up with an issue for explicitly closing an RPCSession, because del self._rpc relies on Python garbage collection, so its timing is a bit fragile for long-term use.

mehrdadh marked this pull request as ready for review June 2, 2022 17:45

github-actions bot requested review from Lunderberg and kparzysz-quic June 2, 2022 17:47

mehrdadh marked this pull request as draft June 2, 2022 17:50

mehrdadh force-pushed the hexagon/single_hexagon_session branch 2 times, most recently from 98fb3b0 to 2848ce7 Compare June 3, 2022 17:03

mehrdadh marked this pull request as ready for review June 3, 2022 17:03

kparzysz-quic reviewed Jun 3, 2022


Lunderberg reviewed Jun 6, 2022


mehrdadh added 4 commits June 8, 2022 15:50

Reuse hexagon launcher in test session

bfb2272

separate random name generation

6f86717

revert get_aot_executor

f5d0490

Fix launcher for simulator case

af22ddd

mehrdadh force-pushed the hexagon/single_hexagon_session branch from 7a82d6b to 9ac0be9 Compare June 8, 2022 16:58

add stop server for simulator

c6bec0d

mehrdadh force-pushed the hexagon/single_hexagon_session branch from 9ac0be9 to c6bec0d Compare June 8, 2022 17:08

Lunderberg approved these changes Jun 10, 2022


kparzysz-quic merged commit dc522a6 into apache:main Jun 10, 2022

mehrdadh deleted the hexagon/single_hexagon_session branch June 14, 2022 16:37

driazati mentioned this pull request Jul 14, 2022

TVM v0.9.0.rc0 Release Candidate Notes #12102

Closed
