Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: plugin defined protocols not working in 0.8.1 #2224

Closed
dbluhm opened this issue May 4, 2023 · 22 comments · Fixed by #2255
Closed

Regression: plugin defined protocols not working in 0.8.1 #2224

dbluhm opened this issue May 4, 2023 · 22 comments · Fixed by #2255
Assignees

Comments

@dbluhm
Copy link
Contributor

dbluhm commented May 4, 2023

ca8308f broke the ability to load a protocol in a plugin. Below is the resulting error:

agent_1  | 2023-05-04 20:57:40,368 aries_cloudagent.transport.pack_format DEBUG Expanded message: {'@type': 'https://didcomm.org/data-transfer/0.1/provide-data', '@id': 'a4ce6262-233d-43bb-9c60-ae6b98986218', 'goal_code': 'test_goal', 'data~attach': [{'description': 'test data attachment', 'data': {'json': {'test': 'data'}}}]}
agent_1  | 2023-05-04 20:57:40,368 aiohttp.access INFO 10.89.0.5 [04/May/2023:20:57:40 +0000] "POST / HTTP/1.1" 200 149 "-" "Python/3.7 aiohttp/3.8.1"
agent_1  | 2023-05-04 20:57:40,372 aries_cloudagent.core.conductor ERROR Exception in message handler:
agent_1  | Traceback (most recent call last):
agent_1  |   File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
agent_1  |     result = coro.send(None)
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 155, in handle_message
agent_1  |     (message, warning) = await self.make_message(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/dispatcher.py", line 318, in make_message
agent_1  |     _, warning = await validate_get_response_version(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 49, in validate_get_response_version
agent_1  |     version_definition = await get_version_def_from_msg_class(
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 151, in get_version_def_from_msg_class
agent_1  |     definition_path = _get_path_from_msg_class(msg_class)
agent_1  |   File "/usr/src/app/.venv/lib/python3.9/site-packages/aries_cloudagent/core/util.py", line 117, in _get_path_from_msg_class
agent_1  |     path = split_str + path.rsplit(split_str, 1)[1]
agent_1  | IndexError: list index out of range

The lines added in aries_cloudagent.core.util make it impossible to load a protocol because the path to the protocol is not guaranteed to include aries_cloudagent as part of it, resulting in the IndexError seen above.

This is quite unfortunate and effectively renders several plugins in the wild incompatible with 0.8.1.

A somewhat minimal reproducible example can be found here: Indicio-tech/aries-acapy-plugin-data-transfer#5
Failing integration test here: https://github.com/Indicio-tech/aries-acapy-plugin-data-transfer/actions/runs/4887340386/jobs/8723798002?pr=5 (see "Print logs on failure" step in job to see logs pasted above)

cc @shaangill025 @swcurran

@dbluhm
Copy link
Contributor Author

dbluhm commented May 4, 2023

Heads up, @anwalker293 @mepeltier @reflectivedevelopment , impacts several of our plugins.

@swcurran
Copy link
Contributor

swcurran commented May 4, 2023

Arrggh….

What is the right answer here?

Do we need an 0.8.2 that addresses this issue to continue to work with existing plugins, or is it appropriate/better than we have this as a breaking change that needs to be addressed?

Sorry this was not caught as a breaking change.

@dbluhm
Copy link
Contributor Author

dbluhm commented May 4, 2023

Every release, I think to myself that we should really have an automated system for running our plugins against the release candidates but just haven't gotten around to it yet 😅 We did do some testing during the rc phase but I made an absent-minded mistake and tests that I thought were running against 0.8.1-rcX actually were still running against 0.7.X.

If we didn't do a patch release, what would publishing the change look like? A 0.8.2 release is what my mind naturally goes to but I'm not sure what the alternatives would look like

@swcurran
Copy link
Contributor

swcurran commented May 4, 2023

Happy to do a 0.8.2. We just need to know what the change needs to be. What is the right thing to do for ACA-Py and ACA-Py users?

@swcurran
Copy link
Contributor

swcurran commented May 5, 2023

FYI — we’re looking at adding redis into AATH, which might help us with some ongoing plugin testing...

@usingtechnology
Copy link
Contributor

Added a comment and "fix" to the example PR that worked for me (running the example docker compose locally).

Bigger issue, and been mentioned previously is how to load plugins. That will definitely affect the search path that ACA-Py is using to locate the plugin code.

In the provided example (and thanks for having something easy for me to replicate), ACA-Py is loaded as a Python library, and the plugin code is the "root", so "acapy" doesn't know where the plugin code is... in this case we set ACAPY_HOME environment variable.

Other plugins will build their images differently, and how/where their plugin code relates to where the ACA-Py code is... the recent mediator service + redis plugin is done a different way than this, and differently than traction...

Do we want to close this (if the example PR builds and tests and merges), or leave this open and investigate some more? I know that Traction has not bumped up to 0.8.1 and they could have issues.

@dbluhm
Copy link
Contributor Author

dbluhm commented Jun 2, 2023

Heads up @frostyfrog

@dbluhm
Copy link
Contributor Author

dbluhm commented Jun 2, 2023

@usingtechnology re: the fix by adding ACAPY_HOME env var: wouldn't this cause builtin protocols to then fail in a similar way to how plugged in protocols were failing? The plugin package root like acapy_plugin_data_transfer wouldn't be present in the path of builtin protocols the same way aries_cloudagent wasn't present in the path of plugged in protocols.

@usingtechnology
Copy link
Contributor

I honestly do not know. One problem is that there is no "standard" on bundling plugins with ACA-Py; I can point at 3 different ways it is happening. In this case, the plugin is the "parent" project, ACA-Py is one of its requirements. Is that the same for all plugin developers and consumers? For another project, where redis queue plugin is used with the mediator, the plugin is installed to the aries-cloudagent-python:py3.9-0.8.1 and that works without setting ACAPY_HOME (although that also uses a weird path in the --plugin. So many ways to skin a cat. This "fix" only works for this and similarly built images.

This discussion and your issue would leave me to believe that we need to set a standard ASAP - or at least known methods that will work and are supported. And how to publish/bundle plugins for use by 3rd parties

@WadeBarnes
Copy link
Contributor

How does the work here, #2138, play into this?

@usingtechnology
Copy link
Contributor

I think that’s what we need to carry on with, take @ianco work and start hardening it and putting it into practice.
@dbluhm how are you putting images together for PROD? Are you installing them on top of ACA-Py, so they are found in the package path? Or like you are doing here?
Maybe for allowing devs to test their plugin, we should have an explicit search path like EXT_PLUGIN_HOME. But production images plugins are libs installed on approved images?

TheTechmage added a commit to TheTechmage/aries-cloudagent-python that referenced this issue Jun 2, 2023
Instead of relying solely on the file-path when loading a package, load
from the module name instead. This should work for *most* external
packages that are not installed into the ACA-Py folder. This should
resolve the issue described in openwallet-foundation#2224

Signed-off-by: Colton Wolkins (Indicio work address) <[email protected]>
@TheTechmage
Copy link
Contributor

Apologies for taking until today to get this committed and pushed up. I was working on a potential solution yesterday (which can be seen in the commit that got added to the thread) and as far as I can tell, falling back to the module name when the file path doesn't work seems to resolve the issue that @dbluhm was describing. However, I'm not entirely convinced if it's the right fix or just a workaround. At this point in time, I have only tested it with the mentioned data-transfer plugin and the integration tests are working now.

@swcurran
Copy link
Contributor

swcurran commented Jun 2, 2023

This seems to be an urgent issue to resolve — correct? @usingtechnology — this might adjust your priorities (although it fits with the “dev experience and docs” item, I guess). @dbluhm — I think Jason (@usingtechnology ) and the Indicio team have the most experience with plugins, have read @ianco’s work on this, and should drive the direction.

Perhaps a meeting early next week to (try to) make decisions? And of course, we’d love @ianco to weigh in, if he is interested. He’s already provided a lot of ideas on this.

@usingtechnology
Copy link
Contributor

Yes, I don’t think we can put this off. I agree that we should meet early next week. Let’s do something more robust and less hacky. Definitely a few different angles to consider, but I think we can come up with something. At quick thought is: we build out plug-in config to specify the path to the plug-in. Nice to have simple defaults but obviously cannot cover all cases. So it’s either get more restrictive on a standard or allow independence.

@dbluhm
Copy link
Contributor Author

dbluhm commented Jun 2, 2023

To load a plugin, we provide the python module path to the --plugin argument. To load the plugin, the plugin must be available on the python path, which can be pretty easily modified. I think this system is already pretty robust when it comes to managing where your plugins are located on your system/image. The problem is that the utility method _get_path_from_msg_class is working from a message class within the plugin and trying to work backwards from there so the caller can ultimately locate the definition.py for the protocol.

I think our problem isn't necessarily in locating and loading of plugins but rather more in how we're attempting to handle multiple versions of protocols by working backwards from that message class. I do agree that we need to further refine the plugin interface and maybe this would help with this problem but I think we can address the problem well in advance of refactoring the plugin system.

@usingtechnology
Copy link
Contributor

Sorry, sitting at a ferry terminal so not at my machine to review code. Is this method new? Like the plug-in is loaded ok, but there is a new runtime resolution method? Are message classes always in the same place relative to the plug-in root, or is that up to the developer?

@dbluhm
Copy link
Contributor Author

dbluhm commented Jun 2, 2023

Yes, the method is relatively new, having been included in a tag for the first time in 0.8.0-rc0. Relevant commit is ca8308f, which was added as part of this pr: #1940.

Yep, the plugin loads fine but on handling a message in the plugged in protocol, the code added in #1940 triggers the errors in the original description.

As of right now, the message classes are not guaranteed to be in the same place relative to the plugin root. Neither is it required for the plugin to even have a definitiion.py, depending on how it's loaded. This would be helped by Ian's recommendations for refining the plugin system.

@usingtechnology
Copy link
Contributor

Thanks for that. Ok, when I’m back Monday I’ll do a deep dive and get familiar with old and new.

@swcurran
Copy link
Contributor

swcurran commented Jun 2, 2023

If we can do a quick fix for that for now, that would be great. If it is that recent, I assume we can come up with something better and get a 0.8.2 out ASAP.

@usingtechnology
Copy link
Contributor

@dbluhm - see PR #2255. This will help with your setup but is still not going to solve it for everyone (Traction). I would like to know your process for testing this out with your plugin. I must be missing something simple because I could not figure out an easy way to get a local version of ACA-Py code into your image for your plugin. I did it, but not ideal...

Didn't have to change anything in your code, didn't have to add ACAPY_HOME, just use ACA-Py as fixed in #2255 and your test passed.

@dbluhm
Copy link
Contributor Author

dbluhm commented Jun 7, 2023

@usingtechnology what we typically do to test unreleased versions of ACA-Py: Indicio-tech/aries-acapy-plugin-data-transfer@7c869ef

The dockerfile used in the integration tests installs its dependencies from the poetry.lock.

@usingtechnology
Copy link
Contributor

thanks for that. was definitely doing something more difficult.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants