Rewrite the client to be async all the way through #317

Merged · 2 commits · Apr 7, 2024

Conversation

einarmo (Contributor) commented Mar 27, 2024

This is a complete rewrite of the client, in order to make it async the entire way through, removing the need for any internal tokio runtimes.

First of all, sorry about this. It's never nice to receive such a large PR out of nowhere. Consider this an announcement that this exists. I hope we can work together to figure out a way to get this into the library in some way, as I consider the changes made here to be positive.

No matter where this PR goes I will likely keep working on the things outlined in the "Future projects" section of async_client.md.

See docs/async_client.md for an overview of the technical details on this change. Very briefly: the client has been rewritten from the ground up to be entirely async. It is now possible to run a session without spawning a single thread.

Background

I have some background with OPC-UA, mostly in .NET, and I've fallen for rust. I consider rust a natural fit for OPC-UA. The focus on correctness is helpful when working with such a huge complicated standard, and it should be possible to make an OPC-UA client/server implementation very lightweight.

For a project, I need a lightweight OPC-UA client. In particular, I want it to be so lightweight that I can run thousands of clients without issue. This means that a client should have almost no overhead, and I need it to be async so that I can run them cooperatively. This is essentially impossible in any other OPC-UA framework I know of.
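
To make that concrete, here is a minimal, self-contained sketch (not code from this PR; run_session is just a placeholder for a real connect/subscribe/publish loop) of what "cooperatively" means here: every client is an ordinary future, so thousands of them can share a single-threaded runtime instead of each needing its own thread.

use tokio::runtime::Builder;

// Placeholder for one client's work: connect, create subscriptions, consume
// publish responses. In the rewritten client this is all plain async code.
async fn run_session(id: usize) {
    println!("session {id} running");
}

fn main() {
    // A current-thread runtime: no worker threads are spawned at all.
    let rt = Builder::new_current_thread().enable_all().build().unwrap();
    rt.block_on(async {
        // Run a thousand sessions as lightweight tasks on the same thread.
        let handles: Vec<_> = (0..1000).map(|i| tokio::spawn(run_session(i))).collect();
        for h in handles {
            h.await.unwrap();
        }
    });
}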

I first considered making my own OPC-UA library, but that idea was quickly discarded as it would be a waste of time to redo the significant amount of work that goes into cryptography, binary encoding/decoding, and other core utilities already implemented and tested here.

Initially, this was supposed to just convert the existing client implementation to async step by step, but it quickly became clear that a lot of changes were needed to make that happen, and an intermediate state would be very hard to create without a final goal to reference, so I decided to start fresh. Despite what the diff line count would have you believe, this reuses a lot of code from the old client.

Scope

The only limitation on this project was that I wanted to avoid touching anything outside of the client itself. Of course, samples and tests also had to be updated, and a few utilities were changed for technical reasons; see async_client.md for details on that too.

Tests

A significant change to working with the library is that the tests can now be run normally with just cargo test; the full run takes about 15 seconds on my machine, which is pretty good. The downside is that the need to use spawn_blocking in the tests makes them handle panics very poorly. Dealing with this will require making the server async as well (which I do think should be done).
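
To illustrate the panic problem (a sketch, not one of the PR's actual tests): a panic inside tokio::task::spawn_blocking is captured and handed back as a JoinError, so each test has to unpack and re-raise it by hand or the failure is easy to lose.

use tokio::task::spawn_blocking;

#[tokio::test]
async fn example_blocking_assertion() {
    // Synchronous test code has to run inside spawn_blocking so it does not
    // stall the async runtime.
    let result = spawn_blocking(|| {
        assert_eq!(2 + 2, 4);
    })
    .await;

    // A panic inside the closure arrives here as Err(JoinError) rather than
    // failing the test directly; resume_unwind re-raises it with its payload.
    if let Err(e) = result {
        if e.is_panic() {
            std::panic::resume_unwind(e.into_panic());
        }
    }
}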

einarmo force-pushed the async-client branch 4 times, most recently from 49675b1 to 54972bc on March 27, 2024 08:54

locka99 (Owner) commented Apr 2, 2024

Hi, thanks a million for this submission. It'll take me some time to go through it, so have some patience and pester me if I don't come back in a week or so.

einarmo (Contributor, Author) commented Apr 3, 2024

Of course. Thank you for looking at it. Feel free to reach out if there's anything.

locka99 (Owner) commented Apr 7, 2024

Hi Einar, I think the patch looks good and I really appreciate what must have been a substantial effort. There is a lot to it but I ran through some basic scenarios against OPCUA servers which passed so I am happy to accept the patch.

locka99 merged commit eab4caa into locka99:master on Apr 7, 2024 (1 check failed)

einarmo (Contributor, Author) commented Apr 7, 2024

Wow, I did not expect this to go so quickly. Thank you for the trust and for accepting the patch. I do have some more stuff coming eventually, though hopefully nothing quite so extensive.

I hope to expand a great deal on the integration tests as well, if I find the time.

evanjs commented Apr 7, 2024

Thanks for your contribution, @einarmo

I've been using this library for a service I've been working on (for production) and have run into some nits along the way

Really encouraging to see others that want to improve this library

Curious if a server rewrite would be a good time to look at #281 or #285

Heck, even the rewrite alone might resolve the issues 🤞

einarmo (Contributor, Author) commented Apr 7, 2024

@evanjs if I do get around to the server (I've started looking a bit, but it will likely be quite a while before I have anything substantial), then I would be very surprised if I didn't somehow fix these two issues.

Comment on lines +874 to +875
// TODO: This whole thing should probably be optional, so that users can
// customize the process.

Contributor:

Nice TODO 😉

I actually just stumbled upon this one as I noticed that each time we had a connection issue, we ended up with an additional subscription. This was caused by this method recreating the existing subscriptions (one of the servers we connect to does not support transfers) and some logic of our own that tries to transfer and/or recreate subscriptions after a connection issue as well 😅

I actually want to manage this myself so I can also reconcile the monitored items (depending on the moment of the interruption and how long it was disconnected, it could be that the desired items to monitor have changed).

But looking at this (to determine if I could somehow use it instead), I wonder how you can use this method and be sure everything is actually transferred or recreated? It doesn't return any errors, so you would still need to query and reconcile all open subscriptions, right?

Yet the challenge with that seems to be that if you have multiple subscriptions (which we sometimes have) there is no way to map the old subscription IDs to the new subscription IDs (in case they are recreated).

So I'm wondering how you are doing that (if you're doing something like that at all), as the TODO says exactly what I was thinking of as a solution to the behavior I noticed 😊

einarmo (Author):

Really, my thought with that TODO was that advanced users would probably have to do it themselves. It's a bit of an oversight that there's no way to disable it. Part of the reason why I did it this way is that it would take a fair bit more work to add mechanisms to call TransferSubscriptions manually, since there is no way to get the actual monitored items from the server, so there's no way to map the subscription IDs as you say.

For you, the best solution would probably be to just have a way to disable it entirely. Maybe we could create some system that would make this approach more reliable.
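
For illustration, a self-contained sketch of that kind of application-side reconciliation (nothing here is the crate's actual API; Session, try_transfer, and recreate are hypothetical stand-ins): keep subscriptions keyed by an application-owned name and refresh the name-to-id map whenever a subscription has to be recreated, so stale server ids never leak into later calls.

use std::collections::HashMap;

// Hypothetical stand-in for whatever describes a desired subscription
// (publishing interval, monitored items, ...); not the crate's API.
struct SubscriptionSpec;

struct Session;

impl Session {
    // Hypothetical: ask the server to transfer an existing subscription id.
    async fn try_transfer(&self, _id: u32) -> bool {
        false
    }
    // Hypothetical: create a fresh subscription and return the new server id.
    async fn recreate(&self, _spec: &SubscriptionSpec) -> u32 {
        1
    }
}

async fn reconcile(
    session: &Session,
    desired: &HashMap<String, SubscriptionSpec>,
    ids: &mut HashMap<String, u32>,
) {
    for (name, spec) in desired {
        let transferred = match ids.get(name) {
            Some(&id) => session.try_transfer(id).await,
            None => false,
        };
        if !transferred {
            // Recreate and remember the new id under the same application-level
            // name, so later modify/delete calls target the right subscription.
            let new_id = session.recreate(spec).await;
            ids.insert(name.clone(), new_id);
        }
    }
}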

Contributor:

Yeah, absolutely... We already have code in place to manage it ourselves, so for now I just ended up commenting out the line calling this method. I'll make a PR to make it optional later this week when I'm done testing.

Yet thinking about making another PR makes me a little worried: the number of custom patches in my local fork is becoming a bit challenging to manage alongside the pending upstream patches 😅

einarmo (Author):

I'm some 50 commits ahead on my fork, with more or less everything except the client completely rewritten. I think we're hitting the point where we really need to find a way forward on this project.

Contributor:

Yeah, fully agree... I really respect and appreciate all the time and effort locka99 has put and is putting into this, so I don't want to keep stalking him on issues and PRs 😊

Yet we are using this crate in some of our production code (well, not the actual crate; we use a fork and point Cargo at that repo), so it is important for us to keep this project maintained (either here, which I would prefer, or elsewhere).

}
}
Err(e) => warn!("Failed to decode data change notification: {e}"),
}

Contributor:

Can I ask the reasoning behind changing this logic as compared to how the callback worked in the old client?

While I understand that for some use cases it might be (slightly) easier to get each value on its own (rather than having to loop through them), it does remove some context and prevents batch-processing the values received from a single publish response.

In our case this actually breaks some of our logic: instead of being able to rely on the values returned by the publish responses, we now have to use an arbitrary debounce delay to group the received values. Especially with low publish intervals, this almost becomes a continuous stream instead of batched responses.

As I mentioned, I understand that depending on how you want to process or store the values the new approach might be slightly handier, but it rules out the "batched" approaches. If we just forwarded the batch we received, people could make that choice on their own and either process the values as a batch or loop over them and process them one by one.

So long story short, unless there is a very compelling reason for having it this way I would like to make a PR to change this behavior to how it worked before. Thanks!

einarmo (Author):

Yeah, sure, if you want to change it back that is probably fine. If you do that, would you mind implementing the current approach as a layer on top of the simpler API you suggest?

For a lot of simple applications it's convenient to have an easy to use and, importantly, easy to understand interface for consuming notifications.

einarmo (Author) commented Nov 16, 2024:

The "easy to understand" part is the main justification behind the change. Consuming subscriptions is the thing you will want to do with a starter application, and I would like it to be as simple as possible.

That said, it should be easy enough to just have two traits, one for your low-level callbacks and one for my high-level ones, then blanket-implement the former for the latter.
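
A minimal sketch of that trait layering (the trait and type names here are illustrative, not the crate's actual API): the client only ever invokes the batch-level trait, and any per-value handler picks it up for free through the blanket implementation.

// Illustrative placeholder for whatever the client passes per monitored item.
struct Notification;

// Low-level trait: receives everything from one publish response at once,
// so batch-oriented consumers keep the original grouping.
trait OnBatch {
    fn on_batch(&mut self, values: Vec<Notification>);
}

// High-level trait: receives one value at a time, convenient for simple apps.
trait OnValue {
    fn on_value(&mut self, value: Notification);
}

// Blanket impl: any per-value handler automatically works wherever the
// client expects a batch handler.
impl<T: OnValue> OnBatch for T {
    fn on_batch(&mut self, values: Vec<Notification>) {
        for value in values {
            self.on_value(value);
        }
    }
}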

Contributor:

Yeah, fully agree... As said, I can see use cases for both approaches, so it sounds like a good idea to offer both instead of just one. I'll have a look at it sometime in the coming week 👍🏻
