simple-client sample seems broken #324

Open
cjjb01 opened this issue May 1, 2024 · 6 comments
@cjjb01

cjjb01 commented May 1, 2024

As the simple-server sample only publishes v1 and v2, the client, which subscribes to v1, v2, v3, and v4, does not receive updates for v3 and v4, and does nothing after the first update.
After changing the client to only subscribe to v1,v2 it behaves as expected and receives updates from the server.

@AiyionPrime
Contributor

Hey @cjjb01; I just git bisected your problem, as I did not encounter it in 0.12.0.
It appears the first commit introducing the bad behaviour is eab4caa, which was merged in #317.

@einarmo Can you reproduce the problem with your machine as well? Or is this unrelated to your (merged) PR?

@einarmo
Contributor

einarmo commented Jul 2, 2024

I think I noticed this at one point and found it only happens with this server SDK, so I wonder whether a change in the client is exposing a bug in the server, or whether it's a bug in the client itself. I'll look into it when I find some time.

@AiyionPrime
Contributor

I think it would be worthwhile to implement integration tests against the examples.
In a downstream project I implemented two which might better live in this repo.
One of them verifies a single line of subscribed data output; the other verifies ten.
The former succeeds, while the latter passes only in 0.12.0.

@locka99 answered a few days ago; he's currently pretty busy and it might take a moment until he gets back to this project.
I think opening more PRs until then is a waste of time; I just wanted to let you know that integration tests which would have caught this beforehand do exist downstream.
If you find the time to evaluate the issue, just let me know and I'll pass you freshly rebased commits of them.

@einarmo
Contributor

einarmo commented Jul 16, 2024

Alright. Whatever this is, it doesn't happen when running the client against other servers, or at all in my rewrite of the server part of the SDK (which finally runs the sample server now...).

From what I can tell the sample server is supposed to publish on v3 and v4 as well, (at least it does now...) so there is more wrong here. I wonder if maybe this is caused by a deadlock in the sampling logic? Perhaps triggered by the client now being able to send requests in parallel?

@AiyionPrime
Contributor

I'll see to verifying that tomorrow. Maybe I'll get that finding into the pipeline as well.

@AiyionPrime
Contributor

Thanks for having looked into it that fast!

AiyionPrime added a commit to vorausrobotik/opcua that referenced this issue Jul 30, 2024
as it's currently not working for at least two different reasons:

Commit(43afb6a): 'Implement JSON serialization on most built-in data types.',

which broke the JSON parsing and updating of subscribed values.
And the other being

Commit(eab4caa): 'Rewrite the client to be async all the way through',

which indirectly halted subscriptions for servers implemented with this crate (locka99#324).

Re-adding this sample would be highly appreciated, if possible with tests that might prevent it from decaying again.
locka99 pushed a commit that referenced this issue Jul 31, 2024
This was referenced Aug 15, 2024
einarmo added a commit to einarmo/opcua that referenced this issue Aug 15, 2024
# What and Why

This is a rewrite of the server from scratch, with the primary goal of taking the server implementation from a limited, mostly embedded server, to a fully fledged, general purpose server SDK. The old way of using the server does still _more or less_ exist, see samples for the closest current approximation, but the server framework has changed drastically internally, and the new design opens the door for making far more complex and powerful OPC-UA servers.

## Goals

Currently my PC uses about 1% CPU in release mode running the demo server, which updates 1000 variables once a second. This isn't bad, but I want this SDK to be able to handle servers with _millions_ of nodes. In practice this means several things:

 - It must be possible to store nodes externally, in some database or other system.
 - Monitored items must be notification-based. There is always going to be sampling somewhere, but in practice large OPC-UA servers are _always_ push-based; the polling is usually deferred to underlying systems, which may or may not be OPC-UA based at all.
 - It must be possible to store different sets of nodes in different ways. If anyone who wants to write a more complex server needs to reimplement diagnostics and the core namespace, the SDK isn't particularly useful. We could hard-code that, but it seems better to create an abstraction.

## High level changes

First of all, there are some fundamental structural changes to better handle multiple clients and ensure compliance with the OPC-UA standard. Each TCP connection now runs in a tokio `task`, and most requests will actually spawn a `task` themselves. This is reasonably similar to how frameworks like `axum` handle web requests.

Subscriptions and sessions are now stored centrally, which allows us to implement `TransferSubscriptions` and properly handle subscriptions outliving their session as they are supposed to in OPC-UA. I think technically you can run multiple sessions on a single connection now, though I have no way to test this at the moment.

The web server is gone. It could have remained, but I think it deserves a rethink. It would be better (IMO), and would deal with issues such as locka99#291, if we integrated with the `metrics` library and optionally exported some other metrics using some other generic interface. In general I think OPC-UA is plenty complicated enough without extending it with tangentially related features, though again this might be related to the shift I'm trying to create here, from a specialized embedded server SDK to a generic OPC-UA SDK.

Events are greatly changed, and quite unfinished. I believe a solid event implementation requires not just more thought, but a proper derive macro to make implementing them tolerable. The old approach relied on storing events as nodes, which works and has some advantages, but it's not particularly efficient, and it required setting a number of superfluous values, e.g. the `DisplayName` of an event, a value that cannot be accessed, as I understand it. The new approach is to just store them as structs, `dyn Event`.

## Node managers

The largest change is in how most services work. The server now contains a list of `NodeManager`s, an idea stolen from the .NET reference SDK, though I've gone further than they do there. Each node manager implements services for a collection of nodes, typically the nodes from one or more namespaces. When a request arrives, we give each node manager the request items that belong to it, so when we call `Read`, for example, a node manager will get the `ReadValueId`s where the `NodeManager` method `owns_node` returns `true`.

There are some exceptions, notably the `view` services can often involve requests that cross node-manager boundaries. Even with this, the idea is that this complexity is hidden from the user.

Implementing a node manager from scratch is challenging, see `node_manager/memory/diagnostics.rs` for an example of a node manager with very limited scope (but one where the visible nodes are dynamic!).

To make it easier for users to develop their own servers, we provide them with a few partially implemented node managers that can be extended:

 - The `InMemoryNodeManager` deals with all non-value attributes, as well as `Browse`, and provides some convenient methods for setting values in the address space. Node managers based on this use the old `AddressSpace`. Each such node manager contains something implementing `InMemoryNodeManagerImpl`, which is a much more reasonable task to implement. See `tests/utils/node_manager.rs` for a very complete example, or `node_manager/memory/core.rs` for a more realistic example (this node manager implements the core namespace, which may also be interesting).
 - The `SimpleNodeManager` is an approximation of the old way to use the SDK. Nodes are added externally, and you can provide getters, setters, and method callbacks. These are no longer part of the address space.

More node managers can absolutely be added if we find good abstractions, but these are solid enough to let us implement what we need for the time being.

# Lost features

Some features are lost, some forever, others until we get around to reimplementing them. I could have held off on this PR until they were all ready, but it's already large enough.

 - Diagnostics are almost entirely gone, though there is a sort of framework for them. In practice the whole thing needs a rethink, and it's an isolated enough task that it made sense to leave out for now.
 - Setting the sampling interval to `-1` no longer works. I wanted to make everything work, but in typical OPC-UA fashion some features are just incredibly hard to implement properly in a performant way. I'm open to suggestions for implementing this in a good way, but it's such a niche feature that I felt it was fine to leave it out for now.
 - The web server, as mentioned above.
 - Auditing. Same reason as diagnostics, but also because events are so cumbersome at the moment.

# General improvements

Integration tests are moved into the library as cargo integration tests, and they are quite nice. I can run `cargo test` in about 30 seconds, most of which is spent on some expensive crypto methods. There is a test harness that lets you spin up a server on port `0`, meaning you get a dynamically assigned port, so tests can run in parallel arbitrarily.

This almost certainly fixes locka99#359, locka99#358, locka99#324, locka99#291, and locka99#281, and probably more.

# Future work

See `todo.md`, the loose ends mentioned in this PR description need to be tied up, and there is a whole lot of other stuff in that file that would be nice to do.
einarmo added a commit to einarmo/opcua that referenced this issue Dec 15, 2024