Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to publish array segments instead of full byte arrays #489

Closed
wants to merge 1 commit into from
Closed

allow to publish array segments instead of full byte arrays #489

wants to merge 1 commit into from

Conversation

zgabi
Copy link

@zgabi zgabi commented Oct 11, 2018

Proposed Changes

IModel.BasicPublish currently accepts only a byte array, so the library user usually needs to copy the data to a new byte array which has the required length. It is a quite big overhead and also needs more garbage collection.

This PR adds a new BasicPublish method to the IModel interface which accepts an offset and count parameters (like .NET's Stream write, etc)

Types of Changes

What types of changes does your code introduce to this project?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes issue #NNNN)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation (correction or otherwise)
  • Cosmetics (whitespace, appearance)

Checklist

  • I have read the CONTRIBUTING.md document
  • I have signed the CA (see https://cla.pivotal.io/sign/rabbitmq)
  • All tests pass locally with my changes
    I can see some exceptions when I run run-test.bat, but they are not releated to my change. I've set RABBITMQ_RABBITMQCLT_PATH but still complains.
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in related repositories

@michaelklishin
Copy link
Member

michaelklishin commented Oct 11, 2018

Thank you for your time and considering a contribution.

I find this API addition to be ugly and too specific to a single user's needs. Fundamental efficiency improvements such as the recent work by @Pliner is what we should be after. I 👎 this.

@michaelklishin
Copy link
Member

Also, as with all efficiency improvements it would be nice to see some benchmarks and what kind of difference does using this new method bring. "Quite a big overhead" is not something that our team can quantify.

@lukebakken
Copy link
Contributor

the library user usually needs to copy the data to a new byte array which has the required length

Hmm users use serialization classes and / or ToByteArray methods almost all of the time. I'm not sure what specific use-case this addresses.

@bording
Copy link
Collaborator

bording commented Oct 11, 2018

Hmm users use serialization classes and / or ToByteArray methods almost all of the time. I'm not sure what specific use-case this addresses.

That touches on the larger issue, in that ArraySegment is a rather difficult type to use. It was never plumbed through all of the BCL APIs that you'd want to actually be able to use it easily.

On the other hand, Span<T> / Memory<T> have gotten that additional support (on .NET Core only at the moment), so moving to using those is a much more interesting proposition.

@zgabi
Copy link
Author

zgabi commented Oct 11, 2018

Yes, the best would be to use Span/ReadOnlySpan, but it is not available in .NET framework.
By the way, ArraySegment is only in the internal methods, the public interface is accepting offset and count as other .NET Stream methods.

So I'd prefer Spans, but the existing code is already using ArraySegments in other classes, this is why I choosed that.

I'll try to create some performace test.

And yes, it is ugly to have 2 methods (which accepts only the byte array, and the 2nd which accepts the offset/count, too), but I did not want to add breaking change. When a breaking change is not a problem, change the existing method.

I think nowadays a good serializer is using a buffer pool to avoid memory allocations. But because of pooling the returned buffer is maybe bigger than needed. And RabbitMQ can't handle that, we have to copy the data to a new buffer which has the exact size.

@zgabi
Copy link
Author

zgabi commented Oct 11, 2018

And one more thing, @Pliner 's change is reduces the allocation inside RabbitMQ client. In my case the extra allocation is in the caller's code, so it is not an alternative, both can/should be applied.

Or as you mentioned, it can use Spans (for BasicPublish I think it should be ReadOnlySpan), it would solve my issue, too, but it is a bigger breaking change.

@michaelklishin
Copy link
Member

We have been doing breaking changes in master for 6.0 for over a month now. The only question here is, what .NET versions can we support with that kind of API.

@bording
Copy link
Collaborator

bording commented Oct 11, 2018

We have been doing breaking changes in master for 6.0 for over a month now. The only question here is, what .NET versions can we support with that kind of API.

While the Span and Memory types are available for older versions of the .NET Framework via the System.Memory package, Span on the .NET Framework is the "slow" version that doesn't work as well because it requires CLR changes that are only in .NET Core.

You also don't get all of the BCL overloads that take Span/Memory on the .NET Framework yet.

So, while switching to Span would be the best option from an efficiency standpoint, realistically, doing that means you're running on .NET Core 2.1 and up only.

@michaelklishin
Copy link
Member

@bording thank you for the insightful response. We should file a new issue about a Span/Memory overload for a future version (which would admittedly be a couple of years away).

Let's wait for the numbers before we make the call on this change.

@zgabi
Copy link
Author

zgabi commented Oct 11, 2018

Just found an already existing issue for this PR: #303

But as you wrote ArraySegment is not so nice, For spans we have to wait "couple of years", I think offset + length is the best solution currently which is also widely available in .NET framework.

I'd be happy to modify this PC according your preferences, just let me know what interface parameter changes would you accept.

If you prefer I can change the internal ArraySements parameters to offset+length, too, but it needs a smaller change in the Apigen class too, since the body parameter now has a AmqpContentBodyMapping attribute, and similar attributes will be needed for the offset and length.

@kjnilsson
Copy link
Contributor

I don't mind the changes to the public api (BasicPublish) and I think they could be useful for a small number of users. The problem as I see it is that the internals now expect an array segment which means that the common case (where a user provides and wants to send a full buffer) requires an ArraySegment to be created. I don't know if .NET does something special in this case but this seems like an unnecessary object allocation?

I'd like to see the PR cut down as @zgabi suggests to allow users to specify offset and length instead and also mirror that internally. That should satisfy the the users that want to avoid copying.

@zgabi
Copy link
Author

zgabi commented Oct 11, 2018

ArraySegment is not a class, so it is not an unnecessary object allocation.

So is it ok for everybody if I add offset and length everywhere?

By the way, in this case I think this method could be changed, too:

internal static ArraySegment<byte> GetBufferSegment(this MemoryStream ms)

Return only the byte[], the length is always ms.Position anyway. With this change all existing ArraySegment reference will be removed from RabbitMQ client.

@michaelklishin
Copy link
Member

michaelklishin commented Oct 11, 2018

@zgabi not sure what you mean by "everywhere" but it is not OK to modify existing methods. I don't think this user-specific API element should be driving the implementation either.

@zgabi
Copy link
Author

zgabi commented Oct 11, 2018

Eveywhere = where I added ArraySegment I'll add int offset and int count instead of changing the byte[] to ArraySegment.

I don't mind how (int parameters, ArraySegment, *Span, other custom class - ok class is a very bad idea, it is an unnecessary allocation again -, custom struct), but the actual length of the given buffer somehow should be passed to the Command class.

Another question is:
Add a new overload to the IModel interface or change the existing.
Adding a new method is ugly, modifying is breaking change. Probably all of the users are using the BasicPublish method, so changing it affects everybody, however the fix is trivial for them.

@michaelklishin
Copy link
Member

I'm afraid the consensus (or lack thereof) around this PR is: we are not convinced such use case-specific changes should be accepted to the API. We are open to being convinced otherwise but there hasn't been much activity => closing.

Thank you for taking the time, @zgabi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants