-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary encode #102
Binary encode #102
Conversation
Um... The only output I can find for the failed tests is from an apparently successful configuration. Does anyone know how to see what actually failed. |
Hmm, try again now - if you click on the "Details" next to the failed "Complete Jenkins Job (private)" here on github, then you get to the Jenkins page and in the "Configuration" there's "distro=f_33 distro=centos_stream distro=f_rawhide". If you click on one of them, then "Console Output" (on the left banner), then there's:
But I'm not sure why - I can run that just fine on my machine... hmm.. |
Regarding the Would be great if we could add a custom format specifier for An alternative would be to simply add And I can't think of any other sane alternative right now... So ok, let's got with the solution as you provided, it's probably the least problematic/disputable one. (With vector buffers it's a bit different because we're adding the iovecs, and the data is completely external.) The compiler just fusses a bit about this, but otherwise the patches look good: Thanks! |
Well. the cause of the unmap failure is obvious. The mmap failure is less so. If the length was 0, that would cause an EINVAL failure according the the man page. However, there should have been some things in the database, unless perhaps the request came in too soon (I'm not sure if that can even happen). Maybe there was an error on the server size. I'll look into it some more. At any rate, if nothing has been added to the buffer before buffer_get_fd() is called on prefixed buffers, it needs to change the size from 0 to MSG_SIZE_PREFIX_LEN. |
At its start, the db has only reserved keys in it. In case we're not building blkid module (I've noticed |
Like I mentioned in the pull request message, if you are willing to have base64_encode() alloc some memory that we copy to the buffer and then free, then we don't need alloc empty memory. It's just a question of whether avoiding writing to buffer memory from outside the buffer code is worth the extra malloc. I'm fine with either way.
I'll fix that.
|
FYI - you are welcome to use the Jenkins node to reproduce the results. We just need to take it offline/put it back in the pool. I think you've got permission to do that - but I'm happy to help organize etc. |
So to fix the "Failed to map memory with key-value store: Invalid argument." error, my plan is to go through the code and make sure that we check if buf->stat.usage.used == 0 on size-prefixed buffers and update it to count the prefix size, in all the places where we need to do that. There is one place were we don't need to update it, but I think we might want to. That's in buffer_write(). Currently, buffer_write_all() intentionally doesn't write anything, even the size prefix, if a prefixed buffer is empty. However, the callers of buffer_read() treat reading a complete empty buffer as different than reading a complete buffer of any other size. This is fine, since the callers of read_buffer() can't do anything with an empty buffer anyway, but it seems more sensible for the writers to still write the size prefix for empty buffers, and reads to treat them as having read a complete buffer. Obviously the readers will then need to check if they got the buffer size that they were expecting, which they don't currently do. This is a bug. They shouldn't just assume that because the buffer they got is non-zero, that it is the size they are expecting, and start accessing memory without checking. And since we need to check that the complete buffers we read are the size we were expecting anyways, I don't see why a complete buffer with no data should be treated any differently than any other complete buffer with the wrong size of data. Let me know if you think I should change the write_buffer() behavior as well. |
Let's stay with your current patch - probably better to avoid the extra malloc rather than being so strict about buffers...
OK. For linear buffers, that's clear... But for vector buffers, we need to take care that the The intention was to update the size prefix only when needed, that is right before writing out the buffer. So either before
Hmm, but the reader shouldn't see empty data (unless there's a problem during read). The write end only writes if there are data to write, otherwise skips (the
Yeah, could be done that way too. I just wanted to avoid any writes if there's no real data to write, just like it is with As for checking that we got the amount of data as given by the prefix, that should be covered by: sid/src/base/buffer-type-linear.c Lines 339 to 341 in 0370661
Or did you mean a different check? |
I would personally prefer to always pass the fd, even if it's empty. But as long as we pass the fd, if you really don't like updating the size, we don't have to in this case, The receiver can still read the size prefix from the passed memfd. Whether we set it to 0 or MSG_SIZE_PREFIX_LEN, the receiver can know that if it's not larger than MSG_SIZE_PREFIX_LEN, the buffer is empty.
What I'm talking about is when buffer_read() returns 0 bytes read (without an error), but buffer_is_complete() returns false. This is a complete but empty buffer (buffer_is_complete() will always return false for empty buffers), and callers of buffer_read() treat this differently then when they read a complete buffer of any other size.
Unless someone does Also, I'm just not sure that this gets us, except for extra code on the reader end to deal with the special case of empty prefixed buffers. The code to check if you read is a complete but empty buffer with a prefix either already needs to be there to check for complete but too small buffers, or already is there like here: sid/src/resource/worker-control.c Lines 252 to 260 in 352df2a
which already handles the case of an empty buffer where the prefix was written.
I was talking about: Lines 3629 to 3634 in 352df2a
If buffer_is_complete() returns true, the code also assumes that the buffer it read is large enough to hold a usid_msg_header. But just because the entire client buffer was sent, doesn't mean that the entire client buffer was large enough to hold a usid_msg_header. To avoid the possibility of reading off the end of the buffer, sid needs to check that msg.size is large enough before accessing msg.header. Checking if the buffer is empty isn't a special case if the prefix size is written for empty buffers. It just gets caught in the check for if the buffer is too small. At any rate, this is your project. So if you want to leave buffer_write() and buffer_get_fd() the way they are, I'll do that. |
The arguments to _build_kv_buffer() have been changed to make it possible to control whether or not udev keys are dumped. _cmd_exec_dump() now also dumps the reserved udev keys that are saved to the main kv_store.
With vector buffers, it is possible to first allocate space and add it to the buffer, and then fill it. This wasn't possible with linear buffers, which required already existing space to copy the data from. buffer_add() should allow a NULL data pointer, and simply memset the allocated space to zero in this case. This allows linear buffers to be used with functions that are designed to fill preallocated memory.
Copy BSD Licensed base64 code from: https://web.mit.edu/freebsd/head/contrib/wpa/src/utils/base64.c https://web.mit.edu/freebsd/head/contrib/wpa/src/utils/base64.h The source code has had its includes updated to compile correctly in the sid code base, and its LICENSE reference comments updated to point to the LICENSE file in the sid code. It has also been reformatted with to match the sid formatting style. and BSD License from: https://web.mit.edu/freebsd/head/contrib/wpa/
Add print_binary_* functions, to print base64 binary data. In order to keep from mallocing twice, add base64_len_encode() to get the encoded string size, and modify base64_encode() to encode the data to a preallocated buffer.
If the value doesn't only include printing characters or doesn't end with a NULL byte, print it as base64 encoded binary data.
Well, honestly, I'm still not quite sure about the issue here :) If we're expecting size-prefixed buffer on the read side, we can have one of these, which should cover all possibilities, I hope:
The only thing here is that the receiving ("read") side needs to know that we are expecting size-prefixed buffer. So the other end always needs to know what type of buffer to use - whether there should be a prefix or not.
Yeah, that's a bug if when looking at this from the perspective of "never sending empty buffers"!
Ehm, that's probably a superfluous check, because the writer shouldn't be even sending the empty buffer - so it shouldn't happen that the size prefix says that the overall buffer size == the size of the prefix itself and nothing else (== no data) - in that case, the buffer shouldn't be even sent on the other ("write") side. (If by chance we're sending an "empty buffer" with no data, that is a bug then.)
Oh! Nice catch! Sure, we need to fix that, add the check, of course...
Well, I see your point - the thing with trying to not send empty buffers and fiddling with the So if things get simplified and we have less potential for bugs with always setting the So sure, if things will get simplified, go ahead! I just wanted to present what was meant with the original intention (to hide the difference between |
Don't merge this yet. Regardless of whether or not we change how size prefixed write work, I have some cleanup that I notice I need to do on these commits. There's also another issue that came up at the meeting today. David Lehman would prefer a library interface for getting information from SID. It seems like sidctl could just become a wrapper around this library. This makes one of the choices I made in my last pull request seem less correct, in hindsight. I didn't want to make the kv_store value formats part of the interface (by defining them in iface/usid.h). However, this meant that sidctl needed to link against libsidresource. It would also mean that anything that linked against this new libsidctl library would need to link against libsidresource. It seems like I might be better to simply define the value formats in usid.h, to keep libsidctl as simple as possible. If we end with a solution that doesn't involve exporting the worker changes using a memfd, then we can switch to using whatever format we want for the dump transfer. Thoughts? |
Rewinding an empty size-prefixed buffer failed, even if the rewind target was valid (just after the size prefix). Calling buffer_get_data() on an empty size-prefixed buffer would return a value around SIZE_MAX, due to a size_t rollover. Also, the functions that read from a passed fd assumed that the size_prefix would always be non-zero, since mmap requires a non-zero length. This wasn't true for empty size-prefixed buffers.
When mmap() fails, it returns MAP_FAILED, not NULL, so check for that to see if we need to munmap().
OK.
Indeed, I somehow didn't realize fully that we actually created a completely new sid-tools dependency on sid-resource-libs. Now, looking at this again, the So what if we simply formatted the output on SID (daemon) side and just enhance the protocol to say in what format we need the output. Right now, we dump to internal format on daemon side, then we send the buffer over to Actually, the |
(On SID daemon side, the formatting would happen in a separate process, working on db snapshot, so we don't need to be afraid that we'd pause the main daemon code from accepting new requests while formatting. And from this perspective, it probably doesn't matter at all whether SID worker does the formatting or the |
So, it turns out I was overlooking some complexity here. Server functions that call buffer_read() currently accept input that comes along with a client side hang up. For _on_connection_event(), this means that after sid gets a command, and responds to the client, it will get a hangup event that sets EPOLLIN and EPOLLHUP. When we try to read the buffer, we will just get the EOF, with no actual data. With the existing code this will return 0, since the buffer has been reset, and is currently empty. So having this special case, where buffer_read() successfully returns 0 for reading no data with an empty buffer, is actually what handles client hangups. Now, we could change size prefixed buffers to always write the prefix, and still handle hangups. But it doesn't really remove complexity like I thought it did, and there's not much point in change for change's sake. |
Minor cleanups. No functional changes. |
I don't see anything wrong with having SID always responding to commands with a msg header and a series of strings designed to be printable. That's a pretty reasonable format. Things that want easily parsable output can either rely on json formatting, or by getting their output in separate strings, like we do for the udev variables. This does mean that we're transferring more data than we would with a different interface, but since we're currently building an output buffer and then printing it, just transferring the formatted output would take up less total memory than we currently use, for commands that transfer the data via memfds. |
I'm working changing where we do the formatting, and turning sidctl and usid into wrappers around a library in a separate branch. So from my point of view, this branch in good to merge. If there were any other changes you were waiting for, before merging it, let me know. |
Sure, let's have the other part separate, will merge this part now... |
This patchset adds support for dumping udev variables in sidctl, and printing binary dump data in base64 encoding. A number of the changes for encoding the base64 data are there to avoid mallocing memory for the encoded data, which would just be freed after writing it to the memfd buffer. If you don't care about the extra malloc for binary data, this patchset can get simplified.