-
Notifications
You must be signed in to change notification settings - Fork 730
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async container info #1326
Async container info #1326
Conversation
843d0ea
to
859bafa
Compare
0ab8777
to
aa5c232
Compare
I think this is in pretty good shape, although I still want to do some perf tests. Could you take a look? |
Ok could you take another look? One pending source of failures on osx is tbb. I could just change it to protect all that stuff with HAS_CAPTURE but I'd like to avoid it if possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only remaining comment is non-blocking.
09ca9ec
to
60e4ce5
Compare
Any opinions on getting rid of m_enabled = false on the first failed container metadata lookup? I think it's more robust this way--what I found in busy environments is that it was pretty easy to get a single lookup failure which would effectively disable all future lookups. |
b14a0ec
to
8fe1adb
Compare
d332d31
to
0eb84d4
Compare
@@ -195,7 +156,6 @@ docker::docker_response libsinsp::container_engine::docker::get_docker(sinsp_con | |||
{ | |||
case 0: /* connection failed, apparently */ | |||
g_logger.format(sinsp_logger::SEV_NOTICE, "Docker connection failed, disabling Docker container engine"); | |||
m_enabled = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does this happen now? or do we keep trying to hit docker forever now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was my last general comment on the PR. We always try to fetch forever now. The general consensus was that this was better than giving up forever after the first failure.
6a5a176
to
6e888f6
Compare
/* PPME_CONTAINER_JSON_E */{"container", EC_INTERNAL, EF_SKIPPARSERESET | EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } }, | ||
/* PPME_CONTAINER_JSON_X */{"container", EC_INTERNAL, EF_UNUSED, 0}, | ||
/* PPME_CONTAINER_JSON_E */{"container", EC_PROCESS, EF_MODIFIES_STATE, 1, {{"json", PT_CHARBUF, PF_NA} } }, | ||
/* PPME_CONTAINER_JSON_X */{"container", EC_PROCESS, EF_UNUSED, 0}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the flags change? (not saying it's unneeded, trying to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EC_INTERNAL means they aren't returned from sinsp::next(), but we rely on the container event now to indicate that a container has been created. EF_SKIPPARSERRESET means it would be skipped by falco and also the thread state isn't properly created in sinsp_parser::reset(), but we want the thread info for the event.
return Json::FastWriter().write(obj); | ||
} | ||
|
||
bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt) | ||
bool sinsp_container_manager::container_to_sinsp_event(const string& json, sinsp_evt* evt, int64_t tid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't tids unsigned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not in threadinfo.h: https://github.com/draios/sysdig/blob/dev/userspace/libsinsp/threadinfo.h#L238
container_info.m_type = s_cri_runtime_type; | ||
// It might also be a docker container, so set the type | ||
// to UNKNOWN for now. | ||
container_info.m_type = CT_UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when both docker and cri fail to return metadata for a container? Are we stuck with CT_UNKNOWN?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I suppose we could have a combined type or just pick one of them, but I think it'd be better to have this explicit type for when we don't know the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnosek and I chatted offline and we're going to get rid of CT_UNKNOWN and rely on m_metadata_complete to internally denote that container info is incomplete. Since docker runs first, it will set the type to CT_DOCKER. If cri looks up info successfully it will set the type properly. If both fail, the type will remain at CT_DOCKER, which seems as good as you can given the cgroup layouts are the same.
// use this to ensure that the tid of the CONTAINER_JSON event | ||
// we eventually emit is the top running thread in the | ||
// container. | ||
typedef tbb::concurrent_hash_map<std::string, int64_t> top_tid_table; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to track the top threadinfo (or rather, why do we need a special thread chosen from the container at all)?
What happens if we fail to find an event from the toppest-of-top tids during the async lookup (and presumably use a child tid instead)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We track the threadinfo so the container event has some thread info associated with it. And we need to update this on the fly as it's very common that the initial process that created the container exits once the actual entrypoint is started, so without any extra steps you'd have a container event with a zombie threadinfo.
It's not the end of the world if it's not exactly the top thread but it is useful to be some thread in the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced we need a tid in the container event. Where do we use it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we aren't doing any filtering of containerinfo
s with incomplete metadata in the container.*
filterchecks code, use cases like sysdig 'container.id != host' -p "%container.image"
are now returning incomplete
until the fetching is completed.
Can we go ahead and do it? Or have you already thought about this and something worse happens if we do?
This also means that Falco rules with "negative predicates" - !=
, not in
... - based on most container fields will misbehave unless we address it in some way, right?
userspace/libsinsp/container_info.h
Outdated
@@ -134,7 +134,8 @@ class sinsp_container_info | |||
m_cpu_period(100000), | |||
m_has_healthcheck(false), | |||
m_healthcheck_exe(""), | |||
m_is_pod_sandbox(false) | |||
m_is_pod_sandbox(false), | |||
m_metadata_complete(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't the default value false
? Can't we set it to true
immediately if the metadata fetch is synchronous and when we have all the data for the async case? (this is also related to the "top-level" PR review comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the default at true so container engines that don't have any notion of background metadata lookups can just create a container_info and know the metadata will be flagged as complete. It's only docker (and later cri) that will set the flag to false and then true when the metadata is all fetched.
About the container info with incomplete, I'll check but I thought the way I had it set up is that the container event e.g. an event that can be returned via sinsp::next() is only emitted when the info is complete. The container manager does create a container info object with image, etc == incomplete, but I intended that there's no sinsp event emitted. Yeah double checked with the following:
|
Yes the container event is emitted only when the metadata fetching is completed, but that's not what I'm talking about. Since the And let me reiterate, this means that Falco rules with "negative predicates" - |
34d7859
to
d3cea6d
Compare
Ok, here's a falco rules change that should make the container_started macro only trigger when there is complete metadata: falcosecurity/falco#570. |
ccaa47c
to
bdcd391
Compare
userspace/libsinsp/parsers.cpp
Outdated
const Json::Value& mesos_task_id = container["mesos_task_id"]; | ||
if(!mesos_task_id.isNull() && mesos_task_id.isConvertibleTo(Json::stringValue)) | ||
{ | ||
container_info.m_mesos_task_id = mesos_task_id.asString(); | ||
} | ||
|
||
#ifdef HAS_ANALYZER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we drop the #ifdef
? I'd hope we eventually remove the flag completely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll remove it, but I'll remove it around all references of m_metadata_deadline, not just this one, right?
Add a non-blocking queue to the inspector and in sinsp::next() try to read events from that queue, similar to how we look at m_metaevt. If any events are found, they are returned from sinsp::next() instead of reading from the libscap. In order to support truly portable sinsp_event objects, we need (optional) storage for the matching libscap event in m_pevt. This is in m_pevt_storage, NULL by default but if non-NULL will be freed. This will be used to pass container events to the inspector via the queue.
Make sure container_json events have a real thread id by passing it as an argument to container_to_sinsp_event. Modify notify_new_container to use the non-blocking queue to pass events to the inspector instead of m_meta_evt.
Previously, CONTAINER_JSON events had some, but not all, of the info that was in a sinsp_container_info object. Fix this so all important info is both dumped to json in container_to_json and parsed in parse_container_json_evt.
Refactor docker metadata fetches to be asynchronous, using the framework in sysdig::async_key_value_source. docker_async_source (a global static so it can be long-lived) is now responsible for looking up docker metadata. Some methods that used to be in docker:: like parse_docker(), get_docker(), etc move to docker_async_source. It dequeues requests, calling parse_docker() to look up the information as needed, and calls store_value() to pass the metadata back to the caller. docker::resolve now uses parse_docker_async to schedule the lookup with docker_async_source. Before scheduling the lookup it creates a stub container with type UNKNOWN (UNKNOWN because you can't tell the difference between cri and docker containers only from the cgroup), with the id set, and with a name/image set to "incomplete". This ensures that threads have some associated container info object with it. In the callback once the full metadata is available, it calls notify_new_container, which creates the CONTAINER_JSON event and pushes it to the inspector. There's a fair amount of bookeeping to make sure that the container metadata has a valid tid. The very first tid that creates the container often exits after forking of the real container entrypoint, so you need to keep track of the "top tid" in the container for every call to docker::resolve() and replace it if you find it's exited. Previously, on the first error fetching container metadata, a flag m_enabled would be set to false, and all subsequent attempts to fetch container metadata would be skipped. Now that lookups are done in the background, I think it makes sense to always try a lookup for every container, even after failures. So remove m_enabled entirely from the docker engine. Also, as a part of this change, reorganize the way the async lookups are done to better support the choices of different osen (linux, windows, and HAS_CAPTURE). Instead of compiling out the curl handles when HAS_CAPTURE is false, always compile the code for them but don't even attempt any lookups in docker::resolve. Note that docker_async_source::get_docker no longer changes behavior based on HAS_CAPTURE.
cri::resolve might be called after docker::resolve, so there could be a container_info object with type == UNKNOWN. Update it to handle this i.e. do the lookup anyway.
We need it now that tbb is a core part of the inspector and that the curl part of the docker container engine isn't #ifdefd with HAS_CAPTURE. Both are already downloaded/built as dependencies of sysdig so it's just a matter of CMakeLists.txt config. Also, disable libidn2 explicitly when building libcurl. On OSX builds this gets picked up from an autodetected library, while it does not get picked up on linux. We already disable libidn1, so also disable libidn2.
Instead of inferring is_pod_sandbox from the container name only for docker, save/load it to json directly from the container info. This ensures it works for other container engines than docker.
Instead of having a CT_UNKNOWN type, iniitally set the type to CT_DOCKER when starting the async lookup. Cri runs next and if the grpc lookup completes successfully this will be replaced with CT_CRIO. If both grpc and docker metadata lookups fail the type will remain at CT_DOCKER.
It's a bit simpler than including sinsp.h, which has many many dependent header files.
It's required for the container.* filterchecks to work.
Call detect_docker, which identifies a docker container, before starting the async lookup thread. If no docker containers are ever used, the async thread won't be started.
Instead of just setting the image to incomplete, set all container metadata fields to incomplete.
bdcd391
to
8d65ac8
Compare
It's filled in when using the analyzer, but we can define it in both cases and just not use it when there is no analyzer.
Changes to stop looking up docker container metadata inline with event processing and perform the lookups in the background instead. A new event PPME_CONTAINER_JSON_E (*) signals when the metadata lookup is complete and full information about the container exists.
While the metadata lookup is underway, a "stub" container is created with a name="incomplete", image="incomplete". This ensures that there is a valid container object for processes running in a container. Obviously, the negative impact is that while the lookup is underway, for any initial activity in the container you can't know anything about the container that it's running in, other than the container id.
This fixes #1321.
(*) The event previously existed, but had the EF_INTERNAL flag so it was hidden and not ever returned by the inspector. It was used to save container information to trace files.