-
Notifications
You must be signed in to change notification settings - Fork 911
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getting Null as container name #925
Comments
/milestone 0.19.0 |
/assign @fntlnz |
/assign @leodido |
From repo planning: @mfdii suggests that this has something to do with how we cache metadata. |
I confirm that I'm able to reproduce this
|
After a very long bisect session I've been able to identify that this was introduced after this PR in sysdig: draios/sysdig#1326 |
The problem is that container metadata is now fetched asynchronously, so when Falco detects the event there's a possibility that container metadata hadn't been fetched yet, this is why sometimes we see the metadata and sometimes not. |
We are still working on this. I don't feel like we have room of improvement in this part until we introduce a processing queue (which is in plan for when we do the input gRPC api - #908). We could solve this by adding some kind of synchronization mechanism like a mutex but on most of the systems this would increase a lot the drop rate. So the plan is to postpone this until gRPC input interface is done and we have the syscall input with metadata fetching done. /milestone 1.0.0 |
Hey guys, |
Hey @nebi-frame, We haven't planned a date, but we can bring it up on the Wed call. Do you think you will be able to attend so we can talk more about this? |
Sure I’m available
…On Mon, Jan 6, 2020 at 16:58 Kris Nova ***@***.***> wrote:
Hey @nebi-frame <https://github.com/nebi-frame>,
We haven't planned a date, but we can bring it up on the Wed call. Do you
think you will be able to attend so we can talk more about this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#925?email_source=notifications&email_token=ALJTFQZ6OSUNODPNOFGKZULQ4OSPFA5CNFSM4JNQUOBKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIG5UZI#issuecomment-571333221>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALJTFQ4CIIPJ5LPUM3LDH2LQ4OSPFANCNFSM4JNQUOBA>
.
|
Is this a thing where folks are looking for help? |
Yes!!! and welcome @holdenk |
Yes - we have an end user who is asking for a fix here - I am sure they would appreciate anyone looking at this. We can pay with stickers, t-shirts, hugs, and coffee. |
Awesome, if you've got some time I'll bug you with some detail questions as I start to get lost in a new code base :) |
From a quick chat in falco.cpp look where we call next and add mutexes around there. And then look for where the container metadata is coming from idk? I think I missed something ehre @kris-nova. |
So check out the main daemon loop in The mutex will need to lock right after this method call, and also wait for the "container metadata" context that comes from lua here We need to make it such that Falco can't go into deadlock if for some reason Lua returns We also need this mutex to be an opt-in feature that is exposed via a feature flag. Probably something like |
@kris-nova @holdenk maybe we don't need a mutex here. The asynchronous nature of container metadata is what is causing the problem here so a thing we can do is to add a flag to tell container metadata to not be asynchronous. Asynchronous metadata fetches can be deactivated by calling In our case, it can be done in Wdyt? |
@fntlnz what's the decision here? |
@fntlnz would making this call be sync slow things down unacceptably? |
@holdenk IMHO yes. Also, that would increase the drop rate noticeably. A temporary solution could be to introduce a flag for Falco that forces the inspector - ie., To do it there is this method: Thanks for looking into this 🤗 |
Discussing this in today's call to make a decision |
/assign @fntlnz |
Can I join the call as well? |
We made a decision, we are going to fix this by making the synchronous mode opt-in for those who want it. In this way we can fix the problem for everyone right now and we can focus on the inputs api to solve the root cause as leo was mentioning here. |
Thanks everyone, I'm excited for the synchronous mode 👍 |
/milestone 0.21.0 |
@fntlnz Thanks to updating this issue pls :) |
Anyone who wants to give the fix a try, you can help us by testing the artifacts here #1099 (comment) Remember to configure Falco to use the new flag or nothing will change! |
Amazing! thank you so much guys we really appreciate your efforts! |
What happened:
Sometimes we are getting
null
as container.name, parent process name from Falco.What you expected to happen:
A real value
How to reproduce it (as minimally and precisely as possible):
Listen for any event and look at what you get for
container.name
Anything else we need to know?:
We raised this issue on slack channel and some members suggested us to make sure falco starts after docker so we put sleep statements. Unfortunately, It seems problem still occurs.
Environment:
falco --version
): 0.17.1 - 0.18.0cat /etc/os-release
): Amazon Linux 2, ECSThe text was updated successfully, but these errors were encountered: