nodes: add projects and projects_data to process node call #392
Conversation
// send to node manager from here.
log.Infof("send info about node %s to node manager", msg.Report.NodeName)
nodeMetadata, err := gatherInfoForNode(msg)
had to move the call to process node to the compliance ingestion publisher so i could access projects as well as node status
Force-pushed from 5ca9654 to eecf82b
The code looks fine, but I'm not sure I understand what the projects data is for. How does it get used? If that information comes from the report, could the node have a reference to the report id and access the data that way?
#365 (comment)
@@ -76,6 +76,8 @@ message NodeMetadata {
  repeated chef.automate.domain.compliance.api.common.Kv tags = 10;
  chef.automate.domain.nodemanager.api.nodes.LastContactData run_data = 11;
  chef.automate.domain.nodemanager.api.nodes.LastContactData scan_data = 12;
+ repeated chef.automate.domain.nodemanager.api.nodes.ProjectsData projects_data = 13;
Would it be possible to add each of the values needed for the project data as its own field in the database? One reason is that I am not sure how to write an SQL update query if all the data is in JSON. The other reason is that this information might be useful for the 'one node view', because these fields are the main fields used in client runs. If we are already collecting this information, it would be helpful to make it more accessible.
I am thinking that in the future we might want to move away from the node-state index and more towards using the node manager.
update nodes set projects = $1 where projects_data ->> 'environment' AND...
I would imagine a SQL query would look something like that.
I'm open to moving them to their own fields, but before going down that road, I'm curious about a few things. I'm not clear on what best practices really are for Postgres: is it OK to have a bunch of columns? Is querying with the JSON functionality slower than querying fields that are each stored as their own column? And if we do want them as individual columns, should we then have a projects_data table and a nodes_projects_data table, similar to the setup we have for tags?
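To make the comparison concrete, here is a sketch of the two query shapes being discussed, in Postgres syntax. This assumes projects_data is a json/jsonb column; table, column, and key names are illustrative, not the actual nodemanager schema.

```sql
-- Filtering on a key inside the JSON column:
SELECT id FROM nodes WHERE projects_data ->> 'environment' = 'prod';

-- The equivalent filter if 'environment' were promoted to its own column:
SELECT id FROM nodes WHERE environment = 'prod';
```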
Oh, I did not know you could search in JSON like that in SQL. That is nice. I don't know which would be faster; I will look around in the morning to see what I can find.
I am fine with how this is now. In the future, if we need to, can we also use that JSON trick to migrate the data into columns?
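If that migration were ever needed, a hedged sketch might look like the following, again assuming a json/jsonb projects_data column and illustrative column/key names:

```sql
-- Promote a JSON key to a dedicated column using the same ->> operator,
-- then backfill the new column from the existing JSON blob.
ALTER TABLE nodes ADD COLUMN environment text;
UPDATE nodes SET environment = projects_data ->> 'environment';
```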
ya, we should be able to do that
My possibly inaccurate memories of this stuff:
- The downside to adding a lot of columns to a table is that each row takes up more space on disk and in memory, plus the usual code quality concerns (eg the table comes to represent several related concepts rather than one thing, and you need special syntax that libraries don't always support). For adding columns vs a JSON column, I think you probably get the same result either way, but just to guess, Postgres is probably more optimized for regular columns than for JSON, because that's the more general use case.
- Accessing JSON columns isn't significantly slower, but there is a bit of overhead. If performance is a concern, you can add indexes on JSON keys just like on regular columns.
To me, the compelling use case for JSON columns is when the data isn't known ahead of time, eg the JSON comes from a third party with no defined schema, or the data is so variable that most rows wouldn't have values for all the fields.
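The index-on-JSON-keys approach mentioned above can be sketched as a Postgres expression index; the key name is illustrative:

```sql
-- Index the extracted key so filters like
--   WHERE projects_data ->> 'environment' = 'prod'
-- can use it. Note the extra parentheses required around an expression.
CREATE INDEX nodes_projects_env_idx ON nodes ((projects_data ->> 'environment'));
```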
i'm gonna rebase on master
Signed-off-by: Victoria Jeffrey <[email protected]>
Force-pushed from eecf82b to 66a0034
🔩 Description
fixes #365
ingested nodes are tagged with projects as part of the ingestion process.
this change ensures that
api/v0/nodes
👍 Definition of Done
projects info and projects data are passed through to nodemanager process node
👟 Demo Script / Repro Steps
rebuild nodemanager
ingest and compliance too
send nodes in (chef_load_nodes and chef_load_compliance_scans -N20)
open up nodemanager db
see some info there for projects data. if nodes are tagged with projects, also see those associations in the nodes_projects table
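A sketch of the verification queries for that last step (table and column names are assumed from the discussion above, not confirmed against the actual schema):

```sql
-- Inspect the projects data stored per node...
SELECT id, projects_data FROM nodes;

-- ...and the project associations for tagged nodes.
SELECT * FROM nodes_projects;
```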