Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nodes: add projects and projects_data to process node call #392

Merged
merged 8 commits into from
May 28, 2019

Conversation

vjeffrey
Copy link

🔩 Description

fixes #365

ingested nodes are tagged with projects as part of the ingestion process.
this change ensures that

  • the projects for the node are passed on to the node definition at api/v0/nodes
  • the projects data (for nodes not from scan job) is passed on to the node definition at api/v0/nodes

👍 Definition of Done

projects info and projects data are passed through to nodemanager process node

👟 Demo Script / Repro Steps

rebuild nodemanager
ingest and compliance too
send nodes in (chef_load_nodes and chef_load_compliance_scans -N20)
open up nodemanager db
see some info there for projects data. if nodes are tagged with projects also see those associations in the nodes_projects table

// send to node manager from here.
log.Infof("send info about node %s to node manager", msg.Report.NodeName)

nodeMetadata, err := gatherInfoForNode(msg)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

had to move the call to process node to the compliance ingestion publisher so i could access projects as well as node status

@vjeffrey vjeffrey force-pushed the vj/process-node-projects-data branch from 5ca9654 to eecf82b Compare May 20, 2019 22:00
Copy link
Contributor

@phiggins phiggins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks fine, but I'm not sure I understand what the projects data is for. How does it get used? If that information comes from the report, could the node have a reference to the report id an access the data that way?

@vjeffrey
Copy link
Author

I'm not sure I understand what the projects data is for

#365 (comment)
-- in that issue we were discussing how to keep the nodes in sync with the project update manager: "We will have to ensure that the node manager stores all the fields necessary for the project ingest rules."

@@ -76,6 +76,8 @@ message NodeMetadata {
repeated chef.automate.domain.compliance.api.common.Kv tags = 10;
chef.automate.domain.nodemanager.api.nodes.LastContactData run_data = 11;
chef.automate.domain.nodemanager.api.nodes.LastContactData scan_data = 12;
repeated chef.automate.domain.nodemanager.api.nodes.ProjectsData projects_data = 13;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it be possible to add each of the values needed for the project data as a field in the database? One reason is that I am not sure how to create an SQL update query if all the data is in JSON. The other reason this information might be useful for the 'one node view', because these fields are the main fields used in client runs. If we are already collecting this information it would be helpful to make it more accessible.

I am thinking in the future we might want to move away from the node-state index and more towards using the node manager.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update nodes set projects = $1 where projects_data ->> 'environment' AND... i would imagine a sql query would look something like that.

i'm open to moving them to their own fields, but before going down that road, i'm curious about a thing: i'm not clear on what best practices really are for postgres -- is it ok to have a bunch of columns? is querying using the json functionality slower than doing it on fields that are each stored as a column in db? if we say we want them as individual columns, should we then have a projects_data table and a nodes_projects_data table, similar to the setup we have for tags?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I did not know you could search in JSON like that in SQL. That is nice. I don't know what would be faster. I will look around in the morning to see what I can find.

I am fine with how this is now. In the future, if we need to can we also use that JSON trick to migrate the data into columns?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, we should be able to do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My possibly inaccurate memories of this stuff:

  • The downside to adding a lot of columns to a table is each row takes up more space on disk/in memory and the usual code quality concerns (eg the table represents several related concepts rather than one thing and you need to use special syntax so libraries don't always support it). For adding columns vs a json column, I think you probably get the same result either way, but just to guess, postgres is probably more optimized around columns vs json because that's a more general use case.
  • Accessing json columns isn't significantly slower, but there is a bit of overhead. If the performance is a concern you can add indexes on json keys just like regular columns.

To me the compelling use case for json columns is when the data isn't known ahead of time, eg the json comes from a third party with no defined schema, or the data is so variable that most rows wouldn't have data for all the fields.

@vjeffrey
Copy link
Author

i'm gonna rebase on master

@vjeffrey vjeffrey force-pushed the vj/process-node-projects-data branch from eecf82b to 66a0034 Compare May 28, 2019 12:49
@vjeffrey vjeffrey merged commit c83a296 into master May 28, 2019
@chef-ci chef-ci deleted the vj/process-node-projects-data branch May 28, 2019 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add projects to processNode
3 participants