nodes: add projects and projects_data to process node call #392

vjeffrey · 2019-05-20T19:56:57Z

🔩 Description

fixes #365

ingested nodes are tagged with projects as part of the ingestion process.
this change ensures that

the projects for the node are passed on to the node definition at api/v0/nodes
the projects data (for nodes not from scan job) is passed on to the node definition at api/v0/nodes

👍 Definition of Done

projects info and projects data are passed through to nodemanager process node

👟 Demo Script / Repro Steps

rebuild nodemanager
ingest and compliance too
send nodes in (chef_load_nodes and chef_load_compliance_scans -N20)
open up nodemanager db
see some info there for projects data. if nodes are tagged with projects also see those associations in the nodes_projects table

vjeffrey · 2019-05-20T19:57:59Z

components/compliance-service/ingest/pipeline/publisher/node_manager_publisher.go

+			// send to node manager from here.
+			log.Infof("send info about node %s to node manager", msg.Report.NodeName)
+
+			nodeMetadata, err := gatherInfoForNode(msg)


had to move the call to process node to the compliance ingestion publisher so i could access projects as well as node status

phiggins

The code looks fine, but I'm not sure I understand what the projects data is for. How does it get used? If that information comes from the report, could the node have a reference to the report id an access the data that way?

vjeffrey · 2019-05-21T06:50:52Z

I'm not sure I understand what the projects data is for

#365 (comment)
-- in that issue we were discussing how to keep the nodes in sync with the project update manager: "We will have to ensure that the node manager stores all the fields necessary for the project ingest rules."

lancewf · 2019-05-21T18:27:16Z

components/nodemanager-service/api/manager/manager.proto

@@ -76,6 +76,8 @@ message NodeMetadata {
 	repeated chef.automate.domain.compliance.api.common.Kv tags = 10;
 	chef.automate.domain.nodemanager.api.nodes.LastContactData run_data = 11;
 	chef.automate.domain.nodemanager.api.nodes.LastContactData scan_data = 12;
+	repeated chef.automate.domain.nodemanager.api.nodes.ProjectsData projects_data = 13;


Could it be possible to add each of the values needed for the project data as a field in the database? One reason is that I am not sure how to create an SQL update query if all the data is in JSON. The other reason this information might be useful for the 'one node view', because these fields are the main fields used in client runs. If we are already collecting this information it would be helpful to make it more accessible.

I am thinking in the future we might want to move away from the node-state index and more towards using the node manager.

update nodes set projects = $1 where projects_data ->> 'environment' AND... i would imagine a sql query would look something like that.

i'm open to moving them to their own fields, but before going down that road, i'm curious about a thing: i'm not clear on what best practices really are for postgres -- is it ok to have a bunch of columns? is querying using the json functionality slower than doing it on fields that are each stored as a column in db? if we say we want them as individual columns, should we then have a projects_data table and a nodes_projects_data table, similar to the setup we have for tags?

Oh, I did not know you could search in JSON like that in SQL. That is nice. I don't know what would be faster. I will look around in the morning to see what I can find.

I am fine with how this is now. In the future, if we need to can we also use that JSON trick to migrate the data into columns?

ya, we should be able to do that

My possibly inaccurate memories of this stuff:

The downside to adding a lot of columns to a table is each row takes up more space on disk/in memory and the usual code quality concerns (eg the table represents several related concepts rather than one thing and you need to use special syntax so libraries don't always support it). For adding columns vs a json column, I think you probably get the same result either way, but just to guess, postgres is probably more optimized around columns vs json because that's a more general use case.

Accessing json columns isn't significantly slower, but there is a bit of overhead. If the performance is a concern you can add indexes on json keys just like regular columns.

To me the compelling use case for json columns is when the data isn't known ahead of time, eg the json comes from a third party with no defined schema, or the data is so variable that most rows wouldn't have data for all the fields.

vjeffrey · 2019-05-28T12:48:12Z

i'm gonna rebase on master

Signed-off-by: Victoria Jeffrey <[email protected]>

vjeffrey commented May 20, 2019

View reviewed changes

vjeffrey force-pushed the vj/process-node-projects-data branch from 5ca9654 to eecf82b Compare May 20, 2019 22:00

phiggins approved these changes May 20, 2019

View reviewed changes

lancewf reviewed May 21, 2019

View reviewed changes

lancewf approved these changes May 22, 2019

View reviewed changes

Victoria Jeffrey added 8 commits May 28, 2019 06:49

add projects data and projects to the process node func

ddc246c

move call to nodemanager in compliance ingestion to publisher

17dab9f

fixes now that i have internet again

0d6833d

testing all the bits

b85653c

Signed-off-by: Victoria Jeffrey <[email protected]>

cleanup

a3ff1ca

Signed-off-by: Victoria Jeffrey <[email protected]>

add a test for the projects data

b88e96a

Signed-off-by: Victoria Jeffrey <[email protected]>

fix the unit test

8e23994

Signed-off-by: Victoria Jeffrey <[email protected]>

fixes

66a0034

Signed-off-by: Victoria Jeffrey <[email protected]>

vjeffrey force-pushed the vj/process-node-projects-data branch from eecf82b to 66a0034 Compare May 28, 2019 12:49

vjeffrey merged commit c83a296 into master May 28, 2019

chef-ci deleted the vj/process-node-projects-data branch May 28, 2019 15:50

ekmixon mentioned this pull request Dec 1, 2023

[Snyk] Fix for 1 vulnerabilities ekmixon/automate#145

Open

ekmixon mentioned this pull request Dec 20, 2023

[Snyk] Fix for 29 vulnerabilities ekmixon/automate#149

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nodes: add projects and projects_data to process node call #392

nodes: add projects and projects_data to process node call #392

vjeffrey commented May 20, 2019

vjeffrey May 20, 2019

phiggins left a comment

vjeffrey commented May 21, 2019

lancewf May 21, 2019

vjeffrey May 21, 2019

lancewf May 21, 2019

vjeffrey May 22, 2019

phiggins May 24, 2019

vjeffrey commented May 28, 2019

nodes: add projects and projects_data to process node call #392

nodes: add projects and projects_data to process node call #392

Conversation

vjeffrey commented May 20, 2019

🔩 Description

👍 Definition of Done

👟 Demo Script / Repro Steps

vjeffrey May 20, 2019

Choose a reason for hiding this comment

phiggins left a comment

Choose a reason for hiding this comment

vjeffrey commented May 21, 2019

lancewf May 21, 2019

Choose a reason for hiding this comment

vjeffrey May 21, 2019

Choose a reason for hiding this comment

lancewf May 21, 2019

Choose a reason for hiding this comment

vjeffrey May 22, 2019

Choose a reason for hiding this comment

phiggins May 24, 2019

Choose a reason for hiding this comment

vjeffrey commented May 28, 2019