
Add plan command to the CLI #31

Merged · 7 commits · Dec 9, 2022

Conversation

Collaborator

@b-per b-per commented Nov 29, 2022

Fixes #29

First idea on how we could tackle plan.
Not a big fan of the UX around it and the function could be cleaner but I wanted to avoid copy/pasting most of the sync command.

Here is the output of python src/main.py plan <myfile>
[screenshot: plan command output]

I don't feel like having the plan output mixed with the rest of the logs makes it stand out enough.
Maybe we should collect the changes and show them all at the end of the command?

@b-per b-per requested a review from nicholasyager November 29, 2022 13:59
Contributor

@nicholasyager nicholasyager left a comment


@b-per This pull request definitely does what it says on the tin. That being said, I'm not 100% in love with the architecture leveraged to implement the sync vs plan logic.

Currently, each API operation is co-located, with a basic condition against no_update being checked to determine whether the operation will be executed or planned. This works, but it's brittle -- we could easily introduce a defect that results in an operation being executed instead of logged. For example, adding a new operation and forgetting to add the no_update check!

I wonder what other architectures we could use here? One option that immediately comes to mind is to create a "ChangeSet" data structure, which tracks which objects will be operated on and the operations we will be performing. For example, creating a job, updating an environment_id, deleting an environment variable, etc. If we have a ChangeSet encapsulate our set of operations, then plan would generate and report on that ChangeSet, whereas sync would apply that ChangeSet against dbt Cloud.
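A minimal sketch of what such a ChangeSet could look like (all names here are illustrative, not the PR's actual code):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Change:
    """A single planned operation against dbt Cloud (hypothetical shape)."""

    action: str  # e.g. "create", "update", "delete"
    type: str  # e.g. "job", "env var overwrite"
    identifier: str  # human-readable identifier of the object
    sync_function: Callable[..., Any]  # API call to run on `sync`
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ChangeSet:
    changes: List[Change] = field(default_factory=list)

    def add(self, change: Change) -> None:
        self.changes.append(change)

    def print(self) -> None:
        # `plan` reports the ChangeSet without touching the API
        for change in self.changes:
            print(f"{change.action.upper()} {change.type} {change.identifier}")

    def apply(self) -> None:
        # `sync` executes every collected operation
        for change in self.changes:
            change.sync_function(**change.parameters)
```

With this split, a defect in one operation can't silently turn `plan` into an execution: `print` never invokes `sync_function`.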

I'm sure there are many other valid and safe approaches available, too!

@b-per, are there any other necessary and sufficient approaches that you'd like to experiment with? Alternatively, am I overthinking this existing approach?

@b-per
Collaborator Author

b-per commented Dec 2, 2022

I completely agree. This is super brittle and ugly 🤣 but it got the functionality out the door

I like the idea of a ChangeSet that would collect all changes and that we could then .print() or .apply() in the subcommands plan and sync/apply.

I'll give it a go today!

@b-per
Collaborator Author

b-per commented Dec 2, 2022

I have some code working at 90% but realized that this changeset brings one difficulty to the mix 😄

The logic is:

  1. we check for new/same/deleted jobs
    • we collect them for future display/action
  2. for each existing job (because it needs to exist and have an id) we check if the env var overwrites are correct
    • we collect the values for future display/action

This flow breaks when we add a new job that has env var overwrites: in that case we collect an action to create the job, but the job is not created at that time, so we don't collect an action to check the env vars on this job.

Right now, if we create a new job, after one sync it will be created without its env var overwrites, and a second sync would then pick up those env var overwrites.

I am going to think about how we can manage that, but let me know if something comes to mind directly on how to fix it.

(it was not a problem with the old approach because we were executing each step one by one, so when checking the env var, the job was already created)
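One possible way around this ordering problem (a sketch under assumed names, not necessarily what the PR ended up doing) is to let a parameter be a callable that is only resolved at apply time, so an env var change can reference a job id that doesn't exist yet when the plan is built:

```python
from typing import Any, Callable, Dict


def apply_change(sync_function: Callable[..., Any],
                 parameters: Dict[str, Any]) -> Any:
    """Resolve lazy (callable) parameters just before the API call.

    A new job's id is unknown at plan time; by deferring the lookup,
    the env var overwrite step can read it after the job was created.
    (Hypothetical helper, illustrating the idea only.)
    """
    resolved = {key: (value() if callable(value) else value)
                for key, value in parameters.items()}
    return sync_function(**resolved)


# Simulated API calls standing in for dbt Cloud ones:
created_jobs: Dict[str, int] = {}


def create_job(name: str) -> int:
    created_jobs[name] = 42  # pretend the API returned id 42
    return created_jobs[name]


def set_env_var_overwrite(job_id: int, key: str, value: str) -> tuple:
    return (job_id, key, value)
```

The plan step would then record `{"job_id": lambda: created_jobs["my-job"], ...}` for the env var change, and the apply step resolves it after `create_job` has run, avoiding the need for a second sync.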

@b-per
Collaborator Author

b-per commented Dec 2, 2022

An easy fix would be to trigger the apply twice in a row 🤣

b-per added 4 commits December 3, 2022 00:02
This class stores the changes identified in 2 ways:
- a descriptive way, with an action/type/id
- a programmatic way, describing how it should be handled (func and params)
The logic becomes a bit more complex to handle the case
where we set up env var overwrites in a new job that doesn't
exist in Cloud yet
@b-per
Collaborator Author

b-per commented Dec 2, 2022

Pushed some changes with the ChangeSet discussed.

The new UX is much more flexible 🙏

Right now it is like this for plan
[screenshot: plan output]

And for sync
[screenshot: sync output]

Contributor

@nicholasyager nicholasyager left a comment


@b-per This is a significantly more maintainable approach! Thank you for making these changes. Light testing suggests that this is working as expected ✅

🚢

Comment on lines +14 to +15
def __str__(self):
return f"{self.action.upper()} {string.capwords(self.type)} {self.identifier}"

There's room in the future for this to be a little more detailed (e.g. what was the value before, and what is it now?). This is a great start 👍🏻
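A sketch of what a more detailed rendering could look like, assuming hypothetical `before`/`after` attributes on the change object (not part of the PR as merged):

```python
import string


class Change:
    """Illustrative change record with optional before/after values."""

    def __init__(self, action, type, identifier, before=None, after=None):
        self.action = action
        self.type = type
        self.identifier = identifier
        self.before = before  # previous value, if known
        self.after = after    # new value, if known

    def __str__(self):
        base = f"{self.action.upper()} {string.capwords(self.type)} {self.identifier}"
        # For updates, show the old and new values when we have them
        if self.action == "update" and self.before is not None:
            return f"{base}: {self.before!r} -> {self.after!r}"
        return base
```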

identifier=identifier,
type="job",
action="update",
sync_function=dbt_cloud.update_job,

This is a really clever approach! I appreciate how configurable this callback function approach will be 🚀

Development

Successfully merging this pull request may close these issues.

Provide a plan command like Terraform