Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't create workflow with template bigger than 8000 characters #413

Closed
invidian opened this issue Jan 11, 2021 · 11 comments
Closed

Can't create workflow with template bigger than 8000 characters #413

invidian opened this issue Jan 11, 2021 · 11 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@invidian
Copy link
Contributor

It seems workflow/template combination cannot be bigger than 8000 characters due to use of pg_notify.

Expected Behaviour

Workflow with template bigger than 8000 characters can be created.

Current Behaviour

Tink returns the following error:

rpc error: code = Unknown desc = INSERT: pq: payload string too long

Possible Solution

Probably don't use pg_notify (as partially suggested in #379 (comment)).

Steps to Reproduce (for bugs)

Context

While implementing https://github.com/tinkerbell/cluster-api-provider-tink#17 with updates sandbox to include events code, I noticed, that I can no longer create workflows.

See also meteor/postgres-packages#56

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

  • Link to your project or a code example to reproduce issue:

@invidian
Copy link
Contributor Author

Hmm, it seems once hit, I'm not able to remove old templates either :/

@gianarb
Copy link
Contributor

gianarb commented Jan 12, 2021

I still regret the decision of using triggers. @mmlb do you have any feedback? We can not limit the size of a template here I suppose. The event should represent the current state of the resource, so we can't limit what we store.

@invidian every action creates an event, delete does not make an exception here

@gianarb gianarb added the kind/bug Categorizes issue or PR as related to a bug. label Jan 12, 2021
@invidian
Copy link
Contributor Author

@invidian every action creates an event, delete does not make an exception here

What I experienced it is I tried creating big workflow, then it blocked removing other templates for some reason.

@mmlb
Copy link
Contributor

mmlb commented Jan 12, 2021

:( I was not aware of that limitation of pg_notify, it seems we'll need to drop that then.

invidian added a commit to kinvolk/cluster-api-provider-tink that referenced this issue Jan 12, 2021
@gianarb
Copy link
Contributor

gianarb commented Jan 15, 2021

I would like to drop a 💣 . This looks like the right time.

I asked @gauravgahlot to help me with this issue. I realized that triggers are not that bad, but we have a problem with the overall concept of the template I think. Or at least, the way we implemented them is too simple. Let me explain, later you will understand the connection with this issue.

First, Kubernetes relays on ETCD watchers that are in some way similar to what we do with Postgres triggers. It is a database feature in both cases. Removing triggers is a bit too complicated because it requires coordination in a HA environment as @mmlb tried to say to me multi times, but I was too busy thinking about my own idea (now I get it sorry ✌️ ).

Template looks like this:

message WorkflowTemplate {
  string id = 1;
  string name = 2;
  reserved 3; // obsolete data
  google.protobuf.Timestamp created_at = 4;
  google.protobuf.Timestamp updated_at = 5;
  google.protobuf.Timestamp deleted_at = 6;
  string data = 7;
}

A workflow that starts from a template looks like this:

message Workflow {
  string id = 1;
  string template = 2;
  string hardware = 3;
  State state = 4;
  google.protobuf.Timestamp created_at = 5;
  google.protobuf.Timestamp updated_at = 6;
  google.protobuf.Timestamp deleted_at = 7;
  string data = 8;
}

A template can be: CREATED, DELETED, and UPDATED. The inconsistent is with the UPDATE action.

At time A: we create a TEMPLATE with ID: AAA-BBB
At time B: we create a workflow with ID: 111-222 from TemplateID: AAA-BBB
At time C: we update the template AAA-BBB

Now we can't figure out anymore what 111-222 did, because we lost that information. Possible solutions:

  1. Immutable template. That's easy, you can not UPDATE the template. not really friendly. When you delete a template what do you do? You delete the workflows as well or you accept that you won't be able to figure out what the workflow executed?
  2. Coming ou with a concept of revision. The literature here is solid. The Template.Data field does not contain the actual template but a reference to a particular revision. Revisions are immutable, when you update a template you create a new revision.

I like solution 2, and it solves the 8k hard limit because the workflow events won't carry on the data payload but only the revision (it also decreases the amount of data we replicate).

@invidian
Copy link
Contributor Author

@gianarb from user perspective, the Templates concept could remain the same, right? Just template update implementation would have to change I think. The workflow then will have a reference to specific template revision, which will be used when you run e.g. workflow get.

Also when you delete the template, you can remove all revisions which do not have any references. The same trigger could occur after removing the workflow to keep things clean.

Otherwise this seems like a reasonable solution to me 👍

@gianarb
Copy link
Contributor

gianarb commented Jan 15, 2021

Yes the UX is the same. The user won't notice. If we implement revision we can come with some sort of extended UX to be able to review and jump from a revision to another.

About the cleanup:

  1. Revisions are immutable, you can' remove them. We can probably implement some TTL if needed but only if a revision is not referenced in any workflow.
  2. Revision gets delete if you remove a template
  3. You should not be able to remove a template if there are workflows associated with it

I am saying that because we need to be careful, storage is cheap. I don't want to remove a revision or a template that can help to debug an issue even if it happened 1 month ago.

@mmlb
Copy link
Contributor

mmlb commented Jan 20, 2021

@gianarb I think the overall idea makes sense. I don't know if we need to deal with template revisions at all though. I do quite like idea of a template is immutable, I was likely to propose this at some point. If you want to update a template... just create a new one instead. It makes things easier to understand. Workflows are tied to templates, you can't do anything to a template that would affect a workflow. I don't think we need to further complicate things by introducing template revisions :D.

GC can be a whole separate conversation and while the storage is cheap and its not a problem, lets ignore it :D.

@stolsma
Copy link

stolsma commented Jan 21, 2021

@gianarb if this is what you are proposing:
tb_trigger

Then I like it! ;-) Immutability for WorkflowTemplates is needed! Maybe we can do the same with the Hardware table (change traceability would be nice there too), split data from definition as I think there could happen the same problem if more and more "action" definition metadata will be stored there. Or I'm a wrong there?

@gianarb
Copy link
Contributor

gianarb commented Jan 21, 2021

GC can be a whole separate conversation and while the storage is cheap and its not a problem, lets ignore it :D.

yep right @mmlb .

If you want to update a template... just create a new one instead.

This is too simple. We give the possibility to use to specify template name and one of the thread off is that they will have to delete and re-create a new template if they want to keep the same name (but deletion is logical because if they have workflows stored it can't go away, it is more li a deactivation (you can't start new workflows from this template). If we want to keep a friendly name for a template we need to put a revision place. The complexity we want to expose to users: rollback for example is on us. We can start with just GET revision by ID for now.

if we remove the template name as a concept we can make the templates immutable but they won't be that friendly to use I suppose because discoverability will be pretty bad. 👍


@stolsma yes, that's the idea! We can discuss the name we adopt. Revision desbrices the domain very well from my experience. So: WorkflowTemplate.current should be WorkflowTemplate.latest_revision for example. And we can call WorkflowTemplateData something like WorkflowTemplateRevision or similar. Not sure yet but the picture is right.


About hardware, I sort of agree but I think you are asking for an overall concept of versioning. Kubernetes has that feature built for all the resources because ETCD serves that for free. With Postgres, I am not sure how we can get that.
The solution I would like to provide to your question is: "if you want to version the visualize the evolution of any resource you can use the event system Tinkerbell provides. You can Watch for Hardware Update and Create an event and you can store those events elsewhere". I am not sure at this point we have the possibility to onboard such a feature as part of Tinkerbell. But you can implement that yourself today via the events system.

@gauravgahlot gauravgahlot self-assigned this Feb 9, 2021
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Feb 17, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
@gauravgahlot gauravgahlot mentioned this issue Feb 17, 2021
3 tasks
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Feb 23, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Mar 1, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Mar 5, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Mar 11, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Mar 12, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
gauravgahlot added a commit to gauravgahlot/tink that referenced this issue Mar 16, 2021
Fixes: tinkerbell#413

The migration introduces template revisions. A template can have multiple
revisions, which are stored in the 'template_revisions' table.

The migration also transforms the existing templates to the new
template-revisions structure. Therefore ensuring that end users do not
face any issues with the introduction of revisions.

Signed-off-by: Gaurav Gahlot <[email protected]>
@gianarb
Copy link
Contributor

gianarb commented Apr 19, 2021

With #464 events are just a memory. With them this issue!
Closing this for now

@gianarb gianarb closed this as completed Apr 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants