Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support Styx style schedule in flyteidl #529

Closed
3 of 13 tasks
honnix opened this issue Sep 29, 2020 · 8 comments
Closed
3 of 13 tasks

[Feature] Support Styx style schedule in flyteidl #529

honnix opened this issue Sep 29, 2020 · 8 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@honnix
Copy link
Member

honnix commented Sep 29, 2020

Motivation: Why do you think this is important?
When integrating with our scheduler Styx, we will need to extend the semantics of schedule to support Styx cron syntax, cron aliases, as well as offset.

Goal: What should the final outcome look like, ideally?
Scheduleprotobuf definition is extended to support Styx style schedule.

Describe alternatives you've considered
We tried reusing existing cron but the syntax is a bit different and flytekit doesn't support Styx style cron aliases; and more importantly, we need to support offset which is one the most important concepts in Styx.

Flyte component

  • Overall
  • Flyte Setup and Installation scripts
  • Flyte Documentation
  • Flyte communication (slack/email etc)
  • FlytePropeller
  • FlyteIDL (Flyte specification language)
  • Flytekit (Python SDK)
  • FlyteAdmin (Control Plane service)
  • FlytePlugins
  • DataCatalog
  • FlyteStdlib (common libraries)
  • FlyteConsole (UI)
  • Other

[Optional] Propose: Link/Inline
We could extend Schedule protobuf in the following way:

// Styx style schedule
// More details can be found at https://github.com/spotify/styx#schedule-string
message StyxSchedule {
    string schedule = 1;
    string offset = 2;
}

// Defines complete set of information required to trigger an execution on a schedule.
message Schedule {

    oneof ScheduleExpression {
        // Uses AWS syntax: "Minutes Hours Day-of-month Month Day-of-week Year"
        // e.g. for a schedule that runs every 15 minutes: "0/15 * * * ? *"
        string cron_expression = 1;
        FixedRate rate = 2;
        StyxSchedule styx_schedule = 4;       // <-----------------
    }

    // Name of the input variable that the kickoff time will be supplied to when the workflow is kicked off.
    string kickoff_time_input_arg = 3;
}

Additional context
The way we integrate Styx with Flyte is having a deployment service talking flyteadmin interface in the northbound and deploying entities to both Flyte and Styx.

flytekit -> deployment service -> Flyteadmin
                              \
                                -> Styx                     

The deployment service does not forward schedule to flyteadmin, but only to Styx.

Is this a blocker for you to adopt Flyte
Yes unfortunately.

@honnix honnix added enhancement New feature or request untriaged This issues has not yet been looked at by the Maintainers labels Sep 29, 2020
@schottra
Copy link
Contributor

This will require some additional work in flyteconsole in order to correctly display the schedule.
As things currently are, I think the schedule string will not render at all because it will exist in a field that Console is not aware of.

@honnix
Copy link
Member Author

honnix commented Sep 29, 2020

@schottra Thanks for pointing out! We will work on a PR to fix this if the proposed IDL change makes sense and gets merged.

@kumare3 kumare3 added this to the 0.9.0 milestone Sep 30, 2020
@honnix
Copy link
Member Author

honnix commented Oct 7, 2020

We are keeping the existing cron_expression supported and that should not fail anything on flyteconsole. To support the newly added proto message, we will need to make some changes that will come later.

@kumare3
Copy link
Contributor

kumare3 commented Oct 8, 2020

@honnix I think this can be closed right?

@honnix
Copy link
Member Author

honnix commented Oct 8, 2020

Sort of I think. Do we want to fix console to be able to render the new message?

@honnix
Copy link
Member Author

honnix commented Oct 20, 2020

@kumare3 @schottra I proposed a small change in flyteorg/flyteconsole#107. PTAL, thanks.

@honnix honnix removed the untriaged This issues has not yet been looked at by the Maintainers label Oct 20, 2020
@honnix honnix self-assigned this Oct 20, 2020
@kumare3
Copy link
Contributor

kumare3 commented Oct 20, 2020

@honnix, @schottra and I had a discussion today about the use of offsets.

Firstly, let us define offsets
Offsets are used to shift the passed in time parameter for a schedule using a duration field. The shift is done by actually changing the time of execution. Thus an hourly schedule offset by -P1H implies that the schedule is shifted by 1 hour in the past. This is useful to indicate that the execution should work on a partition or a dataset that was generated 1H in the past. It is independent of the window itself. For example, in the case where data lands after 3H's, but we run the schedule every hour and consider the data for 1H only, the window is 1H and the offset is 3H.

NOTE
Offsets can be completely implemented using just a cron schedule and default input parameters to a launch-plan. But, to achieve this the code would need to change. So to achieve the delayed window computation like the second case defined above, we could create a launch plan like this

  lp = wf.create_launch_plan(
         cron="@hourly",
         inputs={
                       "window": "P1H",
                       "delay": "P3H",
        },
)

Here window refers to the window of computation for the data. delay refers to the actual delay in computation of the data. Thus every hour, the computation may look back 3H and use data in the (T-3H, T-2H) period.

The same can be achieved using an offset

  lp = wf.create_launch_plan(
         cron=("@hourly", offset="P3H"),
         inputs={
                       "window": "P1H",
        },
)

Thus the schedule could be varied based on the data.

User interface consideration
In the UI we will show this as

*schedules*
every hour (offset (-) 3h)

Conclusion
We have decided to implement an offset in the IDL, but, use will be discouraged except for when using Styx, till we figure out the right set of usecases

@honnix
Copy link
Member Author

honnix commented Oct 21, 2020

I think we can close this now. Thanks for everyone's help.

@honnix honnix closed this as completed Oct 21, 2020
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022
* wip

Signed-off-by: Yee Hing Tong <[email protected]>

* wip

Signed-off-by: Yee Hing Tong <[email protected]>

* comments

Signed-off-by: Yee Hing Tong <[email protected]>

* stylistic changes

Signed-off-by: Samhita Alla <[email protected]>

Co-authored-by: Samhita Alla <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022
* fix release process

Signed-off-by: Yuvraj <[email protected]>

* more changes

Signed-off-by: Yuvraj <[email protected]>

* more changes

Signed-off-by: Yuvraj <[email protected]>

* fix workflow

Signed-off-by: Yuvraj <[email protected]>

Co-authored-by: Yuvraj <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Jul 24, 2023
* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

* updated pod timestamp recording and added recorded_at on task event

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins

Signed-off-by: Daniel Rammer <[email protected]>

* setting occurred_at when node inputs begin resolution

Signed-off-by: Daniel Rammer <[email protected]>

* added reported_at to node execution events

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins dependencies

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins deps

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
eapolinario pushed a commit to eapolinario/flyte that referenced this issue Aug 9, 2023
* updated flyteidl

Signed-off-by: Daniel Rammer <[email protected]>

* updated pod timestamp recording and added recorded_at on task event

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins

Signed-off-by: Daniel Rammer <[email protected]>

* setting occurred_at when node inputs begin resolution

Signed-off-by: Daniel Rammer <[email protected]>

* added reported_at to node execution events

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins dependencies

Signed-off-by: Daniel Rammer <[email protected]>

* updated flyteidl and flyteplugins deps

Signed-off-by: Daniel Rammer <[email protected]>

---------

Signed-off-by: Daniel Rammer <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants