Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add semantic exit codes #1264

Merged
merged 1 commit into from
Oct 9, 2015
Merged

Conversation

Tarrasch
Copy link
Contributor

@Tarrasch Tarrasch commented Oct 2, 2015

This commits adds a long requested feature, to be able to tell what the luigi
client did (or didn't) based on its exit code. This closes #687.

@erikbern
Copy link
Contributor

erikbern commented Oct 2, 2015

LGTM although I think this further solidifies the special property of ExternalTask which I don't think is great (we already have a bunch of code in the worker class that makes distinction based on those etc)

@Tarrasch
Copy link
Contributor Author

Tarrasch commented Oct 6, 2015

Not sure what to do about the concept of ExternalTasks. Isn't it natural that it exists in real-world settings? I mean, eventually there will be a data source you rely on that is not meant to be created within the luigi framework.

@erikbern
Copy link
Contributor

erikbern commented Oct 6, 2015

I like ExternalTask. But I think it should just be a subclass of Task and there shouldn't be a bunch of magic that detects whether a certain task is a Task or an ExternalTask

@Tarrasch Tarrasch force-pushed the semantic-error-codes branch from fc8ddac to 92d9140 Compare October 7, 2015 11:36
[retcode]
----------

Configure return codes for the luigi binary. In the case of multiple return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What use is there for making them configurable? An unusual enabler of accidental complexity.

Perhaps declare in documentation a "reserved" range of return codes that Luigi might define later, the rest being available for application-specific usage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It adds complexity (for users), but I think it's justified since real-world scheduling is complex, and whoever uses luigi in a big setting should actively think about the corner cases. Keep in mind that:

  • People might already rely on the current behaviour (exit code 1 on unhandled exceptions).
  • You want different return codes in different contexts.
    • Sometimes you simply want the "run until you succeed" behaviour that is the default (exit code 0 for failures/missingdata/alreadyrunning).
    • Sometimes you want to run a command line, and feel confident that if I get exit code 0, my root task is in fact complete (then you would configure all exit codes).
    • If you use Jenkins as a cron-invoker, you would like task_failed to have a dedicated error code, causing "red" in Jenkins. Meanwhile already running or missing data exit codes should indicate "pending", or indicating your Jenkins dependencies mismatch what's configured in luigi.

That said. I would be thrilled to see this being simplified (convention over configuration), but I don't want to render it useful for only 50% of the users who want exit codes.

Perhaps declare in documentation a "reserved" range of return codes that Luigi might define later, the rest being available for application-specific usage.

I'm not sure what you mean. You basically want to enforce that one can only choose exit codes within the range [0, 4] or something and that [5..255] are reserved for luigi itself?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Enforce" sounds as if it were a burden. It is a best practice that avoids problems in the future (custom thingies clashing with vendor/Luigi thingies). Various kinds of specifications, when dealing with limited amount of things, do make such RFU provisions.

I meant neither of those ranges to be so small as [0, 4]... Say, [0..127] reserved for Luigi, and [128..255] for anyone who wraps or forks or plugs into it (so they are promised to never conflict with something from vanilla Luigi).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I do understand both of the black bullet points. My question was different - I'm questioning why we seem to encourage the concrete values of "different return codes" to vary from one environment to another. Better standardize on one meaning of each return code across all Luigi community?

@ulzha
Copy link
Contributor

ulzha commented Oct 7, 2015

We do have some beginnings of an event system. Event handlers are a preferred way of programmatically obtaining rich information from innards of a complex system in a uniform way. This implementation (run_with...) adds a ad-hoc layer for that, while the "running with events" layer/convention, which we already have, can serve the same purpose, and supports more than 8 bits of bandwidth, so why not use that. (Almost. Currently the event context only can be a Task. Adjusting it to support something like a Worker-wide event is needed. Also a way to configure loading of user modules dynamically, so any Luigi user can develop proprietary handlers easily.) The users who want to customize return codes then can load handlers that do sys.exit(x)...

@Tarrasch
Copy link
Contributor Author

Tarrasch commented Oct 7, 2015

Sounds sensible @ulzha. But It seems like run_with... can just be made private _run_with..., and we could change the underlying implementation to use the event system (once we have it working for worker-wide events) but still expose the same cli/config API for the exit codes -- if we are happy with the configurable complex API this PR is suggesting.

@stowell
Copy link

stowell commented Oct 7, 2015

I'll leave the actual implementation details and style to the Guardians of Luigi (tm). This change would get us what we need: a way to determine from the shell if a task failed to complete for any reason.

@Tarrasch Tarrasch force-pushed the semantic-error-codes branch 2 times, most recently from e29b8e9 to 506b14d Compare October 8, 2015 12:41
running the tasks.

Please set these exit codes within the range of 0 to 127, inclusive. Other exit
codes are reserved for yet to be defined purposes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads like the opposite thing of what we want... It's the 0 to 127 codes that we are reserving (hoping to standardize usage of any newly introduced codes in that range in the future), and the rest we are allowing to use freely.

"If you customize return codes, prefer to set them in range 128 to 255 to avoid conflicts. Return codes in range 0 to 127 are reserved for possible future use by Luigi contributors."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean now. Thanks for clarifying by example.

This commits adds a long requested feature, to be able to tell what the luigi
client did (or didn't) based on its exit code. This closes spotify#687.
@Tarrasch Tarrasch force-pushed the semantic-error-codes branch from 506b14d to 8e9b439 Compare October 8, 2015 13:12
@ulzha
Copy link
Contributor

ulzha commented Oct 8, 2015

LGTM. Would be cool if requesters of #687 (@boosh @themalkolm ) gave a look too.

I caught myself in too many experts exchanges today about "standard" return code allocation practices existing out there. E.g. this is insightful...

Tarrasch added a commit that referenced this pull request Oct 9, 2015
@Tarrasch Tarrasch merged commit de4f9a3 into spotify:master Oct 9, 2015
@Tarrasch Tarrasch deleted the semantic-error-codes branch October 9, 2015 08:38
@Tarrasch
Copy link
Contributor Author

Tarrasch commented Oct 9, 2015

Boom! @ulzha I already pinged them in #687. I suppose they are busy :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task scripts should return non-zero if any tasks were DISABLED or FAILED
4 participants