Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] Improve Validation for Objective Function #1968

Open
andreyvelich opened this issue Oct 4, 2022 · 5 comments
Open

[SDK] Improve Validation for Objective Function #1968

andreyvelich opened this issue Oct 4, 2022 · 5 comments

Comments

@andreyvelich
Copy link
Member

/kind feature

Check this thread: #1951 (comment).
We need to improve validation for objective function in tune API.
For example, validate that user's imports are properly declared in the function.


Love this feature? Give it a 👍 We prioritize the features with the most 👍

@andreyvelich
Copy link
Member Author

/area sdk

@github-actions
Copy link

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@tenzen-y
Copy link
Member

/lifecycle frozen

@hnanacc
Copy link

hnanacc commented Mar 24, 2024

Hello,

I was trying to work on this issue and wanted to know if one of the following solutions could be suitable for our use-case?

  1. Use regex to extract imported names and compare with first-level symbols in function scope to find unimported names.
import inspect

def objective(params):
    import time
    def run_step():
        stt = time.perf_counter()
        acc = transformers.train()
        ent = time.perf_counter()
        return acc

    return run_step()

source = inspect.getsource(objective)

imported_names = get_imported_names(source)

# OUTPUT: 
# ['time']

other_tokens = get_nonimport_tokens(source)

# OUTPUT:
# ['run_step', 'ent', 'stt', 'time', 'transformers', 'acc']

unimported = compare_and_filter_defined(imported_names, other_tokens)

# OUTPUT:
# ['transformers']

NOTE: This approach could be tricky and may miss some edge cases. Further, this may also be hard to extend/maintain depending on the kind of validation we may want to do in the future.

  1. Use a linter (ex. pylint) to lint the code string and check for the required errors.
import inspect
from astroid import AstroidBuilder
from pylint.lint import PyLinter

def objective(params):
    import time
    def run_step():
        stt = time.perf_counter()
        acc = transformers.train()
        ent = time.perf_counter()
        return acc

    return run_step()

ast_builder = AstroidBuilder(...)
ast_linter = PyLinter(...)

source = inspect.getsource(objective)

ast_repr = ast_builder.string_build(source)         # builds the abstract syntax tree (AST)
checked = ast_linter.check_astroid_module(ast_repr) # lints the AST, given checkers.

unimported = get_import_errors(ast_linter)          # see if any 'import-error E0401' messages exists.

# OUTPUT:
# ['transformers']

NOTE: For this approach, we need to add an additional dependency, which may already be present for linting, but in return we get correctness and simpler code which should be easier to maintain. We can also use the pylint cli with the required validations, as we are anyways creating a file for the objective function.

$ pylint $program_path/ephemeral_objective.py

Please let me know if any of the above approach could be implemented or if there is an alternative.

Thanks,
Harshal

@andreyvelich
Copy link
Member Author

andreyvelich commented Mar 28, 2024

Thank you for your suggestions @hnanacc, it looks good.
Maybe we should try to explore if we have any ways to serialize function for Kubernetes container without asking using to make all import embedded under training function ?
I know that CloudPickle: https://github.com/cloudpipe/cloudpickle can help us to ship Python code over network, but maybe we miss something from inspect library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants