Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up parse_repo and apply_total #2128

Closed
nossrannug opened this issue Dec 9, 2021 · 6 comments · Fixed by #2226
Closed

Split up parse_repo and apply_total #2128

nossrannug opened this issue Dec 9, 2021 · 6 comments · Fixed by #2226

Comments

@nossrannug
Copy link
Contributor

Is your feature request related to a problem? Please describe.
I have a framework that handles the offline store. It creates the tables, indexes, reads data from different data sources, does some transformations, and then inserts into the offline store. As a part of this, I can construct the entities, feature views, feature services, etc, a instance of the ParsedRepo class for Feast. What I need is the ability to pass my instance into the apply_total method.

Describe the solution you'd like
Remove the call to parse_repo from the apply_total ( link ) and instead perform that call elsewhere and pass the ParsedRepo object into the apply_total function.

Describe alternatives you've considered
Copy the apply_total method into my code and make the changes needed.
The reason why I don't like this approach is because I would be maintaining a copy/paste of the method, while the change in the Feast repo wouldn't impact functionality since feast apply would still call parse_repo and then pass the object to apply_total.

@cli.command("apply", cls=NoOptionDefaultFormat)
@click.option(
    "--skip-source-validation",
    is_flag=True,
    help="Don't validate the data sources by checking for that the tables exist.",
)
@click.pass_context
def apply_total_command(ctx: click.Context, skip_source_validation: bool):
    """
    Create or update a feature store deployment
    """
    repo = ctx.obj["CHDIR"]
    cli_check_repo(repo)
    repo_config = load_repo_config(repo)
    try:
        parsed_repo = parse_repo(repo) # <- this
        apply_total(parsed_repo, repo_config, repo, skip_source_validation)
    except FeastProviderLoginError as e:
        print(str(e))
@pyalex
Copy link
Collaborator

pyalex commented Dec 10, 2021

Hi @nossrannug , I think this apply_total was not intended to be used directly and seems to be written specifically for CLI. That being said, it could be easily split into sub-functions. And those sub-functions could be reused directly or new interfaces can be created.
If you have time, please feel free to contribute.

@mickey-liu
Copy link
Contributor

@pyalex I'm looking to pick up this issue as an intro task to OSS Feast. Looking at the apply_total code which has been refactored a bit since this issue was opened, would appreciate some pointers on how to best resolve this. I was thinking of doing something along the lines of this.

@mavysavydav
Copy link
Collaborator

Hi @nossrannug, I'd like to better understand your use case and see if it's something others may also need. In your case, is it possible to not construct a ParsedRepo yourself? Is it the case that you want to construct the entities, features, etc in realtime and might not have the usual repo files to be parsed by apply_total? I'd like to understand the context around the need to do this, thanks

@nossrannug
Copy link
Contributor Author

Hi @mavysavydav,

We are parsing definitions, that we have today, to construct the RepoContents object. This could be existing tables or some code. We would want to pass our RepoContents object into apply_total to update the Feast registry rather than writing the Feast definitions. That way we don't have to create all the FeatureView, DataSource, etc definitions, but instead can continue to do what we have been doing and are able to use Feast as a feature store.

We could also have this process create the repo files as build output and then run feast apply, but that sounds horrible 😆

I also don't think it's a big change or a bad change.

def _prepare_registry_and_repo(repo_config, repo_path):
    store = FeatureStore(config=repo_config)
    project = store.project
    if not is_valid_name(project):
        print(
            f"{project} is not valid. Project name should only have "
            f"alphanumerical values and underscores but not start with an underscore."
        )
        sys.exit(1)
    registry = store.registry
    registry._initialize_registry()
    sys.dont_write_bytecode = True
    repo = parse_repo(repo_path)
    return project, registry, repo, store

# Would become:
def _prepare_registry(repo_config):
    store = FeatureStore(config=repo_config)
    project = store.project
    if not is_valid_name(project):
        print(
            f"{project} is not valid. Project name should only have "
            f"alphanumerical values and underscores but not start with an underscore."
        )
        sys.exit(1)
    registry = store.registry
    registry._initialize_registry()
    sys.dont_write_bytecode = True
    return project, registry, store

And apply_total:

@log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo_path: Path, skip_source_validation: bool):

    os.chdir(repo_path)
    project, registry, repo, store = _prepare_registry_and_repo(repo_config, repo_path)

    if not skip_source_validation:
        data_sources = [t.batch_source for t in repo.feature_views]
        # Make sure the data source used by this feature view is supported by Feast
        for data_source in data_sources:
            data_source.validate(store.config)

    # For each object in the registry, determine whether it should be kept or deleted.
    (
        all_to_apply,
        all_to_delete,
        views_to_delete,
        views_to_keep,
    ) = extract_objects_for_apply_delete(project, registry, repo)

    diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)

    log_cli_output(diff, views_to_delete, views_to_keep)

# Would become:
@log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo: RepoContents, skip_source_validation: bool):
    project, registry, store = _prepare_registry(repo_config)

    if not skip_source_validation:
        data_sources = [t.batch_source for t in repo.feature_views]
        # Make sure the data source used by this feature view is supported by Feast
        for data_source in data_sources:
            data_source.validate(store.config)

    # For each object in the registry, determine whether it should be kept or deleted.
    (
        all_to_apply,
        all_to_delete,
        views_to_delete,
        views_to_keep,
    ) = extract_objects_for_apply_delete(project, registry, repo)

    diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)

    log_cli_output(diff, views_to_delete, views_to_keep)

And the code that calls apply_total would call parse_repo to get the object to pass into the method.

*code examples have not been tested :)

@mavysavydav
Copy link
Collaborator

Hmmm ok got it. I think to proceed we should keep apply_total the same but have apply_total call a function that can take in a RepoContents. This is because some customers including us (Twitter) have built on top of apply_total and the naming of apply_total suggests it should consists of the "total" of the apply call. This new function that apply_total calls would basically would be the same as the current apply_total except it also takes in the RepoContents. @mickey-liu

@nossrannug
Copy link
Contributor Author

So something like this?

@log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo_path: Path, skip_source_validation: bool):

    os.chdir(repo_path)
    repo = _parse_repo(repo_path)
    do_stuff(repo_config, repo, sip_source_validation)

def do_stuff(repo_config: RepoConfig, repo: RepoContents, skip_source_validation: bool):
   project, registry, store = _prepare_registry(repo_config)

    if not skip_source_validation:
        data_sources = [t.batch_source for t in repo.feature_views]
        # Make sure the data source used by this feature view is supported by Feast
        for data_source in data_sources:
            data_source.validate(store.config)

    # For each object in the registry, determine whether it should be kept or deleted.
    (
        all_to_apply,
        all_to_delete,
        views_to_delete,
        views_to_keep,
    ) = extract_objects_for_apply_delete(project, registry, repo)

    diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)

    log_cli_output(diff, views_to_delete, views_to_keep)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants