-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split up parse_repo and apply_total #2128
Comments
Hi @nossrannug , I think this |
@pyalex I'm looking to pick up this issue as an intro task to OSS Feast. Looking at the apply_total code which has been refactored a bit since this issue was opened, would appreciate some pointers on how to best resolve this. I was thinking of doing something along the lines of this. |
Hi @nossrannug, I'd like to better understand your use case and see if it's something others may also need. In your case, is it possible to not construct a ParsedRepo yourself? Is it the case that you want to construct the entities, features, etc in realtime and might not have the usual repo files to be parsed by apply_total? I'd like to understand the context around the need to do this, thanks |
Hi @mavysavydav, We are parsing definitions, that we have today, to construct the RepoContents object. This could be existing tables or some code. We would want to pass our RepoContents object into apply_total to update the Feast registry rather than writing the Feast definitions. That way we don't have to create all the FeatureView, DataSource, etc definitions, but instead can continue to do what we have been doing and are able to use Feast as a feature store. We could also have this process create the repo files as build output and then run I also don't think it's a big change or a bad change. def _prepare_registry_and_repo(repo_config, repo_path):
store = FeatureStore(config=repo_config)
project = store.project
if not is_valid_name(project):
print(
f"{project} is not valid. Project name should only have "
f"alphanumerical values and underscores but not start with an underscore."
)
sys.exit(1)
registry = store.registry
registry._initialize_registry()
sys.dont_write_bytecode = True
repo = parse_repo(repo_path)
return project, registry, repo, store
# Would become:
def _prepare_registry(repo_config):
store = FeatureStore(config=repo_config)
project = store.project
if not is_valid_name(project):
print(
f"{project} is not valid. Project name should only have "
f"alphanumerical values and underscores but not start with an underscore."
)
sys.exit(1)
registry = store.registry
registry._initialize_registry()
sys.dont_write_bytecode = True
return project, registry, store And apply_total: @log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo_path: Path, skip_source_validation: bool):
os.chdir(repo_path)
project, registry, repo, store = _prepare_registry_and_repo(repo_config, repo_path)
if not skip_source_validation:
data_sources = [t.batch_source for t in repo.feature_views]
# Make sure the data source used by this feature view is supported by Feast
for data_source in data_sources:
data_source.validate(store.config)
# For each object in the registry, determine whether it should be kept or deleted.
(
all_to_apply,
all_to_delete,
views_to_delete,
views_to_keep,
) = extract_objects_for_apply_delete(project, registry, repo)
diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
log_cli_output(diff, views_to_delete, views_to_keep)
# Would become:
@log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo: RepoContents, skip_source_validation: bool):
project, registry, store = _prepare_registry(repo_config)
if not skip_source_validation:
data_sources = [t.batch_source for t in repo.feature_views]
# Make sure the data source used by this feature view is supported by Feast
for data_source in data_sources:
data_source.validate(store.config)
# For each object in the registry, determine whether it should be kept or deleted.
(
all_to_apply,
all_to_delete,
views_to_delete,
views_to_keep,
) = extract_objects_for_apply_delete(project, registry, repo)
diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
log_cli_output(diff, views_to_delete, views_to_keep) And the code that calls *code examples have not been tested :) |
Hmmm ok got it. I think to proceed we should keep apply_total the same but have apply_total call a function that can take in a RepoContents. This is because some customers including us (Twitter) have built on top of apply_total and the naming of apply_total suggests it should consists of the "total" of the apply call. This new function that apply_total calls would basically would be the same as the current apply_total except it also takes in the RepoContents. @mickey-liu |
So something like this? @log_exceptions_and_usage
def apply_total(repo_config: RepoConfig, repo_path: Path, skip_source_validation: bool):
os.chdir(repo_path)
repo = _parse_repo(repo_path)
do_stuff(repo_config, repo, sip_source_validation)
def do_stuff(repo_config: RepoConfig, repo: RepoContents, skip_source_validation: bool):
project, registry, store = _prepare_registry(repo_config)
if not skip_source_validation:
data_sources = [t.batch_source for t in repo.feature_views]
# Make sure the data source used by this feature view is supported by Feast
for data_source in data_sources:
data_source.validate(store.config)
# For each object in the registry, determine whether it should be kept or deleted.
(
all_to_apply,
all_to_delete,
views_to_delete,
views_to_keep,
) = extract_objects_for_apply_delete(project, registry, repo)
diff = store.apply(all_to_apply, objects_to_delete=all_to_delete, partial=False)
log_cli_output(diff, views_to_delete, views_to_keep) |
Is your feature request related to a problem? Please describe.
I have a framework that handles the offline store. It creates the tables, indexes, reads data from different data sources, does some transformations, and then inserts into the offline store. As a part of this, I can construct the entities, feature views, feature services, etc, a instance of the ParsedRepo class for Feast. What I need is the ability to pass my instance into the
apply_total
method.Describe the solution you'd like
Remove the call to
parse_repo
from theapply_total
( link ) and instead perform that call elsewhere and pass the ParsedRepo object into theapply_total
function.Describe alternatives you've considered
Copy the
apply_total
method into my code and make the changes needed.The reason why I don't like this approach is because I would be maintaining a copy/paste of the method, while the change in the Feast repo wouldn't impact functionality since
feast apply
would still callparse_repo
and then pass the object toapply_total
.The text was updated successfully, but these errors were encountered: