Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Update docs for offline and online stores #2946

Merged
merged 32 commits into from
Jul 22, 2022

Conversation

kevjumba
Copy link
Collaborator

Signed-off-by: Kevin Zhang [email protected]

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

@codecov-commenter
Copy link

codecov-commenter commented Jul 15, 2022

Codecov Report

Merging #2946 (5bc74db) into master (aa2a86a) will decrease coverage by 0.04%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #2946      +/-   ##
==========================================
- Coverage   77.65%   77.61%   -0.05%     
==========================================
  Files         186      186              
  Lines       16354    16354              
==========================================
- Hits        12700    12693       -7     
- Misses       3654     3661       +7     
Flag Coverage Δ
integrationtests 68.58% <ø> (-0.14%) ⬇️
unittests 58.51% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdk/python/tests/utils/online_read_write_test.py 93.54% <0.00%> (-6.46%) ⬇️
.../integration/online_store/test_online_retrieval.py 96.84% <0.00%> (-3.16%) ⬇️
sdk/python/feast/infra/online_stores/datastore.py 87.96% <0.00%> (-1.39%) ⬇️
...east/infra/materialization/lambda/lambda_engine.py 93.75% <0.00%> (+1.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aa2a86a...5bc74db. Read the comment docs.

@kevjumba kevjumba force-pushed the update_offline_store_docs branch from a194832 to 36fb89c Compare July 19, 2022 23:01
@@ -30,6 +30,8 @@ There are two methods that deal with reading data from the offline stores`get_hi
* `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method. This method pull data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
* `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
* `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might call out that this method is optional since it is only used to work with SavedDatasets

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you commented on the wrong method , you mean write_logged_features right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope. pull_all_from_table_or_query right now in our logic is only exposed for SavedDatasets afaict

@@ -6,9 +6,9 @@ Feast makes adding support for a new offline store (database) easy. Developers c

In this guide, we will show you how to extend the existing File offline store and use in a feature repo. While we will be implementing a specific store, this guide should be representative for adding support for any new offline store.

The full working code for this guide can be found at [feast-dev/feast-custom-offline-store-demo](https://github.com/feast-dev/feast-custom-offline-store-demo).
The full working code for this guide can be found at [feast-dev/feast-custom-offline-store-demo](https://github.com/feast-dev/feast-custom-offline-store-demo) and an example of a custom offline store that was contributed by developers can be found [here](https://github.com/feast-dev/feast/pull/2401).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk if I would use this as an example. It's a really large PR with also online store + registry store components

Would be better if we created some example PR (similar to what is in this guide)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I can link the spark pr instead. I think the example guide for the feast custom offline store is fine for high level but I dont' see a clear way of implementing some dummy procedures with. more clarity than the feast custom offline store. Actual implementations to reference would actually then be actual real world pr would help them work out the little kinks in implementation. I agree that the postgres one is a little bit too large but I think the spark one is relevant and useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you convert the dummy example in the guide into an actual PR so they can see?

You can also just point users to the directory that has the offline / online store implementations too

@@ -128,6 +130,8 @@ Custom offline stores may need to implement their own instances of the `Retrieva

The `RetrievalJob` interface exposes two methods - `to_df` and `to_arrow`. The expectation is for the retrieval job to be able to return the rows read from the offline store as a parquet DataFrame, or as an Arrow table respectively.

Users who want to have their offline store support batch materialization (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this functionality is not needed, the RetrievalJob will default to local materialization.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Users who want to have their offline store support batch materialization (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this functionality is not needed, the RetrievalJob will default to local materialization.
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to write distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records in memory to materialize).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixe..

@@ -215,7 +219,7 @@ class CustomFileDataSource(FileSource):

After implementing these classes, the custom offline store can be used by referencing it in a feature repo's `feature_store.yaml` file, specifically in the `offline_store` field. The value specified should be the fully qualified class name of the OfflineStore.&#x20;

As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime.
As long as your OfflineStore class is available in your Python environment, it will be imported by Feast dynamically at runtime. It is crucial to specify the type as the package that Feast can import.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be more clear if you left this as a comment in the example feature_store.yaml

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -260,23 +264,56 @@ driver_hourly_stats_view = FeatureView(

## 6. Testing the OfflineStore class

### Contrib

Generally, new offline stores should go in the contrib folder and have alpha functionality tags. The contrib folder is designated for community contributions that are not fully maintained by Feast maintainers and may contain potential instability and have API changes. It is recommended to add warnings to users that the offline store functionality is still in alpha development and is not fully stable. In order to be classified as fully stable and be moved to the main offline store folder, the offline store should integrate with the full integration test suite in Feast and pass all of the test cases.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alpha functionality tags is ambiguous imo.

Maybe say our standard is to print messages stating it's alpha status? This is also where a sample simple PR we provide is useful

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


### Documentation

Remember to update the documentation for your offline store. This can be found in `docs/reference/offline-stores/` and `docs/reference/data-sources/`. You should also add a reference in `docs/reference/data-sources/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other offline stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the datasource is in alpha development.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "add documentation" instead of update.

There's also additional work around adding python code docs (i.e. call make build-sphinx)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

Remember to update the documentation for your offline store. This can be found in `docs/reference/offline-stores/` and `docs/reference/data-sources/`. You should also add a reference in `docs/reference/data-sources/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other offline stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the datasource is in alpha development.


An example of a full pull request for adding a custom offline store can be found [here](https://github.com/feast-dev/feast/pull/2401).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again would not use this as an example because it's too large and hard to parse

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed above.


### Documentation

Remember to update the documentation for your online store. This can be found in `docs/reference/online-stores/`. You should also add a reference in `docs/reference/online-stores/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other online stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the online store is in alpha development.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might additionally want to cover what the data model is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

go.sum Outdated
@@ -87,6 +87,14 @@ github.com/envoyproxy/go-control-plane v0.9.10-0.20210907150352-cf90f659a021/go.
github.com/envoyproxy/go-control-plane v0.10.2-0.20220325020618-49ff273808a1/go.mod h1:KJwIaB5Mv44NWtYuAOFCVOjcI94vtpEz2JU/D2v6IjE=
github.com/envoyproxy/protoc-gen-validate v0.1.0/go.mod h1:iSmxcyjqTsJpI2R4NaDN7+kN2VEUnK/pcBlmesArF7c=
github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
github.com/feast-dev/gopy v0.4.1-0.20220714205859-591500e3215f h1:tTjEpVu4H/ZGh4wo3WETbA9dutNM6bXMXvyZbb9GLCs=
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed after rebase.

@adchia adchia self-assigned this Jul 20, 2022
@kevjumba kevjumba force-pushed the update_offline_store_docs branch from 36fb89c to eb028f6 Compare July 20, 2022 17:46
@kevjumba
Copy link
Collaborator Author

@adchia pTal

@@ -128,6 +151,8 @@ Custom offline stores may need to implement their own instances of the `Retrieva

The `RetrievalJob` interface exposes two methods - `to_df` and `to_arrow`. The expectation is for the retrieval job to be able to return the rows read from the offline store as a parquet DataFrame, or as an Arrow table respectively.

Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to a distributed framework. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).
Users who want to have their offline store support scalable batch materialization for online use cases (detailed in this [RFC](https://docs.google.com/document/d/1J7XdwwgQ9dY_uoV9zkRVGQjK9Sy43WISEW6D5V9qzGo/edit#heading=h.9gaqqtox9jg6)) will also need to implement `to_remote_storage` to distribute the reading and writing of offline store records to blob storage (such as S3). This may be used by a custom [Materialization Engine](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/materialization/batch_materialization_engine.py#L72) to parallelize the materialization of data by processing it in chunks. If this is not implemented, Feast will default to local materialization (pulling all records into memory to materialize).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Even if you have created the `OfflineStore` class in a separate repo, you can still test your implementation against the Feast test suite, as long as you have Feast as a submodule in your repo. In the Feast submodule, we can run all the unit tests with:

In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` creates a datasource for testing from the dataframe given that will register the dataframe with your offline store and return a datasource pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a datawarehouse data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` creates a datasource for testing from the dataframe given that will register the dataframe with your offline store and return a datasource pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a datawarehouse data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.
In order to test against the test suite, you need to create a custom `DataSourceCreator`. This class will need to implement our testing infrastructure methods, `create_data_source` and `created_saved_dataset_destination`. `create_data_source` should create a datasource forbased on the dataframe passed in. It may be implemented by uploading the contents of the dataframe into the offline store and returning a datasource object pointing to that location. See `BigQueryDataSourceCreator` for an implementation of a data source creator. **Saved datasets** are special datasets used for data validation and is Feast's way of snapshotting your data for future data validation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

```
This should run the unit tests and the unit tests should all pass. Please add unit tests for your data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This should run the unit tests and the unit tests should all pass. Please add unit tests for your data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
This command runs the python unit tests. It's required that unit tests should all pass for contributed components.
Contributors should add unit tests for contributed data source that test out basic functionality of reading and writing to and from the datasource. This should just be class level functionality that ensures that the methods you implemented for the OfflineStore and the DataSource associated with it work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test out basic functionality of reading and writing to and from the datasource

This actually is not clear at all. Do you mean the test should write from say bigquery? How is that a unit test then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I guess most datasources will not have unit tests then. I guess as long as the offline source passes the integration tests, we can trust the store.


Remember to add documentation for your offline store. This should be added to `docs/reference/offline-stores/` and `docs/reference/data-sources/`. You should also add a reference in `docs/reference/data-sources/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other offline stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the datasource is in alpha development. Please also add some documentation on what the data model is for the specific online store for more clarity.

Finally, add the python code docs by running to document the functions:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary right? i.e. the docs are generated automatically when changes are merged, we just need to make sure they are referenced sdk/python/docs/index.rst.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let me update.

```

The universal tests, which are integration tests specifically intended to test offline and online stores, can be run with:
This should run the unit tests and the unit tests should all pass. Please add unit tests for your online store that test out basic functionality of reading and writing to and from the online store. This should just be class level functionality that ensures that the methods you implemented for the OnlineStore work as expected. In order to be approved to merge into Feast, these unit tests should all pass and demonstrate that the DataSource works as intended.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same change and questions as the offline store.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this section entirely. Will basically point to universal tests only to validate the datasource

make test-python-universal
```

The unit tests should succeed, but the universal tests will likely fail. The tests are parametrized based on the `FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py`. To overwrite these configurations, you can simply create your own file that contains a `FULL_REPO_CONFIGS`, and point Feast to that file by setting the environment variable `FULL_REPO_CONFIGS_MODULE` to point to that file. In this repo, the file that overwrites `FULL_REPO_CONFIGS` is `feast_custom_online_store/feast_tests.py`, so you would run
A sample `FULL_REPO_CONFIGS_MODULE` looks something like this:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would follow the pattern of the other code snippets, which also include a "file title" that shows where this lives

See the example of using:
{% code title="feast_custom_offline_store/file.py" %}
... your code
{% endcode %}

that we did before

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

took a pass through the offline store only, but I imagine most of the comments also translate to the online store version


Remember to add the documentation for your online store. This should be added to `docs/reference/online-stores/`. You should also add a reference in `docs/reference/online-stores/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other online stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the online store is in alpha development.

Finally, add the python code docs by running to document the functions by making sure the classes are being referenced by `sdk/python/docs/index.rst`. An example of this below:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't add the make command here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Achal mentioned that that is no longer needed and to just update the rst file instead.

The OfflineStore class contains a couple of methods to read features from the offline store. Unlike the OnlineStore class, Feast does not manage any infrastructure for the offline store.&#x20;

There are two methods that deal with reading data from the offline stores`get_historical_features`and `pull_latest_from_table_or_query`.

* `pull_latest_from_table_or_query` is invoked when running materialization (using the `feast materialize` or `feast materialize-incremental` commands, or the corresponding `FeatureStore.materialize()` method. This method pull data from the offline store, and the `FeatureStore` class takes care of writing this data into the online store.
* `get_historical_features` is invoked when reading values from the offline store using the `FeatureStore.get_historical_features()` method. Typically, this method is used to retrieve features when training ML models.
* `pull_all_from_table_or_query` is a method that pulls all the data from an offline store from a specified start date to a specified end date.
* `write_logged_features` is a method that takes a pyarrow table or a path that points to a parquet file and writes the data to a defined source defined by `LoggingSource` and `LoggingConfig`. This method is **optional** since it is only used internally for **SavedDatasets**.
* `offline_write_batch` is a method that supports directly pushing a pyarrow table to a feature view. Given a feature view with a specific schema, this function should write the pyarrow table to the batch source defined. More details about the push api can be found [here](docs/reference/data-sources/push.md).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird that you only have "write_logged_features" as optional when really pretty much everything is optional except for get_historical_features

Would have the "optional" methods towards the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just treated "optional" as any methods that are labeled abstract even though they have a "pass" I still think the decorator necessitates implementation. I added optional to offline_write batch though.

@@ -23,16 +23,25 @@ The process for using a custom offline store consists of 4 steps:
&#x20;OfflineStore class names must end with the OfflineStore suffix!
{% endhint %}

### Contrib

Generally, new offline stores should go in the contrib folder and should have warnings within the classes that state the offline store is in alpha status. The contrib folder is designated for community contributions that are not fully maintained by Feast maintainers and may contain potential instability and have API changes. It is recommended to add warnings to users that the offline store functionality is still in alpha development and is not fully stable. In order to be classified as fully stable and be moved to the main offline store folder, the offline store should integrate with the full integration test suite in Feast and pass all of the test cases.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally pretty hard to read imo. Can you make this simpler and more scannable?

e.g.

Contrib offline stores

New offline stores go in sdk/python/feast/infra/offline_stores/contrib/.

What is a contrib plugin?

  • Not guaranteed to implement all interface methods
  • Not guaranteed to be stable
  • Should have warnings for users to indicate this is contrib

How do I make a contrib plugin an "official" plugin?
To move an offline store plugin out of contrib, you need:

  • GitHub actions (i.e. make test-python-integration) is setup to run all tests against the offline store and pass.
  • At least two contributors own the plugin (ideally tracked in an OWNERS / CODEOWNERS)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

### Contrib

Generally, new offline stores should go in the contrib folder and should have warnings within the classes that state the offline store is in alpha status. The contrib folder is designated for community contributions that are not fully maintained by Feast maintainers and may contain potential instability and have API changes. It is recommended to add warnings to users that the offline store functionality is still in alpha development and is not fully stable. In order to be classified as fully stable and be moved to the main offline store folder, the offline store should integrate with the full integration test suite in Feast and pass all of the test cases.

The OfflineStore class contains a couple of methods to read features from the offline store. Unlike the OnlineStore class, Feast does not manage any infrastructure for the offline store.&#x20;

There are two methods that deal with reading data from the offline stores`get_historical_features`and `pull_latest_from_table_or_query`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to contradict the text you have below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


### Dependencies

Finally, if your offline store requires special packages, add them to our `sdk/python/setup.py` under a new `<OFFLINE>_STORE_REQUIRED` list with the packages and add it to the setup script so that if your offline store is needed, users can install the necessary python packages. These packages should be defined as extras so that they are not installed by users by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is offline in brackets? shouldn't it be <offline_store> ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


### Dependencies

Finally, if your offline store requires special packages, add them to our `sdk/python/setup.py` under a new `<OFFLINE>_STORE_REQUIRED` list with the packages and add it to the setup script so that if your offline store is needed, users can install the necessary python packages. These packages should be defined as extras so that they are not installed by users by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Finally since this isn't the last step

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


### Dependencies

Finally, if your offline store requires special packages, add them to our `sdk/python/setup.py` under a new `<OFFLINE>_STORE_REQUIRED` list with the packages and add it to the setup script so that if your offline store is needed, users can install the necessary python packages. These packages should be defined as extras so that they are not installed by users by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would remove the "if your offline store requires special packages" part. Every offline store will need to pull in some deps to support a new offline store

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


### Add Documentation

Remember to add documentation for your offline store. This should be added to `docs/reference/offline-stores/` and `docs/reference/data-sources/`. You should also add a reference in `docs/reference/data-sources/README.md` and `docs/SUMMARY.md`. Add a new markdown documentation and document the functions similar to how the other offline stores are documented. Be sure to cover how to create the datasource and most importantly, what configuration is needed in the `feature_store.yaml` file in order to create the datasource and also make sure to flag that the datasource is in alpha development. Please also add some documentation on what the data model is for the specific online store for more clarity.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Convert this to individual steps they can follow instead of many sentences

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, comments mostly translate to the online store guide too

- Not guaranteed to be stable.
- Should have warnings for users to indicate this is a contrib plugin that is not maintained by the maintainers.

#### How do I make a contrib plugin and "official" plugin?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#### How do I make a contrib plugin and "official" plugin?
#### How do I make a contrib plugin an "official" plugin?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.


#### How do I make a contrib plugin and "official" plugin?
To move an offline store plugin out of contrib, you need:
- Github actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Github actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
- GitHub actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#### How do I make a contrib plugin and "official" plugin?
To move an offline store plugin out of contrib, you need:
- Github actions (i.e `make test-python-integration`) is setup to run all tests against the offline store and pass.
- At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- At least two contributors own the plugin (ideally tracked in our OWNERS / CODEOWNERS file).
- At least two contributors own the plugin (ideally tracked in our `OWNERS` / `CODEOWNERS` file).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed



2. The universal tests, which are integration tests specifically intended to test offline and online stores, should be run against Feast to ensure that the Feast APIs works with your online store.
- To run the integration tests, you must parametrize the integration test suite based on the `FULL_REPO_CONFIGS` variable defined in `sdk/python/tests/integration/feature_repos/repo_configuration.py` to use your own custom online store.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of confusing to me. I don't think users should try to "run the integration tests" since that is for official plugins.

It's useful context to understand the second bullet though. Would rephrase that Feast parametrizes integration tests using the FULL_REPO_CONFIGS variable, which you overwrite to enable testing for the contrib offline store

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -9,14 +9,14 @@ def test_serialize_entity_key():
# Should be fine
serialize_entity_key(
EntityKeyProto(
join_keys=["user"], entity_values=[ValueProto(int64_val=int(2 ** 15))]
join_keys=["user"], entity_values=[ValueProto(int64_val=int(2**15))]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase against master?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

kevjumba added 9 commits July 22, 2022 10:27
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
@kevjumba kevjumba force-pushed the update_offline_store_docs branch from fa0e55d to 977221f Compare July 22, 2022 17:42
kevjumba added 6 commits July 22, 2022 10:46
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
kevjumba added 7 commits July 22, 2022 12:58
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
kevjumba added 10 commits July 22, 2022 13:31
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Signed-off-by: Kevin Zhang <[email protected]>
Copy link
Collaborator

@adchia adchia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adchia, kevjumba

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@adchia adchia enabled auto-merge (squash) July 22, 2022 21:41
@feast-ci-bot feast-ci-bot merged commit 1479519 into feast-dev:master Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants