diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index a5d02c4f64..c8e313850f 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -23,7 +23,6 @@ * [Feature view](getting-started/concepts/feature-view.md) * [Feature retrieval](getting-started/concepts/feature-retrieval.md) * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md) - * [Registry](getting-started/concepts/registry.md) * [Permission](getting-started/concepts/permission.md) * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md) * [Components](getting-started/components/README.md) @@ -45,7 +44,6 @@ * [Real-time credit scoring on AWS](tutorials/tutorials-overview/real-time-credit-scoring-on-aws.md) * [Driver stats on Snowflake](tutorials/tutorials-overview/driver-stats-on-snowflake.md) * [Validating historical features with Great Expectations](tutorials/validating-historical-features.md) -* [Using Scalable Registry](tutorials/using-scalable-registry.md) * [Building streaming features](tutorials/building-streaming-features.md) ## How-to Guides @@ -114,6 +112,12 @@ * [Hazelcast (contrib)](reference/online-stores/hazelcast.md) * [ScyllaDB (contrib)](reference/online-stores/scylladb.md) * [SingleStore (contrib)](reference/online-stores/singlestore.md) +* [Registries](reference/registries/README.md) + * [Local](reference/registries/local.md) + * [S3](reference/registries/s3.md) + * [GCS](reference/registries/gcs.md) + * [SQL](reference/registries/sql.md) + * [Snowflake](reference/registries/snowflake.md) * [Providers](reference/providers/README.md) * [Local](reference/providers/local.md) * [Google Cloud Platform](reference/providers/google-cloud-platform.md) diff --git a/docs/getting-started/components/registry.md b/docs/getting-started/components/registry.md index 0939fb53fc..0c85c5ad36 100644 --- a/docs/getting-started/components/registry.md +++ b/docs/getting-started/components/registry.md @@ -1,31 +1,51 @@ # Registry -The Feast feature registry is a central catalog of all the 
feature definitions and their related metadata. It allows data scientists to search, discover, and collaborate on new features. +The Feast feature registry is a central catalog of all feature definitions and their related metadata. Feast uses the registry to store all applied Feast objects (e.g. feature views, entities, etc.). It allows data scientists to search, discover, and collaborate on new features. The registry exposes methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. -Each Feast deployment has a single feature registry. Feast only supports file-based registries today, but supports four different backends. +Feast comes with built-in file-based and SQL-based registry implementations. By default, Feast uses a file-based registry, which stores the protobuf representation of the registry as a serialized file in the local file system. For more details on which registries are supported, please see [Registries](../../reference/registries/). -* `Local`: Used as a local backend for storing the registry during development -* `S3`: Used as a centralized backend for storing the registry on AWS -* `GCS`: Used as a centralized backend for storing the registry on GCP -* `[Alpha] Azure`: Used as centralized backend for storing the registry on Azure Blob storage. +## Updating the registry -The feature registry is updated during different operations when using Feast. More specifically, objects within the registry \(entities, feature views, feature services\) are updated when running `apply` from the Feast CLI, but metadata about objects can also be updated during operations like materialization. +We recommend users store their Feast feature definitions in a version-controlled repository, which is then +automatically kept in sync with the registry via CI/CD. Users will often also want multiple registries to correspond to +different environments (e.g. 
dev vs staging vs prod), with staging and production registries with locked down write +access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up. -Users interact with a feature registry through the Feast SDK. Listing all feature views: +## Accessing the registry from clients + +Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams +preferring the programmatic approach because it makes notebook driven development very easy: + +### Option 1: programmatically specifying the registry ```python -fs = FeatureStore("my_feature_repo/") -print(fs.list_feature_views()) +repo_config = RepoConfig( + registry=RegistryConfig(path="gs://feast-test-gcs-bucket/registry.pb"), + project="feast_demo_gcp", + provider="gcp", + offline_store="file", # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig + online_store="null", # Could also be the OnlineStoreConfig e.g. RedisOnlineStoreConfig +) +store = FeatureStore(config=repo_config) +``` + +### Option 2: specifying the registry in the project's `feature_store.yaml` file + +```yaml +project: feast_demo_aws +provider: aws +registry: s3://feast-test-s3-bucket/registry.pb +online_store: null +offline_store: + type: file ``` -Or retrieving a specific feature view: +Instantiating a `FeatureStore` object can then point to this: ```python -fs = FeatureStore("my_feature_repo/") -fv = fs.get_feature_view(“my_fv1”) +store = FeatureStore(repo_path=".") ``` {% hint style="info" %} -The feature registry is a [Protobuf representation](https://github.com/feast-dev/feast/blob/master/protos/feast/core/Registry.proto) of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry. 
-{% endhint %} - +The file-based feature registry is a [Protobuf representation](https://github.com/feast-dev/feast/blob/master/protos/feast/core/Registry.proto) of Feast metadata. This Protobuf file can be read programmatically from other programming languages, but no compatibility guarantees are made on the internal structure of the registry. +{% endhint %} \ No newline at end of file diff --git a/docs/getting-started/concepts/README.md b/docs/getting-started/concepts/README.md index 1769a2d741..9b967fb5af 100644 --- a/docs/getting-started/concepts/README.md +++ b/docs/getting-started/concepts/README.md @@ -24,10 +24,6 @@ [point-in-time-joins.md](point-in-time-joins.md) {% endcontent-ref %} -{% content-ref url="registry.md" %} -[registry.md](registry.md) -{% endcontent-ref %} - {% content-ref url="dataset.md" %} [dataset.md](dataset.md) {% endcontent-ref %} diff --git a/docs/getting-started/concepts/registry.md b/docs/getting-started/concepts/registry.md deleted file mode 100644 index 8ac32ce87b..0000000000 --- a/docs/getting-started/concepts/registry.md +++ /dev/null @@ -1,107 +0,0 @@ -# Registry - -Feast uses a registry to store all applied Feast objects (e.g. Feature views, entities, etc). The registry exposes -methods to apply, list, retrieve and delete these objects, and is an abstraction with multiple implementations. - -### Options for registry implementations - -#### File-based registry -By default, Feast uses a file-based registry implementation, which stores the protobuf representation of the registry as -a serialized file. This registry file can be stored in a local file system, or in cloud storage (in, say, S3 or GCS, or Azure). - -The quickstart guides that use `feast init` will use a registry on a local file system. 
To allow Feast to configure -a remote file registry, you need to create a GCS / S3 bucket that Feast can understand: -{% tabs %} -{% tab title="Example S3 file registry" %} -```yaml -project: feast_demo_aws -provider: aws -registry: - path: s3://[YOUR BUCKET YOU CREATED]/registry.pb - cache_ttl_seconds: 60 -online_store: null -offline_store: - type: file -``` -{% endtab %} - -{% tab title="Example GCS file registry" %} -```yaml -project: feast_demo_gcp -provider: gcp -registry: - path: gs://[YOUR BUCKET YOU CREATED]/registry.pb - cache_ttl_seconds: 60 -online_store: null -offline_store: - type: file -``` -{% endtab %} -{% endtabs %} - -However, there are inherent limitations with a file-based registry, since changing a single field in the registry -requires re-writing the whole registry file. With multiple concurrent writers, this presents a risk of data loss, or -bottlenecks writes to the registry since all changes have to be serialized (e.g. when running materialization for -multiple feature views or time ranges concurrently). - -#### SQL Registry -Alternatively, a [SQL Registry](../../tutorials/using-scalable-registry.md) can be used for a more scalable registry. - -The configuration roughly looks like: -```yaml -project: -provider: -online_store: redis -offline_store: file -registry: - registry_type: sql - path: postgresql://postgres:mysecretpassword@127.0.0.1:55001/feast - cache_ttl_seconds: 60 - sqlalchemy_config_kwargs: - echo: false - pool_pre_ping: true -``` - -This supports any SQLAlchemy compatible database as a backend. The exact schema can be seen in [sql.py](https://github.com/feast-dev/feast/blob/master/sdk/python/feast/infra/registry/sql.py) - -### Updating the registry - -We recommend users store their Feast feature definitions in a version controlled repository, which then via CI/CD -automatically stays synced with the registry. Users will often also want multiple registries to correspond to -different environments (e.g. 
dev vs staging vs prod), with staging and production registries with locked down write -access since they can impact real user traffic. See [Running Feast in Production](../../how-to-guides/running-feast-in-production.md#1.-automatically-deploying-changes-to-your-feature-definitions) for details on how to set this up. - -### Accessing the registry from clients - -Users can specify the registry through a `feature_store.yaml` config file, or programmatically. We often see teams -preferring the programmatic approach because it makes notebook driven development very easy: - -#### Option 1: programmatically specifying the registry - -```python -repo_config = RepoConfig( - registry=RegistryConfig(path="gs://feast-test-gcs-bucket/registry.pb"), - project="feast_demo_gcp", - provider="gcp", - offline_store="file", # Could also be the OfflineStoreConfig e.g. FileOfflineStoreConfig - online_store="null", # Could also be the OnlineStoreConfig e.g. RedisOnlineStoreConfig -) -store = FeatureStore(config=repo_config) -``` - -#### Option 2: specifying the registry in the project's `feature_store.yaml` file - -```yaml -project: feast_demo_aws -provider: aws -registry: s3://feast-test-s3-bucket/registry.pb -online_store: null -offline_store: - type: file -``` - -Instantiating a `FeatureStore` object can then point to this: - -```python -store = FeatureStore(repo_path=".") -``` \ No newline at end of file diff --git a/docs/reference/registries/README.md b/docs/reference/registries/README.md new file mode 100644 index 0000000000..1310506f1d --- /dev/null +++ b/docs/reference/registries/README.md @@ -0,0 +1,23 @@ +# Registries + +Please see [Registry](../../getting-started/components/registry.md) for a conceptual explanation of registries. 
+ +{% content-ref url="local.md" %} +[local.md](local.md) +{% endcontent-ref %} + +{% content-ref url="s3.md" %} +[s3.md](s3.md) +{% endcontent-ref %} + +{% content-ref url="gcs.md" %} +[gcs.md](gcs.md) +{% endcontent-ref %} + +{% content-ref url="sql.md" %} +[sql.md](sql.md) +{% endcontent-ref %} + +{% content-ref url="snowflake.md" %} +[snowflake.md](snowflake.md) +{% endcontent-ref %} diff --git a/docs/reference/registries/gcs.md b/docs/reference/registries/gcs.md new file mode 100644 index 0000000000..13c9657aa1 --- /dev/null +++ b/docs/reference/registries/gcs.md @@ -0,0 +1,23 @@ +# GCS Registry + +## Description + +GCS registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) using Google Cloud Storage. + +While it can be used in production, there are still inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). + +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_gcp +registry: + path: gs://[YOUR BUCKET YOU CREATED]/registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registries/local.md b/docs/reference/registries/local.md new file mode 100644 index 0000000000..ad1d98cea9 --- /dev/null +++ b/docs/reference/registries/local.md @@ -0,0 +1,23 @@ +# Local Registry + +## Description + +Local registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) in the local file system. 
It is only intended to be used for experimentation with Feast and should not be used in production. + +There are inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). + +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_local +registry: + path: registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registries/s3.md b/docs/reference/registries/s3.md new file mode 100644 index 0000000000..65069c415c --- /dev/null +++ b/docs/reference/registries/s3.md @@ -0,0 +1,23 @@ +# S3 Registry + +## Description + +S3 registry provides support for storing the protobuf representation of your feature store objects (data sources, feature views, feature services, etc.) in S3. + +While it can be used in production, there are still inherent limitations with file-based registries, since changing a single field in the registry requires re-writing the whole registry file. With multiple concurrent writers, this either risks data loss or bottlenecks writes to the registry, since all changes have to be serialized (e.g. when running materialization for multiple feature views or time ranges concurrently). 
+ +An example of how to configure this would be: + +## Example + +{% code title="feature_store.yaml" %} +```yaml +project: feast_aws_s3 +registry: + path: s3://[YOUR BUCKET YOU CREATED]/registry.pb + cache_ttl_seconds: 60 +online_store: null +offline_store: + type: dask +``` +{% endcode %} \ No newline at end of file diff --git a/docs/reference/registry/snowflake.md b/docs/reference/registries/snowflake.md similarity index 97% rename from docs/reference/registry/snowflake.md rename to docs/reference/registries/snowflake.md index 31b0db9582..00d87b1977 100644 --- a/docs/reference/registry/snowflake.md +++ b/docs/reference/registries/snowflake.md @@ -1,4 +1,4 @@ -# Snowflake registry +# Snowflake Registry ## Description diff --git a/docs/tutorials/using-scalable-registry.md b/docs/reference/registries/sql.md similarity index 97% rename from docs/tutorials/using-scalable-registry.md rename to docs/reference/registries/sql.md index 25746f60e2..631a20cbe3 100644 --- a/docs/tutorials/using-scalable-registry.md +++ b/docs/reference/registries/sql.md @@ -1,9 +1,4 @@ ---- -description: >- - Tutorial on how to use the SQL registry for scalable registry updates ---- - -# Using Scalable Registry +# SQL Registry ## Overview