From 977a91cc5d5ce000bcbf406e5a306df2c0c916f4 Mon Sep 17 00:00:00 2001
From: Pratishtha Abrol
Date: Fri, 2 Jul 2021 20:16:22 +0530
Subject: [PATCH] updated README with usage instructions

---
 README.md | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index fad819a..d170fe8 100644
--- a/README.md
+++ b/README.md
@@ -1,8 +1,83 @@
 # Schema Curation Custom Component
 
-> Outreachy TFX custom component project
+[![Python](https://img.shields.io/pypi/pyversions/tfx.svg?style=plastic)](https://github.com/tensorflow/tfx)
+[![TensorFlow](https://img.shields.io/badge/TFX-orange)](https://www.tensorflow.org/tfx)
 
-This repo contains the code for Schema Curation Custom Component made as a part of [TFX-Addons](https://github.com/tensorflow/tfx-addons/) through the [Outreachy](https://www.outreachy.org/outreachy-may-2021-internship-round/communities/tensorflow/#create-custom-components-and-tools-for-tensorflow-) program. You may view the linked Pull Request in TFX-Addons [here](https://github.com/tensorflow/tfx-addons/pull/32) and the issue [here](https://github.com/tensorflow/tfx-addons/issues/8) for relevant discussions related to the project.
+This is a TFX component that lets users apply their own code to a schema produced by the [SchemaGen](https://www.tensorflow.org/tfx/guide/schemagen) component and curate it based on domain knowledge. It fits into an ML pipeline built with TFX and modifies the schema according to a module file provided by the user.
+
+## Documentation
+
+### Inputs:
+The custom component takes as input the user's *module file* and the *schema* generated by the SchemaGen component for the specified data.
+
+### Output:
+When run, the component outputs the *modified schema* produced by the code in the module file.
+
+## Module file
+
+### The Schema Curation *schema_fn*:
+The Schema Curation component lets you curate the schema based on your own domain knowledge. As a user, you only have to define a single function called `schema_fn`. In `schema_fn`, you apply a series of functions that manipulate the input schema to produce the required one.
+
+An example, assuming `tensorflow_data_validation` is imported as `tfdv`:
+
+```
+def schema_fn(schema):
+    """Modifies the inferred schema.
+    Args:
+        schema: schema generated by the SchemaGen component of TFX
+    """
+    # Make "tips" an optional feature (required in at least 90% of examples)
+    feature = tfdv.get_feature(schema, 'tips')
+    feature.presence.min_fraction = 0.9
+
+    return schema
+```
+
+## Project Structure
+
+### Directory Structure
+```
+schemacomponent
+├── component
+│   ├── component.py
+│   ├── component_test.py
+│   ├── executor.py
+│   ├── __init__.py
+│   └── __pycache__
+│       ├── component.cpython-38.pyc
+│       └── executor.cpython-38.pyc
+├── CONTRIBUTING.md
+├── data
+│   └── data.csv
+├── example
+│   ├── __init__.py
+│   ├── module_file.py
+│   ├── taxi_example_colab.ipynb
+│   ├── taxi_example_local.py
+│   ├── taxi_pipeline_hello_e2e_test.py
+│   └── taxi_pipeline_hello.py
+├── __init__.py
+├── PROPOSAL.md
+└── README.md
+```
+
+
+The project follows the structure specified by the [TFX](https://www.tensorflow.org/tfx) documentation for a [TFX fully custom component](https://www.tensorflow.org/tfx/guide/custom_component).
+
+The `SchemaCurationSpec` class defines the input, output, and execution parameters required by the component.
+
+The `Executor` class defines the component's behavior. It is a subclass of `base_executor.BaseExecutor` that overrides the `Do` function.
+
+Finally, the `SchemaCuration` class integrates the fully custom component into the ML pipeline.
+
+### Unit Tests
+
+The component includes separate unit tests for the component and the executor.
+
+
+## Credits
+
+Schema Curation Custom Component was made as a part of [TFX-Addons](https://github.com/tensorflow/tfx-addons/) through the [Outreachy](https://www.outreachy.org/outreachy-may-2021-internship-round/communities/tensorflow/#create-custom-components-and-tools-for-tensorflow-) program. You may view the linked Pull Request in TFX-Addons [here](https://github.com/tensorflow/tfx-addons/pull/32) and the issue [here](https://github.com/tensorflow/tfx-addons/issues/8) for relevant discussions related to the project.
 
 ## The Team:
 ### Mentors: