Merge pull request #20 from rcrowe-google/Pratishtha/base/README
updated README with usage instructions
pratishtha-abrol authored Jul 2, 2021
2 parents 2bb6c89 + 977a91c commit d749760
Showing 1 changed file with 77 additions and 2 deletions.
79 changes: 77 additions & 2 deletions README.md
@@ -1,8 +1,83 @@
# Schema Curation Custom Component

> Outreachy TFX custom component project
[![Python](https://img.shields.io/pypi/pyversions/tfx.svg?style=plastic)](https://github.com/tensorflow/tfx)
[![TensorFlow](https://img.shields.io/badge/TFX-orange)](https://www.tensorflow.org/tfx)

This repo contains the code for the Schema Curation Custom Component, made as part of [TFX-Addons](https://github.com/tensorflow/tfx-addons/) through the [Outreachy](https://www.outreachy.org/outreachy-may-2021-internship-round/communities/tensorflow/#create-custom-components-and-tools-for-tensorflow-) program. For discussion related to the project, see the linked Pull Request in TFX-Addons [here](https://github.com/tensorflow/tfx-addons/pull/32) and the issue [here](https://github.com/tensorflow/tfx-addons/issues/8).
This is a TFX component that lets users apply their own code to a schema produced by the [SchemaGen](https://www.tensorflow.org/tfx/guide/schemagen) component and curate it based on domain knowledge. It fits into an ML pipeline built with TFX and modifies the schema according to a module file provided by the user.

## Documentation

### Inputs:
The custom component takes as input the user-provided *module file* and the *schema* generated by the SchemaGen component from the specified data.

### Output:
When run, the component outputs the *modified schema* produced by the code in the module file.
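The input-to-output flow can be sketched with the standard library alone. This is only an illustration of the idea, not the component's actual executor code: the helper names `load_schema_fn` and `curate` are hypothetical, and a plain dict stands in for the real schema object.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_schema_fn(module_path):
    """Import a user module file by path and return its `schema_fn`."""
    spec = importlib.util.spec_from_file_location("user_module", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.schema_fn


def curate(schema, module_path):
    """Apply the user's `schema_fn` to a schema and return the result."""
    return load_schema_fn(module_path)(schema)


# Hypothetical module file; a plain dict stands in for the real schema.
MODULE_SOURCE = '''
def schema_fn(schema):
    schema["tips"]["min_fraction"] = 0.9
    return schema
'''

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "module_file.py"
    module_path.write_text(MODULE_SOURCE)
    curated = curate({"tips": {"min_fraction": 1.0}}, str(module_path))

print(curated)  # {'tips': {'min_fraction': 0.9}}
```

The point of the sketch is the contract: the component needs only a path to a module file that defines `schema_fn`, and whatever that function returns becomes the output schema.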

## Module file

### The Schema Curation *schema_fn*:
The Schema Curation component lets you curate the schema based on your own knowledge of the data. As a user, you only have to define a single function called `schema_fn`. Inside `schema_fn`, you write the operations that modify the input schema to produce the required one.

An example is:

```
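import tensorflow_data_validation as tfdv


def schema_fn(schema):
    """Modifies the inferred schema.

    Args:
      schema: schema generated by the SchemaGen component of TFX.

    Returns:
      The modified schema.
    """
    # Make "tips" an optional feature: it only needs to be present
    # in at least 90% of the examples.
    feature = tfdv.get_feature(schema, 'tips')
    feature.presence.min_fraction = 0.9
    return schema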
```
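To make the effect of `presence.min_fraction = 0.9` concrete: the value is the minimum fraction of examples in which the feature must appear. The following worked example is a plain-Python illustration of that arithmetic on made-up data; it is not TFDV's validation code, and the helper names are hypothetical.

```python
def presence_fraction(examples, feature_name):
    """Fraction of examples in which `feature_name` has a value."""
    if not examples:
        return 0.0
    present = sum(1 for ex in examples if ex.get(feature_name) is not None)
    return present / len(examples)


def satisfies_min_fraction(examples, feature_name, min_fraction):
    """True if the feature is present in at least `min_fraction` of examples."""
    return presence_fraction(examples, feature_name) >= min_fraction


# Made-up data: 'tips' is missing from 1 of 10 examples.
examples = [{"fare": 10.0, "tips": 1.5} for _ in range(9)] + [{"fare": 7.0}]

fraction = presence_fraction(examples, "tips")               # 0.9
ok_at_90 = satisfies_min_fraction(examples, "tips", 0.9)     # True
ok_at_100 = satisfies_min_fraction(examples, "tips", 1.0)    # False
```

With a threshold of 1.0 the feature is effectively required in every example, so lowering it to 0.9 is what makes `tips` tolerate some missing values.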

## Project Structure

### Directory Structure
```
schemacomponent
├── component
│   ├── component.py
│   ├── component_test.py
│   ├── executor.py
│   ├── __init__.py
│   └── __pycache__
│       ├── component.cpython-38.pyc
│       └── executor.cpython-38.pyc
├── CONTRIBUTING.md
├── data
│   └── data.csv
├── example
│   ├── __init__.py
│   ├── module_file.py
│   ├── taxi_example_colab.ipynb
│   ├── taxi_example_local.py
│   ├── taxi_pipeline_hello_e2e_test.py
│   └── taxi_pipeline_hello.py
├── __init__.py
├── PROPOSAL.md
└── README.md
```


The project follows the structure specified by the [TFX](https://www.tensorflow.org/tfx) documentation for a [TFX fully custom component](https://www.tensorflow.org/tfx/guide/custom_component).

The `SchemaCurationSpec` class defines the input, output and execution parameters required by the component.

The `Executor` class defines the behavior of the component. It is a subclass of `base_executor.BaseExecutor` that overrides the `Do` function.

Finally, the `SchemaCuration` class integrates the fully custom component into the ML pipeline.

### Unit Tests

The component includes separate unit tests for the component and the executor.


## Credits

Schema Curation Custom Component was made as a part of [TFX-Addons](https://github.com/tensorflow/tfx-addons/) through the [Outreachy](https://www.outreachy.org/outreachy-may-2021-internship-round/communities/tensorflow/#create-custom-components-and-tools-for-tensorflow-) program. You may view the linked Pull Request in TFX-Addons [here](https://github.com/tensorflow/tfx-addons/pull/32) and the issue [here](https://github.com/tensorflow/tfx-addons/issues/8) for relevant discussions related to the project.

## The Team:
### Mentors: