Commit

fix: fix img links
yoonhyejin committed Aug 24, 2023
1 parent aab5b6a commit 3b27224
Showing 283 changed files with 1,274 additions and 926 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -69,6 +69,7 @@ metadata-ingestion/generated/**

# docs
docs/generated/
docs-website/versioned_docs/
tmp*
temp/**

6 changes: 5 additions & 1 deletion README.md
@@ -79,8 +79,12 @@ Please follow the [DataHub Quickstart Guide](https://datahubproject.io/docs/quic
## Development

If you're looking to build & modify datahub please take a look at our [Development Guide](https://datahubproject.io/docs/developers).
<p align="center">
<a href="https://demo.datahubproject.io/">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/entity.png"/>
</a>
</p>

[![DataHub Demo GIF](docs/imgs/entity.png)](https://demo.datahubproject.io/)

## Source Code and Repositories

40 changes: 19 additions & 21 deletions docs/actions/concepts.md
@@ -1,6 +1,6 @@
# DataHub Actions Concepts

The Actions framework includes pluggable components for filtering, transforming, and reacting to important DataHub events, such as

- Tag Additions / Removals
- Glossary Term Additions / Removals
@@ -17,17 +17,16 @@ Finally, the framework is highly configurable & scalable. Notable highlights inc
- **At-Least-Once Delivery**: Native support for independent processing state for each Action via post-processing acking to achieve at-least-once semantics.
- **Robust Error Handling**: Configurable failure policies featuring event-retry, dead letter queue, and failed-event continuation policy to achieve the guarantees required by your organization.


### Use Cases

Real-time use cases broadly fall into the following categories:

- **Notifications**: Generate organization-specific notifications when a change is made on DataHub. For example, send an email to the governance team when a "PII" tag is added to any data asset.
- **Workflow Integration**: Integrate DataHub into your organization's internal workflows. For example, create a Jira ticket when specific Tags or Terms are proposed on a Dataset.
- **Synchronization**: Sync changes made in DataHub to a third-party system. For example, reflect Tag additions made in DataHub in Snowflake.
- **Auditing**: Audit who is making what changes on DataHub through time.

and more!

## Concepts

@@ -40,62 +39,61 @@ The Actions Framework consists of a few core concepts--

Each of these will be described in detail below.

![](imgs/actions.png)
<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/actions.png"/>
</p>

**In the Actions Framework, Events flow continuously from left-to-right.**

### Pipelines

A **Pipeline** is a continuously running process which performs the following functions:

1. Polls events from a configured Event Source (described below)
2. Applies configured Transformation + Filtering to the Event
3. Executes the configured Action on the resulting Event

in addition to handling initialization, errors, retries, logging, and more.

Each Action Configuration file corresponds to a unique Pipeline. In practice,
each Pipeline has its very own Event Source, Transforms, and Actions. This makes it easy to maintain state for mission-critical Actions independently.

Importantly, each Action must have a unique name. This serves as a stable identifier across Pipeline runs, which can be useful for saving the Pipeline's consumer state (i.e. resiliency + reliability). For example, the Kafka Event Source (default) uses the pipeline name as the Kafka Consumer Group id. This enables you to easily scale out your Actions by running multiple processes with the exact same configuration file. Each process simply becomes a different consumer in the same consumer group, sharing the traffic of the DataHub Events stream.
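
The three Pipeline steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the framework's implementation: `SketchPipeline`, its fields, and the dict-based `Event` are hypothetical names, and the sketch assumes that events dropped by the filter are still acked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Event = Dict[str, str]  # assumption: a plain dict stands in for a DataHub event


@dataclass
class SketchPipeline:
    """Hypothetical stand-in for a Pipeline; not the framework's own class."""

    name: str  # unique name; a Kafka-style source could reuse it as a consumer group id
    source: Iterable[Event]
    transform: Callable[[Event], Optional[Event]]  # returns None to filter an event out
    action: Callable[[Event], None]
    acked: List[Event] = field(default_factory=list)

    def run(self) -> None:
        for event in self.source:           # 1. poll events from the Event Source
            result = self.transform(event)  # 2. apply transformation + filtering
            if result is not None:
                self.action(result)         # 3. execute the Action
            self.acked.append(event)        # ack only after processing has finished


received: List[Event] = []
pipeline = SketchPipeline(
    name="pii_notifier",
    source=[{"tag": "PII"}, {"tag": "Legacy"}],
    transform=lambda e: e if e["tag"] == "PII" else None,
    action=received.append,
)
pipeline.run()
```

After `run()` returns, `received` holds only the "PII" event, while both events have been recorded as acked.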

### Events

**Events** are data objects representing changes that have occurred on DataHub. Strictly speaking, the only requirement that the Actions framework imposes is that these objects must be:

a. Convertible to JSON
b. Convertible from JSON

This ensures that, in the event of processing failures, events can be written to and read back from a failed events file.
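
As a rough sketch of that round trip, the example below writes events to a JSON-lines file and reads them back. The class `SketchEvent`, its fields, the method names, and the one-event-per-line file layout are all assumptions made for illustration; the framework's actual failed events format may differ.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SketchEvent:
    """Hypothetical event; the field names are illustrative only."""

    entity_urn: str
    operation: str

    def as_json(self) -> str:
        # Requirement (a): convertible to JSON.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "SketchEvent":
        # Requirement (b): convertible from JSON.
        return cls(**json.loads(raw))


def write_failed(path: str, events: List[SketchEvent]) -> None:
    # Append one JSON document per line (an assumed layout).
    with open(path, "a", encoding="utf-8") as handle:
        for event in events:
            handle.write(event.as_json() + "\n")


def read_failed(path: str) -> List[SketchEvent]:
    with open(path, encoding="utf-8") as handle:
        return [SketchEvent.from_json(line) for line in handle if line.strip()]


failed = [SketchEvent("urn:li:dataset:example", "ADD")]
with tempfile.TemporaryDirectory() as tmp_dir:
    failed_path = os.path.join(tmp_dir, "failed_events.log")
    write_failed(failed_path, failed)
    restored = read_failed(failed_path)
```

Because both directions go through JSON, `restored` compares equal to `failed`, which is what makes replaying failed events possible.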

#### Event Types

Each Event instance inside the framework corresponds to a single **Event Type**, which is a common name (e.g. "EntityChangeEvent_v1") that can be used to understand the shape of the Event. This can be thought of as a "topic" or "stream" name. That being said, Events associated with a single type are not expected to change in backwards-breaking ways across versions.

### Event Sources

Events are produced to the framework by **Event Sources**. Event Sources may include their own guarantees, configurations, behaviors, and semantics. They usually produce a fixed set of Event Types.

In addition to sourcing events, Event Sources are also responsible for acking the successful processing of an event by implementing the `ack` method. This is invoked by the framework once the Event is guaranteed to have reached the configured Action successfully.
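
To illustrate why acking after processing yields at-least-once delivery, here is a minimal in-memory sketch. `InMemoryEventSource` and its offset bookkeeping are hypothetical and only loosely modeled on how a consumer commits offsets; only the `ack` method name comes from the text above.

```python
from typing import Iterator, List, Tuple


class InMemoryEventSource:
    """Hypothetical source whose committed position advances only on ack."""

    def __init__(self, events: List[str]) -> None:
        self._events = events
        self.committed = 0  # index of the first event that has not been acked

    def events(self) -> Iterator[Tuple[int, str]]:
        # Always resume from the committed position, like a restarted consumer.
        for offset in range(self.committed, len(self._events)):
            yield offset, self._events[offset]

    def ack(self, offset: int) -> None:
        # Called once the event has been processed successfully.
        self.committed = max(self.committed, offset + 1)


source = InMemoryEventSource(["e0", "e1", "e2"])
stream = source.events()

offset, event = next(stream)
source.ack(offset)  # "e0" was processed and acked

next(stream)        # "e1" was delivered, but the process "crashes" before acking

# A fresh iteration stands in for a restart: unacked events are delivered again.
redelivered = [event for _, event in source.events()]
```

Here `redelivered` contains "e1" and "e2": the unacked event is seen a second time rather than lost, so Actions should tolerate duplicates.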

### Transformers

**Transformers** are pluggable components which take an Event as input, and produce an Event (or nothing) as output. This can be used to enrich the information of an Event prior to sending it to an Action.

Multiple Transformers can be configured to run in sequence, filtering and transforming an event in multiple steps.
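
A chain of this kind can be sketched with plain functions. The names `run_chain`, `only_tag_events`, and `add_owner`, along with the dict-based event, are hypothetical and chosen for illustration; they are not the framework's Transformer interface.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, str]  # assumption: a plain dict stands in for an Event
Transformer = Callable[[Event], Optional[Event]]


def only_tag_events(event: Event) -> Optional[Event]:
    # Filtering step: returning None drops the event.
    return event if event.get("category") == "TAG" else None


def add_owner(event: Event) -> Optional[Event]:
    # Enrichment step: return a copy of the event with an extra field.
    return {**event, "owner": "governance-team"}


def run_chain(event: Event, transformers: List[Transformer]) -> Optional[Event]:
    current: Optional[Event] = event
    for transformer in transformers:
        current = transformer(current)
        if current is None:  # stop as soon as one step filters the event out
            return None
    return current


chain = [only_tag_events, add_owner]
kept = run_chain({"category": "TAG"}, chain)
dropped = run_chain({"category": "OWNER"}, chain)
```

In this sketch `kept` is the enriched event and `dropped` is `None`, so the Action would only ever see the first one.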

Transformers can also be used to generate a completely new type of Event (i.e. registered at runtime via the Event Registry) which can subsequently serve as input to an Action.

Transformers can be easily customized and plugged in to meet an organization's unique requirements. For more information on developing a Transformer, check out [Developing a Transformer](guides/developing-a-transformer.md).


### Action

**Actions** are pluggable components which take an Event as input and perform some business logic. Examples include sending a Slack notification, logging to a file,
or creating a Jira ticket.

Each Pipeline can be configured to have a single Action which runs after the filtering and transformations have occurred.

Actions can be easily customized and plugged in to meet an organization's unique requirements. For more information on developing an Action, check out [Developing an Action](guides/developing-an-action.md).


Binary file removed docs/actions/imgs/actions.png
Binary file not shown.