docs: add sync status feature guide (datahub-project#5897)
hsheth2 authored and cccs-tom committed Nov 18, 2022
1 parent 00f89bf commit 349e814
Showing 2 changed files with 47 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs-website/sidebars.js
@@ -454,6 +454,7 @@ module.exports = {
"docs/tags",
"docs/features/dataset-usage-and-query-history",
"docs/posts",
"docs/sync-status",
// "docs/wip/ui-ingestion-guide", -- not needed
// "docs/wip/personal-access-tokens-guide", -- not needed

46 changes: 46 additions & 0 deletions docs/sync-status.md
@@ -0,0 +1,46 @@
import FeatureAvailability from '@site/src/components/FeatureAvailability';

# About DataHub Sync Status

<FeatureAvailability/>

When looking at metadata in DataHub, it's useful to know whether the information you're looking at is up to date.
If metadata is stale (that is, it hasn't been updated in a while), consider refreshing it
using [metadata ingestion](./../metadata-ingestion/README.md), or [deleting](./how/delete-metadata.md) it if the asset it describes no longer exists.

## Sync Status Setup, Prerequisites, and Permissions

The sync status feature is enabled by default and does not require any special setup.

## Using Sync Status

The DataHub UI will display the sync status in the top right corner of the page.

The last synchronized date is the last time an ingestion run saw an entity. It is computed as the most recent update to the entity, excluding changes made through the UI. If an ingestion run restates an entity without actually changing anything, we still count that as an update for the purposes of sync status.

<details>
<summary>Technical details: computing the last synchronized timestamp</summary>

To compute the last synchronized timestamp, we look at the system metadata of all aspects associated with the entity.
We exclude any aspects whose system metadata `runId` value is unset or equal to `no-run-id-provided`; this is what filters out changes made through the UI.
Finally, we take the most recent system metadata `lastObserved` timestamp across the aspects and use that as the last synchronized timestamp.
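
The computation above can be sketched roughly as follows. This is an illustrative Python sketch, not DataHub's actual implementation: the `AspectSystemMetadata` class and the `last_synchronized` function are hypothetical names, `lastObserved` is assumed to be an epoch timestamp in milliseconds, and an empty `runId` is assumed to count as unset.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Sentinel runId described above for changes that did not come from ingestion.
NO_RUN_ID = "no-run-id-provided"


@dataclass
class AspectSystemMetadata:
    """Hypothetical stand-in for the system metadata of one aspect."""

    run_id: Optional[str]  # mirrors `runId`; None or "" is treated as unset (assumption)
    last_observed: int     # mirrors `lastObserved`; assumed to be epoch milliseconds


def last_synchronized(aspects: Iterable[AspectSystemMetadata]) -> Optional[int]:
    """Return the newest lastObserved among ingestion-written aspects, or None."""
    observed = [
        aspect.last_observed
        for aspect in aspects
        # Skip aspects with no run ID or the sentinel run ID (UI changes).
        if aspect.run_id and aspect.run_id != NO_RUN_ID
    ]
    return max(observed) if observed else None
```

If every aspect is filtered out, the sketch returns `None`, which a caller could interpret as "never synchronized by ingestion"; how DataHub itself handles that case is not covered here.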

</details>

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/master/imgs/sync-status-normal.png"/>
</p>

We'll automatically assign a color based on how recently the entity was last synchronized:

- Green: last synchronized within the past week
- Yellow: last synchronized more than a week ago, but within the past month
- Red: last synchronized more than a month ago
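
As a rough illustration of the color rules above, the mapping could look like the following Python sketch. The function name `sync_status_color` is hypothetical, and the exact cutoffs (7 days for a week, 30 days for a month) are assumptions made for the example; the UI may define these boundaries differently.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoffs for illustration only; the real UI thresholds may differ.
ONE_WEEK = timedelta(days=7)
ONE_MONTH = timedelta(days=30)


def sync_status_color(last_synced: datetime, now: datetime) -> str:
    """Map the age of the last sync to a status color."""
    age = now - last_synced
    if age <= ONE_WEEK:
        return "green"
    if age <= ONE_MONTH:
        return "yellow"
    return "red"


# Example: an entity last synchronized ten days before `now` falls in the yellow band.
now = datetime(2022, 11, 18, tzinfo=timezone.utc)
color = sync_status_color(now - timedelta(days=10), now)
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the sketch deterministic and easy to test.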

You can hover over the sync status message in the UI to view the exact timestamp of the most recent sync.

<p align="center">
<img width="70%" src="https://raw.githubusercontent.com/datahub-project/static-assets/master/imgs/sync-status-hover-card.png"/>
</p>

_Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!_
