Releases: datahub-project/datahub

DataHub v0.10.0

07 Feb 21:16
cf1e627

Release Highlights

Potential Downtime

This release introduces substantial improvements to search functionality that require the search indices to be reindexed.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from start-up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, please expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
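
For planning a maintenance window, the rule of thumb above can be turned into a quick calculation. The sketch below is only a linear extrapolation of the "1 hour per 2.3 million entities" figure; the function name is hypothetical, and actual duration will depend on your cluster and index sizes.

```python
# Rough reindex-time estimate based on the rule of thumb in these notes
# (about 1 hour per 2.3 million entities). Illustrative only.

ENTITIES_PER_HOUR = 2_300_000  # figure quoted in the release notes


def estimate_reindex_hours(entity_count: int) -> float:
    """Return an approximate reindex duration in hours (linear extrapolation)."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    return entity_count / ENTITIES_PER_HOUR


if __name__ == "__main__":
    for count in (500_000, 2_300_000, 10_000_000):
        hours = estimate_reindex_hours(count)
        print(f"{count:>12,} entities -> roughly {hours:.1f} h")
```

Treat the result as a lower-confidence planning number rather than a guarantee; the notes themselves give a range from 5 minutes to multiple hours.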

If you are deploying containers yourself

If you're deploying the Docker containers yourself (without Helm or Docker-Compose Quickstart), then you'll first need to run the acryldata/datahub-upgrade Docker image (v0.10.0 tag) with the required environment variables set, using the following command:

docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate

For the full set of environment variables required, check out the default docker.env provided for Docker Compose deployments.

This will run the required reindex against your Elasticsearch instance, after which other DataHub components should start correctly. If you do not run the datahub-upgrade container successfully, other components in the stack will fail to start correctly.
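
One way to supply the environment variables is Docker's `--env-file` option. The sketch below only assembles the command line; it assumes a local file named `docker.env` (a hypothetical path, modelled on the default file mentioned above) and does not cover networking or any other settings your deployment may need.

```python
# Sketch: build the datahub-upgrade command with an env file.
# The env-file path and the helper name are assumptions for illustration.
import shlex
from typing import List


def build_system_update_command(env_file: str, tag: str = "v0.10.0") -> List[str]:
    """Return the argv list for running the SystemUpdate upgrade job."""
    return [
        "docker", "run",
        "--env-file", env_file,              # standard `docker run` option
        f"acryldata/datahub-upgrade:{tag}",  # image named in the release notes
        "-u", "SystemUpdate",                # arguments from the release notes
    ]


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(build_system_update_command("docker.env")))
```

Printing the command first makes it easy to review before running it, for example via `subprocess.run(cmd, check=True)`.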

User Experience

We have some really exciting improvements to the DataHub user experience in this release!

Improved documentation editor, contributed by @ngamanda and the Grab Team.
This work provides a much more intuitive documentation editing experience within the UI, providing “what you see is what you get” formatting & removing the need for markdown expertise.

Additionally, you can easily:

  • Add links to other entities/users within DataHub
  • Embed and resize tables & images
  • Toggle between font sizes and formats
  • Embed syntax-highlighted code blocks

Filter lineage graphs based on time windows
You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot data issues in the past.

Improvements in Search
As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:

  • Stemming & Synonyms
  • Search by full or partial URN
  • Autocomplete improvements
  • Quoted search analyzer for exact & prefix match

Metadata Ingestion

Here are some of the most notable ingestion-related improvements:

  • Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
  • PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
  • BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
  • Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias

Developer Experience

  • This release introduces DataHub Lite - a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use-cases such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. Check out the docs here.

Breaking Changes

#7103 This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more info, see our docs on Configuring Kafka in DataHub. These variables were previously suffixed with _TOPIC; the correct suffix is now _TOPIC_NAME. This change should not affect any user who is using the default Kafka topic names.
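
If you keep custom topic names in an env file or a settings dictionary, the suffix change can be applied mechanically. The following is a minimal sketch, not an official migration tool: the variable name in the example is hypothetical, and the function renames every key ending in `_TOPIC`, so review the result for unrelated variables that happen to share that suffix.

```python
# Sketch: rename *_TOPIC environment keys to *_TOPIC_NAME.
# Variable names used in the example are hypothetical placeholders.
from typing import Dict

OLD_SUFFIX = "_TOPIC"
NEW_SUFFIX = "_TOPIC_NAME"


def migrate_topic_env(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env with old-style topic keys renamed."""
    migrated: Dict[str, str] = {}
    for key, value in env.items():
        if key.endswith(OLD_SUFFIX):
            new_key = key[: -len(OLD_SUFFIX)] + NEW_SUFFIX
            # If the new-style key is already set explicitly, keep its value.
            migrated.setdefault(new_key, env.get(new_key, value))
        else:
            migrated[key] = value
    return migrated


if __name__ == "__main__":
    old_env = {"EXAMPLE_EVENT_TOPIC": "my_custom_topic", "OTHER_SETTING": "1"}
    print(migrate_topic_env(old_env))
```

Keys that already end in `_TOPIC_NAME` are left untouched, so running the helper twice gives the same result as running it once.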

What's Changed

  • fix(ci): only scan on master branch by @anshbansal in #7047
  • fix(ci): use trivy offline scanning by @anshbansal in #7050
  • docs(get-started) Simplify copy on Get Started landing page by @maggiehays in #7043
  • fix(ingest/kafka): fix ResourceType import error for confluent_kafka<1.9.0 by @mayurinehate in #7046
  • docs(dbt): fix indentation in dbt meta mapping docs by @jx2lee in #7045
  • fix(ingest): temporarily disable vertica tests by @hsheth2 in #7059
  • feat(editor): improve documentation editor using Remirror by @ngamanda in #6631
  • fix(bootstrap): add EDIT_LINEAGE privilege to some default policies by @aditya-radhakrishnan in #7060
  • feat(ingest): add entity registry in codegen by @hsheth2 in #6984
  • feat(ingest): extract powerbi endorsements to tags by @looppi in #6638
  • feat(ingestion): pull metabase database, schema names from raw query and api by @remisalmon in #7039
  • fix(ingest): support multiple entity_registry sections by @hsheth2 in #7066
  • ci(ingest): add flag to skip tests but run codegen during release by @hsheth2 in #7067
  • fix(ingest): preserve dbt column name casing by @hsheth2 in #7063
  • fix(ingest/tableau): fix node limit exceeded error for workbooks query by @mayurinehate in #7068
  • fix(build/airflow): Fixing gradlew path by @treff7es in #7069
  • feat(ingest): support snapshots in dbt and dbt-cloud by @hsheth2 in #7062
  • fix(ui) Fix duplicate schema field rendering with siblings by @chriscollins3456 in #7057
  • refactor(ingest/athena): Replace s3_staging_dir parameter in Athena source with query_result_location by @bossenti in #7044
  • feat(ingest): fix handling of unions with aliases in post restli conversion by @hsheth2 in #7058
  • fix(ui) Make checkboxes in ingestion forms easier to see by @chriscollins3456 in #7061
  • fix(ingest): support git clone of non-github repos by @hsheth2 in #7065
  • feat(ingest): reporting revamp, part 1 by @hsheth2 in #7031
  • fix(secret-service): fix default encrypt key by @david-leifker in #7074
  • feat(datahub-lite): introduces a new experimental lightweight impleme… by @shirshanka in #7052
  • feat(datahub-lite): adding tab completion, small serialization fixes by @shirshanka in #7079
  • docs: add docs for managed DataHub v0.1.72 by @anshbansal in #7070
  • docs(readme): add inovex as adopter by @DSchmidtDev in #7077
  • docs: add warning about clearing cookies for login by @anshbansal in #7084
  • feat(cache): add hazelcast distributed cache option by @RyanHolstien in #6645
  • docs(datahub-lite): small improvement for zsh tab completion by @shirshanka in #7085
  • fix(ingest/bigquery): clear stateful ingestion correctly by @hsheth2 in #7075
  • fix(graphql): Return with appropriate status code instead of stacktrace by @szalai1 in #7086
  • fix(sso): Clear cookies on SSO redirect error by @aditya-radhakrishnan in #7088
  • fix(docs): add missing mutation literal by @ruedigerblock in #7082
  • fix(ui): display the correct access token expiry in AccessTokenModal by @ngamanda in #7078
  • fix(cli/lite): fix datahub lite serve command by @hsheth2 in #7089
  • fix(profiling): Fix syntax for APPROX_COUNT_DISTINCT on bigquery and snowflake by @feljen in #7087
  • fix(ingest): fix logic error of google protobuf wrapper type. by @wngus606 in #7076
  • feat(ui): Documentation Editor Improvements by @jjoyce0510 in #7072
  • fix(uri): marks uri field as deprecated, removes problem code, and adds coercer for usages of URI typeref by @RyanHolstien in #7093
  • fix(build): postgres docker secret by @david-leifker in https://github.com/datahub-pr...

DataHub v0.9.6.1

31 Jan 21:29

Release Highlights

Please upgrade from 0.9.6 ASAP to avoid ongoing issues creating and using secrets.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false
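
A small pre-flight check can catch a missing setting before the service starts. This is a sketch under assumptions: the helper name is hypothetical, and treating the value case-insensitively is a convenience of this example, not a statement about how DataHub parses the variable.

```python
# Sketch: verify that GRAPH_SERVICE_DIFF_MODE_ENABLED is explicitly "false".
import os
from typing import Mapping

FLAG = "GRAPH_SERVICE_DIFF_MODE_ENABLED"  # name taken from the release notes


def diff_mode_disabled(env: Mapping[str, str]) -> bool:
    """Return True only when the flag is present and set to 'false'."""
    return env.get(FLAG, "").strip().lower() == "false"


if __name__ == "__main__":
    if diff_mode_disabled(os.environ):
        print(f"{FLAG} is disabled, as required for Neo4j deployments.")
    else:
        print(f"Warning: set {FLAG}=false when using Neo4j.")
```

Run it in the same environment as GMS (or the MAE Consumer in standalone mode) so it sees the same variables.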

Bug fix for secrets encryption

  • Prevents decryption errors for existing secrets
  • Affects reading ingestion secrets created with a previous release
  • Affects native user password validation

What's Changed

Full Changelog: v0.9.6...v0.9.6.1

DataHub v0.9.6

14 Jan 00:59
5951379

⚠️ This Release has been patched. Please upgrade to 0.9.6.1 ⚠️

As of January 19th, 2023, 0.9.6.1 is the official release build and should be used instead of 0.9.6. Upgrade to 0.9.6.1 when possible to avoid issues creating and using secrets.



Release Highlights

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false

User Experience

  • We now support embedding Dashboards, Charts, and Datasets. This allows us to do things like directly embed Looker / Tableau / Mode / Redash Looks, Dashboards, Explores into the Dataset pages themselves.

  • [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity

  • Improved error messaging for bulk editing via the UI

Metadata Ingestion

  • Data profiling now allows a configurable number of sample values to be returned
  • Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
  • Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
  • Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
  • Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!

Developer Experience

  • Fixes quickstart/docker compose issues for M1 machines
  • Improvements in reliability and performance of the Restli Service endpoints for ingestion:
    • Scale Restli Service thread pool based on CPU
    • Add retry (exp backoff) to Restli Entity Client
    • MCE no longer relies on GMS for Restli service
    • Converted Restli Service from standalone servlet to Spring injectable
    • Docker build externalized (significantly faster on m1, <7 minute build times, based on this)
    • Frontend asset generation refactor (causing tests to fail intermittently)

What's Changed

  • feat(ingest): add pydantic helper for removed fields by @hsheth2 in #6853
  • chore(0.9.5): Bump defaults for release v0.9.5 by @jjoyce0510 in #6856
  • Revert "fix(ci): remove warnings due to deprecated action" by @anshbansal in #6857
  • refactor(restli-mce-consumer) by @david-leifker in #6744
  • fix(ci): reduce smoke test run time by @anshbansal in #6841
  • fix(security): require signed/encrypted jwt tokens by @david-leifker in #6565
  • feat(ingest): update profiling to fetch configurable number of sample values by @mayurinehate in #6859
  • feat(ingest/airflow): support raw dataset urns in airflow lineage by @hsheth2 in #6854
  • refactor(graphql): make graphqlengine easier to use by @anshbansal in #6865
  • fix(kafka): datahub-upgrade job by @david-leifker in #6864
  • feat(ingest): pass timeout config in kafka admin client api calls by @mayurinehate in #6863
  • chore(ingest): loosen requirements file by @hsheth2 in #6867
  • feat(ingest): upgrade pydantic version by @cccs-eric in #6858
  • fix(elasticsearch): fixes out of order runId writes by @david-leifker in #6845
  • chore(ingest): loosen additional requirements by @hsheth2 in #6868
  • feat(ingest): bigquery/snowflake - Store last profile date in state by @treff7es in #6832
  • docs(google-analytics): Correct grammatical error in README.md by @jx2lee in #6870
  • feat(CI): add venv caching by @szalai1 in #6843
  • feat(ingest/snowflake): handle failures gracefully and raise permission failures by @mayurinehate in #6748
  • fix(runid): always update runid, except when queued by @david-leifker in #6876
  • fix(ingest): conditionally include env in assertion guid by @hsheth2 in #6811
  • chore(ci): update dependencies docs-website by @anshbansal in #6871
  • feat(ui) - Add a custom error message for bulk edit to add clarity by @mkamalas in #6775
  • docs(adding users): Refreshing the docs for adding new DataHub Users by @jjoyce0510 in #6879
  • test(mce-consumer): mockbeans by @david-leifker in #6878
  • feat(ingest): avoid embedding serialized json in metadata files by @hsheth2 in #6742
  • refactor(gradle): move the local docker registry to common location by @david-leifker in #6881
  • refactor(smoke): use env variables by @anshbansal in #6866
  • fix(lint): pin pydantic version by @anshbansal in #6886
  • refactor(docs): Correctly spell elasticsearch in docs by @jjoyce0510 in #6880
  • fix(ingest): okta undefined variable error by @anshbansal in #6882
  • fix(ci): reduce flakiness in add_users, siblings smoke test by @anshbansal in #6883
  • fix(ingest): fall back to default table comment method for all Trino query errors by @marvin-roesch in #6873
  • test(misc): misc test updates by @david-leifker in #6890
  • deprecate(ingest): bigquery - Removing bigquery-legacy source by @treff7es in #6851
  • chore(ingest): remove inferred args to MCPW, part 1 by @hsheth2 in #6819
  • test(ingest/kafka-connect): make docker setup more reliable by @hsheth2 in #6902
  • fix(ingest): profiling (bigquery) - Address biquery profiling query error due to timestamp vs data mismatch by @treff7es in #6874
  • fix(cli): Make datahub quickstart work with latest docker compose in M1 by @pedro93 in #6891
  • fix(cli): fix delete urn cli bug + stricter type annotations by @hsheth2 in #6903
  • fix(ingest/airflow): reorder imports to avoid cyclical dependencies by @stijndehaes in #6719
  • feat: remove jq requirement + tweak modeldocgen args by @hsheth2 in #6904
  • chore(ingest): loosen pyspark and pydeequ deps by @hsheth2 in #6908
  • docs(ingest/looker): fix typos + update lookml github action example by @hsheth2 in #6910
  • fix(ingest/metabase): use card_id in dashboard to chart lineage by @ccpypy in #6583
  • fix(es-setup): create data stream on non-aws by @szalai1 in #6926
  • Adding missing Platform logos by @maggiehays in #6892
  • feat(ingestion): PowerBI# Improve PowerBI source ingestion by @mohdsiddique in #6549
  • Fix compose context for kafka-setup by @szalai1 in #6923
  • feat(backend): Supporting Embeddable Previews for Dashboards, Charts, Datasets by @jjoyce0510 in #6875
  • chore(deps): bump json5 from 2.2.1 to 2.2.3 in /docs-website by @dependabot in #6930
  • chore(deps): bump json5 from 1.0.1 to 1.0.2 in /datahub-web-react by @dependabot in #6931
  • fix(ci): managed ingestion test fix by @anshbansal in #6946
  • feat(ingest): add include_table_location_lineage flag for SQL common by @hsheth2 in #6934
  • feat(ingest): allow extracting snowflake tags by @frsann in #6500
  • chore(ingest): unpin pydantic dep by @hsheth2 in #6909
  • chore(ingest): partially revert pyspark dep from #6908 by @hsheth2 in #6954
  • fix(ingest): use branch info when cloning git repos by @hsheth2 in #6937
  • chore(ingest): remove i...

DataHub v0.9.5

23 Dec 20:32
c482ef0

Release Highlights

Notice: This release includes a fix for a Single Sign-On (OIDC) issue that was introduced in the previous release, v0.9.4.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false

User Experience

  • Manual Lineage is LIVE! You can now add and remove lineage between entities in the Lineage Visualization screen, making it easier than ever to manage the complex relationships between your data resources.

  • Our new Views feature makes it easy to create curated sets of Entities within DataHub. This is a great way to start to isolate the entities that matter most, and provide your DataHub end-users with a streamlined view of the assets that are relevant to their use cases. See the original demo video.

  • In-App Product Tours are here! When logging into DataHub and/or visiting a new page type for the first time, new users will be prompted with a helpful walkthrough of core functionality to get them familiar with the platform. We’ll continue to add modules as we roll out new features!

  • Automatically send updates to Slack and/or Microsoft Teams when changes are made within DataHub by leveraging our new Slack and Teams Actions.

Metadata Ingestion

We’re continuing to improve the user experience for UI-based ingestion for the following sources:

  • Databricks Unity Catalog
  • dbt Cloud
  • MySQL
  • Trino/Presto
  • Microsoft SQL Server
  • MariaDB

If you’re just getting started with UI-based Ingestion, check out our new BigQuery & Snowflake guides.

Stateful ingestion is now supported for Iceberg (thanks for the contrib, @cccs-Dustin!) and LDAP (thanks for the contrib, @bda618!)

What's Changed

New Contributors

Full Changelog: v0.9.4...v0.9.5

[Known Issues] DataHub v0.9.4

20 Dec 23:14
e6c48e5

Known Issues

In this release, our OIDC SSO library received a major version upgrade. There is an issue with how the newer version of the library interacts with OIDC providers, which we have addressed in v0.9.5. We recommend not upgrading to this version if your organization is actively using OIDC to manage user authentication.

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false

What's Changed

  • chore(): Updating default CLI version, update updating-datahub.md by @jjoyce0510 in #6590
  • fix(ingest): profiling - Profiling failed if column cardinality threw an error by @treff7es in #6582
  • fix(actions): add missing datahub-gms-protocol env var by @shirshanka in #6593
  • fix(ingest): restrict snowflake-connector-python dependency by @mayurinehate in #6594
  • feat(ingest/bigquery): avoid creating/deleting tables for profiling by @hsheth2 in #6578
  • fix(ingest): unify emit interface by @hsheth2 in #6592
  • fix(security): security version updates by @david-leifker in #6602
  • docs: remove Kafka Streams from documentation by @maver1ck in #6596
  • refactor(ui): Improving Kafka UI Ingestion Form, Create Domain, Create Secret Modals by @jjoyce0510 in #6588
  • fix(ingest): clarify tableau auth error messages by @hsheth2 in #6600
  • docs(graphql): fix deleteTest "Create"->"Delete" by @nickwu241 in #6574
  • fix(gms/startup): remove set -x from start.sh by @timcosta in #6589
  • feat(sql): Add SQL index on createdon field by @pedro93 in #6522
  • feat(ml model): updating view of ml model feature list by @gabe-lyons in #6576
  • fix(ingest/bigquery): ignore complex types from profiling by @treff7es in #6613
  • feat(ingest): add external url for snowflake objects by @mayurinehate in #6580
  • chore(ingest): bump and pin mypy by @hsheth2 in #6584
  • fix(ingest): only require github_info for lookml and not looker by @hsheth2 in #6608
  • docs(ingest): add airflow docs that use the PythonVirtualenvOperator by @hsheth2 in #6604
  • fix(ui) Fix double scroll in embedded list search sections by @chriscollins3456 in #6618
  • feat(ingest): print detailed GMS error messages by @djordje-mijatovic in #6519
  • Townhall agenda wikimedia by @maggiehays in #6622
  • fix(analytics): skip ListDomains if user cannot manage domains and have only one loading message by @aditya-radhakrishnan in #6624
  • feat(quickstart): add support for passing thru env vars needed by Sla… by @shirshanka in #6591
  • docs(actions): slack, teams by @shirshanka in #6632
  • fix(logging): Remove lombok as source of slf4j-api by @david-leifker in #6616
  • docs: add links from main README to slack, teams actions by @shirshanka in #6633
  • feat(ingest): Support config variable for specifying a direct privat… by @mayurinehate in #6609
  • Add AWS Postgres Iam Auth jar to GMS by @syedzoherer in #6371
  • feat(ingest/snowflake): support filtering by fully qualified schema_pattern by @mayurinehate in #6611
  • feat(ingest/kafka-connect): support MongoSourceConnector by @frsann in #6416
  • feat(graph) Add createdOn, createdActor, updatedOn, updatedActor to graph edges by @chriscollins3456 in #6615
  • refactor(ui): Making improvements to UI ingestion forms, adding MySQL, Trino, Presto, MSSQL, MariaDB forms by @jjoyce0510 in #6607
  • perf(ui-ingestion): cache on creation or deletion of ingestion sources to reduce latency by @aditya-radhakrishnan in #6647
  • feat(ingest): add dummy data source for automated testing by @anshbansal in #6550
  • docs(managed datahub): adding release notes for v0.1.70 by @anshbansal in #6655
  • feat(gms): Pluggable Authentication & Authorization Framework by @mohdsiddique in #6634
  • docs: move rfcs to separate repo by @laulpogan in #6621
  • fix(ingest): fix lingering demo-data source issues by @hsheth2 in #6659
  • feat(ingest): bigquery - Running lineage extraction after metadata extraction by @treff7es in #6653
  • fix(ingest): issue deprecation warning correctly by @hsheth2 in #6623
  • chore(ingest): remove feast-legacy by @hsheth2 in #6661
  • fix(ingest/snowflake): support domains for snowflake schema containers by @hsheth2 in #6662
  • build(deps): bump decode-uri-component from 0.2.0 to 0.2.2 in /datahub-web-react by @dependabot in #6617
  • feat(ingest/dbt): add support for latest DBT version 1.3 by @MatthieuBlais in #6651
  • docs: add languages to code highlighting by @hsheth2 in #5576
  • docs(typo) Correct typo in domains.md by @maggiehays in #6667
  • feat(gms): Enable auth-api publishing to maven by @mohdsiddique in #6671
  • fix(ingest/powerbi-report-server): deprecate unused graphql config by @daha in #6630
  • fix(docker): Fix datahub-frontend dockerfile by @jjoyce0510 in #6670
  • fix(ingest): profiling - Changing profiling defaults by @treff7es in #6640
  • feat(ci): add smoke test for domain mutation by @anshbansal in #6641
  • fix(datahub-protobuf): fix missing httpclient dependency by @shirshanka in #6672
  • feat(ingest): update snowflake docs, add simple validations by @mayurinehate in #6636
  • fix(gms): DataHub Auth API java doc fix by @mohdsiddique in #6674
  • feat(ingest): run profiler in more cardinality cases by @hsheth2 in #6397
  • docs(search) update broken youtube link by @maggiehays in #6678
  • docs(protobuf): update examples for protobuf by @david-leifker in #6681
  • feat(ingest): support knowledge links in business glossary by @mohdsiddique in #6375
  • fix(ingestion/vertica): support columns with timestamp precision by @inancdokurel in #6295
  • feat(ingest): add timestamps for snowflake objects by @mayurinehate in #6570
  • feat(onboarding): adds framework and some steps for onboarding steps UI by @aditya-radhakrishnan in #6462
  • feat(ingest): use entry point for registering transformers by @Masterchen09 in #6628
  • chore(ci): update base ingestion image requirements file by @anshbansal in #6687
  • fix(ci): reduce warnings due to deprecated action by @anshbansal in #6686
  • refactor(ui): Adding caching for users, groups, and roles by @jjoyce0510 in #6673
  • fix(ci): revert confluent kafka in base image by @anshbansal in #6690
  • fix(security): version bump to latest minor python image by @david-leifker in #6694
  • docs(ingest/salesforce): list required permissions by @orlandine in #6610
  • feat(ingest): bigquery - option to set on behalf project by @treff7es in #6660
  • ci: stop commenting test results on PR by @hsheth2 in #6700
  • fix(auth-api): Attempting to fix publish for auth-api by @jjoyce0510 in https:...

DataHub v0.9.3

01 Dec 03:23
4ca3327

Release Highlights

Important Release Notes

With this release, if you are using Neo4j as your graph implementation, you need to set the following for GMS (or for the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false

User Experience

  • Column Level Lineage Impact Analysis is live! Read more about it here
  • You can now sort Dataset field names alphabetically - this is super handy for finding columns within wide datasets that may not have an easy-to-follow order by default

  • New - an “Explore All” button on the home page, making it easier to jump into the search experience

  • Plus! We now have a “Share” button on entity pages, making it easier for you to share DataHub links with others

  • [Community Contribution] You can now assign the same user as different owner types - thanks for the contrib, @rtekal!

  • [Community Contribution] You can now see recommendations for Recently Edited entities on the homepage! - thanks for the contrib, @CorentinDuhamel

Metadata Ingestion

  • Snowflake Automated PII Classification is here! We’re eager for feedback on the utility of this feature - check out this guide, take it for a spin, and let us know what you think!
  • NEW! dbt Cloud ingestion is ready for ya - check out the module details here
  • We’ve simplified the configs required to add stateful ingestion to an ingestion source - check out the updated docs here
  • Speaking of stateful ingestion, it’s now available with:
    • Looker & LookML ingestion sources
    • [Community Contribution] Container-level ingestion – thanks for the contrib, @wangsaisai!

Developer Experience

  • [Community Contribution] For those of you deploying DataHub with Neo4j, we now support Lineage Impact Analysis via Neo4j multihop functionality. Thanks for the contrib, @djordje-mijatovic!
  • We’ve loosened our SQLAlchemy dependencies to support Airflow 2.3+

What's Changed

  • fix(spark-lineage): Smoke test fix + smoke test m1 support by @treff7es in #6372
  • feat(ingest): supports MCEs in domain transformer by @hsheth2 in #6364
  • feat(ingest): enable container stateful ingestion by @wangsaisai in #6343
  • build(ingest): pin mypy version by @hsheth2 in #6391
  • build: use acryl's gradle-avro-plugin by @hsheth2 in #6390
  • fix(ingest): unity - add missing date type by @ms32035 in #6385
  • fix(ingest): unity-catalog - Removing unneeded sqlalchemy dependency to fix install by @treff7es in #6379
  • feat(ingest/tableau): re-authenticate if the token expires by @hsheth2 in #6380
  • fix(ingest): use profiler config settings correctly by @hsheth2 in #6354
  • fix(ingest): handle error when query returns no columns in snowflake lineage by @mayurinehate in #6404
  • fix(ingest): fix missing snowflake lineage when table_pattern is set by @mayurinehate in #6410
  • feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ by @hsheth2 in #6204
  • fix(ingest/s3): add status aspect for detected s3 datasets by @mayurinehate in #6402
  • fix(ingest/snowflake): loosen snowflake connector version requirement by @hsheth2 in #6418
  • fix(mysql): fix native data type for mysql set type by @mayurinehate in #6407
  • perf(ui): virtualized schema table rows by @stanbaker in #6287
  • fix(ui) Improve HoverEntityTooltip and truncate parent glossary nodes by @chriscollins3456 in #6417
  • feat(ingest): support incremental lineage to dbt node from external platform by @mayurinehate in #6392
  • fix(ingest): init dataset props if missing in transformer by @hsheth2 in #6429
  • fix(change-event): remove unnecessary dependencies on EntityChangeEventGeneratorRegistryFactory by @aditya-radhakrishnan in #6431
  • build(deps): bump moment-timezone from 0.5.34 to 0.5.35 in /datahub-web-react by @dependabot in #5783
  • feat(frontend): Adding support to show externalUrl and institutionalMemoryFields for MLModels by @lurecas in #6053
  • feat(model): adds properties, ownership, deprecated, institutional memory and tags as aspects for data platform instance entity by @sgomezvillamor in #5728
  • docs(ingest/airflow): clarify docs around 1.x compat by @hsheth2 in #6436
  • feat(recommendations): add last edited entities by @CorentinDuhamel in #6329
  • fix(ingest): correctly compute entity change percentage by @hsheth2 in #6438
  • docs(townhall) Updating Townhall History by @maggiehays in #6336
  • Neo4j multihop support by @djordje-mijatovic in #6104
  • fix(mae-consumer): Set proper variable expansion for JMX_OPTS and JAVA_OPTS in MAE docker by @skrydal in #6378
  • docs(ingest): move prerequisite section before the ingestion recipe example by @mayurinehate in #6341
  • fix(dataset): improve glossary term load performance for datasets by @Reilman79 in #6396
  • feat(lineage) Implement CLL impact analysis for inputFields by @chriscollins3456 in #6426
  • feat(ui) Add upgrade step to enable CLL impact analysis for existing data by @chriscollins3456 in #6427
  • Added functionality to copy fieldpath and urn of each column by @Ankit-Keshari-Vituity in #6398
  • fix(ingestion): add output converters for ODBC unsuported datatype in… by @LavinaVRovine in #6134
  • fix(ui) Fix parentNodes overfetching everywhere it's used by @chriscollins3456 in #6446
  • fix(ingest): snowflake - Fixing top query trimming in snowflake by @treff7es in #6447
  • feat(elasticsearch): Updates to elasticsearch configuration, dao, tests by @david-leifker in #6269
  • chore(ingest): fix mssql lint by @hsheth2 in #6453
  • fix(ingest): add cli info to ingestion reporter by @hsheth2 in #6451
  • fix(ui) Fix glossary side browser width fluctuating by @chriscollins3456 in #6457
  • fix(python): Fix python dependencies for doc generation by @david-leifker in #6460
  • docs(website): add homepage links by @jeffmerrick in #6458
  • build(ingest): loosen jinja2 dependency for superset by @KulykDmytro in #6433
  • fix(ingest): lowercase db name in mssql ingestion by @hsheth2 in #6448
  • fix(ingest): handle missing schema in transformer by @hsheth2 in #6445
  • feat(ingest): allow specific profiler config fields to override profile_table_level_only by @hsheth2 in #6366
  • docs(enrichment) updating enrichment landing page by @maggiehays in #6286
  • fix(home-page): remove redundant getAuthenticatedUser query by @aditya-radhakrishnan in #6464
  • feat(ingest): detect old or missing docker compose by @hsheth2 in #6466
  • feat(ingestion): powerbi # Power BI report support by @mohdsiddique in #6339
  • fix(ingest/dbt): disable incremental lineage by default by @hsheth2 in #6467
  • fix(loggin): print logging timestamp in ISO8601 format instead of jus… by @szalai1 in #6474
  • docs(ingest/trino): add example of http connect...

DataHub v0.9.2

04 Nov 22:30
4c6dd06

Release Highlights

This is a Bug Fix (non-scheduled) release to address the Known Issues in v0.9.1.

User Experience

  • Improvements to Nav Bar UX
  • Improvements to filtering for Related Glossary Entities and Tags (migrated to using keyword filters and fixed longstanding urn-related bug)
  • Enable providing an Ownership Type for Glossary Terms, Nodes, and Domains
  • Allow adding links without a full domain to entity (community requested)

Actions

  • Adding new Entity Change Events for DataProcessInstanceRunEvent and AssertionRunEvent

Access Management

  • Added new Metadata Privilege called "Manage Children" which permits creating and deleting Glossary Terms and Glossary Nodes inside a particular Node.

Fixes

  • Properly escape schema field urns with URN-encoded characters
  • Fix around the visibility of deleting a Term Group with children
  • Properly show personal access token duration beyond 1 month during creation

What's Changed

New Contributors

Full Changelog: v0.9.1...v0.9.2

DataHub v0.9.1

01 Nov 15:34
4b31204

Release Highlights

Known Issues

  • In embedded search experiences (Glossary Terms, Domains, Lineage), filters can become "locked" in place once selected. This is addressed in v0.9.2.

User Experience

  • Column-level Impact Analysis is here! You can now see the full end-to-end list of column dependencies; watch the demo here

  • When creating a Glossary Term from the UI, you can now add the description in the same step

  • We now support adding Domains to Glossary Terms

  • You can now preview Entity Names and Types in browser tabs

  • A "Login with SSO" button is now available on the login page

Bug Fixes

  • Assertions Tab functionality is restored
  • SSO: The continuous login loop that occurred when the session cookie size exceeded 4096 characters has been fixed.
  • The ingestion scheduler now works correctly with more than 30 ingestion sources. Previously, a bug caused certain ingestion sources to become unscheduled.

Metadata Ingestion

  • New Ingestion Source: Databricks Unity Catalog - check out the docs here
  • Tableau: Column-level lineage and Stateful Ingestion are now supported
  • LookML: Improved column-level lineage
  • BigQuery: we have promoted bigquery-beta to bigquery
  • Snowflake: Stateful Ingestion now supports deleting Containers

DataHub Docs Site

We continue to push improved feature guides to the DataHub docs site.

DataHub v0.9.0

13 Oct 11:26
0427122

Release Highlights

Known Issues

Assertions Tab UX bug

This release introduced a bug in the assertions tab causing assertion results to be hidden. This will be addressed in the subsequent release.

Release Notes

We’re excited to announce the release of DataHub v0.9.0!

This minor release upgrades DataHub to Java 11 and surfaces Column-Level Lineage support within the DataHub UI.

Here are some additional highlights:

User Experience

  • Column-Level Lineage is now surfaced within the DataHub UI!
  • Advanced Search now supports searching by Column-level details (e.g. name, description, tag), as well as complex AND/OR statements. For example:
    • Show results that match any filters
    • Show results that match all filters
    • Owner is either Shannon or Mark
    • Owner is neither Shannon nor Mark
    • Try it in demo here
  • You can now invite users and assign them to a default DataHub Role
  • Improvements to site performance during the Browse experience
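
The "match any", "match all", and negated-owner examples above correspond to ordinary boolean composition of filters. The following is only an illustrative sketch of that logic in plain Python, not DataHub's implementation; the entity dictionaries, the helper names (owner_in, has_tag, match_any, match_all, negate, search), and the owner "Priya" are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Entity = Dict[str, Any]
Predicate = Callable[[Entity], bool]


def owner_in(*names: str) -> Predicate:
    """True when at least one of the entity's owners is in `names`."""
    wanted = set(names)
    return lambda entity: bool(wanted & set(entity.get("owners", [])))


def has_tag(tag: str) -> Predicate:
    """True when the entity carries the given tag."""
    return lambda entity: tag in entity.get("tags", [])


def negate(predicate: Predicate) -> Predicate:
    """Invert a filter, e.g. 'owner is neither Shannon nor Mark'."""
    return lambda entity: not predicate(entity)


def match_any(predicates: Iterable[Predicate]) -> Predicate:
    """OR semantics: keep entities that satisfy at least one filter."""
    preds = list(predicates)
    return lambda entity: any(p(entity) for p in preds)


def match_all(predicates: Iterable[Predicate]) -> Predicate:
    """AND semantics: keep entities that satisfy every filter."""
    preds = list(predicates)
    return lambda entity: all(p(entity) for p in preds)


def search(entities: Iterable[Entity], predicate: Predicate) -> List[str]:
    """Return the names of the entities that pass the filter."""
    return [e["name"] for e in entities if predicate(e)]


# Hypothetical sample data, for illustration only.
ENTITIES = [
    {"name": "orders", "owners": ["Shannon"], "tags": ["pii"]},
    {"name": "clicks", "owners": ["Mark"], "tags": []},
    {"name": "users", "owners": ["Priya"], "tags": ["pii"]},
]
```

For instance, search(ENTITIES, negate(owner_in("Shannon", "Mark"))) keeps only the entity whose owner is neither Shannon nor Mark.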

Developer Experience

  • DataHub has been upgraded to Java 11!
  • Improved tracking of GraphQL errors for bug resolution
  • CorpUser and CorpGroup are now available via the Python SDK

Metadata Ingestion

  • Automatically extract Column-Level Lineage from Snowflake & Looker sources
  • dbt Meta Mapping is now supported at the Column Level - this means you can automatically extract Tags and Glossary Terms from your dbt model and surface them in DataHub
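
To make the column-level meta mapping idea concrete, here is a rough sketch of how a mapping rule might turn a column's dbt meta properties into tags and glossary terms. It is a simplified stand-in, not the dbt source's actual code. The rule layout (match, operation, config, and the "{{ $match }}" placeholder) follows my reading of the dbt source documentation and should be checked against it; the meta keys is_sensitive and glossary_term and the function apply_column_meta_mapping are hypothetical.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical mapping rules, written as a Python dict rather than recipe YAML.
# Key names are assumptions based on the dbt source docs; verify before use.
COLUMN_META_MAPPING: Dict[str, Dict[str, Any]] = {
    "is_sensitive": {
        "match": True,
        "operation": "add_tag",
        "config": {"tag": "sensitive"},
    },
    "glossary_term": {
        "match": ".*",
        "operation": "add_term",
        "config": {"term": "{{ $match }}"},
    },
}


def apply_column_meta_mapping(
    column_meta: Dict[str, Any], mapping: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str]]:
    """Return (kind, name) pairs for every rule whose pattern matches."""
    results: List[Tuple[str, str]] = []
    for key, rule in mapping.items():
        if key not in column_meta:
            continue
        # Compare string forms so boolean and string patterns behave alike.
        found = re.match(str(rule["match"]), str(column_meta[key]))
        if found is None:
            continue
        if rule["operation"] == "add_tag":
            template, kind = rule["config"]["tag"], "tag"
        elif rule["operation"] == "add_term":
            template, kind = rule["config"]["term"], "term"
        else:
            continue  # other operations are out of scope for this sketch
        results.append((kind, template.replace("{{ $match }}", found.group(0))))
    return results
```

A column whose meta block sets is_sensitive to true and glossary_term to "Email" would, under these assumed rules, receive the tag "sensitive" and the term "Email".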

What's Changed

  • fix(ingest): bigquery-beta - Getting datasets with biquery client by @treff7es in #6039
  • feat(roles): add ability to invite users into a role by @aditya-radhakrishnan in #6015
  • refactor(java11) - convert most modules to java 11 by @leifker in #5836
  • docs(readme): Fixing broken article link by @davrax in #6042
  • refactor(ingest): streamline pydantic configs by @hsheth2 in #6011
  • docs(ingest): add example of dbt column_meta_mapping by @hsheth2 in #6038
  • refactor(ingest): use aspect map in transformers by @hsheth2 in #6040
  • feat(ui): Adding placeholder entity for DataPlatform by @jjoyce0510 in #6045
  • feat(ingest): implement compression for CheckpointState by @alexey-kravtsov in #6007
  • feat(advanced-search): adding select value modal by @gabe-lyons in #6026
  • fix(ingest): bigquery-beta - Additional fixes for Bigquery beta by @treff7es in #6051
  • feat(advanced search): adding advanced search filter component & prereqs for it by @gabe-lyons in #6055
  • docs(ingest): add path spec examples for s3 by @mayurinehate in #6050
  • fix(deps): metadata-io - remove parquet dependency by @shirshanka in #6046
  • fix(ingestion): Tableau test case execution fix by @mohdsiddique in #6005
  • feat(ingest): list referenced env variables in recipe by @hsheth2 in #6043
  • fix(ingest): compat with mypy 0.981 by @hsheth2 in #6056
  • fix(elasticsearch_index): create datahub_usage_event index where datahub_analytics_enabled set to false by @GyuhoonK in #5974
  • docs(approval workflows): adding approval workflow docs by @gabe-lyons in #5896
  • feat(retention): disable applying retention on bootstrap by @anshbansal in #6066
  • fix(ingest): correct tableau browse paths by @hsheth2 in #6064
  • fix(ingest): bigquery-beta - handling complex types properly by @treff7es in #6062
  • docs: create SECURITY.md by @laulpogan in #6069
  • fix(containers): show soft deleted status of containers by @gabe-lyons in #6072
  • docs(ingest): clarify bigquery-beta multiproject setup by @hsheth2 in #6071
  • chore(setup): change defaults for partitions by @anshbansal in #6074
  • refactor(browse): Improving Browse Feature Performance by @jjoyce0510 in #6073
  • feat(ingest): add column-level lineage support for snowflake by @mayurinehate in #6034
  • feat(ingest): looker - support for simple column level lineage by @shirshanka in #6084
  • fix(elastic-setup) Fixing env var logic by @pedro93 in #6079
  • Revert "chore(setup): change defaults for partitions (#6074)" by @pedro93 in #6086
  • fix(mae-consumer): fix regression on base64 encoding by @codesorcery in #6061
  • fix(elasticsearch) Analytics indices creation on AWS ES by @tomas-kubin in #5502
  • docs(ingest): note that Athena doesn't support lineage by @hsheth2 in #6081
  • fix(ingest): alias for mssql-odbc source by @hsheth2 in #6080
  • fix(ingest): presto-on-hive - Setting display name properly by @treff7es in #6065
  • fix(schema filter): fix schema infinite rerender by @gabe-lyons in #6082
  • feat(monitoring): track graphql errors in metrics by @szalai1 in #6087
  • feat(advanced search): Add component to show all advanced search filters & add new filter by @gabe-lyons in #6058
  • fix(ingest): bump lkml version by @hsheth2 in #6091
  • fix(ingest): lookml - extract column correctly by @shirshanka in #6093
  • feat(retention): change default policy, add API to apply retention by @anshbansal in #6088
  • fix(lineage): fix missed casing in lineage registry by @gabe-lyons in #6078
  • fix(ingest): bigquery-beta - Lowering a bit memory footprint of bigquery usage by @treff7es in #6095
  • feat(ingest): remove hardcoded env variable default for cli version by @shirshanka in #6075
  • docs: add information about mapping ports for datahub-gms by @shirshanka in #6092
  • chore(deps): upgrade graphql-java deps to 19.0 by @shirshanka in #6099
  • chore(deps): upgrade neo4j to 4.4.x by @shirshanka in #6101
  • feat(docs): Improve documentation about Search by @szalai1 in #5889
  • feat(ingest): add async option to ingest proposal endpoint by @RyanHolstien in #6097
  • chore(deps): upgrade opentelemetry dependencies by @shirshanka in #6100
  • refactor(recommendations): Bump default max recommendations count for Platforms by @jjoyce0510 in #6113
  • feat(ingest): add Sandbox support by @rgudic in #6105
  • fix(mae): use JAVA_TOOL_OPTIONS instead of JDK_JAVA_OPTIONS by @szalai1 in #6114
  • feat(advanced-search): Complete Advanced Search: backend changes & tying UI together by @gabe-lyons in #6068
  • feat(search): improved search snippet FE logic by @gabe-lyons in #6109
  • feat(ingest): add CorpUser and CorpGroup to the Python SDK by @ttaubermarshall-stripe in #5930
  • fix(ingest): hide deprecated path_spec option from config by @hsheth2 in #5944
  • feat(posts): add posts feature to DataHub by @aditya-radhakrishnan in #6110
  • fix(ingest): remove unused mysql golden file by @hsheth2 in #6106
  • fix(ingestion): fix percent change computation in stale_entity_removal by @rslanka in #6121
  • refactor(ingest): use pydantic utilities for NamingPattern by @hsheth2 in #6013
  • fix(ingest): presto-on-hive - not failing on Hive type parsing error by @treff7es in #6118
  • fix(ingest): ignore usage and operation for snowflake datasets withou… by @mayurinehate in https://github.com...

DataHub v0.8.45

23 Sep 22:26
af6a423

Release Highlights

User Experience

  • Allow Term Groups to be the target of permissions
  • Customize browser favicon via REACT_APP_FAVICON_URL param
  • Some UX improvements for charts & dashboards entity pages to reduce confusion
  • Performance improvements on the lineage visualization
  • Search bar for dataset schema tab

Developer Experience

  • Add rest endpoint for restoring indices of a single entity (/aspects?action=restoreIndices)
  • Create new platform instances via CLI
  • Improved impact analysis performance due to an added caching layer
  • Support for Patch, as shown in the August 2022 town hall
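
As a rough illustration of calling the restore-indices endpoint mentioned above, the sketch below builds (but does not send) a POST request using only the Python standard library. Only the path /aspects?action=restoreIndices comes from these notes. The GMS address http://localhost:8080, the JSON body with a "urn" field, the sample URN, and the helper name build_restore_indices_request are assumptions; a real deployment may also require authentication headers.

```python
import json
import urllib.request


def build_restore_indices_request(gms_url: str, urn: str) -> urllib.request.Request:
    """Build a POST request for restoring the indices of a single entity."""
    endpoint = gms_url.rstrip("/") + "/aspects?action=restoreIndices"
    # The "urn" field name is an assumption; check the API docs for your version.
    body = json.dumps({"urn": urn}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical GMS address and sample URN, for illustration only.
request = build_restore_indices_request(
    "http://localhost:8080",
    "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)",
)
# To send it against a running instance: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a live instance.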

Metadata Ingestion

  • Introduces bigquery-beta source
  • Looker source memory usage dramatically reduced
  • Report memory usage during ingestion
  • Improve Tableau lineage
  • Usage statistics for Tableau
  • LookML can automatically clone your Git repository. LookML is now supported in UI-based ingestion.
  • dbt supports column-level meta mappings
  • Support for deletion & rollback of time series data
  • Upgrade to browse path forms

[see next page for list of commits]

What's Changed

  • fix(privileges) Add Term Groups as targetable entities for privileges by @chriscollins3456 in #5806
  • fix(javadocs): remove ampersand from pdl causing issue in doc generation for openapi by @RyanHolstien in #5808
  • chore(ingest): remove archived docs by @hsheth2 in #5793
  • feat(ingest): add rewrite option for metadata file check by @hsheth2 in #5763
  • feat(cli): add support for sampled reporting to keep logs manageable by @shirshanka in #5800
  • docs(refactor): Refactor Tags Feature Guide by @maggiehays in #5781
  • docs(feature-guide) Impact Analysis by @maggiehays in #5765
  • feat(theming): set custom favicon via env var by @gabe-lyons in #5810
  • test(smoke-test): check debug arg in executor requests by @hsheth2 in #5811
  • fix(ingest): bigquery-beta - Fixing dependencies by @treff7es in #5814
  • feat(ingest): looker - reduce memory requirements by @shirshanka in #5815
  • feat(restore-indices): add endpoint for restore indices, add basic check for graph by @anshbansal in #5805
  • fix(frontend): download node only when USE_SYSTEM_NODE is set to false by @szalai1 in #5817
  • doc: Make Airflow link clickable by @daha in #5803
  • feat(ingest):looker - reduce mem usage, misc reporting improvements by @shirshanka in #5823
  • feat(model, ingest): populate sizeInBytes in snowflake, fall back to table level profiling for large tables by @mayurinehate in #5774
  • chore(docker): make curl/wget commands quiet in docker by @hsheth2 in #5819
  • chore: cleanup references to the old ember app by @hsheth2 in #5797
  • fix(ingest): spark-lineage: Adding additional debug logs to spark lineage by @treff7es in #5772
  • fix(docker): add missing port mappings for non-neo4j quickstart by @hsheth2 in #5799
  • fix(ingest): looker - report dashboard scanning correctly by @shirshanka in #5829
  • feat(cli): report memory usage during ingest by @shirshanka in #5828
  • fix(ingest): presto-on-hive - Fixing mysql filter by @treff7es in #5825
  • docs(big query): add needed delete permission to list by @maaaikoool in #5826
  • chore(ingest): set isort combine_as_imports by @hsheth2 in #5820
  • fix(ingest): use AwsConnectionConfig instead of AwsSourceConfig by @hsheth2 in #5813
  • feat(ingest): looker test connection by @hsheth2 in #5768
  • feat(ingest): improve tableau lineage, workbooks query, fix pagination by @mayurinehate in #5756
  • fix(ingest): profiling - memory usage reduction by @shirshanka in #5830
  • feat(monitoring): enable JMX and OTEL for frontend pods by @szalai1 in #5834
  • fix(standalone-consumers): Exclude Solr from spring boot application config & make them run on M1 by @pedro93 in #5827
  • feat(hooks): Add toggle for enabling/disabling platform event hook by @pedro93 in #5840
  • feat(transformers): Add semantics & transform_aspect support in transformers by @mohdsiddique in #5514
  • feat(ci): auto label PRs by @anshbansal in #5839
  • feat(inputs): improving clarity on inputs for dashboards by @gabe-lyons in #5841
  • feat(ingest): add utility for converting MCEs to MCPs by @hsheth2 in #5812
  • chore(smoke): add additional log in smoke test by @hsheth2 in #5842
  • fix(ingest): fix doc generation import ordering issue with postgres by @hsheth2 in #5846
  • feat(docker) Adds Sasl support to base ingestion image by @pedro93 in #5855
  • fix(graphql) Fix null pointer exception when fetching entity aspect via graphql by @chriscollins3456 in #5857
  • fix(ingest): reporting should work with timestamps by @shirshanka in #5860
  • fix(patch-entity-registry): Remove exception for entities with key aspects. by @pghazanfari in #5831
  • fix(browse): Fixing browse path to remove requirement for simple name suffix by @jjoyce0510 in #5634
  • fix(ingest): bigquery - Fixing sharded regexp pattern config by @treff7es in #5861
  • perf(elastic search graph service): improving perf of lineage query by @gabe-lyons in #5858
  • chore(ingest): remove outdated GE compatibility hack by @hsheth2 in #5862
  • ci(ingest): test with python 3.10 by @hsheth2 in #5863
  • docs: improve doc generation, add better docs for snowflake, looker by @shirshanka in #5867
  • feat(ci): tweak auto-label globs by @anshbansal in #5849
  • fix(m1): preflight works with brew postgres@14 by @shirshanka in #5868
  • feat(smoke-tests) Make smoke tests use standalone consumers by @pedro93 in #5856
  • fix(domains): adding 10,000+ text when domain list caps out elastic count capacity by @gabe-lyons in #5838
  • docs(notifications): slack notification docs by @anshbansal in #5871
  • feat(docker): Update Dockerfiles to use java 11 runtime by @pedro93 in #5853
  • Scroll issue on Glossary related entity page by @Ankit-Keshari-Vituity in #5804
  • fix(ingest): include urns in rest sink failure logs by @hsheth2 in #5848
  • fix(docker): Bumps JRE 11 to latest by @pedro93 in #5875
  • feat(ingest): support reading config file from stdin by @hsheth2 in #5847
  • fix(ingest): remove dbt delete_tests_as_datasets option by @hsheth2 in #5865
  • fix(ingest): avrogen handling for missing fields with default values by @hsheth2 in #5844
  • refactor(ingest): add ALL_ENV_TYPES constant by @hsheth2 in #5866
  • feat(cli) Make docker compose quiet by @pedro93 in #5869
  • feat(datahub-protobuf): add support for shadow jar, publish by @shirshanka in #5882
  • feat(jars): better jar versioning for datahub-client, spark-lineage and protobuf by @shirshanka in #5883
  • fix(dev-docker): set right context for frontend dev build by @szalai1 in #5885
  • fix(ci): fix jar release action dependencies by @shirshanka in #5884
  • feat(schema) Add search filter to Schema tab by @chriscollins3456 in #5845
  • feat(ui) Add ...