DataHub v0.9.3
Release Highlights
Important Release Notes
If you use Neo4j as your graph implementation, this release requires the following setting on GMS (or on the MAE Consumer in standalone mode):
GRAPH_SERVICE_DIFF_MODE_ENABLED=false
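The Neo4j requirement above can be checked before an upgrade with a small script. The following is a minimal sketch, not part of DataHub: the helper name `neo4j_diff_mode_ok` is hypothetical, and it assumes the graph backend is selected through a `GRAPH_SERVICE_IMPL` environment variable. Only `GRAPH_SERVICE_DIFF_MODE_ENABLED` is taken from this release note.

```python
import os
from typing import Mapping


def neo4j_diff_mode_ok(env: Mapping[str, str]) -> bool:
    """Return True if the environment satisfies the release note.

    Assumption: GRAPH_SERVICE_IMPL names the graph backend. Only
    GRAPH_SERVICE_DIFF_MODE_ENABLED is taken from the release note.
    """
    uses_neo4j = env.get("GRAPH_SERVICE_IMPL", "").strip().lower() == "neo4j"
    if not uses_neo4j:
        # The requirement only applies to Neo4j deployments.
        return True
    # An unset variable is treated as "not disabled", so the check fails.
    diff_mode = env.get("GRAPH_SERVICE_DIFF_MODE_ENABLED", "").strip().lower()
    return diff_mode == "false"


if __name__ == "__main__":
    # Check the environment of the current process (e.g. inside the GMS container).
    print(neo4j_diff_mode_ok(os.environ))
```

Run it in the same environment as GMS (or the MAE Consumer in standalone mode); a result of `False` suggests the variable still needs to be set.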
User Experience
- Column Level Lineage Impact Analysis is live! Read more about it here
- You can now sort Dataset field names alphabetically - this is super handy for finding columns within wide datasets that may not have an easy-to-follow order by default
- New - an “Explore All” button on the home page, making it easier to jump into the search experience
- Plus! We now have a “Share” button on entity pages, making it easier for you to share DataHub links with others
- [Community Contribution] You can now assign the same user as different owner types - thanks for the contrib, @rtekal!
- [Community Contribution] You can now see recommendations for Recently Edited entities on the homepage! - thanks for the contrib, @CorentinDuhamel
Metadata Ingestion
- Snowflake Automated PII Classification is here! We’re eager for feedback on the utility of this feature - check out this guide, take it for a spin, and let us know what you think!
- NEW! dbt Cloud ingestion is ready for ya - check out the module details here
- We’ve simplified the configs required to add stateful ingestion to an ingestion source - check out the updated docs here
- Speaking of stateful ingestion, it’s now available with:
  - Looker & LookML ingestion sources
  - [Community Contribution] Container-level ingestion – thanks for the contrib, @wangsaisai!
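To illustrate the simplified stateful ingestion config mentioned above, here is a rough sketch of switching it on in a recipe expressed as a Python dictionary. The helper `enable_stateful_ingestion` is hypothetical, and the key names (`pipeline_name` at the top level, `stateful_ingestion.enabled` under the source config) are assumptions about the recipe layout; the updated docs are the authority on the exact fields. The Looker `base_url` value is a placeholder.

```python
import copy
from typing import Any, Dict


def enable_stateful_ingestion(recipe: Dict[str, Any], pipeline_name: str) -> Dict[str, Any]:
    """Return a copy of the recipe with stateful ingestion switched on.

    Assumed layout (verify against the docs): a top-level `pipeline_name`
    plus `source.config.stateful_ingestion.enabled`.
    """
    updated = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    updated["pipeline_name"] = pipeline_name
    source_config = updated.setdefault("source", {}).setdefault("config", {})
    source_config.setdefault("stateful_ingestion", {})["enabled"] = True
    return updated


# Placeholder recipe for a Looker source; the values are illustrative only.
recipe = {
    "source": {
        "type": "looker",
        "config": {"base_url": "https://looker.example.com"},
    }
}

stateful_recipe = enable_stateful_ingestion(recipe, "looker_prod_pipeline")
```

The helper copies the recipe rather than mutating it, so the same base recipe can be reused for runs with and without stateful ingestion.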
Developer Experience
- [Community Contribution] For those of you deploying DataHub with Neo4j, we now support Lineage Impact Analysis via Neo4j multihop functionality. Thanks for the contrib, @djordje-mijatovic!
- We’ve loosened our SQLAlchemy dependencies to support Airflow 2.3+
What's Changed
- fix(spark-lineage): Smoke test fix + smoke test m1 support by @treff7es in #6372
- feat(ingest): supports MCEs in domain transformer by @hsheth2 in #6364
- feat(ingest): enable container stateful ingestion by @wangsaisai in #6343
- build(ingest): pin mypy version by @hsheth2 in #6391
- build: use acryl's gradle-avro-plugin by @hsheth2 in #6390
- fix(ingest): unity - add missing date type by @ms32035 in #6385
- fix(ingest): unity-catalog - Removing unneeded sqlalchemy dependency to fix install by @treff7es in #6379
- feat(ingest/tableau): re-authenticate if the token expires by @hsheth2 in #6380
- fix(ingest): use profiler config settings correctly by @hsheth2 in #6354
- fix(ingest): handle error when query returns no columns in snowflake lineage by @mayurinehate in #6404
- fix(ingest): fix missing snowflake lineage when table_pattern is set by @mayurinehate in #6410
- feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ by @hsheth2 in #6204
- fix(ingest/s3): add status aspect for detected s3 datasets by @mayurinehate in #6402
- fix(ingest/snowflake): loosen snowflake connector version requirement by @hsheth2 in #6418
- fix(mysql): fix native data type for mysql set type by @mayurinehate in #6407
- perf(ui): virtualized schema table rows by @stanbaker in #6287
- fix(ui) Improve HoverEntityTooltip and truncate parent glossary nodes by @chriscollins3456 in #6417
- feat(ingest): support incremental lineage to dbt node from external platform by @mayurinehate in #6392
- fix(ingest): init dataset props if missing in transformer by @hsheth2 in #6429
- fix(change-event): remove unnecessary dependencies on EntityChangeEventGeneratorRegistryFactory by @aditya-radhakrishnan in #6431
- build(deps): bump moment-timezone from 0.5.34 to 0.5.35 in /datahub-web-react by @dependabot in #5783
- feat(frontend): Adding support to show externalUrl and institutionalMemoryFields for MLModels by @lurecas in #6053
- feat(model): adds properties, ownership, deprecated, institutional memory and tags as aspects for data platform instance entity by @sgomezvillamor in #5728
- docs(ingest/airflow): clarify docs around 1.x compat by @hsheth2 in #6436
- feat(recommendations): add last edited entities by @CorentinDuhamel in #6329
- fix(ingest): correctly compute entity change percentage by @hsheth2 in #6438
- docs(townhall) Updating Townhall History by @maggiehays in #6336
- Neo4j multihop support by @djordje-mijatovic in #6104
- fix(mae-consumer): Set proper variable expansion for JMX_OPTS and JAVA_OPTS in MAE docker by @skrydal in #6378
- docs(ingest): move prerequisite section before the ingestion recipe example by @mayurinehate in #6341
- fix(dataset): improve glossary term load performance for datasets by @Reilman79 in #6396
- feat(lineage) Implement CLL impact analysis for inputFields by @chriscollins3456 in #6426
- feat(ui) Add upgrade step to enable CLL impact analysis for existing data by @chriscollins3456 in #6427
- Added functionality to copy fieldpath and urn of each column by @Ankit-Keshari-Vituity in #6398
- fix(ingestion): add output converters for ODBC unsuported datatype in… by @LavinaVRovine in #6134
- fix(ui) Fix parentNodes overfetching everywhere it's used by @chriscollins3456 in #6446
- fix(ingest): snowflake - Fixing top query trimming in snowflake by @treff7es in #6447
- feat(elasticsearch): Updates to elasticsearch configuration, dao, tests by @david-leifker in #6269
- chore(ingest): fix mssql lint by @hsheth2 in #6453
- fix(ingest): add cli info to ingestion reporter by @hsheth2 in #6451
- fix(ui) Fix glossary side browser width fluctuating by @chriscollins3456 in #6457
- fix(python): Fix python dependencies for doc generation by @david-leifker in #6460
- docs(website): add homepage links by @jeffmerrick in #6458
- build(ingest): loosen jinja2 dependency for superset by @KulykDmytro in #6433
- fix(ingest): lowercase db name in mssql ingestion by @hsheth2 in #6448
- fix(ingest): handle missing schema in transformer by @hsheth2 in #6445
- feat(ingest): allow specific profiler config fields to override profile_table_level_only by @hsheth2 in #6366
- docs(enrichment) updating enrichment landing page by @maggiehays in #6286
- fix(home-page): remove redundant getAuthenticatedUser query by @aditya-radhakrishnan in #6464
- feat(ingest): detect old or missing docker compose by @hsheth2 in #6466
- feat(ingestion): powerbi # Power BI report support by @mohdsiddique in #6339
- fix(ingest/dbt): disable incremental lineage by default by @hsheth2 in #6467
- fix(loggin): print logging timestamp in ISO8601 format instead of jus… by @szalai1 in #6474
- docs(ingest/trino): add example of http connection by @hsheth2 in #6461
- refactor(ui): Simplify base glossary page toolbar by @jjoyce0510 in #6469
- revert: mssql - lowercase db name in mssql ingestion by @hsheth2 in #6481
- build: remove Jinja2 dependency from superset by @KulykDmytro in #6476
- fix(roles): allows role service to unassign roles by @aditya-radhakrishnan in #6434
- fix(docs): update the Okta and Azure AD docs to clarify the point of ingesting users by @aditya-radhakrishnan in #6465
- Highlighted the description text on search by @Ankit-Keshari-Vituity in #6400
- Ownership type is deprecated by @jakobhanna in #6477
- feat(ui): Adding Explore all button on home page search by @jjoyce0510 in #6468
- fix(ingest): fix athena and GE lint errors by @hsheth2 in #6482
- refactor(ingest): simplify stateful ingestion config by @hsheth2 in #6454
- docs(ingest/tableau): required permissions + doc formatting by @hsheth2 in #6484
- feat(ingest): presto - Adding presto source by @treff7es in #6459
- fix(ui) Fix lineage graph rendering with duplicate nodes by @chriscollins3456 in #6480
- docs(cypress): adding local cypress running instructions by @gabe-lyons in #6492
- fix(managed ingestion): updating snowflake schema pattern placeholder text by @gabe-lyons in #6493
- feat(ui): Adding External URLs to search preview for Dataset, Container, DataFlow, DataJob by @jjoyce0510 in #6496
- fix(ingest/tableau): check tableName existence on datasource response by @lustefaniak in #6478
- fix(build): do not use neo4j for dev by @anshbansal in #6501
- docs(gms): update search example, do not use deprecated clause by @mayurinehate in #6340
- feat(ingest): add stateful ingestion support to looker and lookml source by @mayurinehate in #6443
- feat(ingest): dbt cloud integration by @hsheth2 in #6323
- fix(tableau): extra defensive error-handling by @hsheth2 in #6503
- fix(ingest): remove redundant types by @hsheth2 in #6486
- fix(ingest/snowflake): fix lineage allow/deny pattern typo by @hsheth2 in #6506
- fix(docs): add missing docs for 0.9.1 by @anshbansal in #6515
- feat(ui): Introducing Share Button on Entity Pages by @jjoyce0510 in #6450
- Added I AM auth for Opensearch by @syedzoherer in #6370
- fix(ingest): correctly handle transformer patch semantics by @hsheth2 in #6505
- feat(ingest/csv-enrich): handle BOM character by @hsheth2 in #6509
- feat(airflow): support kafka hook in the airflow plugin by @hsheth2 in #6508
- fix(patch): cover case where patch is used to create an entity by @RyanHolstien in #6504
- build(deps): bump loader-utils from 2.0.0 to 2.0.4 in /docs-website by @dependabot in #6452
- fix(ingest): add alias for bigquery-beta by @hsheth2 in #6521
- feat(ingest): add config for ingesting delta table without files by @mayurinehate in #6403
- fix(ingest): fix typo in unique count profiling by @mayurinehate in #6517
- fix(ui) Fix roles not always displaying on page load by @chriscollins3456 in #6524
- feat(datahub-upgrade): Added msk IAM auth as a build dependency. by @pghazanfari in #6439
- feat(kafka-setup): Added support for MSK IAM authentication. by @pghazanfari in #6435
- Added sorting method to fieldpath column of schema tab by @Ankit-Keshari-Vituity in #6510
- fix(ingest): make kafka emit callback optional by @hsheth2 in #6525
- feat(ingest): automated term classification for snowflake by @mayurinehate in #6376
- fix(ingest): fix typo in urn utilities by @bskim45 in #6520
- fix(ingest): fix trino properties and tests by @mayurinehate in #6518
- fix(build): remove warnings in github actions by @anshbansal in #6512
- fix(security): Bump ranger plugin commons dependency by @pedro93 in #6535
- fix(ingest): kafka - properly picking doc from union type by @treff7es in #6472
- feat(ingest): disable stateful_ingestion fail-safe by default by @hsheth2 in #6537
- fix(ingest/airflow): respect enabled flag in airflow plugin by @hsheth2 in #6528
- refactor(ui): Adding apollo caching to manage domains page. by @jjoyce0510 in #6494
- refactor(recommendations): Filtering for specific entity types in recommendations by @jjoyce0510 in #6538
- fix(ingest): handle groupby custom label case by @phongvu99 in #6456
- build(ingest): support flake8 6.0.0 by @hsheth2 in #6540
- fix(ui) Wrap schema field descriptions to allow read more/less always by @chriscollins3456 in #6541
- fix(ui) Display duplicate nodes in lineage viz by @chriscollins3456 in #6526
- style(ingest): fix lint checks for superset by @mayurinehate in #6548
- fix(envs): remove DATASET_ENABLE_SCSI stale env var by @szalai1 in #6546
- feat(upgrade): Make restore from backup logic generic by @pedro93 in #6536
- feat(ingest): refractor classification mixin, support new infotypes by @mayurinehate in #6545
- fix(ingest): bigquery - missing sqlalchemy dep and row count fix by @treff7es in #6553
- fix(ingest): bigquery - Fixing querying non-date partition columns in profiling by @treff7es in #6554
- feat(ingest): powerbi # scan all accessible workspaces by @looppi in #6441
- fix(ingest): bigquery - Setting partition id for profiling data and project_id fix by @treff7es in #6558
- fix(gms): fix java.lang.NoClassDefFoundError: com/sun/syndication/io/FeedException for apache-ranger authorizer by @mohdsiddique in #6560
- feat(ui): Add Test Connection Support for BigQuery ingestion source by @jjoyce0510 in #6543
- fix(contrib): Update base python image for es7-upgrade by @david-leifker in #6562
- fix(ingest): handle docker-compose version v prefix by @hsheth2 in #6561
- docs(ingest/kafka): add field descriptions of kafka-related configs to pydantic by @mmmeeedddsss in #6559
- feat(platform): Support @searchable + @relationship Annotations for Timeseries Aspects by @jjoyce0510 in #6455
- feat(models): Adding 'created', 'lastModified' timestamp to Dataset, Container, Dashboard, Chart by @jjoyce0510 in #6527
- fix(ingest): set DataProcessInstance created ts to start time by @hsheth2 in #6566
- feat(docs-site): fast reload command for markdown edits by @hsheth2 in #6539
- fix(ingest): graceful error handling in snowflake classification by @mayurinehate in #6568
- ci(label): add smoke test label by @anshbansal in #6571
- fix(ingest): fix types changes in clickhouse sqlalchemy 0.2.3 by @mayurinehate in #6572
- fix(tests): Misc updates for tests, auth log level, and quickstart by @david-leifker in #6491
- feat(ui) Add owner to dataset - allow same owner with a different type by @rtekal in #6463
- fix(verions): Update opentelemetry and updates from pr-5239 by @david-leifker in #6563
- refactor(airflow): remove verbose log from airflow plugin by @bskim45 in #6516
- feat(cli): remove inconsistency check command by @anshbansal in #6569
- fix(ingest): restrict snowflake's sqlalchemy dep by @hsheth2 in #6579
- docs(notes): add release notes for v0.1.69 managed DataHub by @anshbansal in #6573
- fix(test): fix delete smoke test by @david-leifker in #6585
New Contributors
- @wangsaisai made their first contribution in #6343
- @stanbaker made their first contribution in #6287
- @lurecas made their first contribution in #6053
- @Reilman79 made their first contribution in #6396
- @LavinaVRovine made their first contribution in #6134
- @KulykDmytro made their first contribution in #6433
- @jakobhanna made their first contribution in #6477
- @lustefaniak made their first contribution in #6478
- @syedzoherer made their first contribution in #6370
- @phongvu99 made their first contribution in #6456
- @looppi made their first contribution in #6441
- @rtekal made their first contribution in #6463
Full Changelog: v0.9.2...v0.9.3