Commit

Merge branch 'master+robust-transformers+unified' of github.com:acryl…
…data/datahub-fork into master+robust-transformers+unified
siddiquebagwan-gslab committed Sep 5, 2022
2 parents dc858ae + e0c7c9f commit 99c24bf
Showing 29 changed files with 704 additions and 531 deletions.
4 changes: 2 additions & 2 deletions datahub-frontend/conf/application.conf
@@ -118,7 +118,7 @@ ui.new.browse.dataset = true

 # React App Authentication
 # ~~~~~
-# React currently supports OIDC SSO + self-configured JAAS (same as Ember) for authentication. Below you can find the supported configurations for
+# React currently supports OIDC SSO + self-configured JAAS for authentication. Below you can find the supported configurations for
 # each mechanism.
 #
 # Required OIDC Configuration Values:
@@ -232,4 +232,4 @@ metadataService.auth.enabled=${?METADATA_SERVICE_AUTH_ENABLED}
 systemClientId = __datahub_system # Change this to something random.
 systemClientSecret = JohnSnowKnowsNothing # Along with this.
 systemClientId=${?DATAHUB_SYSTEM_CLIENT_ID}
-systemClientSecret=${?DATAHUB_SYSTEM_CLIENT_SECRET}
+systemClientSecret=${?DATAHUB_SYSTEM_CLIENT_SECRET}
6 changes: 1 addition & 5 deletions datahub-frontend/play.gradle
@@ -13,11 +13,7 @@ configurations {
 }

 dependencies {
-  if (project.hasProperty('enableEmber') && project.getProperty('enableEmber').toBoolean()) {
-    assets project(path: ':datahub-web', configuration: 'assets')
-  } else {
-    assets project(path: ':datahub-web-react', configuration: 'assets')
-  }
+  assets project(path: ':datahub-web-react', configuration: 'assets')

   constraints {
     play('org.springframework:spring-core:5.2.3.RELEASE')
9 changes: 4 additions & 5 deletions datahub-web-react/README.md
@@ -5,15 +5,14 @@ title: "datahub-web-react"
 # DataHub React App

 ## About
-This module contains a React version of the DataHub UI. This is now the production version of the DataHub client experience.
-Notice that this is a completely separate frontend experience from the legacy Ember app and will remain so as it evolves.
+This module contains a React application that serves as the DataHub UI.

 Feel free to take a look around, deploy, and contribute.

 For details about the motivation please see [this RFC](../docs/rfc/active/2055-react-app/README.md).

 ## Functional Goals
-The initial milestone for the app was to achieve functional parity with the existing Ember app. This meant supporting
+The initial milestone for the app was to achieve functional parity with the previous Ember app. This meant supporting

 - Dataset Profiles, Search, Browse Experience
 - User Profiles, Search
@@ -22,8 +21,8 @@ The initial milestone for the app was to achieve functional parity with the exis
 This has since been achieved. The new set of functional goals are reflected in the latest version of the [DataHub Roadmap](../docs/roadmap.md).

 ## Design Goals
-In building out the client experience, we intend to leverage learnings from the Ember app and incorporate feedback gathered
-from organizations operating DataHub. Two themes have emerged to serve as guideposts:
+In building out the client experience, we intend to leverage learnings from the previous Ember-based app and incorporate feedback gathered
+from organizations operating DataHub. Two themes have emerged to serve as guideposts:

 1. **Configurability**: The client experience should be configurable, such that deploying organizations can tailor certain
 aspects to their needs. This includes theme / styling configurability, showing and hiding specific functionality,
3 changes: 1 addition & 2 deletions docker/datahub-frontend/Dockerfile
@@ -15,14 +15,13 @@ FROM --platform=$BUILDPLATFORM node:16.13.0-alpine3.14 AS prod-build
 RUN apk --no-cache --update-cache --available upgrade \
     && apk --no-cache add perl openjdk8

-ARG ENABLE_EMBER="false"
 ARG USE_SYSTEM_NODE="true"
 ENV CI=true
 ENV GRADLE_OPTS="-Xms256m -Xmx512m"
 COPY . datahub-src
 RUN cd datahub-src \
     && ./gradlew :datahub-web-react:build -x test -x yarnTest -x yarnLint \
-    && ./gradlew :datahub-frontend:dist -PenableEmber=${ENABLE_EMBER} -PuseSystemNode=${USE_SYSTEM_NODE} -x test -x yarnTest -x yarnLint \
+    && ./gradlew :datahub-frontend:dist -PuseSystemNode=${USE_SYSTEM_NODE} -x test -x yarnTest -x yarnLint \
     && cp datahub-frontend/build/distributions/datahub-frontend.zip ../datahub-frontend.zip \
     && cd .. && rm -rf datahub-src && unzip datahub-frontend.zip

7 changes: 1 addition & 6 deletions docker/datahub-frontend/README.md
@@ -14,11 +14,6 @@ If using React app:
 http://localhost:9002
 ```

-If using legacy Ember app:
-```
-http://localhost:9001
-```
-
 You can sign in with `datahub` as username and password.

 ## Build instructions
@@ -27,4 +22,4 @@ If you want to build the `datahub-frontend` Docker image yourself, you can run t

 `DOCKER_BUILDKIT=1 docker build -t your_datahub_frontend -f ./docker/datahub-frontend/Dockerfile .`

-Please note the final `.` and that the tag `your_datahub_frontend` is determined by you.
+Please note the final `.` and that the tag `your_datahub_frontend` is determined by you.
12 changes: 6 additions & 6 deletions docker/datahub-gms/Dockerfile
@@ -15,13 +15,13 @@ RUN apk --no-cache --update-cache --available upgrade \
         echo >&2 "Unsupported architecture $(arch)" ; exit 1; \
     fi \
     && apk --no-cache add tar curl openjdk8-jre bash coreutils gcompat \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.46.v20220331/jetty-runner-9.4.46.v20220331.jar --output jetty-runner.jar \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-jmx/9.4.46.v20220331/jetty-jmx-9.4.46.v20220331.jar --output jetty-jmx.jar \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util/9.4.46.v20220331/jetty-util-9.4.46.v20220331.jar --output jetty-util.jar \
-    && wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
-    && wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.46.v20220331/jetty-runner-9.4.46.v20220331.jar --output jetty-runner.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-jmx/9.4.46.v20220331/jetty-jmx-9.4.46.v20220331.jar --output jetty-jmx.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util/9.4.46.v20220331/jetty-util-9.4.46.v20220331.jar --output jetty-util.jar \
+    && wget --no-verbose https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
+    && wget --no-verbose https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
     && cp /usr/lib/jvm/java-1.8-openjdk/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks \
-    && curl -L https://github.com/treff7es/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-${DOCKERIZE_ARCH}-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && curl -sS -L https://github.com/treff7es/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-${DOCKERIZE_ARCH}-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 FROM --platform=$BUILDPLATFORM alpine:3.14 AS prod-build

5 changes: 0 additions & 5 deletions docker/datahub-ingestion/Dockerfile
@@ -2,11 +2,6 @@
 ARG APP_ENV=prod

 FROM acryldata/datahub-ingestion-base as base
-# ENV DOCKERIZE_VERSION v0.6.1
-# RUN apk --no-cache add curl tar \
-# && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.20.v20190813/jetty-runner-9.4.20.v20190813.jar --output jetty-runner.jar \
-# && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
-

 FROM openjdk:8 as prod-build
 COPY . /datahub-src
6 changes: 3 additions & 3 deletions docker/datahub-mae-consumer/Dockerfile
@@ -4,9 +4,9 @@ ARG APP_ENV=prod
 FROM adoptopenjdk/openjdk8:alpine-jre as base
 ENV DOCKERIZE_VERSION v0.6.1
 RUN apk --no-cache add curl tar wget bash coreutils \
-    && wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
-    && wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
-    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && wget --no-verbose https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
+    && wget --no-verbose https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
+    && curl -sSL https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 FROM adoptopenjdk/openjdk8:alpine-slim as prod-build
 RUN apk --no-cache add openjdk8-jre perl
6 changes: 3 additions & 3 deletions docker/datahub-mce-consumer/Dockerfile
@@ -4,10 +4,10 @@ ARG APP_ENV=prod
 FROM adoptopenjdk/openjdk8:alpine-jre as base
 ENV DOCKERIZE_VERSION v0.6.1
 RUN apk --no-cache add curl tar wget openjdk8-jre bash \
-    && wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
-    && wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
+    && wget --no-verbose https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
+    && wget --no-verbose https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
     && cp /usr/lib/jvm/java-1.8-openjdk/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks \
-    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && curl -sSL https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 FROM openjdk:8 as prod-build
 COPY . datahub-src
12 changes: 6 additions & 6 deletions docker/datahub-upgrade/Dockerfile
@@ -13,13 +13,13 @@ RUN apk --no-cache --update-cache --available upgrade \
         echo >&2 "Unsupported architecture $(arch)" ; exit 1; \
     fi \
     && apk --no-cache add tar curl openjdk8-jre bash coreutils gcompat \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.46.v20220331/jetty-runner-9.4.46.v20220331.jar --output jetty-runner.jar \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-jmx/9.4.46.v20220331/jetty-jmx-9.4.46.v20220331.jar --output jetty-jmx.jar \
-    && curl https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util/9.4.46.v20220331/jetty-util-9.4.46.v20220331.jar --output jetty-util.jar \
-    && wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
-    && wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-runner/9.4.46.v20220331/jetty-runner-9.4.46.v20220331.jar --output jetty-runner.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-jmx/9.4.46.v20220331/jetty-jmx-9.4.46.v20220331.jar --output jetty-jmx.jar \
+    && curl -sS https://repo1.maven.org/maven2/org/eclipse/jetty/jetty-util/9.4.46.v20220331/jetty-util-9.4.46.v20220331.jar --output jetty-util.jar \
+    && wget --no-verbose https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v1.4.1/opentelemetry-javaagent-all.jar \
+    && wget --no-verbose https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.16.1/jmx_prometheus_javaagent-0.16.1.jar -O jmx_prometheus_javaagent.jar \
     && cp /usr/lib/jvm/java-1.8-openjdk/jre/lib/security/cacerts /tmp/kafka.client.truststore.jks \
-    && curl -L https://github.com/treff7es/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-${DOCKERIZE_ARCH}-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && curl -sSL https://github.com/treff7es/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-${DOCKERIZE_ARCH}-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 FROM --platform=$BUILDPLATFORM alpine:3.14 AS prod-build

2 changes: 1 addition & 1 deletion docker/mysql-setup/Dockerfile
@@ -2,7 +2,7 @@ FROM alpine:3

 ENV DOCKERIZE_VERSION v0.6.1
 RUN apk add --no-cache mysql-client curl tar bash \
-    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && curl -sSL https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 COPY docker/mysql-setup/init.sql /init.sql
 COPY docker/mysql-setup/init.sh /init.sh
2 changes: 1 addition & 1 deletion docker/postgres-setup/Dockerfile
@@ -2,7 +2,7 @@ FROM alpine:3

 ENV DOCKERIZE_VERSION v0.6.1
 RUN apk add --no-cache postgresql-client curl tar \
-    && curl -L https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv
+    && curl -sSL https://github.com/jwilder/dockerize/releases/download/$DOCKERIZE_VERSION/dockerize-linux-amd64-$DOCKERIZE_VERSION.tar.gz | tar -C /usr/local/bin -xzv

 COPY docker/postgres-setup/init.sql /init.sql
 COPY docker/postgres-setup/init.sh /init.sh
11 changes: 0 additions & 11 deletions docker/quickstart-ember.sh

This file was deleted.

2 changes: 1 addition & 1 deletion docs-website/sidebars.js
@@ -141,7 +141,7 @@ module.exports = {
       Transformers: [
         "metadata-ingestion/docs/transformer/intro",
         "metadata-ingestion/docs/transformer/dataset_transformer",
-      ]
+      ],
     },
     {
       "Advanced Guides": [
2 changes: 1 addition & 1 deletion metadata-ingestion/docs/sources/snowflake/README.md
@@ -1,4 +1,4 @@
 To get all metadata from Snowflake you need to use two plugins `snowflake` and `snowflake-usage`. Both of them are described in this page. These will require 2 separate recipes.


-We encourage you to try out new `snowflake-beta` plugin as alternative to running both `snowflake` and `snowflake-usage` plugins and share feedback. `snowflake-beta` is much faster than `snowflake` for extracting metadata . Please note that, `snowflake-beta` plugin currently does not support column level profiling, unlike `snowflake` plugin.
+We encourage you to try out new `snowflake-beta` plugin as alternative to running both `snowflake` and `snowflake-usage` plugins and share feedback. `snowflake-beta` is much faster than `snowflake` for extracting metadata .
6 changes: 5 additions & 1 deletion metadata-ingestion/src/datahub/cli/ingest_cli.py
@@ -103,6 +103,9 @@ def ingest() -> None:
     default=False,
     help="Turn off default reporting of ingestion results to DataHub",
 )
+@click.option(
+    "--no-spinner", type=bool, is_flag=True, default=False, help="Turn off spinner"
+)
 @click.pass_context
 @telemetry.with_telemetry
 @memory_leak_detector.with_leak_detection
@@ -117,6 +120,7 @@ def run(
     test_source_connection: bool,
     report_to: str,
     no_default_report: bool,
+    no_spinner: bool,
 ) -> None:
     """Ingest metadata into DataHub."""

@@ -125,7 +129,7 @@ def run_pipeline_to_completion(
     ) -> int:
         logger.info("Starting metadata ingestion")
         with click_spinner.spinner(
-            beep=False, disable=False, force=False, stream=sys.stdout
+            beep=False, disable=no_spinner, force=False, stream=sys.stdout
         ):
             try:
                 pipeline.run()
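Review note on the `--no-spinner` change: the flag defaults to `False` and is passed straight through as the spinner's `disable` argument, so existing invocations keep the spinner. The sketch below mirrors that wiring using only the standard library. It swaps `click` for `argparse`, and `spinner`, `parse_args`, and `run` are hypothetical stand-ins rather than the DataHub or `click_spinner` implementations.

```python
import argparse
import contextlib
import sys
from typing import Iterator, List


@contextlib.contextmanager
def spinner(disable: bool = False, stream=sys.stdout) -> Iterator[None]:
    """Hypothetical stand-in for a spinner: only marks start and end."""
    if not disable:
        stream.write("working...\n")
    try:
        yield
    finally:
        if not disable:
            stream.write("done\n")


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="ingest-sketch")
    # Same shape as the flag in the diff: absent -> False, present -> True.
    parser.add_argument(
        "--no-spinner", action="store_true", default=False, help="Turn off spinner"
    )
    return parser.parse_args(argv)


def run(argv: List[str]) -> bool:
    args = parse_args(argv)
    # The parsed flag feeds the `disable` parameter, as in the changed call site.
    with spinner(disable=args.no_spinner):
        pass  # the real command would run the pipeline here
    return args.no_spinner
```

Calling `run([])` prints the start and end markers, while `run(["--no-spinner"])` prints nothing, which is the behaviour one would want when output is captured in CI logs.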
35 changes: 29 additions & 6 deletions metadata-ingestion/src/datahub/ingestion/api/report.py
@@ -2,9 +2,12 @@
 import pprint
 import sys
 from dataclasses import dataclass
+from datetime import datetime, timedelta
 from enum import Enum
 from typing import Any, Dict

+import humanfriendly
+
 # The sort_dicts option was added in Python 3.8.
 if sys.version_info >= (3, 8):
     PPRINT_OPTIONS = {"sort_dicts": False}
@@ -18,6 +21,22 @@ class Report:
     def to_str(some_val: Any) -> str:
         if isinstance(some_val, Enum):
             return some_val.name
+        elif isinstance(some_val, timedelta):
+            return humanfriendly.format_timespan(some_val)
+        elif isinstance(some_val, datetime):
+            now = datetime.now()
+            diff = now - some_val
+            if abs(diff) < timedelta(seconds=1):
+                # the timestamps are close enough that printing a duration isn't useful
+                return f"{some_val} (now)."
+            elif diff > timedelta(seconds=0):
+                # timestamp is in the past
+                return f"{some_val} ({humanfriendly.format_timespan(diff)} ago)."
+            else:
+                # timestamp is in the future
+                return (
+                    f"{some_val} (in {humanfriendly.format_timespan(some_val - now)})."
+                )
         else:
             return str(some_val)

@@ -26,18 +45,21 @@ def to_dict(some_val: Any) -> Any:
         """A cheap way to generate a dictionary."""
         if hasattr(some_val, "as_obj"):
             return some_val.as_obj()
-        if hasattr(some_val, "dict"):
+        if hasattr(some_val, "dict"): # pydantic models
             return some_val.dict()
-        elif isinstance(some_val, list):
+        if hasattr(some_val, "asdict"): # dataclasses
+            return some_val.asdict()
+        if isinstance(some_val, list):
             return [Report.to_dict(v) for v in some_val if v is not None]
-        elif isinstance(some_val, dict):
+        if isinstance(some_val, dict):
             return {
                 Report.to_str(k): Report.to_dict(v)
                 for k, v in some_val.items()
                 if v is not None
             }
-        else:
-            return Report.to_str(some_val)
+
+        # fall through option
+        return Report.to_str(some_val)

     def compute_stats(self) -> None:
         """A hook to compute derived stats"""
@@ -48,7 +70,8 @@ def as_obj(self) -> dict:
         return {
             str(key): Report.to_dict(value)
             for (key, value) in self.__dict__.items()
-            if value is not None # ignore nulls
+            if value is not None
+            and not str(key).startswith("_") # ignore nulls and fields starting with _
         }

     def as_string(self) -> str:
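Review note on the `report.py` hunks: two behaviours change here. Timestamps and durations are now rendered relative to the current time, and `as_obj` skips fields whose names start with an underscore. The following is a minimal, standard-library sketch of both, not the DataHub code. `format_timespan` is a simplified stand-in for `humanfriendly.format_timespan`, `SketchReport` is a hypothetical report class, and the injectable `now` parameter is an assumption added only to make the example deterministic.

```python
import dataclasses
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def format_timespan(delta: timedelta) -> str:
    """Simplified, hypothetical stand-in for humanfriendly.format_timespan."""
    return f"{delta.total_seconds():g} seconds"


def to_str(some_val: Any, now: Optional[datetime] = None) -> str:
    """Render durations and timestamps along the lines of the new branches."""
    if isinstance(some_val, timedelta):
        return format_timespan(some_val)
    if isinstance(some_val, datetime):
        now = now or datetime.now()  # injectable so the example is deterministic
        diff = now - some_val
        if abs(diff) < timedelta(seconds=1):
            return f"{some_val} (now)."
        if diff > timedelta(seconds=0):
            return f"{some_val} ({format_timespan(diff)} ago)."
        return f"{some_val} (in {format_timespan(some_val - now)})."
    return str(some_val)


@dataclasses.dataclass
class SketchReport:
    """Hypothetical report; `_scratch` plays the role of a private field."""

    events_produced: int = 0
    running_time: timedelta = timedelta(seconds=0)
    _scratch: Optional[str] = None


def as_obj(report: Any, now: Optional[datetime] = None) -> Dict[str, str]:
    """Skip None values and fields whose names start with an underscore."""
    return {
        str(key): to_str(value, now)
        for key, value in vars(report).items()
        if value is not None and not str(key).startswith("_")
    }
```

With a fixed `now`, a timestamp 90 seconds in the past is rendered with an "ago" suffix and one 30 seconds ahead with an "in" prefix, and `_scratch` never appears in the `as_obj` output even when it is set.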
12 changes: 7 additions & 5 deletions metadata-ingestion/src/datahub/ingestion/api/source.py
@@ -55,14 +55,16 @@ def report_failure(self, key: str, reason: str) -> None:

     def __post_init__(self) -> None:
         self.start_time = datetime.datetime.now()
-        self.running_time_in_seconds = 0
+        self.running_time: datetime.timedelta = datetime.timedelta(seconds=0)

     def compute_stats(self) -> None:
-        duration = int((datetime.datetime.now() - self.start_time).total_seconds())
+        duration = datetime.datetime.now() - self.start_time
         workunits_produced = self.events_produced
-        if duration > 0:
-            self.events_produced_per_sec: int = int(workunits_produced / duration)
-            self.running_time_in_seconds = duration
+        if duration.total_seconds() > 0:
+            self.events_produced_per_sec: int = int(
+                workunits_produced / duration.total_seconds()
+            )
+            self.running_time = duration
         else:
             self.read_rate = 0

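Review note on the `source.py` hunk: the old code truncated the elapsed time to whole seconds before dividing, so a run shorter than one second produced a duration of `0` and no rate was computed. Keeping the `timedelta` and dividing by `total_seconds()` preserves the fractional part. A minimal sketch of the new arithmetic follows. `SketchSourceReport` is a hypothetical, trimmed-down class, and the `now` parameter is an assumption added so the result is reproducible.

```python
import datetime
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchSourceReport:
    """Hypothetical report holding only the fields touched by the hunk."""

    events_produced: int = 0
    events_produced_per_sec: int = 0
    start_time: datetime.datetime = field(default_factory=datetime.datetime.now)
    running_time: datetime.timedelta = datetime.timedelta(seconds=0)

    def compute_stats(self, now: Optional[datetime.datetime] = None) -> None:
        now = now or datetime.datetime.now()
        duration = now - self.start_time  # a timedelta, no longer an int
        if duration.total_seconds() > 0:
            # Divide by fractional seconds, then truncate only the final rate.
            self.events_produced_per_sec = int(
                self.events_produced / duration.total_seconds()
            )
            self.running_time = duration
```

For example, 100 events over 2.5 seconds yields a rate of 40 per second, and 10 events over 0.5 seconds yields 20, a case the integer-truncated duration would have skipped.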