Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenLineage Spark extension interfaces without runtime dependency hell #2809

Merged
merged 10 commits into from
Jul 5, 2024

Conversation

ddebowczyk92
Copy link
Contributor

@ddebowczyk92 ddebowczyk92 commented Jun 28, 2024

Problem

Closes: #2784

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports S3 and GCS filesystem operations, tested with AWS EMR).

One-line summary:

New Spark extension interfaces without runtime dependency hell

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (not required for changes to tests, docs, or CI config)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2024 contributors to the OpenLineage project

@ddebowczyk92 ddebowczyk92 requested a review from a team as a code owner June 28, 2024 15:06
@boring-cyborg boring-cyborg bot added area:ci CI area:documentation Improvements or additions to documentation area:tests Testing code language:java Uses Java programming language labels Jun 28, 2024
@ddebowczyk92 ddebowczyk92 force-pushed the add-spark-interfaces branch 6 times, most recently from 9738a63 to 7c1bb82 Compare July 1, 2024 14:08
Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Massive amount of work on a highly desired feature.
I'm glad to see this being done with a dynamite test to verify integration is working.
Left few comments / questions to let me understand better the change.

Phaty PR. Thank you @ddebowczyk92 for bringing this.

.circleci/workflows/openlineage-spark.yml Show resolved Hide resolved
/* Copyright 2018-2024 contributors to the OpenLineage project
/* SPDX-License-Identifier: Apache-2.0
*/
package io.openlineage.shaded.spark.extension.v1.lifecycle.plan;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for having io.openlineage.shaded.spark.extension.v1 instead of io.openlineage.spark.shaded.extension.v1 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe even different package name? io.openlineage.spark suggests it's a part of openlineage-spark mvn package.

io.openlineage.interfaces.spark?

.circleci/config.yml Outdated Show resolved Hide resolved
integration/spark-interfaces-shaded/README.md Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
version=1.0.0-SNAPSHOT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this deliberate? how are you going to release it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unshaded should have static version, manually bumped if needed. Release should skip it if the particular version is released.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. can we then release it manually and merge to main version version=1.0.0? I don't mind the approach taken but I think snapshot version should be avoided.

.circleci/continue_config.yml Outdated Show resolved Hide resolved
@ddebowczyk92 ddebowczyk92 force-pushed the add-spark-interfaces branch 16 times, most recently from 6682bb3 to cd3bd95 Compare July 4, 2024 11:57
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
Signed-off-by: Dominik Dębowczyk <[email protected]>
@ddebowczyk92 ddebowczyk92 force-pushed the add-spark-interfaces branch from cd3bd95 to d1d1a13 Compare July 4, 2024 14:19
@codecov-commenter
Copy link

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.52%. Comparing base (a31cc8d) to head (d1d1a13).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2809   +/-   ##
=======================================
  Coverage   85.52%   85.52%           
=======================================
  Files          58       58           
  Lines        3413     3413           
=======================================
  Hits         2919     2919           
  Misses        494      494           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

ignore: /.*/
context: release
requires:
- release-client-java
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this required?

Copy link
Member

@mobuchowski mobuchowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ddebowczyk92 for such a detailed work 👍

@@ -0,0 +1,38 @@
# Spark Extension Entrypoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs page in https://openlineage.io/docs/ (outside of this PR)

@@ -44,6 +43,7 @@ dependencies {
api("io.openlineage:openlineage-java:${project.version}")
api("io.openlineage:openlineage-sql-java:${project.version}")
api("io.micrometer:micrometer-core:${micrometerVersion}")
api("io.openlineage:spark-extension-entrypoint:1.0.0-SNAPSHOT")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chicken and egg release problem, right?

I think we just need to release entrypoint separately first, before next regular OL release.

@mobuchowski mobuchowski merged commit b736dab into OpenLineage:main Jul 5, 2024
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:ci CI area:documentation Improvements or additions to documentation area:tests Testing code language:java Uses Java programming language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PROPOSAL] Spark - OpenLineage interface without runtime dependency hell
4 participants