feat(ingestion): spark - support lineage for delta lake writes (#6834)
danielli-ziprecruiter authored Dec 22, 2022
1 parent 3b8686d commit a6470fc
Showing 2 changed files with 11 additions and 6 deletions.
3 changes: 2 additions & 1 deletion metadata-integration/java/spark-lineage/README.md
@@ -186,10 +186,11 @@ Below is a list of Spark commands that are parsed currently:

 - InsertIntoHadoopFsRelationCommand
 - SaveIntoDataSourceCommand (jdbc)
+- SaveIntoDataSourceCommand (Delta Lake)
 - CreateHiveTableAsSelectCommand
 - InsertIntoHiveTable

-Effectively, these support data sources/sinks corresponding to Hive, HDFS and JDBC.
+Effectively, these support data sources/sinks corresponding to Hive, HDFS, JDBC, and Delta Lake.

 DataFrame.persist command is supported for below LeafExecNodes:

@@ -144,13 +144,17 @@ Optional<? extends Collection<SparkDataset>> fromSparkPlanNode(SparkPlan plan, S

       Map<String, String> options = JavaConversions.mapAsJavaMap(cmd.options());
       String url = options.getOrDefault("url", ""); // e.g. jdbc:postgresql://localhost:5432/sparktestdb
-      if (!url.contains("jdbc")) {
+      if (url.contains("jdbc")) {
+        String tbl = options.get("dbtable");
+        return Optional.of(Collections.singletonList(
+            new JdbcDataset(url, tbl, getCommonPlatformInstance(datahubConfig), getCommonFabricType(datahubConfig))));
+      } else if (options.containsKey("path")) {
+        return Optional.of(Collections.singletonList(new HdfsPathDataset(new Path(options.get("path")),
+            getCommonPlatformInstance(datahubConfig), getIncludeScheme(datahubConfig),
+            getCommonFabricType(datahubConfig))));
+      } else {
         return Optional.empty();
       }
-
-      String tbl = options.get("dbtable");
-      return Optional.of(Collections.singletonList(
-          new JdbcDataset(url, tbl, getCommonPlatformInstance(datahubConfig), getCommonFabricType(datahubConfig))));
     });

     PLAN_TO_DATASET.put(CreateDataSourceTableAsSelectCommand.class, (p, ctx, datahubConfig) -> {
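
The branch order introduced for `SaveIntoDataSourceCommand` can be illustrated in isolation. The following is a minimal sketch, not the DataHub implementation: `SinkResolver` and `resolve` are hypothetical names, and plain strings stand in for the `JdbcDataset` and `HdfsPathDataset` objects that the real handler builds. It only shows that a JDBC `url` is checked first, a `path` option (as supplied by a Delta Lake write) is the fallback, and anything else yields an empty result.

```java
import java.util.Map;
import java.util.Optional;

public class SinkResolver {

    // Hypothetical helper: returns a string tag instead of a SparkDataset,
    // purely to make the branch order observable without Spark or DataHub.
    static Optional<String> resolve(Map<String, String> options) {
        String url = options.getOrDefault("url", "");
        if (url.contains("jdbc")) {
            // JDBC sink: identified by the connection URL plus the target table.
            return Optional.of("jdbc:" + options.get("dbtable"));
        } else if (options.containsKey("path")) {
            // Path-based sink (the case a Delta Lake write falls into).
            return Optional.of("path:" + options.get("path"));
        } else {
            // Neither option is present, so no dataset can be derived.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(Map.of(
            "url", "jdbc:postgresql://localhost:5432/sparktestdb",
            "dbtable", "foo")));                                       // Optional[jdbc:foo]
        System.out.println(resolve(Map.of("path", "/tmp/delta/events"))); // Optional[path:/tmp/delta/events]
        System.out.println(resolve(Map.of("format", "console")));         // Optional.empty
    }
}
```

Before this commit, the second and third calls would both have produced an empty result, because any options without a JDBC `url` returned early.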