
java csv destination #505

Merged · 17 commits · Oct 8, 2020
2 changes: 1 addition & 1 deletion airbyte-integrations/base-java/Dockerfile
@@ -10,6 +10,6 @@ ENV AIRBYTE_SPEC_CMD "./javabase.sh --spec"
ENV AIRBYTE_CHECK_CMD "./javabase.sh --check"
ENV AIRBYTE_DISCOVER_CMD "./javabase.sh --discover"
ENV AIRBYTE_READ_CMD "./javabase.sh --read"
ENV AIRBYTE_READ_CMD "./javabase.sh --write"
ENV AIRBYTE_WRITE_CMD "./javabase.sh --write"

ENTRYPOINT ["/airbyte/base.sh"]
2 changes: 1 addition & 1 deletion airbyte-integrations/base-java/javabase.sh
@@ -4,4 +4,4 @@ set -e

# wrap run script in a script so that we can lazy evaluate the value of APPLICATION. APPLICATION is
# set by the dockerfile that inherits base-java, so it cannot be evaluated when base-java is built.
bin/"$APPLICATION" "$@"
cat <&0 | bin/"$APPLICATION" "$@"
Contributor:

Shouldn't the integration runner be responsible for this?

Contributor Author:

discussed offline; agreed this was right. also agreed to add a comment explaining what's happening, which i did.
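The `cat <&0 |` change matters because a destination's write command consumes records forwarded on standard input. A standalone sketch of that read loop (class and method names are illustrative, not the actual Airbyte API):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StdinLoop {

  // Consume newline-delimited messages from an input stream, the way a
  // destination consumes records that the wrapper script pipes to it on stdin.
  static List<String> consume(InputStream in) throws Exception {
    final List<String> records = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (!line.isBlank()) {
          records.add(line); // a real destination would parse each line as JSON here
        }
      }
    }
    return records;
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for the stdin stream the shell pipe provides.
    InputStream fake = new ByteArrayInputStream(
        "{\"stream\":\"users\"}\n{\"stream\":\"orders\"}\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(consume(fake).size()); // prints 2
  }
}
```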

@@ -0,0 +1,54 @@
/*
* MIT License
*
* Copyright (c) 2020 Airbyte
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

package io.airbyte.integrations.base;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class FailureTrackingConsumer<T> implements DestinationConsumer<T> {
Contributor Author:

not in love with the name of this class. would love suggestions.


private static final Logger LOGGER = LoggerFactory.getLogger(FailureTrackingConsumer.class);

private boolean hasFailed = false;

protected abstract void acceptInternal(T t) throws Exception;

public void accept(T t) throws Exception {
try {
acceptInternal(t);
} catch (Exception e) {
hasFailed = true;
throw e;
}
}

protected abstract void close(boolean hasFailed) throws Exception;

public void close() throws Exception {
LOGGER.info("hasFailed: {}.", hasFailed);
close(hasFailed);
}

}
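The contract above can be exercised end to end; the following is a simplified, self-contained re-creation (not the actual Airbyte classes) showing how the failure flag reaches a subclass's close hook:

```java
// Simplified stand-in for FailureTrackingConsumer: remembers whether any
// accept() failed and hands that flag to the subclass's close hook, so the
// subclass can decide between committing and cleaning up.
abstract class TrackingConsumer<T> implements AutoCloseable {

  private boolean hasFailed = false;

  protected abstract void acceptInternal(T t) throws Exception;

  protected abstract void close(boolean hasFailed) throws Exception;

  public final void accept(T t) throws Exception {
    try {
      acceptInternal(t);
    } catch (Exception e) {
      hasFailed = true; // record the failure, then rethrow unchanged
      throw e;
    }
  }

  @Override
  public final void close() throws Exception {
    close(hasFailed);
  }
}

public class TrackingConsumerDemo {

  public static void main(String[] args) throws Exception {
    final StringBuilder log = new StringBuilder();
    TrackingConsumer<String> consumer = new TrackingConsumer<>() {
      @Override
      protected void acceptInternal(String s) {
        if (s.isEmpty()) throw new IllegalArgumentException("empty record");
      }

      @Override
      protected void close(boolean hasFailed) {
        log.append(hasFailed ? "rollback" : "commit");
      }
    };
    try {
      consumer.accept("ok");
      consumer.accept(""); // throws and flips the failure flag
    } catch (Exception ignored) {}
    consumer.close();
    System.out.println(log); // prints rollback
  }
}
```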
@@ -24,8 +24,6 @@

package io.airbyte.integrations.base;

import java.nio.file.Path;

public class JavaBaseConstants {

public static String ARGS_CONFIG_KEY = "config";
@@ -36,8 +34,4 @@ public class JavaBaseConstants {
public static String ARGS_CATALOG_DESC = "input path for the catalog";
public static String ARGS_PATH_DESC = "path to the json-encoded state file";

// todo (cgardens) - this mount path should be passed in by the worker and read as an arg or
// environment variable by the runner.
public static Path LOCAL_MOUNT = Path.of("/local");
Contributor Author:

removed the need for an integration to know about this.


}
3 changes: 3 additions & 0 deletions airbyte-integrations/base/base.sh
@@ -51,6 +51,9 @@ function main() {
# todo: state should be optional: --state "$STATE_FILE"
eval "$AIRBYTE_READ_CMD" --config "$CONFIG_FILE" --catalog "$CATALOG_FILE"
;;
write)
Contributor:

we should pass in an env var for source/dest so only one of read and write are valid at this level and show an error if the wrong one is called.

Contributor Author:

agreed. let's do this in a separate PR.

eval "$AIRBYTE_WRITE_CMD" --config "$CONFIG_FILE" --catalog "$CATALOG_FILE"
;;
*)
error "Unknown command: $CMD"
;;
3 changes: 3 additions & 0 deletions airbyte-integrations/csv-destination/.dockerignore
@@ -0,0 +1,3 @@
*
!Dockerfile
!build
8 changes: 8 additions & 0 deletions airbyte-integrations/csv-destination/Dockerfile
@@ -0,0 +1,8 @@
FROM airbyte/base-java:dev

WORKDIR /airbyte
ENV APPLICATION csv-destination

COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar

RUN tar xf ${APPLICATION}.tar --strip-components=1
31 changes: 31 additions & 0 deletions airbyte-integrations/csv-destination/build.gradle
@@ -0,0 +1,31 @@
import com.bmuschko.gradle.docker.tasks.image.DockerBuildImage
plugins {
id 'com.bmuschko.docker-remote-api'
id 'application'
}
dependencies {
implementation project(':airbyte-config:models')
implementation project(':airbyte-singer')
implementation project(':airbyte-integrations:base-java')

implementation 'org.apache.commons:commons-csv:1.4'
}

application {
mainClass = 'io.airbyte.integrations.destination.csv.CsvDestination'
}


def image = 'airbyte/airbyte-csv-destination:dev'

task imageName {
doLast {
println "IMAGE $image"
}
}

task buildImage(type: DockerBuildImage) {
inputDir = projectDir
images.add(image)
dependsOn ':airbyte-integrations:base-java:buildImage'
}
@@ -0,0 +1,197 @@
/*
* MIT License
*
* Copyright (c) 2020 Airbyte
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

package io.airbyte.integrations.destination.csv;

import com.fasterxml.jackson.databind.JsonNode;
import com.google.common.base.Preconditions;
import io.airbyte.commons.json.Jsons;
import io.airbyte.commons.resources.MoreResources;
import io.airbyte.config.DestinationConnectionSpecification;
import io.airbyte.config.Schema;
import io.airbyte.config.StandardCheckConnectionOutput;
import io.airbyte.config.StandardCheckConnectionOutput.Status;
import io.airbyte.config.StandardDiscoverSchemaOutput;
import io.airbyte.config.Stream;
import io.airbyte.integrations.base.Destination;
import io.airbyte.integrations.base.DestinationConsumer;
import io.airbyte.integrations.base.FailureTrackingConsumer;
import io.airbyte.integrations.base.IntegrationRunner;
import io.airbyte.singer.SingerMessage;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CsvDestination implements Destination {

private static final Logger LOGGER = LoggerFactory.getLogger(CsvDestination.class);

private static final String COLUMN_NAME = "data"; // we output all data as a blob to a single column called data.
private static final String DESTINATION_PATH_FIELD = "destination_path";

@Override
public DestinationConnectionSpecification spec() throws IOException {
final String resourceString = MoreResources.readResource("spec.json");
return Jsons.deserialize(resourceString, DestinationConnectionSpecification.class);
}

@Override
public StandardCheckConnectionOutput check(JsonNode config) {
try {
FileUtils.forceMkdir(getDestinationPath(config).toFile());
} catch (IOException e) {
return new StandardCheckConnectionOutput().withStatus(Status.FAILURE).withMessage(e.getMessage());
}
return new StandardCheckConnectionOutput().withStatus(Status.SUCCESS);
}

// todo (cgardens) - we currently don't leverage discover in our destinations, so skipping
// implementing it... for now.
@Override
Contributor Author (@cgardens, Oct 8, 2020):

are we okay with this choice? i don't want to write instantly dead code unless we think it will not be dead very soon.

Contributor:

We should probably only support it for sources in base.sh then. It doesn't seem necessary as a separate operation. Presumably being aware of the contents of the destination is something the sync/write operation needs to know internally, without the need to expose it outside of the integration.

Contributor Author:

i think as we become cooler, it will be something we'll want. if we allow complex mapping of fields in source to fields in destination we will need it, but for now i think it's not helpful.

public StandardDiscoverSchemaOutput discover(JsonNode config) {
throw new RuntimeException("Not Implemented");
}

@Override
public DestinationConsumer<SingerMessage> write(JsonNode config, Schema schema) throws IOException {
final Path destinationDir = getDestinationPath(config);

FileUtils.forceMkdir(destinationDir.toFile());

final long now = Instant.now().toEpochMilli();
final Map<String, WriteConfig> writeConfigs = new HashMap<>();
for (final Stream stream : schema.getStreams()) {
final Path tmpPath = destinationDir.resolve(stream.getName() + "_" + now + ".csv");
final Path finalPath = destinationDir.resolve(stream.getName() + ".csv");
final FileWriter fileWriter = new FileWriter(tmpPath.toFile());
final CSVPrinter printer = new CSVPrinter(fileWriter, CSVFormat.DEFAULT.withHeader(COLUMN_NAME));
writeConfigs.put(stream.getName(), new WriteConfig(printer, tmpPath, finalPath));
}

return new CsvConsumer(writeConfigs, schema);
}

/**
* Extract provided relative path from csv config object and append to local mount path.
*
* @param config - csv config object
* @return absolute path with the relative path appended to the local volume mount.
*/
private Path getDestinationPath(JsonNode config) {
final String destinationRelativePath = config.get(DESTINATION_PATH_FIELD).asText();
Preconditions.checkNotNull(destinationRelativePath);

return Path.of(destinationRelativePath);
}

public static class WriteConfig {

private final CSVPrinter writer;
private final Path tmpPath;
private final Path finalPath;

public WriteConfig(CSVPrinter writer, Path tmpPath, Path finalPath) {
this.writer = writer;
this.tmpPath = tmpPath;
this.finalPath = finalPath;
}

public CSVPrinter getWriter() {
return writer;
}

public Path getTmpPath() {
return tmpPath;
}

public Path getFinalPath() {
return finalPath;
}

}

public static class CsvConsumer extends FailureTrackingConsumer<SingerMessage> {

private final Map<String, WriteConfig> writeConfigs;
private final Schema schema;

public CsvConsumer(Map<String, WriteConfig> writeConfigs, Schema schema) {
this.schema = schema;
LOGGER.info("initializing consumer.");

this.writeConfigs = writeConfigs;
}

@Override
protected void acceptInternal(SingerMessage singerMessage) throws Exception {
if (writeConfigs.containsKey(singerMessage.getStream())) {
writeConfigs.get(singerMessage.getStream()).getWriter().printRecord(Jsons.serialize(singerMessage.getRecord()));
} else {
throw new IllegalArgumentException(
String.format("Message contained record from a stream that was not in the catalog. \ncatalog: %s , \nmessage: %s",
Jsons.serialize(schema), Jsons.serialize(singerMessage)));
}
}

@Override
protected void close(boolean hasFailed) throws IOException {
LOGGER.info("finalizing consumer.");

for (final Map.Entry<String, WriteConfig> entries : writeConfigs.entrySet()) {
try {
entries.getValue().getWriter().flush();
entries.getValue().getWriter().close();
} catch (Exception e) {
hasFailed = true;
LOGGER.error("failed to close writer for: {}.", entries.getKey());
}
}
if (!hasFailed) {
for (final WriteConfig writeConfig : writeConfigs.values()) {
Files.move(writeConfig.getTmpPath(), writeConfig.getFinalPath(), StandardCopyOption.REPLACE_EXISTING);
}
}
for (final WriteConfig writeConfig : writeConfigs.values()) {
Files.deleteIfExists(writeConfig.getTmpPath());
}

}

}

public static void main(String[] args) throws Exception {
Contributor:

Maybe add a comment here so people know this is for local development and testing, not the actual entrypoint to the integration?

Contributor Author:

this changed. this is the actual entrypoint of the destination now. this moved because the version where IntegrationRunner was the entrypoint got into reflection and jar hell.

new IntegrationRunner(new CsvDestination()).run(args);
}

}
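The consumer's close logic commits by writing each stream to a temp file and renaming it over the final path only when no failure occurred. A minimal standalone sketch of that write-then-rename pattern (paths and names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpThenRename {

  // Write content to a temp file next to the target, then move it over the
  // final path only if writing succeeded; the partial temp file is always
  // cleaned up, so readers never observe a half-written file.
  static void writeSafely(Path finalPath, String content) throws Exception {
    final Path tmpPath = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
    boolean failed = false;
    try {
      Files.writeString(tmpPath, content, StandardCharsets.UTF_8);
    } catch (Exception e) {
      failed = true;
      throw e;
    } finally {
      if (!failed) {
        Files.move(tmpPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
      }
      Files.deleteIfExists(tmpPath); // no-op on success, cleanup on failure
    }
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("csv-demo");
    Path out = dir.resolve("users.csv");
    writeSafely(out, "data\n{\"id\":1}\n");
    System.out.println(Files.readString(out).startsWith("data")); // prints true
  }
}
```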
20 changes: 20 additions & 0 deletions airbyte-integrations/csv-destination/src/main/resources/spec.json
@@ -0,0 +1,20 @@
{
"destinationId": "",
"destinationSpecificationId": "",
"documentationUrl": "https://docs.airbyte.io/integrations/destinations/local-csv",
"specification": {
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "CSV Destination Spec",
"type": "object",
"required": ["destination_path"],
"additionalProperties": false,
"properties": {
"destination_path": {
"description": "Path to the directory where csv files will be written. Must start with the local mount \"/local\". Any other directory appended on the end will be placed inside that local mount.",
"type": "string",
"examples": ["/local"],
"pattern": "(^\\/local\\/.*)|(^\\/local$)"
Contributor Author:

enforce that destination path uses the local mount.

}
}
}
}
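The `pattern` in the spec can be exercised directly; a quick standalone check of which paths the regex accepts (example values are illustrative):

```java
import java.util.regex.Pattern;

public class SpecPatternCheck {

  public static void main(String[] args) {
    // Same regex as spec.json: the path must be exactly /local
    // or start with the /local/ prefix.
    Pattern p = Pattern.compile("(^\\/local\\/.*)|(^\\/local$)");
    System.out.println(p.matcher("/local").matches());         // true
    System.out.println(p.matcher("/local/my/data").matches()); // true
    System.out.println(p.matcher("/tmp/airbyte").matches());   // false
    System.out.println(p.matcher("/localfoo").matches());      // false
  }
}
```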
34 changes: 34 additions & 0 deletions docs/integrations/destinations/local-csv2.md
@@ -0,0 +1,34 @@
# Local CSV
Contributor Author:

will replace local-csv.md with this one once we switch to this integration. it would be nice if we could put this doc in airbyte-integrations/csv-destination.


## Overview

This destination writes data to a directory on the _local_ filesystem on the host running Airbyte. By default, data is written to `/tmp/airbyte_local`. To change this location, modify the `LOCAL_ROOT` environment variable for Airbyte.

### Sync Overview

#### Output schema

This destination outputs files with the name of the stream. Each row will be written as a new line in the output CSV file.

#### Data Type Mapping

The output file will have a single column called `data` which will be populated by the full record as a json blob.

#### Features


| Feature | Supported |
| :--- | :--- |
| Full Refresh Sync | Yes |

#### Performance considerations

This integration will be constrained by the speed at which your filesystem accepts writes.

## Getting Started

### Requirements:

* The `destination_path` field must start with `/local` which is the name of the local mount that points to `LOCAL_ROOT`. Any other directories in this path will be placed inside the `LOCAL_ROOT`. By default, the value of `LOCAL_ROOT` is `/tmp/airbyte_local`. e.g. if `destination_path` is `/local/my/data`, the output will be written to `/tmp/airbyte_local/my/data`.
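The mapping from `destination_path` to a host path described above can be sketched as follows (the `LOCAL_ROOT` default is taken from the text; the helper name is illustrative):

```java
import java.nio.file.Path;

public class LocalPathMapping {

  // Replace the /local prefix of destination_path with the host-side
  // LOCAL_ROOT mount point.
  static Path hostPath(String destinationPath, Path localRoot) {
    final Path rel = Path.of("/local").relativize(Path.of(destinationPath));
    return localRoot.resolve(rel).normalize();
  }

  public static void main(String[] args) {
    Path root = Path.of("/tmp/airbyte_local"); // default LOCAL_ROOT
    System.out.println(hostPath("/local/my/data", root)); // prints /tmp/airbyte_local/my/data
  }
}
```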