java csv destination #505
@@ -0,0 +1,54 @@
/*
 * MIT License
 *
 * Copyright (c) 2020 Airbyte
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

package io.airbyte.integrations.base;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class FailureTrackingConsumer<T> implements DestinationConsumer<T> {

Review comment: not in love with the name of this class. would love suggestions.

  private static final Logger LOGGER = LoggerFactory.getLogger(FailureTrackingConsumer.class);

  private boolean hasFailed = false;

  protected abstract void acceptInternal(T t) throws Exception;

  public void accept(T t) throws Exception {
    try {
      acceptInternal(t);
    } catch (Exception e) {
      hasFailed = true;
      throw e;
    }
  }

  protected abstract void close(boolean hasFailed) throws Exception;

  public void close() throws Exception {
    LOGGER.info("hasFailed: {}.", hasFailed);
    close(hasFailed);
  }

}
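To make the contract above concrete, here is a minimal, hypothetical subclass. It is not part of this PR and the name LineBufferingConsumer is invented for illustration; it only shows that an exception thrown from acceptInternal marks the consumer as failed, and that close(boolean) can use the flag to discard partial output.

// Hypothetical sketch only: LineBufferingConsumer is not part of the Airbyte codebase.
import java.util.ArrayList;
import java.util.List;

public class LineBufferingConsumer extends FailureTrackingConsumer<String> {

  private final List<String> buffer = new ArrayList<>();

  @Override
  protected void acceptInternal(String line) {
    // any exception thrown here flips hasFailed in the parent class before propagating
    buffer.add(line);
  }

  @Override
  protected void close(boolean hasFailed) {
    // hasFailed is true if any acceptInternal call threw; only publish on success
    if (!hasFailed) {
      buffer.forEach(System.out::println);
    }
  }

}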
@@ -24,8 +24,6 @@

package io.airbyte.integrations.base;

import java.nio.file.Path;

public class JavaBaseConstants {

  public static String ARGS_CONFIG_KEY = "config";

@@ -36,8 +34,4 @@ public class JavaBaseConstants {
  public static String ARGS_CATALOG_DESC = "input path for the catalog";
  public static String ARGS_PATH_DESC = "path to the json-encoded state file";

  // todo (cgardens) - this mount path should be passed in by the worker and read as an arg or
  // environment variable by the runner.
  public static Path LOCAL_MOUNT = Path.of("/local");

Review comment: removed the need for an integration to know about this.

}
@@ -51,6 +51,9 @@ function main() {
      # todo: state should be optional: --state "$STATE_FILE"
      eval "$AIRBYTE_READ_CMD" --config "$CONFIG_FILE" --catalog "$CATALOG_FILE"
      ;;
    write)
      eval "$AIRBYTE_WRITE_CMD" --config "$CONFIG_FILE" --catalog "$CATALOG_FILE"
      ;;

Review comment (on the new write case): we should pass in an env var for source/dest so only one of read and write are valid at this level and show an error if the wrong one is called.

Reply: agreed. let's do this in a separate PR.

    *)
      error "Unknown command: $CMD"
      ;;
@@ -0,0 +1,3 @@
*
!Dockerfile
!build
@@ -0,0 +1,8 @@
FROM airbyte/base-java:dev

WORKDIR /airbyte
ENV APPLICATION csv-destination

COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar

RUN tar xf ${APPLICATION}.tar --strip-components=1
@@ -0,0 +1,31 @@
import com.bmuschko.gradle.docker.tasks.image.DockerBuildImage

plugins {
    id 'com.bmuschko.docker-remote-api'
    id 'application'
}

dependencies {
    implementation project(':airbyte-config:models')
    implementation project(':airbyte-singer')
    implementation project(':airbyte-integrations:base-java')

    implementation 'org.apache.commons:commons-csv:1.4'
}

application {
    mainClass = 'io.airbyte.integrations.destination.csv.CsvDestination'
}

def image = 'airbyte/airbyte-csv-destination:dev'

task imageName {
    doLast {
        println "IMAGE $image"
    }
}

task buildImage(type: DockerBuildImage) {
    inputDir = projectDir
    images.add(image)
    dependsOn ':airbyte-integrations:base-java:buildImage'
}
@@ -0,0 +1,197 @@
/*
 * MIT License
 *
 * Copyright (c) 2020 Airbyte
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */

package io.airbyte.integrations.destination.csv;

import com.fasterxml.jackson.databind.JsonNode;
import com.google.common.base.Preconditions;
import io.airbyte.commons.json.Jsons;
import io.airbyte.commons.resources.MoreResources;
import io.airbyte.config.DestinationConnectionSpecification;
import io.airbyte.config.Schema;
import io.airbyte.config.StandardCheckConnectionOutput;
import io.airbyte.config.StandardCheckConnectionOutput.Status;
import io.airbyte.config.StandardDiscoverSchemaOutput;
import io.airbyte.config.Stream;
import io.airbyte.integrations.base.Destination;
import io.airbyte.integrations.base.DestinationConsumer;
import io.airbyte.integrations.base.FailureTrackingConsumer;
import io.airbyte.integrations.base.IntegrationRunner;
import io.airbyte.singer.SingerMessage;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.io.FileUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class CsvDestination implements Destination {

  private static final Logger LOGGER = LoggerFactory.getLogger(CsvDestination.class);

  private static final String COLUMN_NAME = "data"; // we output all data as a blob to a single column called "data".
  private static final String DESTINATION_PATH_FIELD = "destination_path";

  @Override
  public DestinationConnectionSpecification spec() throws IOException {
    final String resourceString = MoreResources.readResource("spec.json");
    return Jsons.deserialize(resourceString, DestinationConnectionSpecification.class);
  }

  @Override
  public StandardCheckConnectionOutput check(JsonNode config) {
    try {
      FileUtils.forceMkdir(getDestinationPath(config).toFile());
    } catch (IOException e) {
      return new StandardCheckConnectionOutput().withStatus(Status.FAILURE).withMessage(e.getMessage());
    }
    return new StandardCheckConnectionOutput().withStatus(Status.SUCCESS);
  }

  // todo (cgardens) - we currently don't leverage discover in our destinations, so skipping
  // implementing it... for now.
  @Override
  public StandardDiscoverSchemaOutput discover(JsonNode config) {
    throw new RuntimeException("Not Implemented");
  }

Review comment (on the unimplemented discover): are we okay with this choice? i don't want to write instantly dead code unless we think it will not be dead very soon.

Reply: We should probably only support it for sources in bash.sh then. It doesn't seem necessary as a separate operation. Presumably being aware of the contents of the destination is something the sync/write operation needs to know internally without the need to expose it outside of the integration.

Reply: i think as we become cooler, it will be something we'll want. if we allow complex mapping of fields in source to fields in destination we will need it, but for now i think it's not helpful.
  @Override
  public DestinationConsumer<SingerMessage> write(JsonNode config, Schema schema) throws IOException {
    final Path destinationDir = getDestinationPath(config);

    FileUtils.forceMkdir(destinationDir.toFile());

    final long now = Instant.now().toEpochMilli();
    final Map<String, WriteConfig> writeConfigs = new HashMap<>();
    for (final Stream stream : schema.getStreams()) {
      final Path tmpPath = destinationDir.resolve(stream.getName() + "_" + now + ".csv");
      final Path finalPath = destinationDir.resolve(stream.getName() + ".csv");
      final FileWriter fileWriter = new FileWriter(tmpPath.toFile());
      final CSVPrinter printer = new CSVPrinter(fileWriter, CSVFormat.DEFAULT.withHeader(COLUMN_NAME));
      writeConfigs.put(stream.getName(), new WriteConfig(printer, tmpPath, finalPath));
    }

    return new CsvConsumer(writeConfigs, schema);
  }

  /**
   * Extract provided relative path from csv config object and append to local mount path.
   *
   * @param config - csv config object
   * @return absolute path with the relative path appended to the local volume mount.
   */
  private Path getDestinationPath(JsonNode config) {
    final String destinationRelativePath = config.get(DESTINATION_PATH_FIELD).asText();
    Preconditions.checkNotNull(destinationRelativePath);

    return Path.of(destinationRelativePath);
  }

  public static class WriteConfig {

    private final CSVPrinter writer;
    private final Path tmpPath;
    private final Path finalPath;

    public WriteConfig(CSVPrinter writer, Path tmpPath, Path finalPath) {
      this.writer = writer;
      this.tmpPath = tmpPath;
      this.finalPath = finalPath;
    }

    public CSVPrinter getWriter() {
      return writer;
    }

    public Path getTmpPath() {
      return tmpPath;
    }

    public Path getFinalPath() {
      return finalPath;
    }

  }
  public static class CsvConsumer extends FailureTrackingConsumer<SingerMessage> {

    private final Map<String, WriteConfig> writeConfigs;
    private final Schema schema;

    public CsvConsumer(Map<String, WriteConfig> writeConfigs, Schema schema) {
      this.schema = schema;
      LOGGER.info("initializing consumer.");

      this.writeConfigs = writeConfigs;
    }

    @Override
    protected void acceptInternal(SingerMessage singerMessage) throws Exception {
      if (writeConfigs.containsKey(singerMessage.getStream())) {
        writeConfigs.get(singerMessage.getStream()).getWriter().printRecord(Jsons.serialize(singerMessage.getRecord()));
      } else {
        throw new IllegalArgumentException(
            String.format("Message contained record from a stream that was not in the catalog. \ncatalog: %s , \nmessage: %s",
                Jsons.serialize(schema), Jsons.serialize(singerMessage)));
      }
    }

    @Override
    protected void close(boolean hasFailed) throws IOException {
      LOGGER.info("finalizing consumer.");

      for (final Map.Entry<String, WriteConfig> entries : writeConfigs.entrySet()) {
        try {
          entries.getValue().getWriter().flush();
          entries.getValue().getWriter().close();
        } catch (Exception e) {
          hasFailed = true;
          LOGGER.error("failed to close writer for: {}.", entries.getKey());
        }
      }
      if (!hasFailed) {
        for (final WriteConfig writeConfig : writeConfigs.values()) {
          Files.move(writeConfig.getTmpPath(), writeConfig.getFinalPath(), StandardCopyOption.REPLACE_EXISTING);
        }
      }
      for (final WriteConfig writeConfig : writeConfigs.values()) {
        Files.deleteIfExists(writeConfig.getTmpPath());
      }
    }

  }

  public static void main(String[] args) throws Exception {
    new IntegrationRunner(new CsvDestination()).run(args);
  }

Review comment (on the main method): Maybe add a comment here so people know this is for local development and testing, not the actual entrypoint to the integration?

Reply: this changed. this is the actual entrypoint of the destination now. this moved because the version where IntegrationRunner was the entrypoint got into reflection and jar hell.

}
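The interesting part of CsvConsumer.close() is that each stream is written to a timestamped temporary file and only promoted to the final per-stream file when nothing has failed. The following standalone sketch is not Airbyte code; the directory, stream name, and file contents are invented purely to illustrate that promote-on-success step with plain java.nio.

// Self-contained sketch of the tmp-file-then-promote pattern used in CsvConsumer.close().
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpFilePromotionSketch {

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("csv_sketch");
    Path tmpPath = dir.resolve("users_" + System.currentTimeMillis() + ".csv");
    Path finalPath = dir.resolve("users.csv");

    // header plus one record (CSV quoting elided for brevity)
    Files.writeString(tmpPath, "data\n{\"id\":1}\n");

    boolean hasFailed = false; // would be set by FailureTrackingConsumer in the real flow
    if (!hasFailed) {
      // promote on success only, so a failed sync never clobbers the previous output
      Files.move(tmpPath, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }
    // the tmp file is removed whether or not the sync succeeded
    Files.deleteIfExists(tmpPath);

    System.out.println(Files.readString(finalPath));
  }

}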
@@ -0,0 +1,20 @@
{
  "destinationId": "",
  "destinationSpecificationId": "",
  "documentationUrl": "https://docs.airbyte.io/integrations/destinations/local-csv",
  "specification": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "CSV Destination Spec",
    "type": "object",
    "required": ["destination_path"],
    "additionalProperties": false,
    "properties": {
      "destination_path": {
        "description": "Path to the directory where csv files will be written. Must start with the local mount \"/local\". Any other directory appended on the end will be placed inside that local mount.",
        "type": "string",
        "examples": ["/local"],
        "pattern": "(^\\/local\\/.*)|(^\\/local$)"
      }
    }
  }
}

Review comment (on the pattern field): enforce that destination path uses the local mount.
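As a quick illustration of what that pattern accepts, here is a small standalone check that is not part of the PR. It mirrors the spec's regex in plain java.util.regex; because the alternation is anchored, full-string matching here behaves the same as the search-style matching JSON Schema validators apply to "pattern".

// Illustration only: mirrors the spec's pattern (^\/local\/.*)|(^\/local$) in Java.
import java.util.regex.Pattern;

public class DestinationPathPatternCheck {

  private static final Pattern DESTINATION_PATH_PATTERN = Pattern.compile("(^/local/.*)|(^/local$)");

  public static void main(String[] args) {
    System.out.println(DESTINATION_PATH_PATTERN.matcher("/local").matches());         // true
    System.out.println(DESTINATION_PATH_PATTERN.matcher("/local/my/data").matches()); // true
    System.out.println(DESTINATION_PATH_PATTERN.matcher("/tmp/other").matches());     // false
  }

}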
@@ -0,0 +1,34 @@
# Local CSV

Review comment: will replace local-csv.md with this one once we switch to this integration. it would be nice if we could put this doc in the …

## Overview

This destination writes data to a directory on the _local_ filesystem on the host running Airbyte. By default, data is written to `/tmp/airbyte_local`. To change this location, modify the `LOCAL_ROOT` environment variable for Airbyte.

### Sync Overview

#### Output schema

This destination writes one CSV file per stream, named after the stream. Each record is written as a new line in the stream's output file.

#### Data Type Mapping

The output file has a single column called `data`, which is populated with the full record as a JSON blob.

#### Features

| Feature | Supported |
| :--- | :--- |
| Full Refresh Sync | Yes |

#### Performance considerations

This integration is constrained by the speed at which your filesystem accepts writes.

## Getting Started

### Requirements

* The `destination_path` field must start with `/local`, which is the name of the local mount that points to `LOCAL_ROOT`. Any other directories in this path will be placed inside `LOCAL_ROOT`. By default, `LOCAL_ROOT` is `/tmp/airbyte_local`, so if `destination_path` is `/local/my/data`, the output will be written to `/tmp/airbyte_local/my/data` (see the sketch below).

Review comment: Shouldn't the integration runner be responsible for this?

Reply: discussed offline. agreed this was right. also agreed to add a comment explaining what's happening. which i did.
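To make the `/local` to `LOCAL_ROOT` mapping concrete, here is a standalone sketch of the documented path arithmetic. This is not the code the Airbyte worker runs; per the thread above and the docs, the substitution appears to be handled by the local volume mount rather than by the integration, and the paths below are just the defaults from this document.

// Sketch of the documented mapping only; in practice the Docker volume mount that the
// worker sets up performs this substitution, not Java code inside the integration.
import java.nio.file.Path;

public class LocalMountResolutionSketch {

  public static void main(String[] args) {
    Path localRoot = Path.of("/tmp/airbyte_local");   // default LOCAL_ROOT per the docs
    Path destinationPath = Path.of("/local/my/data"); // value of destination_path

    // strip the /local prefix and re-root the remainder under LOCAL_ROOT
    Path relative = Path.of("/local").relativize(destinationPath); // my/data
    System.out.println(localRoot.resolve(relative));               // /tmp/airbyte_local/my/data
  }

}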