Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java csv destination #505

Merged
merged 17 commits into from
Oct 8, 2020
Merged

java csv destination #505

merged 17 commits into from
Oct 8, 2020

Conversation

cgardens
Copy link
Contributor

@cgardens cgardens commented Oct 6, 2020

What

  • Fixes issue in the javabase where standard in was not being passed all the way to the integration code.
  • Removes any knowledge of local mount from the integration.
  • Add FailureTrackingConsumer that makes it easy to write a destination consumer that passes in whether or not any failures occurred into close.
  • Adds local csv destination that does not depend on singer (and adheres to our new protocol of outputting a blob).
    • This works running just with docker run. I have not tested it against the worker.
      e.g.
cat /Users/charles/code/airbyte/airbyte-integrations/csv-destination/src/main/resources/messages.txt | docker run -i -v /Users/charles/code/airbyte/airbyte-integrations/csv-destination/src/main/resources/:/local_resource -v /tmp:/local airbyte/airbyte-csv-destination:dev write --config /local_resource/config.txt --catalog /local_resource/catalog.txt

Checklist

  • Pass in the local mount path from the dockerfile / .env (instead of assuming it is "/local") handle this in the configuration.
  • Unit tests for FailureTrackingConsumer
  • Unit tests for CsvDestination
  • Integration tests for the integration (maybe in separate PR)

Recommended reading order

  1. FailureTrackingConsumer.java
  2. CsvDestination.java
  3. spec.json
  4. component.ts
  5. the rest

Base automatically changed from cgardens/java_base to master October 7, 2020 18:21
@cgardens cgardens force-pushed the cgardens/java_csv_destination branch 2 times, most recently from 843b954 to 33b88a6 Compare October 7, 2020 22:41
@cgardens cgardens changed the base branch from master to cgardens/schema_to_catalog October 7, 2020 22:42
Base automatically changed from cgardens/schema_to_catalog to master October 7, 2020 22:57
@cgardens cgardens force-pushed the cgardens/java_csv_destination branch from 6e9d104 to 1300a65 Compare October 8, 2020 00:26
@@ -36,8 +34,4 @@
public static String ARGS_CATALOG_DESC = "input path for the catalog";
public static String ARGS_PATH_DESC = "path to the json-encoded state file";

// todo (cgardens) - this mount path should be passed in by the worker and read as an arg or
// environment variable by the runner.
public static Path LOCAL_MOUNT = Path.of("/local");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the need for an integration to know about this.


// todo (cgardens) - we currently don't leverage discover in our destinations, so skipping
// implementing it... for now.
@Override
Copy link
Contributor Author

@cgardens cgardens Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we okay with this choice? i don't want to write instantly dead code unless we think it will not be dead very soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably only support it for sources in bash.sh then. It doesn't seem necessary as a separate operation. Presumably being aware of the contents of the destination is something the sync/write operation needs to know internally without the need to expose it outside of the integration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think as we become cooler, it will be something we'll want. if we allow complex mapping of fields in source to fields in destination we will need, but for now. i think it's not helpful.

"description": "Path to the directory where csv files will be written. Must start with the local mount \"/local\". Any other directory appended on the end will be placed inside that local mount.",
"type": "string",
"examples": ["/local"],
"pattern": "(^\\/local\\/.*)|(^\\/local$)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enforce that destination path uses the local mount.

@@ -0,0 +1,34 @@
# Local CSV
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will replace local-csv.md with this one once we switch to this integration. it would be nice, if we could put this doc in the airbyte-integrations/csv-destination.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class FailureTrackingConsumer<T> implements DestinationConsumer<T> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in love with the name of this class. would love suggestions.

@cgardens cgardens marked this pull request as ready for review October 8, 2020 00:39
@@ -4,4 +4,4 @@ set -e

# wrap run script in a script so that we can lazy evaluate the value of APPLICATION. APPLICATION is
# set by the dockerfile that inherits base-java, so it cannot be evaluated when base-java is built.
bin/"$APPLICATION" "$@"
cat <&0 | bin/"$APPLICATION" "$@"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the integration runner be responsible for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline. agreed this was right. also agreed to add a comment explaining what's happening. which i did.

@@ -51,6 +51,9 @@ function main() {
# todo: state should be optional: --state "$STATE_FILE"
eval "$AIRBYTE_READ_CMD" --config "$CONFIG_FILE" --catalog "$CATALOG_FILE"
;;
write)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should pass in an env var for source/dest so only one of read and write are valid at this level and show an error if the wrong one is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. let's do this in a separate PR.


// todo (cgardens) - we currently don't leverage discover in our destinations, so skipping
// implementing it... for now.
@Override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably only support it for sources in bash.sh then. It doesn't seem necessary as a separate operation. Presumably being aware of the contents of the destination is something the sync/write operation needs to know internally without the need to expose it outside of the integration.


}

public static void main(String[] args) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here so people know this is for local development and testing, not the actual entrypoint to the integration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this changed. this is the actual entrypoint of the destination now. this moved because the version where IntegrationRunner was the entrypoint got into reflection and jar hell.

@cgardens cgardens merged commit 1bf0de6 into master Oct 8, 2020
@cgardens cgardens deleted the cgardens/java_csv_destination branch October 8, 2020 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants