Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add csv importer #3795

Merged
merged 17 commits into from
Feb 27, 2024
Merged

add csv importer #3795

merged 17 commits into from
Feb 27, 2024

Conversation

ruslandoga
Copy link
Contributor

@ruslandoga ruslandoga commented Feb 19, 2024

Changes

This PR implements the CSV importer from S3.

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

Dark mode

  • This PR does not change the UI

@ruslandoga ruslandoga self-assigned this Feb 20, 2024
Enum.map(uploads, fn upload ->
%{"filename" => filename, "s3_path" => s3_path} = upload

".csv" = Path.extname(filename)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revisit this after implementing imports. I think we'd do this sort of check there. If not, I'll make it into an ArgumentError to cancel the job early.

@ruslandoga ruslandoga marked this pull request as ready for review February 21, 2024 13:38
@ruslandoga ruslandoga requested a review from zoldar February 21, 2024 13:38
@ruslandoga
Copy link
Contributor Author

ruslandoga commented Feb 21, 2024

I'll need to add some documentation but the implementation itself and the tests are ready for a review.

# uses https://min.io
# docker run -d --rm -p 9000:9000 -p 9001:9001 --name minio minio/minio server /data --console-address ":9001"
# docker exec minio mc alias set local http://localhost:9000 minioadmin minioadmin
# docker exec minio mc mb local/imports
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments describe how to run MinIO locally for these tests. I'm using Docker for both MinIO and ClickHouse. ClickHouse accesses MinIO through Docker host (172.17.0.1).

Copy link
Contributor

@zoldar zoldar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 We'll most likely need to refinement because import date range will have to be known up front, but it can be tackled in a follow-up.


@moduletag :minio

# uses https://min.io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also add minio setup to Makefile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the tests @cnkk suggested using testcontainers

But for development it definitely would be useful to update Makefile as well. I'll do it

Copy link
Contributor Author

@ruslandoga ruslandoga Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Makefile changes in eb905f3 but testcontainers seems a bit more difficult 34b7b0b

For some reason the previous approach of accessing MinIO via Docker host doesn't work anymore locally but keeps working in the CI.

  • testcontainers reuse default bridge network, containers started by testcontainrers get 172.17.0.1/16 addresses, so the network is shared
  • on CI ClickHouse container can access MinIO via the host on 172.17.0.1:mapped-port but cannot access it on its ip address and "real port" like 172.17.0.6:9000
  • on my laptop (Docker Desktop) the situation is reversed: 172.17.0.6:9000 is accessible from ClickHouse, where 172.17.0.6 is the IP address for MinIO, but 172.17.0.1:mapped-port is not
  • switching back to "manual" container created with make minio ClickHouse can access MinIO at 172.17.0.1:6000 in both CI and Docker Desktop

Copy link
Contributor Author

@ruslandoga ruslandoga Feb 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the CI tests pass, I think this can be left for later. I don't think anyone would be running minio tests other than me and CI anyway :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, at least for me, tests work fine locally with testcontainer based setup provided I setup CH via make clickhouse.

%{"table" => table, "site_id" => site_id, "import_id" => import_id}
)

Date.range(min_date, max_date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something that can be tackled in a follow-up, but eventually we'll need to always know import date range before executing it. We have to determine the boundaries up front so that imported stats do not overlap with native stats. We might also provide an option to let user adjust the import date boundaries.

@ruslandoga ruslandoga requested a review from zoldar February 24, 2024 16:29
@ruslandoga ruslandoga requested a review from a team February 26, 2024 16:44
Copy link
Contributor

@zoldar zoldar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor stuff.


@moduletag :minio

# uses https://min.io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, at least for me, tests work fine locally with testcontainer based setup provided I setup CH via make clickhouse.

Co-authored-by: Adrian Gruntkowski <[email protected]>
@zoldar zoldar merged commit f3423ae into plausible:master Feb 27, 2024
4 checks passed
@ruslandoga ruslandoga removed their assignment Mar 19, 2024
@ruslandoga ruslandoga mentioned this pull request Mar 22, 2024
4 tasks
@ruslandoga ruslandoga deleted the add-s3-csv-importer branch June 10, 2024 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants