Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSV imports (UI) #3845

Merged
merged 9 commits into from
Mar 26, 2024
Merged

CSV imports (UI) #3845

merged 9 commits into from
Mar 26, 2024

Conversation

ruslandoga
Copy link
Contributor

@ruslandoga ruslandoga commented Feb 28, 2024

Changes

This PR adds UI for CSV imports via S3.

screen-recording-2024-03-20-at-183854_Esviayqm.mp4

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

Dark mode

  • The UI has been tested both in dark and light mode

Sorry, something went wrong.

@ruslandoga ruslandoga changed the title add basic CSV import CSV import UI Feb 28, 2024
This was referenced Mar 12, 2024
@ruslandoga ruslandoga changed the title CSV import UI CSV imports (UI) Mar 14, 2024
@ruslandoga ruslandoga mentioned this pull request Mar 15, 2024
4 tasks
@ruslandoga ruslandoga force-pushed the allow-csv-imports-ui branch 8 times, most recently from e25903d to a05b801 Compare March 20, 2024 10:35
@ruslandoga ruslandoga marked this pull request as ready for review March 20, 2024 10:35
@ruslandoga ruslandoga requested a review from zoldar March 20, 2024 10:35
Copy link
Contributor

@zoldar zoldar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase the branch against current master? 🙏

@@ -708,7 +708,7 @@ defmodule PlausibleWeb.SiteController do
|> redirect(external: Routes.site_path(conn, :settings_integrations, site.domain))
end

def export(conn, _params) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ruslandoga ruslandoga force-pushed the allow-csv-imports-ui branch from d9cec1e to c817a13 Compare March 21, 2024 12:07
@zoldar
Copy link
Contributor

zoldar commented Mar 21, 2024

  1. The upload dialog does not let me select particular files, I'm not sure why 🤔. The only way I can select the files is entering the (extracted) export directory and clicking "Upload". FF gives a warning that I'm trying to upload all the files from a given folder but they appear in the upload list once I confirm it.
    image

  2. I have first made an export from freshly loaded seeds, made sure minio container is running, extracted the export, uploaded it and initiated import. The worker seems to start and it reaches https://github.com/ruslandoga/analytics/blob/50f6a3d7bab940a2f3bc11cd4e26590e0533019f/lib/plausible/imported/csv_importer.ex#L41 for the first upload but remains stuck there for several minutes without any activity (the task is hanging in "executing" state for that time as well). Have you experienced anything like that?

  3. I'm not sure if we have discussed this, but are we planning to support uploading zipped archive as well? That would make the export->import process more streamlined for the common scenario of moving between hosted and cloud, one way or another.

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Mar 21, 2024

  1. I'll try using Firefox tomorrow.
  2. No
  3. Yeah, but the importer would need to be reworked to support Zip archives, maybe https://clickhouse.com/blog/clickhouse-release-23-08#direct-import-from-archives-nikita-keba-antonio-andelic-pavel-kruglov could be used once it lands: Direct import from archives don't work for table function s3 ClickHouse/ClickHouse#54726

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Mar 22, 2024

Removing webkitdirectory added in e803909 makes it "work" in FF. Here're some other problems I noticed:

  • with webkitdirectory set FF doesn't support directory drag and drop
  • with webkitdirectory set FF doesn't support picking individual files
  • with webkitdirectory set Chrome doesn't support picking individual files, but supports directory drag and drop
  • with webkitdirectory set both Chrome and FF warn about multiple files upload (and they both wrongly show the number of files as 10, I guess they count directory as a file)

I removed webkitdirectory in a4840c1 as it seems less intuitive than a basic multiple file upload.

Thank you very much for spotting it! I should've tested browsers other than Safari before committing.

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Mar 22, 2024

Re 2, are there any errors on the Oban job? Maybe ClickHouse S3 timeout or similar. In that case it probably means you need to set S3_CLICKHOUSE_HOST=<MinIO ip address, from docker network inspect bridge> to make MinIO visible from ClickHouse container.

More info:

the first upload but remains stuck there for several minutes without any activity

Hm, but S3 timeout happens after 60s, so maybe it's something else. With S3 timeout, it would also appear to get stuck on a different line, right above Ch.query!

@zoldar
Copy link
Contributor

zoldar commented Mar 22, 2024

Oban Job kept executing without any errors.

I have resolved the issue by swapping "host.docker.internal" for "localhost" in S3 URLs sent to import worker:

diff --git a/lib/plausible/imported/csv_importer.ex b/lib/plausible/imported/csv_importer.ex
index 6376d851d..6f6c422ee 100644
--- a/lib/plausible/imported/csv_importer.ex
+++ b/lib/plausible/imported/csv_importer.ex
@@ -65,7 +65,9 @@ defmodule Plausible.Imported.CSVImporter do
           "end_date" => end_date
         }
 
+      IO.inspect upload, label: :ABOUT_TO_QUERY
       Ch.query!(ch, statement, params, timeout: :infinity)
+      IO.inspect upload, label: :AFTER_QUERY # THIS WAS NEVER SHOWN UNTIL I CHANGED THE HOST
     end)
   rescue
     # we are cancelling on any argument or ClickHouse errors
diff --git a/lib/plausible_web/live/csv_import.ex b/lib/plausible_web/live/csv_import.ex
index 38c97ca17..f538b3e5a 100644
--- a/lib/plausible_web/live/csv_import.ex
+++ b/lib/plausible_web/live/csv_import.ex
@@ -168,6 +168,8 @@ defmodule PlausibleWeb.Live.CSVImport do
     %{s3_url: s3_url, presigned_url: upload_url} =
       Plausible.S3.import_presign_upload(socket.assigns.site_id, entry.client_name)
 
+    s3_url = String.replace(s3_url, "localhost", "host.docker.internal")
+
     {:ok, %{uploader: "S3", s3_url: s3_url, url: upload_url}, socket}
   end

After the change, the import proceeds basically immediately.

In CH logs I can see connection failing with "connection refused" when running the insert query but it surprisingly seems to not propagate back to the client somehow? I'm not sure, I haven't digged too deep into this.

@zoldar
Copy link
Contributor

zoldar commented Mar 22, 2024

BTW something for a follow-up perhaps but IMHO it would make sense to hide CSV export and import UI when S3 is not configured, at least until local storage version is implemented. WDYT?

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Mar 22, 2024

When I don't set S3_CLICKHOUSE_HOST or replace localhost in the S3 URLs Oban jobs fail for me:

iex> Plausible.Repo.all Oban.Job
[
  %Oban.Job{
    __meta__: #Ecto.Schema.Metadata<:loaded, "oban_jobs">,
    id: 19,
    state: "discarded",
    queue: "analytics_imports",
    worker: "Plausible.Workers.ImportAnalytics",
    args: %{
      "end_date" => "2024-01-23",
      "import_id" => 10,
      "start_date" => "2023-02-09",
      "uploads" => [
        %{
          "filename" => "imported_visitors_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_visitors_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_sources_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_sources_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_pages_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_pages_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_operating_systems_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_operating_systems_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_locations_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_locations_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_exit_pages_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_exit_pages_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_entry_pages_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_entry_pages_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_devices_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_devices_20230209_20240123.csv"
        },
        %{
          "filename" => "imported_browsers_20230209_20240123.csv",
          "s3_url" => "http://localhost:10000/dev-imports/3/imported_browsers_20230209_20240123.csv"
        }
      ]
    },
    meta: %{},
    tags: [],
    errors: [
      %{
        "at" => "2024-03-22T10:11:52.453813Z",
        "attempt" => 1,
        "error" => "** (Oban.PerformError) Plausible.Workers.ImportAnalytics failed with {:discard, \"Code: 499. DB::Exception: Failed to get object info: Poco::Exception. Code: 1000, e.code() = 111, Connection refused (version 23.3.7.5 (official build)). HTTP response code: 18446744073709551615. (S3_ERROR) (version 23.3.7.5 (official build))\\n\"}"
      }
    ],
    attempt: 1,
    attempted_by: ["192", "eac88a52-9e1c-4360-af93-b4655763d6a2"],
    max_attempts: 3,
    priority: 0,
    attempted_at: ~U[2024-03-22 10:11:26.687453Z],
    cancelled_at: nil,
    completed_at: nil,
    discarded_at: ~U[2024-03-22 10:11:52.453798Z],
    inserted_at: ~U[2024-03-22 10:11:26.652690Z],
    scheduled_at: ~U[2024-03-22 10:11:26.652690Z],
    conf: nil,
    conflict?: false,
    replace: nil,
    unique: nil,
    unsaved_error: nil
  }
]

I wonder what's different between our setups (I'm running make clickhouse-prod and make minio). Ideally the jobs would fail properly for everyone.

@ruslandoga
Copy link
Contributor Author

ruslandoga commented Mar 22, 2024

BTW something for a follow-up perhaps but IMHO it would make sense to hide CSV export and import UI when S3 is not configured, at least until local storage version is implemented. WDYT?

It's already implemented :) I'll just copy it from #3692 once this PR is finished. I've started experimenting with various export/import implementations for CE. I'll keep them in draft for now but I'm sure they would be ready by the time the feature flag goes away.

This was referenced Mar 25, 2024
@zoldar zoldar merged commit c263df5 into plausible:master Mar 26, 2024
9 checks passed
@ruslandoga ruslandoga deleted the allow-csv-imports-ui branch June 10, 2024 16:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants