investigating: slow S3 proxying #1884

Closed
srenatus opened this issue Dec 13, 2024 · 4 comments

Comments

@srenatus

Heya!

Thanks again for OF, I like using this a lot. ✨

I've got a problem, though, with serving the reports from S3. We're using a proxy that does the corp's authentication, and then services the requests by retrieving and forwarding the files from S3 (using Caddy + https://github.com/sagikazarmark/caddy-fs-s3). It works pretty well, in general, but fetching a 12MB duckdb database is taking surprisingly long -- it's basically trickling in:

Image

This makes the overall report viewing feel quite sluggish:
Image

Now, I don't know where to start. My first hunch is that if it just fetched the file at once, it would probably perform better -- I know that duckdb does range-requests, but in this case, I'd like to opt out if possible. Relatedly, I don't know if this really is an OF problem or a duckdb-wasm one. I don't think it's got to do with our proxying setup: I can curl the entire file (through the proxy) in ~0.12s.

If you've got any hints to share, I'd appreciate it. Maybe I'm missing something very basic here 😳

@mbostock
Member

This sounds like it was fixed in 7704416 (#1734). What version of Framework are you using? Or are you explicitly asking DuckDB to create a view instead of a table?

@srenatus
Author

I'm on the latest release,

└── @observablehq/[email protected]

But your snippet is useful -- it's starting with a check for .parquet and I am using a duckdb "db file" which I've put into the preamble as

---
toc: true
sql:
  thisthing: ./data/thisthing.db
---

I don't suppose I can use parquet files there, can I? 💡 I most certainly can. And I've only got one table in each db file anyways, so there's nothing lost. 🥳

Then this is what I'll be trying next. Thanks a bunch! I think this is most likely resolved, I'll reopen and come back here if it's not.
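
For anyone following along, the front matter change described above might look roughly like the sketch below. It assumes the single table has already been exported from the db file to a Parquet file; the path `./data/thisthing.parquet` is hypothetical and simply mirrors the original file name.

```yaml
---
toc: true
sql:
  # Hypothetical path: the single table from thisthing.db, exported as Parquet
  thisthing: ./data/thisthing.parquet
---
```

The key under `sql` stays the same, so existing queries against `thisthing` should not need to change, provided the exported table has the same columns.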

@mbostock
Member

Maybe we should increase the default block size for the ATTACH statement, or allow it to be customizable.

@srenatus
Author

Yep, this looks better now:

Image

In summary:

Image

Thanks again!
