[Feature] add new dbt.deps type: url to internally hosted tarball #4205
Comments
in support of dbt-labs#4205
I've started packaging this up on a fork here. Would love to get approval for running workflow-related tests ❤️. Thanks!
@tskleonard I appreciate the use case you're outlining! It sounds like support for an arbitrary tarball is a reasonable compromise: it would unblock users in your situation, while still putting most of the onus on them to host the right code at the right URLs. In particular, I like the point about not supporting version/dependency resolution. If that's something you want, then a self-hosted Hub mirror is actually the right-sized solution! I agree that it's decidedly overkill for the simpler case you're outlining.
@leahwicz Let's check in with folks on the dbt Cloud side of the house, to see if there's a security concern around support for downloading arbitrary tarballs. Users still wouldn't be able to execute arbitrary commands, but this does feel like a step beyond the current options of (a) downloading a known tarball, registered in the dbt Hub, or (b)
Just noting, if of interest, that this feels adjacent to some conversations we've had in the past:
Hi @jtcohen6 - Thanks for the excellent feedback here. One day I would love to stand up an internal hubs service, with private repos, caching, and some sort of artifactory-backed proxy/public package passthrough. Until then, I'm glad you see the use in direct-link URLs as an in-between solution. Your point about security concerns for support of downloading arbitrary tarballs is well taken. I've updated my proposal (and code) to include an optional
Also appreciate the links to previous conversations on Slack around this theme. I'm not surprised other users in larger orgs are coming with similar problems. I've got the feature working, and will finish up with docs, tests and cleanup over the next couple of days. Will update [the PR](https://github.com//pull/4220) accordingly, and reach out once ready for review. Thanks!
* in support of dbt-labs#4205
* flake8 fixes
* adding max size tarball condition
* clean up imports
* typing
* adding sha1 and subdirectory options; improve logging feedback
  - sha1: allow user to specify sha1 in packages.yaml, will only install if package matches
  - subdirectory: allow user to specify subdirectory of package in tarfile, if the package is a non standard structure (like with git subdirectory option)
* simple tests added
* flake fixes
* cleanup
* cleanup comments; remove asserts
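For anyone following along, the sha1 and max-size conditions mentioned in the commit notes above boil down to a check on the downloaded file before anything is installed. Here is a rough Python sketch of that idea. It is not the code from the PR: the names `sha1_of_file` and `verify_tarball` and the 50 MB cap are made up for illustration.

```python
import hashlib
import os
from typing import Optional

# Hypothetical cap, for illustration only; the PR's real limit is not stated in this thread.
MAX_TARBALL_BYTES = 50 * 1024 * 1024


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(
    path: str,
    expected_sha1: Optional[str] = None,
    max_bytes: int = MAX_TARBALL_BYTES,
) -> None:
    """Raise ValueError if the file is too large or its SHA-1 does not match."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"tarball is {size} bytes, limit is {max_bytes}")
    # The sha1 is optional: only compare when the user pinned one.
    if expected_sha1 is not None:
        actual = sha1_of_file(path)
        if actual != expected_sha1.lower():
            raise ValueError(f"sha1 mismatch: expected {expected_sha1}, got {actual}")
```

Calling `verify_tarball(path, expected_sha1=...)` right after the download and before unpacking means a bad or swapped archive never reaches the packages directory.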
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Will pick this back up in the next several weeks 👍
@timle2 thanks so much for creating this issue and picking this up. We keep running into dbt projects that commit packages directly into their repository because they cannot access the internet when running dbt in Airflow, for instance. The described tarball solution would help us out tremendously!
@the-serious-programmer (and any others), I've picked this back up (rewrite #3) and am hoping to get this past the finish line this month. Please, all, voice your support for this feature in #4689; it's the only way this will make it to release!
* v0 - new dbt deps type: tarball url in support of #4205
* flake8 fixes
* adding max size tarball condition
* clean up imports
* typing
* adding sha1 and subdirectory options; improve logging feedback
  - sha1: allow user to specify sha1 in packages.yaml, will only install if package matches
  - subdirectory: allow user to specify subdirectory of package in tarfile, if the package is a non standard structure (like with git subdirectory option)
* simple tests added
* flake fixes
* changes to support tests; adding exceptions; fire_event logging
* new logging events
* tarball exceptions added
* build out tests
* removing in memory tarball test
* update type codes to M - Misc
* adding new events to test_events
* fix spacing for flake
* add retry download code - as used in registry calls
* clean
* remove saving tar in memory inside tarfile object will hit url multiple times instead
* remove duplicative code after refactor
* black updates
* black formatting
* black formatting
* refactor - no more in-memory tarfile
  - all as file operations now
  - remove tarfile passing, always use tempfile instead
  - reorganize system.* functions, removing duplicative code
  - more notes on current flow and structure
  - esp need for pattern of 1) unpack 2) scan for package dir 3) copy to destination.
  - cleaning
* cleaning and sync to new tarball code
* cleaning and sync to new tarball code
* requested changes from PR #4689 (comment)
* reversions from revision 2 removing sha1 check to simplify/mirror hub install pattern
* simplify/mirror hub install pattern to simplify/mirror hub install pattern
  - removing sha1 check
  - supply name/version to act as our 'metadata' source
* simplify/mirror hub install pattern simplify with goal of mirroring hub install pattern
  - supporting subfolders like git packages, and sha1 checks are removed
  - existing code from RegistryPinnedPackage (install() and download_and_untar()) performs the operations
  - RegistryPinnedPackage install() and download_and_untar() are not currently set up as functions that can be used across classes
  - this should be moved to dbt.deps.base, or to a dbt.deps.common file
  - need dbt labs feedback on how to proceed (or leave as is)
* remove revisions, no longer doing package check
* slim down to basic tests more complex features have been removed (sha1, subfolder) so testing is much simpler!
* fix naming to match hubs behavior remove version from package folder name
* refactor install and download to upstream PinnedPackage class i'm on the fence if this is right approach, but seems like most sensible after some thought
* Create Features-20221107-105018.yaml
* fix flake, black, mypy errors
* additional flake/black fixes
* Update .changes/unreleased/Features-20221107-105018.yaml fix username on changelog Co-authored-by: Emily Rockman <[email protected]>
* change to fstring Co-authored-by: Emily Rockman <[email protected]>
* cleaning - remove comment
* remove comment/question for dbt team
* in support of issuecomment 1334055944 #4689 (comment)
* in support of issuecomment 1334118433 #4689 (comment)
* black fixes; remove debug bits
* remove `.format` & add 'tarball' as version 'tarball' as version so that the temp files format nicely: [tempfile_location]/dbt_utils_2..tar.gz # old vs [tempfile_location]/dbt_utils_1.tarball.tar.gz # current
* port os.path refs in `PinnedPackage._install` to pathlib
* lowercase as per PR feedback
* update tests after removing version arg goes along with 8787ba4 Co-authored-by: Emily Rockman <[email protected]>
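The "1) unpack 2) scan for package dir 3) copy to destination" pattern mentioned in the refactor notes above can be sketched roughly as follows. This is only an illustration under my own assumptions, not the merged dbt-core code: `find_package_dir` and `install_tarball` are hypothetical names, and the scan simply looks for the shallowest directory holding a `dbt_project.yml`.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def find_package_dir(root: Path) -> Path:
    """Return the shallowest directory under root that contains dbt_project.yml."""
    candidates = sorted(root.rglob("dbt_project.yml"), key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError(f"no dbt_project.yml found under {root}")
    return candidates[0].parent


def install_tarball(tar_path: str, dest_dir: str) -> Path:
    """Unpack tar_path into a temp dir, locate the package, copy it to dest_dir."""
    dest = Path(dest_dir)
    with tempfile.TemporaryDirectory() as tmp:
        # 1) unpack (file-based, no in-memory tarfile)
        with tarfile.open(tar_path, "r:*") as archive:
            # Caution: extractall on an untrusted archive can write outside tmp.
            archive.extractall(tmp)
        # 2) scan for the package directory
        package_dir = find_package_dir(Path(tmp))
        # 3) copy to destination, replacing any previous install
        if dest.exists():
            shutil.rmtree(dest)
        shutil.copytree(package_dir, dest)
    return dest
```

The scan step is what lets the archive have an arbitrary top-level folder (for example a versioned directory name) while the installed package still lands under a stable name.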
+1 to tarballs hosted on S3/GC, much easier to interact with than having to set up artifactory.
Is there an existing feature request for this?
Describe the Feature
Summary
From experience with dbt at a large org, there is no great way to distribute internal (non-public) dbt modules. The only currently supported solution is leveraging GitPackages + shipping ssh credentials to wherever dbt deps would be run. A second option for a private module source is suggested below: one that relies on direct tarball links and is unpinned only. This means creating a source type that does not leverage all the pinned/unpinned internals, but the trade-off is reasonable to give users a clean, direct install source for private packages that is not git.
Full Description
dbt deps currently supports package installation from 3 options
https://hub.getdbt.com/. Implemented via a dedicated client due to API interactions with the Registry. Audience: end users, production systems, with modules that can be posted publicly. Great for beginners and power users alike.
What's missing?
Solution?
Appendix:
Why git is a poor solution/why a second option for private modules would benefit end users
How I got here:
From experience with dbt at a large org, there is no great way to distribute internal (non-public) dbt modules. Gatekeeping for non-public repos relies on git credentials. But this is poor security practice (and not supported by our CI/CD infra, for good reason). Simply put, those reliant on existing infra for managing builds may not have the luxury of issuing arbitrary git clone commands to external repos (as dbt deps does for GitPackage sources).
Describe alternatives you've considered
LocalPackage symlinks to the packages on the core dbt image (along with some docker magic). It's working, but it's a hack, and being able to do a straight install from a url would make me so much happier.
https://hub.getdbt.com/. Redirect to internal service by leveraging `DBT_PACKAGE_HUB_URL`. Emulate current hub.getdbt.com API behavior in a new internal-only service; the json object returned by the API for a given package has a clear tarball source link. This would be a complicated way to do the same process as suggested here. A self-hosted hub service remains an option for power users, but it is a complex solution to a simple problem, to say the least.
Who will this benefit?
Any user looking to run dbt deps for internally hosted modules who does not wish to host the packages on git (or is not permitted to).
Are you interested in contributing this feature?
Yes, PR in progress here
Sketch of planned work:
LocalPackage.
registry install design pattern
url type, which maps newly created dbt.deps.tarball remoteTarball.
Anything else?
First time dbt contributor, long time day to day dbt user. Looking forward to packaging this up and making my first contribution!