Initial discussion #1
Comments
Regarding which way the implementation should take: given #2, it would make sense to take the second approach, i.e. depend on pip directly.
I think a good first question is: "what language to use?". And what about an embedded daemon?
Yeah, I automatically started with Python, but during the analysis I was also wondering whether it is the way to go. I spent this morning thinking about it again, and I believe Python is the right choice. The main reason for me is the requirements that other languages would imply. JavaScript would require Node.js to be installed, which is definitely something we should not require for running Python's package manager. Yes, there are ways to compile JS to binaries, but I am not sure how well that works and how well js-ipfs would work with it (js-ipfs being the main reason for picking JS over Python). Go in itself does not have such a hard dependency like JS, but as you mentioned it might be a bit of an overkill. Lastly, a big plus for Python is that it could directly tap into pip and expand its capabilities (see #2); moreover, quite some code can be reused from other solutions like the already mentioned npm-on-ipfs.

Regarding the integrated daemon, it would definitely be a big plus and would lower the adoption barrier. That said, I think we should start with basic functionality and then expand on top of it. Python allows shipping package binaries per platform, so in the future we could ship a version of the package with a bundled go-ipfs.
Hi, the high-level overview seems OK, similar to npm-on-ipfs. It should work except for IPNS, which might be too slow. npm-on-ipfs uses js-ipfs. Since you seem to want to rely on the go-ipfs daemon, it's a good opportunity to test the ipns-pubsub experimental feature and see how that goes. Otherwise you will need to publish your current root in a non-IPNS way (via DNS or via an HTTP endpoint).

One important thing to explore is what format PyPI packages are in (I guess some kind of zip), and why: why was this format chosen, and what are its special characteristics? One thing that would be amazing is to have custom importers for that format, so that the IPFS DAG reflects not a chunked binary blob but the tree structure and the packaged files. Unless they are simply TARs, this will require significant work, but the reward is much better deduplication possibilities. At the very least, you should try to import packages using the Rabin chunker and the normal (fixed-size) one, and see whether size grows as fast as new versions of packages come in.
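The deduplication difference between the two chunkers can be illustrated without IPFS at all. The sketch below contrasts fixed-size chunking with a crude content-defined chunker: after a one-byte insertion at the front of a blob, every fixed-size chunk shifts and stops matching, while content-defined boundaries re-synchronize. The rolling hash here is a toy stand-in for the Rabin fingerprinting go-ipfs uses, not the real algorithm.

```python
import random

def fixed_chunks(data: bytes, size: int = 1024):
    """Split data into fixed-size chunks (the go-ipfs default style)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data: bytes, mask: int = 0x3FF):
    """Content-defined chunking with a crude ~32-byte rolling hash.

    A boundary is declared whenever the low bits of the hash are zero,
    giving an average chunk size of mask + 1 bytes. This is a toy
    stand-in for Rabin fingerprinting.
    """
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF  # old bytes fall out after 32 shifts
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Two "versions" of a package: v2 has a single byte prepended.
rng = random.Random(42)
v1 = rng.randbytes(1 << 16)
v2 = b"\x00" + v1

shared_fixed = len(set(fixed_chunks(v1)) & set(fixed_chunks(v2)))
shared_cdc = len(set(cdc_chunks(v1)) & set(cdc_chunks(v2)))
print("shared fixed-size chunks:", shared_fixed)
print("shared content-defined chunks:", shared_cdc)
```

The fixed-size chunker shares virtually nothing after the shift, while the content-defined one re-uses almost every chunk; that is exactly the effect worth measuring on real sdists and wheels.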
@hsanjuan thanks for the feedback! I am a bit reluctant to go the HTTP endpoint way like npm-on-ipfs, because I think it somewhat defeats the point of decentralization (actually, when I was studying npm-on-ipfs, the endpoint was down...). I am aware of the performance issue with IPNS; that is why I plan to run the "refresh" process as a background task, so that only the first run would be long.

That is a very good point regarding the chunking! I hadn't thought of that. Python has two main distribution formats: source distributions (i.e. tars with Python source code) and binary distributions (i.e. a custom format that can contain compiled sources like C etc., and hence can be platform specific). I will dig a bit more into how this could be handled. Could you please provide pointers where I could study a bit more about importers? https://github.com/ipfs/go-ipfs-chunker ? Also, if I were to implement a custom importer, is there a way to plug it into go-ipfs?
It does, but you also want something usable in the real world so sometimes compromises are necessary, even if temporary :).
There is no way to side-load custom importers, but the go-ipfs folks probably would not mind including new importers. Actually, importers are two things: the chunker (go-ipfs-chunker does that) and the DAG builder (ipfs supports balanced and trickle now: https://github.com/ipfs/go-unixfs/tree/master/importer). There is an example of a custom TAR importer at https://github.com/ipfs/go-ipfs/blob/master/tar/format.go, which is used by the ipfs tar commands.
This issue serves as a hub for the initial discussion. Here I present my thoughts on how the implementation could be carried out. The project shares similarities with, and will take inspiration from, npm-on-ipfs.
Goal of dpip

The goal of dpip (distributed pip) is to bring IPFS into the Python Package Index (pypi.org) ecosystem. It should serve as a functional unit, but also as a demonstrator for further discussion regarding native adoption into the PyPA ecosystem.
High-level architecture
Currently, I see three main components that should be part of the implementation:

- dpip: a pip replacement that will proxy most of the calls, except a few specific ones
- an index that maps package names to IPFS content
- a mirror daemon that builds and publishes the index

Data flow
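As a rough illustration of the intended data flow, the lookup side could drive the go-ipfs CLI as sketched below. The IPNS address, package name, and file name are placeholders of mine, not real values from the project.

```python
# Sketch of the lookup data flow expressed as go-ipfs CLI invocations.
# All concrete values here are illustrative placeholders.
def plan_lookup(ipns_addr: str, package: str, filename: str):
    """Return the CLI commands dpip would run to fetch one distribution."""
    return [
        # Resolve the IPNS index address to the current root hash.
        ["ipfs", "name", "resolve", ipns_addr],
        # Fetch the concrete distribution file out of the index tree.
        ["ipfs", "get", f"/ipns/{ipns_addr}/{package}/{filename}"],
    ]

cmds = plan_lookup("QmExampleIndexAddr", "requests", "requests-2.0.tar.gz")
for cmd in cmds:
    print(" ".join(cmd))
```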
Index
There needs to be a mechanism for translating a package name into an IPFS hash. The most natural approach might be using MFS, which is the approach that npm-on-ipfs takes. The root IPFS hash of the whole PyPI namespace is mounted into a prefixed MFS path and should be regularly refreshed.

As IPNS resolution, together with mounting to MFS, can take quite some time, the index's refresh process could run as a detached process in the background. The refresh process will require implementing checks for correct behavior, e.g. that there are never multiple refresh processes spawned.
Lookup of a package then follows the same structure as in PyPI, where the path is constructed from the package name like:

/<normalized_package_name>/<wheels or sdist tarballs>
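PEP 503 specifies exactly how the `<normalized_package_name>` part is derived: runs of `-`, `_`, and `.` collapse to a single `-`, and the name is lowercased. The normalization below is taken straight from the PEP; the path helper around it is my own sketch.

```python
import re

def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()

def index_path(name: str) -> str:
    """Path of a project inside the index tree (sketch, my convention)."""
    return f"/{normalize(name)}/"

print(index_path("Flask_SQLAlchemy"))  # -> /flask-sqlalchemy/
```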
dpip / pip
dpip will serve as a wrapper around pip that proxies most of the calls, except those which directly link with IPFS. For now, I have identified these:

pip will be extended mainly using the --index/--index-url parameter, which overrides the default lookup on pypi.org. The --extra-index-url can be used as a fallback to pypi.org.

dpip will be shipped with a default IPNS index address provided by the authors of this tool, but it will offer an option to specify a different IPNS address to use for the index. In the future, there should be a command that allows verifying that a package in the IPNS index is the same as in the PyPI index.

It is a question of how
dpip should be implemented. There are two approaches I see:

1. Mimic pip's CLI interface and proxy the calls by spawning a new process that calls pip with the --index parameter. This is the approach npm-on-ipfs used. No direct dependency on pip, or on any specific version of it, is needed. It would require implementing an HTTP server that follows PEP 503.
2. Use pip as a package and invoke the proper functions based on the CLI arguments and options. This would most probably require depending on specific versions of pip to ensure compatibility, but it should still aim to function with as wide a range of pip versions as possible.

Pinning
It would be beneficial to allow the users of dpip to pin installed packages. This could be done using pip's cache, where wheels and sdists are present.

Questions:
- Which way should dpip's implementation go?
- npm-on-ipfs provides an in-process IPFS daemon, which lowers the adoption barrier. Should dpip go in the same direction, and if so, how to approach it? Package go-ipfs with dpip?

Mirror daemon
The mirror daemon should be bound to a specific IPNS address where the mirror will be placed, so that people can produce their own indexes if they desire. IPFS's deduplication mechanism should work here to our benefit.
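Building the MFS tree and republishing it can be driven through the go-ipfs CLI (`ipfs files`, `ipfs name publish`). The helper below only constructs the command lines rather than executing them; the `/pypi` MFS prefix and the function name are placeholders of mine:

```python
# Sketch: go-ipfs commands to place one distribution file into the MFS
# index tree and obtain the new root hash. The /pypi prefix is an
# assumed convention, not part of the actual design.
def plan_mirror_update(package: str, filename: str, file_hash: str):
    """Commands to add one file (already imported via `ipfs add`,
    yielding file_hash) under /pypi/<package> in MFS."""
    mfs_dir = f"/pypi/{package}"
    return [
        ["ipfs", "files", "mkdir", "-p", mfs_dir],
        ["ipfs", "files", "cp", f"/ipfs/{file_hash}", f"{mfs_dir}/{filename}"],
        # The hash printed by this command is what gets republished
        # afterwards via `ipfs name publish`.
        ["ipfs", "files", "stat", "--hash", "/pypi"],
    ]

for cmd in plan_mirror_update("flask", "Flask-1.0.tar.gz", "QmFakeHash"):
    print(" ".join(cmd))
```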
A project to use, or at least take inspiration from, is https://github.com/pypa/bandersnatch, which provides a full PyPI mirror and a PEP 503 compliant server.
Questions: