diff --git a/docs/concepts/bitswap.md b/docs/concepts/bitswap.md index 9f5497ab9..705ad6e96 100644 --- a/docs/concepts/bitswap.md +++ b/docs/concepts/bitswap.md @@ -33,13 +33,13 @@ Want-list { #### Discovery -To find peers that have a file, a node running the Bitswap protocol first sends a request called a _want-have_ to all the peers it is connected to. This _want-have_ request contains the CID of the root block of the file (the root block is at the top of the DAG of blocks that make up the file). Peers that have the root block send a _have_ response and are added to a session. Peers that don't have the block send a _dont-have_ response. If none of the peers have the root block, Bitswap queries the Distributed Hash Table (DHT) to ask who can provide the root block. +To find peers that have a file, a node running the Bitswap protocol first sends a request called a _want-have_ to all the peers it is connected to. This _want-have_ request contains the CID of the root block of the file (the root block is at the top of the DAG of blocks that make up the file). Peers that have the root block send a _have_ response and are added to a session. Peers that don't have the block send a _dont-have_ response. Bitswap builds up a map of which nodes have and don't have each block. ![Diagram of the _want-have/want-block_ process.](./images/bitswap/diagram-of-the-want-have-want-block-process.png =740x537) #### Transfer -Once peers have been added to a session, for each block that the client wants, Bitswap sends _want-have_ to each session peer to find out which peers have the block. Peers respond with _have_ or _dont_have_. Bitswap builds up a map of which nodes have and don't have each block. Bitswap sends _want-block_ to peers that have the block, and they respond with the block itself. If no peers have the block, Bitswap queries the DHT to find providers who have the block. +Bitswap sends _want-block_ to peers that have the block, and they respond with the block itself. 
If none of the peers have the root block, Bitswap queries the Distributed Hash Table (DHT) to ask who can provide the root block. ### Additional references diff --git a/docs/concepts/faq.md b/docs/concepts/faq.md index 3e9429b1f..4c01620f7 100644 --- a/docs/concepts/faq.md +++ b/docs/concepts/faq.md @@ -26,6 +26,9 @@ The quickest way to get IPFS up and running on your machine is by installing [IP For installing and initializing IPFS from the command line, check out the [command-line quick start](../how-to/command-line-quick-start.md) guide. +### Why doesn't my SHA hash match my CID? +When you add a file to IPFS, IPFS splits it into smaller blocks. IPFS hashes each of these blocks individually and links them in a [Merkle Directed Acyclic Graph (DAG)](../concepts/merkle-dag.md). The CID identifies the root of that graph, so it differs from a SHA hash of the whole file. + ## Contributing to IPFS ### How do I start contributing to IPFS? @@ -40,7 +43,7 @@ Filecoin and IPFS are two separate, complementary protocols, both created by Pro In short: IPFS addresses and moves content, while Filecoin is an incentive layer to persist data. -These components are separable - you can use one without the other, and IPFS already supports more self-organized or altruistic forms of data persistence via tools like [IPFS Cluster](https://cluster.ipfs.io/). Compatibility between IPFS and Filecoin is intended to be as seamless as possible, but we expect it to evolve over time. You can view the [draft spec for IPFS-Filecoin Interoperability](https://github.com/filecoin-project/specs/issues/143) and [ideas for future improvements](https://github.com/filecoin-project/specs/issues/144) to learn more. +These components are separable - you can use one without the other, and IPFS already supports more self-organized or altruistic forms of data persistence via tools like [IPFS Cluster](https://cluster.ipfs.io/). Compatibility between IPFS and Filecoin is intended to be as seamless as possible, but we expect it to evolve. 
You can view the [draft spec for IPFS-Filecoin Interoperability](https://github.com/filecoin-project/specs/issues/143) and [ideas for future improvements](https://github.com/filecoin-project/specs/issues/144) to learn more. ## IPFS and Protocol Labs diff --git a/docs/concepts/glossary.md b/docs/concepts/glossary.md index e16402fff..2707109da 100644 --- a/docs/concepts/glossary.md +++ b/docs/concepts/glossary.md @@ -66,7 +66,7 @@ A Block is a binary blob of data identified by a [CID](#cid). It could be raw by ### Bootstrap node -A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network. [More about Bootstrapping](../how-to/modify-bootstrap-list.md) +A Bootstrap Node is a trusted peer on the IPFS network through which an IPFS node learns about other peers on the network. Both Go-IPFS and JS-IPFS use bootstrap nodes to enter the Distributed Hash Table (DHT). See [Bootstrap](../concepts/nodes/#bootstrap). ## C @@ -148,6 +148,10 @@ The Datastore is the on-disk storage system used by an IPFS node. Configuration Direct Connection Upgrade through Relay (DCUtR) protocol enables [hole punching](#hole-punching) for NAT traversal when port forwarding is not possible. A peer will coordinate with the counterparty using a [relayed connection](#circuit-relay-v2), to upgrade to a direct connection through a NAT/firewall whenever possible. [More about DCUtR](https://github.com/libp2p/specs/blob/master/relay/DCUtR.md) +### Delegate routing node + +Delegate routing nodes are Go-IPFS nodes with their API ports exposed and some HTTP API commands accessible. JS-IPFS nodes use them to query the DHT and publish content without having to run DHT logic on their own. See [Delegate routing](../concepts/nodes/#types). + ### DHT A _Distributed Hash Table_ (DHT) is a distributed key-value store where keys are cryptographic hashes. In IPFS, each peer is responsible for a subset of the IPFS DHT. 
[More about DHT](dht.md) @@ -186,6 +190,10 @@ An IPFS Gateway acts as a bridge between traditional web browsers and IPFS. Thro Garbage Collection (GC) is the process within each IPFS node of clearing out cached files and blocks. Nodes need to clear out previously cached resources to make room for new resources. [Pinned resources](#pinning) are never deleted. +### GO-IPFS node + +The primary IPFS reference implementation, that is, the one that implements all requirements from the corresponding IPFS specification. It runs on servers and user machines with full IPFS capabilities, enabling experimentation. See [Nodes > GO-IPFS](../concepts/nodes/#go-ipfs). + ### Graph In computer science, a Graph is an abstract data type from the field of graph theory within mathematics. The [Merkle-DAG](#merkledag) used in IPFS is a specialized graph. @@ -224,6 +232,10 @@ The InterPlanetary Name System (IPNS) is a system for creating and updating muta ## J +### JS-IPFS node + +The JavaScript implementation of IPFS. It runs in the browser with a limited set of capabilities. See [Nodes > JS-IPFS](../concepts/nodes/#implementations). + ### JSON JavaScript Object Notation (JSON) is a lightweight data-interchange format. JSON is a text format that is completely language independent, human-readable, and easy to parse and generate. [More about JSON](https://www.json.org/) @@ -298,7 +310,7 @@ Network Address Translation (NAT) enables communication between two networks by ### Node -In IPFS, a node or [peer](#peer) is the IPFS program that you run on your local computer to store files and then connect to the IPFS network. [More about IPFS Node](../how-to/command-line-quick-start.md#take-your-node-online). +In IPFS, a node or [peer](#peer) is the IPFS program that you run on your local computer to store files and then connect to the IPFS network. See [Nodes](../concepts/nodes/#nodes). 
### Node (in graphs) @@ -330,6 +342,10 @@ Pinning is the method of telling an IPFS node that particular data is important A vendor-agnostic [API specification](https://ipfs.github.io/pinning-services-api-spec/) that anyone can implement to provide a service for [remote pinning](#remote-pinning). +### Preload node + +A node that helps make a UnixFS DAG publicly available. Placing the DAG's CID in the preload node's `wantlist` causes it to fetch the data. Other nodes requesting the content can then resolve it from the preload node using Bitswap, as the data is now present in the preload node's blockstore. See [Nodes > Preload](https://docs.ipfs.io/concepts/nodes/#preload). + ### Protobuf Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. IPFS uses it in [DAG-PB](#dag-pb). [More about Protocol Buffers](https://en.wikipedia.org/wiki/Protocol_Buffers) @@ -342,13 +358,13 @@ Publish-subscribe (Pubsub) is an experimental feature in IPFS. Publishers send m ## R -### Remote Pinning +### Relay node -A variant of [pinning](#pinning) that uses a third-party service to ensure that data persists on IPFS, even when your local node goes offline or your local copy of data is deleted during garbage collection. [More about working with remote pinning services](../how-to/work-with-pinning-services.md). +A means to establish connectivity between libp2p nodes (e.g., IPFS nodes) that wouldn't otherwise be able to establish a direct connection to each other. This may be due to nodes that are behind NAT (Network Address Translation), reverse proxies, firewalls, etc. See [Nodes > Relay](../concepts/nodes/#relay). -### Relay +### Remote Pinning -The Relay is a means to establish connectivity between libp2p nodes (e.g., IPFS nodes) that wouldn't otherwise be able to establish a direct connection to each other. This may be due to nodes that are behind NAT, reverse proxies, firewalls, etc. 
[More about Relay](https://github.com/libp2p/specs/tree/master/relay) +A variant of [pinning](#pinning) that uses a third-party service to ensure that data persists on IPFS, even when your local node goes offline or your local copy of data is deleted during garbage collection. [More about working with remote pinning services](../how-to/work-with-pinning-services.md). ### Repo @@ -356,7 +372,7 @@ The Repository (Repo) is a directory where IPFS stores all its settings and inte ### Root -A root is a [node](#node) in a [graph](#graph) that links to at least one other node. In an IPLD graph, roots are used to aggregate multiple chunks of a file together. +A root is a [node](#node) in a [graph](#graph) that links to at least one other node. In an IPLD graph, roots are used to aggregate multiple chunks of a file together. If you have a 600KiB file `A`, it can be split into 3 chunks `B`, `C`, and `D` since the default chunk size of IPFS is 256KiB. The node `A` that links to each of these three chunks is the root. The CID of this root is what IPFS shows you as the CID of the file. @@ -384,7 +400,7 @@ A Self-certifying File System (SFS) is a distributed file system that doesn't re ### Sharding -An introduction of horizontal partition of data in a database or a data structure. The main purpose is to spread load and improve performance. An example of sharding in IPFS is [HAMT-sharding](#hamt-sharding) of big [UnixFS](#unixfs) directories. +An introduction of horizontal partition of data in a database or a data structure. The main purpose is to spread load and improve performance. An example of sharding in IPFS is [HAMT-sharding](#hamt-sharding) of big [UnixFS](#unixfs) directories. 
### Signing (Cryptographic) diff --git a/docs/concepts/hashing.md b/docs/concepts/hashing.md index 9c53038b6..5660fcc81 100644 --- a/docs/concepts/hashing.md +++ b/docs/concepts/hashing.md @@ -6,10 +6,6 @@ description: Learn about cryptographic hashes and why they're critical to how IP # Hashing -::: tip -If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019! [Core Course: How IPFS Deals With Files](https://www.youtube.com/watch?v=Z5zNPwMDYGg) -::: - Cryptographic hashes are functions that take some arbitrary input and return a fixed-length value. The particular value depends on the given hash algorithm in use, such as [SHA-1](https://en.wikipedia.org/wiki/SHA-1) (used by git), [SHA-256](https://en.wikipedia.org/wiki/SHA-2), or [BLAKE2](), but a given hash algorithm always returns the same value for a given input. Have a look at Wikipedia's [full list of hash functions](https://en.wikipedia.org/wiki/List_of_hash_functions) for more. As an example, the input: @@ -32,32 +28,37 @@ However, the exact same input generates the following output using **SHA-256**: Notice that the second hash is longer than the first one. This is because SHA-1 creates a 160-bit hash, while SHA-256 creates a 256-bit hash. The prepended `0x` indicates that the following hash is represented as a hexadecimal number. -Hashes can be represented in different bases (`base2`, `base16`, `base32`, etc.). In fact, IPFS makes use of that as part of its [content identifiers](content-addressing.md) and supports multiple base representations at the same time, using the [Multibase](https://github.com/multiformats/multibase) protocol. +Hashes can be represented in different bases (`base2`, `base16`, `base32`, etc.). 
In fact, IPFS uses that as part of its [content identifiers](content-addressing.md) and supports multiple base representations at the same time, using the [Multibase](https://github.com/multiformats/multibase) protocol. For example, the SHA-256 hash of "Hello world" from above can be represented as base 32 as: ``` mtwirsqawjuoloq2gvtyug2tc3jbf5htm2zeo4rsknfiv3fdp46a ``` +::: tip +If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019! [Core Course: How IPFS Deals With Files](https://www.youtube.com/watch?v=Z5zNPwMDYGg) +::: -## Hashes are important +## Important hash characteristics -Cryptographic hashes come with a couple of very important characteristics: +Cryptographic hashes come with several important characteristics: - **deterministic** - the same input message always returns exactly the same output hash - **uncorrelated** - a small change in the message should generate a completely different hash - **unique** - it's infeasible to generate the same hash from two different messages - **one-way** - it's infeasible to guess or calculate the input message from its hash -These features also mean we can use a cryptographic hash to identify any piece of data: the hash is unique to the data we calculated it from and it's not too long so sending it around the network doesn't take up a lot of resource. A hash is a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes. +These features also mean we can use a cryptographic hash to identify any piece of data: the hash is unique to the data we calculated it from, and it's not too long, so sending it around the network doesn't take up a lot of resources. A hash is a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes. -That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. 
A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, they send back the whole file. Without a short, unique identifier like a cryptographic hash, that wouldn't be possible. This technique is called [content addressing](content-addressing.md) — because the content itself is used to form an address, rather than information about the computer and disk location it's stored at. +That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, they send back the whole file. Without a short, unique identifier like a cryptographic hash, [content addressing](content-addressing.md) wouldn't be possible. -## Content identifiers are not file hashes +## Example: Content Identifiers are not file hashes -Hash functions are widely used as to check for file integrity. A download provider may publish the output of a hash function for a file, often called a _checksum_. The checksum enables users to verify that a file has not been altered since it was published. This check is done by performing the same hash function against the downloaded file that was used to generate the checksum. If that checksum that the user receives from the downloaded file exactly matches the checksum on the website, then the user knows that the file was not altered and can be trusted. +Hash functions are widely used to check for file integrity. Because IPFS splits content into blocks and verifies them through [directed acyclic graphs (DAGs)](../concepts/merkle-dag.md), SHA file hashes won't match CIDs. Here's an example of what will happen if you try to do that. -Let us look at a concrete example. 
When you download an image file for [Ubuntu Linux](https://ubuntu.com/) you might see the following `SHA-256` checksum on the Ubuntu website listed for verification purposes: +A download provider may publish the output of a hash function for a file, often called a _checksum_. The checksum enables users to verify that a file has not been altered since it was published. This check is done by performing the same hash function against the downloaded file that was used to generate the checksum. If that checksum that the user receives from the downloaded file exactly matches the checksum on the website, then the user knows that the file was not altered and can be trusted. + +For example, when you download an image file for [Ubuntu Linux](https://ubuntu.com/) you might see the following `SHA-256` checksum on the Ubuntu website listed for verification purposes: ``` 0xB45165ED3CD437B9FFAD02A2AAD22A4DDC69162470E2622982889CE5826F6E3D ubuntu-20.04.1-desktop-amd64.iso @@ -80,7 +81,7 @@ added QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB ubuntu-20.04.1-desktop-amd6 2.59 GiB / 2.59 GiB [==========================================================================================] 100.00% ``` -The string `QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB` returned by the `ipfs add` command is the content identifier (CID) of the file `ubuntu-20.04.1-desktop-amd64.iso`. We can utilize the [CID Inspector](https://cid.ipfs.io/) to see what the CID includes. The actual hash is listed under `DIGEST (HEX)`: +The string `QmPK1s3pNYLi9ERiq3BDxKa4XosgWwFRQUydHUtz4YgpqB` returned by the `ipfs add` command is the content identifier (CID) of the file `ubuntu-20.04.1-desktop-amd64.iso`. We can use the [CID Inspector](https://cid.ipfs.io/) to see what the CID includes. 
The actual hash is listed under `DIGEST (HEX)`: ``` NAME: sha2-256 @@ -101,4 +102,6 @@ ubuntu-20.04.1-desktop-amd64.iso: FAILED shasum: WARNING: 1 computed checksum did NOT match ``` -As we can see, the hash included in the CID does NOT match the hash of the input file `ubuntu-20.04.1-desktop-amd64.iso`. To understand what the hash contained in the CID is, we must understand how IPFS stores files. IPFS uses a [directed acyclic graph (DAG)](merkle-dag.md) to keep track of all the data stored in IPFS. A CID identifies one specific node in this graph. This identifier is the result of hashing the node's contents using a cryptographic hash function like `SHA256`. +As we can see, the hash included in the CID does NOT match the hash of the input file `ubuntu-20.04.1-desktop-amd64.iso`. + +To understand what the hash contained in the CID is, we must understand how IPFS stores files. IPFS uses a directed acyclic graph (DAG) to keep track of all the data stored in IPFS. A CID identifies one specific node in this graph. This identifier is the result of hashing the node's contents using a cryptographic hash function like SHA256. diff --git a/docs/concepts/how-ipfs-works.md b/docs/concepts/how-ipfs-works.md index 02d370cb1..21a835ff2 100644 --- a/docs/concepts/how-ipfs-works.md +++ b/docs/concepts/how-ipfs-works.md @@ -32,15 +32,17 @@ That problem exists for the internet and on your computer! Right now, content is By contrast, every piece of content that uses the IPFS protocol has a [_content identifier_](content-addressing.md), or CID, that is its _hash_. The hash is unique to the content that it came from, even though it may look short compared to the original content. If hashes are new to you, check out our [guide to cryptographic hashing](hashing.md) for an introduction. 
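The claim that a hash stays short no matter how large the content is can be checked with a minimal Python sketch. It assumes only the standard `hashlib` module; the `sha256_hex` helper and the sample inputs are made up for illustration. It computes plain SHA-256 digests, not CIDs.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_input = b"Hello world"
long_input = b"x" * 1_000_000  # one million bytes of filler data (illustrative)

short_digest = sha256_hex(short_input)
long_digest = sha256_hex(long_input)

# Both digests are 64 hex characters (32 bytes), regardless of input size.
print(len(short_digest), len(long_digest))

# Hashing the same input again gives the same digest (deterministic).
print(sha256_hex(short_input) == short_digest)
```
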
-Many distributed systems make use of content addressing through hashes as a means for not just identifying content but also linking it together — everything from the commits that back your code to the blockchains that run cryptocurrencies leverage this strategy. However, the underlying data structures in these systems are not necessarily interoperable. +Many distributed systems use content addressing through hashes as a means for not just identifying content, but also linking it together — everything from the commits that back your code to the blockchains that run cryptocurrencies leverage this strategy. However, the underlying data structures in these systems are not necessarily interoperable. -This is where the [Interplanetary Linked Data (IPLD) project](https://ipld.io/) comes in. IPLD translates between hash-linked data structures allowing for the unification of the data across distributed systems. IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across many linked nodes, allowing you to explore data regardless of the underlying protocol. IPLD provides a way to translate between content-addressable data structures: _"Oh, you use Git-style, no worries, I can follow those links. Oh, you use Ethereum, I got you, I can follow those links too!"_ +This is where the [Interplanetary Linked Data (IPLD) project](https://ipld.io/) comes in. IPLD translates between hash-linked data structures, allowing for the unification of the data across distributed systems. IPLD provides libraries for combining pluggable modules (parsers for each possible type of IPLD node) to resolve a path, selector, or query across many linked nodes, allowing you to explore data regardless of the underlying protocol. IPLD provides a way to translate between content-addressable data structures: _"Oh, you use Git-style, no worries, I can follow those links. 
Oh, you use Ethereum, I got you, I can follow those links too!"_ -IPFS follows particular data-structure preferences and conventions. The IPFS protocol uses those conventions and IPLD to get from raw content to an IPFS address that uniquely identifies content on the IPFS network. The next section explores how links between content are embedded within that content address through a DAG data structure. +IPFS follows particular data-structure preferences and conventions. The IPFS protocol uses those conventions and IPLD to get from raw content to an IPFS address that uniquely identifies content on the IPFS network. + +The next section explores how links between content are embedded within that content address through a DAG data structure. ## Directed acyclic graphs (DAGs) -IPFS and many other distributed systems take advantage of a data structure called [directed acyclic graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph), or DAGs. Specifically, they use _Merkle DAGs_, which are DAGs where each node has a unique identifier that is a hash of the node's contents. Sound familiar? This refers back to the _CID_ concept that we covered in the previous section. Put another way: identifying a data object (like a Merkle DAG node) by the value of its hash _is content addressing_. Check out our [guide to Merkle DAGs](merkle-dag.md) for a more in-depth treatment of this topic. +IPFS and many other distributed systems take advantage of a data structure called [directed acyclic graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph), or DAGs. Specifically, they use _Merkle DAGs_, where each node has a unique identifier that is a hash of the node's contents. Sound familiar? This refers back to the _CID_ concept that we covered in the previous section. Put another way: identifying a data object (like a Merkle DAG node) by the value of its hash _is content addressing_. Check out our [guide to Merkle DAGs](merkle-dag.md) for a more in-depth treatment of this topic. 
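The Merkle DAG idea can be sketched in a few lines of Python. This is a simplified illustration, not how IPFS actually encodes blocks or CIDs: the `chunk`, `node_id`, and `build_root` helpers and the tiny 4-byte chunk size are assumptions made for the example. It shows that a root identifier derived from the identifiers of its children differs from a hash of the whole input, and that two inputs with chunks in common produce the same identifiers for those chunks.

```python
import hashlib

def chunk(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def node_id(content: bytes) -> str:
    """Identify a node by the SHA-256 hash of its contents."""
    return hashlib.sha256(content).hexdigest()

def build_root(data: bytes, size: int = 4) -> tuple[str, list[str]]:
    """Return (root_id, leaf_ids) for a two-level Merkle DAG (hypothetical layout)."""
    leaf_ids = [node_id(c) for c in chunk(data, size)]
    # The root node's contents are the identifiers of its children.
    root_contents = "\n".join(leaf_ids).encode()
    return node_id(root_contents), leaf_ids

old = b"aaaabbbbcccc"
new = b"aaaabbbbdddd"  # only the last chunk differs

old_root, old_leaves = build_root(old)
new_root, new_leaves = build_root(new)

print(old_root != node_id(old))          # root ID is not the whole-input hash
print(old_leaves[:2] == new_leaves[:2])  # unchanged chunks keep their IDs
print(old_root != new_root)              # changing one chunk changes the root ID
```
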
IPFS uses a Merkle DAG that is optimized for representing directories and files, but you can structure a Merkle DAG in many different ways. For example, Git uses a Merkle DAG that has many versions of your repo inside of it. @@ -52,7 +54,7 @@ It's easy to see a Merkle DAG representation of a file of your choice using the Merkle DAGs are a bit of a ["turtles all the way down"](https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Turtles_all_the_way_down.html) scenario; that is, _everything_ has a CID. Let's say you have a file, and its CID identifies it. What if that file is in a folder with several other files? Those files will have CIDs too. What about that folder's CID? It would be a hash of the CIDs from the files underneath (i.e., the folder's content). In turn, those files are made up of blocks, and each of those blocks has a CID. You can see how a file system on your computer could be represented as a DAG. You can also see, hopefully, how Merkle DAG graphs start to form. For a visual exploration of this concept, take a look at the [IPLD Explorer](https://explore.ipld.io/#/explore/QmSnuWmxptJZdLJpKRarxBMS2Ju2oANVrgbr2xWbie9b2D). -Another useful feature of Merkle DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle DAG, i.e., parts of different Merkle DAGs can reference the same subset of data. For example, if you update a website, only updated files receive new content addresses. Your old version and your new version can refer to the same blocks for everything else. This can make transferring versions of large datasets (such as genomics research or weather data) more efficient because you only need to transfer the parts that are new or have changed instead of creating entirely new files each time. 
+Another useful feature of Merkle DAGs and breaking content into blocks is that if you have two similar files, they can share parts of the Merkle DAG, i.e., parts of different Merkle DAGs can reference the same subset of data. For example, if you update a website, only updated files receive new content addresses. Your old version and your new version can refer to the same blocks for everything else. This can make transferring versions of large datasets (such as genomics research or weather data) more efficient because you only need to transfer the parts that are new or changed, instead of creating entirely new files each time. So, to recap, IPFS lets you give CIDs to content and link that content together in a Merkle DAG. Now let's move on to the last piece: how you find and move content. @@ -62,15 +64,23 @@ To find which peers are hosting the content you're after (_discovery_), IPFS use The [libp2p project](https://libp2p.io/) is the part of the IPFS ecosystem that provides the DHT and handles peers connecting and talking to each other. (Note that, as with IPLD, libp2p can also be used as a tool for other distributed systems, not just IPFS.) -Once you know where your content is (or, more precisely, which peers are storing each of the blocks that make up the content you're after), you use the DHT again to find the current location of those peers (_routing_). So, in order to get to content, you use libp2p to query the DHT twice. +Once you know where your content is (or, more precisely, which peers are storing each of the blocks that make up the content you're after), you use the DHT again to find the current location of those peers (_routing_). So, to get to content, use libp2p to query the DHT twice. You've discovered your content, and you've found the current location(s) of that content. Now, you need to connect to that content and get it (_exchange_). 
To request blocks from and send blocks to other peers, IPFS currently uses a module called [_Bitswap_](https://github.com/ipfs/specs/blob/master/BITSWAP.md). Bitswap allows you to connect to the peer or peers that have the content you want, send them your _wantlist_ (a list of all the blocks you're interested in), and have them send you the blocks you requested. Once those blocks arrive, you can verify them by hashing their content to get CIDs and compare them to the CIDs that you requested. These CIDs also allow you to deduplicate blocks if needed. There are [other content replication protocols under discussion](https://github.com/ipfs/camp/blob/master/DEEP_DIVES/24-replication-protocol.md) as well, the most developed of which is [_Graphsync_](https://github.com/ipld/specs/blob/master/block-layer/graphsync/graphsync.md). There's also a proposal under discussion to [extend the Bitswap protocol](https://github.com/ipfs/go-bitswap/issues/186) to add functionality around requests and responses. +## SHA file hashes won't match Content IDs + +You may be used to verifying the integrity of a file by matching SHA hashes, but SHA hashes won't match CIDs. Because IPFS splits a file into blocks, each block has its own CID, including separate CIDs for any parent nodes. + +The DAG keeps track of all the content stored in IPFS as blocks, not files, and Merkle DAGs are self-verified structures. To learn more about DAGs, see [directed acyclic graph (DAG)](../concepts/merkle-dag.md). + +For a detailed example of what happens when you try to compare SHA hashes with CIDs, see [Content Identifiers are not hashes](../concepts/hashing/#content-identifiers-are-not-file-hashes). + ### Libp2p -What makes libp2p especially useful for peer to peer connections is _connection multiplexing_. Traditionally, every service in a system opens a different connection to communicate with other services of the same kind remotely. 
Using IPFS, you open just one connection, and you multiplex everything on that. For everything your peers need to talk to each other about, you send a little bit of each thing, and the other end knows how to sort those chunks where they belong. +What makes libp2p especially useful for peer-to-peer connections is _connection multiplexing_. Traditionally, every service in a system opens a different connection to communicate with other services of the same kind remotely. Using IPFS, you open just one connection, and you multiplex everything on that. For everything your peers need to talk to each other about, you send a little bit of each thing, and the other end knows how to sort those chunks where they belong. This is useful because establishing connections is usually hard to set up and expensive to maintain. With multiplexing, once you have that connection, you can do whatever you need on it. diff --git a/docs/concepts/nodes.md b/docs/concepts/nodes.md index 35c76dad9..88976c788 100644 --- a/docs/concepts/nodes.md +++ b/docs/concepts/nodes.md @@ -5,34 +5,20 @@ description: "Participants in the IPFS network are called nodes. Nodes are the m # Nodes -Participants in the IPFS network are called _nodes_. Nodes are the most crucial aspect of IPFS - without nodes running the IPFS daemon, there would be no IPFS Network. +Participants in the IPFS network are called _nodes_. A node is an IPFS program that you run on your local computer to store files and connect to the IPFS network. Nodes are the most crucial aspect of IPFS. Without nodes running the IPFS daemon (explained below), there would be no IPFS Network. -## Implementations - -Protocol Labs manages two primary implementations of the IPFS spec: Go-IPFS and JS-IPFS. - -### Go-IPFS +You're likely to see the term _node_ throughout the IPFS docs, issues, and related code. It's a very general term, so its meaning depends on the context. 
There are three main categories of nodes: IPFS nodes, data nodes, and libp2p nodes for applications. -The Go implementation is designed to run on servers and user machines with the full capabilities of IPFS. New IPFS features are usually created on Go-IPFS before any other implementation. Features include: +* __IPFS Nodes__ are programs that run on a computer that can exchange data with other IPFS nodes. They go by several names, and the term we use depends on the context: + * _node_: Use _node_ when you're referring to an individual point on the network. It's a very general term. For example, when you open IPFS Desktop, you establish yourself as a node with the potential to interact with other nodes. See [Configure a node](https://docs.ipfs.io/how-to/configure-node/). + * _peer_: Use _peer_ when you're talking about the relationship of one node (even your own) to other nodes. It refers to their relationship as equals, with no central authority, so your node is a peer to other peers. See [Observe peers](../how-to/observe-peers/), [Exchange files between nodes](../how-to/exchange-files-between-nodes/), and [Peering with content providers](https://docs.ipfs.io/how-to/peering-with-content-providers/). + * _daemon_: Use _daemon_ when talking about a node's activity status. When a node is online and running in the background, listening for requests for its data, it's called a _daemon_. See [Take your node online](../how-to/command-line-quick-start/#take-your-node-online). + * _instance_: Use _instance_ when talking about a library or program that can communicate with other IPFS instances, for example, when using Bitswap to trade data back and forth (whether in Go or JS).
See [Bitswap](../concepts/bitswap/), [Go-IPFS](../reference/go/api/), and [JS-IPFS](../reference/js/api/#ipfs-and-javascript) for general concepts, and [Preload](../concepts/nodes/#preload), [Bootstrap](../concepts/nodes/#bootstrap), and [Delegate routing](../concepts/nodes/#delegate-routing) below for node specifics. -- TCP and QUIC transports are enabled by default. -- `/ws/` transport disabled by default. -- HTTP gateway with subdomain support for origin isolation between content roots. -- Various [experimental features](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md) +* __Data nodes__: Use _data nodes_ when talking about actual pieces of data on IPFS, such as DAG nodes, UnixFS nodes, and IPLD nodes. When you add a file with the `ipfs add myfile.txt` command, IPFS breaks it up into several nodes that each contain a chunk of the file and are linked to each other. See [Merkle Directed Acyclic Graphs (DAGs)](../concepts/merkle-dag/), [Unix File System (UnixFS)](../concepts/file-systems/#unix-file-system-unixfs), and stay tuned for [InterPlanetary Linked Data (IPLD) model](../concepts/ipld/) docs, which are in progress. -### JS-IPFS +* __libp2p peer__: Use _libp2p peer_ when talking about libp2p nodes on which you can build applications. They're usually referred to as _peers_ in libp2p, because it provides solutions for essential peer-to-peer elements like transport, security, peer routing, and content discovery. See [concepts](../concepts/libp2p.md). -The Javascript implementation is designed to run in the browser with a limited set of capabilities. Features include: - -- Can connect to server nodes using secure WebSockets. - - WSS requires manual setup of TLS at the server. -- Can connect to a browser node using WebRTC using a centralized [ws-webrtc-star signaling service](https://github.com/libp2p/js-libp2p-webrtc-star).
- -Specific limitations of the JS-IPFS implementation are: - -- Unless using WSS, a JS-IPFS node cannot connect to the main public DHT. They will only connect to other JS-IPFS nodes. -- The performance of the DHT is not on-par with the Go-IPFS implementation. -- The HTTP gateway is present, but it has no subdomain support ## Types @@ -45,7 +31,7 @@ There are different types of IPFS nodes. And depending on the use-case, a single ### Preload -When users want to make a UnixFS DAG publicly available, they call `ipfs refs -r ` on a randomly chosen preload node's HTTP API. This puts the CID in the preload nodes' `wantlist`, which then causes it to fetch the data from the user. Other nodes requesting the content can then resolve it from the preload node using bitswap, as the data is now present in the preload node’s blockstore. +Use a preload node to make a UnixFS DAG publicly available by calling `ipfs refs -r ` on a randomly chosen preload node's HTTP API. This puts the CID in the preload node's wantlist, causing it to fetch the data from the user. Other nodes requesting the content can then resolve it from the preload node using Bitswap, as the data is now present in the preload node’s blockstore. Features of a preload node: @@ -58,14 +44,16 @@ Features of a preload node: Limitations of a preload node: -- Default preload nodes provided by Protocol Labs garbage collect every hour, so preloaded content only survives for that long. This is configurable, however, one can run their own nodes with different policy. -- Requires client to be smart about what gets preloaded: recursive preload of a big DAG. +- Default preload nodes provided by Protocol Labs garbage collect every hour, so preloaded content only survives for that long. However, this is configurable. You can run nodes with customized policies. +- Requires client to be smart about what gets preloaded: recursive preload of a big DAG. +- Only works with dag-pb CIDs because that's all the `refs` command understands.
It's harder to find non-dag-pb content, e.g., you need a connection to the publishing js-ipfs instance or it needs to be put on the DHT by a delegate node. ### Relay -If an IPFS node deems itself unreachable by the public internet, IPFS nodes may choose use a relay node as a kind of VPN in an attempt to reach the unreachable node. +If an IPFS node deems itself unreachable by the public internet, IPFS nodes may choose to use a relay node as a kind of VPN in an attempt to reach the unreachable node. Features of a relay node: + - Implements either [v1](https://github.com/libp2p/specs/blob/master/relay/circuit-v1.md) or [v2](https://github.com/libp2p/specs/blob/master/relay/circuit-v2.md) of the Circuit Relay protocol. - Can be either Go-IPFS or JS-IPFS nodes; however there are standalone implementations as well: - [js-libp2p-relay-server](https://github.com/libp2p/js-libp2p-relay-server) (supports circuit v1) @@ -74,8 +62,11 @@ Features of a relay node: - JS-IPFS nodes can also use relay nodes to overcome the lack of transport compatibility within the JS-IPFS implementation. A browser node with WebSockets/webRTC transports can talk with a Go-IPFS node that only communicates through TCP using a relay that supports both transports. This is not enabled by default and needs to be set up. Limitations of relay nodes: -- v1 relays can be used by anyone without any limits, unless [go-libp2p-relay-daemon](https://github.com/libp2p/go-libp2p-relay-daemon) is used with ACLs set up. +- v1 relays can be used by anyone without any limits, unless [go-libp2p-relay-daemon](https://github.com/libp2p/go-libp2p-relay-daemon) is used with ACLs (Access Control Lists) set up. - v2 relays are "limited relays" that are designed to be used for [Direct Connection Upgrade through Relay](https://github.com/libp2p/specs/blob/master/relay/DCUtR.md) (aka hole punching). 
+- Not configurable in go-ipfs; uses a preset list of relays + +See [p2p-circuit relay](https://github.com/libp2p/specs/tree/master/relay) ### Bootstrap @@ -84,21 +75,23 @@ Both Go-IPFS and JS-IPFS nodes use bootstrap nodes to initially enter the DHT. Features of a bootstrap node: - All default bootstrap nodes are part of the public DHT. -- They are used by both Go-IPFS and JS-IPFS nodes. -- The list of bootstrap nodes a Go-IPFS or JS-IPFS node connects to is configurable. +- The list of bootstrap nodes a Go-IPFS or JS-IPFS node connects to is configurable in its config file. Limitations of a bootstrap node: - If an IPFS node only has one bootstrap node listed in that configuration and that bootstrap node goes offline, the IPFS node will lose access to the public DHT if it were to restart. -### Delegate routing +[More about Bootstrapping](../how-to/modify-bootstrap-list.md) + +### Delegate routing node -When IPFS nodes are unable to run DHT logic on their own, they _delegate_ the task to a delegate routing node. Publishing works with arbitrary CID codecs, as the [js-delegate-content module](https://github.com/libp2p/js-libp2p-delegated-content-routing/blob/master/src/index.js#L127-L128) publishes CIDs at the block level rather than the IPLD or DAG level. +When IPFS nodes are unable to run Distributed Hash Table (DHT) logic on their own, they _delegate_ the task to a delegate routing node. Publishing works with arbitrary CID codecs (the multicodec identifiers that describe how a block's data is encoded), as the [js-delegate-content module](https://github.com/libp2p/js-libp2p-delegated-content-routing/blob/master/src/index.js#L127-L128) publishes CIDs at the block level rather than the IPLD or DAG level. Features of a delegate routing node: -- They are Go-IPFS nodes with some HTTP RPC API commands exposed unser `/api/v0`. +- They are Go-IPFS nodes with their API ports exposed and some API commands accessible under `/api/v0`. - Usable by both Go-IPFS and JS-IPFS nodes.
+- JS-IPFS nodes use them to query the DHT and also publish content without having to actually run DHT logic on their own. - Often on the same _server_ as a [preload](#preload) node, though both the delegate routing service and preload service are addressed differently. This is done by having different multiaddrs that resolve to the same machine. - Delegate routing nodes are in the default JS-IPFS configuration as bootstrap nodes, so they will maintain libp2p swarm connections to them at all times. - They are configured as regular bootstrap nodes, but have the string 'preload' in their multiaddrs. @@ -107,4 +100,37 @@ Limitations of a delegate routing node: - On default delegate nodes provided by Protocol Labs, the garbage collection happens every hour, so provided content only survives for that long. If the uploading JS-IPFS node is still running, it will issue periodic re-provides using the same publishing mechanic, which extends the life of the content on the DHT. +## Implementations + +Protocol Labs manages two primary implementations of the IPFS spec: Go-IPFS and JS-IPFS. These implementations use specific types of nodes to perform server, browser, and other client functions. + +### Go-IPFS + +The Go implementation is designed to run on servers and user machines with full IPFS capabilities, enabling experimentation. New IPFS features are usually created on Go-IPFS before any other implementation. + +Features include: + +- TCP and QUIC transports are enabled by default. +- `/ws/` transport disabled by default. +- HTTP gateway with subdomain support for origin isolation between content roots. +- Various [experimental features](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md) + +See [API > Working with Go](https://docs.ipfs.io/reference/go/api/#working-with-go) + +### JS-IPFS + +The JavaScript implementation is designed to run in the browser with a limited set of IPFS capabilities.
+ +Features include: + +- Can connect to server nodes using secure WebSockets. + - WSS requires manual setup of TLS at the server. +- Can connect to a browser node using WebRTC using a centralized [ws-webrtc-star signaling service](https://github.com/libp2p/js-libp2p-webrtc-star). + +Specific limitations of the JS-IPFS implementation are: + +- Unless using WSS, a JS-IPFS node cannot connect to the main public DHT. They will only connect to other JS-IPFS nodes. +- The performance of the DHT is not on-par with the Go-IPFS implementation. +- The HTTP gateway is present, but it has no subdomain support (can't open TCP port) +See [More about IPFS Node](../how-to/command-line-quick-start.md#take-your-node-online) diff --git a/docs/how-to/configure-node.md b/docs/how-to/configure-node.md index 667403108..576339537 100644 --- a/docs/how-to/configure-node.md +++ b/docs/how-to/configure-node.md @@ -99,7 +99,7 @@ This document refers to the standard JSON types (e.g., `null`, `string`, Flags allow enabling and disabling features. However, unlike simple booleans, they can also be `null` (or omitted) to indicate that the default value should be chosen. This makes it easier for go-ipfs to change the defaults in the -future unless the user _explicitly_ sets the flag to either `true` (enabled) or +future, unless the user _explicitly_ sets the flag to either `true` (enabled) or `false` (disabled). Flags have three possible states: - `null` or missing (apply the default value). @@ -242,7 +242,7 @@ Type: `string` (one of `"enabled"` or `"disabled"`) ### `AutoNAT.Throttle` -When set, this option configure's the AutoNAT services throttling behavior. By +When set, this option configures the AutoNAT services throttling behavior. By default, go-ipfs will rate-limit the number of NAT checks performed for other nodes to 30 per minute, and 3 per peer.
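
The throttling rule described for `AutoNAT.Throttle` (30 checks per minute overall, 3 per peer) can be pictured with a small sketch. This is only an illustration of a two-level rate limit, not the go-ipfs implementation: the class name `NatCheckThrottle`, its parameters, and the fixed-window strategy are all assumptions made for the example.

```python
from collections import defaultdict


class NatCheckThrottle:
    """Hypothetical fixed-window limiter (not go-ipfs code).

    Allows at most `global_limit` checks per window overall and at most
    `peer_limit` checks per window for any single peer.
    """

    def __init__(self, global_limit=30, peer_limit=3, interval=60.0):
        self.global_limit = global_limit
        self.peer_limit = peer_limit
        self.interval = interval          # window length in seconds
        self.window_start = 0.0
        self.total = 0                    # checks served in this window
        self.per_peer = defaultdict(int)  # checks served per peer

    def allow(self, peer_id, now):
        """Return True if a NAT check for `peer_id` may run at time `now`."""
        # Start a fresh window once the current one has expired.
        if now - self.window_start >= self.interval:
            self.window_start = now
            self.total = 0
            self.per_peer.clear()
        # Refuse when either the global or the per-peer budget is spent.
        if self.total >= self.global_limit:
            return False
        if self.per_peer[peer_id] >= self.peer_limit:
            return False
        self.total += 1
        self.per_peer[peer_id] += 1
        return True
```

With the defaults above, a single peer gets three checks per window and the fourth is refused, while the node as a whole stops serving checks after thirty until the next window begins.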