diff --git a/README.md b/README.md index 7a948835d8..ce7a2af826 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ ![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Go Report Card](https://goreportcard.com/badge/github.com/NVIDIA/aistore) -AIStore (AIS for short) is a built from scratch, lightweight storage stack tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. +AIStore (AIS for short) is a built-from-scratch, lightweight storage stack tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. AIS [consistently shows balanced I/O distribution and linear scalability](https://aistore.nvidia.com/blog/2024/02/16/multihome-bench) across arbitrary numbers of clustered nodes. The ability to scale linearly with each added disk was, and remains, one of the main incentives. Much of the initial design was also driven by the ideas to [offload](https://aistore.nvidia.com/blog/2023/06/09/aisio-transforms-with-webdataset-pt-3) custom dataset transformations (often referred to as [ETL](https://aistore.nvidia.com/blog/2021/10/21/ais-etl-1)). And finally, since AIS is a software system that aggregates Linux machines to provide storage for user data, there's the requirement number one: reliability and data protection. 
@@ -60,7 +60,7 @@ Since prerequisites boil down to, essentially, having Linux with a disk the depl | Option | Objective | | --- | ---| -| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers or first-time users, Linux or Mac OS; to get started, run `make kill cli aisloader deploy <<< $'N\nM'`, where `N` is a number of targets, `M` - gateways | +| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers or first-time users, Linux or Mac OS; to get started, run `make kill cli aisloader deploy <<< $'N\nM'`, where `N` is the number of [targets](/docs/overview.md#terminology) and `M` the number of gateways | | Minimal production-ready deployment | This option utilizes preinstalled docker image and is targeting first-time users or researchers (who could immediately start training their models on smaller datasets) | | [Easy automated GCP/GKE deployment](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#kubernetes-deployments) | Developers, first-time users, AI researchers | | [Large-scale production deployment](https://github.com/NVIDIA/ais-k8s) | Requires Kubernetes and is provided via a separate repository: [ais-k8s](https://github.com/NVIDIA/ais-k8s) | @@ -80,8 +80,8 @@ AIStore supports multiple ways to populate itself with existing datasets, includ * **copy** multiple matching objects; * **archive** multiple objects * **prefetch** remote bucket or parts of thereof; -* **download** raw http(s) addressible directories, including (but not limited to) Cloud storages; -* **promote** NFS or SMB shares accessible by one or multiple (or all) AIS target nodes; +* **download** raw http(s) addressable directories, including (but not limited to) Cloud storages; +* **promote** NFS or SMB shares accessible by one or multiple (or all) AIS [target](/docs/overview.md#terminology) nodes; > The on-demand "way" is maybe the most popular, 
whereby users just start running their workloads against a [remote bucket](docs/providers.md) with AIS cluster positioned as an intermediate fast tier. @@ -89,7 +89,7 @@ But there's more. In [v3.22](https://github.com/NVIDIA/aistore/releases/tag/v1.3 ## Installing from release binaries -Generally, AIStore (cluster) requires at least some sort of [deployment](/deploy#contents) procedure. There are standalone binaries, though, that can be [built](Makefile) from source or, alternatively, installed directly from GitHub: +Generally, AIStore (cluster) requires at least some sort of [deployment](/deploy#contents) procedure. There are standalone binaries, though, that can be [built](Makefile) from source or installed directly from GitHub: ```console $ ./scripts/install_from_binaries.sh --help @@ -99,25 +99,13 @@ The script installs [aisloader](/docs/aisloader.md) and [CLI](/docs/cli.md) from ## PyTorch integration -AIS is one of the PyTorch [Iterable Datapipes](https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#iterable-datapipes). +PyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders: -Specifically, [TorchData](https://github.com/pytorch/data) library provides: -* [AISFileLister](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#aisfilelister) -* [AISFileLoader](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#aisfileloader) +* [Taxonomy of abstractions and API reference](/docs/pytorch.md) +* [AIS plugin for PyTorch: usage examples](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md) +* [Jupyter notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/aisio-pytorch/) -to list and, respectively, load data from AIStore. 
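To ground the map-style vs. iterable-style distinction that the updated PyTorch-integration section introduces, here is a minimal, torch-free sketch of the two access protocols such datasets follow. The `Toy*` class names are illustrative stand-ins, not part of the AIS plugin's API:

```python
# Minimal illustration of the two dataset styles mentioned above. These toy
# classes mirror the protocols torch.utils.data expects (map-style implements
# __len__/__getitem__; iterable-style implements __iter__); they are
# illustrative only, not the plugin's actual classes.

class ToyMapDataset:
    """Map-style: random access to any sample by index."""
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


class ToyIterDataset:
    """Iterable-style: sequential access, a natural fit for streamed objects."""
    def __init__(self, samples):
        self.samples = samples

    def __iter__(self):
        yield from self.samples


names = ["shard-000.tar", "shard-001.tar", "shard-002.tar"]
print(ToyMapDataset(names)[1])       # shard-001.tar
print(list(ToyIterDataset(names)))   # all three names, in order
```

Samplers and dataloaders build on exactly these two protocols, which is why the plugin can offer both flavors over the same bucket contents.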
- -Further references and usage examples - in our technical blog at https://aistore.nvidia.com/blog: -* [PyTorch: Loading Data from AIStore](https://aistore.nvidia.com/blog/2022/07/12/aisio-pytorch) -* [Python SDK: Getting Started](https://aistore.nvidia.com/blog/2022/07/20/python-sdk) - -Since AIS natively supports a number of [remote backends](/docs/providers.md), you can also use (PyTorch + AIS) to iterate over Amazon S3 and Google Cloud buckets, and more. - -## Reuse - -This repo includes [SGL and Slab allocator](/memsys) intended to optimize memory usage, [Streams and Stream Bundles](/transport) to multiplex messages over long-lived HTTP connections, and a few other sub-packages providing rather generic functionality. - -With a little effort, they all could be extracted and used outside. +Since AIS natively supports [remote backends](/docs/providers.md), you can also use (PyTorch + AIS) to iterate over Amazon S3, GCS and Azure buckets, and more. ## Guides and References @@ -151,9 +139,6 @@ With a little effort, they all could be extracted and used outside. - [Jobs](/docs/cli/job.md) - Security and Access Control - [Authentication Server (AuthN)](/docs/authn.md) -- Tutorials - - [Tutorials](/docs/tutorials/README.md) - - [Videos](/docs/videos.md) - Power tools and extensions - [Reading, writing, and listing *archives*](/docs/archive.md) - [Distributed Shuffle](/docs/dsort.md) @@ -195,16 +180,16 @@ With a little effort, they all could be extracted and used outside. 
- [Getting started](/docs/getting_started.md) - [Docker](/docs/docker_main.md) - [Useful scripts](/docs/development.md) - - Profiling, race-detecting, and more + - Profiling, race-detecting and more - Batch jobs - [Batch operations](/docs/batch.md) - - [eXtended Actions (xactions)](/xact/README.md) + - [eXtended Actions (xactions)](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) - [CLI: `ais job`](/docs/cli/job.md) and [`ais show job`](/docs/cli/show.md), including: - [prefetch remote datasets](/docs/cli/object.md#prefetch-objects) - [copy bucket](/docs/cli/bucket.md#copy-bucket) - [copy multiple objects](/docs/cli/bucket.md#copy-multiple-objects) - [download remote BLOBs](/docs/cli/blob-downloader.md) - - [promote NFS or SMB share](https://aistore.nvidia.com/blog/2022/03/17/promote), and more + - [promote NFS or SMB share](https://aistore.nvidia.com/blog/2022/03/17/promote) - Assorted Topics - [Virtual directories](/docs/howto_virt_dirs.md) - [System files](/docs/sysfiles.md) diff --git a/cmd/cli/cli/const.go b/cmd/cli/cli/const.go index 2a70da110a..6883687680 100644 --- a/cmd/cli/cli/const.go +++ b/cmd/cli/cli/const.go @@ -228,7 +228,7 @@ const ( jobShowRebalanceArgument = "[REB_ID] [NODE_ID]" // Perf - showPerfArgument = "show performance counters, throughput, latency, and more (" + tabtab + " specific view)" + showPerfArgument = "show performance counters, throughput, latency, disks, used/available capacities (" + tabtab + " specific view)" // ETL etlNameArgument = "ETL_NAME" diff --git a/cmd/cli/cli/show_hdlr.go b/cmd/cli/cli/show_hdlr.go index 14c3e90912..4e3b2028c9 100644 --- a/cmd/cli/cli/show_hdlr.go +++ b/cmd/cli/cli/show_hdlr.go @@ -138,7 +138,7 @@ var ( } showCmdCluster = cli.Command{ Name: cmdCluster, - Usage: "show cluster nodes and utilization", + Usage: "main dashboard: show cluster at-a-glance (nodes, software versions, utilization, capacity, memory and more)", ArgsUsage: showClusterArgument, Flags: showCmdsFlags[cmdCluster], Action: 
showClusterHandler, @@ -146,7 +146,7 @@ var ( Subcommands: []cli.Command{ { Name: cmdSmap, - Usage: "show Smap (cluster map)", + Usage: "show cluster map (Smap)", ArgsUsage: optionalNodeIDArgument, Flags: showCmdsFlags[cmdSmap], Action: showSmapHandler, @@ -154,7 +154,7 @@ var ( }, { Name: cmdBMD, - Usage: "show BMD (bucket metadata)", + Usage: "show bucket metadata (BMD)", ArgsUsage: optionalNodeIDArgument, Flags: showCmdsFlags[cmdBMD], Action: showBMDHandler, diff --git a/docs/_posts/2021-07-30-etl.md b/docs/_posts/2021-07-30-etl.md index 2bf57aa5f3..d5bd6237f9 100644 --- a/docs/_posts/2021-07-30-etl.md +++ b/docs/_posts/2021-07-30-etl.md @@ -18,7 +18,7 @@ Of course, I’m talking about ETL workloads. Machine learning has three, and on ETL – or you can simply say “data preprocessing” because that’s what it is (my advice, though, if I may, would be to say “ETL” as it may help institute a sense of shared values, etc.) – in short, ETL is something that is usually done prior to training. -Examples? Well, ask a random person to name a fruit, and you’ll promptly hear back “an apple.” Similarly, ask anyone to name an ETL workload, and many, maybe most, will immediately respond with “augmentation”. Which in and of itself is a shortcut for a bunch of concrete sprightly verbs: flip, rotate, scale, crop, and more. +Examples? Well, ask a random person to name a fruit, and you’ll promptly hear back “an apple.” Similarly, ask anyone to name an ETL workload, and many, maybe most, will immediately respond with “augmentation”. Which in and of itself is a shortcut for a bunch of concrete sprightly verbs: flip, rotate, scale, crop and more. My point? My point is, and always will be, that any model – and any deep-learning neural network, in particular – is only as good as the data you feed into it. That’s why they flip and rotate and what-not. 
And that’s precisely why they augment or, more specifically, extract-transform-load, raw datasets commonly used to train deep learning classifiers. Preprocess, train, and repeat. Reprocess, retrain, and compare the resulting mAP (for instance). And so on. diff --git a/docs/_posts/2023-04-03-transform-images-with-python-sdk.md b/docs/_posts/2023-04-03-transform-images-with-python-sdk.md index 98e151fda0..64b122d497 100644 --- a/docs/_posts/2023-04-03-transform-images-with-python-sdk.md +++ b/docs/_posts/2023-04-03-transform-images-with-python-sdk.md @@ -190,7 +190,7 @@ etl_group(image_etl) ### AIS/PyTorch connector -In the steps above, we demonstrated a few ways to transform objects, but to use the results we need to load them into a Pytorch Dataset and DataLoader. In PyTorch, a dataset can be defined by inheriting [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset). Datasets can be fed into a `DataLoader` to handle batching, shuffling, etc. (see ['torch.utils.data.DataLoader'](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)). +In the steps above, we demonstrated a few ways to transform objects, but to use the results we need to load them into a PyTorch Dataset and DataLoader. In PyTorch, a dataset can be defined by inheriting [`torch.utils.data.Dataset`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset). Datasets can be fed into a `DataLoader` to handle batching, shuffling, etc. (see ['torch.utils.data.DataLoader'](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)). To implement inline ETL, transforming objects as we read them, you will need to create a custom PyTorch Dataset as described [by PyTorch here](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html). In the future, AIS will likely provide some of this functionality directly. 
For now, we will use the output of the offline ETL (bucket-to-bucket) described above and use the provided `AISDataset` to read the transformed results. More info on reading AIS data into PyTorch can be found [on the AIS blog here](https://aiatscale.org/blog/2022/07/12/aisio-pytorch). diff --git a/docs/_posts/2023-04-10-tco-any-to-any.md b/docs/_posts/2023-04-10-tco-any-to-any.md index 842214c03c..f73cc9afb2 100644 --- a/docs/_posts/2023-04-10-tco-any-to-any.md +++ b/docs/_posts/2023-04-10-tco-any-to-any.md @@ -121,4 +121,4 @@ And that's the upshot. ## References -* [Lifecycle management: maintenance mode, rebalance/rebuild, and more](/docs/lifecycle_node.md) +* [Lifecycle management: maintenance mode, rebalance/rebuild and more](/docs/lifecycle_node.md) diff --git a/docs/_posts/2023-06-09-aisio-transforms-with-webdataset-pt-3.md b/docs/_posts/2023-06-09-aisio-transforms-with-webdataset-pt-3.md index c459400e9e..40b7992f4f 100644 --- a/docs/_posts/2023-06-09-aisio-transforms-with-webdataset-pt-3.md +++ b/docs/_posts/2023-06-09-aisio-transforms-with-webdataset-pt-3.md @@ -128,10 +128,10 @@ def view_data(dataloader): 2. Documentation, blogs, videos: - https://aiatscale.org - https://github.com/NVIDIA/aistore/tree/main/docs - - Pytorch intro to Datasets and DataLoaders: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html + - PyTorch intro to Datasets and DataLoaders: https://pytorch.org/tutorials/beginner/basics/data_tutorial.html - Discussion on Datasets, DataPipes, DataLoaders: https://sebastianraschka.com/blog/2022/datapipes.html 3. Full code example - - [Pytorch Pipelines With WebDataset Example](/python/examples/aisio-pytorch/pytorch_webdataset.py) + - [PyTorch Pipelines With WebDataset Example](/python/examples/aisio-pytorch/pytorch_webdataset.py) 4. 
Dataset - [The Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) diff --git a/docs/archive.md b/docs/archive.md index 5bda05acf6..512bf22f6d 100644 --- a/docs/archive.md +++ b/docs/archive.md @@ -29,7 +29,7 @@ All sharding formats are equally supported across the entire set of AIS APIs. Fo > ie., objects formatted as .tar, .tgz, etc. - see above -and including the corresponding pathnames into generated result sets. Clients can run concurrent multi-object (source bucket => destination bucket) transactions to en masse generate new archives from [selected](/docs/batch.md) subsets of files, and more. +and including the corresponding pathnames into generated result sets. Clients can run concurrent multi-object (source bucket => destination bucket) transactions to en masse generate new archives from [selected](/docs/batch.md) subsets of files. APPEND to existing archives is also provided but limited to [TAR only](https://aistore.nvidia.com/blog/2021/08/10/tar-append). diff --git a/docs/batch.md b/docs/batch.md index c262895839..ff8a40bb09 100644 --- a/docs/batch.md +++ b/docs/batch.md @@ -39,7 +39,7 @@ Complete and most recently updated list of supported jobs can be found in this [ Last (but not the least) is - time. Job execution may take many seconds, sometimes minutes or hours. -Examples include erasure coding or n-way mirroring a dataset, resharding and reshuffling a dataset, and more. +Examples include erasure coding or n-way mirroring a dataset, resharding and reshuffling a dataset and more. Global rebalance gets (automatically) triggered by any membership changes (nodes joining, leaving, powercycling, etc.) that can be further visualized via `ais show rebalance` CLI. 
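Because jobs like these run asynchronously for seconds, minutes, or hours, clients typically poll job status until completion. A generic sketch of that pattern follows; the `get_status` callable is a hypothetical stand-in, not AIS SDK code:

```python
import time

# Generic poll-until-done loop for a long-running job. `get_status` stands in
# for whatever status call the client uses (an SDK method, a CLI wrapper, etc.).
def wait_for_job(get_status, poll_interval=0.01, timeout=5.0):
    """Poll until the job reports 'finished' or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "finished":
            return True
        time.sleep(poll_interval)
    return False


# Toy status source that finishes on the third poll:
calls = {"n": 0}
def fake_status():
    calls["n"] += 1
    return "finished" if calls["n"] >= 3 else "running"

print(wait_for_job(fake_status))  # True
```

In practice the same loop is what continuous-monitoring flags (e.g., a refresh interval on a `show` command) implement on the user's behalf.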
diff --git a/docs/blob_downloader.md b/docs/blob_downloader.md index 0da1b74304..316d4a5536 100644 --- a/docs/blob_downloader.md +++ b/docs/blob_downloader.md @@ -16,7 +16,7 @@ AIStore supports multiple ways to populate itself with existing datasets, includ * **copy** multiple matching objects; * **archive** multiple objects * **prefetch** remote bucket or parts of thereof; -* **download** raw http(s) addressible directories, including (but not limited to) Cloud storages; +* **download** raw http(s) addressable directories, including (but not limited to) Cloud storages; * **promote** NFS or SMB shares accessible by one or multiple (or all) AIS target nodes; > The on-demand "way" is maybe the most popular, whereby users just start running their workloads against a [remote bucket](docs/providers.md) with AIS cluster positioned as an intermediate fast tier. diff --git a/docs/bucket.md b/docs/bucket.md index b429cc8d6e..5497234607 100644 --- a/docs/bucket.md +++ b/docs/bucket.md @@ -41,11 +41,11 @@ AIStore uses the popular and well-known bucket abstraction, originally (likely) Similar to S3, AIS bucket is a _container for objects_. -> An object, in turn, is a file **and** a metadata that describes that object and normally includes: checksum, version, references to copies (replicas), size, last access time, source bucket (if object's origin is a Cloud bucket), custom user-defined attributes, and more. +> An object, in turn, is a file **and** metadata that describes that object and normally includes: checksum, version, references to copies (replicas), size, last access time, source bucket (if object's origin is a Cloud bucket), custom user-defined attributes and more. AIS is a flat `/` storage hierarchy where named buckets store user datasets. -In addition, each AIS bucket is a point of applying (per-bucket) management policies: checksumming, versioning, erasure coding, mirroring, LRU eviction, checksum and/or version validation, and more. 
+In addition, each AIS bucket is a point of applying (per-bucket) management policies: checksumming, versioning, erasure coding, mirroring, LRU eviction, checksum and/or version validation. AIS buckets *contain* user data performing the same function as, for instance: @@ -695,7 +695,7 @@ For background and usage examples, please see [CLI: AWS-specific bucket configur * [`ais ls`](https://github.com/NVIDIA/aistore/blob/main/docs/cli/bucket.md#list-objects) * [Virtual directories](/docs/howto_virt_dirs.md) -`ListObjects` API returns a page of object names and, optionally, their properties (including sizes, access time, checksums, and more), in addition to a token that serves as a cursor, or a marker for the *next* page retrieval. +`ListObjects` API returns a page of object names and, optionally, their properties (including sizes, access time, checksums), in addition to a token that serves as a cursor, or a marker for the *next* page retrieval. > Go [ListObjects](https://github.com/NVIDIA/aistore/blob/main/api/bucket.go) API diff --git a/docs/cli.md b/docs/cli.md index ecb976f3d4..945105aa71 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -142,7 +142,7 @@ Following is a brief summary (that's non-exhaustive and slightly outdated): | [`ais job`](/docs/cli/job.md) | Query and manage jobs (aka eXtended actions or `xactions`). | | [`ais object`](/docs/cli/object.md) | PUT and GET (write and read), APPEND, archive, concat, list (buckets, objects), move, evict, promote, ... | | [`ais search`](/docs/cli/search.md) | Search `ais` commands. | -| [`ais show`](/docs/cli/show.md) | Monitor anything and everything: performance (all aspects), buckets, jobs, remote clusters, and more. | +| [`ais show`](/docs/cli/show.md) | Monitor anything and everything: performance (all aspects), buckets, jobs, remote clusters and more. | | [`ais log`](/docs/cli/log.md) | Download ais nodes' logs or view the logs in real time. 
| | [`ais storage`](/docs/cli/storage.md) | Show capacity usage on a per bucket basis (num objects and sizes), attach/detach mountpaths (disks). | {: .nobreak} diff --git a/docs/cli/bucket.md b/docs/cli/bucket.md index fe33819ac3..2fab07ad24 100644 --- a/docs/cli/bucket.md +++ b/docs/cli/bucket.md @@ -204,7 +204,7 @@ OPTIONS: - all buckets, including accessible (visible) remote buckets that are _not present_ in the cluster --cached list only those objects from a remote bucket that are present ("cached") --name-only faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored) - --props value comma-separated list of object properties including name, size, version, copies, and more; e.g.: + --props value comma-separated list of object properties including name, size, version, copies and more; e.g.: --props all --props name,size,cached --props "ec, copies, custom, location" @@ -299,7 +299,7 @@ OPTIONS: --cached list only in-cluster objects - only those objects from a remote bucket that are present ("cached") --name-only faster request to retrieve only the names of objects (if defined, '--props' flag will be ignored) - --props value comma-separated list of object properties including name, size, version, copies, and more; e.g.: + --props value comma-separated list of object properties including name, size, version, copies and more; e.g.: --props all --props name,size,cached --props "ec, copies, custom, location" @@ -375,7 +375,7 @@ OPTIONS: | `--template` | `string` | template for matching object names, e.g.: 'shard-{900..999}.tar' | `""` | | `--prefix` | `string` | list objects matching a given prefix | `""` | | `--page-size` | `int` | maximum number of names per page (0 - the maximum is defined by the corresponding backend) | `0` | -| `--props` | `string` | comma-separated list of object properties including name, size, version, copies, EC data and parity info, custom metadata, location, and more; to include all properties, type 
'--props all' (default: "name,size") | `"name,size"` | +| `--props` | `string` | comma-separated list of object properties including name, size, version, copies, EC data and parity info, custom metadata, location and more; to include all properties, type '--props all' (default: "name,size") | `"name,size"` | | `--limit` | `int` | limit object name count (0 - unlimited) | `0` | | `--show-unmatched` | `bool` | list objects that were not matched by regex and/or template | `false` | | `--all` | `bool` | depending on context: all objects (including misplaced ones and copies) _or_ all buckets (including remote buckets that are not present in the cluster) | `false` | diff --git a/docs/cli/cluster.md b/docs/cli/cluster.md index cc96326650..e0fae274d9 100644 --- a/docs/cli/cluster.md +++ b/docs/cli/cluster.md @@ -100,17 +100,17 @@ The command has a rather long(ish) short description and multiple subcommands: ```console $ ais show cluster --help NAME: - ais show cluster - show cluster nodes and utilization + ais show cluster - main dashboard: show cluster at-a-glance (nodes, software versions, utilization, capacity, memory and more) USAGE: ais show cluster command [command options] [NODE_ID] | [target [NODE_ID]] | [proxy [NODE_ID]] | [smap [NODE_ID]] | [bmd [NODE_ID]] | [config [NODE_ID]] | [stats [NODE_ID]] COMMANDS: - smap show Smap (cluster map) - bmd show BMD (bucket metadata) + smap show cluster map (Smap) + bmd show bucket metadata (BMD) config show cluster and node configuration - stats (alias for "ais show performance") show performance counters, throughput, latency, and more (press to select specific view) + stats (alias for "ais show performance") show performance counters, throughput, latency, disks, used/available capacities (press to select specific view) OPTIONS: --refresh value interval for continuous monitoring; @@ -124,12 +124,14 @@ OPTIONS: To quickly exemplify, let's assume the cluster has a (target) node called `t[xyz]`. 
Then: -### show cluster: all nodes (including t[xyz]) and gateways, as well as deployed version and runtime stats +### Main CLI dashboard: all storage nodes and gateways, deployed version, capacity, memory, and runtime stats + ```console $ ais show cluster ``` -### show all target (nodes) and, again, runtime statistics, software version, deployment type, K8s pods, and more +### Same as above, with only targets selected + ```console $ ais show cluster target ``` @@ -264,7 +266,7 @@ counters throughput latency capacity disk $ ais show cluster stats --help NAME: - ais show cluster stats - (alias for "ais show performance") show performance counters, throughput, latency, and more (press to select specific view) + ais show cluster stats - (alias for "ais show performance") show performance counters, throughput, latency, disks, used/available capacities (press to select specific view) USAGE: ais show cluster stats command [command options] [TARGET_ID] @@ -273,7 +275,7 @@ COMMANDS: counters show (GET, PUT, DELETE, RENAME, EVICT, APPEND) object counts, as well as: - numbers of list-objects requests; - (GET, PUT, etc.) cumulative and average sizes; - - associated error counters, if any, and more. 
throughput show GET and PUT throughput, associated (cumulative, average) sizes and counters latency show GET, PUT, and APPEND latencies and average sizes capacity show target mountpaths, disks, and used/available capacity diff --git a/docs/cli/help.md b/docs/cli/help.md index ee0fc74998..4175838b0c 100644 --- a/docs/cli/help.md +++ b/docs/cli/help.md @@ -90,7 +90,7 @@ COMMANDS: storage monitor and manage clustered storage archive Create multi-object archive, append files to an existing archive log show log - performance show performance counters, throughput, latency, and more (press to select specific view) + performance show performance counters, throughput, latency, disks, used/available capacities (press to select specific view) remote-cluster show attached AIS clusters alias manage top-level aliases put (alias for "object put") PUT or APPEND one file or one directory, or multiple files and/or directories. diff --git a/docs/cli/job.md b/docs/cli/job.md index 3c660e960b..e443c0b2cc 100644 --- a/docs/cli/job.md +++ b/docs/cli/job.md @@ -37,7 +37,7 @@ Rest of this document covers starting, stopping, and otherwise managing job kind ### See also - [static descriptors (source code)](https://github.com/NVIDIA/aistore/blob/main/xact/api.go#L108) -- [`xact` package README](/xact/README.md). +- [`xact` package README](https://github.com/NVIDIA/aistore/blob/main/xact/README.md). - [`batch jobs`](/docs/batch.md) - [CLI: `dsort` (distributed shuffle)](/docs/cli/dsort.md) - [CLI: `download` from any remote source](/docs/cli/download.md) diff --git a/docs/cli/performance.md b/docs/cli/performance.md index 20553dba5c..6c955ca55e 100644 --- a/docs/cli/performance.md +++ b/docs/cli/performance.md @@ -51,7 +51,7 @@ NAME: ais show performance counters - show (GET, PUT, DELETE, RENAME, EVICT, APPEND) object counts, as well as: - numbers of list-objects requests; - (GET, PUT, etc.) cumulative and average sizes; - - associated error counters, if any, and more. + - associated error counters, if any. 
USAGE: ais show performance counters [command options] [TARGET_ID] diff --git a/docs/cli/show.md b/docs/cli/show.md index ff1e5c0045..c954efa05b 100644 --- a/docs/cli/show.md +++ b/docs/cli/show.md @@ -70,7 +70,7 @@ The command's help screen follows below - notice the command-line options (aka f ```console $ ais show performance --help NAME: - ais show performance - show performance counters, throughput, latency, and more (press to select specific view) + ais show performance - show performance counters, throughput, latency, disks, used/available capacities (press to select specific view) USAGE: ais show performance command [command options] [TARGET_ID] @@ -79,7 +79,7 @@ COMMANDS: counters show (GET, PUT, DELETE, RENAME, EVICT, APPEND) object counts, as well as: - numbers of list-objects requests; - (GET, PUT, etc.) cumulative and average sizes; - - associated error counters, if any, and more. + - associated error counters, if any. throughput show GET and PUT throughput, associated (cumulative, average) sizes and counters latency show GET, PUT, and APPEND latencies and average sizes capacity show target mountpaths, disks, and used/available capacity @@ -322,16 +322,16 @@ proxy target smap bmd config stats ```console $ ais show cluster --help NAME: - ais show cluster - show cluster nodes and utilization + ais show cluster - main dashboard: show cluster at-a-glance (nodes, software versions, utilization, capacity, memory and more) USAGE: ais show cluster command [command options] [NODE_ID] | [target [NODE_ID]] | [proxy [NODE_ID]] | [smap [NODE_ID]] | [bmd [NODE_ID]] | [config [NODE_ID]] | [stats [NODE_ID]] COMMANDS: - smap show Smap (cluster map) - bmd show BMD (bucket metadata) + smap show cluster map (Smap) + bmd show bucket metadata (BMD) config show cluster and node configuration - stats (alias for "ais show performance") show performance counters, throughput, latency, and more (press to select specific view) + stats (alias for "ais show performance") show 
performance counters, throughput, latency, disks, used/available capacities (press to select specific view) OPTIONS: --refresh value interval for continuous monitoring; diff --git a/docs/configuration.md b/docs/configuration.md index 031860413c..87b0934b99 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -29,7 +29,7 @@ Majority of the configuration knobs can be changed at runtime (and at any time). For the most part, commands to view and update (CLI, cluster, node) configuration can be found [here](/docs/cli/config.md). -The [same document](docs/cli/config.md) also contains a brief theory of operation, command descriptions, numerous usage examples, and more. +The [same document](docs/cli/config.md) also contains a brief theory of operation, command descriptions, numerous usage examples and more. > **Important:** as an input, CLI accepts both plain text and JSON-formatted values. For the latter, make sure to embed the (JSON value) argument into single quotes, e.g.: diff --git a/docs/docs.md b/docs/docs.md index 9a0803d242..d5ed479deb 100644 --- a/docs/docs.md +++ b/docs/docs.md @@ -49,12 +49,10 @@ redirect_from: - [Jobs](/docs/cli/job.md) - Security and Access Control - [Authentication Server (AuthN)](/docs/authn.md) -- Tutorials - - [Tutorials](/docs/tutorials/README.md) - - [Videos](/docs/videos.md) - Power tools and extensions - [Reading, writing, and listing *archives*](/docs/archive.md) - - [Distributed Shuffle](/docs/dsort.md) + - [Distributed Shuffle (`dsort`)](/docs/dsort.md) + - [Initial Sharding utility (`ishard`)](https://github.com/NVIDIA/aistore/blob/main/cmd/ishard/README.md) - [Downloader](/docs/downloader.md) - [Extract, Transform, Load](/docs/etl.md) - [Tools and utilities](/docs/tools.md) @@ -86,18 +84,25 @@ redirect_from: - [Feature flags](/docs/feature_flags.md) - Observability - [Observability](/docs/metrics.md) + - [Reference: all supported metrics](/docs/metrics-reference.md) - [Prometheus](/docs/prometheus.md) - [CLI: `ais 
show performance`](/docs/cli/show.md) - For users and developers - [Getting started](/docs/getting_started.md) - [Docker](/docs/docker_main.md) - [Useful scripts](/docs/development.md) - - Profiling, race-detecting, and more + - Profiling, race-detecting and more - Batch jobs - [Batch operations](/docs/batch.md) - - [eXtended Actions (xactions)](/xact/README.md) - - [CLI: `ais job`](/docs/cli/job.md) and [`ais show job`](/docs/cli/show.md) + - [eXtended Actions (xactions)](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) + - [CLI: `ais job`](/docs/cli/job.md) and [`ais show job`](/docs/cli/show.md), including: + - [prefetch remote datasets](/docs/cli/object.md#prefetch-objects) + - [copy bucket](/docs/cli/bucket.md#copy-bucket) + - [copy multiple objects](/docs/cli/bucket.md#copy-multiple-objects) + - [download remote BLOBs](/docs/cli/blob-downloader.md) + - [promote NFS or SMB share](https://aistore.nvidia.com/blog/2022/03/17/promote) - Assorted Topics + - [Virtual directories](/docs/howto_virt_dirs.md) - [System files](/docs/sysfiles.md) - [Switching cluster between HTTP and HTTPS](/docs/switch_https.md) - [TLS: testing with self-signed certificates](/docs/getting_started.md#tls-testing-with-self-signed-certificates) @@ -109,4 +114,4 @@ redirect_from: - [Downloader](/docs/downloader.md) - [On-disk layout](/docs/on_disk_layout.md) - [Buckets: definition, operations, properties](https://github.com/NVIDIA/aistore/blob/main/docs/bucket.md#bucket) - - [Out of band updates](/docs/out_of_band.md) + - [Out-of-band updates](/docs/out_of_band.md) diff --git a/docs/getting_started.md b/docs/getting_started.md index 074d363f71..0f46cec1af 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -41,7 +41,7 @@ Depending on your Linux distribution, you may or may not have `GCC`, `sysstat`, Speaking of distributions, our current default recommendation is Ubuntu Server 20.04 LTS. But Ubuntu 18.04 and CentOS 8.x (or later) will also work. 
As well as numerous others. -For the [local filesystem](/docs/performance.md), we currently recommend xfs. But again, this (default) recommendation shall not be interpreted as a limitation of any kind: other fine choices include zfs, ext4, f2fs, and more. +For the [local filesystem](/docs/performance.md), we currently recommend xfs. But again, this (default) recommendation shall not be interpreted as a limitation of any kind: other fine choices include zfs, ext4, f2fs and more. Since AIS itself provides n-way mirroring and erasure coding, hardware RAID would _not_ be recommended. But can be used, and will work. @@ -559,7 +559,7 @@ In particular, the `make` provides a growing number of developer-friendly comman * **deploy** the AIS cluster on your local development machine; * **run** all or selected tests; -* **instrument** AIS binary with race detection, CPU and/or memory profiling, and more. +* **instrument** AIS binary with race detection, CPU and/or memory profiling and more. Of course, local build is intended for development only. For production, there is a separate [dedicated repository](https://github.com/NVIDIA/ais-k8s) noted below. diff --git a/docs/howto_virt_dirs.md b/docs/howto_virt_dirs.md index 5dfac5c21c..65156df535 100644 --- a/docs/howto_virt_dirs.md +++ b/docs/howto_virt_dirs.md @@ -13,7 +13,7 @@ Train, for instance, on all audio files under `en_es_synthetic/v1/train/`, or si The motivation may become clearer if I say that the entire real-life dataset contains many millions of objects and numerous _virtual directories_, including the aforementioned `en_es_synthetic/v1/train/`. -Needless to say, aistore provides for all of that, and more. There is a certainty subtlety, however, that makes sense to illustrate on examples. +Needless to say, aistore provides for all of that and more. There is a certain subtlety, however, that makes sense to illustrate with examples.
## But first, the rules diff --git a/docs/http_api.md b/docs/http_api.md index bf634a46a7..cd5fb68361 100644 --- a/docs/http_api.md +++ b/docs/http_api.md @@ -143,7 +143,7 @@ $ curl -s -L -X GET 'http://aistore/gs/my-google-bucket' | jq > AIS provides S3 compatibility layer via its "/s3" endpoint. [S3 compatibility](/docs/s3compat.md) shall not be confused with "easy URL" mapping, whereby a path (e.g.) "gs/mybucket/myobject" gets replaced with "v1/objects/mybucket/myobject?provider=gcp" with _no_ other changes to the request and response parameters and components. -> For detals and more usage examples, please see [easy URL readme](/docs/easy_url.md). +> For details and additional usage examples, please see [easy URL readme](/docs/easy_url.md). ## API Reference @@ -187,7 +187,7 @@ This and the next section reference a variety of URL paths (e.g., `/v1/cluster`) | Get Cluster Map from a specific node (any node in the cluster) | See [Querying information](#querying-information) section below | (to be added) | `api.GetNodeClusterMap` | | Get Cluster System information | GET /v1/cluster | See [Querying information](#querying-information) section below | `api.GetClusterSysInfo` | | Get Cluster statistics | GET /v1/cluster | See [Querying information](#querying-information) section below | `api.GetClusterStats` | -| Get remote AIS-cluster information (access URL, primary gateway, cluster map version, and more) | GET /v1/cluster | See [Querying information](#querying-information) section below | `api.GetRemoteAIS` | +| Get remote AIS-cluster information (access URL, primary gateway, cluster map version and more) | GET /v1/cluster | See [Querying information](#querying-information) section below | `api.GetRemoteAIS` | | Attach remote AIS cluster | PUT /v1/cluster/attach | (to be added) | `api.AttachRemoteAIS` | | Detach remote AIS cluster | PUT /v1/cluster/detach | (to be added) | `api.DetachRemoteAIS` | @@ -711,7 +711,7 @@ Following is a brief summary of the majority of
supported monitoring operations $ curl -X GET http://G/v1/cluster?what=stats ``` -Execution flow for this single command causes intra-cluster broadcast whereby requesting proxy (which could be any proxy in the cluster) consolidates all results from all other nodes in a JSON-formatted output. The latter contains both http proxy and storage targets request counters, per-target used/available capacities, and more. For example: +Execution flow for this single command causes intra-cluster broadcast whereby requesting proxy (which could be any proxy in the cluster) consolidates all results from all other nodes in a JSON-formatted output. The latter contains both http proxy and storage targets request counters, per-target used/available capacities and more. For example: ![AIStore statistics](images/ais-get-stats.png) diff --git a/docs/index.md b/docs/index.md index 782fdb18ab..16001f3000 100644 --- a/docs/index.md +++ b/docs/index.md @@ -12,7 +12,7 @@ redirect_from: ![License](https://img.shields.io/badge/license-MIT-blue.svg) ![Go Report Card](https://goreportcard.com/badge/github.com/NVIDIA/aistore) -AIStore (AIS for short) is a built from scratch, lightweight storage stack tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. +AIStore (AIS for short) is a built-from-scratch, lightweight storage stack tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. AIS [consistently shows balanced I/O distribution and linear scalability](https://aistore.nvidia.com/blog/2024/02/16/multihome-bench) across arbitrary numbers of clustered nodes. The ability to scale linearly with each added disk was, and remains, one of the main incentives.
Much of the initial design was also driven by the ideas to [offload](https://aistore.nvidia.com/blog/2023/06/09/aisio-transforms-with-webdataset-pt-3) custom dataset transformations (often referred to as [ETL](https://aistore.nvidia.com/blog/2021/10/21/ais-etl-1)). And finally, since AIS is a software system that aggregates Linux machines to provide storage for user data, there's the requirement number one: reliability and data protection. @@ -59,7 +59,7 @@ Since prerequisites boil down to, essentially, having Linux with a disk the depl | Option | Objective | | --- | ---| -| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers and development, Linux or Mac OS | +| [Local playground](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#local-playground) | AIS developers or first-time users, Linux or Mac OS; to get started, run `make kill cli aisloader deploy <<< $'N\nM'`, where `N` is a number of [targets](/docs/overview.md#terminology), `M` - gateways | | Minimal production-ready deployment | This option utilizes preinstalled docker image and is targeting first-time users or researchers (who could immediately start training their models on smaller datasets) | | [Easy automated GCP/GKE deployment](https://github.com/NVIDIA/aistore/blob/main/docs/getting_started.md#kubernetes-deployments) | Developers, first-time users, AI researchers | | [Large-scale production deployment](https://github.com/NVIDIA/ais-k8s) | Requires Kubernetes and is provided via a separate repository: [ais-k8s](https://github.com/NVIDIA/ais-k8s) | @@ -79,8 +79,8 @@ AIStore supports multiple ways to populate itself with existing datasets, includ * **copy** multiple matching objects; * **archive** multiple objects * **prefetch** remote bucket or parts of thereof; -* **download** raw http(s) addressible directories, including (but not limited to) Cloud storages; -* **promote** NFS or SMB shares 
accessible by one or multiple (or all) AIS target nodes; +* **download** raw http(s) addressable directories, including (but not limited to) Cloud storages; +* **promote** NFS or SMB shares accessible by one or multiple (or all) AIS [target](/docs/overview.md#terminology) nodes; > The on-demand "way" is maybe the most popular, whereby users just start running their workloads against a [remote bucket](docs/providers.md) with AIS cluster positioned as an intermediate fast tier. @@ -88,7 +88,7 @@ But there's more. In [v3.22](https://github.com/NVIDIA/aistore/releases/tag/v1.3 ## Installing from release binaries -Generally, AIStore (cluster) requires at least some sort of [deployment](/deploy#contents) procedure. There are standalone binaries, though, that can be [built](Makefile) from source or, alternatively, installed directly from GitHub: +Generally, AIStore (cluster) requires at least some sort of [deployment](/deploy#contents) procedure. There are standalone binaries, though, that can be [built](Makefile) from source or installed directly from GitHub: ```console $ ./scripts/install_from_binaries.sh --help @@ -98,25 +98,13 @@ The script installs [aisloader](/docs/aisloader.md) and [CLI](/docs/cli.md) from ## PyTorch integration -AIS is one of the PyTorch [Iterable Datapipes](https://github.com/pytorch/data/tree/main/torchdata/datapipes/iter/load#iterable-datapipes). 
+PyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders: -Specifically, [TorchData](https://github.com/pytorch/data) library provides: -* [AISFileLister](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLister.html#aisfilelister) -* [AISFileLoader](https://pytorch.org/data/main/generated/torchdata.datapipes.iter.AISFileLoader.html#aisfileloader) +* [Taxonomy of abstractions and API reference](/docs/pytorch.md) +* [AIS plugin for PyTorch: usage examples](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md) +* [Jupyter notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/aisio-pytorch/) -to list and, respectively, load data from AIStore. - -Further references and usage examples - in our technical blog at https://aistore.nvidia.com/blog: -* [PyTorch: Loading Data from AIStore](https://aistore.nvidia.com/blog/2022/07/12/aisio-pytorch) -* [Python SDK: Getting Started](https://aistore.nvidia.com/blog/2022/07/20/python-sdk) - -Since AIS natively supports a number of [remote backends](/docs/providers.md), you can also use (PyTorch + AIS) to iterate over Amazon S3 and Google Cloud buckets, and more. - -## Reuse - -This repo includes [SGL and Slab allocator](/memsys) intended to optimize memory usage, [Streams and Stream Bundles](/transport) to multiplex messages over long-lived HTTP connections, and a few other sub-packages providing rather generic functionality. - -With a little effort, they all could be extracted and used outside. +Since AIS natively supports [remote backends](/docs/providers.md), you can also use (PyTorch + AIS) to iterate over Amazon S3, GCS and Azure buckets, and more. ## Guides and References @@ -150,9 +138,6 @@ With a little effort, they all could be extracted and used outside. 
- [Jobs](/docs/cli/job.md) - Security and Access Control - [Authentication Server (AuthN)](/docs/authn.md) -- Tutorials - - [Tutorials](/docs/tutorials/README.md) - - [Videos](/docs/videos.md) - Power tools and extensions - [Reading, writing, and listing *archives*](/docs/archive.md) - [Distributed Shuffle](/docs/dsort.md) @@ -194,17 +179,18 @@ With a little effort, they all could be extracted and used outside. - [Getting started](/docs/getting_started.md) - [Docker](/docs/docker_main.md) - [Useful scripts](/docs/development.md) - - Profiling, race-detecting, and more + - Profiling, race-detecting and more - Batch jobs - [Batch operations](/docs/batch.md) - - [eXtended Actions (xactions)](/xact/README.md) + - [eXtended Actions (xactions)](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) - [CLI: `ais job`](/docs/cli/job.md) and [`ais show job`](/docs/cli/show.md), including: - - [prefetch remote dataset](/docs/cli/object.md#prefetch-objects) + - [prefetch remote datasets](/docs/cli/object.md#prefetch-objects) - [copy bucket](/docs/cli/bucket.md#copy-bucket) - [copy multiple objects](/docs/cli/bucket.md#copy-multiple-objects) - [download remote BLOBs](/docs/cli/blob-downloader.md) - - [promote NFS or SMB share](https://aistore.nvidia.com/blog/2022/03/17/promote), and more + - [promote NFS or SMB share](https://aistore.nvidia.com/blog/2022/03/17/promote) - Assorted Topics + - [Virtual directories](/docs/howto_virt_dirs.md) - [System files](/docs/sysfiles.md) - [Switching cluster between HTTP and HTTPS](/docs/switch_https.md) - [TLS: testing with self-signed certificates](/docs/getting_started.md#tls-testing-with-self-signed-certificates) diff --git a/docs/join_cluster.md b/docs/join_cluster.md index edc2a0220f..c9b87f826d 100644 --- a/docs/join_cluster.md +++ b/docs/join_cluster.md @@ -18,9 +18,13 @@ Also, see related: * [CLI: `ais cluster` command](/docs/cli/cluster.md) * [Scripted integration 
tests](https://github.com/NVIDIA/aistore/tree/main/ais/test/scripts) -## Joining a Cluster: _discovery_ URL, and more +## Joining a Cluster: _discovery_ URL -First, some basic facts. AIStore clusters can be deployed with an arbitrary number of AIStore proxies. Each proxy/gateway implements RESTful API and provides full access to objects stored in the cluster. Each proxy collaborates with all other proxies to perform majority-voted HA failovers (section [Highly Available Control Plane](ha.md). +First, some basic facts. AIStore clusters can be deployed with an arbitrary number of AIStore proxies (a.k.a. gateways). + +Each proxy/gateway implements RESTful APIs (both native and S3 compatible) and provides full access to user data stored in the cluster. + +Each proxy collaborates with other proxies in the cluster to perform majority-voted HA failovers (section [Highly Available Control Plane](ha.md)). All _electable_ proxies are functionally equivalent. The one that is elected as _primary_ is, among other things, responsible to _join_ nodes to the running cluster. diff --git a/docs/lifecycle.md b/docs/lifecycle.md index 631a1d2202..aaef8df333 100644 --- a/docs/lifecycle.md +++ b/docs/lifecycle.md @@ -143,7 +143,7 @@ Still, the `join` command can solve the case when the node is misconfigured. Sec * [`aisnode` command line](/docs/command_line.md) -When rebalancing, the cluster remains fully operational and can be used to read and write data, list, create, and destroy buckets, run jobs, and more. In other words, none of the listed lifecycle operations requires downtime. The idea is that users never notice (and if the cluster has enough spare capacity - they won't). +When rebalancing, the cluster remains fully operational and can be used to read and write data, list, create, and destroy buckets, run jobs and more. In other words, none of the listed lifecycle operations requires downtime.
The idea is that users never notice (and if the cluster has enough spare capacity - they won't). ## References diff --git a/docs/lifecycle_node.md b/docs/lifecycle_node.md index d90a121b8a..0413e77958 100644 --- a/docs/lifecycle_node.md +++ b/docs/lifecycle_node.md @@ -143,7 +143,7 @@ Still, the `join` command can solve the case when the node is misconfigured. Sec * [`aisnode` command line](/docs/command_line.md) -When rebalancing, the cluster remains fully operational and can be used to read and write data, list, create, and destroy buckets, run jobs, and more. In other words, none of the listed lifecycle operations requires downtime. The idea is that users never notice (and if the cluster has enough spare capacity - they won't). +When rebalancing, the cluster remains fully operational and can be used to read and write data, list, create, and destroy buckets, run jobs and more. In other words, none of the listed lifecycle operations requires downtime. The idea is that users never notice (and if the cluster has enough spare capacity - they won't). 
## References diff --git a/docs/metrics-reference.md b/docs/metrics-reference.md index 74a3f6ba07..2c805e9d60 100644 --- a/docs/metrics-reference.md +++ b/docs/metrics-reference.md @@ -13,12 +13,14 @@ redirect_from: | --- | --- | --- | --- | --- | | `get.n` | `get_count` | counter | total number of executed GET(object) requests | default | | `put.n` | `put_count` | counter | total number of executed PUT(object) requests | default | +| `head.n` | `head_count` | counter | total number of executed HEAD(object) requests | default | | `append.n` | `append_count` | counter | total number of executed APPEND(object) requests | default | | `del.n` | `del_count` | counter | total number of executed DELETE(object) requests | default | | `ren.n` | `ren_count` | counter | total number of executed rename(object) requests | default | | `lst.n` | `lst_count` | counter | total number of executed list-objects requests | default | | `err.get.n` | `err_get_count` | counter | total number of GET(object) errors | default | | `err.put.n` | `err_put_count` | counter | total number of PUT(object) errors | default | +| `err.head.n` | `err_head_count` | counter | total number of HEAD(object) errors | default | | `err.append.n` | `err_append_count` | counter | total number of APPEND(object) errors | default | | `err.del.n` | `err_del_count` | counter | total number of DELETE(object) errors | default | | `err.ren.n` | `err_ren_count` | counter | total number of rename(object) errors | default | @@ -83,26 +85,18 @@ redirect_from: | `remais.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute cold GETs and store new object versions in-cluster | map[backend:remais node_id:``] | | `remais.e2e.get.ns.total` | `remote_e2e_get_ns_total` | total | GET: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving request, executing cold-GET, storing new object version in-cluster, and transmitting response | map[backend:remais 
node_id:``] | | `remais.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all cold-GET transactions | map[backend:remais node_id:``] | +| `remais.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:remais node_id:``] | | `remais.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:remais node_id:``] | | `remais.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:remais node_id:``] | | `remais.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:remais node_id:``] | -| `remais.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:remais node _id:ClCt8081] | +| `remais.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:remais node_id:ClCt8081] | | `remais.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:remais node_id:``] | | `remais.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size of objects that were updated out-of-band | map[backend:remais node_id:``] | -| `ht.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests (cold GETs) | map[backend:ht node_id:``] | -| `ht.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute cold GETs and store new object versions in-cluster | map[backend:ht node_id:``] | -| 
`ht.e2e.get.ns.total` | `remote_e2e_get_ns_total` | total | GET: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving request, executing cold-GET, storing new object version in-cluster, and transmitting response | map[backend:ht node_id:``] | -| `ht.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all cold-GET transactions | map[backend:ht node_id:``] | -| `ht.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:ht node_id:``] | -| `ht.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:ht node_id:``] | -| `ht.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:ht node_id:``] | -| `ht.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:ht node_id:``] | -| `ht.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:ht node_id:``] | -| `ht.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size of objects that were updated out-of-band | map[backend:ht node_id:``] | | `gcp.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests (cold GETs) | map[backend:gcp node_id:``] | | `gcp.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute cold GETs and store new object versions in-cluster | map[backend:gcp node_id:``] | | `gcp.e2e.get.ns.total` | `remote_e2e_get_ns_total` | total | GET: total end-to-end time (nanoseconds) servicing remote requests; 
includes: receiving request, executing cold-GET, storing new object version in-cluster, and transmitting response | map[backend:gcp node_id:``] | | `gcp.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all cold-GET transactions | map[backend:gcp node_id:``] | +| `gcp.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:gcp node_id:``] | | `gcp.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:gcp node_id:``] | | `gcp.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:gcp node_id:``] | | `gcp.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:gcp node_id:``] | @@ -113,6 +107,7 @@ redirect_from: | `aws.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute cold GETs and store new object versions in-cluster | map[backend:aws node_id:``] | | `aws.e2e.get.ns.total` | `remote_e2e_get_ns_total` | total | GET: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving request , executing cold-GET, storing new object version in-cluster, and transmitting response | map[backend:aws node_id:``] | | `aws.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all cold-GET transactions | map[backend:aws node_id:``] | +| `aws.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:aws node_id:``] | | `aws.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:aws node_id:``] 
| | `aws.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:aws node_id:``] | | `aws.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:aws node_id:``] | @@ -123,6 +118,7 @@ redirect_from: | `azure.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute cold GETs and store new object versions in-cluster | map[backend:azure node_id:``] | | `azure.e2e.get.ns.total` | `remote_e2e_get_ns_total` | total | GET: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving request, executing cold-GET, storing new object version in-cluster, and transmitting response | map[backend:azure node_id:``] | | `azure.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all cold-GET transactions | map[backend:azure node_id:``] | +| `azure.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:azure node_id:``] | | `azure.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:azure node_id:``] | | `azure.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:azure node_id:``] | | `azure.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:azure node_id:``] | diff --git a/docs/metrics.md b/docs/metrics.md index 5d945e18b9..7e8cc1184a 100644 --- 
a/docs/metrics.md +++ b/docs/metrics.md @@ -13,7 +13,7 @@ redirect_from: ## Introduction -AIStore tracks, logs, and reports a large and growing number of counters, latencies and throughputs including (but not limited to) metrics that reflect cluster recovery and global rebalancing, all [extended long-running operations](/xact/README.md), and, of course, the basic read, write, list transactions, and more. +AIStore tracks, logs, and reports a large and growing number of counters, latencies and throughputs including (but not limited to) metrics that reflect cluster recovery and global rebalancing, all [extended long-running operations](https://github.com/NVIDIA/aistore/blob/main/xact/README.md), and, of course, the basic read, write, list transactions and more. Viewership is equally supported via: diff --git a/docs/on_disk_layout.md b/docs/on_disk_layout.md index 12716a13f9..8ba8bde6ab 100644 --- a/docs/on_disk_layout.md +++ b/docs/on_disk_layout.md @@ -7,15 +7,17 @@ redirect_from: - /docs/on_disk_layout.md/ --- -AIStore 3.0 introduces new on-disk layout that addresses several motivations including (but not limited to) the motivation to support multiple cloud backends. One of those Clouds can be (and starting with v3.0 **is**) AIStore itself with the immediate availability of AIS-to-AIS caching and a gamut of future capabilities: continuous data protection, DR, and more. +AIStore 3.0 introduced a new on-disk layout that addressed several motivations including (but not limited to) the motivation to support multiple remote backends. -At a high level, with v3.0: +One of those remote backends can be AIStore itself, with immediate availability of AIS-to-AIS caching and a gamut of related data recovery capabilities.
+ +At a high level: - in addition to checksum, all metadata (including object metadata) is versioned to provide for **backward compatibility** when (and *if*) there are any future changes; - cluster-wide control structures - in particular, cluster map and bucket metadata - are now uniformly GUID-protected and LZ4-compressed; - bucket metadata is replicated, with multiple protected and versioned copies stored on data drives of **all** storage targets in a cluster. -In addition, release 3.0 adds configurable namespaces whereby users can choose to group selected buckets for the purposes of physical isolation from all other buckets and datasets, and/or applying common (for this group) storage management policies: erasure coding, n-way mirroring, etc. But more about it later. +In addition, AIS supports configurable namespaces whereby users can choose to group selected buckets for the purposes of physical isolation from all other buckets and datasets, and/or applying common (for this group) storage management policies: erasure coding, n-way mirroring, etc. But more about it later. Here's a simplified drawing depicting two [providers](providers.md), AIS and AWS, and two buckets, `ABC` and `XYZ`, respectively. In the picture, `mpath` is a single [mountpath](configuration.md) - a single disk **or** a volume formatted with a local filesystem of choice, **and** a local directory (`mpath/`): diff --git a/docs/out_of_band.md b/docs/out_of_band.md index 5611757f0b..52dd14bbe7 100755 --- a/docs/out_of_band.md +++ b/docs/out_of_band.md @@ -66,7 +66,7 @@ OPTIONS: * [`ais cp` command](/docs/cli/bucket.md) and, in particular, its `--sync` option. - [Example copying buckets and multi-objects with simultaneous synchronization](/docs/cli/bucket.md#example-copying-buckets-and-multi-objects-with-simultaneous-synchronization) -## Out-of-band writes, deletes, and more +## Out-of-band writes, deletes and more 1. 
with version validation enabled, aistore will detect both out-of-band writes and deletes; 2. buckets with versioning disabled are also supported; diff --git a/docs/overview.md b/docs/overview.md index ff3dfa2ed1..573221ae58 100644 --- a/docs/overview.md +++ b/docs/overview.md @@ -56,6 +56,12 @@ All user data is equally distributed (or [balanced](/docs/rebalance.md)) across ## Terminology +* **Target** - storage node. In the docs and the code, instead of saying something like "storage node in an aistore cluster" we simply say "target." + +* **Proxy** - a **gateway** providing an API access point. One of the proxies is elected, or designated, to be the _primary_ (or leader) of the cluster. There may be any number of ais proxies/gateways. The terms "proxy" and "gateway" are often used interchangeably. + +> Each proxy/gateway implements RESTful APIs (both native and S3 compatible) and provides full access to user data stored in the cluster. Each proxy collaborates with other proxies in the cluster to perform majority-voted HA failover. + * [Backend Provider](providers.md) - an abstraction, and simultaneously an API-supported option, that allows to delineate between "remote" and "local" buckets with respect to a given AIS cluster. * [Unified Global Namespace](providers.md) - AIS clusters *attached* to each other, effectively, form a super-cluster providing unified global namespace whereby all buckets and all objects of all included clusters are uniformly accessible via any and all individual access points (of those clusters). @@ -66,7 +72,7 @@ All user data is equally distributed (or [balanced](/docs/rebalance.md)) across - it is safe to execute the 4 listed operations (enable, disable, attach, detach) at any point during runtime; - in a typical deployment, the total number of mountpaths would compute as a direct product of (number of storage targets) x (number of disks in each target).
-* [Xaction](/xact/README.md) - asynchronous batch operations that may take many seconds (minutes, hours, etc.) to execute - are called *eXtended actions* or simply *xactions*. CLI and [CLI documentation](/docs/cli) refers to such operations as **jobs** - the more familiar term that can be used interchangeably. Examples include erasure coding or n-way mirroring a dataset, resharding and reshuffling a dataset, archiving multiple objects, copying buckets, and many more. All [eXtended actions](/xact/README.md) support generic [API](/api/xaction.go) and [CLI](/docs/cli/job.md#show-job-statistics) to show both common counters (byte and object numbers) as well as operation-specific extended statistics. +* [Xaction](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) - asynchronous batch operations that may take many seconds (minutes, hours, etc.) to execute - are called *eXtended actions* or simply *xactions*. CLI and [CLI documentation](/docs/cli) refers to such operations as **jobs** - the more familiar term that can be used interchangeably. Examples include erasure coding or n-way mirroring a dataset, resharding and reshuffling a dataset, archiving multiple objects, copying buckets, and many more. All [eXtended actions](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) support generic [API](/api/xaction.go) and [CLI](/docs/cli/job.md#show-job-statistics) to show both common counters (byte and object numbers) as well as operation-specific extended statistics. ## Design Philosophy @@ -167,7 +173,7 @@ Notwithstanding, AIS stores and then maintains object replicas, erasure-coded sl Common way to use AIStore include the most fundamental and, often, the very first step: populating AIS cluster with an existing dataset, or datasets. Those (datasets) can come from remote buckets (AWS, Google Cloud, Azure), HDFS directories, NFS shares, local files, or any vanilla HTTP(S) locations. 
-To this end, AIS provides 6 (six) easy ways ranging from the conventional on-demand caching to *promoting* colocated files and directories, and more. +To this end, AIS provides 6 (six) easy ways ranging from the conventional on-demand caching to *promoting* colocated files and directories. > Related references and examples include this [technical blog](https://aistore.nvidia.com/blog/2021/12/07/cp-files-to-ais) that shows how to copy a file-based dataset in two easy steps. diff --git a/docs/prometheus.md b/docs/prometheus.md index 42128a095e..c79ba99dd5 100644 --- a/docs/prometheus.md +++ b/docs/prometheus.md @@ -14,7 +14,7 @@ redirect_from: ## Monitoring AIStore with Prometheus -AIStore tracks a growing list of performance counters, utilization percentages, latency and throughput metrics, transmitted and received stats (total bytes and numbers of objects), error counters, and more. +AIStore tracks a growing list of performance counters, utilization percentages, latency and throughput metrics, transmitted and received stats (total bytes and numbers of objects), error counters and more. Viewership is equally supported via: * AIS node logs diff --git a/docs/python_sdk.md b/docs/python_sdk.md index b0087c89f5..8013647575 100644 --- a/docs/python_sdk.md +++ b/docs/python_sdk.md @@ -10,7 +10,7 @@ redirect_from: AIStore Python SDK is a growing set of client-side objects and methods to access and utilize AIS clusters. This document contains API documentation for the AIStore Python SDK. -> For our PyTorch integration, please refer to the [Pytorch Docs](https://github.com/NVIDIA/aistore/tree/main/docs/pytorch.md). +> For our PyTorch integration, please refer to the [PyTorch Docs](https://github.com/NVIDIA/aistore/tree/main/docs/pytorch.md). 
For more information, please refer to [AIS Python SDK](https://pypi.org/project/aistore) available via Python Package Index (PyPI) or see [https://github.com/NVIDIA/aistore/tree/main/python/aistore](https://github.com/NVIDIA/aistore/tree/main/python/aistore). * [client](#client) diff --git a/docs/pytorch.md b/docs/pytorch.md index cdf8843ef9..75a1eb34fb 100644 --- a/docs/pytorch.md +++ b/docs/pytorch.md @@ -7,13 +7,13 @@ redirect_from: - /docs/pytorch.md/ --- -The AIStore Pytorch integration is a growing set of datasets, samplers, and datapipes that allow you to use easily add AIStore support -to a codebase using Pytorch. This document contains API documentation for the AIStore Pytorch integration. +In AIStore, PyTorch integration is a growing set of datasets (both iterable and map-style), samplers, and dataloaders. This readme illustrates the taxonomy of the associated abstractions and provides API reference documentation. -> For usage examples, please refer to the [Pytorch README](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md). -For more in-depth examples, see our [notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/aisio-pytorch/). +For usage examples, please see: +* [AIS plugin for PyTorch](https://github.com/NVIDIA/aistore/tree/main/python/aistore/pytorch/README.md) +* [Jupyter notebook examples](https://github.com/NVIDIA/aistore/tree/main/python/examples/aisio-pytorch/) -![Pytorch Structure](/docs/images/pytorch_structure.webp) +![PyTorch Structure](/docs/images/pytorch_structure.webp) * [base\_map\_dataset](#base_map_dataset) * [AISBaseMapDataset](#base_map_dataset.AISBaseMapDataset) * [base\_iter\_dataset](#base_iter_dataset) @@ -241,7 +241,7 @@ a `lambda` which cannot be pickled in multithreaded contexts.
Worker Supported Request Client for PyTorch -This client allows Pytorch workers to have separate request sessions per thread +This client allows PyTorch workers to have separate request sessions per thread which is needed in order to use workers in a DataLoader as the default implementation of RequestClient and requests is not thread-safe. @@ -255,7 +255,7 @@ Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved. class WorkerRequestClient(RequestClient) ``` -Extension that supports Pytorch and multiple workers of internal client for +Extension that supports PyTorch and multiple workers of internal client for buckets, objects, jobs, etc. to use for making requests to an AIS cluster. **Arguments**: @@ -271,7 +271,7 @@ buckets, objects, jobs, etc. to use for making requests to an AIS cluster. def session() ``` -Returns: Active request session acquired for a specific Pytorch dataloader worker +Returns: Active request session acquired for a specific PyTorch dataloader worker Multishard Stream Dataset for AIS. diff --git a/docs/s3compat.md b/docs/s3compat.md index ad772a7f40..a1c0bfa233 100644 --- a/docs/s3compat.md +++ b/docs/s3compat.md @@ -17,7 +17,7 @@ This document talks about the 2. and 3. - about AIS providing S3 compatible API There's a separate, albeit closely related, [document](/docs/s3cmd.md) that explains how to configure `s3cmd` and then maybe tweak AIStore configuration to work with it: -* [Getting Started with `s3cmd`](/docs/s3cmd.md) - also contains configuration, tips, usage examples, and more. +* [Getting Started with `s3cmd`](/docs/s3cmd.md) - also contains configuration, tips, usage examples and more. 
For additional background, see: diff --git a/docs/storage_svcs.md b/docs/storage_svcs.md index 14db786d82..035a4f25ca 100644 --- a/docs/storage_svcs.md +++ b/docs/storage_svcs.md @@ -215,7 +215,7 @@ $ ais start mirror --copies 2 ais://b $ ais start mirror --copies 3 ais://c ``` -The operations (above) are in fact [extended actions](/xact/README.md) that run asynchronously. Both Cloud and ais buckets are supported. You can monitor completion of those operations via generic [xaction API](/api/xaction.go). +The operations (above) are in fact [extended actions](https://github.com/NVIDIA/aistore/blob/main/xact/README.md) that run asynchronously. Both Cloud and ais buckets are supported. You can monitor completion of those operations via generic [xaction API](/api/xaction.go). Subsequently, all PUTs into an n-way configured bucket also generate **n** copies for all newly created objects. Which also goes to say that the ("make-n-copies") operation, in addition to creating or destroying replicas of existing objects will also automatically re-enable(if n > 1) or disable (if n == 1) mirroring as far as subsequent PUTs are concerned. diff --git a/python/aistore/pytorch/README.md b/python/aistore/pytorch/README.md index e02b32d097..67cca3d336 100644 --- a/python/aistore/pytorch/README.md +++ b/python/aistore/pytorch/README.md @@ -6,9 +6,9 @@ AIS plugin is a PyTorch dataset library to access datasets stored on AIStore. PyTorch comes with powerful data loading capabilities, but loading data in PyTorch is fairly complex. One of the best ways to handle it is to start small and then add complexities as and when you need them. -![Pytorch Structure](../../../docs/images/pytorch_structure.webp) +![PyTorch Structure](../../../docs/images/pytorch_structure.webp) -In our plugin, we extend the base Dataset, Sampler, and IterableDataset Torch clases to provide AIStore Object functionality natively to Pytorch. 
You can extend AISBaseMapDataset instead of Dataset and AISBaseIterDataset instead of IterableDataset in your custom datasets to automatically obtain object fetching functionality. But if you'd like fully complete datasets that fetch objects and load their data, then you can use AISMapDataset and AISIterData. +In our plugin, we extend the base Dataset, Sampler, and IterableDataset Torch classes to provide AIStore Object functionality natively to PyTorch. You can extend AISBaseMapDataset instead of Dataset and AISBaseIterDataset instead of IterableDataset in your custom datasets to automatically obtain object fetching functionality. But if you'd like complete datasets that fetch objects and load their data, you can use AISMapDataset and AISIterDataset. ### PyTorch DataLoader diff --git a/python/tests/README.md b/python/tests/README.md index 6a4310d6c9..e048dbb789 100644 --- a/python/tests/README.md +++ b/python/tests/README.md @@ -1,6 +1,6 @@ # Python Tests -This directory contains unit tests and integration tests for each of the python package interfaces we provide to access AIStore, including the Amazon S3 botocore, SDK, and Pytorch datasets APIs. +This directory contains unit tests and integration tests for each of the python package interfaces we provide to access AIStore, including the Amazon S3 botocore, SDK, and PyTorch datasets APIs. It also contains tests for verifying s3 compatibility. --- diff --git a/transport/README.md b/transport/README.md index 307b9dd3b9..5534470041 100644 --- a/transport/README.md +++ b/transport/README.md @@ -1,4 +1,4 @@ -Package `transport` provides streaming object-based transport over HTTP for massive intra-AIS data transfers. AIStore utilizes this package for cluster-wide (aka "global") rebalancing, distributed merge-sort, and more. +Package `transport` provides streaming object-based transport over HTTP for massive intra-AIS data transfers.
AIStore utilizes this package for cluster-wide (aka "global") rebalancing, distributed merge-sort and more. - [Build](#build) - [Description](#description) @@ -184,7 +184,7 @@ For usage examples and details, please see tests in the package directory. ## Stream Bundle -Stream bundle (`transport.StreamBundle`) in this package is motivated by the need to broadcast and multicast continuously over a set of long-lived TCP sessions. The scenarios in storage clustering include intra-cluster replication and erasure coding, rebalancing (upon *target-added* and *target-removed* events) and MapReduce-generated flows, and more. +Stream bundle (`transport.StreamBundle`) in this package is motivated by the need to broadcast and multicast continuously over a set of long-lived TCP sessions. The scenarios in storage clustering include intra-cluster replication and erasure coding, rebalancing (upon *target-added* and *target-removed* events) and MapReduce-generated flows and more. In each specific case, a given clustered node needs to maintain control and/or data flows between itself and multiple other clustered nodes, where each of the flows would be transferring large numbers of control and data objects, or parts of thereof. diff --git a/xact/README.md b/xact/README.md index b98e5f8820..d087ce8613 100644 --- a/xact/README.md +++ b/xact/README.md @@ -22,7 +22,6 @@ Xactions start running based on a wide variety of runtime conditions that includ * user request (e.g., to reduce the number of local object copies in a given bucket) * adding or removing storage targets (the events that trigger cluster-wide rebalancing) * adding or removing local disks (the events that cause resilver to start moving stored content between *mountpaths* - see [Managing mountpaths](/docs/configuration.md#managing-mountpaths)) -* and more... Further, to reduce congestion and minimize interference with user-generated workload, extended actions (self-)throttle themselves based on configurable watermarks. 
The latter include `disk_util_low_wm` and `disk_util_high_wm` (see [configuration](/deploy/dev/local/aisnode_config.sh)). Roughly speaking, the idea is that when local disk utilization falls below the low watermark (`disk_util_low_wm`) extended actions that utilize local storage can run at full throttle. And vice versa. @@ -57,7 +56,6 @@ Supported extended actions are enumerated in the [user-facing API](/cmn/api.go) * consensus voting (when conducting new leader [election](/docs/ha.md#election)) * erasure-encoding objects in a EC-configured bucket (see [Erasure coding](/docs/storage_svcs.md#erasure-coding)) * creating additional local replicas, and reducing number of object replicas in a given locally-mirrored bucket (see [Storage Services](/docs/storage_svcs.md)) -* and more... There are different actions that may be taken upon xaction. Actions include stats, start and stop. @@ -151,6 +149,6 @@ the most recent xactions will be displayed, for each bucket, kind or (bucket, ki ## References -For xaction-related CLI documentation and examples, supported multi-object (batch) operations, and more, please see: +For xaction-related CLI documentation, examples, and supported multi-object (batch) operations, please see: * [Batch operations](/docs/batch.md)